My question is about the implementation details of different parallel runtime systems, e.g., the OpenMP, Cilk Plus, and TBB runtime systems.
Where can I find detailed information? For example:
* How do they do runtime task scheduling?
* What are the differences between the GCC and ICC implementations of the OpenMP runtime system?
* How is work stealing done in these approaches?
* Do OpenMP and Cilk Plus use busy waiting?
Is there any source where I can find answers to such questions?
Many thanks in advance!