About OpenMP workqueue or tasking model

Hello, all

I am wondering how the Intel compilers implement the OpenMP 3.0 (3.1) tasking model. According to the standard, each thread has a private task pool for the execution of its descendant (child) tasks, and a global task pool is shared by all threads. My question is how the execution order is determined for those task pools. Is it LIFO? FIFO? Or some cutoff strategy? I have searched everywhere but have not found a clue. This actually matters when a developer wants to guide out-of-order task scheduling in a specific way.

Thank you



You can't control the execution order of OpenMP threads by default.

However, a "trick" can be used; I did some small R&D on this a while ago. In essence, the processing is dispatched based on the number returned by the omp_get_thread_num OpenMP function. This is how it looks:
for( RTint i = 0; i < NUM_OF_ITERATIONS; i++ )
{
    RTint iThreadNum = omp_get_thread_num();

    CrtPrintf( RTU("Processing started for Thread %d - Iteration %d\n"), iThreadNum, i+1 );

    if( g_pFunction[ iThreadNum ] != RTnull )
        ( *g_pFunction[ iThreadNum ] )();

    CrtPrintf( RTU("Processing done for Thread %d\n"), iThreadNum );
}
If, for example, an application uses 8 OpenMP threads, then 8 g_pFunction functions are needed, and some "internal scheduler" could control their execution.

I mean the default scheduling for the following pragmas:

#pragma omp task, #pragma omp taskwait.

When a task is suspended at a "taskwait", nested parallelism is scheduled breadth-first (BFS), since taskwait enforces it (child tasks go to the thread's local pool, because tasks are tied to the executing thread by default). The question is what happens when tasks are generated without synchronization:

for (auto it = list.begin(); it != list.end(); ++it) {

#pragma omp task
    {
        // ... task body ...
    }
}

#pragma omp taskwait

In this case, tasks can be executed immediately or deferred. When task execution is deferred, how are the tasks scheduled? If LIFO is used, a task can possibly reuse recently touched memory (still in cache); FIFO has no such chance. Thus my guess is that LIFO is the default in the workqueue model. However, I have no way to know the internal implementation.
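Since the spec leaves the dequeue order implementation-defined, one way to probe it is to generate tasks from a single thread and record the order in which the runtime actually runs them. The sketch below does exactly that; NTASKS, g_order, and run are illustrative names, and the observed order is only a hint about this runtime's policy, not a guarantee:

```c
#include <stdio.h>

#define NTASKS 8

static int g_order[NTASKS];   /* task ids in completion order */
static int g_next;            /* next free slot in g_order */

/* Generate NTASKS tasks from one thread, record the order the runtime
   executes them in, and return how many completed. */
int run(void) {
    g_next = 0;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        {
            for (int i = 0; i < NTASKS; i++) {
                #pragma omp task firstprivate(i)
                {
                    int slot;
                    #pragma omp critical
                    slot = g_next++;
                    g_order[slot] = i;   /* each task owns its slot */
                }
            }
            #pragma omp taskwait
        }
    }
    return g_next;
}

int main(void) {
    run();
    printf("completion order:");
    for (int i = 0; i < NTASKS; i++) printf(" %d", g_order[i]);
    printf("\n");   /* compare across runtimes / thread counts */
    return 0;
}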


>>... If LIFO is used, it possibly reuse the recent memory (or cache). FIFO has no such chance...

I understood why you need that and thanks for clarification.

A note to a Moderator: Shouldn't we move the thread to software.intel.com/en-us/forums/threading-on-intel-parallel-architectures forum?

Actually, I do some recursion inside that kind of loop (so I can fake task scheduling as I wish), but I should at least know how the workqueue is modeled. Is there any runtime option I can tweak?
