Does ifort support OpenMP nested parallel regions?
If so, what version do I need? I am porting a code to multiple machines, and I need to know whether I need to purchase a new compiler for particular systems.
Thank you, Gene W
The latest compiler should support the OpenMP 3.0 specification. You can try the evaluation version of the compiler to find out whether it works for you.
The Intel Composer XE does support nested parallelism. You might need to enable nesting explicitly by calling omp_set_nested(). Please see the OpenMP specification for how to use this runtime call.
Please be advised that nested parallelism is supported, but it might cause trouble when actually used in an application. Nested parallelism in OpenMP has some flaws that need to be taken care of. The main problem is that a nested region also needs to create threads for parallel execution. Given, for instance, an outer region that already runs with 8 threads, each of which opens a nested region with 2 threads, you end up with 16 threads, so you need at least 16 cores in your machine to run them without oversubscription. Having more than one level of parallel regions essentially exposes you either to exponential growth of thread counts or to nested regions limited to a single thread each. You have the choice :-).
My advice would be to check whether the OpenMP tasking model works for you. Tasking does not exhibit the same nesting issues as parallel regions. The idea is that you create a single parallel region right at the beginning of your application. Parallelism then comes from the tasks you create and fire up for execution. The tasks are then scheduled to run on the thread team that you've created.
You can also start an outer parallel region with fewer than the full complement of threads, e.g. pick per-level team sizes whose product equals the total thread count.
On a system with 8 hardware threads you could choose:
8 threads without nesting
2 outer level threads, 4 inner (only 2 levels nested)
4 outer level, 2 inner (only 2 levels nested)
2 outer level, 2 middle level, 2 inner level (3 nested levels)
If some of these threads perform I/O (in other words, they stall), then consider oversubscription, depending on how often and for how long the threads stall on I/O.
Blog: The Parallel Void