OpenMP and Fortran question

I'm building a DLL from code that was originally developed with gfortran. I use OpenMP on the outermost loop and keep all the useful data in 3D arrays whose third dimension is indexed by the outermost loop variable.

All of the code is in Fortran; I'm only switching from gfortran to ifort. The results I get are identical, but the behaviour of OpenMP is not.

In the gfortran version, if I set my number of outer loop iterations to one, I get one processor core loaded to 100%. However, in ifort for the same case, I have all four cores at 100%, with no apparent speed advantage over a serial run.

I've tried adding schedule(static) and private(i) (i being my outer loop integer), to no avail. I have also tried "set OMP_NUM_THREADS=1" at the compilation stage.

I'm not sure whether the compiler is trying to be smart and avoid a data race, but that is not a concern here, as each thread should be accessing a different index of the third dimension of each array. I even tried the automatic parallelization option, but because of all the subroutine calls it seemed to make none of the loops parallel. Are there any other options that override this?

The main question is: how do I force the DLL to create a number of threads equal to the number of iterations of my outermost loop (i.e. behave the same way as gfortran with OpenMP, where the number of fully utilised processor cores corresponds to the number of outermost loop iterations)?

This seems like it should be a very simple problem to fix. I've tried searching the forums, but I'm probably not using the right keywords, as I haven't found anything useful.


Here's the code at the outermost loop:

    !$omp parallel do
    do n = 1, n_lines




I have a very simple OpenMP example using ifort, which can be run as either single or multiple threads.
It runs in a command window, using do_test.bat.
If you have task manager running, you will see the different runs.
do_test.bat gives an example of compiler options for using /Qopenmp or not, which I think is the basis of your question.
It is very useful to compare the elapsed time between the single-thread and multi-thread options to see how effective OpenMP is.

From what you have posted and at the simplest level, if you don't want multiple threads, don't use /Qopenmp.
If you do, the default is to use all available CPUs.
If you are using a multi level approach and managing the threads at each level, I'm sorry but someone else will need to help.



Attachment: OpenMP_sample.zip (application/zip, 4.64 KB)

OMP_NUM_THREADS has no effect during compilation; you should set it at run time.
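
To illustrate the run-time point, here is a minimal sketch (the program name is made up; it assumes the standard `omp_lib` module and a build with `/Qopenmp` for ifort or `-fopenmp` for gfortran). It prints the thread count the runtime would use by default, then overrides it from code, which is roughly what a DLL could do once in its initialisation routine.

```fortran
program show_threads
  use omp_lib            ! standard OpenMP runtime module
  implicit none

  ! Number of threads the next parallel region would use by default.
  ! This reflects OMP_NUM_THREADS if it was set before the program started.
  print '(a,i0)', 'max threads at start: ', omp_get_max_threads()

  ! Programmatic override for subsequent parallel regions.
  call omp_set_num_threads(1)
  print '(a,i0)', 'max threads after omp_set_num_threads(1): ', omp_get_max_threads()
end program show_threads
```

On Windows, issuing `set OMP_NUM_THREADS=1` in the same command window just before launching the executable should change the first line printed; setting it before compiling changes nothing.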

Did the ifort compilation invoke an automatic collapse with the effect of making more parallelism available in the outer loop?

>>>How do I force the dll to create a number of threads equal to the number of iterations of my outermost loop

    !$omp parallel do num_threads(n_lines)
    do n = 1, n_lines

But if n_lines .eq. 1, this may be substantially slower than the serial equivalent, due to OpenMP overhead.
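
One way to sidestep that overhead, sketched below under assumptions, is to add an `if` clause so the region runs serially when there is only one iteration, and to cap the requested thread count at what the runtime offers. The program name, the `work` array, and the squared-index loop body are placeholders, not the original poster's code.

```fortran
program limit_threads
  use omp_lib
  implicit none
  integer, parameter :: n_lines = 1     ! hypothetical iteration count
  integer :: n, nt
  real :: work(n_lines)

  ! Never ask for more threads than iterations or than the runtime default.
  nt = min(n_lines, omp_get_max_threads())

  ! if(...) makes the region run serially when there is only one iteration.
  !$omp parallel do if(n_lines > 1) num_threads(nt) schedule(static)
  do n = 1, n_lines
     work(n) = real(n)**2               ! stand-in for the real per-line work
  end do
  !$omp end parallel do

  print '(a,i0,a,f0.1)', 'threads requested: ', nt, '  sum: ', sum(work)
end program limit_threads
```

Changing `n_lines` to a larger value and watching the task manager should show whether the number of busy cores follows the iteration count.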


Note: it is bad practice to force the number of threads to equal the number of iterations of the outermost loop.

Assume a loop with 100 iterations and a CPU with 8 hardware threads. It is more efficient to have 8 threads working, 4 with 12 iterations and 4 with 13 iterations, than to create and use 100 software threads (only 8 of which can run at any one time).
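
The split described above can be observed with a small test program. This is only a sketch: the names are invented, and the exact per-thread counts depend on the OpenMP runtime, since `schedule(static)` without a chunk size only promises roughly equal chunks.

```fortran
program count_per_thread
  use omp_lib
  implicit none
  integer, parameter :: n_iter = 100, n_thr = 8
  integer :: counts(0:n_thr-1)
  integer :: n, tid

  counts = 0
  !$omp parallel do num_threads(n_thr) schedule(static) private(tid)
  do n = 1, n_iter
     tid = omp_get_thread_num()
     counts(tid) = counts(tid) + 1   ! each thread updates only its own slot
  end do
  !$omp end parallel do

  ! One count per thread; the counts always add up to n_iter.
  print '(8(i0,1x))', counts
  if (sum(counts) /= n_iter) error stop 'iteration count mismatch'
end program count_per_thread
```

If the runtime grants fewer than 8 threads, the trailing counts simply stay at zero and the remaining threads take more iterations each.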

Jim Dempsey
