Can anyone suggest why OpenMP would not thread the following (simple) double loop?
    CALL OMP_SET_NUM_THREADS(4)
    MemForEachStack = 16000000
    call kmp_set_stacksize_s(MemForEachStack)
    !$omp parallel do default(firstprivate), shared(ReceiverDistribution)
    xloop: &
    do i = NumXLimit1,NumXLimit2
       do j=NumYLimit1,NumYLimit2
          ReceiverDistribution(j,i) = sqrt(float(i*j))
       end do
    end do xloop
    !$OMP end parallel do
I am experimenting with OpenMP multi-threading in our routines and have managed to multi-thread the smaller one. But I have been unable to multi-thread our largest routine (large for us: ~10,000 lines of code with ~5,000 lines of supporting subroutines). I am attempting to declare a parallel do section involving a set of nested loops. The code involved (~500 lines) within the inner loop is rather elaborate, with calls to subroutines. I have "use omp_lib" at the start of the routine. I eventually worked through the process of getting the correct list of variables in the shared clause, and got the code to run. But OpenMP would not multi-thread the outer loop. I get no compiler errors. The code links and runs correctly, but with only one thread. A write of omp_get_thread_num() to the standard output within the loop always shows zero. That is, one thread.
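The thread-number check described above can be reduced to a standalone sketch along the following lines. This is not the production routine: the program name omp_check, the variable names, and the loop bounds are made up for illustration. It relies on the OpenMP conditional-compilation sentinel (`!$`), so the lines carrying it are compiled only when the compiler's OpenMP option is switched on; with the option off they are ordinary comments and the program should report a single thread.

```fortran
program omp_check
   !$ use omp_lib            ! compiled only when OpenMP is enabled
   implicit none
   integer :: i, tid, nthreads
   logical :: openmp_on

   ! openmp_on stays .false. unless the compiler honors the !$ sentinel
   openmp_on = .false.
   !$ openmp_on = .true.
   print *, 'OpenMP directives honored: ', openmp_on

   ! Request 4 threads, as in the loop above
   !$ call omp_set_num_threads(4)

   !$omp parallel do private(tid, nthreads)
   do i = 1, 8               ! hypothetical bounds, for illustration only
      ! Serial defaults, overwritten when OpenMP is active
      tid = 0
      nthreads = 1
      !$ tid = omp_get_thread_num()
      !$ nthreads = omp_get_num_threads()
      print *, 'i =', i, ' thread', tid, ' of', nthreads
   end do
   !$omp end parallel do
end program omp_check
```

If a small program like this reports more than one thread when built with the same options as the large routine, that would suggest the build settings are not the cause and the problem lies in the surrounding code.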
If I turn on the vector optimization report, it shows the thousands of instances where the code has been successfully vectorized. But I can get no information about (non)threading.
Finally, I commented out ALL the original double-loop code and substituted the simple double loop listed above. Still no multi-threading. I get no messages from OpenMP, though I have the report level set to 2. (But I don't think that will help, since I think it only reports success, not the reason for failure.) The code runs without incident, though (of course) the results are different since I'm putting nonsense into the array ReceiverDistribution, rather than doing the more elaborate work.
There is still a lot of code before and after the simple double loop listed above. But I cannot understand how it could be suppressing threading, however complex it is.
I am using compiler release 220.127.116.11. Worryingly, I get exactly the same behavior with beta 15.0.070.