We are converting a stochastic simulation program written in Fortran to OpenMP, since the outputs of the program can be summed. In the simplest mode, we have just made the main loop a parallel region with firstprivate. No matter how many threads we launch, the wall time consumed is roughly the time for a single thread times the number of threads. The problem seems to be _kmp_launch_monitor, which shows 200 ms waits on ManualResetEvents. Eliminating the atomic and critical sections has little effect on the outcome, and neither does switching to OMP DO.
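For reference, here is a minimal sketch of the kind of setup described above, not our actual code. Everything in it is an assumption made for illustration: the names mc_sketch, next_uniform, n_trials and seed are hypothetical, and the small Park-Miller generator merely stands in for the real simulation step. Each thread gets its own copy of the generator state through firstprivate, the partial sums are combined with a reduction clause instead of atomic or critical sections, and the run is timed with omp_get_wtime.

```fortran
! Hypothetical sketch of the setup; all names are illustrative only.
program mc_sketch
  use omp_lib
  implicit none
  integer, parameter :: n_trials = 1000000   ! assumed problem size
  integer :: i, seed
  double precision :: total, x, t0, t1

  seed  = 12345        ! copied into every thread by firstprivate
  total = 0.0d0
  t0 = omp_get_wtime() ! elapsed wall-clock time, not CPU time

  !$omp parallel firstprivate(seed) private(i, x) reduction(+:total)
  ! Offset the private seed so the threads draw different streams.
  seed = seed + 7919 * omp_get_thread_num()
  !$omp do
  do i = 1, n_trials
    call next_uniform(seed, x)   ! touches only thread-private state
    total = total + x            ! accumulates into the private copy
  end do
  !$omp end do
  !$omp end parallel

  t1 = omp_get_wtime()
  print *, 'threads   =', omp_get_max_threads()
  print *, 'mean      =', total / n_trials
  print *, 'wall time =', t1 - t0

contains

  ! Park-Miller "minimal standard" generator using Schrage's method.
  ! It stands in for the real simulation step and uses no shared data.
  subroutine next_uniform(state, u)
    integer, intent(inout) :: state
    double precision, intent(out) :: u
    integer, parameter :: a = 16807, m = 2147483647
    integer, parameter :: q = 127773, r = 2836
    integer :: k
    k = state / q
    state = a * (state - k * q) - r * k
    if (state < 0) state = state + m
    u = dble(state) / dble(m)
  end subroutine next_uniform

end program mc_sketch
```

Built with OpenMP enabled and run with OMP_NUM_THREADS set to 1, 2, 4 and so on, the reported wall time of a sketch like this would be expected to fall as threads are added, because the loop body touches no shared state. The timing deliberately uses omp_get_wtime so that the figure is elapsed time rather than processor time accumulated over all threads.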
Reading a bit about ManualResetEvents has not helped. Where should we be looking for the cause of the ManualResetEvent waits? Can we make the wait time shorter, or make the waits go away altogether?
I gather that the launch monitor will always be present in an Intel OpenMP program; is that correct? Otherwise the code is working as desired.
Thanks for any suggestions.