I'm building a DLL from code that was originally developed with gfortran. I use OpenMP on the outermost loop, and all the useful data is held in 3D arrays whose third dimension is indexed by the outermost loop counter.
All of the code is in Fortran; I'm only switching compilers from gfortran to ifort. The results I'm getting are identical, but the behaviour of OpenMP is not.
In the gfortran version, if I set the number of outer loop iterations to one, I get one processor core loaded to 100%. With ifort, the same case loads all four cores to 100%, with no apparent speed advantage over a serial run.
I've tried adding schedule(static) and private(i) (i being my outer loop integer), to no avail. I have also tried "set OMP_NUM_THREADS=1" at the compilation stage.
I'm not sure whether the compiler is trying to be smart and avoid a data race, but that is not a concern here, as each thread should be accessing a different index of the third dimension of each array. I even tried the automatic parallelization option; however, because of all the subroutine calls, it did not seem to parallelize any of the loops. Are there any other options that override this?
The main question is: how do I force the DLL to create a number of threads equal to the number of iterations of my outermost loop (i.e. behave the same way as gfortran with OpenMP, where the number of fully utilised processor cores corresponds to the number of outermost loop iterations)?
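To make the goal concrete, here is a minimal sketch of the behaviour I'm after. It is a hypothetical standalone program, not my actual DLL code: the program name and the print statement are made up for illustration. I'm assuming the standard num_threads clause and omp_set_dynamic are the right tools for this, which I haven't confirmed under ifort.

```fortran
! Hypothetical standalone sketch, not the actual DLL code.
program thread_per_iteration
  use omp_lib                     ! OpenMP runtime routines (omp_get_thread_num, ...)
  implicit none
  integer :: n, n_lines

  n_lines = 1                     ! the problematic case: a single outer iteration

  ! Assumption: ask the runtime not to adjust the requested team size.
  call omp_set_dynamic(.false.)

  ! num_threads requests a team of n_lines threads for this region only,
  ! so each outer iteration would ideally get its own thread.
  !$omp parallel do num_threads(n_lines) schedule(static)
  do n = 1, n_lines
     print '(a,i0,a,i0,a,i0)', 'iteration ', n, ' on thread ', &
          omp_get_thread_num(), ' of ', omp_get_num_threads()
  end do
  !$omp end parallel do
end program thread_per_iteration
```

With n_lines = 1, I would expect this to report a team of one thread and to load a single core, which is what I see with gfortran.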
This seems like it should be a very simple problem to fix. I've tried searching the forums, but I'm probably not using the right keywords, as I haven't found anything useful.
Here's the code at the outermost loop:
!$omp parallel do
do n = 1, n_lines