I seem to have difficulty utilizing all 8 cores that I have access to on a cluster. Essentially, I have three nested loops. The two innermost loops can be worked on in parallel, so I have put an OpenMP directive between the first and the second loop. It looks as follows:
!$OMP PARALLEL DO ....
!$OMP END PARALLEL DO
Unfortunately, the code only seems to reach a load of ny (I check this with pbstop +l), while I would like it to reach a load of up to ny*nz. There is a substantial amount of work to be done for each (iy,iz), so I was expecting a load far greater than ny. I have also noticed that the load is lower if I put the iz loop outside of the iy loop and if nz
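A minimal sketch of the structure described above, for reference. Everything in it is an assumption, since the actual loop body is elided: the outer index `it` and its bound `nt`, the array `work`, and the function `do_work` are hypothetical placeholders. A `PARALLEL DO` directive distributes only the loop that immediately follows it, so with the directive on the `iy` loop there are only ny iterations to hand out, and at most ny threads can be busy at once. The sketch adds the `COLLAPSE(2)` clause (available since OpenMP 3.0), which merges the perfectly nested `iy` and `iz` loops into a single iteration space of ny*nz iterations.

```fortran
! Hypothetical sketch: nt, it, work and do_work are placeholders,
! not the original code (whose loop body is elided).
module nested_loops
  implicit none
contains

  ! Placeholder for the "substantial amount of work" done per (iy,iz).
  pure function do_work(it, iy, iz) result(r)
    integer, intent(in) :: it, iy, iz
    real(8) :: r
    r = dble(it) + 10.0d0 * dble(iy) + 100.0d0 * dble(iz)
  end function do_work

  ! The outer loop stays serial; the two inner loops run in parallel.
  ! COLLAPSE(2) turns the iy and iz loops into one loop of ny*nz
  ! iterations, so more than ny threads can receive work.
  subroutine run(nt, ny, nz, work)
    integer, intent(in) :: nt, ny, nz
    real(8), intent(out) :: work(nz, ny)
    integer :: it, iy, iz

    work = 0.0d0
    do it = 1, nt
      !$omp parallel do collapse(2) default(none) shared(it, ny, nz, work) private(iy, iz)
      do iy = 1, ny
        do iz = 1, nz
          ! Each (iz,iy) element is updated by exactly one thread: no race.
          work(iz, iy) = work(iz, iy) + do_work(it, iy, iz)
        end do
      end do
      !$omp end parallel do
    end do
  end subroutine run

end module nested_loops
```

Compiled with OpenMP enabled (e.g. `gfortran -fopenmp`), a call such as `call run(4, 2, 3, w)` with `real(8) :: w(3, 2)` fills all 2*3 elements of `w` independently; without the flag the directives are treated as comments and the code runs serially with the same result. The array is indexed `work(iz, iy)` so that the innermost loop walks the first, contiguous index in Fortran's column-major layout.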
Thank you for any hints and help.