One of my recent tasks is to parallelize a fairly large Fortran 90 program. The subroutine I am targeting spends most of its time in a computationally intensive loop. I have tried the following:
!$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(array2D_1, array2D_2, .., scalar1, scalar2, .., scalarN)
<the calculations >
!$OMP END PARALLEL DO
where several two-dimensional arrays need to be shared, whereas some other 2D arrays need to be kept private. All of these arrays have the same size, and that size is quite large: it will eventually be several thousand by several thousand, but for now it is kept at 200x200. The data type is double precision, and all of the arrays are declared allocatable. Some initialization is done before the loop, and there are no recursive procedures anywhere in the subroutine.
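For clarity, here is a minimal sketch of the structure I described; the array names, the loop body, and the size are placeholders, not my actual code:

```fortran
! Sketch only: two shared arrays, one array privatized via DEFAULT(PRIVATE).
! Each thread gets its own copy of everything not listed in SHARED.
double precision, allocatable :: shared_a(:,:), shared_b(:,:)
double precision, allocatable :: work(:,:)   ! not in SHARED, so private
integer :: i, j, n

n = 200
allocate(shared_a(n,n), shared_b(n,n), work(n,n))
! ... initialization before the loop ...

!$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(shared_a, shared_b, n)
do j = 1, n
   do i = 1, n
      work(i,j) = shared_a(i,j) + shared_b(i,j)   ! stand-in computation
   end do
end do
!$OMP END PARALLEL DO
```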
The parallelized version, developed with MVS 2010 and OpenMP, was first tried on my Thinkpad T420si notebook, which runs Windows 7 and has four threads and 4 GB of memory. It works perfectly. I then moved the code to our 64-bit server, which runs Windows Server 2008 R2 Enterprise and has 32 threads and 128 GB of memory. On the server, with the same MVS 2010 and OpenMP configuration and the same input data as on my notebook, the program fails. In debug mode, it reports a stack overflow. The stack reserve and commit sizes are set to 0, as are those of the heap.
I am confused here. Why does the server, with much more memory, have a problem with the parallelized program? If the issue is partly due to the zero values for the stack and heap, why does the program work fine on the notebook? And what is a good strategy for stack management here, given that once the parallelized program works for the smaller array size, it is expected to keep working as the 2D arrays are enlarged by as much as 100 times?
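As far as I understand, the two knobs I could turn are the per-thread stack size (via the OMP_STACKSIZE environment variable) and the main thread's stack (via the linker's /STACK option in MVS 2010). Is something like the following the right approach? The sizes below are placeholders, not values I have tested:

```
rem Example only (Windows cmd): raise the stack for each OpenMP thread.
set OMP_STACKSIZE=64M

rem The main thread's stack is fixed at link time, e.g. a 64 MB reserve
rem (Linker -> System -> Stack Reserve Size in the IDE, or on the command line):
link /STACK:67108864 ...
```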
I look forward to and appreciate any suggestions!
Best regards, Bill