I am trying to run a program on a computer cluster. The structure of the program is the following:
PROGRAM something ... CALL subroutine1(...) ... END PROGRAM SUBROUTINE subroutine1(...) ... DO i=1,n CALL subroutine2(...) ENDDO ... END SUBROUTINE SUBROUTINE subroutine2(...) ... CALL subroutine3(...) CALL subroutine4(...) ... END SUBROUTINE
The idea is to parallelize the loop that calls subroutine2. Main program basically only makes the call to subroutine1 and only its arguments are declared. I use two alternatives. On the one hand, I write OpenMP clauses arround the loop. On the other hand, I add an IF conditional branch arround the call and I use MPI to share the results.
In the OpenMP case, I add CALL KMP_SET_STACKSIZE(402653184) at the beginning of the main program and I can run it with 8 threads on an 8 core machine. When I run it (on the same 8 core machine) with MPI (either using 8 or 1 processors) it crashes just when makes the call to subroutine3 with a segmentation fault (signal 11) error. If I comment subroutine4, then it doesn't crash (notice that it crashed just when calling subroutine3 and it works when commenting subroutine4).
I compile with mpif90 using MPICH2 libraries and the following flags: -O3 -fpscomp logicals -openmp -threads -m64 -xS. The machine has EM64T architecture and I use a Debian Linux distribution. I set ulimit -s hard before running the program.
Any ideas on what is going on? Has it something to do with stack size?
Thanks in advance