I am experiencing a very strange segmentation fault problem with a parallel MPI code compiled with the Intel Fortran compiler. The MPI compiler wrapper uses ifort version 9.0.026. I have been running this code on 256 processors in parallel on the NCSA Intel Xeon Linux Cluster. Here is the link for this cluster: http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/XeonCluster/
Here is my problem: the code sometimes runs just fine on this cluster, but at other times it gives a segmentation fault right at the beginning of the run. I am running the exact same executable each time, so I don't understand why it succeeds on some runs and segfaults on others. The problem appears to happen only when the job gets assigned to certain nodes of the cluster. The cluster contains a total of 1280 two-processor nodes, so my executable runs on different nodes every time I do a run. I have tried increasing the stack size to large values, but that did not help.

I am using the "-O3 -ip -auto -nothreads" flags with the compiler. Also, the MPI compiler wrapper is linking my executable with the libpthreads library even though I am not using any threads or OpenMP. The wrapper is called cmpif90c, which is part of ChaMPIon/Pro MPI; this MPI implementation is claimed to be thread-safe. I tried to remove the link to libpthreads during compilation, but that caused an error. Apparently the MPI library needs to be linked against libpthreads.

Could the linking with libpthreads somehow be responsible for the segmentation fault problem? Any suggestions about what else I can try? Thanks!
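In case it helps others reproduce what I tried: below is a rough sketch of the checks I can add at the top of the batch job script, so that every run records the per-node stack limit and the actual shared-library linkage before the MPI launch. The executable name "myapp" is just a placeholder, not my real binary, and the exact limits on the NCSA nodes may differ.

```shell
#!/bin/sh
# Sketch of pre-run diagnostics for a batch job script (placeholder paths).
# Since the crashes seem node-dependent, logging these values per run
# could show whether the failing nodes differ in their defaults.

# Report the soft stack limit on whichever node the job landed on.
echo "stack limit: $(ulimit -s)"

# Attempt to remove the soft stack limit; this can fail if the hard
# limit on the node is lower, so note the failure instead of aborting.
ulimit -s unlimited 2>/dev/null || echo "could not raise stack limit"
echo "stack limit now: $(ulimit -s)"

# Confirm which shared libraries the executable actually resolves to on
# this node, including libpthread (uncomment with the real binary name):
# ldd ./myapp | grep -i pthread
```

If the failing nodes consistently report a smaller default stack limit, or resolve libpthread to a different library version, that would at least narrow the problem down.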