I ran the test under GDB and got the following error message:
Program received signal SIGFPE, Arithmetic exception.
34: 0x00002aaac33327e0 in ADIOI_LUSTRE_Get_striping_info ()
34: from /apps/intel/impi/4.0.2.003/intel64/lib/libmpi_lustre.so
It seems to be the same problem as the one reported at http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-September/007947.html.
Is this a bug in Intel MPI Library v4.0.2, and has it been fixed in a newer version?




MPI-IO error when running on Lustre with a high number of stripes and processes
Hi,
I'm trying to run pNetCDF on Lustre. The test code and the pNetCDF library are both compiled with Intel MPI Library v4.0.2. Our Lustre file system has 40 OSTs.
When running with a stripe count of 1 or with 32 processes, the test code works well and writes its output correctly.
However, when I set the stripe count to 40 and run with 64 processes, the test code crashes with:
rank 19 in job 1 c25b09_39645 caused collective abort of all ranks
exit status of rank 19: killed by signal 9
The test code is attached. Thank you in advance.