We are working on an InfiniBand DDR cluster with MM5. We are using the latest Intel MPI and Intel Fortran, and our mm5.mpp binary has been compiled with the configuration suggested on this website.
This is how we launch it:
[c2@ Run]$ time /intel/impi/3.2.1.009/bin64/mpiexec -genv I_MPI_PIN_PROCS 0-7 -np 32 -env I_MPI_DEVICE rdma ./mm5.mpp
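If it helps, this is the kind of diagnostic run we can do to see which device and pinning each rank actually ends up with. Just a sketch: the only change is adding I_MPI_DEBUG (which we understand is the standard Intel MPI debug variable), everything else is our usual command:

[c2@ Run]$ time /intel/impi/3.2.1.009/bin64/mpiexec -genv I_MPI_PIN_PROCS 0-7 -genv I_MPI_DEBUG 5 -np 32 -env I_MPI_DEVICE rdma ./mm5.mpp

With the debug output we can check, for every rank, whether the rdma device is really selected (no fallback to sockets) and which cores the ranks are pinned to.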
Everything seems to be OK: with -np 16, mpiexec gives about 75% better performance than Gigabit Ethernet, but when we use more than 16 processes the scaling gets worse. We have noticed that the main difference between how the Gigabit and InfiniBand runs behave is:
- The InfiniBand run only uses all the cores when np is 16 or lower; with more processes, it uses only 3 cores on one machine.
- The Gigabit run always uses all the cores on all machines.
We have tried many Intel MPI variables at run time, for example I_MPI_PIN, but none of them fixes the situation. The MPI universe works fine over the InfiniBand network, and we use I_MPI_DEVICE rdma. The InfiniBand network itself is healthy (performance and so on): we have run some benchmarks and the results are fine.
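For completeness, this is a sketch of the kind of launch we could use to force an even distribution of ranks across the nodes, assuming 4 nodes with 8 cores each (the process counts are assumptions about our layout; -perhost is the mpiexec option as we understand it):

[c2@ Run]$ time /intel/impi/3.2.1.009/bin64/mpiexec -perhost 8 -genv I_MPI_PIN_PROCS 0-7 -np 32 -env I_MPI_DEVICE rdma ./mm5.mpp

The idea would be to place exactly 8 ranks per node and then pin them to cores 0-7, so the InfiniBand run should use all cores on all machines just as the Gigabit run does.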
What do you think? Could it be a consequence of the model we are using to compare performance?
Thanks a lot and best regards.