What could cause an MPI program to run slower on a multi-node cluster?

Hello everyone,

I've got a question about MPI program performance. I've developed an MPI program that processes a large amount of data (about 10^9 elements), and I've noticed that the more processes I create with the mpiexec utility, the longer the program takes to run. What could be the cause of this issue? When I run the program on a single computational node, it works faster than when I run it on two computational nodes. Please help.

Regards, Arthur.
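One way to narrow a problem like this down is to time the computation and communication phases separately with MPI_Wtime() and watch which one grows as nodes are added. The following is only a minimal sketch, not code from this thread; the chunk size and input data are placeholders, and the gather stands in for whatever exchange the real program performs.

#include <mpi.h>
#include <cstdio>
#include <vector>
#include <algorithm>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int chunk = 1000000;                       // placeholder chunk size
    std::vector<int> data(chunk);
    for (int i = 0; i < chunk; ++i)                  // placeholder input
        data[i] = (chunk - i) ^ rank;

    MPI_Barrier(MPI_COMM_WORLD);                     // start all ranks together
    double t0 = MPI_Wtime();
    std::sort(data.begin(), data.end());             // placeholder for the local work
    double t_compute = MPI_Wtime() - t0;

    std::vector<int> gathered;
    if (rank == 0)                                   // receive buffer only on root
        gathered.resize((size_t)size * chunk);
    t0 = MPI_Wtime();
    MPI_Gather(data.data(), chunk, MPI_INT,          // stand-in for the exchange phase
               gathered.data(), chunk, MPI_INT, 0, MPI_COMM_WORLD);
    double t_comm = MPI_Wtime() - t0;

    std::printf("rank %d: compute %.3f s, gather %.3f s\n",
                rank, t_compute, t_comm);
    MPI_Finalize();
    return 0;
}

Built with something like mpicxx -O2 timing_sketch.cpp and run under mpiexec -n <N>, this would show whether the communication term dominates once the ranks span two machines.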

Normally I use the following computational platform: 2 x Intel Core i7-4970 @ 4.00 GHz, 32 GB RAM, network: 1 Gbps.

My MPI program sorts a huge array of 10^9 elements by splitting the entire array into chunks, each sorted by one of the processes created by the mpiexec utility. The actual sorting is performed with the tbb::parallel_sort routine, which is part of Intel Threading Building Blocks (TBB).
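For reference, here is a minimal sketch of that pattern; it is hypothetical, not the actual program: the root rank scatters equal chunks, every rank sorts its chunk with tbb::parallel_sort, and the root gathers the results back. Note that 10^9 4-byte elements amount to roughly 4 GB, and a 1 Gbps link moves at most about 125 MB/s, so simply distributing and collecting the array can take tens of seconds in each direction, independent of how fast each node sorts.

#include <mpi.h>
#include <tbb/parallel_sort.h>
#include <vector>
#include <random>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_per_rank = 1 << 20;                  // placeholder for 10^9 / size
    std::vector<int> full;
    if (rank == 0) {                                 // root owns the whole array
        full.resize((size_t)size * n_per_rank);
        std::mt19937 gen(42);
        for (auto& x : full) x = (int)gen();
    }

    std::vector<int> chunk(n_per_rank);
    // Distributing the data is pure communication cost: on a 1 Gbps
    // link this moves at roughly 125 MB/s regardless of core count.
    MPI_Scatter(full.data(), n_per_rank, MPI_INT,
                chunk.data(), n_per_rank, MPI_INT, 0, MPI_COMM_WORLD);

    tbb::parallel_sort(chunk.begin(), chunk.end());  // local multithreaded sort

    MPI_Gather(chunk.data(), n_per_rank, MPI_INT,
               full.data(), n_per_rank, MPI_INT, 0, MPI_COMM_WORLD);
    // A complete distributed sort would still need a merge step here,
    // since the gathered chunks are only sorted individually.
    MPI_Finalize();
    return 0;
}

Something like mpicxx -O2 sort_sketch.cpp -ltbb should build it, assuming MPI and TBB are installed.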

You should probably ask that question in the HPC forum: https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology . They deal with MPI issues.

I'd start with the Intel MPI Library Troubleshooting Guide: https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog... .

There's also a TBB forum: https://software.intel.com/en-us/forums/intel-threading-building-blocks

If anyone who is going to answer my question needs an executable to test on their side, I'm ready to provide one.

Barry Tannenbaum (Intel),

Thanks for your reply and for pointing me to the right forum for this question. Also, thanks for the links, which I can refer to.
