Slowdown p2p MPI calls

Dear MPI users,

I'm using Intel MPI cs-2011. My code (OpenMP + MPI) performs, at each time step, some MPI send and receive calls after a kernel computation. The MPI calls are used for ghost-cell exchange (a few kilobytes per message).
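For context, here is a minimal sketch (not the code in question) of the kind of 1-D ghost-cell exchange described above. It is written with MPI_Sendrecv for brevity; the field layout, halo width, and function name are illustrative assumptions.

#include <mpi.h>

/* Field layout per rank: [left ghost (halo)] [interior (n_local)] [right ghost (halo)] */
void exchange_ghosts(double *field, int n_local, int halo, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my leftmost interior cells to the left neighbour,
       receive its cells into my right ghost region. */
    MPI_Sendrecv(field + halo,           halo, MPI_DOUBLE, left,  0,
                 field + halo + n_local, halo, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);

    /* Send my rightmost interior cells to the right neighbour,
       receive its cells into my left ghost region. */
    MPI_Sendrecv(field + n_local,        halo, MPI_DOUBLE, right, 1,
                 field,                  halo, MPI_DOUBLE, left,  1,
                 comm, MPI_STATUS_IGNORE);
}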

I've noticed a significant slowdown during the computation. I suspect the problem is in some low-level MPI setting, because the problem disappears when I use OpenMPI. I'm using InfiniBand and 12 cores on 1 node, so only intranode communication is used.

I disabled shared memory inside the node, used DAPL for intranode communication, decreased I_MPI_INTRANODE_THRESHOLD, and set I_MPI_DAPL_TRANSLATION_CACHE to 0, without any noticeable improvement. A sketch of these settings is shown below.
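For reference, a bash sketch of the kind of environment settings described above (Intel MPI 4.x style). The threshold variable is assumed to be I_MPI_INTRANODE_EAGER_THRESHOLD (the post names I_MPI_INTRANODE_THRESHOLD); the value and "./app" are placeholders, and exact names should be checked against the reference manual for the installed version.

# Force DAPL even inside the node (shared memory disabled)
export I_MPI_FABRICS=dapl
# Disable the DAPL memory registration (translation) cache
export I_MPI_DAPL_TRANSLATION_CACHE=0
# Lower the intranode eager/rendezvous threshold (bytes) - assumed variable name
export I_MPI_INTRANODE_EAGER_THRESHOLD=16384
mpirun -n 12 ./app    # "./app" is a placeholder for the real binary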

Do you have any idea why the p2p calls slow down during the run?

Thanks a lot. 


Hey unrue,

Thanks for posting.  Unfortunately, performance issues are notoriously hard to track down.  Based on your original post, can I assume you're mostly using p2p messages in your application?

The first thing I would suggest is grabbing the latest Intel® MPI Library and giving that a try.  We have Intel MPI Library 4.1 Update 1 that was released not too long ago.  The beauty of it is you can install the new runtimes and re-run your application without having to recompile (Intel MPI 4.0 - which is probably what you have - is binary compatible with Intel MPI 4.1 - the latest).

You can grab the latest package from the Intel® Registration Center - just login using your e-mail address and the password you created when you originally downloaded the library.
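A hypothetical example of switching the runtime without recompiling: point the environment at the new Intel MPI installation and re-run the existing binary. The install path, process count, and "./app" are assumptions to be adjusted for the actual system.

# Source the environment script of the newly installed Intel MPI runtime
source /opt/intel/impi/4.1.1/intel64/bin/mpivars.sh   # path is an assumption
mpirun -V                                             # confirm which runtime is now in use
mpirun -n 12 ./app                                    # same binary, new runtime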

Ideally, we'd like to have a reproducer that we can test out locally.  If that's not possible, can you provide some debug output (I_MPI_DEBUG=5) when running your application, as well as the full set of env variables you're setting?  What's the nature of the performance slowdown between Intel MPI and OpenMPI - 10% or 90% slowdown?
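One way to capture the requested information (the application name, core count, and file names here are placeholders):

# Run with Intel MPI debug output enabled and save it
I_MPI_DEBUG=5 mpirun -n 12 ./app > impi_debug.log 2>&1
# Record the full set of I_MPI_* variables in effect
env | grep '^I_MPI_' > impi_env.txt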

Looking forward to hearing back soon.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com

Hi Gergana,

thanks for your reply. I mostly use MPI_Send and MPI_Recv. I've attached a profile of one of the 12 processes. The performance slowdown compared to OpenMPI is about 50%.

Attachment: tracking.txt (1.2 KB)

Quote:

Gergana Slavova (Intel) wrote:

The first thing I would suggest is grabbing the latest Intel® MPI Library and giving that a try.  We have Intel MPI Library 4.1 Update 1 that was released not too long ago.  The beauty of it is you can install the new runtimes and re-run your application without having to recompile (Intel MPI 4.0 - which is probably what you have - is binary compatible with Intel MPI 4.1 - the latest).

Regards,

~Gergana

Dear Gergana,

I tried the latest Intel MPI version as you suggested, but the problem still remains... :(
