Slowdown p2p MPI calls

Dear MPI users,

I'm using IntelMPI cs-2011. My code (OpenMP + MPI) performs, at each time step, some MPI send and receive calls after a kernel computation. The MPI calls are used for ghost cell exchange (a few kilobytes per message).

I've noticed a significant slowdown during the computation. I suspect the problem is in some low-level MPI setting, because the problem disappears when using OpenMPI. I'm using InfiniBand and 12 cores on 1 node, so only intranode communication is used.

I disabled shared memory inside the node, used DAPL for intranode communication, decreased I_MPI_INTRANODE_THRESHOLD, and set I_MPI_DAPL_TRANSLATION_CACHE to 0, without any improvement.
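The settings above would correspond to something like the following (variable names as given in the post; the threshold value and launch line are guesses):

```shell
# Force DAPL for all communication, bypassing shared memory inside the node
export I_MPI_FABRICS=dapl

# Lower the intranode threshold (value here is only illustrative)
export I_MPI_INTRANODE_THRESHOLD=16384

# Disable the DAPL memory translation cache
export I_MPI_DAPL_TRANSLATION_CACHE=0

mpirun -np 12 ./my_app    # hypothetical single-node launch
```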

Do you have any idea why the p2p calls slow down during the run?

Thanks a lot. 


Hey unrue,

Thanks for posting.  Unfortunately, performance issues are notoriously hard to track down.  Based on your original post, can I assume you're mostly using p2p messages in your application?

The first thing I would suggest is grabbing the latest Intel® MPI Library and giving that a try.  We have Intel MPI Library 4.1 Update 1 that was released not too long ago.  The beauty of it is you can install the new runtimes and re-run your application without having to recompile (Intel MPI 4.0 - which is probably what you have - is binary compatible with Intel MPI 4.1 - the latest).

You can grab the latest package from the Intel® Registration Center - just login using your e-mail address and the password you created when you originally downloaded the library.

Ideally, we'd like to have a reproducer that we can test out locally.  If that's not possible, can you provide some debug output (I_MPI_DEBUG=5) when running your application, as well as the full set of env variables you're setting?  What's the nature of the performance slowdown between Intel MPI and OpenMPI - 10% or 90% slowdown?
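The debug run requested here could be captured with something like (application name is hypothetical):

```shell
# Print Intel MPI debug info (fabric selection, pinning, etc.) at level 5
export I_MPI_DEBUG=5

# Record every Intel MPI-related variable currently set
env | grep '^I_MPI_' > impi_env.txt

# Re-run the application and keep the debug output
mpirun -np 12 ./my_app 2>&1 | tee impi_debug.log
```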

Looking forward to hearing back soon.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com

Hi Gergana,

thanks for your reply. Mostly I use MPI_Send and MPI_Recv. I attach a profile of one of the 12 processes. The performance slowdown compared to OpenMPI is about 50%.

Attachment: tracking.txt (1.2 KB)

Quote:

Gergana Slavova (Intel) wrote:

The first thing I would suggest is grabbing the latest Intel® MPI Library and giving that a try.  We have Intel MPI Library 4.1 Update 1 that was released not too long ago.  The beauty of it is you can install the new runtimes and re-run your application without having to recompile (Intel MPI 4.0 - which is probably what you have - is binary compatible with Intel MPI 4.1 - the latest).

Regards,

~Gergana

Dear Gergana,

I tried the latest Intel MPI version as you suggested, but the problem still remains. :(
