Porting code from MPI/Pro 1.7 to Intel MPI 3.1

I am in the process of switching from MPI/Pro 1.7 to Intel MPI 3.1, and I am seeing very strange (and poor) performance that has stumped me.

I am seeing poor performance throughout the entire code, but the front end is a good illustration of some of the problems. The front end consists of two processes: process 0 (the I/O process) reads in a data header and data and passes them to process 1 (the compute process). Process 1 then processes the data and sends the output field(s) back to process 0, which saves them to disk.

Here is an outline of the MPI framework for the two processes, for the simple case of one I/O process and one compute process:

Process 0:

for (ifrm = 0; ifrm <= totfrm; ifrm++) {

    // Read and send the next frame (nothing left to send on the last pass)
    if (ifrm != totfrm) {
        data_read (..., InpBuf, HD1, ...);
        MPI_Ssend (HD1, ...);
        MPI_Ssend (InpBuf, ...);
    }

    // Receive and save the output of the previous frame
    if (ifrm > 0) {
        MPI_Recv (OutBuf, ...);
        sav_data (OutBuf, ...);
    }

} // for (ifrm = 0 ...

// No more data, send termination message
MPI_Send (MPI_BOTTOM, 0, ...);

Process 1:

// Initialize persistent communication requests
MPI_Recv_init (HdrBuf, ..., req_recvhdr);
MPI_Recv_init (InpBuf, ..., req_recvdat);
MPI_Ssend_init (OutBuf, ..., req_sendout);

// Get header and data for first frame
MPI_Start (req_recvhdr);
MPI_Start (req_recvdat);

while (1) {

    // Wait for the header; a zero-length message signals termination
    MPI_Wait (req_recvhdr, status);
    MPI_Get_count (status, ..., count);
    if (count == 0) {
        // execute termination code
    }

    MPI_Wait (req_recvdat, status);

    // Start receive on next frame while processing current one
    MPI_Start (req_recvhdr);
    MPI_Start (req_recvdat);

    // ... process data ...

    // Make sure the previous output send has completed before reusing OutBuf
    if (curr_frame > start_frame) {
        MPI_Wait (req_sendout, status);
    }

    // ... process data ...

    // Send output field(s) back to I/O process
    MPI_Start (req_sendout);

} // while (1)

The problem I am having is that the MPI_Wait calls are chewing up a lot of CPU cycles for no obvious reason, and in a very erratic way. When using MPI/Pro, the above MPI framework behaves in a very reliable and predictable way. With Intel MPI, however, the code can spend almost no time (expected) or several minutes (very unexpected) in one of the MPI_Wait calls. The two waits that are giving me the most trouble are the ones associated with req_recvhdr and req_sendout.
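
For reference, a stripped-down, compilable sketch of the same persistent-request pattern, with timing around each wait, looks roughly like this. This is not the actual application: buffer sizes, tags, datatypes, and the frame count are placeholders, the zero-length termination handshake is omitted, and the overlap of the next receive with the current frame's processing is dropped so the loop simply measures the waits.

/*
 * Minimal sketch of the pattern above (NOT the real application):
 * sizes, tags, and NFRAMES are placeholders; run with exactly 2 ranks.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NFRAMES 100          /* placeholder frame count  */
#define HDR_LEN 64           /* placeholder header size  */
#define DAT_LEN (1 << 20)    /* placeholder data size    */

int main (int argc, char **argv)
{
    int rank, size, ifrm;
    double *hdr, *dat;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf (stderr, "Run with at least 2 processes\n");
        MPI_Finalize ();
        return 1;
    }

    hdr = malloc (HDR_LEN * sizeof *hdr);
    dat = malloc (DAT_LEN * sizeof *dat);

    if (rank == 0) {                            /* I/O process */
        for (ifrm = 0; ifrm < NFRAMES; ifrm++) {
            MPI_Ssend (hdr, HDR_LEN, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Ssend (dat, DAT_LEN, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
            MPI_Recv  (dat, DAT_LEN, MPI_DOUBLE, 1, 2, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
        }
    } else if (rank == 1) {                     /* compute process */
        MPI_Request req_recvhdr, req_recvdat, req_sendout;
        double t0, t1, t2;

        MPI_Recv_init  (hdr, HDR_LEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req_recvhdr);
        MPI_Recv_init  (dat, DAT_LEN, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &req_recvdat);
        MPI_Ssend_init (dat, DAT_LEN, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD, &req_sendout);

        for (ifrm = 0; ifrm < NFRAMES; ifrm++) {
            MPI_Start (&req_recvhdr);
            MPI_Start (&req_recvdat);

            t0 = MPI_Wtime ();
            MPI_Wait (&req_recvhdr, MPI_STATUS_IGNORE);
            MPI_Wait (&req_recvdat, MPI_STATUS_IGNORE);
            t1 = MPI_Wtime ();

            /* "process data" would go here */

            MPI_Start (&req_sendout);           /* return the output field */
            MPI_Wait  (&req_sendout, MPI_STATUS_IGNORE);
            t2 = MPI_Wtime ();

            printf ("frame %3d: recv waits %.3f s, send wait %.3f s\n",
                    ifrm, t1 - t0, t2 - t1);
        }

        MPI_Request_free (&req_recvhdr);
        MPI_Request_free (&req_recvdat);
        MPI_Request_free (&req_sendout);
    }

    free (hdr);
    free (dat);
    MPI_Finalize ();
    return 0;
}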

The code is compiled using the 64-bit versions of the Intel compiler 10.1 and Intel MKL 10.0 and is run on RHEL4 nodes. Both processes are run on the same core.

As I said, this framework works well under MPI/Pro, and I am stumped as to where the problem lies and what I should try in order to fix the code. Any insight or guidance you could provide would be greatly appreciated.


Hi jburri,

Thanks for posting to the Intel HPC forums and welcome!

You probably need to use wait mode. Please try setting the environment variable I_MPI_WAIT_MODE to 'on'.
You could also try setting the environment variable I_MPI_RDMA_WRITE_IMM to 'enable'.
And you could experiment with different values of the I_MPI_SPIN_COUNT variable.
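
For example, assuming a bash shell and the Intel MPI mpiexec launcher (the program name and process count below are placeholders), the variables could be exported before the run or passed per run with -genv:

# Set the Intel MPI tuning variables in the environment before launching:
export I_MPI_WAIT_MODE=on            # wait in blocking calls instead of busy-spinning
export I_MPI_RDMA_WRITE_IMM=enable   # applies to the RDMA (DAPL) path
export I_MPI_SPIN_COUNT=1            # try several different values here

# ...or pass a variable for a single run with -genv (./your_app is a placeholder):
mpiexec -genv I_MPI_WAIT_MODE on -n 2 ./your_app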

Best wishes,
Dmitry

Thanks Dmitry. I will play around with those parameters and see what impact they have on performance.

-Jeremy
