Intel MPI scales terribly on new Broadwell System

We have a user reporting that Intel MPI scales horribly on our new Broadwell cluster. Is this a known issue, are there configuration changes I need to make, or is this expected behavior for Intel MPI 5.1.3?

 

--------------------------------------------------------------------

Intel MPI takes a long time to initialize when the job size is large.

The table below lists MPI initialization time against the number of MPI ranks.
From roughly 500 ranks upward, the Intel MPI initialization time increases by
about a factor of 4 each time the number of MPI ranks is doubled.
Extrapolating from this, MPI initialization would take about an hour on about
700 nodes of Grizzly with intel-mpi, and about 4 hours on the whole machine.

=============================
# NumRanks InitTime(seconds)

== intel-mpi (Grizzly)
36 0.12
72 0.43
144 0.72
288 1.27
576 2.53
1152 10.05
2304 34.23
4608 150.26
9216 595.88

== openmpi (Grizzly)
36 1.34
72 2.33
144 2.01
288 3.62
576 6.84
1152 8.67
2304 7.75
4608 17.93
9216 26.84

== Cray (Trinitite)
32 0.82
64 0.86
128 1.11
256 1.15
512 1.12
1024 1.11
2048 1.28
=============================
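
The growth rate can be checked directly from the Intel MPI rows of the table. The following is only a minimal sketch using awk, as in the timing loop: it prints the slowdown for each doubling of the rank count, then extrapolates from the 9216-rank measurement. The quadratic model (4x per doubling) and the figure of 36 ranks per node are assumptions, the latter taken from the smallest run; `growth` and `extrapolate` are illustrative helper names.

```shell
#!/bin/sh
# Intel MPI timings copied from the table: "ranks seconds" per line.
data='36 0.12
72 0.43
144 0.72
288 1.27
576 2.53
1152 10.05
2304 34.23
4608 150.26
9216 595.88'

# Print the slowdown factor between each pair of successive rows.
growth() {
    awk 'NR > 1 { printf "%d -> %d ranks: x%.2f\n", prev_n, $1, $2 / prev_t }
         { prev_n = $1; prev_t = $2 }'
}

# Extrapolate to $1 ranks from a measured point ($2 ranks, $3 seconds),
# ASSUMING time grows with the square of the rank count (4x per doubling).
extrapolate() {
    awk -v n="$1" -v n0="$2" -v t0="$3" \
        'BEGIN { printf "%.0f\n", t0 * (n / n0) ^ 2 }'
}

printf '%s\n' "$data" | growth

# 700 nodes x 36 ranks per node (assumed) = 25200 ranks.
echo "700 nodes: $(extrapolate 25200 9216 595.88) s"
```

On the table's numbers the last three doublings come out at roughly 3.4x, 4.4x and 4.0x, and the 700-node estimate lands near 4,450 s, in line with the rough one-hour figure quoted above.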

The times in the table were measured as follows.
Below is a simple MPI code in which rank 0 prints the system time in nanoseconds once MPI_Init has returned.

== mph_test.c
=============================
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    char hostname[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(hostname, 255);

    /* Rank 0 prints the time (ns since the epoch) right after MPI_Init returns. */
    if (rank == 0) system("date +%s%N");

    MPI_Finalize();

    return 0;
}
=============================

After compiling the code against each MPI library, I ran the following:

=============================

for x in 36 72 144 288 576 1152 2304 4608 9216; do
    echo "$x "|tr -d "\n"
    (date +%s%N |tr -d '\n'; echo " "|tr -d '\n'; mpirun -n $x ./mpi_test_intel-mpi) | awk '{print ($2-$1)/1e9}'
done
=============================

There are multiple factors that can explain a slow startup.
Could you please provide a description of the cluster, the Intel MPI version that you are using, and the output you get when you set I_MPI_DEBUG=5?
