ICS 2011 is 15% slower than ICT 2010 on the same cluster with "basic install"

Hi,

On our small cluster (12 nodes, 144 cores) I have installed the new Intel Cluster Studio (ICS) 2011. I have not uninstalled Intel Cluster Toolkit (ICT) 2010. All of our programs (Fortran and C++ codes) are 15% slower when we start them with the mpirun of ICS 2011. I don't understand why... I did a normal installation and didn't notice any problems during it. I installed ICS 2011 with the same method I used for ICT 2010.

About our hardware/software:
Master: Intel Xeon CPU E5620
Nodes: Intel Xeon CPU X5650
OS: CentOS 5.5
We are using InfiniBand DDR (driver OFED-1.5.1); I_MPI_FABRIC is set to shm:ofa; pinning is disabled.

We have recompiled our programs with the Intel 12.0 and Intel 11.1 compilers and the problem appears in both cases... so it is not a compiler problem.

What can I do?

Best regards


Hi Guillaume,

First of all, could you check performance with the default parameters? Please run IMB-MPI1 for both cases and compare. Let me know if the difference is still that big.
Note that there is an 'S' at the end of I_MPI_FABRICS.
Please run with I_MPI_DEBUG=9 and compare the fabrics selected at run time for both cases. You can also compare the settings for collective operations.
If you cannot find the reason for the different behaviour, please submit a tracker at premier.intel.com and attach the log files you got with I_MPI_DEBUG=9.
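Such a diagnostic run might look like this (a sketch; the hostfile name, the `mpivars.sh` path, and the exact mpirun options are assumptions, not confirmed in the thread):

```shell
# Source the environment of the MPI version under test
# (path taken from I_MPI_ROOT; mpivars.sh location is an assumption).
source /opt/intel/impi/4.0.1.007/bin64/mpivars.sh

# I_MPI_DEBUG=9 makes the library print, at startup, which fabric
# was actually selected for each process.
export I_MPI_DEBUG=9
mpirun -f hostfile -n 48 IMB-MPI1 > imb_ics2011.log 2>&1

# Compare the startup lines (fabric selection) of both runs.
grep "MPI startup" imb_ics2011.log
```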

Regards!
Dmitry

Hi Dmitry!

Thanks for the 'S' at the end of I_MPI_FABRICS... it is now corrected, but that doesn't seem to solve the performance problem. Before I copy/paste the results of the IMB-MPI1 benchmark, I would like to be sure that I'm using all the Intel MPI defaults. How can I be sure that I am not using mpitune-optimized data files?

Regards!
Guillaume

Hi Guillaume,

Mpitune settings are used only if the '-tune' option is passed to mpiexec.
Also, please check the environment: 'set | grep I_MPI_'. Ideally you should see only I_MPI_ROOT.

Regards!
Dmitry

I've just found your question about 3 programs (you start 3 programs with mpirun) on the MKL forum...
I should mention that the Intel MPI Library uses internal pinning (it's ON by default), so the rank-0 processes of all the programs will be pinned to processor #0 and you can get performance degradation.
If you run more than one MPI task on a node you need to switch pinning OFF with 'export I_MPI_PIN=0'.
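Launching several independent MPI jobs on the same nodes might then look like this (a sketch; the program names are hypothetical, not from the thread):

```shell
# With the default I_MPI_PIN=1, rank 0 of every job would be pinned
# to the same core; disable internal pinning when sharing nodes.
export I_MPI_PIN=0

mpirun -n 4 ./simulation_a &   # hypothetical program names
mpirun -n 4 ./simulation_b &
mpirun -n 4 ./simulation_c &
wait
```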

Hi Dmitry,

So here is my environment for ICS 2011:
set | grep I_MPI_ gives:
I_MPI_CC=icc
I_MPI_CXX=icpc
I_MPI_F77=ifort
I_MPI_F90=ifort
I_MPI_FABRICS=shm:ofa
I_MPI_FC=ifort
I_MPI_MPD_RSH=ssh
I_MPI_PIN=1
I_MPI_ROOT=/opt/intel/impi/4.0.1.007
I_MPI_TUNER_DATA_DIR=/opt/intel/impi/4.0.1/etc64/

With ICT 2010 here is the output:
I_MPI_CC=icc
I_MPI_CXX=icpc
I_MPI_F77=ifort
I_MPI_F90=ifort
I_MPI_FABRICS=shm:ofa
I_MPI_FC=ifort
I_MPI_MPD_RSH=ssh
I_MPI_PIN=0
I_MPI_ROOT=/opt/intel/impi/4.0.0.028
I_MPI_TUNER_DATA_DIR=/opt/intel/impi/4.0.0/etc64/

The big difference is the pinning. With ICT I had to disable it; I had problems with pinning when I started many jobs on the same node. With ICS this problem seems to be solved, so pinning is activated.

So I started the Intel IMB benchmark on 12 nodes with ppn=4, i.e. 48 processes. I can't use the whole cluster... we have important simulations running. I attach the 2 logs.
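For reference, a 12-node, 4-processes-per-node run might be launched like this (a sketch; the hostfile name and the `-perhost` option are assumptions about this Intel MPI version):

```shell
# 12 nodes x 4 processes per node = 48 MPI ranks
mpirun -f hostfile -perhost 4 -n 48 IMB-MPI1 > imb_run.log 2>&1
```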

We can see that ICS 2011 has the best results almost everywhere (except Barrier). But the problem is still there:
with ICT 2010 and pinning deactivated, our simulations run faster (by 10 to 15%) than with ICS 2011 and pinning activated!?!

Do you see a problem in the logs?

Thx for your help,
Best regards
Guillaume

Attachments: 

Guillaume,

It seems to me that you have attached 2 identical files. Did you do that by mistake?

Regards!
Dmitry

Damned, I was tired! I have edited the post... If you see something wrong in the logs, please tell me.

Best regards,
Guillaume

Hi Dmitry,

I think the problem is solved. It was not a problem with the Intel software. We had problems with the "master" of the cluster, so my tests were running with a degraded master (12 GB RAM instead of 48 GB). The tests ran on the nodes and not on the master, but it seems that the master's memory has an influence on the results (the disks are mounted from the master)... at the moment everything is OK.

Sorry for that,

Best regards,
Guillaume

Thank you, Guillaume, for letting me know. I hadn't started the investigation yet. :-)

Feel free to ask your questions here if you have any.

Regards!
Dmitry
