ICS 2011 is 15% slower than ICT 2010 on the same cluster with "basic install"

Guillaume De Nayer

Hi,

On our small cluster (12 nodes, 144 cores) I have installed the new Intel Cluster Studio (ICS) 2011. I have not uninstalled Intel Cluster Toolkit (ICT) 2010. All of our programs (Fortran or C++ codes) are 15% slower when we start them with the mpirun of ICS 2011, and I don't understand why. I did a normal installation and didn't notice any problem during it. I installed ICS 2011 with the same method that I used for ICT 2010.

About our hardware/software:
Master: Intel Xeon CPU E5620
nodes: Intel Xeon CPU X5650
OS: CentOS 5.5
We are using InfiniBand DDR (driver OFED-1.5.1); I_MPI_FABRIC is set to shm:ofa; pinning is disabled.

We have recompiled our programs with both the Intel 12.0 and the Intel 11.1 compilers, and the problem appears in both cases, so it is not a compiler problem.

What can I do?

Best regards

Dmitry Kuzmin (Intel)

Hi Guillaume,

First of all, could you check performance with the default parameters? Please run IMB-MPI1 for both cases and compare. Let me know if the difference is still that big.
Note that there is an 'S' at the end of I_MPI_FABRICS.
Please run with I_MPI_DEBUG=9 and compare the fabrics selected at run time in both cases. You can also compare the settings for collective operations.
If you cannot find the reason for the different behaviour, please submit a tracker at premier.intel.com and attach the log files you got with I_MPI_DEBUG=9.

Regards!
Dmitry
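
A minimal sketch of the log comparison suggested above. It assumes that the startup diagnostics printed with I_MPI_DEBUG set contain the text "MPI startup"; that pattern is an assumption about the log format, so adjust it to what your logs actually show. The helper name `startup_lines` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the library's startup diagnostics from one log.
# ASSUMPTION: those lines contain the text "MPI startup"; change the pattern
# if your logs look different.
startup_lines() {
    # Drop the leading "[rank] " prefix so that identical messages from
    # different ranks collapse into a single line under sort -u.
    grep 'MPI startup' "$1" | sed 's/^\[[0-9]*] *//' | sort -u
}
```

With the output of each installation's run saved to a file (for example ics.log and ict.log, both hypothetical names), writing `startup_lines ics.log` and `startup_lines ict.log` to two files and running `diff` on them shows only the startup settings that differ.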

Guillaume De Nayer

Hi Dmitry!

Thanks for pointing out the S at the end of I_MPI_FABRICS; it is now corrected, but that doesn't seem to solve the performance problem. Before I copy/paste the results of the IMB-MPI1 benchmark, I would like to be sure that I'm using all the defaults of Intel MPI. How can I be sure that I'm not using mpitune optimized data files?

Regards!
Guillaume

Dmitry Kuzmin (Intel)

Hi Guillaume,

Mpitune settings are used only if the '-tune' option is passed to mpiexec.
Also, please check the environment: 'set | grep I_MPI_'. Ideally you should see I_MPI_ROOT only.

Regards!
Dmitry
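
The environment check above can be wrapped in a small helper, sketched here. The function name `extra_impi_vars` is hypothetical. Unlike `set`, it uses `env`, so it lists only exported variables, which are the ones child processes such as mpiexec inherit.

```shell
#!/bin/sh
# Hypothetical helper: list every exported I_MPI_* variable except I_MPI_ROOT.
# An empty result means no I_MPI_ overrides come from the environment.
extra_impi_vars() {
    env | grep '^I_MPI_' | grep -v '^I_MPI_ROOT=' | sort
}
```

Running `extra_impi_vars` in the shell (or job script) that launches the program prints one NAME=value line per override.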

I've just found your question in the MKL forum about 3 programs ("I start 3 programs with mpirun")...
I should mention that the Intel MPI Library uses internal pinning (it's ON by default), so the rank 0 processes of all programs will be pinned to processor #0 and you can get performance degradation.
If you run more than one MPI job at a time, you need to switch pinning OFF with 'export I_MPI_PIN=0'.

Guillaume De Nayer

Hi Dmitry,

So here is my environment for ICS 2011:
set | grep I_MPI_ gives:
I_MPI_CC=icc
I_MPI_CXX=icpc
I_MPI_F77=ifort
I_MPI_F90=ifort
I_MPI_FABRICS=shm:ofa
I_MPI_FC=ifort
I_MPI_MPD_RSH=ssh
I_MPI_PIN=1
I_MPI_ROOT=/opt/intel/impi/4.0.1.007
I_MPI_TUNER_DATA_DIR=/opt/intel/impi/4.0.1/etc64/

With ICT 2010 here is the output:
I_MPI_CC=icc
I_MPI_CXX=icpc
I_MPI_F77=ifort
I_MPI_F90=ifort
I_MPI_FABRICS=shm:ofa
I_MPI_FC=ifort
I_MPI_MPD_RSH=ssh
I_MPI_PIN=0
I_MPI_ROOT=/opt/intel/impi/4.0.0.028
I_MPI_TUNER_DATA_DIR=/opt/intel/impi/4.0.0/etc64/

The big difference is the pinning. With ICT I had to disable it because I had problems with the pinning when I started several jobs on the same node. With ICS this problem seems to be solved, so the pinning is activated.
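
As a sketch, two such listings can also be compared mechanically once each has been saved to a file. The function name `diff_env` and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the settings that differ between two saved
# environment listings. Lines starting with "<" come from the first file,
# lines starting with ">" from the second.
diff_env() {
    a="${TMPDIR:-/tmp}/env_a.$$"
    b="${TMPDIR:-/tmp}/env_b.$$"
    sort "$1" > "$a"
    sort "$2" > "$b"
    # grep exits non-zero when nothing differs; "|| true" keeps the
    # function usable in scripts that run under "set -e".
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}
```

With the ICS 2011 listing saved as ics_env.txt and the ICT 2010 listing as ict_env.txt, `diff_env ics_env.txt ict_env.txt` reports three differing variables: I_MPI_PIN, I_MPI_ROOT and I_MPI_TUNER_DATA_DIR.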

So I ran the Intel IMB benchmark on 12 nodes with ppn=4, i.e. 48 processes. I can't use the whole cluster because important simulations are running on it. I attach the 2 logs.

We can see that ICS 2011 has the best results almost everywhere (except in barrier). But the problem is still there:
with ICT 2010 and pinning deactivated, our simulations run faster (between 10 and 15%) than with ICS 2011 and pinning activated!?!

Do you see a problem in the logs?

Thanks for your help,
Best regards
Guillaume

Attachments:
IMB-test_ICS.o56097189.27 KB
IMB-test_ICT.o56503183.88 KB

Dmitry Kuzmin (Intel)

Guillaume,

It seems to me that you have attached 2 identical files. Did you do that by mistake?

Regards!
Dmitry

Guillaume De Nayer

Damn, I was tired! I have edited the post. If you see something wrong in the logs, please tell me.

Best regards,
Guillaume

Guillaume De Nayer

Hi Dmitry,

I think the problem is solved. It was not a problem with the Intel software. We had problems with the "master" of the cluster, so my tests were running with a degraded master (12 GB RAM instead of 48 GB). The tests ran on the nodes and not on the master, but it seems that the master's memory has an influence on the results (the disks are mounted on the master). At the moment everything is OK.

Sorry for that,

Best regards,
Guillaume

Dmitry Kuzmin (Intel)

Thank you Guillaume for letting me know. I haven't started the investigation yet. :-)

Feel free to ask your questions here if you have any.

Regards!
Dmitry
