WRFVAR hangs when compiled with Intel MPI Library 4.0.0.025 using MPI

Hi,

I'm on a CentOS 5.2 box with 2 Intel Xeon L5420 CPUs. I compiled WRFVAR with the dmpar (distributed-memory parallel) option using Intel Compiler 11.1.069 and Intel MPI Library 4.0.0.025. The compilation completes with no errors. But the program always hangs after launching with mpirun, with no error messages and no complaint.

1. The example code under the IMPI 4 directory works fine.

2. It works fine if WRFVAR is compiled by the Intel Compiler with the IMPI library using OpenMP.

3. It works fine if WRFVAR is compiled by the Intel Compiler with MPICH2 1.2.1p1 (compiled by Gfortran 4.3.3) using MPI.

4. It works fine if WRFVAR is compiled by Gfortran 4.3.3 with MPICH2 1.2.1p1 using MPI or OpenMP.

It simply hangs only when using MPI with the IMPI library. I have no clue.

Could anyone tell me the possible reason, or how to debug it? Thanks.

Here are the compile options:

icc -I. -w -O3 -ip -DDM_PARALLEL -DMAX_HISTORY=25

ifort -c -r8 -i4 -w -ftz -align all -fno-alias -fp-model precise -FR -convert big_endian

mpicc -cc=icc -DMPI2_SUPPORT -DFSEEKO64_OK -w -O3 -ip -DDM_PARALLEL -DMAX_HISTORY=25

mpif90 -f90=ifort -O3 -w -ftz -align all -fno-alias -fp-model precise -FR -convert big_endian -r8 -i4 -convert big_endian

Here are my steps for Compiler 11.1.069 and the Intel MPI Library:

1. Set up environment variables
source ~/intel/Compiler/11.1/069/bin/ia32/ifortvars_ia32.csh
source ~/intel/Compiler/11.1/069/bin/ia32/iccvars_ia32.csh
source ~/intel/impi/4.0.0.025/bin/mpivars.csh

2. Launch mpd
mpd &

3. Launch WRFVAR with mpirun
mpirun -np 8 ./da_wrfvar.exe


Hi Jerry,

First of all, I have some questions for you.
1. Why do you use the 32-bit version of the compiler and Intel MPI Library if you have a 64-bit processor? You can set the environment with:
source ~/intel/Compiler/11.1/069/bin/iccvars.csh intel64
source ~/intel/Compiler/11.1/069/bin/ifortvars.csh intel64
source ~/intel/impi/4.0.0.025/bin64/mpivars.csh
2. It would be better to use mpiicc and mpiifort for Intel compilers.
3. If your application is multi-threaded (or uses OpenMP) you need to add the '-mt_mpi' compiler option.
4. You are working on a Linux-like system but your application has a .exe extension...
5. Why do you start 'mpd' and then use 'mpirun'? mpirun creates the mpd ring, starts the application, and kills the mpd ring when the application completes.
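
Pulling points 1 and 2 together, a minimal 64-bit setup and build might look like the fragment below. The paths are the ones already given in this thread; the trailing flags are copied from the original compile lines and the '...' stands for the rest of each command, which depends on the WRFVAR build system:

```shell
# 64-bit environment setup (paths as in this thread; adjust for your install)
source ~/intel/Compiler/11.1/069/bin/iccvars.csh intel64
source ~/intel/Compiler/11.1/069/bin/ifortvars.csh intel64
source ~/intel/impi/4.0.0.025/bin64/mpivars.csh

# Build with the Intel wrappers instead of mpicc -cc=icc / mpif90 -f90=ifort
mpiicc -DMPI2_SUPPORT -DFSEEKO64_OK -w -O2 -DDM_PARALLEL -DMAX_HISTORY=25 ...
mpiifort -O2 -w -ftz -align all -fno-alias -fp-model precise -FR -convert big_endian -r8 -i4 ...
```

This is a sketch of the suggested configuration, not a tested recipe; the exact flag set has to come from the WRFVAR configure files.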

Now, about additional information. You can get a lot of debug output from the MPI library - just set the I_MPI_DEBUG environment variable to an appropriate level. You can start from 10 and increase it up to 1000.
mpirun -np 8 -genv I_MPI_DEBUG 10 ./da_wrfvar.exe

But it seems to me that this issue is related to the compiler or the optimization level. Could you try using '-O2' instead of '-O3' (or maybe even '-O1') and removing '-ip'?
Let me know the results.

Regards!
Dmitry

Hi Dmitry,

Thanks for your reply.

>>1. Why do you use 32 bit version of compiler and Intel MPI library if you have 64 bit processor?

I get a segmentation fault at runtime if I compile with the 64-bit version of the compiler, but it works fine with the 32-bit compiler. I have no idea why.

>>3. If your application is multi-threaded (or openMP) you need to add '-mt_mpi' compiler option

No, I don't enable OpenMP.

>>5. Why do you start 'mpd' and use 'mpirun'? mpirun creates mpd ring, starts application and kills mpd ring when application is complete.

I tried launching mpd first and using 'mpiexec', and also using only 'mpirun'. It still didn't work.

>>2. It would be better to use mpiicc and mpiifort for Intel compilers.
>>tried to use '-O2' instead of '-O3' (or may be even '-O1') and remove '-ip'?

I tried all of them. It still didn't work.

Jerry,

Could you get the output with '-genv I_MPI_DEBUG 10'?
Could you run this application under a debugger?

I'll try to reproduce the issue locally, but I'm afraid it will take a lot of time since I'm quite busy with other tasks.

Regards!
Dmitry

Hi Dmitry,

Here is the output with '-genv I_MPI_DEBUG 10':

[0] MPI startup(): Intel MPI Library, Version 4.0 Build 20100224
[0] MPI startup(): Copyright (C) 2003-2010 Intel Corporation. All rights reserved.
[1] MPI startup(): shm data transfer mode
[3] MPI startup(): shm data transfer mode
[0] MPI startup(): shm data transfer mode
[2] MPI startup(): shm data transfer mode
[1] MPI startup(): set domain to {2,3} on node bay-mmm
[1] MPI startup(): Recognition level=1. Platform code=1. Device=1
[1] MPI startup(): Parent configuration:(intra=1 inter=1 flags=0), (code=1 ppn=2)
[3] MPI startup(): set domain to {6,7} on node bay-mmm
[3] MPI startup(): Recognition level=1. Platform code=1. Device=1
[3] MPI startup(): Parent configuration:(intra=1 inter=1 flags=0), (code=1 ppn=2)
[0] MPI startup(): I_MPI_DEBUG=10
starting wrf task 1 of 4
[0] MPI startup(): set domain to {0,1} on node bay-mmm
[0] MPI startup(): Recognition level=1. Platform code=1. Device=1
[0] MPI startup(): Parent configuration:(intra=1 inter=1 flags=0), (code=1 ppn=2)
[0] Allgather: 1: 0-1024 & 0-4
[0] Allgather: 1: 4096-16384 & 0-4
[0] Allgather: 1: 524288-2147483647 & 0-4
[0] Allgather: 1: 0-256 & 5-8
[0] Allgather: 1: 0-256 & 9-16
[0] Allgather: 1: 0-256 & 17-32
[0] Allgather: 1: 0-512 & 33-64
[0] Allgather: 1: 0-256 & 65-2147483647
[0] Allgather: 3: 0-2147483647 & 0-2147483647
[0] Allgatherv: 0: 0-2147483647 & 0-2147483647
[0] Allreduce: 1: 0-2048 & 0-4
[0] Allreduce: 3: 2097152-2147483647 & 0-4
[0] Allreduce: 1: 0-1024 & 5-8
[0] Allreduce: 3: 2097152-2147483647 & 5-8
[0] Allreduce: 1: 0-512 & 9-16
[0] Allreduce: 3: 2097152-2147483647 & 9-16
[0] Allreduce: 1: 0-512 & 17-32
[0] Allreduce: 3: 2097152-2147483647 & 17-32
[0] Allreduce: 1: 0-512 & 33-64
[0] Allreduce: 3: 2097152-2147483647 & 33-64
[0] Allreduce: 1: 0-512 & 65-2147483647
[0] Allreduce: 2: 0-2147483647 & 0-2147483647
[0] Alltoall: 2: 512-1048576 & 0-4
[0] Alltoall: 4: 1048576-2097152 & 0-4
[0] Alltoall: 2: 2097152-2147483647 & 0-4
[0] Alltoall: 2: 0-2147483647 & 5-8
[0] Alltoall: 2: 0-4 & 9-16
[0] Alltoall: 1: 4-16 & 9-16
[0] Alltoall: 2: 16-8192 & 9-16
[0] Alltoall: 1: 0-64 & 17-32
[0] Alltoall: 2: 64-128 & 17-32
[0] Alltoall: 1: 128-512 & 17-32
[0] Alltoall: 4: 512-1024 & 17-32
[0] Alltoall: 2: 1024-8192 & 17-32

[2] MPI startup(): set domain to {4,5} on node bay-mmm
[2] MPI startup(): Recognition level=1. Platform code=1. Device=1
[2] MPI startup(): Parent configuration:(intra=1 inter=1 flags=0), (code=1 ppn=2)
[0] Alltoall: 1: 0-32 & 33-64
[0] Alltoall: 4: 32-64 & 33-64
[0] Alltoall: 1: 64-512 & 33-64
[0] Alltoall: 4: 512-4096 & 33-64
[0] Alltoall: 1: 0-32 & 65-2147483647
[0] Alltoall: 4: 32-4096 & 65-2147483647
[0] Alltoall: 4: 65536-131072 & 65-2147483647
[0] Alltoall: 3: 0-2147483647 & 0-2147483647
[0] Alltoallv: 2: 0-2147483647 & 33-2147483647
[0] Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] Barrier: 2: 0-2147483647 & 0-2147483647
[0] Bcast: 1: 0-1048576 & 0-4
[0] Bcast: 1: 0-256 & 5-8
[0] Bcast: 1: 2048-16384 & 5-8
[0] Bcast: 7: 0-2147483647 & 0-2147483647
[0] Exscan: 0: 0-2147483647 & 0-2147483647
[0] Gather: 2: 1024-2147483647 & 0-4
[0] Gather: 2: 0-1024 & 33-64
[0] Gather: 2: 0-512 & 65-2147483647
[0] Gather: 3: 0-2147483647 & 0-2147483647
starting wrf task 2 of 4
[0] Gatherv: 1: 0-2147483647 & 0-2147483647
[0] Reduce_scatter: 1: 0-2048 & 0-4
[0] Reduce_scatter: 1: 0-32768 & 5-8
[0] Reduce_scatter: 1: 0-32768 & 9-16
[0] Reduce_scatter: 4: 262144-524288 & 9-16
[0] Reduce_scatter: 1: 0-32 & 17-32
[0] Reduce_scatter: 5: 32-64 & 17-32
[0] Reduce_scatter: 1: 64-32768 & 17-32
[0] Reduce_scatter: 5: 0-128 & 33-64
[0] Reduce_scatter: 1: 128-65536 & 33-64
[0] Reduce_scatter: 5: 65536-262144 & 33-64
[0] Reduce_scatter: 4: 0-4 & 65-2147483647
[0] Reduce_scatter: 5: 4-256 & 65-2147483647
[0] Reduce_scatter: 1: 256-131072 & 65-2147483647
[0] Reduce_scatter: 5: 131072-524288 & 65-2147483647
[0] Reduce_scatter: 2: 0-2147483647 & 0-2147483647
[0] Reduce: 2: 0-128 & 0-4
[0] Reduce: 1: 0-2147483647 & 0-2147483647
[0] Scan: 0: 0-2147483647 & 0-2147483647
[0] Scatter: 2: 1024-2147483647 & 0-4
[0] Scatter: 1: 0-512 & 33-64
[0] Scatter: 2: 512-1024 & 33-64
[0] Scatter: 1: 0-256 & 65-2147483647
[0] Scatter: 3: 0-2147483647 & 0-2147483647
[0] Scatterv: 1: 0-2147483647 & 0-2147483647
[0] Rank Pid Node name Pin cpu
[0] 0 7302 bay-mmm {0,1}
[0] 1 7300 bay-mmm {2,3}
[0] 2 7301 bay-mmm {4,5}
[0] 3 7303 bay-mmm {6,7}
starting wrf task 0 of 4
starting wrf task 3 of 4

>>Could you run this application under a debugger?

Do you mean running it under the Intel Debugger? I'm sorry, I have no experience with the Intel Debugger, but I can try it.

PS. I tried compiling with -O0, -O1 and -O2; only -O0 works with Intel MPI Library 4.

Thanks

Jerry,

Does that mean your application now works OK (see the example in the previous message)?

>Do you mean that run it under Intel Debugger?
It doesn't matter - any debugger you are familiar with. ('mpirun -gdb' will start the application under gdb.)

If the application works with -O0, it means the Intel MPI Library works fine - the library is the same regardless of the optimization options.

There are 2 areas for errors:
1. Compiler.
2. The application itself. Imagine that after optimization one rank runs faster than another and an MPI_Recv happens before the matching MPI_Send - you'll get a deadlock.
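
That receive-before-send scenario can be reproduced without MPI at all. Here is a small Python sketch (purely illustrative, not WRFVAR code) where two threads stand in for two ranks and a blocking queue read stands in for MPI_Recv; because both sides receive first, neither ever reaches its send, and the job hangs exactly the way mpirun appears to:

```python
import queue
import threading

# Two "ranks", each with an inbox; a blocking Queue.get() stands in for MPI_Recv.
inbox = [queue.Queue(), queue.Queue()]

def rank(me, peer):
    # Both ranks post their receive first: the classic recv-before-send deadlock.
    msg = inbox[me].get()                 # blocks forever -- the peer never sends
    inbox[peer].put("hello from %d" % me)

t0 = threading.Thread(target=rank, args=(0, 1), daemon=True)
t1 = threading.Thread(target=rank, args=(1, 0), daemon=True)
t0.start(); t1.start()
t0.join(timeout=1.0); t1.join(timeout=1.0)

# Both threads are still stuck inside get(): the "job" hangs with no error output.
print("deadlocked" if (t0.is_alive() and t1.is_alive()) else "completed")
```

In a real MPI program the same symptom shows up when optimization changes the relative timing of ranks and exposes a mismatched send/receive order; a backtrace from a hung rank is the quickest way to spot which call is blocking.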

Anyway, analyzing such application errors requires a lot of time, and I cannot spend that much. I'll just note that the Intel MPI Library itself works correctly.

Regards!
Dmitry
