SLURM PMI with Intel MPI 4.0.0.028

ddcr@lcc.ufmg.br

Hi,

We have installed on our cluster both the resource manager SLURM v2.0.5 and Intel MPI 4.0.0.028. I was experimenting with SLURM's implementation of the PMI interface, but so far I am getting strange results. Here is the situation. The code I am testing is quite simple:

#-----------------------------------simplest.f90-------------------------------------

program parsec_mpi
  implicit none
  include 'mpif.h'

  character(len=4)  :: idstring
  character(len=80) :: name
  integer :: mpinfo, iam, procs_num, namelen

  ! Initialise MPI, get size and my id, create communicator
  call MPI_INIT(mpinfo)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, procs_num, mpinfo)
  call MPI_COMM_RANK(MPI_COMM_WORLD, iam, mpinfo)
  call MPI_GET_PROCESSOR_NAME(name, namelen, mpinfo)

  print *, ' Process ', iam, '/', procs_num, ' (', trim(name), ') says "Hello, world!"'

  write(idstring,'(I4.4)') iam
  open(16, file='out.'//idstring, form='formatted', status='unknown')
  write(16,*) 'Processor No ', iam, ' has started'
  write(16,*) 'Number of processors: ', procs_num
  write(16,*)
  write(16,*) 'Closing file on PE #', iam
  write(16,*)
  close(16)

  call MPI_FINALIZE(mpinfo)

end program parsec_mpi

#-------------------------------------------------------------------------------------

I compile it with /opt/intel/impi/4.0.0.028/bin64/mpif90 -o simplest.x simplest.f90 and submit the following batch job:

#-----------------------------------simplest.srm-------------------------------------
#!/bin/bash
#
# Copyright (C) 2011 Domingos Rodrigues
#
# Created: Sun Oct 2 22:44:13 2011 (BRT)
#
# $Id$
#
# 2 nodes with 8 cores each (Infiniband)
#
#SBATCH -o teste-%N-%j.out
#SBATCH -J teste
#SBATCH --ntasks=16
#SBATCH --nodes=2
#SBATCH --cpus-per-task=1

source /opt/intel/impi/4.0.0.028/intel64/bin/mpivars.sh
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
export I_MPI_FABRICS=shm:dapl

srun ./simplest.x
#-----------------------------------simplest.srm-------------------------------------

Well, the job runs successfully, but the output is most unexpected:

Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"

It seems that the processes are not getting the right rank (and the number of processes is wrong). The submission goes well if I use the old traditional sequence of steps mpdboot + mpiexec + mpdallexit. Could someone shed some light on this? Any help would be most appreciated!

Best regards,
Domingos
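P.S. For completeness, the MPD-based sequence referred to above looks roughly like the following sketch; the number of hosts, the host file name mpd.hosts, and the remote shell option are placeholders for whatever matches the allocation:

#----------------------------------- MPD-based run (sketch) -----------------------------------
source /opt/intel/impi/4.0.0.028/intel64/bin/mpivars.sh
# bring up an MPD ring on the allocated nodes listed in mpd.hosts (placeholder file)
mpdboot -n 2 -f mpd.hosts -r ssh
# launch the 16 MPI tasks through the MPD ring
mpiexec -n 16 ./simplest.x
# shut the MPD ring down again
mpdallexit
#-----------------------------------------------------------------------------------------------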

Dmitry Kuzmin (Intel)

Hi Domingos,

It seems to me that PMI virtualization does not work with the 4.0.0.028 library; it is quite old. Could you please download 4.0 Update 2 and give it a try?
Also, please read this article.
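For example, the only change to your job script should be sourcing the newer mpivars.sh. A minimal sketch, assuming Update 2 is installed under /opt/intel/impi/4.0.2.003 (adjust the path to your actual installation):

#----------------------------------- simplest.srm (sketch, after updating) -----------------------------------
#!/bin/bash
#SBATCH -o teste-%N-%j.out
#SBATCH -J teste
#SBATCH --ntasks=16
#SBATCH --nodes=2
#SBATCH --cpus-per-task=1

# assumed install prefix of 4.0 Update 2; adjust to the actual path on your system
source /opt/intel/impi/4.0.2.003/intel64/bin/mpivars.sh
# point Intel MPI at SLURM's PMI library so srun can hand out rank/size information
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
export I_MPI_FABRICS=shm:dapl

srun ./simplest.x
#---------------------------------------------------------------------------------------------------------------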

Regards!
Dmitry
