I am running a large program called the Vienna Ab initio Simulation Package (VASP) under Parallel Studio and Intel MPI. I compiled the program without problems, and it apparently runs correctly on all of the examples and produces correct results when run under MPI. However, a slightly larger job, which is what I bought the program for, repeatedly crashes with a PMPI_Allgatherv error. Since no other users of this fairly widely used program report similar errors, I am concerned that it may be an Intel MPI bug.
I have a single-node cluster with 4 sockets and 32 cores in total. The system runs Red Hat 6.3 with Intel MPI 4 Update 3, and I am using Slurm to start MPI jobs. It seems that whenever I try to run multiple MPI jobs on a single node, all the jobs end up running on the same processors. Moreover, I notice that each job uses all the cores in the node. For example, I started the first MPI job, with 8 tasks, on the node using Slurm, and I noticed that the first MPI task ran on CPUs 0-3, the 2nd MPI task on CPUs 4-7, and so on, with the last task on CPUs 28-31. Each MPI task used 4 cores instead of 1.
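For reference, when several independent MPI jobs share a node, Intel MPI's default pinning spreads each job's ranks across the whole node, which matches the behavior described above. One way to keep jobs apart is to set each job's processor list explicitly before launching it; the core ranges below are placeholders, not a recommendation for this particular machine:

```shell
# Hypothetical example: give each 8-rank job its own cores on a shared node.
# Job 1:
export I_MPI_PIN_PROCESSOR_LIST=0-7
mpirun -np 8 ./a.out

# Job 2, launched separately:
export I_MPI_PIN_PROCESSOR_LIST=8-15
mpirun -np 8 ./a.out
```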
We have installed the cluster suite, but we are unable to locate the wrapper scripts mpiicc and mpiifort.
What should we do?
The MPI compilation of my code results in the following error:
ld: MPIR_Thread: TLS definition in /opt/apps/intel13/impi/4.1.0.030/intel64/lib/libmpi_mt.so section .tbss mismatches non-TLS definition in /opt/apps/intel13/impi/4.1.0.030/intel64/lib/libmpi.so.4 section .bss
/opt/apps/intel13/impi/4.1.0.030/intel64/lib/libmpi.so.4: could not read symbols: Bad value
I have sent the systems admin several e-mails, but he can't seem to figure it out. Any help on how to fix this problem would be appreciated.
I bought and installed Cluster Studio for Windows back in early 2012, and I am just now compiling our MPI Fortran code in parallel on Windows. I had it compiled and running, and I successfully ran hundreds of check-out problems.
I wanted to see how easy it would be to allow our users to run in parallel on Windows. I downloaded the MPI runtime, installed it on another machine, and brought my executable over, and it crashed on a SCATTERV call with the error "memcpy arguments alias each other".
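For what it's worth, one known trigger for that message (an assumption here, since the call site isn't shown) is passing overlapping send and receive buffers to SCATTERV on the root rank; the MPI standard requires MPI_IN_PLACE there instead. A Fortran sketch of the in-place form, with illustrative names throughout:

```fortran
! Hypothetical sketch: avoiding buffer aliasing in MPI_SCATTERV by
! passing MPI_IN_PLACE as the root's receive buffer, so the root's own
! chunk of sendbuf is left where it is rather than copied onto itself.
program scatterv_in_place
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i
  integer, allocatable :: counts(:), displs(:)
  real(8), allocatable :: sendbuf(:)
  real(8) :: recvbuf(1)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  allocate(counts(nprocs), displs(nprocs))
  counts = 1
  displs = [(i - 1, i = 1, nprocs)]

  if (rank == 0) then
     allocate(sendbuf(nprocs))
     sendbuf = [(real(i, 8), i = 1, nprocs)]
     ! recvcount/recvtype are ignored when MPI_IN_PLACE is used.
     call MPI_SCATTERV(sendbuf, counts, displs, MPI_DOUBLE_PRECISION, &
                       MPI_IN_PLACE, 0, MPI_DOUBLE_PRECISION,         &
                       0, MPI_COMM_WORLD, ierr)
  else
     allocate(sendbuf(1))   ! send arguments are ignored on non-root ranks
     call MPI_SCATTERV(sendbuf, counts, displs, MPI_DOUBLE_PRECISION, &
                       recvbuf, 1, MPI_DOUBLE_PRECISION,              &
                       0, MPI_COMM_WORLD, ierr)
  end if

  call MPI_FINALIZE(ierr)
end program scatterv_in_place
```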
I have been using a mixture of MPICH2 and TBB very successfully:
MPICH2 for machine-to-machine communication and TBB for intra-machine thread management.
Now I am trying the very same code on a system which uses Intel MPI instead of MPICH2,
and I am observing very odd behavior: some messages sent with MPI_Ssend are not being received
at the destination, and I am wondering whether it is because Intel MPI and TBB do not work well together.
The following document
It looks like there is a bug in the way Intel MPI interacts with SLURM. I had the following hostlist in SLURM_JOB_NODELIST:
Other MPI implementations such as OpenMPI have had no problems interpreting this. However, when Intel MPI used that node list, it tried to find itc017, which isn't even a valid hostname, let alone one in that hostlist.
I wrote a script to bypass this, generate the correct host list, and pass it explicitly to Intel MPI. However, it would be better to fix this inside Intel MPI itself.
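A bypass script along those lines might look like the following sketch. It relies on scontrol to expand SLURM's compressed nodelist syntax (e.g. "node[01-04]") into one hostname per line; the machinefile path and program name are arbitrary:

```shell
#!/bin/bash
# Expand the compressed SLURM nodelist into an explicit host list,
# then hand that list directly to Intel MPI's mpirun.
scontrol show hostnames "$SLURM_JOB_NODELIST" > machinefile.$SLURM_JOB_ID
mpirun -machinefile machinefile.$SLURM_JOB_ID -np "$SLURM_NTASKS" ./a.out
```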
I am using the ifort compiler v13.0.1 20121010 together with Intel MPI v4.1.0.024 on an x86_64 Linux cluster. Using 64-bit integers as the default (ILP64 model) in my little Fortran program, I obtain wrong results when I use MPI_IN_PLACE in MPI_REDUCE calls (for both integer and real(8) data).
My code is as follows:
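(The original listing did not survive; the following is a minimal sketch of the pattern described, with illustrative names. Under ILP64 defaults it would be built against the ILP64 MPI interface, e.g. with "mpiifort -i8 -ilp64".)

```fortran
! Minimal sketch (not the original code): sum an integer across ranks,
! using MPI_IN_PLACE on the root so the result overwrites n directly.
program reduce_in_place
  use mpi
  implicit none
  integer :: ierr, rank, n, dummy

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  n = rank + 1

  if (rank == 0) then
     ! Root: no separate send buffer; n is both input and result.
     call MPI_REDUCE(MPI_IN_PLACE, n, 1, MPI_INTEGER, MPI_SUM, &
                     0, MPI_COMM_WORLD, ierr)
     print *, 'sum =', n
  else
     ! Non-root: the receive buffer is ignored.
     call MPI_REDUCE(n, dummy, 1, MPI_INTEGER, MPI_SUM, &
                     0, MPI_COMM_WORLD, ierr)
  end if

  call MPI_FINALIZE(ierr)
end program reduce_in_place
```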
I'm running WRF with over 300 processes. There are situations where one of the processes crashes, but the other processes keep burning the CPUs. Is there any way that Intel MPI can terminate the program automatically whenever one of the processes exits?
Thank you very much
Dear all,
I am using an evaluation edition of the cluster pack compiler.
When I type "mpirun -np 2 -machinefile machinefile" followed by a program name,
I get the following message:
bash: /export/apps/intel/impi/4.1.0.024/intel64/bin/pmi_proxy: No such file or directory
I have already added the directory to my PATH on the master and on each node,
but it is still not working properly.
Please give me some advice.
Thanks in advance
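(For reference: a common cause of pmi_proxy not being found on remote nodes is that the Intel MPI environment is only initialized in interactive shells, while mpirun starts its helpers over non-interactive remote shells. One usual setup step, assuming that is the issue and using the install path from the error above, is to source mpivars.sh from a startup file that non-interactive shells read on every node:)

```shell
# In ~/.bashrc (or equivalent, read by non-interactive shells) on the
# master and on every compute node:
source /export/apps/intel/impi/4.1.0.024/intel64/bin/mpivars.sh
```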