Hello, I am experiencing an issue where MPI only executes work on remote hosts when I am connected to those hosts via RDP.
I have a large processing job which has been split into parcels and farmed out to hosts. I can see from the log files generated by each host and written back to my central repository (a mapped drive passed with the -mapall switch) that the hosts receive their instructions but then appear to stall. If I stop the job, connect to the hosts via RDP, and then restart the job, the hosts will happily process away.
I run the program with the following command:
mpiexec -wdir z:\directional -mapall -hosts 10 n01 5 n02 5 n03 5 n04 5 n05 5 n06 5 n07 5 n08 5 n09 5 n10 5 test
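For what it's worth, I have been wondering whether credentials are the issue, since without an RDP session the remote processes run in a non-interactive logon, where mapped drives and stored credentials can behave differently. A sketch of what I plan to try (assuming the Windows build of Intel MPI, whose mpiexec supports credential registration):

mpiexec -register

This prompts once for the account name and password that mpiexec uses to start the remote processes; the run command above stays unchanged.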
I have a simple test MPI job that I'm having trouble running on Intel MPI 4.1.036 with large runs (>1500 processes), using Hydra as the process manager. It gets stuck at the following place in the verbose debug output:
[proxy:0:160@cf-sb-cpc-223] got pmi command (from 42): barrier_in
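For reference, this is roughly how the job is launched, with Hydra's debug output enabled (the hostfile name and process count are placeholders for this post):

export I_MPI_HYDRA_DEBUG=on
mpiexec.hydra -f ./hostfile -n 1600 ./test

One knob I have been meaning to experiment with is I_MPI_HYDRA_BRANCH_COUNT, which controls how Hydra fans out its proxy tree at large scale; I mention it only as a possibility, not a confirmed fix.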
I'm trying to get Intel MPI v. 4.1.1.036 up and running over InfiniBand. Unfortunately, I'm getting odd slowdowns or, in most cases, complete hangs when using DAPL. I'm fairly new to all of this, so it is most likely a configuration error, but I'm not sure where it is nor how to find it, so I'm hoping someone here can help.
To test things I'm using the MPI hello world example from http://mpitutorial.com/mpi-hello-world. I can run this over the Ethernet interface without any problems.
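For completeness, the example is essentially the standard MPI hello world (reproduced here from memory, so treat it as a sketch rather than an exact copy of the tutorial source):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);                        /* initialize the MPI environment */

    int world_size;                              /* total number of ranks */
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int world_rank;                              /* this process's rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();                              /* shut down MPI */
    return 0;
}

On the Ethernet side I select the fabric explicitly with I_MPI_FABRICS=shm:tcp; switching that to shm:dapl is when the hangs appear.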
I installed Intel MPI Library 4.1.0.024 and tried compiling an MPI program. If I compile it without -mmic, there's no problem. But when I add -mmic, here's the error message:
/usr/bin/ld: skipping incompatible /opt/intel/impi/4.1.0.024/mic/lib/libmpigc4.so when searching for -lmpigc4
/usr/bin/ld: skipping incompatible /opt/intel/impi/4.1.0.024/mic/lib/libmpigc4.a when searching for -lmpigc4
/usr/bin/ld: cannot find -lmpigc4
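My working theory (an assumption on my part, not something I have confirmed in the documentation) is that -mmic produces k1om objects that the host GNU ld cannot mix with host-side libraries, so the MIC build needs to go through the Intel compiler wrappers rather than the GCC-based ones. Something like:

mpiicc -mmic test.c -o test.mic

where mpiicc wraps icc, and test.c/test.mic are placeholder names. The GCC-based mpicc cannot generate MIC code, which would explain why ld skips everything under /opt/intel/impi/4.1.0.024/mic/lib as incompatible.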
Hi, what is the difference in implementation between a dual-rail IB HCA and a dual-port IB HCA? Does it relate to high availability or to high performance? Regards, Girish Nair
Hi, we're running compute nodes with the dual-port IB HCA QLE7342 and a QLogic 12200-18 IB switch, using Intel OFED on CentOS 6.2 x64. We can run IB on a single port without issues. We'd like to use ib-bonding to get additional performance; please suggest how to do so (a sketch of what we have in mind is below). Also, when we connect the second port to the switch, the link LED does not come on. Where do I download the additional ib-bonding package, if that would solve the problem? Please advise. Regards, Girish Nair
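One caveat we are aware of (as far as we know, not confirmed for this stack): IPoIB bonding on Linux supports only active-backup mode, so bonding the two ports gives high availability rather than extra bandwidth. The configuration we have in mind on CentOS 6 uses the standard bonding driver with the ib interfaces as slaves (the device names and IP address here are placeholders):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.10.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-ib0  (ifcfg-ib1 is analogous)
DEVICE=ib0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

This only covers the IPoIB layer; verbs-level MPI traffic would not be aggregated this way, and the dead link LED sounds more like a cabling, port-enablement, or subnet-manager issue than a missing package.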
I'm very happy to announce that a new version of Intel® Premier Support is about to be released. For details, check out the New Intel Premier Support.
Please expect some downtime: Thursday, August 15th, ~6:00 pm PDT to Sunday, August 18th, ~5:00 PDT. During this period, please use the forums for issue submission.
Is it possible to run tasks on compute nodes that have InfiniBand HCAs from a master node that lacks an IB HCA, using Torque/Grid Engine? Please advise whether this is possible.
Intel MPI 4.1.1.036 is installed on all cluster machines.
The network configuration is as follows:
Master node (1): 2x Xeon E5-2450, 96 GB RAM, CentOS 6.2, NFS services over Ethernet
Compute nodes (4): 2x Xeon E5-2450, 96 GB RAM, TrueScale QDR dual-port QLE7342, CentOS 6.2, NFS client over GbE
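My understanding is that this should be workable as long as the MPI job itself is launched from one of the allocated compute nodes rather than from the master, since only the ranks and their Hydra proxies need the IB fabric. A sketch of a Torque job script under that assumption (the resource counts and binary name are placeholders; shm:tmi is suggested because the QLE7342 is a TrueScale/PSM card):

#!/bin/bash
#PBS -l nodes=4:ppn=16
cd $PBS_O_WORKDIR
# Select the PSM/TMI fabric on the compute nodes; DAPL would also be possible.
export I_MPI_FABRICS=shm:tmi
# Under Torque, mpirun picks up the allocated hosts from $PBS_NODEFILE.
mpirun -n 64 ./a.out

Because Torque executes this script on the first allocated compute node (the mother superior), the IB-less master would only handle scheduling and NFS.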