Hybrid mode execution (Xeon and MIC)

Hi,
I am trying to run my application in hybrid mode (Xeon and MIC) using MPI. However, I keep getting this error:

HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

I am using this command to execute the binaries on the Xeon and the MIC:
mpirun -genvnone -genv I_MPI_FABRICS shm:tcp -n 24 -machinefile machinefile executable_Xeon -parallel : -n 60 -machinefile machinefile executable_MiC -parallel


Hi Ravi, 

Your command line is incorrect. The -machinefile option is a global flag, so you cannot use it in each argument set as you're doing. Is the machine file the same on the host and the coprocessor? 

If yes, you could use the following command: 

mpirun -genvnone -genv I_MPI_FABRICS shm:tcp -machinefile machinefile -n 24 executable_Xeon -parallel : -n 60 executable_MiC -parallel

Otherwise, you can specify the hosts like this: 

mpirun -genvnone -genv I_MPI_FABRICS shm:tcp -n 24 -hosts Xeon1,Xeon2 executable_Xeon -parallel : -n 60 -hosts MIC1,MIC2 executable_MiC -parallel

Finally, I assume that the -parallel option is an argument for your executables. Is that correct? 

Hi Sumedh,

I tried the changes you recommended but still got the same error.

Yes -parallel is an argument to the executable.

Hi Ravi, 

At this point, I am unsure of what is causing this error. Could you give us some more information about the environment: 

1) Are you running the application under a job scheduler? Do you have any other MPI libraries installed? 

2) Could you please verify the functionality of the Intel MPI library by running one of the test examples provided with it? You can find the test code in <install_dir>/test.

3) Can you please share the results of the following commands: 

    $ which mpirun

    $ env | grep I_MPI

4) Try collecting more debug info by adding I_MPI_DEBUG=5 to your run command:

    mpirun -genvnone -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:tcp
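Regarding point 2 above, a run of the bundled test might look like the following. This is a sketch only: it assumes an Intel MPI installation under /opt/intel/impi/4.1.3.048 (substitute your own <install_dir>) and that sourcing mpivars.sh puts the mpiicc wrapper on the PATH.

```shell
# Sketch, not a verified transcript -- adjust the install path to your system
source /opt/intel/impi/4.1.3.048/bin64/mpivars.sh
cd /opt/intel/impi/4.1.3.048/test
mpiicc -o testc test.c     # compile the C test shipped with the library
mpirun -n 2 ./testc        # each rank should report a hello-world style line
```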

Hi Ravi,

An additional question and some remarks:

Did you export the environment variable I_MPI_MIC=1? This is necessary to run in symmetric mode. If not, you should also see a message (with $I_MPI_ROOT expanded to the installation directory):
bash: $I_MPI_ROOT/intel64/bin/pmi_proxy: cannot execute binary file

W.r.t. your initial command line, you can easily use a global machinefile if the executable names differ only in a postfix or a prefix by using the environment variables I_MPI_MIC_POSTFIX or I_MPI_MIC_PREFIX, e.g.:

Executable on host: executable_Xeon
Executable on MIC: executable_Xeon.mic
export I_MPI_MIC=1
mpirun -genvnone -genv I_MPI_MIC_POSTFIX .mic -genv I_MPI_FABRICS shm:tcp -n 84 -machinefile machinefile executable_Xeon -parallel

NB: Because of the -genvnone you have to put the I_MPI_MIC_POSTFIX setting onto the command line. Please consider whether -genvnone is really needed, otherwise you could just: export I_MPI_MIC_POSTFIX=.mic

Intel MPI will add the postfix on the fly to the executable name on the MIC. The prefix is particularly useful if your MIC executable has an identical name but is located in a different directory, e.g.:

Executable on coprocessor: local_sub_dir/executable_Xeon
export I_MPI_MIC_PREFIX=local_sub_dir/
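Putting the prefix variant together, the launch might look like this. A sketch only: it drops -genvnone (per the remark above) so the exported variables propagate, and reuses the machinefile and total rank count from earlier in the thread.

```shell
# Sketch: the MIC binary is local_sub_dir/executable_Xeon relative to the launch directory
export I_MPI_MIC=1
export I_MPI_MIC_PREFIX=local_sub_dir/
mpirun -genv I_MPI_FABRICS shm:tcp -n 84 -machinefile machinefile executable_Xeon -parallel
```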

Hi Sumedh,

Let me answer your queries.

1. There is no job scheduler.

2. I have been successful in executing the binaries on Xeon and MiC individually.

3. which mpirun  : /opt/intel//impi/4.1.3.048/mic/bin/mpirun

env | grep I_MPI : I_MPI_MIC=1, I_MPI_ROOT=/opt/intel//impi/4.1.3.048

Hi Klaus,

I did export I_MPI_MIC=1 on MiC before running the executable.


Hi Ravi, 

It would be very helpful if you can also share the output with I_MPI_DEBUG=5. 

Thanks, 

Sumedh

Hi Sumedh,

The thing is, I am not able to run/spawn the binaries on the host and the coprocessor in hybrid mode at all, hence I am unable to collect any I_MPI_DEBUG output.

Do you think InfiniBand setup might be a cause for this error?

In my experience, a broken InfiniBand setup didn't prevent running with I_MPI_FABRICS shm:tcp, although performance could be much worse than with no InfiniBand.

Hi Ravi, 

I agree with Tim. Since you are using shm:tcp, which runs over Ethernet, InfiniBand is not used at all. 

However, I did notice that you have not sourced the correct MPI variables. I am guessing that you are running the MPI command on the host and hence you need to source the host MPI variables. 

[user@host1] source /opt/intel/impi/4.1.3.048/bin64/mpivars.sh
[user@host1] which mpirun
/opt/intel/impi/4.1.3.048/bin/intel64/mpirun

Also, can you please confirm that you have set up passwordless ssh to the coprocessor, i.e. whenever you execute "ssh mic0" you should be able to log into the coprocessor without being prompted for a password. 
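A quick non-interactive way to check this (using mic0 as the hypothetical coprocessor name):

```shell
# Should print the coprocessor's hostname with no password prompt;
# with BatchMode=yes, ssh fails instead of asking for a password.
ssh -o BatchMode=yes mic0 hostname
```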

Lastly, could you please share the contents of your machinefile? The machine file should contain just the host and the MIC hostnames. For example, 

[user@host1] cat machinefile
host1
mic0

Hi Sumedh,

Let me tell you the steps I followed; I guess that will paint a clearer picture. I am spawning the executables from the MIC with I_MPI_MIC=1, and I source all the paths/exports in the MIC's bash before launching the executables. For the host, I have modified .bashrc to include the paths/exports; modifying .bashrc was necessary because I have to take care of application-specific exports. The content of my machine file is as follows:

Host-hostname:24

MiC-hostname:60

Yes, you are correct: I can do a passwordless ssh into the MIC.

Hi Ravi, 

For the symmetric mode of execution, I generally spawn the executable from the host with I_MPI_MIC=1. I source just the paths for the host using "/opt/intel/impi/4.1.3.048/bin64/mpivars.sh". You can also set any additional environment variables for the host and the coprocessor separately by using the '-env' clause. 

Lastly, if you specify the number of ranks in the machine file, you do not need to specify them on the command line using the '-n' switch. 
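As a sanity check on that point, the per-host counts in a hostname:ranks machinefile should sum to the total you would otherwise pass with -n (hostnames below are placeholders):

```shell
# Hypothetical machinefile using the hostname:ranks syntax
cat > /tmp/machinefile.sample <<'EOF'
host1:24
mic0:60
EOF

# Sum the per-host counts -- this is the total mpirun launches without any -n switch
awk -F: '{s += $2} END {print "total ranks: " s}' /tmp/machinefile.sample
```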

Check for any firewalls which may be preventing the host and coprocessor from communicating.

Hi Sumedh,

Pardon me for the delayed response. I was successful in launching the symmetric mode from the host machine.

Here is my mpirun command

mpirun -genv I_MPI_FABRICS shm:tcp -n 24 -machinefile machinefile <XEON_executable> parameters : -genvnone -n 60 -machinefile machinefile sh script.sh

I exported all the paths for the Xeon along with I_MPI_MIC=1. Content of script.sh:

source /opt/intel/impi/latest/mic/bin/mpivars.sh 
export MPI_ROOT=$I_MPI_ROOT/mic 
export LD_LIBRARY_PATH=/opt/intel/compiler/latest/lib/mic:$LD_LIBRARY_PATH
<MiC_executable> parameters

Thanks for all your help.

Ravi,

Thanks for documenting this for the community.

Regards
---
Taylor