Intel MPI and DAPL

I am migrating an MPI application written for Linux to Windows (Windows Server 2008). I downloaded the MPI
Library Kit for evaluation. I am also using Mellanox WinOF for Windows.

I am getting "DAPL provider is not found and fallback device is not enabled" when I
run:

mpiexec -n 4 -env I_MPI_DEBUG 100 -env I_MPI_FALLBACK_DEVICE 0 -env I_MPI_DEVICE rdma::ibnic0v2 HelloWorld.exe

I would like to know how to configure the system so that I can run with I_MPI_DEVICE rdma.

Thanks
Warenne Casano

================================================================
Here is the output from the run.
================================================================

C:\IntelMPIEval\HelloWorld>mpiexec -n 4 -env I_MPI_DEBUG 100 -env I_MPI_FALLBACK_DEVICE 0 -env I_MPI_DEVICE rdma::ibnic0v2 HelloWorld.exe
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.2.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD 3/12/2009 11:42:10 AM
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE winconfigure.wsf
[0] MPI startup(): I_MPI_VERSION_MACHINENAME SVSMPIW03
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.2.1 3/12/2009
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.2.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD 3/12/2009 11:42:10 AM
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE winconfigure.wsf
[0] MPI startup(): I_MPI_VERSION_MACHINENAME SVSMPIW03
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.2.1 3/12/2009
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.2.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD 3/12/2009 11:42:10 AM
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE winconfigure.wsf
[0] MPI startup(): I_MPI_VERSION_MACHINENAME SVSMPIW03
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.2.1 3/12/2009
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.2.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD 3/12/2009 11:42:10 AM
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE winconfigure.wsf
[0] MPI startup(): I_MPI_VERSION_MACHINENAME SVSMPIW03
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.2.1 3/12/2009
[0] MPIDI_CH3I_SHM_recv_alarm_msg(): enable generic copy routine for short messages
[0] MPIDI_CH3_Init(): number of shm buffers = 16
[0] MPIDI_CH3_Init(): size of shm buffer = 16384
[0] MPIDI_CH3_Init(): size of shm buffer structure = 16400
[0] MPIDI_CH3_Init(): size of shm queue structure = 262408
[0] MPIDI_CH3_Init(): can not use fallback device
[0] MPIDI_CH3_Init(): failover flags = 0x5
[0] MPIDI_CH3_Init(): wait timeout = 0
[3] MPIDI_CH3I_SHM_recv_alarm_msg(): enable generic copy routine for short messages
[3] MPIDI_CH3_Init(): number of shm buffers = 16
[3] MPIDI_CH3_Init(): size of shm buffer = 16384
[3] MPIDI_CH3_Init(): size of shm buffer structure = 16400
[3] MPIDI_CH3_Init(): size of shm queue structure = 262408
[3] MPIDI_CH3_Init(): can not use fallback device
[0] MPIDI_CH3I_RDMA_init(): entering
[0] I_MPI_init_dat_regestry_info(): trying to load dat library dat.dll
[3] MPIDI_CH3_Init(): failover flags = 0x5
[3] MPIDI_CH3_Init(): wait timeout = 0
[3] MPIDI_CH3I_RDMA_init(): entering
[3] I_MPI_init_dat_regestry_info(): trying to load dat library dat.dll
[1] MPIDI_CH3I_SHM_recv_alarm_msg(): enable generic copy routine for short messages
[1] MPIDI_CH3_Init(): number of shm buffers = 16
[1] MPIDI_CH3_Init(): size of shm buffer = 16384
[1] MPIDI_CH3_Init(): size of shm buffer structure = 16400
[1] MPIDI_CH3_Init(): size of shm queue structure = 262408
[1] MPIDI_CH3_Init(): can not use fallback device
[1] MPIDI_CH3_Init(): failover flags = 0x5
[1] MPIDI_CH3_Init(): wait timeout = 0
[2] MPIDI_CH3I_SHM_recv_alarm_msg(): enable generic copy routine for short messages
[2] MPIDI_CH3_Init(): number of shm buffers = 16
[2] MPIDI_CH3_Init(): size of shm buffer = 16384
[2] MPIDI_CH3_Init(): size of shm buffer structure = 16400
[2] MPIDI_CH3_Init(): size of shm queue structure = 262408
[2] MPIDI_CH3_Init(): can not use fallback device
[2] MPIDI_CH3_Init(): failover flags = 0x5
[2] MPIDI_CH3_Init(): wait timeout = 0
[2] MPIDI_CH3I_RDMA_init(): entering
[2] I_MPI_init_dat_regestry_info(): trying to load dat library dat.dll
[1] MPIDI_CH3I_RDMA_init(): entering
[1] I_MPI_init_dat_regestry_info(): trying to load dat library dat.dll
[0] I_MPI_init_dat_regestry_info(): trying to load dat library dat2.dll
[1] I_MPI_init_dat_regestry_info(): trying to load dat library dat2.dll
[3] I_MPI_init_dat_regestry_info(): trying to load dat library dat2.dll
[2] I_MPI_init_dat_regestry_info(): trying to load dat library dat2.dll
[0] MPIDI_CH3I_RDMA_init(): exiting
[0] MPI startup(): Intel MPI Library, Version 3.2.1 Build 20090312
[0] MPI startup(): Copyright (C) 2003-2009 Intel Corporation. All rights reserved.
[3] MPIDI_CH3I_RDMA_init(): exiting
[1] MPIDI_CH3I_RDMA_init(): exiting
[2] MPIDI_CH3I_RDMA_init(): exiting

job aborted:
rank: node: exit code[: error message]
0: WINDOWS-BXEERKH: -1073741819: process 0 exited without calling finalize

1: WINDOWS-BXEERKH: -1073741819: process 1 exited without calling finalize
2: WINDOWS-BXEERKH: -1073741819: process 2 exited without calling finalize
3: WINDOWS-BXEERKH: -1073741819: process 3 exited without calling finalize
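
A note on the exit code above: -1073741819 is the signed 32-bit form of 0xC0000005, the Windows status code for an access violation. That suggests the ranks crash after the DAT library load attempts rather than exiting cleanly. A minimal Python sketch of the conversion (the function name is only illustrative):

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status value."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741819))  # prints 0xC0000005
```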

================================================================
Here is the content of dat.conf (in C:\Dat)
================================================================
#
# DAT (DAPL) configuration file
#
# Entries scanned sequentially - first entry to open is used.
#
# Each entry requires the following fields:
#
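# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
# <provider_version> <ia_params> <platform_params>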
#
# DAT v1.1 dapl provider configuration for HCA0 port 1
ibnic0 u1.1 threadsafe default dapl.dll ri.1.1 "IbalHca0 1" ""
IbalHca0 u1.1 threadsafe default dapl.dll ri.1.1 "IbalHca0 1" ""
#
# DAT 1.1 debug
ibnic0d u1.1 threadsafe default dapld.dll ri.1.1 "IbalHca0 1" ""
#
# DAT 2.0
ibnic0v2 u2.0 nonthreadsafe default dapl2.dll ri.2.0 "IbalHca0 1" ""
ibnic1v2 u2.0 nonthreadsafe default dapl2.dll ri.2.0 "IbalHca1 1" ""
IbalHca0v2 u2.0 nonthreadsafe default dapl2.dll ri.2.0 "IbalHca0 1" ""
#
# DAT 2.0 (debug)
ibnic0v2d u2.0 nonthreadsafe default dapl2d.dll" ri.2.0 "IbalHca0 1" ""
#
# DAT 2.0 [socket-cm] InfiniBand QPs setup by passing QP info over a socket
# connection; supports DAT Windows <==> Linux over IB connections.
ibnic0v2-scm u2.0 nonthreadsafe default apl2-scm.dll ri.2.0 "IbalHca0 1" ""
#
# Socket-CM (debug)
ibnic0v2-scmd u2.0 nonthreadsafe default dapl2-scmd.dll" ri.2.0 "IbalHca0 1" ""
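
In the listing above, the two debug entries (ibnic0v2d and ibnic0v2-scmd) have a stray quote after the DLL name, and the ibnic0v2-scm entry names apl2-scm.dll rather than dapl2-scm.dll. These may be paste artifacts, but they are worth checking in the real file. The following Python sketch shows one way to scan dat.conf entries for this kind of problem. It assumes the usual eight-field DAT entry layout; the field names and the `parse_dat_conf` helper are illustrative and not part of Intel MPI or WinOF.

```python
import re

# Assumed field order of a dat.conf entry (names are illustrative).
FIELDS = ("ia_name", "api_version", "threadsafety", "default",
          "lib_path", "provider_version", "ia_params", "platform_params")

# A token is either a double-quoted string or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def parse_dat_conf(text):
    """Parse dat.conf text into a list of dicts, recording obvious problems."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = TOKEN.findall(line)
        problems = []
        if len(tokens) != len(FIELDS):
            problems.append(
                f"expected {len(FIELDS)} fields, found {len(tokens)}")
        entry = dict(zip(FIELDS, tokens))
        lib = entry.get("lib_path", "")
        if lib.count('"') % 2:
            problems.append("unbalanced quote in lib_path")
        entry["lib_path"] = lib.strip('"')
        entry["line"] = lineno
        entry["problems"] = problems
        entries.append(entry)
    return entries


# Two entries copied from the listing above.
sample = r'''
ibnic0v2 u2.0 nonthreadsafe default dapl2.dll ri.2.0 "IbalHca0 1" ""
ibnic0v2d u2.0 nonthreadsafe default dapl2d.dll" ri.2.0 "IbalHca0 1" ""
'''

for e in parse_dat_conf(sample):
    print(e["ia_name"], e["lib_path"], e["problems"])
```

The sketch only checks the syntax of each entry. It does not verify that the named DLL exists or that the DAT registry would accept the entry.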

Quoting - wcasano
I am migrating an MPI application written for Linux to Windows (Windows Server 2008). I downloaded the MPI
Library Kit for evaluation. I am also using Mellanox WinOF for Windows.

I am getting "DAPL provider is not found and fallback device is not enabled" when I
run:

mpiexec -n 4 -env I_MPI_DEBUG 100 -env I_MPI_FALLBACK_DEVICE 0 -env I_MPI_DEVICE rdma::ibnic0v2 HelloWorld.exe

I would like to know how to configure the system so that I can run with I_MPI_DEVICE rdma.

Hi Warenne,

The command line is correct, but why do you use two colons in "rdma::ibnic0v2"?

Is dat.dll accessible through the %PATH% variable?

I've checked dat.conf on our server, and it gives the full path to every DLL listed in the file.
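
One quick way to answer the %PATH% question is a small Python sketch like the one below. It looks for dat.dll and dat2.dll (the two names that appear in the debug output) in each directory on PATH. The `find_on_path` helper is hypothetical, and it checks only PATH, not the other locations in the Windows DLL search order such as the application directory or the system directory.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory on PATH that contains the given file."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


for dll in ("dat.dll", "dat2.dll"):
    print(dll, "->", find_on_path(dll) or "not found on PATH")
```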

Best wishes,
Dmitry

Hi Dmitry,

Thank you for your response. I have several more questions:

(1) Which interface adapter in dat.conf should I use for Mellanox ConnectX WinOF?

(2) What do I_MPI_NETMASK and I_MPI_DEVICE do if I specify both of them?

(3) Can you provide me with a checklist to verify whether my configuration is correct?

(4) I retried the command with the following changes:
(a) Added the paths for both "dat.conf" and "dat.dll" to %PATH%
(b) Removed "::ibnic0v2" from the command
(c) Modified "C:\Dat\dat.conf" to include the paths to the DLLs.

I am still getting the same error. Here is the output from the run.

C:\IntelMPIEval\HelloWorld>mpiexec -n 1 -env I_MPI_DEBUG 100 -env I_MPI_FALLBACK_DEVICE 0 -env I_MPI_DEVICE rdma HelloWorld.exe
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.2.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD 3/12/2009 11:42:10 AM
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE winconfigure.wsf
[0] MPI startup(): I_MPI_VERSION_MACHINENAME SVSMPIW03
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.2.1 3/12/2009
[0] MPIDI_CH3I_SHM_recv_alarm_msg(): enable generic copy routine for short messages
[0] MPIDI_CH3_Init(): number of shm buffers = 16
[0] MPIDI_CH3_Init(): size of shm buffer = 16384
[0] MPIDI_CH3_Init(): size of shm buffer structure = 16400
[0] MPIDI_CH3_Init(): size of shm queue structure = 262408
[0] MPIDI_CH3_Init(): can not use fallback device
[0] MPIDI_CH3_Init(): failover flags = 0x5
[0] MPIDI_CH3_Init(): wait timeout = 0
[0] MPIDI_CH3I_RDMA_init(): entering
[0] I_MPI_init_dat_regestry_info(): trying to load dat library dat.dll
[0] I_MPI_init_dat_regestry_info(): trying to load dat library dat2.dll
[0] MPIDI_CH3I_RDMA_init(): exiting
[0] MPI startup(): Intel MPI Library, Version 3.2.1 Build 20090312
[0] MPI startup(): Copyright (C) 2003-2009 Intel Corporation. All rights reserved.

job aborted:
rank: node: exit code[: error message]
0: WINDOWS-BXEERKH: -1073741819: process 0 exited without calling finalize

Thanks again for your help.

Warenne

Quoting - wcasano

Hi Dmitry,

Thank you for your response. I have several more questions:

(1) Which interface adapter in dat.conf should I use for Mellanox ConnectX WinOF?

(2) What do I_MPI_NETMASK and I_MPI_DEVICE do if I specify both of them?

(3) Can you provide me with a checklist to verify whether my configuration is correct?

(4) I retried the command with the following changes:
(a) Added the paths for both "dat.conf" and "dat.dll" to %PATH%
(b) Removed "::ibnic0v2" from the command
(c) Modified "C:\Dat\dat.conf" to include the paths to the DLLs.

Warenne

Hi Warenne,

1. ibnic0v2 should be OK.
2. Try disabling I_MPI_NETMASK for a while.
3. Could you try 'dapltest.exe'? This utility should show you the correct adapter.
4. dat.conf on our server looks like this:

# DAT v1.1 dapl provider configuration for HCA0 port 1
ibnic0 u1.1 threadsafe default C:\Windows\dapl.dll ri.1.1 "IbalHca0 1" ""
IbalHca0 u1.1 threadsafe default C:\Windows\dapl.dll ri.1.1 "IbalHca0 1" ""

# DAT 1.1 debug
ibnic0d u1.1 threadsafe default "C:\Program Files (x86)\WinOF\dapld.dll" ri.1.1 "IbalHca0 1" ""

# DAT 2.0
ibnic0v2 u2.0 nonthreadsafe default C:\Windows\dapl2.dll ri.2.0 "IbalHca0 1" ""
IbalHca0v2 u2.0 nonthreadsafe default C:\Windows\dapl2.dll ri.2.0 "IbalHca0 1" ""

# DAT 2.0 (debug)
ibnic0v2d u2.0 nonthreadsafe default "C:\Program Files (x86)\WinOF\dapl2d.dll" ri.2.0 "IbalHca0 1" ""

# DAT 2.0 [socket-cm] InfiniBand QPs setup by passing QP info over a socket
# connection; supports DAT Windows <==> Linux over IB connections.
ibnic0v2-scm u2.0 nonthreadsafe default C:\Windows\dapl2-scm.dll ri.2.0 "IbalHca0 1" ""

# Socket-CM (debug)
ibnic0v2-scmd u2.0 nonthreadsafe default "C:\Program Files (x86)\WinOF\dapl2-scmd.dll" ri.2.0 "IbalHca0 1" ""

5. Have you installed Intel MPI? Did you run 'mpivars.bat'?

6. To test the installation:
- Verify through Computer Management that the smpd service is started. It calls the Intel MPI Process Manager.
- Verify that ia32\bin (em64t\bin for the Intel 64 architecture in 64-bit mode) is in your path:
> echo %PATH%
You should see the correct path for each node you test.
> mpiexec.exe -hosts 2 host1 1 host2 1 a.bat
where a.bat contains
echo %PATH%

Run the test program with all available configurations on your cluster.
+ Test the sock device using:
> mpiexec.exe -n 2 -env I_MPI_DEBUG 2 -env I_MPI_DEVICE sock test.exe
You should see one line of output for each rank, as well as debug output indicating that the sock device is used.
+ Test the ssm device using:
> mpiexec.exe -n 2 -env I_MPI_DEBUG 2 -env I_MPI_DEVICE ssm test.exe
+ Test the rdma device using:
> mpiexec.exe -n 2 -env I_MPI_DEBUG 2 -env I_MPI_DEVICE rdma test.exe

Please let me know which configuration works for you.
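
If it helps, the three device tests above can be scripted. The following Python sketch builds the same mpiexec.exe command lines for the sock, ssm, and rdma devices and, optionally, runs them and collects the exit codes. The helper names are hypothetical, and it assumes mpiexec.exe and test.exe are reachable from the current directory or PATH.

```python
import subprocess

# Devices named in the checklist above.
DEVICES = ("sock", "ssm", "rdma")


def build_command(device, exe="test.exe", nprocs=2, debug=2):
    """Build the mpiexec.exe argument list for one I_MPI_DEVICE value."""
    return ["mpiexec.exe", "-n", str(nprocs),
            "-env", "I_MPI_DEBUG", str(debug),
            "-env", "I_MPI_DEVICE", device,
            exe]


def run_all(exe="test.exe"):
    """Run the test program once per device and return the exit codes."""
    results = {}
    for device in DEVICES:
        try:
            proc = subprocess.run(build_command(device, exe),
                                  capture_output=True, text=True)
            results[device] = proc.returncode
        except FileNotFoundError:
            results[device] = None  # mpiexec.exe was not found
    return results


# Print the command lines without running them.
for device in DEVICES:
    print(" ".join(build_command(device)))
```

Calling run_all() on a node where Intel MPI is set up would show which devices return a zero exit code, which is the information requested above.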

Try to start your HelloWorld.exe without I_MPI_FALLBACK_DEVICE.

It seems that something is wrong with your environment, but it is hard to tell exactly what.

Regards!
Dmitry
