IB PKEY and Service_Level

Hello.

Does anyone know if there is a way to define an IB PKEY and Service_Level for Intel MPI?

For Open MPI you can set these environment variables:

OMPI_MCA_btl_openib_ib_service_level

OMPI_MCA_btl_openib_ib_pkey

Thanks.

Hello,

What are you going to do and what is it for?

Regards!

Dmitry

I am doing some IB networking tests.

So far I have used Open MPI with the parameters I mentioned.

I am basically running a very simple MPI program to test the bandwidth.

I want to try it with Intel MPI as well.

As you probably understand, OMPI_MCA_btl_openib_ib_service_level and OMPI_MCA_btl_openib_ib_pkey are Open MPI parameters and cannot be applied directly to the Intel MPI Library. Personally, I don't know what these parameters are used for.

Why not try the out-of-the-box performance of the Intel MPI Library? If you are not happy with the performance, we can take a look and give you some advice on how to improve it.

If you have an InfiniBand interconnect, you just need to install the OFED stack, which can be downloaded from OpenFabrics.

The Intel MPI Library chooses the fastest available fabric automatically. To verify which fabric was selected, set the environment variable I_MPI_DEBUG=2 and check the output (rdssm mode is used by default).

If you are going to test multi-rail performance, you need at least version 4.0 of the Intel MPI Library and some special settings.

Regards!

Dmitry

I think I haven't explained myself well.

The important part for me is the network, not the performance. Those parameters are used in Open MPI to set the InfiniBand pkey and service level. I want different MPI jobs to get different pkeys and service levels, and I am testing the relative performance of those jobs.

So I need a way to force Intel MPI to run on a specific pkey and service level (in Open MPI one sets the values of those parameters).

Regards,

Itay

Hi Itay,

You cannot control these parameters in the Intel MPI Library directly. You might be able to change some settings for the InfiniBand interconnect instead. I've asked an expert about this, but he is on vacation at the moment. I will probably be able to reply on March 9.

Regards!

Dmitry

Thanks,

I am looking forward to a reply.

Itay

Hi Itay,

I've got the following answer:

"These parameters are used during the QP modify. The pkey_index is used during the QP init (ep_create) and the service level is used during the RTR transition (during the CONN_EST event). The pkey_index has to be queried for the given pkey value via the verbs device calls.

So the best way to handle this short term is with environment variables, most likely DAPL_IB_PKEY and DAPL_IB_SERVICE_LEVEL.

Another way would be to extend the definition of DAT_QOS to support per connection, but this would require a DAT specification change."
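The pkey-to-index lookup mentioned in the quote can also be checked by hand from the shell. The following is a minimal sketch, not what uDAPL does internally (that goes through the verbs calls). It assumes the Linux sysfs layout `/sys/class/infiniband/<device>/ports/<port>/pkeys/<index>`, where each file holds one table entry as a hex value. The function name `find_pkey_index` and its directory argument are hypothetical, chosen only for illustration, and the comparison is an exact match on the full 16-bit value.

```bash
# find_pkey_index DIR PKEY   (hypothetical helper, for illustration only)
# Prints the lowest index in DIR whose entry equals PKEY and returns 0,
# or returns 1 if the pkey is not in the table.
# DIR is assumed to look like /sys/class/infiniband/mlx4_0/ports/1/pkeys,
# with files named 0, 1, 2, ... each containing a value such as 0xffff.
find_pkey_index() {
    local dir=$1
    local want=$(( $2 ))          # accepts 0x8001 as well as 32769
    local idx=0 value
    while [ -r "$dir/$idx" ]; do
        value=$(cat "$dir/$idx")
        case $value in
            0x*)
                if [ $(( $value )) -eq "$want" ]; then
                    echo "$idx"
                    return 0
                fi
                ;;
        esac
        idx=$(( idx + 1 ))
    done
    return 1
}
```

On a node with an HCA this could be called as `find_pkey_index /sys/class/infiniband/mlx4_0/ports/1/pkeys 0x8001`; a non-zero exit status would indicate that the port does not have that pkey configured.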

I hope it helps.

Regards!

Dmitry

Hello Dmitry.

I just now got to check it out.

I am afraid that those environment variables are not the ones to use.

Running Intel MPI with DAPL_IB_PKEY=0x8001 while the port does not have this pkey configured still runs the job:

[root@dodly0 tmp]# mpiexec -ppn 1 -n 2 -env DAPL_IB_PKEY 0x8001 -env I_MPI_DEBUG 2 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none /tmp/osu
dodly4:15817: open_hca: device mthca0 not found
dodly4:15817: open_hca: device mthca0 not found
[0] MPI startup(): DAPL provider OpenIB-mthca0-1
[0] MPI startup(): dapl data transfer mode
[1] MPI startup(): DAPL provider OpenIB-mlx4_0-1
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): static connections storm algo
# OSU MPI Bandwidth Test v3.1.1
# Size      Bandwidth (MB/s)
1           0.44
2           0.87
4           1.74
8           3.48
16          6.96
32          13.86
64          27.41
128         54.12
256         104.99
512         199.89
1024        356.42
2048        561.62
4096        672.56
8192        737.60
16384       770.65
32768       677.97
65536       795.04
131072      874.64
262144      915.74
524288      940.58
1048576     955.51
2097152     963.36
4194304     967.37

I checked the DAPL code and didn't find any environment variable related to the InfiniBand service level or pkey:

> [root@dodly0 dapl-2.0.25]# pwd
> /usr/src/debug/dapl-2.0.25
> [root@dodly0 dapl-2.0.25]# rpm -qf /usr/src/debug/dapl-2.0.25
> dapl-debuginfo-2.0.25-1
> [root@dodly0 dapl-2.0.25]# grep -rn dapl_os_get_env_val . | grep DAPL_IB
> ./dapl/openib_scm/device.c:354: dapl_ib_mtu(dapl_os_get_env_val("DAPL_IB_MTU", SCM_IB_MTU));
> ./dapl/openib_ucm/device.c:272: dapl_ib_mtu(dapl_os_get_env_val("DAPL_IB_MTU", DCM_IB_MTU));
> [root@dodly0 dapl-2.0.25]# grep -rn dapl_os_get_env_val .
> ./dapl/openib_cma/device.c:403: dapl_os_get_env_val("DAPL_MAX_INLINE",
> ./dapl/openib_cma/device.c:407: dapl_os_get_env_val("DAPL_MAX_INLINE",
> ./dapl/openib_cma/device.c:412: dapl_os_get_env_val("DAPL_MAX_CM_RESPONSE_TIME",
> ./dapl/openib_cma/device.c:415: dapl_os_get_env_val("DAPL_MAX_CM_RETRIES", IB_CM_RETRIES);
> ./dapl/openib_cma/cm.c:176: conn->arp_timeout = dapl_os_get_env_val("DAPL_CM_ARP_TIMEOUT_MS",
> ./dapl/openib_cma/cm.c:178: conn->arp_retries = dapl_os_get_env_val("DAPL_CM_ARP_RETRY_COUNT",
> ./dapl/openib_cma/cm.c:180: conn->route_timeout = dapl_os_get_env_val("DAPL_CM_ROUTE_TIMEOUT_MS",
> ./dapl/openib_cma/cm.c:182: conn->route_retries = dapl_os_get_env_val("DAPL_CM_ROUTE_RETRY_COUNT",
> ./dapl/openib_scm/device.c:338: dapl_os_get_env_val("DAPL_MAX_INLINE", INLINE_SEND_DEFAULT);
> ./dapl/openib_scm/device.c:340: dapl_os_get_env_val("DAPL_ACK_RETRY", SCM_ACK_RETRY);
> ./dapl/openib_scm/device.c:342: dapl_os_get_env_val("DAPL_ACK_TIMER", SCM_ACK_TIMER);
> ./dapl/openib_scm/device.c:344: dapl_os_get_env_val("DAPL_RNR_RETRY", SCM_RNR_RETRY);
> ./dapl/openib_scm/device.c:346: dapl_os_get_env_val("DAPL_RNR_TIMER", SCM_RNR_TIMER);
> ./dapl/openib_scm/device.c:348: dapl_os_get_env_val("DAPL_GLOBAL_ROUTING", SCM_GLOBAL);
> ./dapl/openib_scm/device.c:350: dapl_os_get_env_val("DAPL_HOP_LIMIT", SCM_HOP_LIMIT);
> ./dapl/openib_scm/device.c:352: dapl_os_get_env_val("DAPL_TCLASS", SCM_TCLASS);
> ./dapl/openib_scm/device.c:354: dapl_ib_mtu(dapl_os_get_env_val("DAPL_IB_MTU", SCM_IB_MTU));
> ./dapl/udapl/dapl_init.c:73: g_dapl_dbg_type = dapl_os_get_env_val("DAPL_DBG_TYPE",
> ./dapl/udapl/dapl_init.c:76: g_dapl_dbg_dest = dapl_os_get_env_val("DAPL_DBG_DEST",
> ./dapl/udapl/linux/dapl_osd.c:140: * dapl_os_get_env_val
> ./dapl/udapl/linux/dapl_osd.c:151:int dapl_os_get_env_val(char *env_str, int def_val)
> ./dapl/udapl/linux/dapl_osd.h:152:int dapl_os_get_env_val (
> ./dapl/openib_ucm/device.c:256: dapl_os_get_env_val("DAPL_MAX_INLINE", INLINE_SEND_IB_DEFAULT);
> ./dapl/openib_ucm/device.c:258: dapl_os_get_env_val("DAPL_ACK_RETRY", DCM_ACK_RETRY);
> ./dapl/openib_ucm/device.c:260: dapl_os_get_env_val("DAPL_ACK_TIMER", DCM_ACK_TIMER);
> ./dapl/openib_ucm/device.c:262: dapl_os_get_env_val("DAPL_RNR_RETRY", DCM_RNR_RETRY);
> ./dapl/openib_ucm/device.c:264: dapl_os_get_env_val("DAPL_RNR_TIMER", DCM_RNR_TIMER);
> ./dapl/openib_ucm/device.c:266: dapl_os_get_env_val("DAPL_GLOBAL_ROUTING", DCM_GLOBAL);
> ./dapl/openib_ucm/device.c:268: dapl_os_get_env_val("DAPL_HOP_LIMIT", DCM_HOP_LIMIT);
> ./dapl/openib_ucm/device.c:270: dapl_os_get_env_val("DAPL_TCLASS", DCM_TCLASS);
> ./dapl/openib_ucm/device.c:272: dapl_ib_mtu(dapl_os_get_env_val("DAPL_IB_MTU", DCM_IB_MTU));
> ./dapl/openib_ucm/device.c:472: tp->retries = dapl_os_get_env_val("DAPL_UCM_RETRY", DCM_RETRY_CNT);
> ./dapl/openib_ucm/device.c:473: tp->rep_time = dapl_os_get_env_val("DAPL_UCM_REP_TIME", DCM_REP_TIME);
> ./dapl/openib_ucm/device.c:474: tp->rtu_time = dapl_os_get_env_val("DAPL_UCM_RTU_TIME", DCM_RTU_TIME);
> ./dapl/openib_ucm/device.c:476: tp->qpe = dapl_os_get_env_val("DAPL_UCM_QP_SIZE", DCM_QP_SIZE);
> ./dapl/openib_ucm/device.c:477: tp->cqe = dapl_os_get_env_val("DAPL_UCM_CQ_SIZE", DCM_CQ_SIZE);

Thanks for your assistance.

Itay.
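The grep survey in the post above can be condensed into a list of distinct variable names, which makes it easier to see at a glance whether a given DAPL release reads a pkey or service-level variable. This is only a sketch: the helper name `list_dapl_env_vars` is hypothetical, and it assumes the variable name is a string literal on the same line as the `dapl_os_get_env_val(` call, as it is in the output shown.

```bash
# list_dapl_env_vars SRC_DIR   (hypothetical helper, for illustration only)
# Prints each environment variable name passed as a string literal to
# dapl_os_get_env_val() anywhere under SRC_DIR, one per line, de-duplicated.
list_dapl_env_vars() {
    grep -rhoE 'dapl_os_get_env_val\("[A-Z0-9_]+"' "$1" \
        | sed -E 's/.*\("([A-Z0-9_]+)"/\1/' \
        | sort -u
}
```

Running `list_dapl_env_vars /usr/src/debug/dapl-2.0.25 | grep -E 'PKEY|SL'` in the source tree from the post would be expected to print nothing, consistent with the finding that this release has no such variable.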

Hi Itay,

I've got some clarification about these variables: they will be supported in an upcoming release of the DAPL library:

This new feature is targeted for the dapl-2.0.29 package and the OFED 1.5.2 release coming in June. You will be able to override the defaults (0 for all) with the environment variables DAPL_IB_SL, DAPL_IB_PKEY, and DAPL_IB_PKEY_INDEX. If you provide PKEY, uDAPL will locate the correct index. If you provide INDEX, uDAPL will use the pkey at that index.

Please download either dapl-2.0.29 or ofed-1.5.2 when they are available at http://openfabrics.org/download_linux.htm

I hope that it will help you to resolve your issue.

Regards!
Dmitry

Thanks.

I will follow the releases and check it out.

I will post my results.

Itay.

Hi Itay,

Official dapl-2.0.28 is available!
Please try it out.

Regards!
Dmitry

Hi Itay,

DAPL 2.0.29 is available at http://www.openfabrics.org/downloads/dapl/
You will be able to override the defaults (0 for all) with the environment variables DAPL_IB_SL, DAPL_IB_PKEY, and DAPL_IB_PKEY_INDEX. If you provide PKEY, uDAPL will locate the correct index. If you provide INDEX, uDAPL will use the pkey at that index.
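
As an illustration of how these variables might be passed to a job, here is a small dry-run sketch. It only prints an mpiexec command line in the same `-env NAME VALUE` form used elsewhere in this thread; it does not launch anything. The helper name `dapl_qos_cmd` and the fixed `-ppn 1 -n 2` layout are assumptions for the example, and the 0..15 check reflects the 4-bit width of the InfiniBand service level field.

```bash
# dapl_qos_cmd SL PKEY PROGRAM   (hypothetical helper, for illustration only)
# Prints an mpiexec command line that would pass DAPL_IB_SL and
# DAPL_IB_PKEY to a two-rank job. Nothing is executed.
dapl_qos_cmd() {
    local sl=$1 pkey=$2 prog=$3
    # The service level must be a plain decimal number in 0..15.
    case $sl in
        ''|*[!0-9]*)
            echo "service level must be a number, got '$sl'" >&2
            return 1
            ;;
    esac
    if [ "$sl" -gt 15 ]; then
        echo "service level must be 0..15, got $sl" >&2
        return 1
    fi
    printf 'mpiexec -ppn 1 -n 2 -env DAPL_IB_SL %s -env DAPL_IB_PKEY %s %s\n' \
        "$sl" "$pkey" "$prog"
}
```

For example, `dapl_qos_cmd 1 0x8001 /tmp/osu` prints a command line that can be inspected before being run by hand.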

Best wishes,
Dmitry

Hello Dmitry.

The new DAPL didn't work:

[root@dodly0 dapl-2.0.29]# mpiexec -ppn 1 -n 2 -env I_MPI_FABRICS dapl:dapl -env I_MPI_DEBUG 2 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none -env DAPL_IB_PKEY 0x8001 -env DAPL_DBG_TYPE 0xffff /tmp/osu
dodly0:22305: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly4:1298: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly4:1298: open_hca: device mthca0 not found
dodly4:1298: open_hca: device mthca0 not found
[0] MPI startup(): DAPL provider OpenIB-mthca0-1
[0] MPI startup(): dapl data transfer mode
[1] MPI startup(): DAPL provider OpenIB-mlx4_0-1
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): static connections storm algo
# OSU MPI Bandwidth Test v3.1.1
# Size      Bandwidth (MB/s)
1           0.42
2           0.85
4           1.70
8           3.36
16          6.74
32          13.45
64          26.60
128         52.51
256         101.99
512         195.37
1024        346.71
2048        563.39
4096        681.00
8192        739.74
16384       765.02
32768       676.85
65536       795.06
131072      874.41
262144      916.78
524288      940.26
1048576     955.43
2097152     963.34
4194304     967.38

Again, PKEY 0x8001 is not configured on the port, so the packets were supposed to fail. Instead, I guess DAPL is running over the default PKEY.

I opened a new thread about DAPL debugging under Intel MPI in order to get more information out of DAPL. I would be glad if you could have a look.

Itay.

Hi Itay,

I've got some clarification from Arlin!
The changes are for the v2.0 scm/ucm providers (ofa-v2-*) only, not v1.2 (OpenIB-*). For ConnectX (mlx4) you should use ofa-v2-mlx4_0-1 (scm) or ofa-v2-mlx4_0-1u (ucm).

Set DAPL_DBG_TYPE=0x21 to see the util messages (query_hca) that show the SL and PKEY values.

You should see the following message among others:

cstnh-9:4839: query_hca:(b0.0) eps 260032, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 0 p_idx 0 sl 1
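
To pick those values out of a noisy debug log, a small filter such as the following could be used. It is a sketch that assumes the line format is exactly as in the sample message above (fields named `pkey`, `p_idx` and `sl` at the end of a `query_hca:` line); the function name `parse_query_hca` is hypothetical.

```bash
# parse_query_hca   (hypothetical helper, for illustration only)
# Reads DAPL debug output on stdin and, for every query_hca line,
# prints the pkey, pkey index and service level it reports.
parse_query_hca() {
    sed -n -E 's/.*query_hca:.* pkey ([^ ]+) p_idx ([^ ]+) sl ([^ ]+).*/pkey=\1 p_idx=\2 sl=\3/p'
}
```

The output of a job run with DAPL_DBG_TYPE=0x21 could then be piped through it, for example by appending `2>&1 | parse_query_hca` to the mpiexec command line.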

I hope this helps!

Regards!
Dmitry

OK, it finally works.

However, according to Arlin, the pkey returned by the query is in network byte order, while the value of the consumer variable is assumed to be in host byte order.

For example, pkey 0x8002 needs to be set as DAPL_IB_PKEY=0x0280.

Arlin issued a patch to fix this.
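
For anyone stuck on a build without that patch, the workaround above amounts to exchanging the two bytes of the 16-bit pkey before exporting it. A minimal sketch in shell arithmetic follows; the function name `swap16` is hypothetical, and the swap should only be applied on builds that actually show the byte-order problem.

```bash
# swap16 VALUE   (hypothetical helper, for illustration only)
# Prints VALUE with its high and low bytes exchanged, formatted as 0xNNNN.
swap16() {
    local v=$(( $1 & 0xffff ))
    printf '0x%04x\n' $(( ((v & 0xff) << 8) | (v >> 8) ))
}
```

With this, `swap16 0x8002` prints `0x0280`, matching the value quoted in the post.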

Dmitry, thanks for your help.

Itay.
