Enabling IP over InfiniBand* on the Intel® Xeon Phi™ Coprocessor

Introduction

InfiniBand (IB) networking offers high throughput and low latency. To use IB natively, network applications must be written to the IB verbs API. Traditional IP network applications cannot run on an IB network directly, so a layer is needed that encapsulates IP packets and transmits them over InfiniBand (IPoIB). With that layer in place, existing IP applications run without code changes. Note that because IPoIB emulates the IP layer on top of IB, IP applications running over it perform worse than applications written to use InfiniBand natively.

This document describes how to configure the IPoIB layer on IB systems equipped with Intel® Xeon Phi™ coprocessors. You need the Intel® Manycore Platform Software Stack (Intel® MPSS) to work with the coprocessors, and the OpenFabrics Enterprise Distribution* (OFED*) software to configure IPoIB.

The IPoIB driver in the OFED stack allows TCP/IP applications to run over the IB network. This driver implements the IP over IB protocol (RFC 4391, RFC 4392, and RFC 4755). The IPoIB driver supports two operation modes: Unreliable Datagram (UD) mode and Reliable Connected (RC) mode, referred to below as datagram and connected mode. UD mode matches the IP protocol, which is itself an unreliable datagram service. In RC mode, a connection must be established between the two endpoints before transmission can start.

Preparation

The following tests were run on two systems, each equipped with the Intel® Xeon® processor E5-2670 (2.6 GHz) and two Intel Xeon Phi 7120 coprocessors. Both systems ran Red Hat Enterprise Linux* 6.6 64-bit (kernel 2.6.32-504). On each system, a Mellanox ConnectX*-3 VPI IB adapter was installed in a PCIe* slot. The ports of the two adapters were connected directly, with no intervening switch.

After installing the adapter and rebooting the system, verify that the Mellanox ConnectX-3 VPI IB adapter is properly identified:

# lspci | grep Mellanox
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

On both systems (named knightscorner4 and knightscorner5), I downloaded and installed Intel MPSS 3.6.1 from the Intel MPSS download page and the OFED stack OFED-3.18-1 from the OpenFabrics Alliance.

Instructions for installing the OFED stack are included in the “Intel® MPSS User’s Guide,” Section 3.6. After the OFED stack is installed successfully, you can check the installed OFED version:

# ofed_info -s
OFED-3.18-1:
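
Both preflight checks can be scripted. The sketch below assumes bash; has_mellanox_hca and ofed_version are hypothetical helper names (not part of MPSS or OFED), and each one reads command output on standard input so it can be tried without the hardware.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of MPSS or OFED).
# Each one reads the output of a command on stdin.

# Succeed if an lspci listing contains a Mellanox device.
has_mellanox_hca() {
  grep -q 'Mellanox Technologies'
}

# Print the OFED version from "ofed_info -s" output, e.g. OFED-3.18-1: -> 3.18-1
ofed_version() {
  sed -n 's/^OFED-\([^:]*\).*/\1/p' | head -n 1
}
```

On a host, lspci | has_mellanox_hca || echo "no Mellanox adapter found" flags a missing adapter, and ofed_info -s | ofed_version prints 3.18-1 for the output shown above.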

The following steps (see the “Intel® MPSS User’s Guide,” Section 3.6.9) are necessary to bring OFED up.

As root, bring the MPSS service up:

# service mpss start
Loading MIC module:                                        [  OK  ]

Configuring IPoIB on the Intel® Xeon® Processor Host and Intel Xeon Phi Coprocessor

This section shows which configuration files need to be modified to enable the IPoIB interface.

In this example, I create four IPoIB nodes on the subnet 192.168.100.0/24 (netmask 255.255.255.0). The host knightscorner4 and its coprocessor knightscorner4-mic0 are assigned the IP addresses 192.168.100.1 and 192.168.100.100, respectively. Similarly, the host knightscorner5 and its coprocessor knightscorner5-mic0 are assigned the IP addresses 192.168.100.2 and 192.168.100.200, respectively.
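
The addressing plan can be written down once and checked before any file is edited. This is a minimal bash 4 sketch; the array IPOIB_ADDR and the functions ip_to_int, in_subnet, and check_plan are illustrative names, not part of MPSS or OFED. in_subnet masks both addresses with the prefix length, so 192.168.100.200 masked with 255.255.255.0 gives 192.168.100.0, the same network as the subnet itself.

```bash
#!/usr/bin/env bash
# Addressing plan used in this article (requires bash 4+ for declare -A).
declare -A IPOIB_ADDR=(
  [knightscorner4]=192.168.100.1
  [knightscorner4-mic0]=192.168.100.100
  [knightscorner5]=192.168.100.2
  [knightscorner5-mic0]=192.168.100.200
)

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if address $1 is inside network $2 with prefix length $3.
in_subnet() {
  local addr net mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  (( (addr & mask) == (net & mask) ))
}

# Report every node whose address falls outside 192.168.100.0/24.
check_plan() {
  local node rc=0
  for node in "${!IPOIB_ADDR[@]}"; do
    if ! in_subnet "${IPOIB_ADDR[$node]}" 192.168.100.0 24; then
      echo "$node (${IPOIB_ADDR[$node]}) is outside 192.168.100.0/24" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_plan prints nothing and returns 0 when all four addresses share the subnet.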

On each host, you configure the IPoIB interface with a standard network configuration file.

To configure the device ib0 on knightscorner4, edit the host configuration file /etc/sysconfig/network-scripts/ifcfg-ib0. The following configuration assigns the IP address 192.168.100.1 to the ib0 device on the host:

[knightscorner4 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
TYPE=InfiniBand
UUID=296da2e8-6193-4ec2-9122-4a7ca1f3fcc0
ONBOOT=yes
BOOTPROTO=none
NETWORK=192.168.100.0
NETMASK=255.255.255.0
IPADDR=192.168.100.1
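
If you set up more than one host, the ifcfg-ib0 file can be generated rather than typed. The sketch below is a convenience under stated assumptions, not an MPSS or OFED tool: make_ifcfg is a hypothetical function, and it omits the UUID and NETWORK lines on the assumption that your network scripts derive or ignore them. Keep those lines if your distribution's files rely on them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an ifcfg file for an IPoIB interface.
# Usage: make_ifcfg <device> <ipaddr> [netmask]   (netmask defaults to /24)
# UUID= and NETWORK= are deliberately omitted (assumption: not required).
make_ifcfg() {
  local dev=${1:?device required} addr=${2:?address required}
  local mask=${3:-255.255.255.0}
  # BOOTPROTO=none means a static address, no DHCP.
  printf '%s\n' \
    "DEVICE=$dev" \
    "TYPE=InfiniBand" \
    "ONBOOT=yes" \
    "BOOTPROTO=none" \
    "NETMASK=$mask" \
    "IPADDR=$addr"
}
```

For example, as root on knightscorner4: make_ifcfg ib0 192.168.100.1 > /etc/sysconfig/network-scripts/ifcfg-ib0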

On host knightscorner4, edit the /etc/mpss/ipoib.conf coprocessor configuration file to configure the interface ib0 on mic0. The ib0 interface on knightscorner4-mic0 is assigned the IP address 192.168.100.100:

[knightscorner4 ~]# cat /etc/mpss/ipoib.conf
# to start ipoib on the mic automatically, uncomment the following
#
ipoib_enabled=yes
#
# to assign ip addresses to ib devices on the mic, specify the ip address
# using the following example for setting ib0 on mic0 address
#
mic0_ib0=192.168.100.100
#
# if netmask needs to tbe set or other ifconfig option, add them to
# the ip address (quoted)
#
# mic0_ib1="192.168.100.101 netmask 255.255.0.0"
#
# to pass options to ib_ipoib module on the mic, use the following line
#
# ipoib_parms="send_queue_size=2048 recv_queue_size=4096"
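
Only two settings in this file are active: ipoib_enabled=yes and the mic0_ib0 address. The hypothetical helper below (not an MPSS tool) prints just those lines, switching to the quoted address-plus-options form described in the file's comments when a netmask is given. When no netmask is given, as in the listing above, ifconfig typically falls back to the classful default, which for 192.168.100.x is 255.255.255.0; that agrees with the /24 route on the coprocessor shown later.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the active ipoib.conf settings for one
# coprocessor IPoIB interface.
# Usage: make_ipoib_conf <mic> <ib-device> <ipaddr> [netmask]
make_ipoib_conf() {
  local mic=$1 dev=$2 addr=$3 mask=${4:-}
  echo 'ipoib_enabled=yes'
  if [[ -n $mask ]]; then
    # Extra ifconfig options are quoted together with the address.
    printf '%s_%s="%s netmask %s"\n' "$mic" "$dev" "$addr" "$mask"
  else
    printf '%s_%s=%s\n' "$mic" "$dev" "$addr"
  fi
}
```

Because the real file also carries explanatory comments, review the output of make_ipoib_conf mic0 ib0 192.168.100.100 and merge it by hand rather than overwriting /etc/mpss/ipoib.conf.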

Similarly, to configure the device ib0 on host knightscorner5, edit its /etc/sysconfig/network-scripts/ifcfg-ib0 host configuration file. The following configuration assigns the IP address 192.168.100.2 to device ib0:

[knightscorner5 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
TYPE=InfiniBand
UUID=a371c52b-fa6c-4666-b060-ec04ceaa2382
ONBOOT=yes
BOOTPROTO=none
NETWORK=192.168.100.0
NETMASK=255.255.255.0
IPADDR=192.168.100.2

On host knightscorner5, edit the file /etc/mpss/ipoib.conf to configure the interface ib0 on mic0. The ib0 interface on knightscorner5-mic0 is assigned the IP address 192.168.100.200:

[knightscorner5 ~]# cat /etc/mpss/ipoib.conf
# to start ipoib on the mic automatically, uncomment the following
#
ipoib_enabled=yes
#
# to assign ip addresses to ib devices on the mic, specify the ip address
# using the following example for setting ib0 on mic0 address
#
mic0_ib0=192.168.100.200
#
# if netmask needs to tbe set or other ifconfig option, add them to
# the ip address (quoted)
#
# mic0_ib1="192.168.100.101 netmask 255.255.0.0"
#
# to pass options to ib_ipoib module on the mic, use the following line
#
# ipoib_parms="send_queue_size=2048 recv_queue_size=4096"
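
Because the two nodes are configured in separate files, a copy-and-paste slip can give two interfaces the same address. Assuming you have copied the ifcfg-ib0 and ipoib.conf files from both hosts into one place, the hypothetical function below lists any address assigned more than once. It looks only at IPADDR= lines and uncommented micN_ibM= lines, and prints nothing when the plan is consistent.

```bash
#!/usr/bin/env bash
# Hypothetical check: print every IPoIB address that is assigned more than
# once across the given ifcfg-* and ipoib.conf files. Commented-out lines
# are ignored because the pattern is anchored at the start of the line.
dup_ipoib_addrs() {
  # Matches IPADDR=<ip>, micN_ibM=<ip>, and micN_ibM="<ip> <options>".
  sed -nE 's/^(IPADDR|mic[0-9]+_ib[0-9]+)="?([0-9.]+).*/\2/p' "$@" |
    sort | uniq -d
}
```

For example, dup_ipoib_addrs kc4/ifcfg-ib0 kc4/ipoib.conf kc5/ifcfg-ib0 kc5/ipoib.conf, where kc4 and kc5 are hypothetical directories holding the copied files.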

Bringing the OFED Stack Up

After changing the configuration files to enable the IPoIB protocol, you can bring the OFED stack up on both knightscorner4 and knightscorner5.

1. Start the OFED stack on both host systems:

# service openibd start
Loading HCA driver and Access Layer:                       [  OK  ]

You can verify that the IPoIB driver is now loaded:

[knightscorner5 ~]# lsmod | grep ib_ipoib
ib_ipoib               80814  0
ib_cm                  36932  3 rdma_cm,ib_ipoib
. . . . . . . . .
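
In a script, the same check is easier to use as a pass/fail test. The sketch below (module_loaded is a hypothetical name) compares the module name against the first (Module) column of lsmod output instead of grepping whole lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if module $1 is listed in lsmod output on stdin.
# Only the first (Module) column is compared, so a name that appears solely
# in the "Used by" column of another module does not count.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
```

Usage: lsmod | module_loaded ib_ipoib && echo "IPoIB driver loaded"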

2. Start the IB subnet manager to configure the fabric:

# service opensmd start
Starting IB Subnet Manager.                                [  OK  ]
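
Because the two adapters are connected back-to-back with no switch, a subnet manager must run on one of the hosts before the IB ports become usable: until then a port typically stays in the INIT state, and once the subnet manager has configured it, the state changes to ACTIVE. The sketch below reads the port state from sysfs. The default path assumes the ConnectX-3 adapter is registered as mlx4_0 and that port 1 is cabled; adjust both for your system. ib_port_active is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if an IB port's sysfs state file reads ACTIVE
# (the file holds "<number>: <STATE>", for example "4: ACTIVE").
# The default path assumes HCA mlx4_0, port 1.
ib_port_active() {
  local f=${1:-/sys/class/infiniband/mlx4_0/ports/1/state}
  [[ -r $f ]] && grep -qE '^[0-9]+: ACTIVE$' "$f"
}
```

Usage: ib_port_active || echo "IB port not active yet; is opensmd running?"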

At this point, the ib0 interface has been created. To check the operation mode of ib0 on your system, type the following command:

# cat /sys/class/net/ib0/mode
connected

You can change the mode to datagram by typing:

# echo datagram > /sys/class/net/ib0/mode

Or switch to connected mode by typing:

# echo connected > /sys/class/net/ib0/mode

The default mode is set by the SET_IPOIB_CM parameter in /etc/infiniband/openib.conf: setting SET_IPOIB_CM=yes makes connected mode the default.
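
The two echo commands can be wrapped so that an invalid mode is rejected and the result is read back. This is a sketch, not an OFED tool: set_ipoib_mode is a hypothetical name, and the SYSFS_NET variable exists only so the function can be exercised against a scratch directory; it defaults to /sys/class/net. A mode set this way does not persist across reboots; the boot-time default still comes from SET_IPOIB_CM.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set the IPoIB mode of interface $1 to $2 and verify it.
# SYSFS_NET defaults to /sys/class/net; override it only for testing.
set_ipoib_mode() {
  local dev=$1 mode=$2 f
  case $mode in
    datagram|connected) ;;
    *) echo "mode must be datagram or connected, not '$mode'" >&2; return 2 ;;
  esac
  f=${SYSFS_NET:-/sys/class/net}/$dev/mode
  echo "$mode" > "$f" || return 1
  # Read the mode back to confirm the driver accepted it.
  [[ $(< "$f") == "$mode" ]]
}
```

Usage, as root: set_ipoib_mode ib0 datagram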

To check whether the network interface ib0 on host knightscorner5 is available:

[knightscorner5 ~]# ifconfig ib0
Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
    inet addr:192.168.100.2  Bcast:192.168.100.255 Mask:255.255.255.0
    inet6 addr: fe80::f652:1403:7d:2b91/64 Scope:Link
    UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
    RX packets:56 errors:0 dropped:0 overruns:0 frame:0
    TX packets:59 errors:0 dropped:10 overruns:0 carrier:0
    collisions:0 txqueuelen:256
    RX bytes:3692 (3.6 KiB)  TX bytes:5008 (4.8 KiB)

Because ifconfig displays only the first 8 bytes of the 20-byte IPoIB hardware address correctly, use the ip command to show the complete address:

[knightscorner5 ~]# ip addr show ib0
11: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7d:2b:91 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.100.2/24 brd 192.168.100.255 scope global ib0
    inet6 fe80::f652:1403:7d:2b91/64 scope link
       valid_lft forever preferred_lft forever
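
The 20-byte address that ip prints follows the IPoIB link-layer format of RFC 4391: one flags/reserved byte, a 3-byte queue pair number (QPN), and the 16-byte port GID, whose last 8 bytes are the port GUID. The IPv6 link-local address on ib0 is derived from that GUID with the universal/local bit (0x02 of the first byte) inverted, which is why GUID f4:52:14:03:00:7d:2b:91 appears as fe80::f652:1403:7d:2b91 in both listings above. The bash sketch below makes that arithmetic explicit; the function names are hypothetical, and the address is passed in as a string.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the 20-byte IPoIB hardware address shown by
# "ip addr show ib0" (20 colon-separated hex bytes).

# Split address $1 into the global array HW_BYTES; fail unless 20 bytes.
split_ipoib_hw() {
  IFS=: read -r -a HW_BYTES <<< "$1"
  if (( ${#HW_BYTES[@]} != 20 )); then
    echo "expected 20 bytes, got ${#HW_BYTES[@]}" >&2
    return 1
  fi
}

# Queue pair number: bytes 1-3.
ipoib_qpn() {
  split_ipoib_hw "$1" || return 1
  printf '0x%s%s%s\n' "${HW_BYTES[@]:1:3}"
}

# Port GUID: the last 8 bytes (bytes 12-19) of the 16-byte GID.
ipoib_guid() {
  split_ipoib_hw "$1" || return 1
  printf '%s%s:%s%s:%s%s:%s%s\n' "${HW_BYTES[@]:12:8}"
}

# IPv6 link-local address: fe80:: followed by the GUID with bit 0x02 of its
# first byte inverted. Groups are printed without leading zeros.
ipoib_link_local() {
  split_ipoib_hw "$1" || return 1
  local -a b=("${HW_BYTES[@]:12:8}")
  printf 'fe80::%x:%x:%x:%x\n' \
    $(( (16#${b[0]} ^ 0x02) << 8 | 16#${b[1]} )) \
    $(( 16#${b[2]} << 8 | 16#${b[3]} )) \
    $(( 16#${b[4]} << 8 | 16#${b[5]} )) \
    $(( 16#${b[6]} << 8 | 16#${b[7]} ))
}
```

On the host, the address can be read from sysfs, for example: ipoib_link_local "$(cat /sys/class/net/ib0/address)"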

3. Start the ofed-mic service. This also loads the ib_ipoib driver on the Intel Xeon Phi coprocessor:

# service ofed-mic start
Starting OFED Stack:
host                                                       [  OK  ]
mic0 : ib0                                                 [  OK  ]
mic1                                                       [  OK  ]

Verify that the 192.168.100.0/24 subnet is configured and routed through the ib0 interface on the host:

[knightscorner5 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   *               255.255.255.0   U     0      0        0 ib0
10.23.3.0       *               255.255.255.0   U     0      0        0 eth0
172.31.1.0      *               255.255.255.0   U     0      0        0 mic0
192.0.2.0       *               255.255.255.0   U     0      0        0 mic0
172.31.2.0      *               255.255.255.0   U     0      0        0 mic1
link-local      *               255.255.0.0     U     1002   0        0 eth0
link-local      *               255.255.0.0     U     1004   0        0 mic0
link-local      *               255.255.0.0     U     1005   0        0 mic1
link-local      *               255.255.0.0     U     1010   0        0 ib0
default         jf311-lfw-a_vl5 0.0.0.0         UG    0      0        0 eth0

Verify that the 192.168.100.0/24 subnet is configured and routed through the ib0 interface on the coprocessor:

[knightscorner5 ~]# ssh mic0 route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   *               255.255.255.0   U     0      0        0 ib0
172.31.1.0      *               255.255.255.0   U     0      0        0 mic0
192.0.2.0       *               255.255.255.0   U     0      0        0 mic0
default         host            0.0.0.0         UG    0      0        0 mic0
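
The same check can be run against the host and the coprocessor from one place. The hypothetical function below reads routing-table output on standard input and succeeds only if the given destination network is routed through the given interface; it assumes the numeric output of route -n, whose last column is Iface.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if "route -n" output on stdin contains a route
# to destination network $1 through interface $2. Columns are:
# Destination Gateway Genmask Flags Metric Ref Use Iface (Iface is last).
routed_via() {
  awk -v net="$1" -v dev="$2" \
    '$1 == net && $NF == dev { found = 1 } END { exit !found }'
}
```

Usage: route -n | routed_via 192.168.100.0 ib0 && ssh mic0 route -n | routed_via 192.168.100.0 ib0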

If the ib0 interface is missing its address or route, bring it down and back up:

# ifconfig ib0 down
# ifconfig ib0 up
# ifconfig ib0
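
Since ifconfig reports itself as obsolete, the same down/up sequence can be done with ip. The sketch below (bounce_iface is a hypothetical name) stops at the first failing step and finishes by showing the interface's addresses. On Red Hat Enterprise Linux 6, running ifdown ib0 followed by ifup ib0 is an alternative that also re-reads /etc/sysconfig/network-scripts/ifcfg-ib0.

```bash
#!/usr/bin/env bash
# Sketch: bounce interface $1 with ip(8) instead of ifconfig, then show its
# addresses. Each step runs only if the previous one succeeded.
bounce_iface() {
  local dev=${1:?usage: bounce_iface <device>}
  ip link set dev "$dev" down &&
    ip link set dev "$dev" up &&
    ip addr show dev "$dev"
}
```

Usage, as root: bounce_iface ib0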

Verification

To verify that IPoIB is working, use the ping utility to reach every IPoIB address. For example, from host knightscorner5, you can ping knightscorner4 (192.168.100.1), knightscorner4-mic0 (192.168.100.100), and knightscorner5-mic0 (192.168.100.200):

[knightscorner5 ~]# ping -c 3 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=0.142 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=0.176 ms
64 bytes from 192.168.100.1: icmp_seq=3 ttl=64 time=0.178 ms

--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.142/0.165/0.178/0.019 ms

[knightscorner5 ~]# ping -c 3 192.168.100.100
PING 192.168.100.100 (192.168.100.100) 56(84) bytes of data.
64 bytes from 192.168.100.100: icmp_seq=1 ttl=64 time=14.0 ms
64 bytes from 192.168.100.100: icmp_seq=2 ttl=64 time=2.24 ms
64 bytes from 192.168.100.100: icmp_seq=3 ttl=64 time=0.943 ms

--- 192.168.100.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.943/5.761/14.094/5.916 ms

[knightscorner5 ~]# ping -c 3 192.168.100.200
PING 192.168.100.200 (192.168.100.200) 56(84) bytes of data.
64 bytes from 192.168.100.200: icmp_seq=1 ttl=64 time=18.9 ms
64 bytes from 192.168.100.200: icmp_seq=2 ttl=64 time=7.56 ms
64 bytes from 192.168.100.200: icmp_seq=3 ttl=64 time=5.89 ms

--- 192.168.100.200 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2008ms
rtt min/avg/max/mdev = 5.896/10.817/18.994/5.822 ms
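
When many addresses are checked, reading each summary by eye gets tedious. The sketch below (packet_loss is a hypothetical name) extracts the loss percentage from a ping summary line such as the ones above; 0 means every probe was answered.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the packet-loss percentage from ping output on
# stdin, e.g. "3 packets transmitted, 3 received, 0% packet loss" -> 0.
# Prints nothing if no summary line is found.
packet_loss() {
  sed -nE 's/.* ([0-9.]+)% packet loss.*/\1/p'
}
```

Usage: ping -c 3 192.168.100.100 | packet_loss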

Similarly, from coprocessor knightscorner5-mic0, you can ping knightscorner4 (192.168.100.1), knightscorner4-mic0 (192.168.100.100), and knightscorner5 (192.168.100.2):

[knightscorner5 ~]# ssh knightscorner5-mic0
[knightscorner5-mic0 ~]# ping -c 3 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=4.89 ms
64 bytes from 192.168.100.1: icmp_req=2 ttl=64 time=9.99 ms
64 bytes from 192.168.100.1: icmp_req=3 ttl=64 time=9.77 ms

--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 4.899/8.220/9.991/2.352 ms

[knightscorner5-mic0 ~]# ping -c 3 192.168.100.100
PING 192.168.100.100 (192.168.100.100) 56(84) bytes of data.
64 bytes from 192.168.100.100: icmp_req=1 ttl=64 time=14.8 ms
64 bytes from 192.168.100.100: icmp_req=2 ttl=64 time=9.99 ms
64 bytes from 192.168.100.100: icmp_req=3 ttl=64 time=9.67 ms

--- 192.168.100.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 9.676/11.496/14.817/2.351 ms

[knightscorner5-mic0 ~]# ping -c 3 192.168.100.2
PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
64 bytes from 192.168.100.2: icmp_req=1 ttl=64 time=5.11 ms
64 bytes from 192.168.100.2: icmp_req=2 ttl=64 time=9.98 ms
64 bytes from 192.168.100.2: icmp_req=3 ttl=64 time=9.66 ms

--- 192.168.100.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 5.115/8.256/9.989/2.224 ms
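
To repeat the whole check from any node, host or coprocessor, loop over the four addresses. This sketch (ping_sweep is a hypothetical name) relies only on ping's exit status, which iputils ping sets to 0 when at least one reply arrives, so it is unaffected by the slightly different output formats above (icmp_seq on the host, icmp_req on the coprocessor). It assumes the ping on each node accepts the -c and -q options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ping each address 3 times and report reachability.
# Relies only on ping's exit status (0 when at least one reply arrived), so
# the exact output format of ping does not matter.
ping_sweep() {
  local addr rc=0
  for addr in "$@"; do
    if ping -c 3 -q "$addr" > /dev/null 2>&1; then
      echo "$addr reachable"
    else
      echo "$addr UNREACHABLE"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: ping_sweep 192.168.100.1 192.168.100.2 192.168.100.100 192.168.100.200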

Conclusion

Configuring IPoIB on Intel Xeon Phi coprocessors requires both the Intel MPSS and OFED software stacks. This article showed the steps needed to configure IPoIB and assign IP addresses on two host systems equipped with Intel Xeon Phi coprocessors and connected directly through their IB host channel adapters. Finally, ping tests between the hosts and coprocessors confirmed that IPoIB is working correctly.

References

“Intel® Manycore Platform Software Stack (Intel® MPSS) User’s Guide,” December 2015, Revision 3.6.1

“OpenFabrics Enterprise Distribution (OFED),” Version 3.18-1

Notices

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

Intel, the Intel logo, Intel Xeon Phi, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.


This sample source code is released under the Intel Sample Source Code License Agreement.