Understanding the InfiniBand Subnet Manager

Published: 01/28/2010   Last Updated: 01/28/2010

The InfiniBand subnet manager (OpenSM is the software implementation discussed here) assigns a Local IDentifier (LID) to each port connected to the InfiniBand fabric and builds a routing table from the assigned LIDs.

There are two types of subnet managers: software-based and hardware-based. Hardware-based subnet managers are typically part of the firmware of the attached InfiniBand switch. A software subnet manager is not necessary if a hardware-based subnet manager is active.

A typical InfiniBand installation using the OFED package will run the OpenSM subnet manager at system startup, after the OpenIB drivers are loaded. This OpenSM instance stays resident in memory and sweeps the InfiniBand fabric approximately every 5 seconds for new InfiniBand adapters to add to the subnet routing tables. This usage will be sufficient for most installations, and can be controlled using the following commands:

/etc/init.d/opensmd start
/etc/init.d/opensmd stop
/etc/init.d/opensmd restart
/etc/init.d/opensmd status
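
The status and start commands above can be combined into a small check that starts OpenSM only when it is not already running. This is a minimal sketch, not part of OFED: the helper name ensure_opensm_running is hypothetical, and it assumes the init script's status action returns a non-zero exit code when the daemon is stopped (the usual init script convention).

```shell
#!/bin/sh
# Hypothetical helper: start OpenSM through its init script only if the
# "status" action reports that it is not running.
# Assumption: "status" exits non-zero when the daemon is stopped.
ensure_opensm_running() {
    # $1 = init script (or any command taking status/start); defaults to
    # the path used in this article.
    init=${1:-/etc/init.d/opensmd}
    "$init" status >/dev/null 2>&1 || "$init" start
}

# Example usage (commented out so the snippet can be sourced safely):
#   ensure_opensm_running
```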

There are several instances where the default usage will not be sufficient, however. If the head node is used as a compute node, and resources are at a premium, the OpenSM subnet manager can be set to run once, configure the LIDs and routing tables, and then exit:

opensm -o

For InfiniBand adapters with two ports, a second instance of the subnet manager must be running to enable a subnet on the second port. To begin, start the subnet manager as above:

/etc/init.d/opensmd start

Next, discover the GUID of the second port:

ibstat -p

This command will output two port GUIDs, one for each port. Use the second GUID to start a new OpenSM instance in daemon mode:

opensm -g <0xguid number> -B
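
The GUID lookup and the second OpenSM launch can be scripted together. The following is a minimal sketch under stated assumptions: nth_port_guid is a hypothetical helper, and it assumes ibstat -p prints exactly one GUID per line in port order, as described above.

```shell
#!/bin/sh
# Hypothetical helper: print the Nth line (1-based) of a GUID list read
# from stdin. Assumes one port GUID per line, as printed by "ibstat -p".
nth_port_guid() {
    sed -n "${1}p"
}

# Example usage on a host with a dual-port adapter (commented out so the
# snippet can be sourced on machines without InfiniBand hardware):
#   guid=$(ibstat -p | nth_port_guid 2)
#   [ -n "$guid" ] && opensm -g "$guid" -B
```

Checking that the GUID is non-empty before launching avoids starting an unbound second instance on a single-port adapter.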

There may also be an instance where the head node does not have InfiniBand hardware, but the compute nodes do. In this case, provided a hardware subnet manager is not used, one of the compute nodes must act as the subnet manager.

If a subnet manager is already running on the cluster, either a hardware-based version or an OpenSM instance, then running OpenSM on another node will cause the new instance to be put in a STANDBY state. In this state, the instance monitors the existing subnet manager and will take over subnet manager duties once a failure has been detected.
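
One way to see which state a subnet manager is in is the sminfo utility from the infiniband-diags package. The sketch below only extracts the state name from a line of sminfo output. The helper name sm_state_from_sminfo is hypothetical, and the output format (state name as the last field, for example SMINFO_MASTER or SMINFO_STANDBY) is an assumption that should be checked against the installed version.

```shell
#!/bin/sh
# Hypothetical helper: print the last whitespace-separated field of the
# first line on stdin.
# Assumption: sminfo prints a single line ending in the state name, e.g.
#   sminfo: sm lid 1 sm guid 0x..., activity count 42 priority 0 state 3 SMINFO_MASTER
sm_state_from_sminfo() {
    awk 'NR == 1 { print $NF }'
}

# Example usage (commented out; requires InfiniBand hardware and sminfo):
#   sminfo | sm_state_from_sminfo
```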


