Understanding the InfiniBand Subnet Manager

The InfiniBand subnet manager (OpenSM, in the OFED software stack) assigns Local IDentifiers (LIDs) to each port connected to the InfiniBand fabric and builds a routing table based on the assigned LIDs.

There are two types of subnet managers: software-based and hardware-based. Hardware-based subnet managers are typically part of the firmware of the attached InfiniBand switch. A software subnet manager is not necessary if a hardware-based subnet manager is active.

A typical InfiniBand installation using the OFED package runs the OpenSM subnet manager at system startup, after the OpenIB drivers are loaded. This OpenSM instance stays resident in memory and sweeps the InfiniBand fabric approximately every 5 seconds for new InfiniBand adapters to add to the subnet routing tables. This is sufficient for most installations, and the service can be controlled using the following commands:

/etc/init.d/opensmd start
/etc/init.d/opensmd stop
/etc/init.d/opensmd restart
/etc/init.d/opensmd status

There are several instances where the default usage will not be sufficient, however. If the head node is used as a compute node, and resources are at a premium, the OpenSM subnet manager can be set to run once, configure the LIDs and routing tables, and then exit:

opensm -o

For InfiniBand adapters with two ports, a second instance of the subnet manager must be active to enable a subnet on the second port. To begin, start the subnet manager service:

/etc/init.d/opensmd start

Next, discover the GUID of the second port:

ibstat -p

This command outputs two port GUIDs, one for each port. Use the second GUID to start a new OpenSM instance in daemon mode:

opensm -g <0xguid number> -B
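
The two steps above can be combined in a small script. The following is only a sketch: it assumes that ibstat -p prints one port GUID per line with the second port on the second line, and port_guid is a hypothetical helper name, not part of OFED.

```shell
#!/bin/sh
# port_guid (hypothetical helper): print the Nth line of a GUID list read
# from stdin. Assumes one GUID per line, as printed by `ibstat -p`.
port_guid() {
    # $1 = 1-based port index
    sed -n "${1}p"
}

# Usage on a host with a two-port adapter (commented out so the sketch can be
# sourced on a machine without InfiniBand hardware):
#
#   guid=$(ibstat -p | port_guid 2)
#   if [ -n "$guid" ]; then
#       opensm -g "$guid" -B    # second OpenSM instance, daemon mode
#   else
#       echo "no second port GUID found" >&2
#   fi
```

Checking that the GUID is non-empty before calling opensm avoids starting an instance with an empty -g argument on a single-port adapter.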

There may also be an instance where the head node does not have InfiniBand hardware, but the compute nodes do. In this case, provided a hardware subnet manager is not used, one of the compute nodes must act as the subnet manager.

If a subnet manager is already running on the cluster, either a hardware-based version or an OpenSM instance, then running OpenSM on another node will cause the new instance to be put in a STANDBY state. In this state, the instance monitors the existing subnet manager and takes over subnet manager duties once it detects that the existing one has failed.
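
To see whether a subnet manager is already active before starting another one, the sminfo utility from the InfiniBand diagnostics package can be queried. The sketch below assumes that sminfo prints a single line beginning with "sminfo:" and ending with a state name such as SMINFO_MASTER or SMINFO_STANDBY. The exact output format may differ between versions, and sm_state is a hypothetical helper name.

```shell
#!/bin/sh
# sm_state (hypothetical helper): read sminfo output on stdin and print the
# trailing state token, e.g. SMINFO_MASTER. Assumes a line of the form
#   sminfo: sm lid 1 sm guid 0x..., activity count N priority P state 3 SMINFO_MASTER
sm_state() {
    awk '/^sminfo:/ { print $NF }'
}

# Usage on a node attached to the fabric (commented out so the sketch can be
# sourced on a machine without InfiniBand hardware):
#
#   state=$(sminfo 2>/dev/null | sm_state)
#   if [ "$state" = "SMINFO_MASTER" ]; then
#       echo "a master subnet manager is already running"
#   else
#       echo "no master subnet manager answered the query"
#   fi
```

sminfo reports the state of whichever subnet manager answers the query, so an empty result suggests that no subnet manager is currently reachable from this port.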


