Intel MPSS 3.3 release features
The Intel® Manycore Platform Software Stack (Intel® MPSS) version 3.3 was released on 14 July, 2014. This page lists the prominent features in this release.
- Support for non-blocking version of scif_connect
- Support for querying MYO capabilities
- Performance enhancement by allowing COIBufferRelease to be run from the host
- Support for multiple Intel® Xeon Phi™ coprocessors with a single Intel® True Scale HCA
- Symmetric mode support with Intel® True Scale HCA
- Intel® MPSS now supports Mellanox* OFED 2.1 and 2.2 with the Mellanox* HCA
- Support for IPoIB on the Intel® Xeon Phi™ coprocessor with Mellanox* HCAs
- Support for RDMA communication between two Intel® Xeon Phi™ coprocessors for cluster and storage traffic with Mellanox* HCA
- Microsoft® Windows host OS
  - Support for relocatable MSI for Microsoft® Windows hosts
  - Intel® Xeon Phi™ entry in Windows Device Manager lists SKU/model number
- MPSS Tools
  - Reduced number of localizations for micsmc for consistency with the rest of Intel® MPSS
  - micctrl’s new verbose options
  - miccheck now includes a test to verify the version of the SMC firmware
  - Improvement to micsmc's --ecc option
  - Enhancements to libmicmgmt’s APIs for errors reported by the RAS module running on the Intel® Xeon Phi™ coprocessor
Note: Please check the release notes for late-breaking errata for new features introduced in this release of MPSS.
- Intel® MPSS adding support for RHEL 7.0: Intel® MPSS 3.3 adds support for RHEL 7.0. In line with Red Hat’s move to the systemd/systemctl model in RHEL 7.0, users of Intel® MPSS on a RHEL 7.0 host should use the ‘systemctl’ command instead of the ‘service’ command.
- Intel® MPSS discontinuing support for RHEL 6.0 and 6.1: Support for RHEL 6.0 and 6.1 is discontinued with the Intel® MPSS 3.3 release. Users should move to a recent version of RHEL 6.x to ensure continued support. Note that Intel® MPSS 3.4 will discontinue support for RHEL 6.2.
- A single distribution-agnostic tarball for Linux*: Previous Intel® MPSS 3.X releases provided distribution-branded tarballs (for example: mpss-3.2-rhel-6.4.tar) for the stack, including the user-space RPMs. Starting with this release, a single distribution-agnostic tarball is available; it includes kernel-space RPMs precompiled for specific kernel versions. (As with previous releases, kernel source RPMs are available to support other kernel versions.) The new tarball is called “mpss-3.3-linux.tar”.
- Support for a non-blocking version of scif_connect(): scif_connect() can now be used in non-blocking mode, similar to a non-blocking socket connect() system call. This lets applications establish connections dynamically, only when they are needed, instead of setting up every connection in advance during startup. With dynamic connections, it is quite likely that both sides decide to connect to each other at the same time (a head-to-head situation). With a blocking scif_connect(), both sides then block in scif_connect() and neither can accept the other's connection request with scif_accept(), resulting in a deadlock. A non-blocking scif_connect() returns immediately, so each side remains free to accept the other's request while its own request is pending.
A blocking scif_connect() is made non-blocking by setting the O_NONBLOCK flag on the endpoint with the fcntl() system call. The file descriptor that fcntl() needs can be obtained with scif_get_fd(), and the non-blocking flag can be set as follows:
int flags = fcntl(fd, F_GETFL);          /* keep the existing status flags */
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
In non-blocking mode, scif_connect() returns -1 with errno set to EINPROGRESS while the connection is still being established. The caller can either call scif_connect() in a loop for as long as errno is EINPROGRESS, or wait for the connection with the poll() system call, requesting the POLLOUT event (or with select(), waiting for the descriptor to become writable). Calling scif_connect() again then returns either 0 for a successful connection, or -1 with errno set to indicate the error.
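Because the SCIF headers and library are only present on a host with Intel® MPSS installed, the sketch below shows the same non-blocking pattern with an ordinary POSIX TCP socket, the call the text above names as the analogue of scif_connect(). It illustrates the pattern and is not SCIF code: with SCIF, the descriptor would come from scif_get_fd() and the final status would come from calling scif_connect() a second time, whereas a socket reports its final status through getsockopt() with SO_ERROR. The helper names set_nonblocking() and connect_nonblocking() are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/* Set O_NONBLOCK while preserving the descriptor's other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Start a non-blocking connect and wait up to timeout_ms for it to finish.
 * Returns 0 on success, or -1 with errno set. */
static int connect_nonblocking(int fd, const struct sockaddr *addr,
                               socklen_t len, int timeout_ms)
{
    if (set_nonblocking(fd) < 0)
        return -1;
    if (connect(fd, addr, len) == 0)
        return 0;                    /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;                   /* a real failure, not "pending" */

    /* Wait until the descriptor becomes writable (the POLLOUT event). */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (n < 0)
        return -1;

    /* Sockets report the outcome via SO_ERROR; SCIF would instead use a
     * second scif_connect() call at this point. */
    int err = 0;
    socklen_t errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

Reading the current flags with F_GETFL before F_SETFL keeps any flags already set on the descriptor, rather than replacing them all with O_NONBLOCK alone.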
- New API to query capabilities/feature availability in the MYO library: This release introduces a new API, myoiSupportsFeature(), that queries the MYO installation at runtime to determine whether a specific feature is supported. It is intended for clients of MYO (for example, compilers) that use MYO features depending on their availability. For additional information, see /usr/include/myoimpl.h and /usr/include/myotypes.h, both part of the mpss-myo-dev-* RPM.
- Allow COIBufferRelease to be run from host side: In the Intel® MPSS 3.2 release, the buffer reference count system in COI was rewritten to improve the performance of the existing APIs that manage reference counts on COIBuffers.
Building on this new infrastructure, two new APIs, COIBufferAddRefcnt() and COIBufferReleaseRefcnt(), let the user modify buffer reference counts per COIProcess from the host process, instead of requiring these counts to be modified from the sink process. Because the counts are manipulated from the host, these operations avoid the overhead of extra communication between the host and the sink, which makes them much faster while providing the same functionality as their sink-side AddRef/ReleaseRef counterparts.
Additional details and a description of the APIs are available in the header file /usr/include/intel-coi/source/COIBuffer_source.h.
- Support multiple Intel® Xeon Phi™ coprocessors with a single Intel® True Scale HCA: This release adds support for using multiple Intel® Xeon Phi™ coprocessors with one or more Intel® True Scale HCAs. Previous MPSS releases required the system to contain one Intel® True Scale HCA for every Intel® Xeon Phi™ coprocessor.
Starting with this release, the software automatically pairs Intel® Xeon Phi™ coprocessors with the available Intel® True Scale HCAs to make the best use of the hardware in the server. If the server contains multiple Intel® Xeon Phi™ coprocessors and multiple Intel® True Scale HCAs, the software pairs each coprocessor with a dedicated HCA, taking NUMA locality into consideration, to provide the best resource spreading and performance. If, on the other hand, the server contains only one Intel® True Scale HCA, all coprocessors are paired with that HCA. This limits the available hardware resources, but allows all coprocessors to be used in native mode.
This feature is enabled by default and cannot be disabled. Every Intel® Xeon Phi™ coprocessor is assigned to one of the available Intel® True Scale HCAs. However, dynamic re-assignment of HCAs and resources is not supported. Once an Intel Xeon Phi coprocessor is paired with a True Scale HCA during driver initialization, this pairing is static and maintained until the device drivers are unloaded.
Because pairings are static, any administrative change to HCA availability must be preceded by stopping the "ofed-mic" and "openibd" services, which must then be restarted after the change is complete.
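The pairing policy described above can be pictured as a small greedy assignment. The sketch below is hypothetical and is not taken from the MPSS drivers: it assumes each coprocessor and each HCA is known by an index and a NUMA node number, assigns coprocessors one at a time to the least-loaded HCA (so each coprocessor gets a dedicated HCA while enough HCAs remain), and breaks ties in favor of an HCA on the same NUMA node. With a single HCA, every coprocessor ends up paired with it. Mirroring the static pairing described above, the result would be computed once and not revisited.

```c
#define MAX_HCAS 16   /* arbitrary bound for this sketch */

/* Hypothetical pairing sketch, not the MPSS driver's algorithm.
 * mic_numa[i]: NUMA node of coprocessor i; hca_numa[j]: NUMA node of HCA j.
 * On success, pair[i] holds the HCA index assigned to coprocessor i.
 * Returns 0, or -1 if there are no HCAs (or too many for this sketch). */
int pair_coprocessors(const int *mic_numa, int n_mic,
                      const int *hca_numa, int n_hca, int *pair)
{
    int load[MAX_HCAS] = {0};   /* coprocessors assigned to each HCA so far */

    if (n_hca <= 0 || n_hca > MAX_HCAS)
        return -1;

    for (int i = 0; i < n_mic; i++) {
        int best = 0;
        for (int j = 1; j < n_hca; j++) {
            int remote_j = hca_numa[j] != mic_numa[i];
            int remote_best = hca_numa[best] != mic_numa[i];
            /* Spread first (fewest coprocessors), then prefer NUMA-local. */
            if (load[j] < load[best] ||
                (load[j] == load[best] && remote_j < remote_best))
                best = j;
        }
        pair[i] = best;
        load[best]++;
    }
    return 0;
}
```

Ranking load ahead of locality reflects the text's emphasis on a dedicated HCA per coprocessor; a real implementation could weigh the two differently.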
- Symmetric mode support added with Intel® True Scale HCA: Symmetric mode is now supported by Intel® True Scale PSM. MPI jobs using PSM can now run with ranks located on any combination of Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. SCIF is used to establish connections and to communicate across memory domains, using memory-mapped I/O and, for large messages, the DMA engine on the Intel® Xeon Phi™ coprocessor.
With MPI, symmetric mode jobs are most flexibly specified using mpirun’s MPMD syntax. For example, to run one rank on the host processor and one rank on an attached Intel® Xeon Phi™ coprocessor named “localhost-mic0”, the mpirun command might look like this:
mpirun -np 1 -host localhost program.host : -np 1 -host localhost-mic0 program.mic
The maximum number of PSM processes supported across one system is limited by the number of Intel® True Scale hardware contexts (16). Context sharing works, but a context can only be shared among processes in a single memory domain. For example, 4 processes within a single Intel® Xeon Phi™ coprocessor can share a context, but two processes on separate coprocessors cannot share the same context.
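Because a context cannot be shared across memory domains, the number of hardware contexts a job needs is a per-domain sum, not the total rank count divided by the sharing factor. The hypothetical helper below makes that arithmetic explicit. The number of processes that may share one context is a parameter; the value 4 used in the tests follows the example above and is an assumption, not a documented PSM limit.

```c
#define TRUE_SCALE_HW_CONTEXTS 16   /* hardware contexts per system, per the text */

/* Hypothetical helper: contexts needed when up to procs_per_context ranks
 * (must be > 0) may share one context, and sharing never crosses a memory
 * domain (the host, or a single coprocessor). */
int contexts_needed(const int *ranks_per_domain, int n_domains,
                    int procs_per_context)
{
    int total = 0;
    for (int d = 0; d < n_domains; d++)
        /* Round up within each domain: leftover ranks cannot use a
         * context that belongs to another domain. */
        total += (ranks_per_domain[d] + procs_per_context - 1) / procs_per_context;
    return total;
}

/* Returns 1 if the job fits in the available hardware contexts, else 0. */
int job_fits(const int *ranks_per_domain, int n_domains, int procs_per_context)
{
    return contexts_needed(ranks_per_domain, n_domains, procs_per_context)
           <= TRUE_SCALE_HW_CONTEXTS;
}
```

For instance, one host rank plus four ranks on each of two coprocessors needs three contexts, and two ranks on each of two coprocessors needs two contexts even though four ranks in a single domain could share one.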
- Intel® MPSS now supports Mellanox* OFED 2.1 and 2.2 with the Mellanox* HCA: Based on requests from customers and OEMs, Intel® MPSS now supports Mellanox* OFED 2.1 and 2.2 as an option. In addition to a few patches to Mellanox* OFED, new source RPMs are provided that can be compiled against these versions of OFED to add the new CCL host-side drivers and daemon for Intel® Xeon Phi™ support.
- Support for IPoIB on the Intel® Xeon Phi™ coprocessor with Mellanox* HCAs: This feature uses kernel mode client access to CCL-Direct to enable IP over InfiniBand (IPoIB) on the Intel® Xeon Phi™ coprocessor, using the InfiniBand HCA attached to the host as a PCIe peer device. The IPoIB module is not enabled by default, but can be started either by:
- Manually loading (and unloading) the ib_ipoib module on the Intel® Xeon Phi™ coprocessor, or
- Modifying /etc/mpss/ipoib.conf on the host before starting the OFED MIC service
- Support for RDMA communication between two Intel® Xeon Phi™ coprocessors for cluster and storage traffic with Mellanox* HCA: Intel® MPSS already supports a feature called CCL-Direct that allows user mode clients, such as MPI, to use RDMA communication between coprocessors on two different host systems (or nodes in a cluster). This release extends similar capabilities on the Intel® Xeon Phi™ coprocessor to kernel mode clients such as IPoIB (see above), the SCSI RDMA Protocol (SRP), and Lustre LNET, through kernel mode CCL-Direct.
- Support for relocatable MSI for Microsoft® Windows hosts: Prior to this release, installing the Intel® MPSS package to any location other than the default path (C:\Program Files\Intel\MPSS\) was not supported. Starting with this release, the target location of the Windows installation can be changed. For example, the installation of Intel® MPSS can be relocated from "C:\Program Files\Intel\MPSS\" (default) to "D:\Intel\MPSS\". Refer to readme-windows.pdf for installation details.
- Intel® Xeon Phi™ entry in Windows Device Manager lists SKU/model number: When Intel® MPSS is installed on a Microsoft® Windows host, an entry is added to the "System devices" section of the Windows Device Manager for each coprocessor attached to the host. In previous releases, these entries were titled "Intel(R) Xeon Phi(TM)" regardless of the SKU/model of the corresponding coprocessor. Starting with this release, each entry also displays the SKU/model number of the coprocessor, for example "Intel(R) Xeon Phi(TM) coprocessor - 7120P". This makes the entries consistent with how other Intel devices are displayed in the Windows Device Manager.
- Reduction in the number of localizations for micsmc to make it consistent with the rest of MPSS: For consistency with the rest of MPSS, the number of localizations for the control panel (micsmc) graphical user interface (GUI) was reduced from eight to two (Japanese and Simplified Chinese). This change does not affect the control panel command line interface (CLI), which has never been localized.
The two remaining languages are selected by setting the LANG environment variable on the host, and are enabled by default on a host localized to one of them. If LANG is set to one of the unsupported languages, the micsmc GUI presents all text in English.
- micctrl’s new verbose options: micctrl now provides the verbosity options -vv and -vvv in addition to the existing -v option.
They are as follows:
- When no verbosity option is specified, micctrl reports only warnings and errors
- When the -v option is specified, micctrl reports the configuration files that it accesses and the resulting values of configuration parameters (“info”)
- The -vv option (also “-v -v”) additionally reports files that micctrl has changed (“info” and “filesys”)
- The -vvv option (also “-v -v -v”) additionally reports calls that micctrl makes to the host’s networking utilities, such as ifup, ifdown and brctl (“info”, “filesys” and “network”)
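The levels above are cumulative: each additional -v adds one category to those already reported, and warnings and errors are always reported. The sketch below models that behavior; the flag-counting logic, the cap at three levels, and the category bit names are illustrative assumptions and do not reproduce micctrl's implementation.

```c
#include <string.h>

/* Hypothetical message categories mirroring the list above. */
enum {
    CAT_WARN_ERR = 1 << 0,   /* always reported */
    CAT_INFO     = 1 << 1,   /* -v   */
    CAT_FILESYS  = 1 << 2,   /* -vv  */
    CAT_NETWORK  = 1 << 3    /* -vvv */
};

/* Count verbosity from arguments such as "-vv" or "-v" "-v".
 * Only arguments made of '-' followed solely by 'v's are counted. */
int verbosity_level(int argc, const char *const *argv)
{
    int level = 0;
    for (int i = 0; i < argc; i++) {
        const char *a = argv[i];
        if (a[0] != '-' || a[1] != 'v')
            continue;
        size_t n = strspn(a + 1, "v");
        if (a[1 + n] == '\0')
            level += (int)n;
    }
    return level > 3 ? 3 : level;   /* assumed cap: three levels exist */
}

/* Map a verbosity level to the set of categories that are reported. */
int categories_for_level(int level)
{
    int mask = CAT_WARN_ERR;
    if (level >= 1) mask |= CAT_INFO;
    if (level >= 2) mask |= CAT_FILESYS;
    if (level >= 3) mask |= CAT_NETWORK;
    return mask;
}
```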
Additionally, all micctrl commands support a global --destdir=<destdir> option. When it is specified, micctrl prepends the 'destdir' directory path to every file path it accesses. One use of this feature is to let system administrators preview the changes micctrl would make to a particular configuration: copy an existing configuration into a destdir directory, then run micctrl with --destdir set to that directory. The option also allows system administrators to create and maintain multiple configurations.
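The path rewriting that --destdir performs amounts to prefixing each absolute path micctrl accesses with the destination directory. The helper below is a hypothetical illustration of that rule; its name and its handling of a trailing slash on destdir are assumptions. /etc/mpss/ipoib.conf, mentioned earlier on this page, serves as the example path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration of --destdir: write destdir + path into out.
 * A trailing '/' on destdir is dropped so the result contains no "//".
 * Returns 0 on success, -1 if out is too small. */
int apply_destdir(char *out, size_t outsz, const char *destdir, const char *path)
{
    if (destdir == NULL)
        destdir = "";               /* no --destdir: path is used unchanged */
    size_t dlen = strlen(destdir);
    while (dlen > 0 && destdir[dlen - 1] == '/')
        dlen--;
    int n = snprintf(out, outsz, "%.*s%s", (int)dlen, destdir, path);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Under this rule, after the configuration is copied into /tmp/preview, a run with --destdir=/tmp/preview would touch /tmp/preview/etc/mpss/ipoib.conf instead of the live /etc/mpss/ipoib.conf.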
For additional help and details, run micctrl -h.
- miccheck now includes a test to verify the version of the SMC firmware: The miccheck utility has an additional test that verifies the version of the SMC firmware image on each Intel® Xeon Phi™ coprocessor. This feature was requested by customers and helps detect situations where the SMC firmware version does not match the installed Intel® MPSS version. The test runs by default when the miccheck utility is executed.
The following shows the output of miccheck when the SMC firmware version is correct, followed by the output when it is not.
user@localhost ~/mpss> miccheck.py
...
Copyright 2013 Intel Corporation All Rights Reserved
Executing default tests for host
Test 0: Check number of devices the OS sees in the system ... pass
Test 1: Check mic driver is loaded ... pass
Test 2: Check number of devices driver sees in the system ... pass
Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
Test 5 (mic0): Check ras daemon is available in device ... pass
Test 6 (mic0): Check running flash version is correct ... pass
Test 7 (mic0): Check running SMC firmware version is correct ... pass
Executing default tests for device: 1
Test 8 (mic1): Check device is in online state and its postcode is FF ... pass
Test 9 (mic1): Check ras daemon is available in device ... pass
Test 10 (mic1): Check running flash version is correct ... pass
Test 11 (mic1): Check running SMC firmware version is correct ... pass
Status: OK

user@localhost ~/mpss> miccheck.py
...
Copyright 2013 Intel Corporation All Rights Reserved
Executing default tests for host
Test 0: Check number of devices the OS sees in the system ... pass
Test 1: Check mic driver is loaded ... pass
Test 2: Check number of devices driver sees in the system ... pass
Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
Test 5 (mic0): Check ras daemon is available in device ... pass
Test 6 (mic0): Check running flash version is correct ... pass
Test 7 (mic0): Check running SMC firmware version is correct ... fail
device SMC firmware version does not match, should be '1.15.5078', it is '1.16.5078'.
Executing default tests for device: 1
Test 8 (mic1): Check device is in online state and its postcode is FF ... pass
Test 9 (mic1): Check ras daemon is available in device ... pass
Test 10 (mic1): Check running flash version is correct ... pass
Test 11 (mic1): Check running SMC firmware version is correct ... fail
device SMC firmware version does not match, should be '1.15.5078', it is '1.16.5078'.
Status: FAIL
Failure: A device test failed
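Tests 7 and 11 amount to an exact comparison between the SMC firmware version that the installed Intel® MPSS expects and the version reported by the running coprocessor. The sketch below is a hypothetical C rendering of that comparison, written only to show how the failure message in the output above is composed; it is not miccheck's code (miccheck itself is a Python script), and the function name is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: returns 1 (pass) if the versions match exactly;
 * otherwise returns 0 (fail) and writes a message in the format shown
 * in the miccheck output above. */
int check_smc_version(const char *expected, const char *running,
                      char *msg, size_t msgsz)
{
    if (strcmp(expected, running) == 0) {
        if (msgsz > 0)
            msg[0] = '\0';
        return 1;
    }
    snprintf(msg, msgsz,
             "device SMC firmware version does not match, should be '%s', it is '%s'.",
             expected, running);
    return 0;
}
```

Because the comparison is exact, a newer SMC firmware (1.16.5078 in the example output) fails the test just as an older one would.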
- Improvements to micsmc's --ecc option: micsmc now processes the ECC option more efficiently and responds faster by applying the requested ECC setting to all Intel® Xeon Phi™ coprocessors in parallel, handling each coprocessor in a separate thread on the host.
The micsmc CLI tool provides an option to enable or disable ECC on an Intel® Xeon Phi™ coprocessor. Changing the ECC setting requires the coprocessor to be in the ready state, because it must enter maintenance mode to set the requested ECC state. When this is done, the command resets the device and returns it from maintenance mode to the ready state. This process takes approximately 30 seconds, and occasionally the device can become unresponsive, so micsmc also lets the user specify a timeout for how long to wait for the device to become ready after it exits maintenance mode. In previous releases, on a system with multiple Intel® Xeon Phi™ coprocessors, micsmc performed these operations serially: for each device, it switched from the ready state to maintenance mode, set the ECC state, exited maintenance mode, and waited until the device became ready, before proceeding to the next coprocessor.
While the previous implementation performed the required actions correctly, it did not scale well on systems with many Intel® Xeon Phi™ coprocessors: at roughly 30 seconds per device, eight coprocessors, for example, would take about four minutes. This enhancement executes the requested operation on all Intel® Xeon Phi™ coprocessors in parallel, which improves the end user experience. With this change, keep the following in mind:
- While ECC can still be enabled or disabled per coprocessor, parallel execution cannot be disabled; it is now the default behavior of the micsmc utility.
- Since the implementation uses a thread per coprocessor (for parallel execution on the host) and the commands are executed in parallel on every coprocessor, the output generated by the tool has been changed to a batch style of output, which reports the devices for which ECC is being enabled or disabled, when the operation for each device completes, and the resultant status. A verbose option was also added to micsmc CLI which, in conjunction with the ECC option, outputs additional information, but in an interleaved format.
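The thread-per-coprocessor approach can be sketched with POSIX threads. In this hypothetical example, set_ecc_on_device() is a placeholder for the per-device sequence described above (enter maintenance mode, set the ECC state, reset, wait for the ready state) and simply reports success; the structure, function names, and printed line format are assumptions, not micsmc's internals. All workers are started, then joined, and only afterwards are the per-device results printed, in the spirit of the batch-style output described above. Since each device takes roughly 30 seconds, running devices concurrently brings the total close to the time of the slowest device rather than the sum over all devices.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_DEVICES 32   /* arbitrary bound for this sketch */

struct ecc_job {
    int device;   /* coprocessor index, e.g. 0 for mic0 */
    int enable;   /* 1 = enable ECC, 0 = disable */
    int status;   /* 0 on success; stays -1 if the worker never ran */
};

/* Placeholder for the real maintenance-mode sequence; always succeeds here. */
static int set_ecc_on_device(int device, int enable)
{
    (void)device;
    (void)enable;
    return 0;
}

static void *ecc_worker(void *arg)
{
    struct ecc_job *job = arg;
    job->status = set_ecc_on_device(job->device, job->enable);
    return NULL;
}

/* Apply the ECC setting to all devices in parallel, then report in a batch.
 * Returns the number of devices that failed, or -1 for a bad device count. */
int set_ecc_all(int n_devices, int enable, struct ecc_job *jobs)
{
    pthread_t tids[MAX_DEVICES];
    int started[MAX_DEVICES] = {0};
    int failures = 0;

    if (n_devices < 0 || n_devices > MAX_DEVICES)
        return -1;
    for (int i = 0; i < n_devices; i++) {
        jobs[i] = (struct ecc_job){ .device = i, .enable = enable, .status = -1 };
        started[i] = pthread_create(&tids[i], NULL, ecc_worker, &jobs[i]) == 0;
    }
    for (int i = 0; i < n_devices; i++)
        if (started[i])
            pthread_join(tids[i], NULL);   /* wait for every device */
    for (int i = 0; i < n_devices; i++) {
        printf("mic%d: ECC %s %s\n", jobs[i].device,
               enable ? "enable" : "disable",
               jobs[i].status == 0 ? "succeeded" : "failed");
        if (jobs[i].status != 0)
            failures++;
    }
    return failures;
}
```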
- Enhancements to libmicmgmt’s APIs for errors reported by the RAS module running on the Intel® Xeon Phi™ coprocessor: The MIC management library (/usr/lib64/libmicmgmt.so*) running on the host handles certain requests by communicating with the RAS module on the Intel® Xeon Phi™ coprocessor, using a protocol with SCIF as the underlying transport. As a result, an operation can fail for several reasons, including SCIF errors, protocol errors (for example, a malformed request received by the RAS module), or errors on the coprocessor itself (for example, a failure to read an SMC register). In previous releases, an error returned by the RAS module was reported as a SCIF error (E_MIC_SCIF_ERROR) with a message in the following format:
scif_recv: cmd 0x<command code>: Error 0x<RAS error code>: Len 0x<length>
The APIs in this library now report RAS-specific errors by:
- Returning a specific and appropriate error code (for example: E_MIC_RAS_ERROR instead of E_MIC_SCIF_ERROR).
- Providing a human readable error message instead of the RAS error number (for example: "Error: Permission denied" instead of "Error: 0x5").
- Providing a way for the application to retrieve the specific RAS error code.
With this enhancement, the error messages for API functions that interact with the RAS module have the following format:
scif_recv: cmd 0x<command code>: <RAS error message> (0x<RAS error code>): Len 0x<length>
Where <RAS error message> can be one of the following errors:
- Invalid command/operation
- Invalid length for operation
- Invalid parameter for operation
- Invalid data block
- Permission denied
- Out of memory
- SMC communication error
- No valid value to report
- Unsupported feature/operation
- Parameter out of range
In addition, after a RAS failure, the specific RAS error code can be retrieved with mic_get_ras_errno(); see /usr/include/miclib.h in pkg-libmicmgmt*.rpm for details. Being able to retrieve specific RAS errors from the coprocessor lets applications that use the micmgmt library improve their error handling.
Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.