MPSS True Scale support on RHEL 6.5

I am confused about OFED and True Scale adapter support in RHEL 6.5. Section 2.2 of the User's Guide says that native Phi applications running on True Scale systems are only supported through 6.3, but later on that same page it says that InfiniBand support for RHEL 6.5 is provided by the distribution packages. What is special about 6.5 that it is singled out there? I can get non-Phi applications to use InfiniBand using the distribution packages in any recent version of RHEL. If there is something relevant to MPSS/Phi, what is it and how do I configure it?

More generally, I am trying to figure out how to get all the modes of operation working for our Phi cards and have been running into problems, mostly with native applications trying to use MPI. Our compute-only nodes have Mellanox cards and we have Mellanox switches, but the Phi nodes that we got came with QLogic cards that need the ib_qib hardware driver. Using the driver provided with RHEL 6.5, we haven't been able to get MPI on the cards to talk to the host or to other cards, and compiling the MPSS OFED drivers doesn't include the ib_qib module. Can we make this work, or do I need to move back to a 6.3 kernel? Also, is this incompatibility going to be fixed in MPSS any time soon, and how would we monitor the status of these kinds of bugs?

 

Thanks,

Mike Robbert

Let me ask around and get back to you.

Which version of MPSS are you using?

I currently have MPSS 3.1.2 installed. I see that 3.1.4 is out, but I didn't see anything in the changelog nor documentation to indicate a change to this.

Mike

Hi Michael, 

I have some answers: 

Intel Xeon Phi coprocessor supports 3 execution modes: Native, Offload and Symmetric. 

Section 2.2 indicates that, with MPSS 3.1, True Scale supports both Native and Offload modes through RHEL 6.3, but supports only Offload in 6.4 and 6.5. I was told that this has to do with PSM (the MPI interface for True Scale), which hasn't been qualified with 6.4 and 6.5.

In offload mode, the coprocessor does not use True Scale. However, 6.5 is singled out to indicate that it is supported by the latest Intel OFED version and can be used for IB communication between compute nodes in a typical True Scale IB fabric while the coprocessor is used for offload code.

Since only certain execution modes are supported for a given RHEL version, you may encounter issues when using an unsupported mode.

To summarize, RHEL 6.3 supports native and offload modes but does not support symmetric mode. RHEL 6.4 and 6.5 support only offload and cannot run native or symmetric modes. 

Let me get back to you with updates on the anticipated timelines for the fixes. 

Hi Michael, 

Another update for you: with MPSS 3.2, which is scheduled to be released soon, users will be able to run both offload and native modes on True Scale for RHEL 6.5.

I see that MPSS 3.2 is out, but the documentation is still pretty confusing about what is supported. There is still a note in section 2.2 that says native applications over True Scale InfiniBand are only supported in RHEL 6.1-6.3. The last comment in this thread indicates that 6.5 was supposed to be supported. There is a new table in section 2.1 (Table 2) that indicates SCIF support is available with IFS OFED, but not ccl-proxy. Does SCIF map to native mode and ccl-proxy to symmetric mode, or am I way off in that thinking?

I'm also wondering whether it will ever be possible to use the OFA generic OFED (1.5.4.1) with QLogic adapters within the MPSS framework. I believe I've seen that the hardware drivers (ib_qib) do ship with generic OFED, but the build scripts for MPSS specifically disable them.

Thanks,

Mike

Hi Mike, I am trying to get complete answers to your questions. Please bear with us, and thank you for the frank feedback.

 

0) The OFED section in the User's Guide is confusing.

    We agree and will fix this.

1) RHEL 6.1-6.3 supposedly the only versions supported by IFS.

   For True Scale we support the same set of operating systems as with Mellanox; the User's Guide needs to be fixed to reflect that.

2) Confusion about the table, SCIF, and ccl-proxy support.

   We will add some nomenclature before the table to help clear this up.

3) Does Intel/True Scale work with OFED-1.5.4.1?

One of our experts assumed that a user could:

                a) install ofed/*rpm psm/*rpm

                b) set QIB_LOAD=yes in /etc/infiniband/openib.conf

 

and then use OFED-1.5.4.1 with MPSS and Intel/QIB hardware. These instructions do not appear in the User's Guide. We are checking whether this combination is supported and supportable.
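For anyone who wants to try the QIB_LOAD step, here is a minimal Python sketch of how the setting could be applied to the text of a config file. It is only an illustration, not a supported procedure: the helper name `set_conf_option` and the `OTHER_OPTION` line in the sample are made up for this example, and it assumes the file uses plain `KEY=value` lines as the instructions above imply. It works on a string so it can be tried without touching /etc/infiniband/openib.conf.

```python
import re


def set_conf_option(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set.

    Every uncommented `key=...` line is rewritten; if none exists,
    the assignment is appended. Commented-out lines are left alone.
    (Hypothetical helper, not part of MPSS or OFED.)
    """
    line = f"{key}={value}"
    # [ \t]* rather than \s* so the match never spans a newline.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(lambda _m: line, text)
    # Make sure the appended line starts on its own line.
    sep = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{sep}{line}\n"


if __name__ == "__main__":
    # Sample content; OTHER_OPTION is a placeholder, not a real setting.
    sample = "OTHER_OPTION=yes\nQIB_LOAD=no\n"
    print(set_conf_option(sample, "QIB_LOAD", "yes"), end="")
```

To apply it to the real file you would read /etc/infiniband/openib.conf as root, pass its contents through the function, and write the result back, keeping a backup copy first. Whether the resulting OFED-1.5.4.1 plus MPSS combination is supported is still the open question noted above.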

 

So does MPSS 3.2 support native MPI applications with True Scale under RHEL 6.5?

Hi Andrey,

The answer is yes.
