Xeon Phi + OFED questions

Taras Shapovalov wrote:

Hello,

For some reason, the kernel modules built from the intel-mic-ofed-kmod source package fail to load on CentOS 6.3. For example:

[root@node001 ~]# modprobe ib_umad
FATAL: Error inserting ib_umad (/lib/modules/2.6.32-279.22.1.el6.x86_64/updates/drivers/infiniband/core/ib_umad.ko): Unknown symbol in module, or unknown parameter (see dmesg)

It seems that CentOS 6.3 and the intel-mic-ofed-kmod sources are not compatible (or I am doing something wrong). We are using the latest publicly available MPSS stack (Update 1), and we build the intel-mic-ofed-* packages at boot. Could you please answer the following two questions, or point me to the relevant documentation?

  1. There are at least three widely used OFED versions: OFA OFED, Mellanox OFED and QLogic OFED. Exactly which MPSS versions work with which OFED versions on which Linux distributions? I suspect the answer changes quickly, but I would greatly appreciate it if somebody could tell us at least the current state.
  2. As far as I understand, the main reason to set up OFED on a host is to emulate an HCA so that IB-aware applications can communicate between the host and the card over "InfiniBand" (using RDMA). Is it possible to use several MICs (installed in *different* hosts physically connected to an IB switch) and get all the advantages of IB communication between them? For example, it would be nice to run a native MVAPICH2 application on several MICs (in different hosts) in a cluster using IB only. (HCA emulation probably makes IB communication slower, but I am not sure.)
Taras
Frances Roth (Intel) wrote:

Quick answer to question 1 - only OFED version 1.5.4.1 is currently supported. I think the version that comes with Red Hat is different. The recommended location to get the file from is OFA: http://www.openfabrics.org/downloads/OFED/ofed-1.5.4/OFED-1.5.4.1.tgz 

Taras Shapovalov wrote:

Hi Frances,

Quote:

Frances Roth (Intel) wrote:

Quick answer to question 1 - only OFED version 1.5.4.1 is currently supported. I think the version that comes with Red Hat is different. The recommended location to get the file from is OFA: http://www.openfabrics.org/downloads/OFED/ofed-1.5.4/OFED-1.5.4.1.tgz 

Thanks for your answer. We are not using the base distribution OFED. We can use QLogic 1.5.4.1, Mellanox 1.5.3 or OFA OFED 1.5.4.1, so the QLogic (Intel True Scale) and OFA versions look good. My first question was probably not clear enough. I was asking whether QLogic (and OFA) OFED + RHEL 6.2 (and 6.3, and also 6.4) + MPSS Update 1 (Update 2 as of yesterday) should work. It would be useful to know the same about SLES 11 SP2 and SP3. Our software depends on all three versions (OFED + distro + MPSS), so we need to know which combinations are supposed to work properly and which are not.

Taras
Jianxin Xiong (Intel) wrote:

The "unknown symbol" error is the result of mismatched kernel symbol versions. Please check:

(1) Does the file "/lib/modules/`uname -r`/build/Module.symvers.mic" exist before building from the source rpm? This is to ensure the new modules have the correct symbol versions.

(2) After building, did you run "sudo service openibd restart" (or just reboot the machine) before trying to load the newly built modules? This is to ensure that the old IB modules (with the wrong symbol versions) have been unloaded.

Only OFA OFED 1.5.4.1 is officially supported, and it is the recommended version.
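
The two checks above can be scripted. The following is a minimal shell sketch, not an official MPSS tool: the function names has_mic_symvers and symbol_errors are made up for illustration, and the message patterns that symbol_errors looks for are an assumption about how the kernel logs symbol problems. Only the Module.symvers.mic path and the hint to look at dmesg come from the thread.

```shell
#!/bin/sh
# Hypothetical helpers; only the file path and the dmesg hint come from the thread.

# Succeeds if Module.symvers.mic exists in the given kernel build directory.
# With no argument, the build directory of the running kernel is used.
has_mic_symvers() {
    build_dir="${1:-/lib/modules/$(uname -r)/build}"
    [ -f "$build_dir/Module.symvers.mic" ]
}

# Reads kernel log text on stdin and prints only the lines that report
# symbol problems. The two patterns are assumed message formats.
symbol_errors() {
    grep -i -E 'unknown symbol|disagrees about version of symbol'
}
```

For example, running `has_mic_symvers || echo "Module.symvers.mic is missing"` before the build covers check (1), and `dmesg | symbol_errors` after a failed modprobe shows which symbols the module could not resolve.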

Taras Shapovalov wrote:

Hi Jianxin,

Quote:

Jianxin Xiong (Intel) wrote:

(2) After building, did you run "sudo service openibd restart" (or just reboot the machine) before trying to load the newly built modules? This is to ensure that the old IB modules (with the wrong symbol versions) have been unloaded.

Right, openibd was not restarted. We build the intel-mic-ofed-* packages at node boot, which means we should restart openibd after every boot (probably from /etc/init.d/ofed-mic). Thanks for the help.

Taras
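
The boot-time ordering described above might look like the following sketch. It assumes the intel-mic-ofed-* packages have already been built and installed earlier in the boot sequence. reload_ib_stack and the RUN variable are hypothetical names, and RUN=echo only prints the commands so the sequence can be reviewed without touching the running system.

```shell
#!/bin/sh
# Hypothetical boot-time step, meant to run after the module build has finished.
# Set RUN=echo to print the commands instead of executing them.

reload_ib_stack() {
    # Restart openibd first so the old IB modules are unloaded.
    ${RUN:-} service openibd restart || return 1

    # Then load each requested module from the freshly built set.
    for mod in "$@"; do
        ${RUN:-} modprobe "$mod" || return 1
    done
}
```

Called as `reload_ib_stack ib_umad` from whatever init script performs the build, this restarts openibd before any module is loaded, which is the order suggested in the reply above.
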
Taras wrote:

Hello,

Do I understand correctly that, in the latest version of MPSS, two OFED versions are now officially supported for a MIC host: OFA OFED and the base distribution OFED in RHEL? And that all other OFED versions are unsupported (and will not work)?

Thanks.

Taras
Frances Roth (Intel) wrote:

The current state of OFED:

You can replace the OFED from your Linux distribution with OFA OFED 1.5.4.1 from http://www.openfabrics.org/ and add in the MPSS support for OFED. This will give you the ability to communicate directly between a native application on the coprocessor and a Mellanox* InfiniBand adapter.

You can replace the OFED from your Linux distribution with the OFED for Intel TrueScale InfiniBand adapters and add in the MPSS support for OFED. This will also give you the ability to communicate directly between a native application on the coprocessor and your Intel TrueScale InfiniBand adapter.

You can use the OFED from your Linux distribution (which will NOT allow you to add in any MPSS support for OFED). Communication from native applications on the coprocessor will go through the regular virtual network to the host before it reaches the InfiniBand adapter.

If you are using RHEL 6.4 and Intel TrueScale InfiniBand adapters, you will need to use this last solution. Because you are giving up the ability to communicate directly between a native application and the adapter, this is not the solution you will want in general.

Taras wrote:

Dear Frances,

Thank you for the answer.

> You can use the OFED from your Linux distribution (which will NOT allow you to add in any MPSS support for OFED.)

Could you please explain in more detail what exactly "will NOT allow you to add in any MPSS" means?

Taras
Christian Simmendinger wrote:

Hello,
Quick question: will there be support for OFED-2.0-3.0.0 at some point?

Thanks

Frances Roth (Intel) wrote:

Taras - Rereading what I wrote, I think I was, perhaps, being too emphatic. After installing OFA OFED or TrueScale, you then install the rpm files from the ofed directory in the MPSS release. If you are using the OFED that came with your Linux distribution, there is no guarantee that those rpm files will install correctly. In the case of RHEL 6.4, you definitely cannot install them and get the direct coprocessor to adapter communication for native applications. I don't know the details of what and why it doesn't work. I can ask the MPSS team for more details.

Christian - that is a question I will need to ask the MPSS team.

Taras wrote:

Dear Frances,

Thanks for the answer.

Quote:

Frances Roth (Intel) wrote:

If you are using the OFED that came with your Linux distribution, there is no guarantee that those rpm files will install correctly. In the case of RHEL 6.4, you definitely cannot install them and get the direct coprocessor to adapter communication for native applications.

I am a bit confused now. According to http://registrationcenter.intel.com/irc_nas/3529/readme-en.txt :

"Infiniband support for RHEL 6.4 is provided through the RDMA/infiniband packages that come wih the distribution."

This is stated in the section "3.2 Steps to Install Intel(R) MPSS with OFED Support using Intel(R) True Scale InfiniBand Adapters". Maybe I misunderstand what "Infiniband support" means here? Could you please explain again: should I install the base distribution OFED (as the readme-en.txt says) on RHEL 6.4 with:

a. True Scale (QLogic) HCA

b. Mellanox HCA

Quote:

Frances Roth (Intel) wrote:

I don't know the details of what and why it doesn't work. I can ask the MPSS team for more details.

Yes, I would appreciate any details. If it does not work, it would be useful to know when it will.

Thanks.

Taras
Christian Simmendinger wrote:

Hello Frances -

Any update on MPSS support for OFED 2.0?
Thanks!

Christian
Taylor Kidd (Intel) wrote:

Hi Christian,

I'm trying to find a more definitive answer for you. Since things are relatively quiet around here due to the US holiday, you probably won't receive an answer until the latter half of next week.

Regards
--
Taylor