infiniband connection host-mic and mic-mic

Hi, I'm trying to set up an InfiniBand connection between the host and the MIC cards, and between the two MIC cards. On the host, ifconfig shows this:

mic0:ib: flags=67<UP,BROADCAST,RUNNING>  mtu 64512
        inet 192.0.2.100  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 4c:79:ba:20:06:63  txqueuelen 1000  (Ethernet)

along with two more interfaces, mic0 and mic1.

I have two coprocessors installed. I can run ib_read_bw between the host and mic0, but not between the host and mic1 or between mic0 and mic1. I get this error:

Received 10 times ADDR_ERROR
Unable to perform rdma_client function.

ifconfig from mic0: 
mic0:ib   Link encap:Ethernet  HWaddr 4C:79:BA:20:06:62  
          inet addr:192.0.2.101  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:64512  Metric:1
ifconfig from mic1:
mic0:ib   Link encap:Ethernet  HWaddr 4C:79:BA:20:06:9C  
          inet addr:192.0.2.102  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:64512  Metric:1
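
One thing that can be checked directly from the three listings is whether the addresses are distinct and fall in the same subnet. A minimal Python sketch, assuming only the addresses and netmask shown above (the dictionary labels and the `subnet_report` helper are made up for illustration); it says nothing about the RDMA layer itself:

```python
import ipaddress

# Addresses and netmask copied from the ifconfig listings above.
# The labels are illustrative, not real interface identifiers.
interfaces = {
    "host mic0:ib": "192.0.2.100",
    "mic0 mic0:ib": "192.0.2.101",
    "mic1 mic0:ib": "192.0.2.102",
}
NETMASK = "255.255.255.0"


def subnet_report(addrs, netmask):
    """Map each interface label to the IPv4 network it belongs to."""
    return {
        label: ipaddress.ip_interface(f"{ip}/{netmask}").network
        for label, ip in addrs.items()
    }


report = subnet_report(interfaces, NETMASK)
networks = set(report.values())
has_duplicates = len(set(interfaces.values())) != len(interfaces)

for label, net in report.items():
    print(f"{label}: {net}")
print("same subnet:", len(networks) == 1)
print("duplicate addresses:", has_duplicates)
```

With the values above, all three interfaces land in one network and no address is repeated, so the IP assignment alone does not look like the cause.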

 

Do I have to do some configuration setup?

Thanks,
Azamat

I will try to find an answer and get back to you.

-t

That would be great, Thomas. Thanks a lot. I've been struggling with this issue for days.

There's an interesting article at http://research.colfaxinternational.com/post/2014/03/11/InfiniBand-for-MIC.aspx that provides configuration information that might help you.

Would you please post your micinfo utility output?

If you are trying to use Xeon Phi with True Scale, note that there is currently no IPoIB support on MIC for True Scale. The workaround is to bridge the MIC cards onto the Ethernet network.

Support for Intel® True Scale Fabric products can be found through e-mail, the Web Portal, or by calling 888-285-7880 (outside the United States please dial +1 937-449-4279) and a support representative will assist you.

I found these other useful Xeon Phi resources:

Which systems support the Intel® Xeon Phi™ coprocessor?
https://software.intel.com/en-us/articles/which-systems-support-the-inte...

Check out the other Intel® Xeon Phi™ Coprocessor Applications and Solutions
https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-app...

You might want to browse the other developer forums as well.
https://software.intel.com/en-us/forums/intel-many-integrated-core

Also, I want to point out a future Xeon Phi training opportunity.
https://software.intel.com/en-us/blogs/2015/03/04/intel-xeon-phi-coproce...

 

Thomas, following is the information you asked about:

 

[root@new-host ~]# micinfo
MicInfo Utility Log
Created Tue Mar  3 23:54:58 2015

        System Info
                HOST OS                  : Linux
                OS Version               : 3.10.0-123.el7.x86_64
                Driver Version           : 3.4.2-1
                MPSS Version             : 3.4.2
                Host Physical Memory     : 7911 MB

Device No: 0, Device Name: mic0

        Version
                Flash Version            : 2.1.02.0390
                SMC Firmware Version     : 1.16.5078
                SMC Boot Loader Version  : 1.8.4326
                uOS Version              : 2.6.38.8+mpss3.4.2
                Device Serial Number     : ADKC31600817

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225e
                Subsystem ID             : 0x2500
                Coprocessor Stepping ID  : 3
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : B1
                Board SKU                : B1PRQ-31S1P
                ECC Mode                 : Enabled
                SMC HW Revision          : Product 300W Passive CS

        Cores
                Total No of Active Cores : 57
                Voltage                  : 1088000 uV
                Frequency                : 1100000 kHz

        Thermal
                Fan Speed Control        : N/A
                Fan RPM                  : N/A
                Fan PWM                  : N/A
                Die Temp                 : 57 C

        GDDR
                GDDR Vendor              : Elpida
                GDDR Version             : 0x1
                GDDR Density             : 2048 Mb
                GDDR Size                : 7936 MB
                GDDR Technology          : GDDR5
                GDDR Speed               : 5.000000 GT/s
                GDDR Frequency           : 2500000 kHz
                GDDR Voltage             : 1501000 uV

Device No: 1, Device Name: mic1

        Version
                Flash Version            : 2.1.02.0390
                SMC Firmware Version     : 1.16.5078
                SMC Boot Loader Version  : 1.8.4326
                uOS Version              : 2.6.38.8+mpss3.4.2
                Device Serial Number     : ADKC31600846

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225e
                Subsystem ID             : 0x2500
                Coprocessor Stepping ID  : 3
                PCIe Width               : x1
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 128 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : B1
                Board SKU                : B1PRQ-31S1P
                ECC Mode                 : Enabled
                SMC HW Revision          : Product 300W Passive CS

        Cores
                Total No of Active Cores : 57
                Voltage                  : 1085000 uV
                Frequency                : 1100000 kHz

        Thermal
                Fan Speed Control        : N/A
                Fan RPM                  : N/A
                Fan PWM                  : N/A
                Die Temp                 : 60 C

        GDDR
                GDDR Vendor              : Elpida
                GDDR Version             : 0x1
                GDDR Density             : 2048 Mb
                GDDR Size                : 7936 MB
                GDDR Technology          : GDDR5
                GDDR Speed               : 5.000000 GT/s
                GDDR Frequency           : 2500000 kHz
                GDDR Voltage             : 1501000 uV
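
With a listing this long, it can help to diff the two device sections mechanically, since one card works and the other does not. A minimal Python sketch, assuming the `key : value` layout shown above; `parse_micinfo` and `diff_devices` are hypothetical helpers written for this example, not part of MPSS, and the sample text is a short excerpt of the output. It only surfaces the differences and does not say which of them, if any, matter for the connection problem:

```python
import re
from collections import defaultdict


def parse_micinfo(text):
    """Return {device_name: {field: value}} from micinfo-style text."""
    devices = defaultdict(dict)
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*Device No:\s*\d+,\s*Device Name:\s*(\S+)", line)
        if header:
            current = header.group(1)
            continue
        # Lines before the first device header (System Info) are skipped.
        if current and ":" in line:
            key, value = line.split(":", 1)
            devices[current][key.strip()] = value.strip()
    return dict(devices)


def diff_devices(devices, a, b):
    """Return fields whose values differ between devices a and b."""
    keys = sorted(set(devices[a]) | set(devices[b]))
    return {
        k: (devices[a].get(k), devices[b].get(k))
        for k in keys
        if devices[a].get(k) != devices[b].get(k)
    }


# Excerpt of the Board sections from the output above.
sample = """\
Device No: 0, Device Name: mic0
    PCIe Width                 : x16
    PCIe Speed                 : 5 GT/s
    PCIe Max payload size : 256 bytes
Device No: 1, Device Name: mic1
    PCIe Width                 : x1
    PCIe Speed                 : 5 GT/s
    PCIe Max payload size    : 128 bytes
"""

differences = diff_devices(parse_micinfo(sample), "mic0", "mic1")
for field, (v0, v1) in differences.items():
    print(f"{field}: mic0={v0}  mic1={v1}")
```

On this excerpt the two cards differ in PCIe Width (x16 on mic0, x1 on mic1) and PCIe Max payload size (256 bytes on mic0, 128 bytes on mic1). Feeding the full output through the same functions would also list the serial number, voltage, and die temperature.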

ib interface on the host:

mic0:ib: flags=67<UP,BROADCAST,RUNNING>  mtu 64512
        inet 192.0.2.100  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 4c:79:ba:20:06:63  txqueuelen 1000  (Ethernet)

ib interface on mic0:

mic0:ib   Link encap:Ethernet  HWaddr 4C:79:BA:20:06:62
          inet addr:192.0.2.101  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:64512  Metric:1

ib interface on mic1:

mic0:ib   Link encap:Ethernet  HWaddr 4C:79:BA:20:06:9C
          inet addr:192.0.2.102  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:64512  Metric:1

 

 

Thanks for helping me out.
