Last KNC with 7110P


Pierre L.

Greetings,

I am running a mixed configuration of 7110P and 5110P coprocessors. Updating the 5110Ps with the most recent KNC_Gold release was simple and worked like a dream. However, after installing the same MPSS on the 7110Ps, they all stopped working. The environment is based on SLES 11 SP2, with only a Gigabit Ethernet switch (no InfiniBand, so no OFED needed).

Has anyone experienced the same? Any idea how to undo the update and go back to the previous stable software stack?

Thanks in advance for suggestions and help,

Pierre.

Sumedh Naik (Intel)

Hi Pierre, 

Could you please elaborate on what exactly you mean by "they all stopped working". 

I faced a similar problem after updating my MPSS. It was resolved when I reset the configuration files using "micctrl -initdefaults". 

-Sumedh

Sumedh Naik (Intel)

Hi Pierre, 

I made a small mistake. I meant to say "micctrl -resetconfig", not "micctrl -initdefaults". 

-Sumedh

Pierre Lagier

Hi Sumedh,

Here is exactly what we experienced. Installing KNC_gold_update_1-2.1.4982-15-suse-11.2 on the 5110P MICs was straightforward: performance is OK and they were back to normal operation very quickly. On the 7110P side, everything was also OK during the initial setup and the bootprom and flash updates. However, after the last reboot following the second micflash, the two 7110P MICs no longer returned to the ready state. After a full power-off (power cable unplugged) and a reboot of the machine, the MICs came back only once, i.e. micctrl reported a ready state, but after that the MPSS start command ended with a timeout. From then on, there was no way to reset the 7110Ps and get them to a ready state other than halting the machine, powering it off and rebooting, and even then MPSS start still ends with a timeout.

I would like to go back to KNC_gold-2.1.4346-16-suse-11.2, but since there is no bootloader update for that version, I wonder whether it is compatible with the bootloader installed by the latest KNC release.

Cheers, Pierre.

Sumedh Naik (Intel)

Hi Pierre, 

The older MPSS version should work with the updated bootloader, so it should be safe to revert to the Gold MPSS (KNC_gold-2.1.4346).

However, I am still unsure of what is causing the coprocessors to freeze. I'll investigate this issue further and see what I can find.

-Sumedh 

Pierre L.

Hi Sumedh,

In case it helps: what we noticed is that after a full reboot from power-off, the 7110Ps are OK and work nicely until the first MPSS stop. After that, any micctrl - -w command returns "mic0: reset failed". Checking with micinfo, we get:

MicInfo Utility Log

Created Fri Feb 22 10:24:34 2013

    System Info
        Host OS                  : Linux
        OS Version               : 3.0.13-0.27-default
        Driver Version           : NotAvailable
        MPSS Version             : 2.1.4982-15
        Host Physical Memory     : 65944 MB
        CPU Family               : GenuineIntel  Family 6  Model 45  Stepping 7
        CPU Speed                : 1200.000
        Threads per Core         : 1

Device No: 0,  Device Name: Intel(R) Xeon Phi(TM) Coprocessor

    Version
        Flash Version            : NotAvailable
        UOS Version              : NotAvailable
        Device Serial Number     : NotAvailable

    Board
        Vendor ID                : 8086
        Device ID                : 225c
        SubSystem ID             : 2500
        Coprocessor Stepping ID  : f
        PCIe Width               : x0
        PCIe Speed               : Invalid Link Speed
        PCIe Max payload size    : 16384 bytes
        PCIe Max read req size   : 16384 bytes
        Coprocessor Model        : 0x0f
        Coprocessor Model Ext    : 0x0f
        Coprocessor Type         : 0x03
        Coprocessor Family       : 0x0f
        Coprocessor Family Ext   : 0xff
        Coprocessor Stepping     : Undefined
        Board SKU                : NotAvailable
        ECC Mode                 : NotAvailable
        SMC HW Revision          : NotAvailable
...

miccheck 2.1.4982-15, created 05:25:08 Dec 17 2012
Copyright 2011-2012 Intel Corporation  All rights reserved

Test 1 Ensure installation matches manifest : OK
Test 2 Ensure host driver is loaded         : OK
Test 3 Ensure driver matches manifest       : OK
Test 4 Detect all listed devices            : OK
MIC 0 Test  1 Find the device                       : OK
MIC 0 Test  2 Read device configuration file        : OK
MIC 0 Test  3 Ensure IP address is unique           : OK
MIC 0 Test  4 Ensure MAC address is unique          : OK
MIC 0 Test  5 Check the POST code via PCI           : FAILED
MIC 0 Test  5> Current POST code is �� (not FF) for MIC 0
MIC 0 Test  6 Ping the device                       : SKIPPED
MIC 0 Test  6> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test  6>  The device is not online
MIC 0 Test  7 Connect to the device                 : SKIPPED
MIC 0 Test  7> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test  7>  The device is not online
MIC 0 Test  8 Check for normal mode                 : SKIPPED
MIC 0 Test  8> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test  8>  The device is not online
MIC 0 Test  9 Check the POST code via SCIF          : SKIPPED
MIC 0 Test  9> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test  9>  The device is not online
MIC 0 Test 10 Send data to the device               : SKIPPED
MIC 0 Test 10> Prerequisite 'Check for normal mode' failed:
MIC 0 Test 10>  The device is not in normal mode
MIC 0 Test 11 Compare the PCI configuration         : OK
MIC 0 Test 12 Ensure Flash version matches manifest : SKIPPED
MIC 0 Test 12> Prerequisite 'Check for normal mode' failed:
MIC 0 Test 12>  The device is not in normal mode
MIC 0 Test 13 Ping the host                         : SKIPPED
MIC 0 Test 13> Prerequisite 'Check for normal mode' failed:
MIC 0 Test 13>  The device is not in normal mode
Status: The POST code was not "FF"

Cheers,

Pierre.

Pierre L.

Hi Sumedh,

I just reverted to KNC_gold-2.1.4346 and everything is OK: no more problems with any of my 7110P MICs. There is definitely something wrong with the latest KNC version on my hardware here. The correct micinfo output is as follows:

MicInfo Utility Log

Created Fri Feb 22 16:37:25 2013

    System Info
        Host OS                 : Linux
        OS Version              : 3.0.13-0.27-default
        Driver Version          : 4346-16
        MPSS Version            : 2.1.4346-16
        Host Physical Memory    : 65944 MB
        CPU Family              :  GenuineIntel  Family  6  Model  45  Stepping  7
        CPU Speed               :  1200.000
        Threads per Core        : 1

Device No: 0,  Device Name: Intel(R) Xeon Phi(TM) coprocessor

    Version
        Flash Version           : 2.1.01.0375
        UOS Version             : 2.6.34.11-g65c0cd9
        Device Serial Number    : ADKC24100302

    Board
        Vendor ID                  : 8086
        Device ID                  : 225c
        SubSystem ID               : 2500
        MIC Processor Stepping ID  : 3
        PCIe Width                 : x16
        PCIe Speed                 : 5 GT/s
        PCIe Max payload size      : 256 bytes
        PCIe Max read req size     : 512 bytes
        MIC Processor Model        : 0x01
        MIC Processor Model Ext    : 0x00
        MIC Processor Type         : 0x00
        MIC Processor Family       : 0x0b
        MIC Processor Family Ext   : 0x00
        MIC Silicon Stepping       : B1
        Board SKU                  : B1QS-7110P
        ECC Mode                   : Enabled
        SMC HW Revision            : Product 300W Passive CS

    Core
        Total No of Active Cores: 61
        Voltage                 : 977000 uV
        Frequency               : 800000 kHz

    Thermal
        Fan Speed Control       : N/A
        SMC Firmware Version    : 1.6.3983
        FSC Strap               : 14 MHz
        Fan RPM                 : N/A
        Fan PWM                 : N/A
        Die Temp                : 130 C

    GDDR
        GDDR Vendor             : Elpida
        GDDR Version            : 0x1
        GDDR Density            : 2048 Mb
        GDDR Size               : 7936 MB
        GDDR Technology         : GDDR5
        GDDR Speed              : 5.500000 GT/s
        GDDR Frequency          : 2750000 kHz
        GDDR Voltage            : 1000000 uV

Let us know if you find anything. I guess I'm the only one facing this issue.

Cheers, Pierre.
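Putting the failing and working micinfo logs side by side is the quickest way to see what changed (for example PCIe Width x0 versus x16). The sketch below is a hypothetical helper, not an MPSS tool: it assumes a single-device log in which every field is an indented "Key : Value" line, and it only compares keys that appear in both logs, so fields renamed between MPSS releases (such as "Coprocessor Stepping ID" versus "MIC Processor Stepping ID") are ignored.

```python
import re

# Hypothetical helper, not part of MPSS. Assumes a single-device micinfo log
# whose fields are indented "Key : Value" lines; unindented lines such as
# "Created ..." or "Device No: 0, ..." are skipped.
FIELD_LINE = re.compile(r"^\s+(\S.*?)\s*:\s*(.*)$")


def parse_micinfo(text):
    """Return {key: value} for the indented fields, with whitespace normalized."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            key, value = match.groups()
            # micinfo pads some values with extra spaces; collapse them.
            fields[key] = " ".join(value.split())
    return fields


def diff_micinfo(bad_log, good_log):
    """Return {key: (bad_value, good_value)} for shared keys whose values differ."""
    bad, good = parse_micinfo(bad_log), parse_micinfo(good_log)
    return {key: (bad[key], good[key])
            for key in bad.keys() & good.keys()
            if bad[key] != good[key]}
```

On the two logs in this thread, fields such as Driver Version, Flash Version, PCIe Width and PCIe Speed would show up as differences, while CPU Family would not, since only its spacing differs.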
