Failed firmware/SMC upgrade - ran C0 stepping update command - did this brick my Phi?

I just attempted to upgrade the firmware and SMC on an Intel Phi from version 375 to version 386. I accidentally ran the upgrade command for C0 stepping units (mine is B1 stepping):

$ sudo /opt/intel/mic/bin/micflash -update -device all
No image path specified - Searching: /opt/intel/mic/flash
mic0: Flash image: /opt/intel/mic/flash/EXT_HP2_B1_0386-03.rom.smc
mic0: Flash update started
mic0: Flash update done
mic0: SMC update started
micflash: mic0: Flash operation timed out

I followed that up trying to run the correct update invocation:

$ sudo /opt/intel/mic/bin/micflash -update -smcbootloader -device all
No image path specified - Searching: /opt/intel/mic/flash
mic0: Flash image: /opt/intel/mic/flash/EXT_HP2_B1_0386-03.rom.smc
mic0: SMC boot-loader image: /opt/intel/mic/flash/EXT_HP2_SMC_Bootloader_1_8_4326.css_ab
mic0: SMC boot-loader update started
micflash: mic0: Flash operation timed out

Upon rebooting the host/resetting the card, the card fails to reset:

mic0: Transition from state resetting to reset failed
MIC 0 RESETFAIL postcode H0 12360

Is it possible to recover from this? And is running the wrong update invocation in fact what would have broken the card?

A little more info - prior to the upgrade attempt, micinfo did return some values, but returned NotAvailable for firmware and SMC versions. This did not change after the failed upgrade, but /sys/class/mic/mic0/flashversion does show 386 now, which seems to indicate my problem is related to the failed SMC update.
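For reference, a loop along these lines prints that sysfs value for every card the driver exposes. It is only a sketch: the `flashversion` path is the one mentioned above, while the function name `show_flash_versions` and the `SYSFS_ROOT` override are made up here so the loop can be tried on a machine without a card.

```shell
# Print the flash version the host driver reports for each card.
# SYSFS_ROOT is a hypothetical override used only for dry runs;
# by default the real /sys/class/mic tree is read.
show_flash_versions() {
    root="${SYSFS_ROOT:-/sys/class/mic}"
    for dev in "$root"/mic*; do
        # Skip entries without a readable flashversion file
        # (this also covers the case where the glob matched nothing).
        [ -r "$dev/flashversion" ] || continue
        printf '%s: %s\n' "$(basename "$dev")" "$(cat "$dev/flashversion")"
    done
}

show_flash_versions
```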

Thanks.
Andy


As a general rule, we do not recommend using micflash with a specific flash version; the MPSS readme instructs you to use the auto-detect capability, which reduces the chances of such failures.

To attempt to fix your coprocessor, try this sequence of steps:

 

1)   Disable MPSS from starting on host boot

$ sudo chkconfig mpss off

2)   Power-cycle the host (i.e. remove the power plug completely for a few seconds, then plug it back in)

3)   Boot host, do not start MPSS

4)   Reset card:

$ sudo micctrl -rw

$ sudo micctrl -s

          (This should show ‘ready’. If it shows ‘failed’, or the host kernel log shows the card hung at some post code, go back to step #2.)

5)   Flash card

$ sudo /opt/intel/mic/bin/micflash -update -smcbootloader -device all

6)   If the above fails, immediately execute micdebug.sh and send back the archive this script creates

$ sudo /opt/intel/mic/bin/micdebug.sh
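If it helps, steps 4 through 6 can be sketched as a small shell script. Treat this as a rough sketch rather than a supported tool: it assumes `micctrl -s` prints one line per card in the form `mic0: ready`, and the names `all_ready`, `recover_card`, `MICCTRL` and `MIC_BIN` are made up for illustration.

```shell
#!/bin/sh
# Sketch of steps 4-6. ASSUMPTION: `micctrl -s` prints one status line per
# card such as "mic0: ready"; adjust the parsing if your output differs.

MICCTRL="${MICCTRL:-micctrl}"            # hypothetical override for dry runs
MIC_BIN="${MIC_BIN:-/opt/intel/mic/bin}" # install path used in the steps above

# Reads status lines on stdin; succeeds only if at least one card is
# listed and every listed card reports the state "ready".
all_ready() {
    awk '
        BEGIN { n = 0; bad = 0 }
        /^mic[0-9]+:/ { n++; if ($2 != "ready") bad++ }
        END { exit (n > 0 && bad == 0) ? 0 : 1 }
    '
}

recover_card() {
    # Step 4: reset the card and wait for the reset to complete.
    sudo "$MICCTRL" -rw || return 1
    if sudo "$MICCTRL" -s | all_ready; then
        # Step 5: flash SMC bootloader, SMC firmware and flash in one go.
        sudo "$MIC_BIN/micflash" -update -smcbootloader -device all && return 0
        # Step 6: flashing failed, so collect the debug archive right away.
        sudo "$MIC_BIN/micdebug.sh"
        return 1
    fi
    echo "card not ready; power-cycle the host (step 2) and retry" >&2
    return 2
}
```

Run `recover_card` only after the host has booted with MPSS disabled (steps 1 through 3). A return value of 2 means the card never reached ‘ready’, so flashing was not attempted.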

 

It’s worth noting that step #5 above uses the newer, safer way to update the SMC bootloader, SMC firmware, and KNC flash all in one go. Once the SMC bootloader has been updated on a card, it does not need to be updated again: omit the ‘-smcbootloader’ option to micflash when updating to later MPSS releases.

 Please let us know how this goes!

thanks

Hi,

Your procedure worked. I had not yet tried actually turning off power, and I think that's what did it for me. The card came up 'ready' after boot and from there the flashing worked as expected.

Thanks!
Andy

Andrew, pardon my interference, but I got curious:

According to what you have quoted in the OP, you did not flash your board with a stepping C0 fw - you flashed it with a B1 stepping fw (as per the automatic flash compatibility procedure). I suspect that if you had actually flashed it with the wrong stepping fw things would've been, erm, 'brickier'.

Hi Martin,

You are correct that I did not specify the firmware - I let the tool do that for me. But I did run a micflash invocation that the documentation specifies is for C0 stepping devices:

6) Run from the command prompt:
user_prompt> sudo /opt/intel/mic/bin/micflash -update -smcbootloader -device all

*NOTE* If using C0 stepping, use this command instead for step 6:
user_prompt> sudo /opt/intel/mic/bin/micflash -update -device all

I ran the second invocation first, by accident. That is what I was referring to.

Andy

Ah, I see now. Thank you for the clarification.
