After upgrading a node from MPSS Gold Update 1 to Update 2, I have had issues with the frontend node in our cluster crashing on boot. I tried to downgrade back to Update 1, but the crash still happens.
We have upgraded the compute nodes successfully. They have identical hardware and a bridged network configuration. The frontend has the default configuration in /etc/sysconfig/mic.
The host OS is CentOS 6.3 and the card model is 5110P (B1).
On the host side we get the following error during boot:
micscif_handle_lostnode 1250 node 1
On the MIC I can see the following kernel panic during the early initialization:
[ 0.010000] SFI: Entering sfi_map_memory, phys = eefa0, size = 24
[ 0.010000] SFI: sfi_map_table, th = ffff8800000eefa0
[ 0.010000] SFI: Entering sfi_map_memory, phys = eefa0, size = 312
[ 3.530141] i8042: Can't read CTR while initializing i8042
[ 7.058807] Kernel panic - not syncing: Attempted to kill init!
[ 7.058848] Pid: 1, comm: switch_root Tainted: G W 22.214.171.124-g32944d0 #2
[ 7.058875] Call Trace:
[ 7.058912] [<ffffffff8134e076>] ? panic+0x91/0x18c
[ 7.058944] [<ffffffff81036666>] ? do_exit+0x7b/0x768
[ 7.058971] [<ffffffff81036fcf>] ? do_group_exit+0x6c/0x9f
[ 7.058997] [<ffffffff81037019>] ? __wake_up_parent+0x0/0x28
[ 7.059028] [<ffffffff81002aab>] ? system_call_fastpath+0x16/0x1b
[ 7.070124] mic_shutdown: system state 57005 dbreg 0x8000dead