Kernel Panic on MIC boot

After upgrading a node from MPSS Gold Update 1 to Update 2, the frontend node in our cluster crashes on boot. I tried downgrading back to Update 1, but the crash still happens.

We have upgraded the compute nodes successfully. They have identical hardware and a bridged network configuration. The frontend has the default configuration in /etc/sysconfig/mic.

The host OS is CentOS 6.3 and the card model is 5110P (B1).

On the host side we get the following error during boot:

micscif_handle_lostnode 1250 node 1

On the MIC I can see the following kernel panic during the early initialization:

[    0.010000] SFI: Entering sfi_map_memory, phys = eefa0, size = 24
[    0.010000] SFI: sfi_map_table, th = ffff8800000eefa0
[    0.010000] SFI: Entering sfi_map_memory, phys = eefa0, size = 312
[    3.530141] i8042: Can't read CTR while initializing i8042
[    7.058807] Kernel panic - not syncing: Attempted to kill init!
[    7.058848] Pid: 1, comm: switch_root Tainted: G        W #2
[    7.058875] Call Trace:
[    7.058912]  [<ffffffff8134e076>] ? panic+0x91/0x18c
[    7.058944]  [<ffffffff81036666>] ? do_exit+0x7b/0x768
[    7.058971]  [<ffffffff81036fcf>] ? do_group_exit+0x6c/0x9f
[    7.058997]  [<ffffffff81037019>] ? __wake_up_parent+0x0/0x28
[    7.059028]  [<ffffffff81002aab>] ? system_call_fastpath+0x16/0x1b
[    7.070124] mic_shutdown: system state 57005 dbreg 0x8000dead
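
For anyone comparing logs across several cards, here is a minimal sketch that pulls the panic reason and the shutdown codes out of a console log like the one above. The function name `summarize_console_log` and the regular expressions are my own, written only against the lines quoted in this post; other MPSS versions may format these messages differently. One thing it makes visible is that the decimal system state 57005 is 0xdead, the same low 16 bits as the dbreg value. I do not know whether that pairing has a documented meaning.

```python
import re

# Patterns are written against the log lines quoted in this post only;
# they are assumptions, not an official MPSS log format.
PANIC_RE = re.compile(r"\[\s*(\d+\.\d+)\] Kernel panic - not syncing: (.*)")
SHUTDOWN_RE = re.compile(r"mic_shutdown: system state (\d+) dbreg (0x[0-9a-fA-F]+)")


def summarize_console_log(text):
    """Return the panic time/reason and shutdown codes found in a console log."""
    summary = {}
    for line in text.splitlines():
        m = PANIC_RE.search(line)
        if m:
            summary["panic_time"] = float(m.group(1))
            summary["panic_reason"] = m.group(2).strip()
        m = SHUTDOWN_RE.search(line)
        if m:
            summary["system_state"] = int(m.group(1))   # printed in decimal
            summary["dbreg"] = int(m.group(2), 16)      # printed in hex
    return summary


# Two lines copied from the log above.
SAMPLE = """\
[    7.058807] Kernel panic - not syncing: Attempted to kill init!
[    7.070124] mic_shutdown: system state 57005 dbreg 0x8000dead
"""

if __name__ == "__main__":
    info = summarize_console_log(SAMPLE)
    print(info["panic_reason"])
    print(hex(info["system_state"]), hex(info["dbreg"]))
```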

I removed all the persistent MIC-related directories that the RPM removal had not cleaned up (/etc/sysconfig/mic, /opt/intel/mic). I also noticed that the OFED drivers were missing and installed them.

It seems that one of these two actions helped, and the card now boots normally again.
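
If you want to check a host for the same leftovers before reinstalling, a small sketch along these lines may help. It only reports which of the two directories named above still exist and deletes nothing. The helper name `find_leftovers` is hypothetical, and the path list is just what I found on my system, so adjust it for yours. It does not check for the OFED drivers.

```python
import os

# Directories that survived the RPM removal in this post; this list is an
# assumption for other installs and may need adjusting.
LEFTOVER_PATHS = ["/etc/sysconfig/mic", "/opt/intel/mic"]


def find_leftovers(paths=LEFTOVER_PATHS, exists=os.path.exists):
    """Return the subset of paths that still exist on this host."""
    return [p for p in paths if exists(p)]


if __name__ == "__main__":
    # Report only; removing the directories is left to the administrator.
    for path in find_leftovers():
        print("still present:", path)
```

Run it after uninstalling the MPSS RPMs; anything it prints is a candidate for manual cleanup before the reinstall.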