AMT could not connect to machine after Windows boot error displayed

Hi,

A customer has a Lenovo M93p with AMT 9.1.0. I have set it up for remote KVM and it works fine.

Last week, I restored this system from a Windows backup. While the restore was running, I left the customer site and returned to my office. When I got back to my office, I was able to connect to the machine using the Intel Manageability Commander Tool, start the remote VNC Viewer, watch the restore complete, and click OK to initiate the required reboot. So far so good.

However, after the reboot, the machine did not start properly; it needed a startup repair. I saw error 0xc00000e as white text on a black background. I believe it was at this point that I tried to use Take Control to reboot the machine. Soon I was no longer able to connect remotely to the machine using the Commander Tool. The machine was not even responding to ping. It was as if AMT was no longer pulling an IP address for the machine. I had to drive back on site to complete the startup repair.

Once I completed the startup repair, AMT remote KVM started working again.

Obviously this is the exact situation where you want out-of-band manageability, and the reason to pay a premium for advanced AMT machines. But it failed.

Is there a known issue with controlling a machine that is displaying a boot error? Is there something I could have done to restore connectivity while off-site?

Thanks,

Mark Berry
MCB Systems

 


Hey Mark

In theory, the issue you saw was related to the Master Boot Record, so it should not have been a factor for remote management.

However, your description left me wondering exactly how you were connecting. If you were using only the Commander tool, then you were actually using Serial-over-LAN (SOL). AMT Commander does not have KVM technology integrated by default. That would explain the lack of a visible connection, as the device was possibly showing a GUI-based Windows screen (KVM) rather than a command line (SOL).

As for the loss of ping, it depends on where in the boot cycle the machine stopped. The NIC could have been handed off to the OS drivers, in which case ping responses would follow the rules of the OS at that time.

Was the device by chance provisioned and connected via wireless?

Hi Joseph,

Thank you for your prompt reply.

Yes, probably the MBR, and/or the wrong partition was marked Active.

The initial connection is always via Remote Commander Tool. I have installed self-signed certificates on all managed machines, so I use Remote Commander to initiate the SSL connection, then click Launch Viewer to do KVM using UltraVNC. (Blogged all this here.)

When I say it couldn't connect, I mean that when I clicked on Remote Commander's "Connect" button, it tried for a while ("Abort Connect") then gave up (went back to "Connect"). The button never changed to "Connected", so both the "Take Control" and "Launch Viewer" buttons stayed grayed out. In other words, I never got as far as Serial-over-LAN.

No, this device has no Wi-Fi. It's all through the on-board NIC.

Hmm... are OS network drivers active during a boot error? I don't know, but I agree that could explain no ping.

I realize troubleshooting an issue that is no longer occurring is almost impossible. Just thought I would check if there is a known issue (so I don't drive home next time before I finish!). It occurs to me I should have checked Help > Debug Information while the error was occurring and/or done some packet sniffing. Probably not worth it to me to try to recreate the issue at this point.
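For next time, a quick check that could be run from the office while the failure is occurring: a minimal sketch that tests whether the AMT firmware is still accepting TCP connections at all. It assumes the standard AMT ports (16992/16993 for the web and WS-Management interface, 16994/16995 for redirection, and 5900 only if the standard VNC port was enabled for KVM). The function name and the address in the usage note are hypothetical.

```python
import socket

# Standard Intel AMT TCP ports; 5900 only applies if the
# standard VNC port was enabled for KVM on this machine.
AMT_PORTS = {
    16992: "web UI / WS-Management (HTTP)",
    16993: "web UI / WS-Management (TLS)",
    16994: "redirection: SOL, IDE-R, KVM",
    16995: "redirection over TLS",
    5900: "KVM on standard VNC port",
}


def probe_ports(host, ports, timeout=2.0):
    """Return {port: True/False} for whether a TCP connect succeeded."""
    results = {}
    for port in ports:
        try:
            # A completed TCP handshake means something is listening.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            results[port] = False
    return results
```

Calling `probe_ports("192.168.1.50", AMT_PORTS)` (address made up) and noting which ports answer would at least show whether the firmware had dropped off the network entirely or only the Commander connection was failing.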

Mark Berry
MCB Systems
 

Hey Mark,

In general, the only reason to lose network connectivity during a reboot is the handoff from the firmware to the OS. On a physical LAN this happens in just a few seconds (about 2 failed ping returns); with wireless, this handoff can take a lot longer to commence.

Your usage model according to the blog looks just fine, and as far as I can tell things should have worked as expected. Sounds like we might not figure this one out; if you encounter it again, let us know.

Joe

Thanks Joe. So what triggers the "handoff"? It seems like AMT maybe handed off network control to the "OS" but it wasn't really an OS, just a boot error, so it didn't load a network driver. Is there a flowchart somewhere that explains how the control flow works?

It's tempting to scramble an MBR and see if I could duplicate the problem...

Mark Berry
MCB Systems

Hey Mark

The handoff of the LAN connection is the activation or deactivation of the network driver within the OS. An easy way to see this is to set up a non-terminating ping (ping <ipaddress> -t) to the vPro client and then disable the network interface within the OS. The ping replies will stop briefly (about 2 missed returns) and then resume with a different TTL value.
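If the ping output is saved to a file, the handoff can be picked out afterwards. Here is a minimal sketch that scans ping output lines for a change in TTL and counts the timeouts in between. The function name, the sample address, and the TTL values 255 and 128 are illustrative assumptions, not values confirmed for any particular AMT firmware or OS.

```python
import re

# Matches "TTL=128" (Windows ping) as well as "ttl=64" (Linux ping).
TTL_RE = re.compile(r"TTL=(\d+)", re.IGNORECASE)


def find_handoffs(lines):
    """Return a list of (old_ttl, new_ttl, missed) for each TTL change.

    `missed` is the number of timed-out or unreachable lines seen
    between the last reply with old_ttl and the first with new_ttl.
    """
    handoffs = []
    last_ttl = None
    missed = 0
    for line in lines:
        match = TTL_RE.search(line)
        if match is None:
            lowered = line.lower()
            if "timed out" in lowered or "unreachable" in lowered:
                missed += 1
            continue
        ttl = int(match.group(1))
        if last_ttl is not None and ttl != last_ttl:
            handoffs.append((last_ttl, ttl, missed))
        last_ttl = ttl
        missed = 0
    return handoffs


# Illustrative input only; the address and TTL values are made up.
sample = [
    "Reply from 192.168.1.50: bytes=32 time<1ms TTL=255",
    "Request timed out.",
    "Request timed out.",
    "Reply from 192.168.1.50: bytes=32 time<1ms TTL=128",
]
print(find_handoffs(sample))
```

A long run of timeouts with no reply afterwards, as in Mark's case, would produce no handoff entry at all, which is itself a useful distinction from a normal firmware-to-OS transition.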

The handoff of the wireless connection is a bit more sophisticated and is described here.

As to your specific situation, without any debug logs it is unclear why AMT failed to respond.

If I had to conjecture, it was probably a BIOS issue of some sort in how it handled the handoff. During POST, the BIOS, which temporarily "owns" the NIC, failed to release control because there was a corrupt MBR. If you continue to see this, I would recommend filing a bug report with the OEM to root-cause the issue.

The odd thing is that I saw the boot error in the KVM window the first time (I know because I wrote it down after seeing it). I'm pretty sure it was after I attempted to reboot remotely from the Commander Tool that I lost the KVM and further connections failed. It's like the error screen was displayed while the BIOS was still in control but then it did the handoff--to nothing. I.e. the BIOS failed to _retain_ control since there was no network driver in the bootloader OS.

I will try to remember to test this next time I encounter a boot error.

Thanks for your help and insight,

Mark Berry
MCB Systems

I'm running into a similar problem with an Asus Q170M-C-CSM motherboard. If Windows 10 crashes during boot (which it has been doing consistently, usually during an automated reboot after updates are installed), the remote power controls don't work, even when I've been connected from Commander since boot.

Would there be a way to prevent hand-off of the Ethernet port to Windows? I have multiple network cards, so I would be able to dedicate the internal port to management and use a PCIX port for the OS.

I'll have to open a ticket with Asus maybe if this continues. Coming from a DQ57TM board that was rock solid.

Thanks,

Chris

Hey Chris,

I suspect that your issue isn't with the hand-off, as you seem to be talking about a wired connection. With wireless, link control is an issue between the OS and the firmware.

If you are connected via a wired connection and watching the boot process, and the KVM fails when the OS crashes, it sounds like an OS and board interaction is causing the disconnect. I would put a ticket in with ASUS for them to resolve the Win 10 issue.
