Xeon Phi causing PC to freeze

For some time now, my PC has been freezing periodically, about once every 2 days. The screen looks normal, but the mouse, keyboard etc. do not respond and I have to hold the power button in to restart.

By process of elimination, the culprit appears to be the Xeon Phi 3120A card - I just took the card out and ran the PC for a month without a freezing incident.
Put it back in and it froze the same day.

I have tried it in both possible expansion slots - froze with both.

It is not happening when the card is in use, mostly it happens when the PC is idle and I come back and find it frozen, so it is nothing to do with any Xeon Phi activity.

The motherboard is an Asus Z9 PE-D8 WS with 2 Xeon E5-2655 v2 processors.
It has the latest BIOS.

The OS is Windows 8.1 Pro.

There is also an NVidia NVS295 video card in slot 1.

I have taken all other cards out (part of my process of elimination).

The Xeon Phi works fine (when the system is not frozen).

What can be causing this and what can I do about it?

Many thanks

Roger

 


Have you installed any MS updates at the time you started to see the freezing?
Have you installed any video card updates at the time you started to see the freezing?

You may have an issue with screensaver and/or hibernation and/or low power state management.

Try setting power management to highest performance (iow no shut down of devices) and disabling screensaver and hibernation.

Jim Dempsey

www.quickthreadprogramming.com

Hi Jim,

Thanks for your thoughts.

No MS or video card updates since reinstalling the card (i.e. Saturday morning).

It first started doing this about 3 months ago, but it took a long time to narrow the problem down.
That was when I first bought the card and built the PC - i.e. the card has always done this.
There have been many updates over that period.

Windows power management has been disabled throughout (great minds think alike) and so there is no screensaver/hibernation/low power state management that I am aware of.

best regards

Roger


I have an ASUS P9X79 WS with one Xeon E5-2620, and two Xeon Phi 5510P coprocessors. While I can dual boot to CentOS and Windows 7 Pro x64, I usually run CentOS. Neither OS exhibited a periodic hang. I think the last time I booted my system was 3-4 months ago.

This may be a Windows 8.1 issue.

I have an Ultrabook with Windows 8.1 and got tired of the touchpad, so I added a Bluetooth mouse. Periodically the O/S would kill the mouse... while I was using the system. It took a while to figure out that when I was just using the system (e.g. typing in content or reading using Page Down), the O/S would assume I was no longer using the Bluetooth mouse... so it shut it down. Apparently when it's down, it won't see the mouse move. I turned that off and that fixed that issue.

Because your system works while you are using your Xeon Phi, this is indicative of power management gone wrong. Devices have power management (the Bluetooth controller in my case), and hard disks have a sleep mode too. If I were to guess, Windows 8.1 is trying to power manage what it thinks is a video adapter.

There is a new MPSS driver (I think it came out this month); you might give that a try. I haven't installed it as my system does not crash.

Jim Dempsey

 


Thanks Jim,

I will certainly try the new MPSS.

Windows 8.1 is a pain, but as there is no upgrade path to Windows 7, I find the complete reinstallation of everything that changing would entail very daunting. Perhaps 8.2 will be better (i.e. more like 7).

I am not clear why you see the power management as the likely cause. I had assumed that with Windows Power service disabled, it would not be trying to manage anything much. Incidentally, a blue light on the back of the Xeon Phi flashes even when the system is frozen, so it looks like it is alive.

Another possibility is presumably that the Xeon Phi is defective, but unfortunately I do not have another PC with a suitable double-width port.
I am not sure how I could determine this. 
Alternatively, the motherboard could also be defective or incompatible with this Xeon Phi model although again, I am not sure how I could tell.

Best regards

Roger

 

I use Windows Server 2012, which is very similar to Windows 8, but I didn't see this problem. I will try installing MPSS on Windows 8.1 and let you know. Thank you.

>>>By process of elimination, the culprit appears to be the Xeon Phi 3120A card - I just took the card out and ran the PC for a month without a freezing incident.>>>

Looks like a kernel issue. Can you check Event Viewer for any Kernel-Power (PnP) related issues?
 

Thanks for all your comments,

@iliyapolak  yes, I do see a Kernel Power error ID 41, without a bugcheck code.
From the time, it looks like the moment I held the power button in for several seconds to escape from the freeze, rather than the moment it froze, but I am not absolutely sure.
If it happens again, I will double-check.

 @loc-nguyen thanks for your efforts.
In case it is relevant and you have a choice, I am using Windows 8.1 Pro x64 in English.

Best regards

Roger

>>>@iliyapolak  yes, I do see a Kernel Power error ID 41, without a bugcheck code>>>

The system probably did not call the KeBugCheckEx function, either because it was hung or because the issue was not regarded as a critical one.

Can you post an error description?

RE: I am not clear why you see the power management as the likely cause. I had assumed that with Windows Power service disabled, it would not be trying to manage anything much

1> It is not happening when the card is in use,
2> mostly it happens when the PC is idle and I come back and find it frozen

The above can be caused by power management.

Additional possibility could be something scheduled to run when system appears to be idle. This could be a Windows 8.1 thing or some addon (e.g. background scan).

Describe what you have when you "find it frozen"

Mouse dead, keyboard dead, screen black, screen in screensaver mode, disk dead...

Jim Dempsey

 


@iliyapolak  This is what I see in the Windows System error description.

The error is flagged as "Critical"

 

- System 

  - Provider 

   [ Name]  Microsoft-Windows-Kernel-Power 
   [ Guid]  {331C3B3A-2005-44C2-AC5E-77220C37D6B4} 
 
   EventID 41 
 
   Version 3 
 
   Level 1 
 
   Task 63 
 
   Opcode 0 
 
   Keywords 0x8000000000000002 
 
  - TimeCreated 

   [ SystemTime]  2014-03-30T10:10:13.125119700Z 
 
   EventRecordID 29315 
 
   Correlation 
 
  - Execution 

   [ ProcessID]  4 
   [ ThreadID]  8 
 
   Channel System 
 
   Computer AsusDesktop 
 
  - Security 

   [ UserID]  S-1-5-18 
 

- EventData 

  BugcheckCode 0 
  BugcheckParameter1 0x0 
  BugcheckParameter2 0x0 
  BugcheckParameter3 0x0 
  BugcheckParameter4 0x0 
  SleepInProgress 0 
  PowerButtonTimestamp 0 
  BootAppStatus 0 
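
For anyone reading a dump like the one above, here is a minimal Python sketch of how the EventData fields might be read. It follows my understanding of the three scenarios in Microsoft KB 2028504 (nonzero BugcheckCode, nonzero PowerButtonTimestamp, or both zero). The function name and the returned labels are made up for illustration, and a zero PowerButtonTimestamp on a desktop may not rule out a button press, since that field is reportedly only filled in on some hardware.

```python
def classify_event_41(event_data):
    """Rough reading of a Kernel-Power event 41 EventData block.

    event_data maps field names to values as shown in Event Viewer,
    e.g. {"BugcheckCode": "0", "PowerButtonTimestamp": "0"}.
    The labels returned here are illustrative, not official terms.
    """
    # Values may be decimal ("0") or hexadecimal ("0x0"); base 0 accepts both.
    bugcheck = int(str(event_data.get("BugcheckCode", 0)), 0)
    power_button = int(str(event_data.get("PowerButtonTimestamp", 0)), 0)

    if bugcheck != 0:
        return "stop error (bugcheck) before restart"
    if power_button != 0:
        return "power button held down"
    return "hard hang or power loss (nothing recorded)"


# Fields copied from the event posted above.
posted = {"BugcheckCode": "0", "PowerButtonTimestamp": "0", "SleepInProgress": "0"}
print(classify_event_41(posted))
```

With the values posted above, both fields are zero, so the sketch falls through to the last case: Windows recorded neither a stop error nor a button press.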

@jimdempseyatthecove  

OK thanks.

I saw it more as a matter of probability that it didn't happen when the card was being used and did when the machine was idle, as that is the normal state:

- the card has barely ever been in use, as I am just now starting to test it and learn about it. None of my existing code is adapted to use it yet, so it has had little opportunity to fail during use.

- it has frozen sometimes when I was using the PC actively (e.g. in Excel, e-mail etc.), but the PC spends more of its day being ignored than used, especially as I have left it running deliberately to see whether moving the card between slots made a difference etc.

Still that doesn't mean that it isn't a power issue.

 

By frozen, I mean mouse dead, keyboard dead, but screen looks normal showing whatever it was doing before. So it could just be all USBs disabled and the CPUs etc. running fine.

Updated: on reflection, my last sentence is wrong as I have seen it freeze with real-time monitors (CPU temp etc.) running and they froze too.

My experience with Win8.x shows even worse behavior than previous versions when left to time out.  I've kept my MIC usage to linux, and I go back to Win7 when building gcc and running cyberbass.

There's been some talk of Intel trying to build up Android as a serious contender to Windows in the tablet arena. Most Win8 platforms were designed to exclude alternatives, and I'm sure I'm not the only one to regret this.

If Android has any effect on the development of MIC support for consumer-oriented platforms, it is more likely to be negative due to taking away resources.  

I haven't noticed any direct effort from Intel to support MIC on other than dual Xeon CPU servers.  If the market for single CPU desktop servers were bigger, MIC might make more sense, as the relative benefit could be much larger.  I still have the original core I7 desktop running linux and Win7, with the MIC card no longer supported.  It has burned through 2 video cards as well, so it's in the scrap heap era as far as anything which would have current hardware support.

@Roger567

I think that Kernel-Power PnP event message is a direct response to the unexpected shutdown of the computer.

http://support.microsoft.com/kb/2028504#method1

>>>It is not happening when the card is in use, mostly it happens when the PC is idle and I come back and find it frozen, so it is nothing to do with any Xeon Phi activity.>>>

Looks very strange and, as Jim hinted, it could be a power management issue. In general, a system freeze can be due to the IRQL staying at DIRQL or above for a prolonged time, or due to ISR/DPC routine(s) running endlessly. In such a situation a DPC_WATCHDOG_VIOLATION BSOD will usually be generated to bring down the system. In your case there is no such BSOD, so I suppose that the IRQL stayed elevated at DIRQL and/or kernel mode code entered an infinite loop.

*DIRQL - Device IRQL.

The problem is also that you cannot create a kernel mode dump because of the system hang. I would advise you to create a kernel mode dump while the Xeon Phi is idle, using the tool called NotMyFault, and perhaps contact Intel Premier Support to send them that dump if possible.

@iliyapolak  
Re: >>> I think that Kernel-Power PnP event message is a direct response to the unexpected shutdown of the computer <<<

It may be, but I don't know how you can be so sure.

Wouldn't scenario 2 from your link also explain the Kernel error?
I did hold down the power button to recover.

It hasn't frozen for a couple of days. Next time it does (if it does), my plan is to note the precise time I press the button and see if the Kernel error (if there is one) occurs exactly at or some time before the button press.
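
A minimal Python sketch of that comparison, assuming the button-press time is noted by hand in UTC and the event time is copied from the SystemTime field in Event Viewer's XML view (which is in UTC, with more fractional digits than Python's datetime keeps). The function names are hypothetical. One caveat: as far as I know, event 41 is written during the boot that follows an unclean shutdown, so a timestamp shortly after the press would be the expected result, not evidence either way.

```python
from datetime import datetime, timezone


def parse_system_time(text):
    """Parse a SystemTime string such as 2014-03-30T10:10:13.125119700Z (UTC)."""
    body = text.rstrip("Z")
    if "." in body:
        whole, frac = body.split(".")
        frac = (frac + "000000")[:6]  # datetime stores microseconds only
    else:
        whole, frac = body, "000000"
    stamp = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return stamp.replace(microsecond=int(frac), tzinfo=timezone.utc)


def seconds_before_press(event_time_text, button_press_utc):
    """Positive: event stamped before the press. Negative: stamped after it."""
    delta = button_press_utc - parse_system_time(event_time_text)
    return delta.total_seconds()


# Hypothetical press time, noted by hand; the event time is the one posted earlier.
press = datetime(2014, 3, 30, 10, 10, 0, tzinfo=timezone.utc)
print(seconds_before_press("2014-03-30T10:10:13.125119700Z", press))
```

A clearly positive result (event logged well before the press) would be the interesting case, since it would suggest something was recorded before the manual power-off.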

If it is clear that a Kernel error precedes a freeze, I will try Premier Support and I have downloaded NotMyFault in anticipation, but for now I am trying to keep an open mind.

If I am really really lucky, Jim's first suggestion of updating MPSS has already fixed it, but with an average frequency of 1 freeze per 2 days, it will take some time to be confident.

 

Hi Roger,

Just for a quick update, I installed Windows 8.1 Pro x64 English on my server, then MPSS 3.2 followed by Visual Studio 2012, and Intel Cluster Studio XE 2013 SP1 Update 1 (which includes Composer 2013 SP1 update 2).

So far, I ran the MIC sample C code shipped with the composer successfully. And the system is still up for many hours without any problem. 

Thanks for your efforts.

I really hope it keeps working.

To discover that it is a fundamental incompatibility with Windows 8 and to have to completely reinstall everything would be a nightmare scenario for me.

>>>It may be, but I don't know how you can be so sure.

Wouldn't scenario 2 from your link also explain the Kernel error?

I did hold down the power button to recover>>>

I do not think that Kernel-Power ID:41 is directly related to the Xeon Phi hang. I can only theorize that event ID:41 is related to the scenario where you pressed the power button (Scenario #2 in that link).

 

 

It has now run for 5 days without a freeze, which I reckon only has an 8% probability if it isn't fixed.
So not conclusively mission accomplished, but looking very good.
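
For reference, the 8% figure is what you get if freezes are treated as independent events arriving at a constant average rate (a Poisson assumption, which the thread does not state explicitly). A short Python sketch, with a hypothetical function name:

```python
import math


def prob_no_freeze(days, mean_days_between_freezes):
    """Chance of zero freezes in `days`, assuming a constant-rate Poisson process."""
    rate_per_day = 1.0 / mean_days_between_freezes
    return math.exp(-rate_per_day * days)


# One freeze per 2 days on average, 5 days observed without a freeze.
p = prob_no_freeze(5, 2)
print(f"{p:.1%}")
```

With one freeze per 2 days and 5 quiet days, this is exp(-2.5), roughly 0.082. Each further quiet day multiplies that by exp(-0.5), about 0.61, so confidence builds fairly quickly.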

The only substantive change was the MPSS update to 3.2.

@loc-nguyen did your test machine with Windows 8.1 continue to behave nicely?

Hi Roger,

Sorry for my late response. I haven't had any problem since I tested the system with Windows 8.1. The system still behaves properly even though  I did not work with the coprocessor for a while.

Thanks for your efforts.

A while ago the motherboard died and I don't know if the coprocessor was the culprit (it had previously worked until I installed the Xeon Phi).

The replacement motherboard has run without problems (and without the coprocessor) for some months now and some day I hope to have the courage (and enough time to recover if it fails again) to put the coprocessor back in.

Is your power supply sufficient for all the "stuff" you have?

If it is marginal, and the voltages drop a bit, then things can overheat.

Also, verify that the cooling is not an issue. Most motherboards now have a decent monitor that looks at the various temperature sensors and voltage sensors and fans. It wouldn't hurt you to keep this running, and in view, while you experiment with the Xeon Phi installed.

Also run the MPSS utility that monitors the Xeon Phi.

Jim Dempsey


Hi Jim,

>>>Is your power supply sufficient for all the "stuff" you have?>>>

It should be.

The power supply is 1500W and the output seems rock steady. The power supply manufacturer (EVGA) has its own monitoring utility and a USB connection to the MB. It mostly runs <30% capacity.

Cooling was also something I considered carefully in the build (big case, loads of fans, big CPU fans) and everything I can measure is fine. CPU temp is mostly <50C except when running intensively.

Originally I had in mind that I might get 3 of these coprocessors and so went overboard with the power supply and cooling, but I have never tried it with more than one, so it shouldn't be taxed.

The only issue I've had with blown cards is the video card. I am not into gaming but I do use dual monitors. I chose the EVGA GeForce GT 610. The choice was primarily due to it being only a single height card and I wanted to stick it into the bottom slot. Under the bottom slot on my motherboard are fan headers that get in the way of a double height card. I do have room between the two Xeon Phis I have installed to insert a high end video card, but I have no need for such a card. The original card (same model) lasted 9 months. It is a $50 card, not worth the paperwork to file for a refund.

I do not use Windows 8.1 other than on a notebook I have. It is rumored the next Windows 8.2 upgrade will bring back the 7.x desktop (for desktop users). http://www.geek.com/microsoft/windows-8-2-could-bring-back-the-start-menu-this-august-1592050/

It may also be a re-dressed W7. Hopefully, you will also get back some of the controls and functionality they threw out. One of my personal gripes is from my Windows 7 desktop, with 4K monitor, that I cannot Remote Desktop connect to the Windows 8.1 Lenovo Yoga. Luckily VNC works. (My Xeon Phi's are in a different system.)

Jim Dempsey


ooops.
