mce: [Hardware Error]: Machine check events logged

Hello,

I have a custom board (RC10) that uses an E3845 and is similar to the MinnowBoard MAX. I customized the MinnowBoard MAX firmware from Intel Firmware Engine for the RC10 by enabling i2c-0, PCIe-2, etc. When the Linux system boots, it shows "mce: [Hardware Error]: Machine check events logged" 300 seconds after boot.

1. Since the original configuration came from the MinnowBoard MAX, which uses an E3825, the mce error might come from that difference. If so, how can I change the processor to E3845?

2. Other than #1, I don't have any idea where the mce error comes from. Is there any way to track it down by disabling hardware components (e.g. PCIE-0)?

 

We'd like to get the log of the machine check exception to figure out what's going on.

On Linux systems, you should be able to get this using mcelog - http://mcelog.org/

As an example you can install this on Ubuntu/Debian using apt-get:

sudo apt-get install mcelog

The events will be logged to /var/log/mcelog. You can also run:

sudo mcelog --client

to query the mcelog daemon for errors.
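Once records have been written, the register values can be pulled out of the text log for later decoding. The following is a minimal Python sketch, not part of mcelog itself. It assumes each record contains a line of the form `STATUS <hex> MCGSTATUS <hex>`, and the helper name `extract_status_values` is hypothetical.

```python
import re

# Assumed record line format: "STATUS <hex> MCGSTATUS <hex>".
STATUS_LINE = re.compile(
    r"^\s*STATUS\s+([0-9a-fA-F]+)\s+MCGSTATUS\s+([0-9a-fA-F]+)",
    re.MULTILINE,
)


def extract_status_values(log_text):
    """Return a list of (status, mcgstatus) integer pairs found in log_text."""
    return [(int(s, 16), int(m, 16)) for s, m in STATUS_LINE.findall(log_text)]


# Example use (reading the log usually requires root):
#   with open("/var/log/mcelog") as f:
#       print(extract_status_values(f.read()))
```

If the log on your system uses a different layout, the regular expression would need adjusting.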

Hello Brian,

Here is the output of mcelog --client:

mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
mcelog: Family 6 Model 37 CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 0
ADDR fef80000
TIME 978536917 Wed Jan  3 10:48:37 2001
MCG status:
MCi status:
Uncorrected error
MCi_ADDR register valid
Processor context corrupt
MCA: Internal unclassified error: 410
Running trigger `unknown-error-trigger'
STATUS a600000007600410 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 55
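For anyone reading along, the STATUS value in this output can be decoded by hand to cross-check what mcelog printed. The sketch below uses the architectural bit layout of IA32_MCi_STATUS from the Intel SDM (flag bits 63 to 57, model-specific error code in bits 31:16, MCA error code in bits 15:0). The function name `decode_mci_status` is made up for illustration, and the sketch ignores the remaining fields of the register.

```python
# Flag bits of IA32_MCi_STATUS (architectural layout, Intel SDM).
FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its set flags and error codes."""
    set_flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": set_flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


decoded = decode_mci_status(0xA600000007600410)
for flag in decoded["flags"]:
    print(flag)
print(f"MCA error code: {decoded['mca_error_code']:#06x}")
print(f"Model-specific error code: {decoded['model_specific_code']:#06x}")
```

For the value above, the set flags are VAL, UC, ADDRV and PCC, which agrees with the "Uncorrected error", "MCi_ADDR register valid" and "Processor context corrupt" lines, and the MCA error code is 0x0410, the "Internal unclassified error: 410" that mcelog reports.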

Thanks Jong. We'll investigate this and let you know what we find.

Jong: did you try to enable ECC memory on your board?

Hello Brian,

Unfortunately, we don't have ECC (E3845 - DRAM1_DQ[56..x] aka DRAM0_ECC_DQ[0..x]) in the RC10 board design. We didn't think it was necessary.

Are you recommending that we add ECC to the RC10 board design? Do you think the MCE message comes from memory?

 

No, I don't recommend using ECC with this project. It wasn't a feature enabled on the MinnowBoard MAX. I'm just trying to rule it out as a problem. Thanks for the information.

Hello Brian,

For your information, I tried the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max on both the RC10 and the MinnowBoard MAX. Neither showed the mce error. I think the firmware built with the Intel Firmware Engine has a problem. What do you think?

We're investigating the 0.84 codebase differences already. There may be some delay on our end due to the Christmas holiday, but I'll keep you posted. Thanks.

Hello,

Is there any update? Thank you.

What kind of Linux did you try? Yocto?

It's Debian 8 Jessie. As I mentioned previously, the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max didn't show the mce error with Debian 8.

I can reproduce it on Ubuntu and Yocto. After debugging, I found that this machine check error actually happens during BIOS POST. It is a minor issue, not a critical error: it happens only once and will not affect the OS afterwards, so you can ignore it for now. The root cause has been found, and we are going to fix this bug in a later release.

Hi,

Would it be possible to send you a release candidate so you can check whether the issue still occurs?

 

Hello Laurie,

Yes, I can try.

Can you send an email to Firmware_Engine@intel.com so I can give you instructions for downloading a pre-release for testing?
