compute-1-12 has hardware errors
June 26, 2014 by admin
I’m getting kernel: [Hardware Error]: Combined Unit Error: VB Data/ECC error
It could be the CPU, memory, or motherboard. This thread describes almost exactly the same hardware we’ve got: http://ubuntuforums.org/showthread.php?t=2010489 and the poster didn’t have bad memory. Those motherboards have been really lousy. We’ve got two coming back from RMA soon, and I’ve got a couple of spare CPUs sitting around that came back from RMA, so I could swap in one or the other.
June 30th, 2014 at 10:43 am
[root@compute-1-12 ~]#
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel: [Hardware Error]: CPU:4 MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc44c0c000040136
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel: [Hardware Error]: #011MC2_ADDR: 0x00000002ffa0a488
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel: [Hardware Error]: Combined Unit Error: Fill ECC error on data fills.
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel:[Hardware Error]: CPU:4 MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc44c0c000040136
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel:[Hardware Error]: MC2_ADDR: 0x00000002ffa0a488
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel:[Hardware Error]: Combined Unit Error: Fill ECC error on data fills.
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel:[Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
Message from syslogd@compute-1-12 at Jun 29 08:35:34 …
kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
Message from syslogd@compute-1-12 at Jun 29 20:15:34 …
kernel:[Hardware Error]: CPU:4 MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 0x9844c000000c0176
Message from syslogd@compute-1-12 at Jun 29 20:15:34 …
kernel:[Hardware Error]: Combined Unit Error: VB Data/ECC error.
Message from syslogd@compute-1-12 at Jun 29 20:15:34 …
kernel:[Hardware Error]: cache level: L2, tx: DATA, mem-tx: EV
Message from syslogd@compute-1-12 at Jun 29 20:15:34 …
kernel: [Hardware Error]: CPU:4 MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 0x9844c000000c0176
Message from syslogd@compute-1-12 at Jun 29 20:15:34 …
kernel: [Hardware Error]: Combined Unit Error: VB Data/ECC error.
Message from syslogd@compute-1-12 at Jun 29 20:15:34 …
kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: EV
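For reference, the hex value after MC2_STATUS can be unpacked by hand to double-check the flags the kernel prints in brackets. This is only a rough sketch: the bit positions are my assumption of the AMD MCi_STATUS layout (Val=63, Over=62, UC=61, En=60, MiscV=59, AddrV=58, PCC=57, CECC=46, UECC=45), and decode_mc_status is a made-up helper name, not part of any tool.

```python
# Assumed AMD MCi_STATUS flag bits (not taken from the post itself).
FLAGS = [
    (63, "Val"),    # register holds a valid error
    (62, "Over"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error (clear means corrected, "CE")
    (60, "En"),     # error reporting enabled
    (59, "MiscV"),  # MISC register valid
    (58, "AddrV"),  # ADDR register valid
    (57, "PCC"),    # processor context corrupt
    (46, "CECC"),   # corrected by ECC
    (45, "UECC"),   # uncorrectable ECC error
]


def decode_mc_status(status):
    """Return the names of the flag bits set in a 64-bit status value."""
    return [name for bit, name in FLAGS if (status >> bit) & 1]


# The two status values from the log above.
for raw in ("0xdc44c0c000040136", "0x9844c000000c0176"):
    print(raw, decode_mc_status(int(raw, 16)))
```

Under those assumptions, both values have UC and UECC clear, which lines up with the CE and CECC flags in the log: the errors were corrected, not fatal.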
July 3rd, 2014 at 12:43 pm
It passed 11 consecutive memtest86+ tests, so I think the RAM is good. If the errors come back, I’ll replace the CPU, since I have a spare. If they still show up after that, the motherboard is bad.
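To see whether the errors come back, and whether they stay on the same core and bank, a small tally over saved log lines might help. This is a minimal sketch that assumes the lines look like the ones pasted earlier in this thread; tally_mce and the sample list are hypothetical, for illustration only.

```python
import re
from collections import Counter

# Matches lines such as "kernel: [Hardware Error]: CPU:4 MC2_STATUS[...]".
CPU_RE = re.compile(r"\[Hardware Error\]: CPU:(\d+) MC(\d+)_STATUS")


def tally_mce(lines):
    """Count machine-check status lines per (cpu, bank) pair."""
    counts = Counter()
    for line in lines:
        match = CPU_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Sample lines copied from the log in this thread.
sample = [
    "kernel: [Hardware Error]: CPU:4 MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc44c0c000040136",
    "kernel: [Hardware Error]: Combined Unit Error: Fill ECC error on data fills.",
    "kernel:[Hardware Error]: CPU:4 MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 0x9844c000000c0176",
]

for (cpu, bank), count in sorted(tally_mce(sample).items()):
    print(f"CPU {cpu}, bank MC{bank}: {count} status line(s)")
```

Note that the console paste above shows each message twice, so counts taken straight from it would be doubled.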
September 15th, 2014 at 8:21 am
It’s still doing it… and I bought an extra Gigabyte motherboard. Guess it’s time to swap it out. Those MSI ones are crap!