bad enclosure?

April 6, 2017
by Wm. Josiah Erikson (wjens)

Compute-2-26 through 2-29 are down. Compute-2-29 has been down for quite a while. When I tried to replace it, all the other nodes in that enclosure rebooted for some reason. I’m becoming suspicious of the enclosure itself and am going to replace it, as I have spares.


compute-1-7

February 27, 2017
by admin

Died. Will not power on again. Probably bad power supply. Will check.


compute-1-18 and compute-1-3

January 31, 2017
by Wm. Josiah Erikson (wjens)

These two nodes weren’t running tractor because they had rebooted themselves and reinstalled. Compute-1-18 has an uptime of 5 days, 11:39 and compute-1-3 has an uptime of 1 day, 5:18. Not sure why: neither has anything in the logs, and neither shows anything particularly suspicious in ganglia.
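For anyone who wants to spot this kind of silent reboot sooner, here is a minimal sketch that parses uptime strings in the format quoted above and flags nodes whose uptime is below a threshold. The function names and the 7-day threshold are hypothetical, chosen only for illustration. The sketch does not collect uptimes from the nodes; it assumes you already have them as text.

```python
import re
from datetime import timedelta

# Matches the uptime format quoted in the post, e.g. "5 days, 11:39" or "1 day, 5:18".
# The day part is optional, so a bare "11:39" also parses.
UPTIME_RE = re.compile(r"^(?:(\d+) days?, )?(\d+):(\d{2})$")


def parse_uptime(text):
    """Convert an uptime string such as '5 days, 11:39' into a timedelta."""
    match = UPTIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime format: {text!r}")
    days = int(match.group(1) or 0)
    return timedelta(days=days, hours=int(match.group(2)), minutes=int(match.group(3)))


def recently_rebooted(uptimes, threshold=timedelta(days=7)):
    """Return the sorted names of nodes whose uptime is below the threshold.

    The 7-day default is an arbitrary assumption, not a value from the post.
    """
    return sorted(name for name, text in uptimes.items() if parse_uptime(text) < threshold)


# The two uptimes reported in this post.
reported = {"compute-1-18": "5 days, 11:39", "compute-1-3": "1 day, 5:18"}
print(recently_rebooted(reported))
```

Passing a tighter threshold, such as `timedelta(days=2)`, narrows the list to nodes that rebooted very recently.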


compute-2-25

January 31, 2017
by Wm. Josiah Erikson (wjens)

Appears to have rebooted and reinstalled itself about 20 hours ago. Not sure why, as there is nothing in the logs. I’m beginning to suspect that the chassis holding compute-2-22 through 2-25 may have a backplane problem. I can replace it if so, as I have two spares.


compute-2-23

January 30, 2017
by Wm. Josiah Erikson (wjens)

Died, as in it won’t turn on. Luckily I had just bought some new compute nodes, and I replaced it with a shiny new 6-core node with 48GB of RAM (I combined the RAM from the old and the new, since there were spare slots). Upgrades!


compute-2-5 through 2-8

January 23, 2017
by Wm. Josiah Erikson (wjens)

One of the C6100s, containing compute-2-5 through 2-8, rebooted 18 hours ago. I’m not entirely sure why yet, but if you saw runs mysteriously die, that’s why. It could be a power supply issue, or it could be that its side of the PDU got overloaded. I’ve moved it to the other side, where the power draw is 1A lower. If it does it again, I’ll replace the power supply.


compute-1-10 bad RAM

January 20, 2017
by Wm. Josiah Erikson (wjens)

One of the four sticks was bad. I am RMA’ing it. The node will have 24GB of RAM until the replacement comes back.


Three node notes

January 20, 2017
by Wm. Josiah Erikson (wjens)

1. Compute-1-7 had crashed with a “soft lockup” kernel bug. I rebooted.
Will keep watching it. Could indicate a hardware problem, could be random.

2. Compute-1-17’s hard drive died. I replaced it.

3. Compute-2-24 died entirely and would not power on. As it was a node in
one of the C6100s, and I had a spare, I replaced it. Now it has faster
processors and twice as much RAM. Note to self: I’m out of spare C6100
nodes.


compute-1-3 is dead, long live compute-1-3!

July 23, 2015
by admin

Motherboard this time. Video went all wonky. Replacing both the RAM and the CPU didn’t help. Had a new AM3+ motherboard on hand, so got 16GB of RAM and an FX-8350 for under $300 and threw ’em in there.


compute-1-16 dead

June 10, 2015
by admin

Probably the power supply. Will check it out this afternoon.