Archive for September, 2012

compute-1-1 is back

Wednesday, September 19th, 2012

Replaced power supply with one salvaged from the Library Basement. Good to go.

Power outage

Wednesday, September 19th, 2012

We had a power outage last night. Most of the nodes never went down at all, thanks to the UPSes, though all of Rack 2 went down, of course. Everything came back up fine except for compute-4-4, which needed a hard power-off and power-on to come back. Not sure why compute-4-4 didn’t make it through the outage when the others did… maybe I need to swap around the dual power supplies a bit…

Current problems with the cluster

Wednesday, September 19th, 2012

1. compute-1-6 has some bad RAM (memtest86+ says so).
   Fix: RMA the RAM.
   Timeframe: Very soon.
   Difficulty: PITA, but whatever.
2. compute-1-1 probably has a bad power supply.
   Fix: Replace the power supply.
   Timeframe: Very soon.
   Difficulty: Easy.
3. compute-4-5 (the monsterest node) isn’t compatible with the installed version of ROCKS.
   Fix: Rebuild the cluster with the latest version of ROCKS.
   Timeframe: Have to coordinate with the GP and Animation folks.
   Difficulty: Well, every time we rebuild the cluster there are about a million things to figure out again. Hard, I guess, but quite doable.
4. Tractor is out of date and NIMBY still doesn’t work.
   Fix: Update Tractor and figure out the new permissions system.
   Timeframe: Coordinate with the GP and Animation folks.
   Difficulty: Probably not that hard, but I’d rather do it when rebuilding the cluster if possible.