Current problems with the cluster

September 19, 2012
by Wm. Josiah Erikson (wjens)

1. compute-1-6 has some bad RAM (memtest86+ says so). Fix: RMA the RAM. Timeframe: Very soon. Difficulty: PITA, but whatever.
2. compute-1-1 probably has a bad power supply. Fix: Replace the power supply. Timeframe: Very soon. Difficulty: Easy.
3. compute-4-5 (the monsterest node) isn’t compatible with the version of ROCKS installed. Fix: Rebuild the cluster with the latest version of ROCKS. Timeframe: Have to coordinate with GP and Animation folks. Difficulty: Well, every time we rebuild the cluster there are about a million things to figure out again. Hard, I guess, but quite doable.
4. Tractor is out of date and NIMBY still doesn’t work. Fix: Update Tractor and figure out the new permissions system. Timeframe: Coordinate with GP and Animation folks. Difficulty: Probably not that hard, but I’d rather do it when rebuilding the cluster if possible.



2 Responses to “Current problems with the cluster”

  1.   wjens Says:

    Reseating compute-1-6’s RAM did the trick. Ran 7 full memory tests with memtest86+ and it passed. Re-added to cluster.

  2.   wjens Says:

    Replaced compute-1-1’s power supply. Fixed.
