UPS Batteries

December 17, 2010
by Wm. Josiah Erikson (wjens)

We had four MGE Pulsar Evolution 3000s and two free-standing tower units (one APC, one Tripp-Lite, both 3000VA), as well as a smaller Back-UPS Pro 1000. One of the tower units stopped “booting”, so I threw it out. The two bottom rackmount MGE units stopped getting us through power failures and were beeping at me to replace the batteries, so I ordered some. Much cheaper than a new UPS! Just today, the top UPS randomly power-cycled itself. I don’t really trust it and think it might actually be dead, but I will test it by swapping in the batteries from another unit and seeing whether it works then. If it does, I’ll get it new batteries, too. For now, I’ve moved the machines that were plugged into it onto the bottom two UPSes, which have brand-new batteries.


Pixar software upgraded

December 16, 2010
by Wm. Josiah Erikson (wjens)

Upgraded to Tractor 1.0.5 and RMS3 a while (three weeks?) ago; forgot to post it.


R upgraded to 2.12-0.1

December 16, 2010
by Wm. Josiah Erikson (wjens)

That’s all. Cluster-wide, installed from RPMs.


compute-1-9

December 16, 2010
by Wm. Josiah Erikson (wjens)

It was rebooting randomly. After testing many other things, I eventually figured out that the power supply was just not quite supplying the correct power. The power supply in there is an el cheapo unit from about 8 years ago. The node is currently sitting on the table, top off, running on a temporary power supply from another old machine (I think it’s from a PIII/500) while we wait for the new, good-quality Antec power supply to come in. I think I’ll have to turn the fan in it around, as the new one is designed for a standard PC case and the airflow will be backward.


Massive upgrade complete

July 2, 2010
by Wm. Josiah Erikson (wjens)

…except for the front page of fly and the wiki, which I still have to update and/or migrate over. Plus, of course, any software config that I forgot. Synopsis:

Before: 40 nodes, 160GB of RAM, 84 processor cores. (in March)

After: 32 nodes, 240GB of RAM, 162 processor cores.
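
For a quick sense of scale, the before/after totals can be turned into ratios and per-node averages. This is only a sketch using the six numbers quoted above; the dictionary keys and function names are made up for illustration.

```python
# Cluster totals as quoted in the post (March vs. after the upgrade).
before = {"nodes": 40, "ram_gb": 160, "cores": 84}
after = {"nodes": 32, "ram_gb": 240, "cores": 162}


def ratios(old, new):
    """Return new/old for each metric, rounded to two decimals."""
    return {key: round(new[key] / old[key], 2) for key in old}


def per_node(totals):
    """Return average RAM (GB) and cores per node, rounded to one decimal."""
    return {
        "ram_gb": round(totals["ram_gb"] / totals["nodes"], 1),
        "cores": round(totals["cores"] / totals["nodes"], 1),
    }


change = ratios(before, after)
print("after/before:", change)
print("per node before:", per_node(before))
print("per node after:", per_node(after))
```

So the core count is a little under double, RAM is up by half, and there are fewer boxes to power and cool.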

Now I just have to compute our power savings, which I’m sure are quite substantial: it’s not just the power the nodes draw, because the real figure is actually more than double that once you factor in A/C as well. Those old dual-Xeon P4 HT nodes I pulled out (11 of them!) used a lot of power.
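
The savings calculation itself might look roughly like the sketch below. No wattage has been measured yet, so every input here is a hypothetical placeholder, including the cooling factor (watts of A/C needed per watt of IT load); the function names are invented for this example.

```python
def total_watts_saved(it_watts_saved, cooling_watts_per_it_watt):
    """IT savings plus the A/C power no longer needed to remove that heat."""
    return it_watts_saved * (1.0 + cooling_watts_per_it_watt)


def yearly_kwh(watts):
    """Convert a constant draw in watts to kilowatt-hours per year."""
    return watts * 24 * 365 / 1000.0


# Placeholder inputs: none of these are measurements from the cluster.
old_cluster_watts = 10000.0  # hypothetical draw before the upgrade
new_cluster_watts = 7000.0   # hypothetical draw after the upgrade
cooling_factor = 1.0         # hypothetical: 1 W of A/C per 1 W of IT load

it_saved = old_cluster_watts - new_cluster_watts
total = total_watts_saved(it_saved, cooling_factor)
print(f"IT load saved: {it_saved:.0f} W")
print(f"Including A/C: {total:.0f} W, about {yearly_kwh(total):.0f} kWh/year")
```

With real numbers from a power meter on each rack, the same two functions would give the actual figure; a cooling factor above 1.0 corresponds to the "more than doubled" case.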

The two nodes in rack 4 are both going full-bore right now with Lee’s Clojure GC runs, and they’re throwing off a surprisingly small amount of heat – less than the old compute-0-x nodes used to at idle, by the “hand at the back of the rack” test, which is, as we all know, extremely accurate.

We had a little adventure with compute-1-9, which wouldn’t stay up, and I bought it a new motherboard and CPU before finally figuring out (with Doug’s help) that it was actually the power supply the entire time. Of course, I replaced the old dual-core 2.13GHz CPU with a quad-core 2.6GHz, and the motherboard is much better, and it’s got 8GB of RAM now, so we made out well in the end. Putting a generic off-the-shelf power supply in that node was a bit of a kludge – had to turn the fan around backwards so that air would flow in the correct direction, and then use a power extension cable to get the power to the back of the case. It looks a little silly, but works fine.

We’ll see over the next few weeks how many things I forgot.

I also only restored accounts that had actually been used this semester, so I may get some account restoration requests – no problem.

Reminder to self: put new backup drive in place and set backups back up.


Massive upgrade underway

June 28, 2010
by Wm. Josiah Erikson (wjens)

We’ve got a new 32GB, 48-core node (not kidding! Yes, this node is more powerful by itself than any previous entire rack of the cluster) to put in, 10 32-bit nodes and a 32-bit head node to throw out, a new 8-core, 16GB head node to put in, and 13 nodes to upgrade from dual-core Athlon X2 5000+ with 4GB of RAM to quad-core Phenom II 945s with 8GB of RAM. Actually, I already did one of those – compute-1-0… worked great, though it required a BIOS update. We also got a recent upgrade to compute-1-8… it’s now a quad-core.
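
As a rough tally of what the compute side gains from the parts listed above, here is a small sketch. It counts only the additions whose numbers are stated (the 13 upgraded nodes and the new 48-core node); the head node is left out, and so are the retired 32-bit nodes, since their core and RAM counts aren't given here. Variable names are illustrative.

```python
# Per-node upgrade: dual-core/4GB -> quad-core/8GB, applied to 13 nodes.
upgraded_nodes = 13
cores_gained_per_node = 4 - 2
ram_gained_per_node_gb = 8 - 4

# The new big node, as described in the post.
new_node = {"cores": 48, "ram_gb": 32}

gain = {
    "cores": upgraded_nodes * cores_gained_per_node + new_node["cores"],
    "ram_gb": upgraded_nodes * ram_gained_per_node_gb + new_node["ram_gb"],
}
print(gain)
```

The net change for the whole cluster will be smaller than this, because the retired nodes take their cores and RAM with them.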

I’ll have to compare the CPU/RAM count of the cluster as it will be next week to how it was in March – I think we might double it.

The cluster will be down for much of this week as I execute this massive upgrade.


compute-0-2

April 21, 2010
by Wm. Josiah Erikson (wjens)

has a bad hard drive… must fix (or not, this is one of the oldest nodes. I’ll put a new one in if I have a random spare)


dual quad-core Nehalem node!

April 20, 2010
by Wm. Josiah Erikson (wjens)

Very exciting… Lee has just funded our purchase of a Dell PowerEdge R410 with 24GB of RAM and dual quad-core 2.26GHz Nehalems. This will show up for most intents and purposes as a 16-processor node. Should be fun. Fighting with kernels for the install – had to get a newer pxeboot kernel to support the BCM5716 and then put the newer kernel into the distro as well. We’ll see if this screws things up with the distro: I just took the packages from newer CentOS and ROCKS 5.3 (rocks-boot and kernel-*) without actually upgrading to ROCKS 5.3, and we’re still running 5.1. Cross your fingers!
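
The "16-processor" figure comes from multiplying sockets, cores per socket, and hardware threads per core. The sketch below assumes Hyper-Threading is enabled with two threads per core, which the 16 implies but which isn't confirmed here; the function name is made up for illustration.

```python
import os


def logical_processors(sockets, cores_per_socket, threads_per_core):
    """Number of CPUs the OS scheduler will see."""
    return sockets * cores_per_socket * threads_per_core


# Dual quad-core, assuming 2 hardware threads per core (Hyper-Threading on).
expected = logical_processors(sockets=2, cores_per_socket=4, threads_per_core=2)
print("expected logical processors:", expected)

# On the node itself, this reports what the OS actually sees.
print("this machine reports:", os.cpu_count())
```

If Hyper-Threading were turned off in the BIOS, the same node would show up with 8 processors instead.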


January 5, 2010
by Wm. Josiah Erikson (wjens)

compute-1-8 is down for some reason… I’ll check it out tomorrow


entire cluster back up

October 14, 2009
by Wm. Josiah Erikson (wjens)

…and it’s got Erlang, and Ruby, and it’s rendering correctly with the new version of Tractor. Also, compute-0-0 and compute-0-1 are back; Zach helped me to replace the hard drives this morning.