Archive for May, 2013

UPS Overwhelmed

Friday, May 10th, 2013

I guess the new nodes draw a lot more power when going full-tilt. The latest job that Tom submitted took out one of the UPSes, and therefore a bunch of nodes. I’ve moved some of them over to wall power, and they’re reinstalling now. They should be back up soon, but you’ll have to restart some of your jobs (or your whole run, depending on how you feel about the randomness of that event), Tom. Sorry! I guess we need another couple of UPSes if we want to cover the whole cluster. I have some, but they need batteries; around $500 would put them both back in business.

Troubleshooting emergent

Friday, May 3rd, 2013

There’s a problem with emergent and breve: they both segfault regularly, probably because I built them wrong, and Jaime has had to write a little script to detect this and kill/restart them as appropriate (I don’t understand the details of what he’s done, but suffice it to say that I know it’s broken). So I’m working on recompiling them more correctly, on a node where there aren’t stray random libraries in /usr/local, etc.
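For anyone curious what a detect-and-restart wrapper of that kind might look like, here is a minimal sketch in Python. It is not Jaime’s script (I don’t know how his works); the function names, the restart limit, and the delay are all made up for illustration. It relies on the fact that, on Linux, Python’s subprocess module reports a process killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys
import time


def crashed_with_segfault(returncode):
    """True if a child process was killed by SIGSEGV.

    On POSIX, subprocess reports death-by-signal as the negated
    signal number, so a segfault shows up as -11.
    """
    return returncode == -signal.SIGSEGV


def run_with_restart(cmd, max_restarts=5, delay=1.0):
    """Run cmd, rerunning it whenever it dies from a segfault.

    Hypothetical helper: max_restarts and delay are illustrative
    defaults, not values taken from any real script.
    Returns (final_returncode, restarts_used).
    """
    restarts = 0
    while True:
        returncode = subprocess.call(cmd)
        # Stop on a clean exit, on any non-segfault failure,
        # or once the restart budget is used up.
        if not crashed_with_segfault(returncode) or restarts >= max_restarts:
            return returncode, restarts
        restarts += 1
        time.sleep(delay)


if __name__ == "__main__":
    # Usage: python watchdog.py <command> [args...]
    code, used = run_with_restart(sys.argv[1:])
    print("exit code %d after %d restart(s)" % (code, used))
```

A wrapper like this only hides the symptom, of course, which is why the real fix is still a clean rebuild.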

Here are my notes from Trello, where I’m tracking the project. I’m currently recompiling Qt:

emergent won’t compile without GL support in Qt… installing GL libraries on compute-2-1, where I’m compiling… lots of dependencies, total PITA

58 minutes ago
Wm. Josiah Erikson

Qt-everywhere had been built with OpenGL support, since that’s installed on the head node, but it seemed to be broken anyway, we don’t need it, and it’s not installed on the compute nodes, so recompiling without it…

yesterday at 10:53 am
Wm. Josiah Erikson

Compiling on compute-2-1. Also forced 64-bit compile….

yesterday at 10:34 am
Wm. Josiah Erikson

Gonna see if I can compile it on a node.

yesterday at 10:01 am
Wm. Josiah Erikson

Didn’t help. Still crashes in X, too. There are a few more libraries I can install on the nodes to see if it makes any difference, but they’re installed on the head node and it crashes in X there – though we haven’t tried running what he runs through tractor on the head node, which also segfaults. Something else is up…. can’t figure out what. breve segfaults too… what?

May 1 at 8:50 am
Wm. Josiah Erikson

Recompiling with correct lib locations