Fly crashed again

April 12, 2013
by Wm. Josiah Erikson (wjens)

Identical symptoms. Again, the hostname came up as fly.hampshire.edu after the restart, and again it wouldn't mount /helga. I had to mount /helga manually, set the hostname to fly.local, use tentakel to restart rpcidmapd on the nodes (and restart it on the head node too), and then start tractor-engine, which wouldn't start until /helga was mounted. But the real question is: why is it crashing? Do we need new hardware? I should run some tests. I'm afraid it will go down again this weekend, but I have to run to manager training…
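The manual recovery above has an ordering constraint: tractor-engine reads its configuration from under /helga (the ps listing in the comments shows its config directory at /helga/global/wc/config/pixar/tractor), so it cannot start before the mount. Below is a minimal Python sketch that turns the node's current state into an ordered list of commands. It assumes /helga has an /etc/fstab entry (so a bare `mount /helga` works), that rpcidmapd is managed by the EL6 `service` command, and that tractor-engine has an init script named `tractor-engine` (hypothetical). It only prints the commands; it does not run them.

```python
import os.path
import socket

EXPECTED_HOSTNAME = "fly.local"
NFS_MOUNT = "/helga"


def plan_recovery(hostname, helga_mounted):
    """Return the shell commands to run, in order, for the node's current state.

    Assumptions (not from the original post): /helga is listed in /etc/fstab,
    and tractor-engine is started through a hypothetical init script of that name.
    """
    steps = []
    if not helga_mounted:
        steps.append(f"mount {NFS_MOUNT}")
    if hostname != EXPECTED_HOSTNAME:
        steps.append(f"hostname {EXPECTED_HOSTNAME}")
    # rpc.idmapd also has to be restarted on the other nodes
    # (the post does that with tentakel; the host group is site-specific).
    steps.append("service rpcidmapd restart")
    # tractor-engine reads its config from /helga, so it must start last.
    steps.append("service tractor-engine start")
    return steps


if __name__ == "__main__":
    current_hostname = socket.gethostname()
    mounted = os.path.ismount(NFS_MOUNT)
    for command in plan_recovery(current_hostname, mounted):
        print(command)
```

Printing rather than executing keeps a human in the loop, which seems safer on a head node that has just come back from a crash.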



6 Responses to “Fly crashed again”

  1.   wjens Says:

    …and it crashed again, but differently: this time it was a soft lockup, and there are tons of these in the logs:

    Apr 22 17:47:14 fly kernel: ------------[ cut here ]------------
    Apr 22 17:47:14 fly kernel: WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Not tainted)
    Apr 22 17:47:14 fly kernel: Hardware name: Seabream
    Apr 22 17:47:14 fly kernel: list_add corruption. prev->next should be next (ffffffffa04a98e0), but was ffff8800a84c35e0. (prev=ffff8800a84c35e0).
    Apr 22 17:47:14 fly kernel: Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc ipmi_devintf ipmi_si ipmi_msghandler p4_clockmod freq_table speedstep_lib ipv6 ipt_REJECT xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ext3 jbd usb_storage sg microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i5000_edac edac_core i5k_amb ioatdma dca bnx2 e1000e shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]
    Apr 22 17:47:14 fly kernel: Pid: 2079, comm: nfsd Not tainted 2.6.32-220.13.1.el6.x86_64 #1
    Apr 22 17:47:14 fly kernel: Call Trace:
    Apr 22 17:47:14 fly kernel: [] ? warn_slowpath_common+0x87/0xc0
    Apr 22 17:47:14 fly kernel: [] ? warn_slowpath_fmt+0x46/0x50
    Apr 22 17:47:14 fly kernel: [] ? __list_add+0x8f/0xa0
    Apr 22 17:47:14 fly kernel: [] ? nfsd_break_deleg_cb+0x48/0xa0 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? __break_lease+0x1b5/0x3f0
    Apr 22 17:47:14 fly kernel: [] ? nfsd_break_lease+0x34/0x40 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? nfsd_unlink+0x1ba/0x290 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? nfsd4_remove+0x52/0x140 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? nfsd4_encode_operation+0x75/0x180 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? nfsd4_proc_compound+0x3d8/0x490 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? nfsd_dispatch+0xfe/0x240 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? svc_process_common+0x344/0x640 [sunrpc]
    Apr 22 17:47:14 fly kernel: [] ? default_wake_function+0x0/0x20
    Apr 22 17:47:14 fly kernel: [] ? svc_process+0x110/0x160 [sunrpc]
    Apr 22 17:47:14 fly kernel: [] ? nfsd+0xc2/0x160 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? nfsd+0x0/0x160 [nfsd]
    Apr 22 17:47:14 fly kernel: [] ? kthread+0x96/0xa0
    Apr 22 17:47:14 fly kernel: [] ? child_rip+0xa/0x20
    Apr 22 17:47:14 fly kernel: [] ? kthread+0x0/0xa0
    Apr 22 17:47:14 fly kernel: [] ? child_rip+0x0/0x20
    Apr 22 17:47:14 fly kernel: ---[ end trace 8f41b2d2342f4c4e ]---
    Apr 22 17:47:40 fly kernel: ------------[ cut here ]------------
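With CONFIG_DEBUG_LIST, the kernel's __list_add() (lib/list_debug.c) checks, before linking a new entry between prev and next, that next->prev == prev and prev->next == next, and prints a WARNING if either check fails. In the warning above, the "but was" address equals the "prev=" address, so prev->next pointed back at prev itself: that entry looked empty (self-linked) even though the list still referenced it. The toy Python model below is not kernel code; it reproduces exactly that message pattern by re-initializing an entry that is still on a list. The names are illustrative, and the model says nothing about how the real list got into this state.

```python
class ListHead:
    """Toy model of the kernel's circular, doubly linked struct list_head."""

    def __init__(self, name):
        self.name = name
        self.next = self  # like INIT_LIST_HEAD(): an empty entry points to itself
        self.prev = self


def debug_list_add(new, prev, nxt):
    """Model of __list_add() with CONFIG_DEBUG_LIST: warn on inconsistency, then link anyway."""
    warnings = []
    if nxt.prev is not prev:
        warnings.append(
            f"next->prev should be prev ({prev.name}), but was {nxt.prev.name}. (next={nxt.name})"
        )
    if prev.next is not nxt:
        warnings.append(
            f"prev->next should be next ({nxt.name}), but was {prev.next.name}. (prev={prev.name})"
        )
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new
    return warnings


def list_add_tail(new, head):
    """Insert new just before head, i.e. at the tail of the list."""
    return debug_list_add(new, head.prev, head)


if __name__ == "__main__":
    head = ListHead("head")
    a = ListHead("a")
    print(list_add_tail(a, head))  # healthy list: []

    # Corrupt the list: re-initialize "a" while head.prev still points at it.
    a.next = a
    a.prev = a
    b = ListHead("b")
    for warning in list_add_tail(b, head):
        print("list_add corruption.", warning)
```

Only the prev->next check fires here, matching the single line-30 warning in the log: head.prev still points at the entry, but the entry no longer points back at head.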

  2.   wjens Says:

    Interesting. This looks like it could be a kernel bug. Also worth noting: it is nfsd every single time…
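One way to check the "nfsd every single time" observation across all the crashes is to count the comm field on the kernel's "Pid: N, comm: NAME" trace header lines in syslog. A small Python sketch, assuming the log lines look like the excerpt above and that the log file (for example /var/log/messages, the usual EL6 location) is passed on the command line:

```python
import re
import sys
from collections import Counter

# Matches trace header lines such as:
#   "... kernel: Pid: 2079, comm: nfsd Not tainted 2.6.32-220.13.1.el6.x86_64 #1"
PID_COMM = re.compile(r"kernel: Pid: (\d+), comm: (\S+)")


def count_warning_commands(lines):
    """Count which command (comm) was running each time the kernel printed a trace header."""
    counts = Counter()
    for line in lines:
        match = PID_COMM.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Usage (script name is arbitrary): python3 count_comm.py /var/log/messages
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for comm, n in count_warning_commands(log).most_common():
                print(f"{n:6d}  {comm}")
```

If nfsd dominates the counts across several crashes, that points toward the NFS server path rather than random hardware faults, though it does not rule hardware out.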

  3.   wjens Says:

    Anyway, rebooted and we’re back up and running. Once the semester’s over, it’s probably time to replace fly’s head node…

  4.   wjens Says:

    Looks like it’s time to replace the head node – that will be a bit of a PITA. Anyway, it happened again:

    Message from syslogd@fly at Apr 23 23:11:10 …
    kernel:Stack:

    Message from syslogd@fly at Apr 23 23:11:10 …
    kernel:Call Trace:

    Message from syslogd@fly at Apr 23 23:11:10 …
    kernel:Code: 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 27 4d 8b 45 00 4d 39 c4 75 40 49 89 5c 24 08 4c 89 23 4c 89 6b 08 4c 8b 65 f0 49 89 5d 00 8b 5d e8 4c 8b 6d f8 c9 c3 49 89 d1 48 89 f1 48 c7 c2 c0 ac

    I tried to reboot it remotely, but I'm not sure it'll work; I'll probably have to wait until morning. I did capture a ps aux listing, though I'm not sure it reveals much. The load was over 20 at this point:

    root 1 0.0 0.0 19336 1528 ? Ss 08:45 0:01 /sbin/init
    root 2 0.0 0.0 0 0 ? S 08:45 0:00 [kthreadd]
    root 3 0.0 0.0 0 0 ? S 08:45 0:00 [migration/0]
    root 4 4.2 0.0 0 0 ? S 08:45 36:19 [ksoftirqd/0]
    root 5 0.0 0.0 0 0 ? S 08:45 0:00 [migration/0]
    root 6 0.0 0.0 0 0 ? S 08:45 0:00 [watchdog/0]
    root 7 0.0 0.0 0 0 ? S 08:45 0:00 [migration/1]
    root 8 0.0 0.0 0 0 ? S 08:45 0:00 [migration/1]
    root 9 1.4 0.0 0 0 ? S 08:45 12:37 [ksoftirqd/1]
    root 10 0.0 0.0 0 0 ? S 08:45 0:00 [watchdog/1]
    root 11 0.0 0.0 0 0 ? S 08:45 0:00 [migration/2]
    root 12 0.0 0.0 0 0 ? S 08:45 0:00 [migration/2]
    root 13 1.0 0.0 0 0 ? S 08:45 9:08 [ksoftirqd/2]
    root 14 0.0 0.0 0 0 ? S 08:45 0:00 [watchdog/2]
    root 15 0.0 0.0 0 0 ? S 08:45 0:00 [migration/3]
    root 16 0.0 0.0 0 0 ? S 08:45 0:00 [migration/3]
    root 17 0.7 0.0 0 0 ? S 08:45 6:38 [ksoftirqd/3]
    root 18 0.0 0.0 0 0 ? S 08:45 0:00 [watchdog/3]
    root 19 0.0 0.0 0 0 ? S 08:45 0:00 [migration/4]
    root 20 0.0 0.0 0 0 ? S 08:45 0:00 [migration/4]
    root 21 0.4 0.0 0 0 ? S 08:45 4:13 [ksoftirqd/4]
    root 22 0.0 0.0 0 0 ? S 08:45 0:00 [watchdog/4]
    root 23 0.0 0.0 0 0 ? S 08:45 0:00 [migration/5]
    root 24 0.0 0.0 0 0 ? S 08:45 0:00 [migration/5]
    root 25 1.1 0.0 0 0 ? S 08:45 9:55 [ksoftirqd/5]
    root 26 0.0 0.0 0 0 ? S 08:45 0:00 [watchdog/5]
    root 27 0.0 0.0 0 0 ? S 08:45 0:00 [migration/6]
    root 28 0.0 0.0 0 0 ? S 08:45 0:00 [migration/6]
    root 29 1.1 0.0 0 0 ? S 08:45 10:05 [ksoftirqd/6]
    root 30 0.0 0.0 0 0 ? S 08:45 0:00 [watchdog/6]
    root 31 0.0 0.0 0 0 ? R 08:45 0:00 [migration/7]
    root 32 0.0 0.0 0 0 ? S 08:45 0:00 [migration/7]
    root 33 1.1 0.0 0 0 ? S 08:45 9:39 [ksoftirqd/7]
    root 34 0.0 0.0 0 0 ? R 08:45 0:00 [watchdog/7]
    root 35 0.0 0.0 0 0 ? S 08:45 0:02 [events/0]
    root 36 0.0 0.0 0 0 ? S 08:45 0:02 [events/1]
    root 37 0.0 0.0 0 0 ? S 08:45 0:03 [events/2]
    root 38 0.0 0.0 0 0 ? S 08:45 0:01 [events/3]
    root 39 0.0 0.0 0 0 ? S 08:45 0:03 [events/4]
    root 40 0.0 0.0 0 0 ? S 08:45 0:02 [events/5]
    root 41 0.0 0.0 0 0 ? S 08:45 0:01 [events/6]
    root 42 0.0 0.0 0 0 ? R 08:45 0:03 [events/7]
    root 43 0.0 0.0 0 0 ? S 08:45 0:00 [cgroup]
    root 44 0.0 0.0 0 0 ? S 08:45 0:00 [khelper]
    root 45 0.0 0.0 0 0 ? S 08:45 0:00 [netns]
    root 46 0.0 0.0 0 0 ? S 08:45 0:00 [async/mgr]
    root 47 0.0 0.0 0 0 ? S 08:45 0:00 [pm]
    root 48 0.0 0.0 0 0 ? S 08:45 0:00 [sync_supers]
    root 49 0.0 0.0 0 0 ? S 08:45 0:00 [bdi-default]
    root 50 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/0]
    root 51 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/1]
    root 52 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/2]
    root 53 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/3]
    root 54 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/4]
    root 55 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/5]
    root 56 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/6]
    root 57 0.0 0.0 0 0 ? S 08:45 0:00 [kintegrityd/7]
    root 58 0.0 0.0 0 0 ? S 08:45 0:01 [kblockd/0]
    root 59 0.0 0.0 0 0 ? S 08:45 0:01 [kblockd/1]
    root 60 0.0 0.0 0 0 ? S 08:45 0:00 [kblockd/2]
    root 61 0.0 0.0 0 0 ? S 08:45 0:00 [kblockd/3]
    root 62 0.0 0.0 0 0 ? S 08:45 0:00 [kblockd/4]
    root 63 0.0 0.0 0 0 ? S 08:45 0:00 [kblockd/5]
    root 64 0.0 0.0 0 0 ? S 08:45 0:00 [kblockd/6]
    root 65 0.0 0.0 0 0 ? R 08:45 0:00 [kblockd/7]
    root 66 0.0 0.0 0 0 ? S 08:45 0:00 [kacpid]
    root 67 0.0 0.0 0 0 ? S 08:45 0:00 [kacpi_notify]
    root 68 0.0 0.0 0 0 ? S 08:45 0:00 [kacpi_hotplug]
    root 69 0.0 0.0 0 0 ? S 08:45 0:00 [ata/0]
    root 70 0.0 0.0 0 0 ? S 08:45 0:00 [ata/1]
    root 71 0.0 0.0 0 0 ? S 08:45 0:00 [ata/2]
    root 72 0.0 0.0 0 0 ? S 08:45 0:00 [ata/3]
    root 73 0.0 0.0 0 0 ? S 08:45 0:00 [ata/4]
    root 74 0.0 0.0 0 0 ? S 08:45 0:00 [ata/5]
    root 75 0.0 0.0 0 0 ? S 08:45 0:00 [ata/6]
    root 76 0.0 0.0 0 0 ? S 08:45 0:00 [ata/7]
    root 77 0.0 0.0 0 0 ? S 08:45 0:00 [ata_aux]
    root 78 0.0 0.0 0 0 ? S 08:45 0:00 [ksuspend_usbd]
    root 79 0.0 0.0 0 0 ? S 08:45 0:00 [khubd]
    root 80 0.0 0.0 0 0 ? S 08:45 0:00 [kseriod]
    root 81 0.0 0.0 0 0 ? S 08:45 0:00 [md/0]
    root 82 0.0 0.0 0 0 ? S 08:45 0:00 [md/1]
    root 83 0.0 0.0 0 0 ? S 08:45 0:00 [md/2]
    root 84 0.0 0.0 0 0 ? S 08:45 0:00 [md/3]
    root 85 0.0 0.0 0 0 ? S 08:45 0:00 [md/4]
    root 86 0.0 0.0 0 0 ? S 08:45 0:00 [md/5]
    root 87 0.0 0.0 0 0 ? S 08:45 0:00 [md/6]
    root 88 0.0 0.0 0 0 ? S 08:45 0:00 [md/7]
    root 89 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/0]
    root 90 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/1]
    root 91 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/2]
    root 92 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/3]
    root 93 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/4]
    root 94 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/5]
    root 95 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/6]
    root 96 0.0 0.0 0 0 ? S 08:45 0:00 [md_misc/7]
    root 97 0.0 0.0 0 0 ? S 08:45 0:00 [khungtaskd]
    root 98 0.0 0.0 0 0 ? S 08:45 0:00 [kswapd0]
    root 99 0.0 0.0 0 0 ? SN 08:45 0:00 [ksmd]
    root 100 0.0 0.0 0 0 ? SN 08:45 0:00 [khugepaged]
    root 101 0.0 0.0 0 0 ? S 08:45 0:00 [aio/0]
    root 102 0.0 0.0 0 0 ? S 08:45 0:00 [aio/1]
    root 103 0.0 0.0 0 0 ? S 08:45 0:00 [aio/2]
    root 104 0.0 0.0 0 0 ? S 08:45 0:00 [aio/3]
    root 105 0.0 0.0 0 0 ? S 08:45 0:00 [aio/4]
    root 106 0.0 0.0 0 0 ? S 08:45 0:00 [aio/5]
    root 107 0.0 0.0 0 0 ? S 08:45 0:00 [aio/6]
    root 108 0.0 0.0 0 0 ? S 08:45 0:00 [aio/7]
    root 109 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/0]
    root 110 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/1]
    root 111 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/2]
    root 112 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/3]
    root 113 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/4]
    root 114 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/5]
    root 115 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/6]
    root 116 0.0 0.0 0 0 ? S 08:45 0:00 [crypto/7]
    root 121 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/0]
    root 122 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/1]
    root 123 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/2]
    root 124 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/3]
    root 125 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/4]
    root 126 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/5]
    root 127 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/6]
    root 128 0.0 0.0 0 0 ? S 08:45 0:00 [kthrotld/7]
    root 129 0.0 0.0 0 0 ? S 08:45 0:00 [pciehpd]
    root 131 0.0 0.0 0 0 ? S 08:45 0:00 [kpsmoused]
    root 132 0.0 0.0 0 0 ? S 08:45 0:00 [usbhid_resumer]
    root 162 0.0 0.0 0 0 ? S 08:45 0:00 [kstriped]
    root 344 0.0 0.0 0 0 ? S 08:45 0:00 [mpt_poll_0]
    root 345 0.0 0.0 0 0 ? S 08:45 0:00 [mpt/0]
    root 346 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_eh_0]
    root 353 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_eh_1]
    root 354 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_eh_2]
    root 357 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_eh_3]
    root 358 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_eh_4]
    root 428 0.0 0.0 0 0 ? D 08:46 0:00 [jbd2/sda1-8]
    root 429 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 430 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 431 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 432 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 433 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 434 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 435 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 436 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 516 0.0 0.0 10760 852 ? S<s 08:46 0:00 /sbin/udevd -d
    root 809 0.0 0.0 0 0 ? S 08:46 0:01 [edac-poller]
    root 1059 0.0 0.0 0 0 ? S 08:46 0:03 [flush-8:0]
    root 1065 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_eh_5]
    root 1066 0.0 0.0 0 0 ? S 08:46 0:18 [usb-storage]
    root 1115 0.0 0.0 0 0 ? D 08:46 0:02 [kjournald]
    root 1116 0.0 0.0 0 0 ? S 08:46 0:01 [jbd2/sda2-8]
    root 1117 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1118 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1119 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1120 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1121 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1122 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1123 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1124 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1125 0.0 0.0 0 0 ? S 08:46 0:06 [jbd2/sdb1-8]
    root 1126 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1127 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1128 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1129 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1130 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1131 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1132 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1133 0.0 0.0 0 0 ? S 08:46 0:00 [ext4-dio-unwrit]
    root 1257 0.0 0.0 0 0 ? S 08:46 0:00 [kauditd]
    root 1659 0.0 0.0 4092 140 ? S 08:46 0:00 /sbin/pppoe-server
    root 1669 0.0 0.0 6144 572 ? Ss 08:46 0:00 /sbin/portreserve
    root 1676 0.0 0.0 248680 1488 ? Sl 08:46 0:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 4
    root 1709 0.0 0.0 10756 840 ? S< 08:46 0:00 /sbin/udevd -d
    root 1710 0.2 0.0 0 0 ? SN 08:46 1:58 [kipmi0]
    root 1725 0.0 0.0 9136 628 ? Ss 08:46 0:10 irqbalance
    rpc 1744 0.0 0.0 18956 956 ? Ss 08:46 0:00 rpcbind
    nobody 1768 0.8 0.0 486440 3428 ? Sl 08:46 6:54 /usr/sbin/gmetad
    root 1785 0.0 0.0 13524 724 ? Ss 08:46 0:05 lldpad -d
    root 1795 0.0 0.0 4048 308 ? Ss 08:46 0:00 /opt/rocks/sbin/sec_channel_server
    root 1802 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/0]
    root 1803 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/1]
    root 1804 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/2]
    root 1805 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/3]
    root 1806 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/4]
    root 1807 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/5]
    root 1808 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/6]
    root 1809 0.0 0.0 0 0 ? S 08:46 0:00 [scsi_tgtd/7]
    root 1810 0.0 0.0 0 0 ? S 08:46 0:00 [fc_exch_workque]
    root 1811 0.0 0.0 0 0 ? S 08:46 0:00 [fc_rport_eq]
    root 1813 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/0]
    root 1814 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/1]
    root 1815 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/2]
    root 1816 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/3]
    root 1817 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/4]
    root 1818 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/5]
    root 1819 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/6]
    root 1820 0.0 0.0 0 0 ? S< 08:46 0:00 [fcoethread/7]
    root 1821 0.0 0.0 0 0 ? S 08:46 0:00 [cnic_wq]
    root 1822 0.0 0.0 0 0 ? S 08:46 0:00 [bnx2fc]
    root 1823 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_l2_threa]
    root 1824 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/0]
    root 1825 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/1]
    root 1826 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/2]
    root 1827 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/3]
    root 1828 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/4]
    root 1829 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/5]
    root 1830 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/6]
    root 1831 0.0 0.0 0 0 ? S< 08:46 0:00 [bnx2fc_thread/7]
    root 1834 0.0 0.0 8336 544 ? Ss 08:46 0:00 /usr/sbin/fcoemon --syslog
    dbus 1846 0.0 0.0 21784 1368 ? Ss 08:46 0:00 dbus-daemon --system
    named 1858 0.0 0.1 707256 31432 ? Ssl 08:46 0:39 /usr/sbin/named -u named
    rpcuser 1887 0.0 0.0 23132 1188 ? Ss 08:46 0:00 rpc.statd
    root 1921 0.0 0.0 0 0 ? S 08:46 0:00 [rpciod/0]
    root 1922 0.0 0.0 0 0 ? S 08:46 0:00 [rpciod/1]
    root 1923 0.0 0.0 0 0 ? S 08:46 0:00 [rpciod/2]
    root 1924 0.0 0.0 0 0 ? S 08:46 0:00 [rpciod/3]
    root 1925 0.0 0.0 0 0 ? S 08:46 0:00 [rpciod/4]
    root 1926 0.0 0.0 0 0 ? S 08:46 0:00 [rpciod/5]
    root 1927 0.0 0.0 0 0 ? S 08:46 0:00 [rpciod/6]
    root 1928 0.0 0.0 0 0 ? R 08:46 0:00 [rpciod/7]
    root 1958 0.0 0.0 4064 624 ? Ss 08:47 0:00 /usr/sbin/acpid
    68 1967 0.0 0.0 25376 4224 ? Ss 08:47 0:00 hald
    root 1968 0.0 0.0 18092 1160 ? S 08:47 0:00 hald-runner
    root 2008 0.0 0.0 20208 1072 ? S 08:47 0:00 hald-addon-input: Listening on /dev/input/event3 /dev/input/event5 /dev/input/event0 /dev/input/event1
    68 2015 0.0 0.0 17792 1032 ? S 08:47 0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
    root 2073 0.0 0.0 107292 280 ? Ss 08:47 0:00 rpc.rquotad
    root 2076 0.0 0.0 0 0 ? S 08:47 0:00 [lockd]
    root 2077 0.0 0.0 0 0 ? R 08:47 0:00 [nfsd4]
    root 2078 0.0 0.0 0 0 ? S 08:47 0:00 [nfsd4_callbacks]
    root 2079 0.2 0.0 0 0 ? D 08:47 1:47 [nfsd]
    root 2080 0.2 0.0 0 0 ? D 08:47 1:47 [nfsd]
    root 2081 0.2 0.0 0 0 ? D 08:47 1:47 [nfsd]
    root 2082 0.1 0.0 0 0 ? D 08:47 1:42 [nfsd]
    root 2083 0.2 0.0 0 0 ? D 08:47 2:03 [nfsd]
    root 2084 0.2 0.0 0 0 ? D 08:47 1:53 [nfsd]
    root 2085 0.2 0.0 0 0 ? D 08:47 1:53 [nfsd]
    root 2086 0.1 0.0 0 0 ? D 08:47 1:41 [nfsd]
    root 2089 0.0 0.0 21408 1196 ? Ss 08:47 0:00 rpc.mountd
    root 2112 0.0 0.0 6748 428 ? Ss 08:47 0:00 /usr/sbin/mcelog --daemon
    sge 2233 0.1 0.0 259936 6136 ? Sl 08:47 1:05 /opt/gridengine/bin/lx26-amd64/sge_qmaster
    root 2281 0.0 0.0 196896 4576 ? S 08:47 0:14 /usr/sbin/snmpd -LS0-6d -Lf /dev/null -p /var/run/snmpd.pid
    root 2292 0.0 0.0 64068 1116 ? Ss 08:47 0:00 /usr/sbin/sshd
    root 2300 0.0 0.0 22076 988 ? Ss 08:47 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
    ntp 2308 0.0 0.0 30144 1632 ? Ss 08:47 0:00 ntpd -A -u ntp:ntp -p /var/run/ntpd.pid
    root 2328 0.0 0.0 110248 1484 ? S 08:47 0:00 /bin/sh /opt/rocks/bin/mysqld_safe --defaults-file=/opt/rocks/etc/my.cnf --datadir=/var/opt/rocks/mysql --pid-file=/var/opt/rocks/mysql/fly.ham
    rocksdb 2417 0.0 0.0 249492 10448 ? Sl 08:47 0:00 /opt/rocks/libexec/mysqld --defaults-file=/opt/rocks/etc/my.cnf --basedir=/opt/rocks --datadir=/var/opt/rocks/mysql --user=rocksdb --log-error=
    dhcpd 2418 0.0 0.0 44460 1988 ? Ss 08:47 0:00 /usr/sbin/dhcpd -user dhcpd -group dhcpd eth0
    nagios 2435 0.0 0.0 38860 1192 ? Ss 08:47 0:03 nrpe -c /etc/nagios/nrpe.cfg -d
    root 2514 0.0 0.0 78228 3300 ? Ss 08:47 0:00 /usr/libexec/postfix/master
    postfix 2523 0.0 0.0 78496 3424 ? S 08:47 0:00 qmgr -l -t fifo -u
    root 2527 0.0 0.0 284640 10800 ? Ss 08:47 0:01 /usr/sbin/httpd
    root 2535 0.0 0.0 117184 1240 ? Ss 08:47 0:00 crond
    root 2546 0.0 0.0 21424 472 ? Ss 08:47 0:00 /usr/sbin/atd
    root 2578 0.0 0.0 9180 2072 ? Sl 08:47 0:01 ./PixarLicenseServer -x ./pixar.license -v -log /var/tmp/PixarLicenseServer.log
    root 2593 0.0 0.0 72972 3540 ? S 08:47 0:00 /opt/rocks/bin/tracker-server
    root 2597 0.0 0.0 123520 2100 ? Ss 08:47 0:00 /usr/sbin/gdm-binary -nodaemon
    root 2604 0.0 0.0 4048 536 tty2 Ss+ 08:47 0:00 /sbin/mingetty /dev/tty2
    root 2606 0.0 0.0 4048 536 tty3 Ss+ 08:47 0:00 /sbin/mingetty /dev/tty3
    root 2609 0.0 0.0 10756 836 ? S< 08:47 0:00 /sbin/udevd -d
    root 2610 0.0 0.0 4048 540 tty4 Ss+ 08:47 0:00 /sbin/mingetty /dev/tty4
    root 2612 0.0 0.0 4048 540 tty5 Ss+ 08:47 0:00 /sbin/mingetty /dev/tty5
    root 2614 0.0 0.0 4048 540 tty6 Ss+ 08:47 0:00 /sbin/mingetty /dev/tty6
    apache 2620 0.0 0.0 284552 5960 ? S 08:47 0:00 /usr/sbin/httpd
    apache 2622 0.0 0.1 304796 29272 ? S 08:47 0:34 /usr/sbin/httpd
    apache 2626 0.0 0.0 287696 13140 ? S 08:47 0:33 /usr/sbin/httpd
    apache 2627 0.0 0.1 303040 27764 ? S 08:47 0:34 /usr/sbin/httpd
    apache 2628 0.0 0.1 307724 32892 ? S 08:47 0:35 /usr/sbin/httpd
    apache 2629 0.0 0.1 302700 27712 ? S 08:47 0:33 /usr/sbin/httpd
    apache 2632 0.0 0.1 303028 28168 ? S 08:47 0:34 /usr/sbin/httpd
    apache 2634 0.0 0.0 286200 10784 ? S 08:47 0:31 /usr/sbin/httpd
    apache 2636 0.0 0.1 305628 29444 ? S 08:47 0:33 /usr/sbin/httpd
    nobody 2818 0.2 0.1 164956 21496 ? Ss 08:47 2:26 /usr/sbin/gmond
    root 2829 0.0 0.0 154120 2984 ? S 08:47 0:00 /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1 --force-active-vt
    root 2831 0.0 0.0 114124 14884 tty1 Ss+ 08:47 0:07 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-VespAL/database -nolisten tcp vt1
    root 2959 0.0 0.0 4113820 3376 ? Sl 08:47 0:00 /usr/sbin/console-kit-daemon --no-daemon
    gdm 3033 0.0 0.0 20020 644 ? S 08:47 0:00 /usr/bin/dbus-launch --exit-with-session
    gdm 3034 0.0 0.0 21500 936 ? Ss 08:47 0:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
    gdm 3035 0.0 0.0 263556 7396 ? Ssl 08:47 0:00 /usr/bin/gnome-session --autostart=/usr/share/gdm/autostart/LoginWindow/
    root 3044 0.0 0.0 45088 2556 ? S 08:47 0:00 /usr/libexec/devkit-power-daemon
    gdm 3050 0.0 0.0 134236 4400 ? S 08:47 0:00 /usr/libexec/gconfd-2
    gdm 3071 0.0 0.0 119580 4252 ? S 08:47 0:00 /usr/libexec/at-spi-registryd
    gdm 3072 0.0 0.0 343160 11792 ? Ssl 08:47 0:01 /usr/libexec/gnome-settings-daemon --gconf-prefix=/apps/gdm/simple-greeter/settings-manager-plugins
    gdm 3076 0.0 0.0 436156 3056 ? Ssl 08:47 0:00 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=12
    gdm 3090 0.0 0.0 132452 1824 ? S 08:47 0:00 /usr/libexec/gvfsd
    gdm 3093 0.0 0.0 277492 9340 ? S 08:47 0:00 metacity
    gdm 3095 0.0 0.0 275428 11648 ? S 08:47 0:00 plymouth-log-viewer --icon
    gdm 3096 0.0 0.0 309036 10720 ? S 08:47 0:00 gnome-power-manager
    gdm 3097 0.0 0.0 374652 15324 ? S 08:47 0:01 /usr/libexec/gdm-simple-greeter
    gdm 3098 0.0 0.0 242740 6944 ? S 08:47 0:00 /usr/libexec/polkit-gnome-authentication-agent-1
    root 3104 0.0 0.0 49644 3788 ? S 08:47 0:00 /usr/libexec/polkit-1/polkitd
    root 3128 0.0 0.0 139304 2044 ? S 08:47 0:00 pam: gdm-password
    root 3132 0.0 0.0 452188 1860 ? Ssl 08:47 0:00 automount --pid-file /var/run/autofs.pid
    root 3472 0.0 0.0 0 0 ? S< 08:54 0:00 [kslowd000]
    root 3473 0.0 0.0 0 0 ? S< 08:54 0:00 [kslowd001]
    root 3474 0.0 0.0 0 0 ? S 08:54 0:00 [nfsiod]
    root 3522 0.0 0.0 27368 824 ? Ss 08:54 0:00 rpc.idmapd
    root 3550 0.0 0.0 2535888 13544 ? Sl 08:54 0:49 /opt/pixar/tractor-engine-1.6.3/tractor-engine --port=8000 --configdir=/helga/global/wc/config/pixar/tractor --log=/var/log/tractor-engine.log
    apache 4700 0.0 0.1 303312 27852 ? S 09:15 0:32 /usr/sbin/httpd
    apache 5754 0.0 0.0 286072 10604 ? S 09:25 0:28 /usr/sbin/httpd
    apache 7719 0.0 0.1 303308 28524 ? S 09:40 0:31 /usr/sbin/httpd
    jdavila 8189 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-1 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8191 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-1-2 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8193 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-3 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8195 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-4 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8197 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-5 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8201 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-1-7 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8203 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-8 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8206 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-1-9 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8208 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-10 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8210 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-11 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8212 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-12 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8214 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-1-13 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8216 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-14 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8218 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-15 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8220 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-16 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8222 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-1-17 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8224 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-1-18 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8226 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-1 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8228 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-2 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8230 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-3 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8232 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-4 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8234 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-5 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8236 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-6 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8238 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-2-7 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8240 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-8 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8242 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-2-9 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8244 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-2-10 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8246 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-4-1 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8248 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-4-2 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8250 0.0 0.0 57696 3192 ? S 09:45 0:00 ssh compute-4-3 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8252 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-4-4 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    jdavila 8254 0.0 0.0 57696 3188 ? S 09:45 0:00 ssh compute-4-5 python /home/jdavila/run_parallel_experiments_here/jaimes_little_emergent_pusher.py
    apache 18976 0.0 0.0 286072 10608 ? S 12:10 0:24 /usr/sbin/httpd
    postfix 26169 0.0 0.0 78308 3240 ? S 22:05 0:00 pickup -l -t fifo -u
    root 29071 0.0 0.0 140040 1672 ? S 23:01 0:00 CROND
    root 29072 0.0 0.0 9216 1176 ? Ss 23:01 0:00 /bin/bash /usr/bin/run-parts /etc/cron.hourly
    root 29099 0.0 0.0 9216 1040 ? S 23:01 0:00 /bin/bash /etc/cron.hourly/mcelog.cron
    root 29100 0.0 0.0 9072 760 ? S 23:01 0:00 awk -v progname /etc/cron.hourly/mcelog.cron progname {????? print progname ":\n"????? progname="";???? }???? { print; }
    root 29101 0.0 0.0 4640 648 ? D 23:01 0:00 /usr/sbin/mcelog --ignorenodev --filter
    root 29606 0.1 0.0 97816 3820 ? S 23:10 0:00 sshd: josiah [priv]
    josiah 29611 0.0 0.0 97816 1816 ? S 23:10 0:00 sshd: josiah@pts/0
    josiah 29612 0.2 0.0 110500 1920 pts/0 Ss 23:10 0:00 -bash
    josiah 29736 2.0 0.0 110196 1080 pts/0 R+ 23:10 0:00 ps aux
    [josiah@fly ~]$ top
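Linux counts tasks in uninterruptible sleep (STAT D) toward the load average even when they use no CPU, so the listing above is worth filtering for them: all eight nfsd threads are in D, along with jbd2/sda1-8, kjournald, and the hourly mcelog run. That accounts for part of the load, not necessarily all of it. A short Python sketch that pulls the D-state processes out of saved ps aux output, assuming the standard column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND):

```python
import sys


def uninterruptible(ps_aux_lines):
    """Return (pid, command) for every process whose STAT field starts with 'D'.

    Tasks in uninterruptible sleep count toward the Linux load average even
    though they are not using any CPU.
    """
    stuck = []
    for line in ps_aux_lines:
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.strip().split(None, 10)
        if len(fields) < 11 or fields[1] == "PID":
            continue  # skip the header and anything that is not a process row
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


if __name__ == "__main__":
    # Usage (script name is arbitrary): ps aux > ps.txt; python3 dstate.py ps.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as listing:
            for pid, command in uninterruptible(listing):
                print(pid, command)
```

Splitting with a maximum of 10 keeps the COMMAND column intact, including its arguments.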

  5.   wjens Says:

    It passed memtest86+ with flying colors. Right now I have it booted with the right-hand CPU (as viewed from the front) removed. If it crashes again, I'll put that one back in, remove the other one, and go from there…

  6.   wjens Says:

    Fly crashed again at around 3:46 AM yesterday morning. This time it's the kernel-panic kind: blank console, with Caps Lock and Scroll Lock blinking. Swapping processors and rebooting. Next step: kernel upgrade.
