weird eth2 issue

May 21, 2014
by admin

eth2 on fly (fly1) went down, with the following error message in dmesg:

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Hardware name: PowerEdge 2950
NETDEV WATCHDOG: eth2 (bnx2): transmit queue 0 timed out
Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc p4_clockmod freq_table speedstep_lib ipt_REJECT xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ses enclosure bnx2 sg serio_raw lpc_ich mfd_core i5000_edac edac_core i5k_amb shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix usb_storage radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]
Pid: 0, comm: swapper Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Call Trace:
[] ? warn_slowpath_common+0x87/0xc0
[] ? warn_slowpath_fmt+0x46/0x50
[] ? dev_watchdog+0x26b/0x280
[] ? scheduler_tick+0x11e/0x260
[] ? dev_watchdog+0x0/0x280
[] ? run_timer_softirq+0x197/0x340
[] ? tick_dev_program_event+0x65/0xc0
[] ? __do_softirq+0xc1/0x1e0
[] ? tick_program_event+0x2a/0x30
[] ? call_softirq+0x1c/0x30
[] ? do_softirq+0x65/0xa0
[] ? irq_exit+0x85/0x90
[] ? smp_apic_timer_interrupt+0x4a/0x60
[] ? apic_timer_interrupt+0x13/0x20
[] ? mwait_idle+0x77/0xd0
[] ? atomic_notifier_call_chain+0x1a/0x20
[] ? cpu_idle+0xb6/0x110
[] ? start_secondary+0x2ac/0x2ef
---[ end trace 0131d3805b9feaaf ]---
bnx2 0000:05:00.0: eth2:
bnx2 0000:05:00.0: eth2: RV2P_PFTQ_CTL 00010000
bnx2 0000:05:00.0: eth2: RV2P_TFTQ_CTL 00020000
bnx2 0000:05:00.0: eth2: RV2P_MFTQ_CTL 00020000
bnx2 0000:05:00.0: eth2: TBDR_FTQ_CTL 00004002
bnx2 0000:05:00.0: eth2: TDMA_FTQ_CTL 00010002
bnx2 0000:05:00.0: eth2: TXP_FTQ_CTL 00010002
bnx2 0000:05:00.0: eth2: TXP_FTQ_CTL 00010002
bnx2 0000:05:00.0: eth2: TPAT_FTQ_CTL 00010000
bnx2 0000:05:00.0: eth2: RXP_CFTQ_CTL 00008000
bnx2 0000:05:00.0: eth2: RXP_FTQ_CTL 00100000
bnx2 0000:05:00.0: eth2: COM_COMXQ_FTQ_CTL 00010000
bnx2 0000:05:00.0: eth2: COM_COMTQ_FTQ_CTL 00020000
bnx2 0000:05:00.0: eth2: COM_COMQ_FTQ_CTL 00010000
bnx2 0000:05:00.0: eth2: CP_CPQ_FTQ_CTL 00008000
bnx2 0000:05:00.0: eth2: CPU states:
bnx2 0000:05:00.0: eth2: 045000 mode b84c state 80001000 evt_mask 500 pc 8000bf8 pc 8000bf0 instr 1f82821
bnx2 0000:05:00.0: eth2: 085000 mode b84c state 80001000 evt_mask 500 pc 800068c pc 8000694 instr 3c180800
bnx2 0000:05:00.0: eth2: 0c5000 mode b84c state 80001000 evt_mask 500 pc 80044c4 pc 80044c8 instr 32a20003
bnx2 0000:05:00.0: eth2: 105000 mode b84c state 80001000 evt_mask 500 pc 8000774 pc 800074c instr af8a0014
bnx2 0000:05:00.0: eth2: 145000 mode b880 state 80000000 evt_mask 500 pc 8004e10 pc 8000f58 instr 3e00008
bnx2 0000:05:00.0: eth2: 185000 mode b84c state 80008000 evt_mask 500 pc 80006f8 pc 800042c instr 3c0c0800
bnx2 0000:05:00.0: eth2:
bnx2 0000:05:00.0: eth2:
bnx2 0000:05:00.0: eth2: TBDC free cnt: 32
bnx2 0000:05:00.0: eth2: LINE CID BIDX CMD VALIDS
bnx2 0000:05:00.0: eth2: 00 000800 c488 00 [0]
bnx2 0000:05:00.0: eth2: 01 000800 c488 00 [0]
bnx2 0000:05:00.0: eth2: 02 000800 3060 00 [0]
bnx2 0000:05:00.0: eth2: 03 000800 9aa8 00 [0]
bnx2 0000:05:00.0: eth2: 04 000800 9ab0 00 [0]
bnx2 0000:05:00.0: eth2: 05 000800 7528 00 [0]
bnx2 0000:05:00.0: eth2: 06 000800 7538 00 [0]
bnx2 0000:05:00.0: eth2: 07 000800 7500 00 [0]
bnx2 0000:05:00.0: eth2: 08 000800 73c8 00 [0]
bnx2 0000:05:00.0: eth2: 09 000800 73e8 00 [0]
bnx2 0000:05:00.0: eth2: 0a 000800 c408 00 [0]
bnx2 0000:05:00.0: eth2: 0b 000800 c410 00 [0]
bnx2 0000:05:00.0: eth2: 0c 000800 c390 00 [0]
bnx2 0000:05:00.0: eth2: 0d 000800 c3a8 00 [0]
bnx2 0000:05:00.0: eth2: 0e 000800 c3b0 00 [0]
bnx2 0000:05:00.0: eth2: 0f 000800 d238 00 [0]
bnx2 0000:05:00.0: eth2: 10 000800 d268 00 [0]
bnx2 0000:05:00.0: eth2: 11 000800 d2c8 00 [0]
bnx2 0000:05:00.0: eth2: 12 000800 d2b8 00 [0]
bnx2 0000:05:00.0: eth2: 13 000800 d2c0 00 [0]
bnx2 0000:05:00.0: eth2: 14 000800 2ff8 00 [0]
bnx2 0000:05:00.0: eth2: 15 000800 3048 00 [0]
bnx2 0000:05:00.0: eth2: 16 000800 3098 00 [0]
bnx2 0000:05:00.0: eth2: 17 000800 3d40 00 [0]
bnx2 0000:05:00.0: eth2: 18 15e600 e640 cb [0]
bnx2 0000:05:00.0: eth2: 19 0d4200 db90 91 [0]
bnx2 0000:05:00.0: eth2: 1a 110e00 1a70 18 [0]
bnx2 0000:05:00.0: eth2: 1b 0e6f00 cc08 42 [0]
bnx2 0000:05:00.0: eth2: 1c 08b800 0ae0 0f [0]
bnx2 0000:05:00.0: eth2: 1d 1c1680 8208 18 [0]
bnx2 0000:05:00.0: eth2: 1e 0a8800 82c0 17 [0]
bnx2 0000:05:00.0: eth2: 1f 109180 fdd8 81 [0]
bnx2 0000:05:00.0: eth2:
bnx2 0000:05:00.0: eth2: DEBUG: intr_sem[0] PCI_CMD[02b8055e]
bnx2 0000:05:00.0: eth2: DEBUG: PCI_PM[1d002000] PCI_MISC_CFG[81020088]
bnx2 0000:05:00.0: eth2: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
bnx2 0000:05:00.0: eth2: DEBUG: RPM_MGMT_PKT_CTRL[00000000]
bnx2 0000:05:00.0: eth2: DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
bnx2 0000:05:00.0: eth2:
bnx2 0000:05:00.0: eth2: DEBUG: MCP_STATE_P0[00000106] MCP_STATE_P1[ffffffff]
bnx2 0000:05:00.0: eth2: DEBUG: MCP mode[0000b880] state[80000000] evt_mask[00000500]
bnx2 0000:05:00.0: eth2: DEBUG: pc[08007298] pc[08004cb4] instr[afbf0014]
bnx2 0000:05:00.0: eth2: DEBUG: shmem states:
bnx2 0000:05:00.0: eth2: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f] drv_pulse_mb[00005e5b]
bnx2 0000:05:00.0: eth2: DEBUG: dev_info_signature[44564905] reset_type[01005254] condition[00000106]
bnx2 0000:05:00.0: eth2: DEBUG: 000001c0: 01005254 42530000 00000106 fbffffff
bnx2 0000:05:00.0: eth2: DEBUG: 000003cc: 44444444 44444444 44444444 00000a28
bnx2 0000:05:00.0: eth2: DEBUG: 000003dc: 0004ffff 00000000 00000000 00000000
bnx2 0000:05:00.0: eth2: DEBUG: 000003ec: 00000000 00000000 00000000 00a22fa0
bnx2 0000:05:00.0: eth2: DEBUG: 0x3fc[0000ffff]
bnx2 0000:05:00.0: eth2:
bnx2 0000:05:00.0: eth2: NIC Copper Link is Down

Running ifdown eth2 followed by ifup eth2 appears to have fixed the issue… for now. I’m going to post to the ROCKS list to see if there’s anything else I should do about it.
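Since the interface may drop again, it could be handy to scan the kernel log for the watchdog message rather than waiting to notice the outage. The following is only a rough sketch, assuming Python 3 is available on the node; the function name `find_tx_timeouts` and the regular expression are my own illustration, written against the single "NETDEV WATCHDOG" line shown in the dmesg output above, and are not part of ROCKS or the bnx2 driver.

```python
import re

# Pattern written against the one message seen in the log above:
#   NETDEV WATCHDOG: eth2 (bnx2): transmit queue 0 timed out
# Other kernel versions may word this differently (assumption).
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return a list of (interface, driver, queue) tuples, one per
    watchdog timeout line found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits


# Example using the line from the log above.
sample = "NETDEV WATCHDOG: eth2 (bnx2): transmit queue 0 timed out"
print(find_tx_timeouts(sample))
```

In practice the text passed in would be the saved output of dmesg or the contents of the system log; what to do on a match (send mail, bounce the interface with ifdown/ifup) is left out on purpose.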



2 Responses to “weird eth2 issue”

  1.   admin Says:

    It happened again.

  2.   admin Says:

    Thanks, that’s exactly my bug, even on the same hardware!
    I’ll reboot with intremap=off and hopefully it’ll fix it.
    -Josiah

    On 5/21/14 12:55 PM, Luca Clementi wrote:
    > On Wed, May 21, 2014 at 7:21 AM, Wm. Josiah Erikson
    > wrote:
    >> Hi guys,
    >> Perhaps I should post this elsewhere, because it’s not exactly a ROCKS
    >> issue, but it is an issue with the version of the kernel in ROCKS 6.1.1, so
    >> I thought I’d post it here first.
    >> I recently installed ROCKS 6.1.1 on a Dell PowerEdge 2950 with 32GB of
    >> RAM and a second dual-port gigabit ethernet card using the bnx2 driver. My
    >> cluster’s third interface, eth2, which is on went down this morning, with
    >> the following error in the logs. Doing an ifdown/ifup fixed it, but if
    >> anybody knows of anything else I should do to prevent it from happening
    >> again, please let me know:
    >
    > On this bug:
    > https://bugs.centos.org/view.php?id=6011
    > They also suggest to boot with intremap=off parameters.
    >
    > Clem
    >
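
After rebooting with intremap=off, it is worth confirming that the parameter actually reached the running kernel. On Linux the boot command line can be read from /proc/cmdline. A minimal sketch, assuming Python 3; the helper names `has_kernel_param` and `read_cmdline` and the sample command line are hypothetical and only for illustration:

```python
def has_kernel_param(cmdline, param):
    """Return True if param appears as a whole whitespace-separated
    token in the kernel command line string."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's boot command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, for illustration only.
example = "ro root=/dev/sda1 intremap=off quiet"
print(has_kernel_param(example, "intremap=off"))
```

On the node itself the check would be has_kernel_param(read_cmdline(), "intremap=off"). Matching whole tokens avoids a false positive on a longer parameter that merely contains the same text.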
