Re: [E1000-devel] Detected Tx Unit Hang

Re-added the list for tracking.

I think I see the issue: you have more than 4 GB of RAM, and it appears that either your system doesn't handle dual address cycles (DAC) correctly or our adapter doesn't work quite right for some reason.

Use this patch to force the OS never to give our hardware addresses above 4 GB:
https://sourceforge.net/tracker2/download.php?group_id=42302&atid=447449&file_id=283326&aid=2007017

It's the e1000_disable_dac.patch file.
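For readers unfamiliar with the term: on a conventional PCI bus a device sends an address whose upper 32 bits are zero in a single address cycle (SAC), and needs a dual address cycle (DAC) for anything at or above 4 GB. The sketch below is an illustration only; `needs_dac` is a hypothetical helper, not code from the e1000 driver or from the linked patch. It shows which buffer addresses fall on which side, and why keeping every DMA address below 4 GB (which is what the patch is described as doing) means DAC is never exercised.

```python
# Illustration only: needs_dac() is a hypothetical helper, not part of the
# e1000 driver or of e1000_disable_dac.patch.
FOUR_GB = 1 << 32  # first address that no longer fits in 32 bits


def needs_dac(bus_addr: int) -> bool:
    """Return True if a PCI bus address needs a dual address cycle.

    Addresses whose upper 32 bits are zero can be sent in a single
    address cycle; anything at or above 4 GB needs two cycles.
    """
    if bus_addr < 0:
        raise ValueError("bus address must be non-negative")
    return bus_addr >= FOUR_GB


if __name__ == "__main__":
    # Example buffer addresses on a machine with more than 4 GB of RAM.
    for addr in (0x0000_1000, 0xFFFF_F000, 0x1_0000_0000, 0x2_3456_7000):
        kind = "DAC" if needs_dac(addr) else "SAC"
        print(f"{addr:#012x} -> {kind}")
```

With more than 4 GB installed, some packet buffers can land in the DAC range unless the OS is told to keep device-visible addresses under 4 GB.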

________________________________
From: Gary W. Smith [mailto:ga...@pr...]
Sent: Thursday, March 12, 2009 12:55 PM
To: Brandeburg, Jesse
Subject: RE: [E1000-devel] Detected Tx Unit Hang

Jesse,

Included is the messages log with the debug patch applied. It took only a couple of seconds to trigger the problem, even with these modprobe.conf changes:

options e1000 TxDescriptorStep=4,4
alias eth0 e1000
alias eth1 e1000

Anyway, I updated the BIOS about a month ago to see if that would resolve the problem, but it did not; it is on the latest version. We saw a similar problem under Windows 2003 SP1+ but attributed it to the TCP offload/DoS patch bug in Windows, and I didn't think much of it since it affected several other servers. The problem under Windows occurred whether or not we used the onboard NIC. In fact, we tried a separate Broadcom 1 Gb adapter (thinking it was the TCP offload) and that didn't resolve it either.

I'm really hoping this isn't a hardware issue (the box isn't under warranty), but if it is, we'll just have to deal with that separately.

Thanks for all of the help,

Gary

________________________________
From: Brandeburg, Jesse [mailto:jes...@in...]
Sent: Thu 3/12/2009 9:33 AM
To: Gary W. Smith
Cc: e10...@li...
Subject: RE: [E1000-devel] Detected Tx Unit Hang

Sorry. Go to the home page, http://sourceforge.net/projects/e1000, and then:

1. Click Tracker.
2. Click Patches.
3. Click "tx hang debug code (all releases) - 1460945".
4. Download e1000_806_dump.patch. It should apply, with some fuzz, to your e1000 driver directory with the following command (save the download as file.patch first, or substitute its name):

   patch -d e1000-8.0.* -p1 < file.patch

Here is the download link:
https://sourceforge.net/tracker2/download.php?group_id=42302&atid=447451&file_id=298629&aid=1460945

________________________________
From: Gary W. Smith [mailto:ga...@pr...]
Sent: Thursday, March 12, 2009 9:16 AM
To: Brandeburg, Jesse
Cc: e10...@li...
Subject: RE: [E1000-devel] Detected Tx Unit Hang

Excuse my ignorance, but which patches? ;) There's a lot of stuff on the download page. I assume you mean the I/OAT driver and kernel patch, but I want to make sure before doing it.

>
> Mar 11 18:50:01 vcsoaknas01 kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> Mar 11 18:50:01 vcsoaknas01 kernel:   Tx Queue             <0>
> Mar 11 18:50:01 vcsoaknas01 kernel:   TDH                  <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel:   TDT                  <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_use          <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_clean        <24>
> Mar 11 18:50:01 vcsoaknas01 kernel: buffer_info[next_to_clean]
> Mar 11 18:50:01 vcsoaknas01 kernel:   time_stamp           <1004de0b1>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_watch        <24>
> Mar 11 18:50:01 vcsoaknas01 kernel:   jiffies              <1004dec18>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_watch.status <0>

This really indicates that the adapter is finishing all the work, but the
descriptor write-back marking the work as completed is not making it to
main memory: TDH equals TDT, so the hardware has processed everything it
was given, yet next_to_watch.status is still 0. We have seen this a lot
with AMD systems, in particular ones with VIA chipsets. Those machines
have a bad bug when an I/O device and the processor both write to the
same cache line.
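The numbers in the quoted log can be checked against that reading. The sketch below is a hypothetical decoder written for this thread, not part of the e1000 debug patch. It parses the quoted fields and derives whether the hardware head has caught up to the tail (TDH == TDT), how many descriptors the driver is still waiting to clean, and whether the Descriptor Done (DD) bit, bit 0 of the descriptor status, was written back. It assumes a 256-entry transmit ring (the usual e1000 default) and HZ=1000; both depend on the actual configuration.

```python
import re

# Fields copied from the log quoted in this thread (values are hex).
LOG = """\
  TDH                  <f7>
  TDT                  <f7>
  next_to_use          <f7>
  next_to_clean        <24>
  time_stamp           <1004de0b1>
  jiffies              <1004dec18>
  next_to_watch.status <0>
"""

RING_SIZE = 256     # assumption: default e1000 Tx ring size (check `ethtool -g eth0`)
HZ = 1000           # assumption: kernel tick rate (CONFIG_HZ) on the affected box
TXD_STAT_DD = 0x01  # Descriptor Done bit in the Tx descriptor status


def parse_hang_log(text):
    """Map each 'name <hex>' field in a Tx hang dump to an integer."""
    pattern = re.compile(r"^\s*(\S+)\s+<([0-9a-fA-F]+)>", re.MULTILINE)
    return {name: int(value, 16) for name, value in pattern.findall(text)}


def summarize(fields, ring_size=RING_SIZE, hz=HZ):
    """Derive the facts that point at a lost descriptor write-back."""
    stalled = fields["jiffies"] - fields["time_stamp"]
    return {
        # Head caught up with tail: the adapter has processed everything queued.
        "hw_idle": fields["TDH"] == fields["TDT"],
        # Descriptors the driver has queued but not yet seen completed.
        "pending": (fields["next_to_use"] - fields["next_to_clean"]) % ring_size,
        # How long the oldest outstanding buffer has been waiting.
        "stalled_jiffies": stalled,
        "stalled_seconds": stalled / hz,
        # DD clear means the completion status never reached host memory.
        "dd_written_back": bool(fields["next_to_watch.status"] & TXD_STAT_DD),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_hang_log(LOG)).items():
        print(f"{key:16} {value}")
```

With the quoted values it reports the hardware idle (TDH and TDT both 0xf7), 211 descriptors outstanding between next_to_clean (0x24) and next_to_use (0xf7), DD not set, and 2919 jiffies since the stuck buffer was queued, which is about 2.9 s only if HZ really is 1000.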

Also, if the above workaround doesn't help, we'll want you to install the
dump patch from the Patches section of e1000.sourceforge.net and send us
the output when you get a Tx hang.

Hope this helps,
 Jesse
