From: Mike W. <we...@cs...> - 2006-11-29 17:41:44
A few years back Iphase was kind enough to provide me the source for
their unified (5515/5575) Solaris driver. I de-unified the driver and
produced Linux derivatives for both the 5515 and the 5575.

For the last few years I have used my 5515 driver under a variety of
heavy loads on UP and SMP systems with no observed failures. In
contrast, I can readily induce failure of BOTH my 5575 driver AND the
"iphase.c" 5575 driver by using either in an NFS server that is subject
to heavy loads. Oddly, neither will fail under "always-on" TCP transfers
in either direction, so I infer that the problem is associated with the
"stop and wait" nature of NFS.

I have also observed a related anomaly running a remote xterm on a
system containing the 5575. The following code (taken from my 5575
driver) describes my musings on that issue. My conjecture (unproven) is
that it is some manner of hardware race between RFred and the CPU and
thus very difficult to fix in a driver! (I am unable to produce any of
the problems on the 5515.)

Mike

/*
 * Packet received
 */
   if (status & (R_PKT_RCVD | R_LRG_FREEQ_MT))
   {
#ifdef DBG_RECV
      printk("IA 5575: RFred reception. State is %x \n", state);
#endif
      state = ia_get32 (&rf->state_reg) & 0xffff;
      while (!(state & R_PCQ_EMPTY))
      {
         count += 1;
         if (ia_proc_rcv_pkt (softc) == IA_FAIL)
         {
            printk("IA 5575: proc rcvd packet returned error \n");
            break;
         }

      /* This sorry state of affairs is related to what appears to be */
      /* RFred's ability to do things atomically and this fast CPU's  */
      /* ability to outrun RFred.  Here is what has been observed:    */
      /* When running a text editor with a remote xterm displayed on  */
      /* the system with the 5575, one can visually see delays on the */
      /* order of 1/2 second before the bottom couple of lines are    */
      /* updated when paging through a file.  Simultaneously looking  */
      /* at both ends with tcpdump one can see the remote system      */
      /* transmit a small packet.. delay half a second or so and      */
      /* timeout and retransmit.  On RFred's system we see both       */
      /* packets arrive virtually simultaneously after a 1/2 second   */
      /* quiet period...  The situation seems most common when the    */
      /* second packet is quite small.. leading to the theory that    */
      /* the packet has arrived before processing of its predecessor  */
      /* completes.. but in some sort of way that causes it neither   */
      /* to turn off the R_PCQ_EMPTY bit before we exit here nor to   */
      /* hang an interrupt when it finally does get it done??         */
      /* This hack solves the problem and also provides better        */
      /* capability for handling multiple packets per int under       */
      /* heavy loads.  The #if 1 hack didn't solve the NFS problem    */
      /* though.                                                      */

#if 1
         {
            int i;
            for (i = 0; i < 10; i++)
            {
               state = ia_get32 (&rf->state_reg) & 0xffff;
               if (!(state & R_PCQ_EMPTY))
                  break;
               udelay(2);
            }
         }
#endif
         state = ia_get32 (&rf->state_reg) & 0xffff;
      }
   }
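
The effect of the bounded re-poll can be modeled in user space. What
follows is a minimal sketch, not driver code: fake_rfred, fake_get_state,
repoll_state and drain are hypothetical names, the R_PCQ_EMPTY value is a
placeholder, and the "late arrival" behaviour of the simulated register
only encodes the conjecture from the comment block above, not documented
RFred behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define R_PCQ_EMPTY 0x1u  /* placeholder value; the real bit is defined in the driver headers */

/* Hypothetical model of the conjectured race: a packet that has arrived
 * but only becomes visible in the state register a few reads later. */
struct fake_rfred {
    int queued;  /* packets currently visible in the completion queue */
    int late;    /* packets that have arrived but are not yet visible */
    int lag;     /* state reads remaining before late packets show up */
};

/* Stand-in for ia_get32(&rf->state_reg): each read advances the model. */
uint32_t fake_get_state(struct fake_rfred *dev)
{
    if (dev->late > 0) {
        if (dev->lag == 0) {
            dev->queued += dev->late;  /* late packets finally become visible */
            dev->late = 0;
        } else {
            dev->lag--;
        }
    }
    return dev->queued > 0 ? 0u : R_PCQ_EMPTY;
}

/* Bounded re-poll: read the state up to max_tries times and stop early
 * as soon as the queue is seen to be non-empty. Returns the last state read. */
uint32_t repoll_state(struct fake_rfred *dev, int max_tries)
{
    uint32_t state = R_PCQ_EMPTY;
    int i;

    assert(max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        state = fake_get_state(dev) & 0xffff;
        if (!(state & R_PCQ_EMPTY))
            break;
        /* the driver calls udelay(2) here; the model has no real time */
    }
    return state;
}

/* Drain loop in the shape of the receive handler: process one packet,
 * then re-poll before deciding whether to leave. Returns packets handled. */
int drain(struct fake_rfred *dev, int max_tries)
{
    int count = 0;
    uint32_t state = fake_get_state(dev) & 0xffff;

    while (!(state & R_PCQ_EMPTY)) {
        dev->queued--;  /* stands in for ia_proc_rcv_pkt() */
        count++;
        state = repoll_state(dev, max_tries);
    }
    return count;
}
```

With max_tries set to 1 the model behaves like a handler that reads the
state once and exits, leaving the late packet stranded; with 10 tries the
same handler invocation picks it up. This only illustrates why the hack
could help the xterm case. It says nothing about the NFS failure, which
the hack did not fix.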