From: chas w. - C. <ch...@cm...> - 2006-10-14 00:13:12
In message <ECF...@em...>, "Talbert, Scott" writes:
>In kernel 2.6.11, it appears that you re-wrote the LEC ARP code to use
>lec_arp_lock exclusively (and eliminated lec_arp_users). Was the old
>method causing a problem (data corruption/panics) or did you change it
>just for better design.

well, the history goes something like this:

http://lkml.org/lkml/2003/2/20/164

so it was called a lock originally, but it was really more like a
reference count. however, nothing in lec_arp_timer keeps another part
of the code from running. on a single processor, this was fine since
lec_arp_timer ran inside an interrupt and no one else could run. on
smp, this isn't enough. early versions of the 2.4 locking conversions
weren't quite right:

http://oss.sgi.com/archives/netdev/2005-01/msg00154.html

>I ask because we've run into a problem where our kernel 2.4 boxes seem
>to be getting corrupted LEC ARP caches and eventually panicing when a
>certain device on our network sends out a large burst of LE_ARP traffic.
>I wonder if I should try putting your changes into our 2.4 kernel.

i can see how this would happen with versions of the 2.4 code. the
latest 2.6 kernel has a new lec.c that's properly indented and the
lec_arp_cache has been converted to an hlist with reference counting.
the reference counting is a bit weak but it keeps you from needing to
hold lec_arp_lock for "long" operations. the hlist conversion makes
the code a bit easier to follow and should keep you from having to
debug the custom linked list code.
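
to illustrate the idea of taking the lock only long enough to find an
entry and grab a reference, here is a rough userspace sketch. it is not
the code in lec.c: struct arp_entry, table_lock, entry_hold, entry_put,
entry_add, entry_find and entry_remove are all made-up names, a pthread
mutex stands in for lec_arp_lock, and a plain singly linked chain
stands in for the kernel hlist.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a LEC ARP entry; not the kernel's struct. */
struct arp_entry {
    struct arp_entry *next;   /* plain chain; the kernel code uses an hlist */
    unsigned char mac[6];
    atomic_int refcnt;        /* one reference is owned by the table itself */
};

/* Stands in for lec_arp_lock (assumption: a mutex instead of a spinlock). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct arp_entry *table_head;

void entry_hold(struct arp_entry *e)
{
    atomic_fetch_add(&e->refcnt, 1);
}

/* Drop one reference; the last one out frees the entry. */
void entry_put(struct arp_entry *e)
{
    int old = atomic_fetch_sub(&e->refcnt, 1);
    assert(old > 0);
    if (old == 1)
        free(e);
}

/* Insert a new entry; returns 0 on success, -1 on allocation failure. */
int entry_add(const unsigned char mac[6])
{
    struct arp_entry *e = calloc(1, sizeof(*e));
    if (!e)
        return -1;
    memcpy(e->mac, mac, 6);
    atomic_init(&e->refcnt, 1);          /* the table's reference */
    pthread_mutex_lock(&table_lock);
    e->next = table_head;
    table_head = e;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/*
 * Look up an entry and return it with an extra reference held.
 * The lock covers only the walk and the hold, so the caller can do
 * its "long" work afterwards without the lock and then entry_put().
 */
struct arp_entry *entry_find(const unsigned char mac[6])
{
    struct arp_entry *e;
    pthread_mutex_lock(&table_lock);
    for (e = table_head; e; e = e->next) {
        if (memcmp(e->mac, mac, 6) == 0) {
            entry_hold(e);
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return e;
}

/* Unlink an entry and drop the table's reference; returns 1 if found. */
int entry_remove(const unsigned char mac[6])
{
    struct arp_entry **pp, *e = NULL;
    pthread_mutex_lock(&table_lock);
    for (pp = &table_head; *pp; pp = &(*pp)->next) {
        if (memcmp((*pp)->mac, mac, 6) == 0) {
            e = *pp;
            *pp = e->next;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    if (!e)
        return 0;
    entry_put(e);   /* memory survives while other holders remain */
    return 1;
}
```

the point of the sketch is that an entry returned by entry_find stays
valid even if someone else calls entry_remove in the meantime; it only
goes away once the last holder calls entry_put.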