Re: [Linux-ATM-General] LEC ARP Cache Locking

In message <ECF...@em...>,"Talbert, Scott" writes:
>In kernel 2.6.11, it appears that you re-wrote the LEC ARP code to use
>lec_arp_lock exclusively (and eliminated lec_arp_users).  Was the old
>method causing a problem (data corruption/panics) or did you change it
>just for better design?

well, the history goes something like this:

http://lkml.org/lkml/2003/2/20/164

so it was called a lock originally, but it was really more like
a reference count.  however, nothing in lec_arp_timer keeps another
part of the code from running.  on a single processor, this was
fine since lec_arp_timer ran inside an interrupt and no one else
could run.  on smp, this isn't enough.
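
to make the difference concrete, here is a rough userspace sketch.  it is
not the old lec.c code; every name in it (users_enter, lock_try_enter,
table_busy, ...) is made up for illustration, and it uses c11 atomics
rather than kernel primitives.  the point is only that a "users" counter
admits every caller and just records how many are inside, while a real
lock turns the second caller away:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* hypothetical guards, not the real lec.c symbols */
static atomic_int users;                           /* counter-style guard */
static atomic_flag table_busy = ATOMIC_FLAG_INIT;  /* lock-style guard */

/* counter style: always succeeds, returns how many users are now inside */
static int users_enter(void)
{
    return atomic_fetch_add(&users, 1) + 1;
}

static void users_exit(void)
{
    assert(atomic_load(&users) > 0);
    atomic_fetch_sub(&users, 1);
}

/* lock style: returns true only for the one caller that gets the lock */
static bool lock_try_enter(void)
{
    return !atomic_flag_test_and_set(&table_busy);
}

static void lock_exit(void)
{
    atomic_flag_clear(&table_busy);
}
```

with the counter, two callers can both be "inside" the table at once;
with the flag, the second lock_try_enter() fails until lock_exit() runs.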

early versions of the 2.4 locking conversion weren't quite right:

http://oss.sgi.com/archives/netdev/2005-01/msg00154.html

>I ask because we've run into a problem where our kernel 2.4 boxes seem
>to be getting corrupted LEC ARP caches and eventually panicking when a
>certain device on our network sends out a large burst of LE_ARP traffic.
>I wonder if I should try putting your changes into our 2.4 kernel.

i can see how this would happen with versions of the 2.4 code.  the latest
2.6 kernel has a new lec.c that's properly indented and the lec_arp_cache
has been converted to an hlist with reference counting.  the reference
counting is a bit weak but it keeps you from needing to hold lec_arp_lock
for "long" operations.  the hlist conversion makes the code a bit easier to
follow and should keep you from having to debug the custom linked list code.
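
if it helps when backporting, the general pattern looks roughly like the
userspace sketch below.  this is not the 2.6 lec.c code: the names
(struct entry, entry_find_hold, entry_put, table_lock) are hypothetical,
a pthread mutex stands in for lec_arp_lock, and a plain singly linked
chain stands in for the hlist.  the idea is that the lock is held only
long enough to walk the chain and take a reference; the "long" work then
happens on the held entry with the lock dropped:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical names throughout; a sketch, not the kernel code */
struct entry {
    struct entry *next;     /* chain link, standing in for an hlist node */
    unsigned char mac[6];
    int refcnt;             /* protected by table_lock in this sketch */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *table_head;

/* allocate an entry and link it in; the table owns one reference */
static struct entry *entry_add(const unsigned char *mac)
{
    struct entry *e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    memcpy(e->mac, mac, sizeof(e->mac));
    e->refcnt = 1;
    pthread_mutex_lock(&table_lock);
    e->next = table_head;
    table_head = e;
    pthread_mutex_unlock(&table_lock);
    return e;
}

/* drop one reference; free the entry when the last one goes away */
static void entry_put(struct entry *e)
{
    int last;
    pthread_mutex_lock(&table_lock);
    assert(e->refcnt > 0);
    last = (--e->refcnt == 0);
    pthread_mutex_unlock(&table_lock);
    if (last)
        free(e);
}

/* look up by mac; on a hit, return the entry with an extra reference */
static struct entry *entry_find_hold(const unsigned char *mac)
{
    struct entry *e;
    pthread_mutex_lock(&table_lock);
    for (e = table_head; e; e = e->next) {
        if (memcmp(e->mac, mac, sizeof(e->mac)) == 0) {
            e->refcnt++;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return e;
}

/* unlink from the table and drop the table's reference */
static void entry_remove(struct entry *e)
{
    struct entry **pp;
    int found = 0;
    pthread_mutex_lock(&table_lock);
    for (pp = &table_head; *pp; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    if (found)
        entry_put(e);
}
```

a caller that got an entry from entry_find_hold() can keep using it after
someone else calls entry_remove(); the memory only goes away on the final
entry_put().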