[mpls-linux-devel] Re: Here it comes... big patch for 2.6.1-mpls-1.910 pre series
Status: Beta
Brought to you by:
jleu
From: James R. L. <jl...@da...> - 2004-01-31 03:21:32
|
I'm looking forward to testing this one. Lots of good stuff. Just a couple of notes: -I'm cool with the name change (mpls_instruction -> mpls_instr :-) I think the reason I used mpls_instruction versus mpls_instr is that at one time I had a structure member called mpls_instr, so I was trying to minimize confusion. I think this is no longer the case. -I agree locking is an issue. Mitigating access to the tree is not enough, we need to guard against multiple 'threads' changing the same MOI/MII, but per MOI/MII locking seems overkill. What about just enforcing a write lock on the tree when a 'thread' wants to modify a MOI/MII? -I'm still not sure about the linked list of instructions .. -We can already 'share' instructions via indirection with the FWD operation (multiple MOIs/MIIs with a FWD to a common MOI) and it already maintains the notion of 'parenthood' -size, each instruction now has an extra pointer compared to just one int holding the size of the array -if the elements of the link list are not in the same memory locality we will get cache misses which will ultimatly cause poor performace in the fast path. Although the data for the instructions might be in a different locality, so cache misses are inevitable. But a couple of cache misses versus potential cache flailing is better. In the end I'm not sure if any of this matters, but it is just my gut feeling that a dynamically allocated array is better then a linked list in this case. Let's test what you got in the patch and maybe down the road we can to some profiling and see if it matters. On Thu, Jan 29, 2004 at 04:01:54PM +0100, Ramon Casellas wrote: > > > Hi James, all > (I don't know if others read the list though, and I cannot > see the list of subscribers) > > Well, here it comes :) let the serious testing for the 2.6_pre series > begin! (I am still interested in UML for 2.6). This patch aplies to you r > head as of today, it may be 'bug prone' and it is not well tested enough, > but w.f.m (works for me).Remark: I have removed support for some legacy > ioctls (debug, labelspaces, mii proto) use sysfs. > > your comments are of course, welcome. > > Other than testing, we need now to focus on locking schemes. I *think* > that current locking is not enough, and we may need to keep 'per MII/MOI > object locking' and not only the tree with a read/write lock. > > Scenarios: > 1. softirq (bh) forwarding data path reader grabs tree lock and > gets a ref to a MII object, releases lock. If, in any case, a writer > (process context) maybe in another CPU locks the tree, gets a ref to the > same MII object and releases the lock (in other words the lock just > protected tree access) Can data corruption occur? > > > 2. Two writers in different CPUs (or in same CPU) both in process context > grab a ref to the same MII object for updating. lock serializes access to > the tree, but once they both have a reference, data corruption/races may > occur. (Maybe I am braindead, but this one looks clear to me....) > Solution: add a per MII/MOI object readers (softirq) writers > (ioctls/process context) lock. This locking granularity should be enough. > > Not sure, I still need to think about all this. > > > Cheers, > Ramon > > > > > Notes: > =================== > * mpls_instruction was to long for me ... :) I hope you dont mind saying > mpls_instr. > > * Instructions are now reference counted objects that use kobj. As any ref > counted object, and in special kobjects, we need to provide a 'release' > method (mpls_instr_release), so, given a kobj, we obtain the container and > all cleanup is done (and memory is freed) there. That's why I had to add > the 'parent' and 'direction' attributes. > > * Instruction sets for MII/MOI objects are linked lists, and IMHO, the > advantage of dynamic arrays is simplicity in error paths, but linked lists > will allow us later to do intruction sharing (see below) > > * Since they are now refcounted objects and instruction sets are > implemented as linked lists, a nice idea for the future (for post 2.6) is > to _share_ opcodes: there exists a single MPLS_OP_SET that forwards the > labelled packet to the next hop and can be referenced by several MOI > objects (LSP merging, for example). Opcodes form a directed graph, each > MOI/MII object follows a path, and graph nodes are refcounted. > We willneed to solve some 'parenthood' issues, though, but it is something we > should beep in mind (and, fortunately, sysfs has the notion of symlink > !))) > > > > example: usage > > # mplsadm3 -A -I gen:40:0 > /sys/mpls/ > |-- debug > |-- fullname > |-- labelspace > |-- mpls_instr > |-- mpls_mii > | `-- mii-gen:40:0-163841 > | |-- OP-00-POP > | | `-- data > | |-- OP-01-PEEK > | | `-- data > | |-- age > | |-- bytes > | |-- drops > | |-- key > | |-- labelspace > | |-- packets > | |-- proto > | `-- refcnt > |-- mpls_moi > |-- mpls_tunnel > `-- version > > # mplsadm3 -A -O 0 > /sys/mpls/ > |-- debug > |-- fullname > |-- labelspace > |-- mpls_instr > |-- mpls_mii > | `-- mii-gen:40:0-163841 > | |-- OP-00-POP > | | `-- data > | |-- OP-01-PEEK > | | `-- data > | |-- age > | |-- bytes > | |-- drops > | |-- key > | |-- labelspace > | |-- packets > | |-- proto > | `-- refcnt > |-- mpls_moi > | `-- moi-2 > | |-- age > | |-- bytes > | |-- drops > | |-- key > | |-- mtu > | |-- mtu_limit > | |-- packets > | |-- propagate_ttl > | `-- refcnt > |-- mpls_tunnel > `-- version > > > # echo "ath0:0" > /sys/mpls/labelspace > # cat /proc/net/mpls/labelspaces > Iface LSpc Refcnt > lo -1 12 > eth0 -1 3 > ath0 0 12 > > # mplsadm3 -A -T mpls0 > > |-- mpls_tunnel > | `-- mtp-mpls0 > | |-- dev > | `-- moi > > > > # mplsadm3 -O 2 -o push:gen:666:set:ath0:ipv4:192.168.0.1 > /sys/mpls/ > |-- debug > |-- fullname > |-- labelspace > |-- mpls_instr > |-- mpls_mii > | `-- mii-gen:40:0-163841 > | |-- OP-00-POP > | | `-- data > | |-- OP-01-PEEK > | | `-- data > | |-- age > | |-- bytes > | |-- drops > | |-- key > | |-- labelspace > | |-- packets > | |-- proto > | `-- refcnt > |-- mpls_moi > | `-- moi-2 > | |-- OP-00-PUSH > | | `-- data > | |-- OP-01-SET > | | `-- data > | |-- age > | |-- bytes > | |-- drops > | |-- key > | |-- mtu > | |-- mtu_limit > | |-- packets > | |-- propagate_ttl > | `-- refcnt > |-- mpls_tunnel > | `-- mtp-mpls0 > | |-- dev > | `-- moi > `-- version > > ATTRIBUTES > ================================ > # echo "2" > /sys/mpls/mpls_tunnel/mtp-mpls0/moi > # cat /sys/mpls/mpls_tunnel/mtp-mpls0/moi > moi: 2(0x00000002) 0/0/0 5 > > # echo "0x0800" > /sys/mpls/mpls_mii/mii-gen\:40\:0-163841/proto > # cat /sys/mpls/mpls_mii/mii-gen\:40\:0-163841/proto > 0x0800 > > CROSSCONNECTS > ================================= > # tree /sys/mpls > (cut) > | `-- mii-gen:40:0-163841 > | |-- OP-00-POP > | | `-- data > | |-- OP-01-PEEK > | | `-- data > > # mplsadm3 -B -I gen:40:0 -O 2 > > | `-- mii-gen:40:0-163841 > | |-- OP-00-POP > | | `-- data > | |-- OP-01-FWD > | | `-- data > > # mplsadm3 -U -I gen:40:0 -O 2 > |-- mpls_mii > | `-- mii-gen:40:0-163841 > | |-- OP-00-POP > | | `-- data > | |-- OP-01-PEEK > > > -- James R. Leu jl...@mi... |