Re: [mpls-linux-devel] Merging into the kernel?
Status: Beta
Brought to you by:
jleu
From: root <ro...@so...> - 2006-02-15 20:54:02
|
Hi, Sorry for taking so long to respond, here are a number of comments and questions.... this is also an opportunity for me to express all the ideas I've had over the last few weeks, so please excuse me if I ramble on too much :-) Firstly thanks for sending me the two patches. I've spoken to Jamal and the genetlink solution is certainly the right way to go, so I'll talk only about that particular patch from now on. You mentioned that RCU needed looking at in a previous email and I've taken a quick look over it and I'd agree that it looks like the current code is part way between RCU and a fully locked solution. Probably the simplest thing to do is just to change the list_add_rcu() in shim.c:shim_proto_add() back to a standard list_add() until the list can be converted completely (and likewise list_del_rcu() in shim.c:shim_proto_remove()) although I'd agree that an RCU implementation would be much preferable, and possibly even a prerequisite for kernel inclusion due to the already (mostly) lockless routing code. One question occurs to me though... did you consider using the xfrm code in order to interface with the higher layer protocols? I took a look at this recently since it seems to be in the right place in the stack to do this. Its fairly complicated and there seems to be at least a loose fit but I can see that some changes would be required. The issue of selecting a forwarding class being the biggest potential issue. Still it appears to me to be a lot cleaner to modify xfrm a bit and I suspect more likely to meet with more general approval. One thing which struck me about the patch was although it does put in place a lot of the groundwork, I think a more ambitious approach would be more likely to be accepted. Going for small patches is good, but also the initial patch should add some (standalone) functionality to the kernel, so there is also "too small". I'd suggest going for the minimum sized patch which can add a useful feature, say for example, adding just enough code to do MPLS forwarding and leaving other bits (the tunnel device which makes a nice patch on its own and the interface with the higher level protocols) until later. This also brings me neatly on to the forwarding code. I see that this has been implemented in three parts, which if you'll excuse my ascii artistry, or lack of it, interact as follows: /-----\ /----\ /-------\ input from ----| ILM |---| XC |---| nhlfe |---- output to netdevice \-----/ \----/ \-------/ netdevice | | | | to higher from higher protocols protocols Now both the ILM and nhlfe are composed of radix tree tables and I'm curious as to why you chose this particular system over (say, for example) a plain hash table. The cross connect interface which updates the final instruction in the ILM entry to point to an nhlfe entry is also confusing me slightly as I don't see why the ILM can't have a forwarding instruction added to it directly when its created via the netlink message. Why the extra interface? I have been giving some thought as to the efficiency of the forwarding process itself recently, with the idea of "transcoding" the instructions as provided via netlink into an efficient byte code to allow faster execution. The would appear to be considerable scope for merging certain instructions (e.g. a pull followed by a push) into one internal instruction (i.e. the interface would be the same and the effect the same so it wouldn't break the protocol at all). It should be possible to calculate, for any given instruction sequence the maximum headroom required (in the skb) to execute it in advance such that it would be possible to guarantee that only a single call to skb_realloc_headroom would be required for any particular packet. A future development might even request that device drivers always send packets with a certain amount of headroom to prevent even this call being required. Currently I see that its assumed that 32 bytes will cover all eventualities, which seems a reasonable bet for most uses. Another feature of transcoding would be to remove useless instructions (NOP) from the execution path. Is there actually any practical purpose for this instruction? I'm afraid I can't see one. The various instructions to set/get tcindex and nfmark seem like a very good plan. I'm considering writing a patch to add setting nfmark through the ipv4/6/decnet routing tables which I think would be a generally useful plan. I wonder also if using one or the other or both of nfmark and/or tcindex as a key in looking up the nhlfe and/or ilm isn't a bad idea either. If nfmark could be 1:1 with mpls fec, then it might be possible to use it together with xfrm as the interface for higher level protocols. I also have the basics of a DECnet interface to mpls (not tested yet, I'm afraid) which I've roughed out. I'll send that to you if you are interested. It doesn't look a lot different to the ipv4 one which is no surprise since thats where DECnet's routing code comes from. Steve. |