Re: [mpls-linux-devel] Merging into the kernel?
From: James R. L. <jl...@mi...> - 2006-03-04 03:34:09
Hello Steven,

On Sun, Feb 26, 2006 at 05:47:22PM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Wed, Feb 15, 2006 at 09:40:57PM -0600, James R. Leu wrote:

<snip original XFRM discussion>

> Yes - I wonder though whether we could use a different selector mechanism
> but keeping some of the general framework. When I looked at it, the main
> thing which struck me was that the difficulty in changing the selector
> mechanism was mostly down to the interface (via netlink) to userland.
> Actually changing it on the kernel side is not impossible I think.

If this can be done in an efficient manner, I think it would be more readily
accepted by the powers that be. Although, every time I look at the problem
it still comes down to this: something has to be attached to a node in the
L3 routing table. I thought about adding an XFRM reference to the IPv4|6
nodes, but came to the conclusion that a generic system (the shim layer)
would be more flexible for other protocols. Also, I thought it would be
easier to implement a new 'shim' hook in other L3 protocols, as opposed to
implementing an XFRM interface for them. Perhaps I've overlooked something.
Let me know if you have any ideas about how to go about this.

<snip XC/radix discussion>

> And of course it occurred to me that in order to find this out we'd need
> some tools to test against. Please find attached a patch for pktgen (as
> current in davem's net-2.6.17 git tree at kernel.org) to generate
> MPLS packets.
>
> The extension allows you to add a stack of labels onto the packets it's
> sending out. There is one extra hack which I included: since we know
> how many labels there are in the stack, I've used the bottom of stack
> bit to indicate whether the label should be randomly generated or not.
>
> You can thus push a stack of (up to 16) labels where each label in the
> stack is either a fixed value or random.
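As a rough illustration of the label words the quoted pktgen patch takes, here is a minimal sketch that packs and unpacks 32-bit MPLS label stack entries using the standard RFC 3032 layout (20-bit label, 3-bit EXP, 1-bit bottom of stack, 8-bit TTL). The function names are made up for this example and are not taken from the patch; the sample argument is the one Steven gives in his pgset example.

```python
# Sketch only: helper names are hypothetical, not from the pktgen patch.
# Layout per RFC 3032: label(20) | EXP(3) | bottom-of-stack(1) | TTL(8).

def encode_entry(label, exp=0, bos=0, ttl=0):
    """Pack one label stack entry into a 32-bit integer."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    if not 0 <= exp < 8 or bos not in (0, 1) or not 0 <= ttl < 256:
        raise ValueError("field out of range")
    return (label << 12) | (exp << 9) | (bos << 8) | ttl


def decode_entry(word):
    """Split a 32-bit label stack entry into its four fields."""
    return {
        "label": (word >> 12) & 0xFFFFF,
        "exp": (word >> 9) & 0x7,
        "bos": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


def parse_stack(arg):
    """Parse a comma-separated list of hex words, as in the pgset argument."""
    return [decode_entry(int(tok, 16)) for tok in arg.split(",")]


stack = parse_stack("0001000a,0002000a,0000000a")
labels = [entry["label"] for entry in stack]
ttls = [entry["ttl"] for entry in stack]
```

Under this layout, setting bit 8 of a word (for example 0001010a instead of 0001000a) is what sets the bottom of stack bit, which the patch reuses as its "randomize this label" marker.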
>
> pgset "mpls 0001000a,0002000a,0000000a"
>
> for example pushes labels 16, 32 and 0 (ipv4 null) each with a ttl of 10.
> If you set the bottom of stack bit in one of the labels it will turn on
> the MPLS_RND flag. You can also set and/or reset that flag in the
> normal way as well.
>
> Patches to pktgen have become very popular of late it seems
> so I'm going to wait until the latest set which are pending at the
> moment have made it into Dave's tree before making a final diff to send
> to Robert Olsson, the maintainer of pktgen.
>
> Also if anyone has feedback about this feature, please let me know.

Excellent! I will play around with this.

> > > I have been giving some thought as to the efficiency of the forwarding
> > > process itself recently, with the idea of "transcoding" the instructions
> > > as provided via netlink into an efficient byte code to allow faster
> > > execution. There would appear to be considerable scope for merging
> > > certain instructions (e.g. a pull followed by a push) into one internal
> > > instruction (i.e. the interface would be the same and the effect the
> > > same so it wouldn't break the protocol at all).
> >
> > I like the idea. This is much like what I'm used to in the hardware
> > forwarding world. What you're kind of hinting at is a packet translation
> > engine; this would make it easier to map the forwarding of packets onto
> > FPGA or ASIC based hardware (aren't there a couple of projects doing this
> > for packet filtering? nf-HIPAC)
> >
> It's possible it might make it easier. I have to say that although I'm a
> hardware engineer by training I've never really got into details of
> network interfaces and what it's possible to do on the cards. I wouldn't
> be at all surprised if it was the case though and it would be nice to
> do :-)

I think this is a great idea, but would like to worry about getting the
base MPLS code accepted first. After that we can work on the optimizations.
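To make the "pull followed by a push" merge concrete, here is a small sketch of the kind of peephole pass being discussed. The opcode names (PULL, PUSH, SWAP, FWD) and the tuple representation are assumptions for illustration only; they are not the mpls-linux netlink instruction set, and a real in-kernel byte code would look quite different.

```python
# Sketch only: opcodes and data layout are hypothetical, chosen to show
# the idea of merging adjacent instructions without changing the result.

def transcode(instrs):
    """Merge each PULL immediately followed by a PUSH into a single SWAP."""
    out = []
    for op in instrs:
        if op[0] == "PUSH" and out and out[-1] == ("PULL",):
            out[-1] = ("SWAP", op[1])  # replace the top label in one step
        else:
            out.append(op)
    return out


def run(instrs, stack):
    """Apply instructions to a label stack (top of stack is the last item)."""
    stack = list(stack)
    for op in instrs:
        if op[0] == "PULL":
            stack.pop()
        elif op[0] == "PUSH":
            stack.append(op[1])
        elif op[0] == "SWAP":
            stack[-1] = op[1]
        # Other opcodes (e.g. FWD) do not touch the label stack here.
    return stack


program = [("PULL",), ("PUSH", 32), ("FWD", "eth0")]
fast = transcode(program)
```

The instruction list supplied by userland stays the same; only the internal form gets shorter, and both forms leave the label stack in the same state.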
> > > The various instructions to set/get tcindex and nfmark seem like a
> > > very good plan. I'm considering writing a patch to add setting nfmark
> > > through the ipv4/6/decnet routing tables which I think would be a
> > > generally useful plan. I wonder also if using one or the other or both
> > > of nfmark and/or tcindex as a key in looking up the nhlfe and/or ilm
> > > isn't a bad idea either.
> >
> > That might be against the RFCs. I know I'm already overstepping the
> > RFCs by allowing the EXP bits to determine a NHLFE.
> >
> I wouldn't worry too much about overstepping what the RFCs say so long
> as the result makes sense and the stack can still comply with them on
> all the required points. The main worry with schemes like this is really
> just a question of forwarding speed and whether it will slow things down
> too much.
>
> > > If nfmark could be 1:1 with mpls fec, then it might be possible to use
> > > it together with xfrm as the interface for higher level protocols.
> >
> > Not sure I follow you here. Currently with the shim setup there is no
> > NHLFE lookup in the forward path, the NHLFE is bound to the IPv4|6 route
> > or the eb|iptables rule.
> >
> Ok, let me explain a bit more then.... I'm assuming a scenario where the
> NHLFE is determined based upon nfmark and nfmark is set in the route
> (of whatever protocol). If nfmark were also a key for xfrm then it
> should be possible to "bundle" a set of dst_entry with the MPLS nhlfe
> as the last entry in the stack.

OK. I understand now. The existing nffwd instruction handles this, but
via a second lookup. Your idea would eliminate the second lookup. My
technique allows the nfmark to be used at any node in an LSP, not just
on the ingress LER.

> I haven't got any further with the DECnet interface since I last posted
> but I may well make that my next project,

Let me know if I can be of assistance.

> Steve.

-- 
James R. Leu
jl...@mi...