Thread: [mpls-linux-devel] Merging into the kernel?
From: Steven W. <st...@ch...> - 2006-01-23 10:59:57
Hi,

I've been looking recently at the MPLS for Linux project as I'd like to see an implementation of MPLS merged into the standard Linux kernel. I've read a number of the mailing list archives and googled what I can find on the subject.

I spotted an item in Dave M's network TODO list indicating that currently plans were "stuck in the mud". Is this currently still the case?

If so then I'd like to help lend a hand in moving things forward, always assuming that adding another developer would be helpful of course! :-)

Steve.
From: Ramon C. <cas...@in...> - 2006-01-23 11:09:54
On Mon, 23 Jan 2006, Steven Whitehouse wrote:

> Hi,
>
> I spotted an item in Dave M's network TODO list indicating that currently plans were "stuck in the mud". Is this currently still the case?

There were some issues with regard to two concurrent implementations. I assume you have read some of the emails that were exchanged, but James would be in a better position to discuss this.

> If so then I'd like to help lend a hand in moving things forward, always assuming that adding another developer would be helpful of course! :-)

I must admit I did not follow what happened afterwards; work seems to have focused on James' implementation. AFAIK, DaveM and Jamal proposed some code, and James applied some of their ideas to mpls-linux.

James?

R.
From: James R. L. <jl...@mi...> - 2006-01-23 16:40:33
Before the holidays I had started down the path to getting mpls-linux into the kernel (again). I emailed jamal. My first goal is to get the infrastructure that mpls-linux needs to interact with L3 protocols put in place. In my implementation this is the 'shim' infrastructure (look in the devel archives for a patch I've posted).

I sent two patches for jamal's review: davem's technique for interacting with L3 and the mpls-linux technique. They are very similar. If you ask me, I think the mpls-linux method is more elegant and less intrusive.

That is where we stand. I'm sure jamal forgot about it with the holidays and all. I plan on picking up this task again soon. I was trying to get 1.950 finished (quagga support is still broken), but there is no reason this process can't be done at the same time.

So how can others help? Review the code. Test the code. In particular the locking scheme (RCU) needs to be reviewed. In addition there are some known issues with the netdevice notification handler (the list of NHLFE is not being maintained correctly with respect to instructions, add/delete, and device changes). There was a bug report posted to the general list late last summer that had some details.

--
James R. Leu
jl...@mi...
From: Steven W. <st...@ch...> - 2006-01-31 17:30:49
Hi,

Thanks for the info on the current situation. I'm sorry for the slow reply (this is a spare-time project for me, at least at the moment, so it has to fill the odd moments that I can find).

I've started to take a more in-depth look at the code. I'll send a patch or two once I get a chance to do a bit of testing. In the meantime, here are a few more questions/comments:

On Mon, Jan 23, 2006 at 10:39:41AM -0600, James R. Leu wrote:
> Before the holidays I had started down the path to getting mpls-linux into the kernel (again). I emailed jamal. My first goal is to get the infrastructure that mpls-linux needs to interact with L3 protocols put in place. In my implementation this is the 'shim' infrastructure (look in the devel archives for a patch I've posted).

This sounds like a good plan. It's always better to break things into smaller units, provided each is useful in its own right, when submitting to the kernel.

> I sent two patches for jamal's review: davem's technique for interacting with L3 and the mpls-linux technique. They are very similar. If you ask me, I think the mpls-linux method is more elegant and less intrusive.

What is davem's technique in this context?

> That is where we stand. I'm sure jamal forgot about it with the holidays and all. I plan on picking up this task again soon. I was trying to get 1.950 finished (quagga support is still broken), but there is no reason this process can't be done at the same time.
>
> So how can others help? Review the code. Test the code. In particular the locking scheme (RCU) needs to be reviewed. In addition there are some known issues with the netdevice notification handler (the list of NHLFE is not being maintained correctly with respect to instructions, add/delete, and device changes). There was a bug report posted to general late last summer that had some details.

OK. I will take a look at those as soon as I can get properly familiar with the code. Can I presume that your Perforce archive contains, in the mpls-kernel-1.1 directory, all your latest kernel code, so that I can just do a sync from time to time to keep up to date? Or are there any patches elsewhere I should know about?

Steve.
From: James R. L. <jl...@mi...> - 2006-01-31 17:46:19
On Tue, Jan 31, 2006 at 05:44:14PM +0000, Steven Whitehouse wrote:
> Hi,
>
> Thanks for the info on the current situation. I'm sorry for the slow reply (this is a spare-time project for me at least at the moment so it has to fill odd moments that I can find).

Understood. Same here.

> I've started to take a more in depth look at the code. I'll send a patch or two once I get a chance to do a bit of testing. In the meantime, here are a few more questions/comments:
>
> On Mon, Jan 23, 2006 at 10:39:41AM -0600, James R. Leu wrote:
> > Before the holidays I had started down the path to getting mpls-linux into the kernel (again). I emailed jamal. My first goal is to get the infrastructure that mpls-linux needs to interact with L3 protocols put in place. In my implementation this is the 'shim' infrastructure (look in the devel archives for a patch I've posted).
>
> This sounds like a good plan. It's always better to break things into smaller units provided each is useful in its own right, when submitting to the kernel.
>
> > I sent two patches for jamal's review: davem's technique for interacting with L3 and the mpls-linux technique. They are very similar. If you ask me, I think the mpls-linux method is more elegant and less intrusive.
>
> What is davem's technique in this context?

I'll send you two patches, one showing the shim technique, the other showing the equivalent code from DaveM's implementation.

> > That is where we stand. I'm sure jamal forgot about it with the holidays and all. I plan on picking up this task again soon. I was trying to get 1.950 finished (quagga support is still broken), but there is no reason this process can't be done at the same time.

I've talked with Jamal, and he is working on a way to help out in this effort.

> > So how can others help? Review the code. Test the code. In particular the locking scheme (RCU) needs to be reviewed. In addition there are some known issues with the netdevice notification handler (the list of NHLFE is not being maintained correctly with respect to instructions, add/delete, and device changes). There was a bug report posted to general late last summer that had some details.
>
> Ok. I will take a look at those as soon as I can get properly familiar with the code. Can I presume that your perforce archive contains in the mpls-kernel-1.1 directory all your latest kernel code so that I can just do a sync from time to time to keep up to date? Or are there any patches elsewhere I should know about?

The head of line in my Perforce tree is the latest and greatest.

If you use a view of:

    //depot/iproute2-mpls-1.1/... //client-name/iproute2/...
    //depot/mpls-kernel-1.1/... //client-name/kernel/...

you would get the minimum tools needed to set up basic MPLS LSPs and map IPv4/6 traffic to them.

> Steve.

--
James R. Leu
jl...@mi...
From: root <ro...@so...> - 2006-02-15 20:54:02
Hi,

Sorry for taking so long to respond, here are a number of comments and questions.... this is also an opportunity for me to express all the ideas I've had over the last few weeks, so please excuse me if I ramble on too much :-)

Firstly thanks for sending me the two patches. I've spoken to Jamal and the genetlink solution is certainly the right way to go, so I'll talk only about that particular patch from now on. You mentioned in a previous email that RCU needed looking at. I've taken a quick look over it and I'd agree that it looks like the current code is part way between RCU and a fully locked solution.

Probably the simplest thing to do is just to change the list_add_rcu() in shim.c:shim_proto_add() back to a standard list_add() until the list can be converted completely (and likewise list_del_rcu() in shim.c:shim_proto_remove()), although I'd agree that an RCU implementation would be much preferable, and possibly even a prerequisite for kernel inclusion due to the already (mostly) lockless routing code.

One question occurs to me though... did you consider using the xfrm code in order to interface with the higher layer protocols? I took a look at this recently since it seems to be in the right place in the stack to do this. It's fairly complicated and there seems to be at least a loose fit, but I can see that some changes would be required. The issue of selecting a forwarding class is the biggest potential issue. Still, it appears to me to be a lot cleaner to modify xfrm a bit, and I suspect more likely to meet with more general approval.

One thing which struck me about the patch was that although it does put in place a lot of the groundwork, I think a more ambitious approach would be more likely to be accepted. Going for small patches is good, but the initial patch should also add some (standalone) functionality to the kernel, so there is also "too small". I'd suggest going for the minimum sized patch which can add a useful feature, say for example, adding just enough code to do MPLS forwarding and leaving other bits (the tunnel device, which makes a nice patch on its own, and the interface with the higher level protocols) until later.

This also brings me neatly on to the forwarding code. I see that this has been implemented in three parts, which if you'll excuse my ascii artistry, or lack of it, interact as follows:

                /-----\   /----\   /-------\
input from  ----| ILM |---| XC |---| nhlfe |---- output to
netdevice       \-----/   \----/   \-------/     netdevice
                   |                   |
                   |                   |
               to higher          from higher
               protocols           protocols

Now both the ILM and nhlfe are composed of radix tree tables and I'm curious as to why you chose this particular system over (say, for example) a plain hash table. The cross connect interface, which updates the final instruction in the ILM entry to point to an nhlfe entry, is also confusing me slightly, as I don't see why the ILM can't have a forwarding instruction added to it directly when it's created via the netlink message. Why the extra interface?

I have been giving some thought as to the efficiency of the forwarding process itself recently, with the idea of "transcoding" the instructions as provided via netlink into an efficient byte code to allow faster execution. There would appear to be considerable scope for merging certain instructions (e.g. a pull followed by a push) into one internal instruction (i.e. the interface would be the same and the effect the same so it wouldn't break the protocol at all).

It should be possible to calculate in advance, for any given instruction sequence, the maximum headroom required (in the skb) to execute it, such that it would be possible to guarantee that only a single call to skb_realloc_headroom would be required for any particular packet. A future development might even request that device drivers always send packets with a certain amount of headroom to prevent even this call being required. Currently I see that it's assumed that 32 bytes will cover all eventualities, which seems a reasonable bet for most uses.

Another feature of transcoding would be to remove useless instructions (NOP) from the execution path. Is there actually any practical purpose for this instruction? I'm afraid I can't see one.

The various instructions to set/get tcindex and nfmark seem like a very good plan. I'm considering writing a patch to add setting nfmark through the ipv4/6/decnet routing tables, which I think would be a generally useful plan. I wonder also if using one or the other or both of nfmark and/or tcindex as a key in looking up the nhlfe and/or ilm isn't a bad idea either.

If nfmark could be 1:1 with mpls fec, then it might be possible to use it together with xfrm as the interface for higher level protocols.

I also have the basics of a DECnet interface to mpls (not tested yet, I'm afraid) which I've roughed out. I'll send that to you if you are interested. It doesn't look a lot different to the ipv4 one, which is no surprise since that's where DECnet's routing code comes from.

Steve.
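The transcoding and headroom ideas in this message can be sketched in userspace C. This is only an illustration and not code from mpls-linux: the opcode names (OP_NOP, OP_POP, OP_PUSH, OP_SWAP), the struct layout and both function names are hypothetical, and OP_POP stands in for what the message calls a "pull". The sketch drops NOPs, folds a pop followed by a push into one swap, and computes the worst-case headroom a sequence needs, assuming 4 bytes per pushed label stack entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the real mpls-linux instruction set differs. */
enum op { OP_NOP, OP_POP, OP_PUSH, OP_SWAP };

struct insn {
    enum op  op;
    uint32_t label;          /* only meaningful for OP_PUSH and OP_SWAP */
};

#define MPLS_SHIM_LEN 4      /* one label stack entry is 4 bytes */

/*
 * Rewrite in[0..n) into out[], which must have room for n entries.
 * NOPs are dropped, and a POP followed by a PUSH (ignoring any NOPs
 * between them) becomes a single SWAP. Returns the new length.
 */
size_t transcode(const struct insn *in, size_t n, struct insn *out)
{
    size_t i = 0, m = 0;

    while (i < n) {
        if (in[i].op == OP_NOP) {
            i++;                            /* useless: skip it */
        } else if (in[i].op == OP_POP) {
            size_t j = i + 1;

            while (j < n && in[j].op == OP_NOP)
                j++;                        /* look past NOPs */
            if (j < n && in[j].op == OP_PUSH) {
                out[m].op = OP_SWAP;        /* pop + push == swap */
                out[m].label = in[j].label;
                m++;
                i = j + 1;
            } else {
                out[m++] = in[i++];         /* plain pop */
            }
        } else {
            out[m++] = in[i++];
        }
    }
    return m;
}

/*
 * Worst-case extra headroom in bytes: the peak net growth of the
 * label stack at any point while executing the sequence.
 */
size_t max_headroom(const struct insn *in, size_t n)
{
    long cur = 0, peak = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i].op == OP_PUSH)
            cur += MPLS_SHIM_LEN;
        else if (in[i].op == OP_POP)
            cur -= MPLS_SHIM_LEN;
        if (cur > peak)
            peak = cur;
    }
    return (size_t)peak;
}
```

Because the peak is known when the instructions are installed, a forwarding path built this way could compare it against the packet's existing headroom once and reallocate at most once per packet.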
From: James R. L. <jl...@mi...> - 2006-02-16 03:41:51
Hey there Steve,

On Wed, Feb 15, 2006 at 09:07:50PM +0000, root wrote:
> Hi,
>
> Sorry for taking so long to respond, here are a number of comments and questions.... this is also an opportunity for me to express all the ideas I've had over the last few weeks, so please excuse me if I ramble on too much :-)
>
> Firstly thanks for sending me the two patches. I've spoken to Jamal and the genetlink solution is certainly the right way to go, so I'll talk only about that particular patch from now on. You mentioned that RCU needed looking at in a previous email and I've taken a quick look over it and I'd agree that it looks like the current code is part way between RCU and a fully locked solution.

Agreed. My understanding of RCU is not complete. I've been trying to use the existing implementations throughout the kernel as examples, but none seem to fit my scenario exactly. I thought I had the shim_* stuff correct (I used dev_remove_pack/dev_add_pack as a guide).

> Probably the simplest thing to do is just to change the list_add_rcu() in shim.c:shim_proto_add() back to a standard list_add() until the list can be converted completely (and likewise list_del_rcu() in shim.c:shim_proto_remove()) although I'd agree that an RCU implementation would be much preferable, and possibly even a prerequisite for kernel inclusion due to the already (mostly) lockless routing code.

Understood. I've converted back to list_del() and list_add().

> One question occurs to me though... did you consider using the xfrm code in order to interface with the higher layer protocols? I took a look at this recently since it seems to be in the right place in the stack to do this. It's fairly complicated and there seems to be at least a loose fit but I can see that some changes would be required. The issue of selecting a forwarding class being the biggest potential issue. Still it appears to me to be a lot cleaner to modify xfrm a bit and I suspect more likely to meet with more general approval.

Yes. I recently spent a significant amount of time understanding the XFRM code, just to realize that it cannot be tied to a specific route. The selector mechanism is much like netfilter, i.e. it does not do longest prefix match. In fact, implementing an XFRM shim module would bring route-based IPSEC VPNs to Linux (without having to use a virtual interface).

I also looked at tc actions, but there too, tc is more like netfilter than an LPM.

> One thing which struck me about the patch was although it does put in place a lot of the groundwork, I think a more ambitious approach would be more likely to be accepted. Going for small patches is good, but also the initial patch should add some (standalone) functionality to the kernel, so there is also "too small". I'd suggest going for the minimum sized patch which can add a useful feature, say for example, adding just enough code to do MPLS forwarding and leaving other bits (the tunnel device which makes a nice patch on its own and the interface with the higher level protocols) until later.

Agreed. The first patch should contain the minimal MPLS implementation.

> This also brings me neatly on to the forwarding code. I see that this has been implemented in three parts, which if you'll excuse my ascii artistry, or lack of it, interact as follows:
>
>                 /-----\   /----\   /-------\
> input from  ----| ILM |---| XC |---| nhlfe |---- output to
> netdevice       \-----/   \----/   \-------/     netdevice
>                    |                   |
>                    |                   |
>                to higher          from higher
>                protocols           protocols
>
> Now both the ILM and nhlfe are composed of radix tree tables and I'm curious as to why you chose this particular system over (say, for example) a plain hash table. The cross connect interface which updates the final instruction in the ILM entry to point to an nhlfe entry is also confusing me slightly as I don't see why the ILM can't have a forwarding instruction added to it directly when it's created via the netlink message. Why the extra interface?

The XC netlink interface is there to assist signaling protocols. It is very common to create an ILM that terminates locally, then at a later time XC it to an NHLFE, and at an even later time swing the XC to a different NHLFE. That being said, there is nothing that the XC netlink interface does that cannot be done by just modifying the instructions via the ILM netlink interface.

Why use a radix tree? Originally it was just for ease of implementation. Now it is because the radix tree lookup for the ILM provides deterministic search times. That being said, I have no problem with changing to a multi-tier hash scheme as long as it can provide better performance (including corner cases).

> I have been giving some thought as to the efficiency of the forwarding process itself recently, with the idea of "transcoding" the instructions as provided via netlink into an efficient byte code to allow faster execution. There would appear to be considerable scope for merging certain instructions (e.g. a pull followed by a push) into one internal instruction (i.e. the interface would be the same and the effect the same so it wouldn't break the protocol at all).

I like the idea. This is much like what I'm used to in the hardware forwarding world. What you're kind of hinting at is a packet translation engine. This would make it easier to map the forwarding of packets onto FPGA or ASIC based hardware (aren't there a couple of projects doing this for packet filtering? nf-HIPAC).

> It should be possible to calculate, for any given instruction sequence the maximum headroom required (in the skb) to execute it in advance such that it would be possible to guarantee that only a single call to skb_realloc_headroom would be required for any particular packet. A future development might even request that device drivers always send packets with a certain amount of headroom to prevent even this call being required. Currently I see that it's assumed that 32 bytes will cover all eventualities, which seems a reasonable bet for most uses.

Currently the size of the stack is being tracked to handle MTU issues. The really painful case for head allocation is ethernet over MPLS (22 bytes).

> Another feature of transcoding would be to remove useless instructions (NOP) from the execution path. Is there actually any practical purpose for this instruction? I'm afraid I can't see one.

There is no purpose to the NOOP code. It is leftover cruft that can be removed.

> The various instructions to set/get tcindex and nfmark seem like a very good plan. I'm considering writing a patch to add setting nfmark through the ipv4/6/decnet routing tables which I think would be a generally useful plan. I wonder also if using one or the other or both of nfmark and/or tcindex as a key in looking up the nhlfe and/or ilm isn't a bad idea either.

That might be against the RFCs. I know I'm already overstepping the RFCs by allowing the EXP bits to determine an NHLFE.

> If nfmark could be 1:1 with mpls fec, then it might be possible to use it together with xfrm as the interface for higher level protocols.

Not sure I follow you here. Currently with the shim setup there is no NHLFE lookup in the forward path; the NHLFE is bound to the IPv4|6 route or the eb|iptables rule.

> I also have the basics of a DECnet interface to mpls (not tested yet, I'm afraid) which I've roughed out. I'll send that to you if you are interested. It doesn't look a lot different to the ipv4 one which is no surprise since that's where DECnet's routing code comes from.

Cool. Having another person implement to the shim interface is a good test of the functionality. Like I mentioned above, I have the beginning of an XFRM shim implementation for the purpose of route-based IPSEC VPNs.

> Steve.

--
James R. Leu
jl...@mi...
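The ILM / XC / NHLFE relationship discussed in this message can be pictured with a small userspace sketch. Every type and function name below is hypothetical, and a flat array stands in for the radix tree lookup that mpls-linux actually uses. The only point it tries to make is that a cross-connect behaves like a pointer from an ILM entry to an NHLFE: it can be left unset while the ILM terminates locally, set later, and then swung to a different NHLFE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outgoing entry: label to write and device to use. */
struct nhlfe {
    uint32_t out_label;
    int      out_ifindex;
};

/* Hypothetical incoming entry, keyed on the incoming label only. */
struct ilm {
    uint32_t      in_label;
    int           in_use;
    struct nhlfe *xc;        /* NULL means "terminate locally" */
};

#define ILM_MAX 8
static struct ilm ilm_table[ILM_MAX];

struct ilm *ilm_lookup(uint32_t label)
{
    for (size_t i = 0; i < ILM_MAX; i++)
        if (ilm_table[i].in_use && ilm_table[i].in_label == label)
            return &ilm_table[i];
    return NULL;
}

/* Create an ILM with no cross-connect; returns NULL if the table is full. */
struct ilm *ilm_add(uint32_t label)
{
    for (size_t i = 0; i < ILM_MAX; i++) {
        if (!ilm_table[i].in_use) {
            ilm_table[i].in_use = 1;
            ilm_table[i].in_label = label;
            ilm_table[i].xc = NULL;
            return &ilm_table[i];
        }
    }
    return NULL;
}

/* Set, swing or (with n == NULL) remove the cross-connect. */
void xc_set(struct ilm *ilm, struct nhlfe *n)
{
    ilm->xc = n;
}

/*
 * Returns 1 and fills *out_label when the packet is forwarded,
 * 0 when it is delivered locally, -1 when no ILM matches.
 */
int mpls_input(uint32_t label, uint32_t *out_label)
{
    struct ilm *ilm = ilm_lookup(label);

    if (!ilm)
        return -1;
    if (!ilm->xc)
        return 0;
    *out_label = ilm->xc->out_label;
    return 1;
}
```

A signaling daemon using something of this shape would call the equivalent of ilm_add() when it allocates a label and xc_set() once the downstream label is known, which matches the create-then-cross-connect sequence described above.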
From: Steven W. <st...@ch...> - 2006-02-26 17:33:17
Attachments:
pktgen-mpls.diff
Hi,

On Wed, Feb 15, 2006 at 09:40:57PM -0600, James R. Leu wrote:
> Hey there Steve,

[some things cut to avoid reposting too much]

> > One question occurs to me though... did you consider using the xfrm code in order to interface with the higher layer protocols? I took a look at this recently since it seems to be in the right place in the stack to do this. It's fairly complicated and there seems to be at least a loose fit but I can see that some changes would be required. The issue of selecting a forwarding class being the biggest potential issue. Still it appears to me to be a lot cleaner to modify xfrm a bit and I suspect more likely to meet with more general approval.
>
> Yes. I recently spent a significant amount of time understanding the XFRM code just to realize that it cannot be tied to a specific route. The selector mechanism is much like netfilter, i.e. it does not do longest prefix match. In fact implementing a XFRM shim module would bring route based IPSEC VPNs to linux (without having to use a virtual interface).
>
> I also looked at tc actions, but there too, tc is more like netfilter than a LPM.

Yes - I wonder though whether we could use a different selector mechanism while keeping some of the general framework. When I looked at it, the main thing which struck me was that the difficulty in changing the selector mechanism was mostly down to the interface (via netlink) to userland. Actually changing it on the kernel side is not impossible, I think.

> > This also brings me neatly on to the forwarding code. I see that this has been implemented in three parts, which if you'll excuse my ascii artistry, or lack of it, interact as follows:
> >
> >                 /-----\   /----\   /-------\
> > input from  ----| ILM |---| XC |---| nhlfe |---- output to
> > netdevice       \-----/   \----/   \-------/     netdevice
> >                    |                   |
> >                    |                   |
> >                to higher          from higher
> >                protocols           protocols
> >
> > Now both the ILM and nhlfe are composed of radix tree tables and I'm curious as to why you chose this particular system over (say, for example) a plain hash table. The cross connect interface which updates the final instruction in the ILM entry to point to an nhlfe entry is also confusing me slightly as I don't see why the ILM can't have a forwarding instruction added to it directly when it's created via the netlink message. Why the extra interface?
>
> The XC netlink interface is there to assist signaling protocols. It is very common to create an ILM that terminates locally and then at a later time XCs to a NHLFE, and at even a later time, swing the XC to a different NHLFE. With that being said, there is nothing that the XC netlink interface does that cannot be done by just modifying the instructions via the ILM netlink interface.
>
> Why use a radix tree? Originally it was just for ease of implementation. Now it is because the radix tree lookup for the ILM provides deterministic search times. That being said, I have no problem with changing to a multi tier hash scheme as long as it can provide better performance (including corner cases).

And of course it occurred to me that in order to find this out we'd need some tools to test against. Please find attached a patch for pktgen (as current in davem's net-2.6.17 git tree at kernel.org) to generate MPLS packets.

The extension allows you to add a stack of labels onto the packets it's sending out. There is one extra hack which I included: since we know how many labels there are in the stack, I've used the bottom of stack bit to indicate whether the label should be randomly generated or not.

You can thus push a stack of up to 16 labels, where each label in the stack is either a fixed value or random.

    pgset "mpls 0001000a,0002000a,0000000a"

for example pushes labels 16, 32 and 0 (ipv4 null), each with a ttl of 10. If you set the bottom of stack bit in one of the labels it will turn on the MPLS_RND flag. You can also set and/or reset that flag in the normal way as well.

Patches to pktgen have become very popular of late, it seems, so I'm going to wait until the latest set which are pending at the moment have made it into Dave's tree before making a final diff to send to Robert Olsson, the maintainer of pktgen.

Also if anyone has feedback about this feature, please let me know.

> > I have been giving some thought as to the efficiency of the forwarding process itself recently, with the idea of "transcoding" the instructions as provided via netlink into an efficient byte code to allow faster execution. There would appear to be considerable scope for merging certain instructions (e.g. a pull followed by a push) into one internal instruction (i.e. the interface would be the same and the effect the same so it wouldn't break the protocol at all).
>
> I like the idea. This is much like what I'm used to in the hardware forwarding world. What you're kind of hinting at is a packet translation engine, this would make it easier to map the forwarding of packets onto FPGA or ASIC based hardware (isn't there a couple of projects doing this for packet filtering? nf-HIPAC)

It's possible it might make it easier. I have to say that although I'm a hardware engineer by training, I've never really got into the details of network interfaces and what it's possible to do on the cards. I wouldn't be at all surprised if it was the case though, and it would be nice to do :-)

> > The various instructions to set/get tcindex and nfmark seem like a very good plan. I'm considering writing a patch to add setting nfmark through the ipv4/6/decnet routing tables which I think would be a generally useful plan. I wonder also if using one or the other or both of nfmark and/or tcindex as a key in looking up the nhlfe and/or ilm isn't a bad idea either.
>
> That might be against the RFCs. I know I'm already overstepping the RFCs by allowing the EXP bits to determine a NHLFE.

I wouldn't worry too much about overstepping what the RFCs say so long as the result makes sense and the stack can still comply with them on all the required points. The main worry with schemes like this is really just a question of forwarding speed and whether it will slow things down too much.

> > If nfmark could be 1:1 with mpls fec, then it might be possible to use it together with xfrm as the interface for higher level protocols.
>
> Not sure I follow you here. Currently with the shim setup there is no NHLFE lookup in the forward path, the NHLFE is bound to the IPv4|6 route or the eb|iptables rule.

Ok, let me explain a bit more then.... I'm assuming a scenario where the NHLFE is determined based upon nfmark, and nfmark is set in the route (of whatever protocol). If nfmark were also a key for xfrm then it should be possible to "bundle" a set of dst_entry with the MPLS nhlfe as the last entry in the stack.

I haven't got any further with the DECnet interface since I last posted, but I may well make that my next project.

Steve.
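The hex words in the pgset example follow the standard 32-bit MPLS label stack entry layout from RFC 3032: a 20-bit label, 3 EXP bits, the bottom-of-stack bit and an 8-bit TTL. A small C sketch shows how 0001000a comes out as label 16 with a TTL of 10. The struct and function names are made up for illustration and none of this is taken from the pktgen patch itself.

```c
#include <stdint.h>

/* Fields of one MPLS label stack entry (RFC 3032 layout). */
struct mpls_lse {
    uint32_t label;   /* bits 31..12 */
    uint8_t  exp;     /* bits 11..9  */
    uint8_t  bos;     /* bit  8, bottom of stack */
    uint8_t  ttl;     /* bits 7..0   */
};

/* Split a host-order 32-bit word, as typed after "mpls", into fields. */
struct mpls_lse mpls_decode(uint32_t word)
{
    struct mpls_lse e;

    e.label = word >> 12;
    e.exp   = (word >> 9) & 0x7;
    e.bos   = (word >> 8) & 0x1;
    e.ttl   = word & 0xff;
    return e;
}

/* Build the 32-bit word for a given label, EXP, bottom-of-stack and TTL. */
uint32_t mpls_encode(uint32_t label, uint8_t exp, uint8_t bos, uint8_t ttl)
{
    return ((label & 0xfffffu) << 12) |
           ((uint32_t)(exp & 0x7) << 9) |
           ((uint32_t)(bos & 0x1) << 8) |
           ttl;
}
```

With this layout, label 16 with the bottom-of-stack bit set and a TTL of 10 would be written 0001010a; going by the description above, that bit is what the patch reuses to request a randomly generated label.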
From: James R. L. <jl...@mi...> - 2006-03-04 03:34:09
Hello Steven,

On Sun, Feb 26, 2006 at 05:47:22PM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Wed, Feb 15, 2006 at 09:40:57PM -0600, James R. Leu wrote:

<snip original XFRM discussion>

> Yes - I wonder though whether we could use a different selector mechanism but keeping some of the general framework. When I looked at it, the main thing which struck me was that the difficulty in changing the selector mechanism was mostly down to the interface (via netlink) to userland. Actually changing it on the kernel side is not impossible I think.

If this can be done in an efficient manner, I think it would be more readily accepted by the powers that be. Although, every time I look at the problem it still comes down to the fact that something has to be attached to a node in the L3 routing table. I thought about adding an XFRM reference to the IPv4|6 nodes, but came to the conclusion that a generic system (the shim layer) would be more flexible for other protocols. Also, I thought it would be easier to implement a new 'shim' hook in other L3 protocols, as opposed to implementing an XFRM interface for them.

Perhaps I've overlooked something. Let me know if you have any ideas about how to go about this.

<snip XC/radix discussion>

> And of course it occurred to me that in order to find this out we'd need some tools to test against. Please find attached a patch for pktgen (as current in davem's net-2.6.17 git tree at kernel.org) to generate MPLS packets.
>
> The extension allows you to add a stack of labels onto the packets it's sending out. There is one extra hack which I included: since we know how many labels there are in the stack, I've used the bottom of stack bit to indicate whether the label should be randomly generated or not.
>
> You can thus push a stack of up to 16 labels, where each label in the stack is either a fixed value or random.
>
>     pgset "mpls 0001000a,0002000a,0000000a"
>
> for example pushes labels 16, 32 and 0 (ipv4 null) each with a ttl of 10. If you set the bottom of stack bit in one of the labels it will turn on the MPLS_RND flag. You can also set and/or reset that flag in the normal way as well.
>
> Patches to pktgen have become very popular of late it seems so I'm going to wait until the latest set which are pending at the moment have made it into Dave's tree before making a final diff to send to Robert Olsson, the maintainer of pktgen.
>
> Also if anyone has feedback about this feature, please let me know.

Excellent! I will play around with this.

> > > I have been giving some thought as to the efficiency of the forwarding process itself recently, with the idea of "transcoding" the instructions as provided via netlink into an efficient byte code to allow faster execution. There would appear to be considerable scope for merging certain instructions (e.g. a pull followed by a push) into one internal instruction (i.e. the interface would be the same and the effect the same so it wouldn't break the protocol at all).
> >
> > I like the idea. This is much like what I'm used to in the hardware forwarding world. What you're kind of hinting at is a packet translation engine, this would make it easier to map the forwarding of packets onto FPGA or ASIC based hardware (isn't there a couple of projects doing this for packet filtering? nf-HIPAC)
>
> It's possible it might make it easier. I have to say that although I'm a hardware engineer by training I've never really got into details of network interfaces and what it's possible to do on the cards. I wouldn't be at all surprised if it was the case though and it would be nice to do :-)

I think this is a great idea, but I would like to worry about getting the base MPLS code accepted first. After that we can work on the optimizations.

> > > The various instructions to set/get tcindex and nfmark seem like a very good plan. I'm considering writing a patch to add setting nfmark through the ipv4/6/decnet routing tables which I think would be a generally useful plan. I wonder also if using one or the other or both of nfmark and/or tcindex as a key in looking up the nhlfe and/or ilm isn't a bad idea either.
> >
> > That might be against the RFCs. I know I'm already overstepping the RFCs by allowing the EXP bits to determine a NHLFE.
>
> I wouldn't worry too much about overstepping what the RFCs say so long as the result makes sense and the stack can still comply with them on all the required points. The main worry with schemes like this is really just a question of forwarding speed and whether it will slow things down too much.
>
> > > If nfmark could be 1:1 with mpls fec, then it might be possible to use it together with xfrm as the interface for higher level protocols.
> >
> > Not sure I follow you here. Currently with the shim setup there is no NHLFE lookup in the forward path, the NHLFE is bound to the IPv4|6 route or the eb|iptables rule.
>
> Ok, let me explain a bit more then.... I'm assuming a scenario where the NHLFE is determined based upon nfmark and nfmark is set in the route (of whatever protocol). If nfmark were also a key for xfrm then it should be possible to "bundle" a set of dst_entry with the MPLS nhlfe as the last entry in the stack.

OK. I understand now. The existing nffwd instruction handles this, but via a second lookup. Your idea would eliminate the second lookup. My technique allows the nfmark to be used at any node in an LSP, not just on the ingress LER.

> I haven't got any further with the DECnet interface since I last posted but I may well make that my next project,

Let me know if I can be of assistance.

> Steve.

--
James R. Leu
jl...@mi...