Thread: [mpls-linux-devel] Re: 2.6 Spec: Random comments.
From: Jamal H. S. <ha...@zn...> - 2004-02-13 14:11:02
Maybe we can move this discussion to the list and leave Dave alone; we can
ping him when we need to verify things from him. I am trying to cc the list
as a test.

On Fri, 2004-02-13 at 06:26, Ramon Casellas wrote:
> Some comments,
>
> (I'm still reading the spec, and slowly looking at the main entry points
> and hooks in the code, so please be patient and bear with me)
>
> RCAS 20040213: I see the utility of FEC Id, but I am not fond of the name.
> The name FEC Id implies that it is "a FEC identifier".

Essentially it is an identifier of an NHLFE entry, so you are right that
naming it a FEC identifier may not be the best.

> What worries me is the mapping FECId -> NHLFE (for example, in LSP merging,
> two FECids could be mapped to the same NHLFE index), and the fact that a
> FECId should be a member of a NHLFE entry...

I don't want to call the so-far-called fecid "lspid", but it is close.

> Moreover, core LSRs should be FEC agnostic. This was my main comment last
> time. Basically, the FECid is the label itself. The label implicitly
> identifies the FEC and is the "key" to use to forward the packet. Otherwise
> you have at the same time label mappings and fecid management (signalling
> protocols)

I wouldn't call it a label at all. It is the key used to search the NHLFE.
Some implementations don't allow setting of such a parameter (one of the
vendors I looked at actually did) - they will tell you what it is.
Essentially it is an identifier of an NHLFE entry (not an index). A
collection of these NHLFE entries could be used by the same LSP. There is a
further entry that can be used to store extra LSP info (refer to parameter
"index").

Given the above info, suggest a new name. Maybe NHid?

> "ILM and FTN derive a FECid from their respective lookups"
>
> I would propose: "ILM and FTN derive a [list of - multipath] NHLFE
> index[es] from their respective lookups [...] These indexes and incoming
> labelspaces are then used to look up the NHLFE to determine how to forward
> the packet."

Ok, that is useful. I have not tested multipath but it should work with
Linux routing ECMP at least. I wouldn't call them NHLFE indices, rather
these identifiers so far called fecid. Also, I would think most of these
lists would contain a single entry. BTW, the ILM is not multipath ready;
we should be able to add that easily. Also there is no control over how the
multipath selection is done with the Linux routing table - whatever Linux
ECMP does goes. We should be able to fix the ILM with an algorithm selector.

> A standard structure for NHLFE contains:
> - FEC id
>
> RCAS: Is this field really necessary? An NHLFE entry could be shared by
> several 'FECIds'...

Look at my description above. The ability to select this value by policy
allows us to select the NHLFE entries from other subsystems; e.g. a u32
classifier on ingress could select all IP addresses from 10.1.1.1/24 to have
a fecid of 10. The skb->fecid is then set to 10. When the packet gets to the
point of NHLFE entry selection, this value is used to override/select the
NHLFE entry.

> And a couple of questions (please consider them as questions from someone
> who has limited experience in kernel programming)
>
> * I think that adding struct mpls_nhlfe_route *mpls to a dst_entry is a
> little intrusive, and somehow the "genericity" of the DST is being lost.
> Would it not be better to use:
>
>     struct mpls_dst {
>             union {
>                     struct dst_entry dst;
>                     struct mpls_dst  *md_next;
>             } u;
>             ....
>
> and manage MPLS dsts from the mpls subsystem? I understand that using your
> approach it is easier to get MPLS information from skb->dst->mpls but I
> don't know, it seems too strong a coupling between MPLS and generic dst
> management. Well, just food for thought.

dsts are still managed from the MPLS code. There is some generic stuff
(create, destroy, gc etc.) for which there is no point in recreating in the
MPLS code. The way it is right now works fine. What could probably have been
a better approach is to stack dsts. It would require some surgery and I am
not sure I have the patience for it. Maybe we can ask Dave for his thoughts
on this.

cheers,
jamal
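A minimal sketch of what an NHLFE table keyed by this identifier could look
like, purely to make the "key, not index" distinction above concrete. This is
not taken from the mpls-linux source; the structure, field and function names
are assumptions for illustration.

    /* Illustrative only: NHLFE entries in a hash table, selected by the
     * opaque key discussed above (fecid/NHid).  2.6-era kernel idioms. */
    #include <linux/list.h>
    #include <linux/types.h>

    #define NHLFE_HASH_SIZE 16

    struct mpls_nhlfe {
            struct hlist_node hash_node;  /* bucket chaining */
            u32               nhid;       /* key set by ILM/FTN or policy */
            u32               lsp_index;  /* extra per-LSP info ("index") */
            /* ... outgoing label(s), device, neighbour, stats ... */
    };

    static struct hlist_head nhlfe_hash[NHLFE_HASH_SIZE];

    /* ILM/FTN lookups (or a u32 classifier via skb->fecid) produce an nhid;
     * this resolves it to the forwarding entry. */
    static struct mpls_nhlfe *mpls_nhlfe_find(u32 nhid)
    {
            struct hlist_head *head = &nhlfe_hash[nhid % NHLFE_HASH_SIZE];
            struct mpls_nhlfe *nhlfe;
            struct hlist_node *pos;

            hlist_for_each_entry(nhlfe, pos, head, hash_node)
                    if (nhlfe->nhid == nhid)
                            return nhlfe;
            return NULL;
    }

Because the key is looked up rather than used as an array index, several keys
could in principle resolve to entries sharing the same outgoing state, which
is the LSP-merging case Ramon raises.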
From: Ramon C. <cas...@in...> - 2004-02-13 15:21:53
On Fri, 13 Feb 2004, Jamal Hadi Salim wrote:

disclaimer: from now on, all mails will be sent to mpl...@li...
(that is what I was going to do, but then I received your email about the
mailing list failing)

> Essentially it is an identifier of a NHLFE entry.
> So you are right naming it a FEC identifier may not be the best.

So the relationship is:

    1 FEC F                      to N available objects
    1 Incoming Label/Labelspace  to N available objects

> I don't want to call the so-far-called fecid "lspid", but it is close.

I see what you mean. Let us check what the RFC says:

   The "Incoming Label Map" (ILM) maps each incoming label to a set of
   NHLFEs. (...) If the ILM maps a particular label to a set of NHLFEs
   that contains more than one element, exactly one element of the set
   must be chosen before the packet is forwarded. The procedures for
   choosing an element from the set are beyond the scope of this
   document. Having the ILM map a label to a set containing more than
   one NHLFE may be useful if, e.g., it is desired to do load balancing
   over multiple equal-cost paths.

(RCAS: N.B. you don't need equal cost paths)...

> Given the above info, suggest a new name. Maybe NHid?

I would say something like "fwd_id" from "Forward Id", or "out_id"; it
should not preclude one or another next hop.

> Ok, that is useful. I have not tested multipath but it should
> Also I would think most of these lists would contain a single entry.

Yes, unicast MPLS with no load sharing enabled. However, you may need them
when doing multicast and/or load sharing.

> BTW, the ILM is not multipath ready; we should be able to add that easily.

Do you really need it to? Just hold a set of fwd_ids. The policy to select
one should be configurable (discipline) & pluggable. A common impl. is a
hash table. The RFC also defines the interaction with routing in this case
(although vaguely).

> The ability to select this value by policy allows us to be able to
> select the NHLFE entries from other subsystems; eg a u32 classifier
> on ingress could select all IP addresses from 10.1.1.1/24 to have a
> fecid of 10. The skb->fecid is then set to 10. When the packet gets to

I see, but as long as it is not called fec_id, it's fine :) call it fwd_id.

> dsts are still managed from the MPLS code. There is some generic stuff
> (create, destroy, gc etc) for which there is no point in recreating in
> the MPLS code

I am not sure that you need to. This is what was done in James' impl.
(mpls_dst). The only thing you need is a means to allocate mpls_dsts and
hang the reference into the skb's dst. The advantage is that you don't add
another member to dst (I still don't like adding an mpls ptr to a generic
dst, but I assume you are far more knowledgeable than I am), but of course,
you still have to modify the skb dst (e.g. release it and hold a new
reference).

> The way it is right now works fine. What could probably have been a
> better approach is to stack dsts. It would require some surgery and I am
> not sure I have the patience for it.

Well, I thought we agreed on doing it the right way :) I am not stating
which one it is though.

In mpls_unicast_forward:

    lt = (struct ltable *)skb->dst;
    skb->dst = &lt->u.dst;

would it not be possible here to allocate an mpls_dst with a new dst_ops
with the right size?

comment: I *do* think that mpls_tunnel.c from James' impl can directly be
used and it's very convenient. Just %s/moi/fwd_id/g

Ramon.
From: Jamal H. S. <ha...@zn...> - 2004-02-13 16:50:24
On Fri, 2004-02-13 at 10:17, Ramon Casellas wrote:
> On Fri, 13 Feb 2004, Jamal Hadi Salim wrote:
>
> disclaimer:
> from now on, all mails will be sent to mpl...@li...
> (that is what I was going to do, but then I received your email about the
> mailing list failing)

Following on your statement - removed Dave. I like ccing the original sender
in case the mailing list goes down.

> > Essentially it is an identifier of a NHLFE entry.
> > So you are right naming it a FEC identifier may not be the best.
>
> So the relationship is:
>
> 1 FEC F to N available objects
        ^ is that F a typo?
> 1 Incoming Label/Labelspace to N available objects

Essentially yes, if the F is a typo.

> > I don't want to call the so-far-called fecid "lspid", but it is close.
>
> I see what you mean. Let us check what the RFC says:
> The "Incoming Label Map" (ILM) maps each incoming label to a set of
> NHLFEs. (...)
> [..]
> (RCAS: N.B. you don't need equal cost paths)...

We can do it, so let's just add it.

> > Given the above info, suggest a new name. Maybe NHid?
>
> I would say something like "fwd_id" from "Forward Id", or "out_id"; it
> should not preclude one or another next hop.

You are sure you don't want "nh" somewhere in there, since this is a
reference to the NHLFE? Heck, why don't we just call it nhlfe_id? ;->

> > Also I would think most of these lists would contain a single entry.
>
> Yes, unicast MPLS with no load sharing enabled. However, you may need them
> when doing multicast and/or load sharing.

Your mentioning multicast has given me some interesting thoughts.
Essentially multicast would be just another algorithm in the scheme I
previously posted (response to James).

> > BTW, the ILM is not multipath ready; we should be able to add that easily.
>
> Do you really need it to? Just hold a set of fwd_ids. The policy to select
> one should be configurable (discipline) & pluggable. A common impl. is a
> hash table.

Almost like you read my mind. Refer to my earlier email for the suggestions
I made.

> The RFC also defines the interaction with routing in this case (although
> vaguely).

Any routing/IP details are, in my opinion, NHLFE related. For example, a
neighbor needs to have an IP address.

> > The ability to select this value by policy allows us to be able to
> > select the NHLFE entries from other subsystems; eg a u32 classifier
> > on ingress could select all IP addresses from 10.1.1.1/24 to have a
> > fecid of 10. The skb->fecid is then set to 10. When the packet gets to
>
> I see, but as long as it is not called fec_id, it's fine :) call it
> fwd_id.

Check my earlier view above. Toss a coin, pick something, and let's stick
with it.

> > dsts are still managed from the MPLS code. There is some generic stuff
> > (create, destroy, gc etc) for which there is no point in recreating in
> > the MPLS code
>
> I am not sure that you need to. This is what was done in James' impl.
> (mpls_dst). The only thing you need is a means to allocate mpls_dsts and
> hang the reference into the skb's dst. The advantage is that you don't add
> another member to dst (I still don't like adding an mpls ptr to a generic
> dst, but I assume you are far more knowledgeable than I am), but of
> course, you still have to modify the skb dst (e.g. release it and hold a
> new reference).

Ok, I will need to look at the code.

> > The way it is right now works fine. What could probably have been a
> > better approach is to stack dsts. It would require some surgery and I am
> > not sure I have the patience for it.
>
> Well, I thought we agreed on doing it the right way :) I am not stating
> which one it is though.

Absolutely, but that also means not sticking unnecessary ifdefs in 20 files
just so that you can support some funky xfrm approach.

> In mpls_unicast_forward:
>
>     lt = (struct ltable *)skb->dst;
>     skb->dst = &lt->u.dst;
>
> would it not be possible here to allocate an mpls_dst with a new dst_ops
> with the right size?

Yes, this is the dirtiest scene in the usage of skb->dst in that code. It is
not too bad as far as obscenity level is concerned, and if there are better
ways to do this, let's move on to those approaches. The big challenge would
be the other issues associated with it, such as hh, neighbours etc.

> comment: I *do* think that mpls_tunnel.c from James' impl can directly be
> used and it's very convenient. Just %s/moi/fwd_id/g

What is mpls_tunnel.c for? Is it a netdevice? What is it used for?

cheers,
jamal
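A rough sketch of Ramon's suggestion (a dedicated dst_ops sized for an MPLS
dst) against the 2.6-era dst API as the editor recalls it. It is not code
from either implementation; the struct, the field usage and the names are
assumptions.

    /* Illustrative: a private dst_ops whose entry_size/kmem cache cover the
     * MPLS-specific part, so the MPLS code allocates its own dsts instead
     * of hanging a pointer off the generic dst_entry. */
    #include <net/dst.h>
    #include <linux/slab.h>
    #include <linux/init.h>
    #include <linux/errno.h>

    struct mpls_dst {
            struct dst_entry dst;        /* must stay first so casts work */
            u32              nhlfe_id;   /* hypothetical MPLS-private state */
    };

    static struct dst_ops mpls_dst_ops = {
            .family     = AF_UNSPEC,
            .entry_size = sizeof(struct mpls_dst),
    };

    static int __init mpls_dst_init(void)
    {
            mpls_dst_ops.kmem_cachep = kmem_cache_create("mpls_dst_cache",
                            sizeof(struct mpls_dst), 0, SLAB_HWCACHE_ALIGN,
                            NULL, NULL);
            return mpls_dst_ops.kmem_cachep ? 0 : -ENOMEM;
    }

    static struct mpls_dst *mpls_dst_alloc(u32 nhlfe_id)
    {
            struct mpls_dst *mdst = dst_alloc(&mpls_dst_ops);

            if (mdst)
                    mdst->nhlfe_id = nhlfe_id;
            return mdst;
    }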
From: Ramon C. <cas...@in...> - 2004-02-13 17:12:13
On 13 Feb 2004, Jamal Hadi Salim wrote:
> You are sure you don't want "nh" somewhere in there, since this is a
> reference to the NHLFE? Heck, why don't we just call it nhlfe_id? ;->

nhlfe_id is fine for me.

> Refer to my earlier email for the suggestions I made.

Agreed.

> > comment: I *do* think that mpls_tunnel.c from James' impl can directly
> > be used and it's very convenient. Just %s/moi/fwd_id/g
>
> What is mpls_tunnel.c for? Is it a netdevice? What is it used for?

Yes. It is a virtual netdevice that is allocated upon request and basically
holds a MOI (the equivalent of an nhlfe_id). The user sees it as a
unidirectional netdevice (ifconfig, etc).

Take a look at the file if you happen to find some spare time (indeed, it
can be improved, and the sysfs integration was a little hairy), but I think
it is very convenient and is used extensively when RSVP-TE sets up LSPs.

regards,
Ramon
From: Jamal H. S. <ha...@zn...> - 2004-02-13 22:39:57
On Fri, 2004-02-13 at 12:10, Ramon Casellas wrote:
> > What is mpls_tunnel.c for? Is it a netdevice? What is it used for?
>
> Yes. It is a virtual netdevice that is allocated upon request and
> basically holds a MOI (the equivalent of an nhlfe_id). The user sees it as
> a unidirectional netdevice (ifconfig, etc).

What do you mean by "upon request" - is it created by policy or on packet
arrival? I think I may be able to visualize this. If I am right, what is
happening is that a packet gets redirected to this device, which then does
some MPLS work on it before sending it out some device with the proper
encapsulation? Is this typically an IP packet?

> Take a look at the file if you happen to find some spare time (indeed, it
> can be improved, and the sysfs integration was a little hairy), but I
> think it is very convenient and is used extensively when RSVP-TE sets up
> LSPs.

Nobody has pointed me at a URL yet for where the code is.

cheers,
jamal
From: Ramon C. <cas...@in...> - 2004-02-13 22:53:41
On 13 Feb 2004, Jamal Hadi Salim wrote:
> On Fri, 2004-02-13 at 12:10, Ramon Casellas wrote:
>
> What do you mean by "upon request" - is it created by policy or on packet
> arrival?

By policy, typically from userspace. This may clarify it a little:

http://perso.enst.fr/~casellas/mpls-linux/ch02s04.html
http://perso.enst.fr/~casellas/mpls-linux/ch02s07.html
http://perso.enst.fr/~casellas/mpls-linux/ch10.html

> I think I may be able to visualize this. If I am right, what is happening
> is that a packet gets redirected to this device, which then does some MPLS
> work on it before sending it out some device with the proper
> encapsulation? Is this typically an IP packet?

Yes and yes. Not bad :). And you can use it to stack.

> Nobody has pointed me at a URL yet for where the code is.

Sorry about that:

http://sourceforge.net/project/showfiles.php?group_id=15443

Best regards,
R.
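A rough sketch of the kind of virtual netdevice being described: a 2.6-era
device whose transmit hook hands the packet to the MPLS output path using the
stored MOI/nhlfe_id. This is illustrative only - it is not James' mpls_tunnel.c
- and mpls_output_nhlfe() plus all other names here are assumptions.

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>
    #include <linux/if_arp.h>

    struct mpls_tunnel_priv {
            u32 nhlfe_id;               /* the MOI this tunnel maps onto */
    };

    /* hypothetical hand-off into the MPLS output path */
    extern int mpls_output_nhlfe(struct sk_buff *skb, u32 nhlfe_id);

    static int mpls_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
    {
            struct mpls_tunnel_priv *priv = dev->priv;

            /* everything routed out this device is pushed onto the LSP */
            return mpls_output_nhlfe(skb, priv->nhlfe_id);
    }

    static void mpls_tunnel_setup(struct net_device *dev)
    {
            dev->hard_start_xmit = mpls_tunnel_xmit;
            dev->flags           = IFF_POINTOPOINT | IFF_NOARP;
            dev->type            = ARPHRD_NONE;
            dev->mtu             = 1500;
    }

    static struct net_device *mpls_tunnel_create(const char *name, u32 nhlfe_id)
    {
            struct net_device *dev;

            dev = alloc_netdev(sizeof(struct mpls_tunnel_priv), name,
                               mpls_tunnel_setup);
            if (!dev)
                    return NULL;
            ((struct mpls_tunnel_priv *)dev->priv)->nhlfe_id = nhlfe_id;
            if (register_netdev(dev)) {
                    free_netdev(dev);
                    return NULL;
            }
            return dev;
    }

Because the tunnel is an ordinary netdevice, anything that can route or
classify onto an interface (static routes, RSVP-TE, tc) can steer traffic
onto the LSP, which is the convenience Ramon is pointing at.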
From: James R. L. <jl...@mi...> - 2004-02-13 17:34:18
The MPLS tunnel interface fits well into the 'cisco' model of TE LSPs, which
represents them as a ptp tunnel interface with a peer address of the
end-point of the LSP. The 'juniper' model represents TE LSPs as just another
route in the MPLS 'routing' table (a /32 route for the end-point of the TE
LSP). I personally prefer the 'cisco' model; it provides more flexibility
(anything that can work with a netdevice can use it).

On Fri, Feb 13, 2004 at 06:10:13PM +0100, Ramon Casellas wrote:
> Yes. It is a virtual netdevice that is allocated upon request and
> basically holds a MOI (the equivalent of an nhlfe_id). The user sees it as
> a unidirectional netdevice (ifconfig, etc).
>
> Take a look at the file if you happen to find some spare time (indeed, it
> can be improved, and the sysfs integration was a little hairy), but I
> think it is very convenient and is used extensively when RSVP-TE sets up
> LSPs.
[..]

-- 
James R. Leu
jl...@mi...
From: Jamal H. S. <ha...@zn...> - 2004-02-13 23:17:34
On Fri, 2004-02-13 at 12:32, James R. Leu wrote:
> The MPLS tunnel interface fits well into the 'cisco' model of TE LSPs,
> which represents them as a ptp tunnel interface with a peer address of the
> end-point of the LSP. The 'juniper' model represents TE LSPs as just
> another route in the MPLS 'routing' table (a /32 route for the end-point
> of the TE LSP). I personally prefer the 'cisco' model; it provides more
> flexibility (anything that can work with a netdevice can use it).

Ok. So I may be getting a better idea. Essentially, by being a netdevice it
gets the advantage of being routable etc. Just because Cisco has it is good
enough reason to add it. We should also support the Juniper approach - we
are Linux after all ;->

One piece I said earlier was missing that may enable this is the tc-action
code [1]. With this I can, at the pre-IP level, do something along the lines
of:

    tc filter add dev eth0 parent ffff: protocol ip prio 1 \
        u32 match ip src 10.0.0.21/32 flowid 1:15 \
        action set nhlfe_id 10 \
        action mpls_tunnel \
        action mirred egress redirect dev eth2

and then use the skb->nhlfe_id in the mpls_tunnel action before redirecting
the packet out eth2. Of course I could let routing take care of redirecting
to dev eth2.

cheers,
jamal

[1] This code is going in; I am just too lazy to scrub it at this point.
http://www.cyberus.ca/~hadi/patches/action/README
From: Ramon C. <cas...@in...> - 2004-02-13 23:28:23
On 13 Feb 2004, Jamal Hadi Salim wrote:
> Ok. So I may be getting a better idea. Essentially, by being a netdevice
> it gets the advantage of being routable etc. Just because Cisco has it is
> good enough reason to add it.

Jamal,

Thanks for being open to ideas and thoughts. May I suggest setting up (when
you find some time) a CVS so it is easier for us to sync to the latest tree?
Not right now, of course.

Regards,
r.
From: James R. L. <jl...@mi...> - 2004-02-13 14:41:23
Comments in line.

On Fri, Feb 13, 2004 at 09:09:08AM -0500, Jamal Hadi Salim wrote:
> Maybe we can move this discussion to the list and leave
> Dave alone; we can ping him when we need to verify things from him.
> I am trying to cc the list as a test.
>
> On Fri, 2004-02-13 at 06:26, Ramon Casellas wrote:
> > RCAS 20040213: I see the utility of FEC Id, but I am not fond of the
> > name. The name FEC Id implies that it is "a FEC identifier".
>
> Essentially it is an identifier of an NHLFE entry.
> So you are right naming it a FEC identifier may not be the best.

Agreed.

> [..]
> I wouldn't call it a label at all. It is the key used to search the
> NHLFE. Some implementations don't allow setting of such a parameter (one
> of the vendors I looked at actually did) - they will tell you what it is.
> Essentially it is an identifier of an NHLFE entry (not an index).
> A collection of these NHLFE entries could be used by the same LSP.
> There is a further entry that can be used to store extra LSP info
> (refer to parameter "index").
> Given the above info, suggest a new name. Maybe NHid?

Much better than FECid ;-) (although it is just a name ...)

> > "ILM and FTN derive a FECid from their respective lookups"
> >
> > I would propose: "ILM and FTN derive a [list of - multipath] NHLFE
> > index[es] from their respective lookups [...] These indexes and incoming
> > labelspaces are then used to look up the NHLFE to determine how to
> > forward the packet."
>
> Ok, that is useful. I have not tested multipath but it should work with
> Linux routing ECMP at least. I wouldn't call them NHLFE indices, rather
> these identifiers so far called fecid. Also, I would think most of these
> lists would contain a single entry.
> BTW, the ILM is not multipath ready; we should be able to add that easily.
> Also there is no control over how the multipath selection is done with
> the Linux routing table - whatever Linux ECMP does goes.
> We should be able to fix the ILM with an algorithm selector.

After looking at the code I would agree that whatever Linux multipath does
at the ingress LER, this code will follow. The real question is how to go
about supporting multipath as an LSR (one ILM needs to load balance over
multiple NHLFEs). Or dare I suggest p-mp LSPs?

> > A standard structure for NHLFE contains:
> > - FEC id
> >
> > RCAS: Is this field really necessary? An NHLFE entry could be shared by
> > several 'FECIds'...
>
> Look at my description above.
> The ability to select this value by policy allows us to select the NHLFE
> entries from other subsystems; eg a u32 classifier on ingress could select
> all IP addresses from 10.1.1.1/24 to have a fecid of 10. The skb->fecid is
> then set to 10. When the packet gets to the point of NHLFE entry selection
> this value is used to override/select the NHLFE entry.

I know you mentioned it is "not an index", but to me it seems like it really
_is_ an index for the NHLFE. Can multiple NHids correspond to the same
NHLFE? If it is a 1 to 1 mapping, for all intents and purposes it is an
index :-)

> > And a couple of questions (please consider them as questions from
> > someone who has limited experience in kernel programming)
> >
> > * I think that adding struct mpls_nhlfe_route *mpls to a dst_entry is a
> > little intrusive, and somehow the "genericity" of the DST is being lost.
> > [..]
> > and manage MPLS dsts from the mpls subsystem? I understand that using
> > your approach it is easier to get MPLS information from skb->dst->mpls
> > but I don't know, it seems too strong a coupling between MPLS and
> > generic dst management. Well, just food for thought.
>
> dsts are still managed from the MPLS code. There is some generic stuff
> (create, destroy, gc etc.) for which there is no point in recreating in
> the MPLS code. The way it is right now works fine. What could probably
> have been a better approach is to stack dsts. It would require some
> surgery and I am not sure I have the patience for it. Maybe we can ask
> Dave for his thoughts on this.

Currently we use dst stacking. The 'child' dst is actually a static member
of the 'outgoing label info' (NHLFE). So when the skb reaches the exit of
IPv4|6, a check for the child is done. The skb->dst is replaced with the
child dst and the child output function is called (which sends it into MPLS
land). The entrance to MPLS land uses the "container_of" macro to get the
NHLFE used to forward the packet. How the stacked dst is created is similar
to your scheme. I was wondering if XFRM is a better scheme to use for all of
this?

-- 
James R. Leu
jl...@mi...
From: Jamal H. S. <ha...@zn...> - 2004-02-13 16:22:03
On Fri, 2004-02-13 at 09:39, James R. Leu wrote:
> > Given the above info, suggest a new name. Maybe NHid?
>
> Much better than FECid ;-) (although it is just a name ...)

True, but it has to map to the semantics. Ok, NHid for now, until something
with a better ring shows up.

> After looking at the code I would agree that whatever Linux multipath does
> at the ingress LER, this code will follow. The real question is how to go
> about supporting multipath as an LSR (one ILM needs to load balance over
> multiple NHLFEs). Or dare I suggest p-mp LSPs?

I've actually done some background compute on this, in my head at least.
Here are my thoughts on paper, or electrons:

The ILM table entry (struct ltable in the code) should have a new structure,
call it nh_choice, which has the following entries:

    function selector();
    struct nh_info nh_list;

nh_list would look like:

    struct gnet_stats stats;    /* stats */
    u32 lt_fecid;               /* change that to ilm_nhid */

Note the above two entries currently reside in struct ltable.

A packet coming in will have the usual lookup; then nh_choice->selector()
will be invoked. It will return the nhid.

The idea behind the selector() is that we can attach different algorithms
via policy and make them take care of things like paths being down etc. I
can think of two simple algorithms right away: random selection and RR. The
idea is to open these algorithms up to innovation.

From user space this would look like:

    l2c mpls ilm add dev eth0 label 22 nhalg roundrobin nhid 2 nhid 3 nhid 4

etc.

Thoughts?

> I know you mentioned it is "not an index", but to me it seems like it
> really _is_ an index for the NHLFE. Can multiple NHids correspond to the
> same NHLFE? If it is a 1 to 1 mapping, for all intents and purposes it is
> an index :-)

Ok ;-> How about NHkey? Maybe a prefix of mpls_ would also be good.

> Currently we use dst stacking. The 'child' dst is actually a static member
> of the 'outgoing label info' (NHLFE). So when the skb reaches the exit of
> IPv4|6, a check for the child is done. The skb->dst is replaced with the
> child dst and the child output function is called (which sends it into
> MPLS land). The entrance to MPLS land uses the "container_of" macro to get
> the NHLFE used to forward the packet. How the stacked dst is created is
> similar to your scheme. I was wondering if XFRM is a better scheme to use
> for all of this?

Sorry, I meant XFRM. I am indifferent whether we change it to your scheme or
leave it as is. I will have to look at your code to make a better judgement.
My thinking would be that the end goal should be NOT to touch the IPv4/6
code with ifdefs unless necessary. If there's not a huge difference in terms
of efficiency or code beautification I would rather stick to the current
code. BTW, if you point me to the latest code I will print it and read it
offline over the weekend if possible.

I may be a bit slow responding now since I am at work.

cheers,
jamal
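A sketch of how the pluggable selector() described above might look - a
round-robin pick over the nh_list - purely to make the idea concrete. The
struct layout and names are assumptions for illustration, not code from the
tree.

    #include <linux/types.h>
    #include <linux/skbuff.h>

    /* Illustrative shapes, loosely following the nh_choice idea above. */
    struct nh_info {
            u32 nhid;                 /* identifier used to find the NHLFE */
            /* per-nexthop stats (struct gnet_stats) would live here */
    };

    struct nh_choice {
            /* returns the nhid to use for this packet */
            u32 (*selector)(struct nh_choice *nhc, struct sk_buff *skb);
            int            nh_count;
            unsigned int   rr_last;   /* state for the round-robin algorithm */
            struct nh_info nh_list[0];/* 2.6-era flexible-array idiom */
    };

    /* One possible algorithm: plain round robin over the configured nexthops. */
    static u32 nh_select_rr(struct nh_choice *nhc, struct sk_buff *skb)
    {
            nhc->rr_last = (nhc->rr_last + 1) % nhc->nh_count;
            return nhc->nh_list[nhc->rr_last].nhid;
    }

A "random" or hash-based selector would simply be another function assigned
to the same pointer, which is what the nhalg keyword in the l2c syntax above
would choose.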
From: James R. L. <jl...@mi...> - 2004-02-13 17:14:39
On Fri, Feb 13, 2004 at 11:20:06AM -0500, Jamal Hadi Salim wrote:
> I've actually done some background compute on this, in my head at least.
> Here are my thoughts on paper, or electrons:
>
> The ILM table entry (struct ltable in the code) should have a new
> structure, call it nh_choice, which has the following entries:
>
>     function selector();
>     struct nh_info nh_list;
>
> nh_list would look like:
>
>     struct gnet_stats stats;    /* stats */
>     u32 lt_fecid;               /* change that to ilm_nhid */
>
> Note the above two entries currently reside in struct ltable.
>
> A packet coming in will have the usual lookup; then nh_choice->selector()
> will be invoked. It will return the nhid.
>
> The idea behind the selector() is that we can attach different algorithms
> via policy and make them take care of things like paths being down etc. I
> can think of two simple algorithms right away: random selection and RR.
> The idea is to open these algorithms up to innovation.
>
> From user space this would look like:
>
>     l2c mpls ilm add dev eth0 label 22 nhalg roundrobin nhid 2 nhid 3 nhid 4

What about adding a new func ptr to the protocol driver? Then we could do
protocol dependent stuff like hashing the IPv4|6 header or ethernet header
(ethernet over MPLS). The task is trivial if the stack only has one label;
for more than one label we would have to be creative: hashing the label
stack, or using the PW ID (a suggestion in the PWE3 WG which adds a word
after the label stack to indicate what protocol lies below). The PW ID could
be used to look up the protocol driver to generate the hash. Or of course we
could just add an option for which algo to use.

> etc.
>
> Thoughts?
>
> > I know you mentioned it is "not an index", but to me it seems like it
> > really _is_ an index for the NHLFE. Can multiple NHids correspond to the
> > same NHLFE? If it is a 1 to 1 mapping, for all intents and purposes it
> > is an index :-)
>
> Ok ;-> How about NHkey? Maybe a prefix of mpls_ would also be good.
>
> [..]
> Sorry, I meant XFRM. I am indifferent whether we change it to your scheme
> or leave it as is. I will have to look at your code to make a better
> judgement. My thinking would be that the end goal should be NOT to touch
> the IPv4/6 code with ifdefs unless necessary. If there's not a huge
> difference in terms of efficiency or code beautification I would rather
> stick to the current code. BTW, if you point me to the latest code I will
> print it and read it offline over the weekend if possible.

Here are some snippets. I think XFRM may remove the need for these, but for
now it works.

Setup the dst stacking
----------------------

net/mpls/mpls_output.c

    int
    mpls_set_nexthop (struct dst_entry *dst, u32 nh_data, struct spec_nh *spec)
    {
            struct mpls_out_info *moi = NULL;
            MPLS_ENTER;
            moi = mpls_get_moi(nh_data);
            if (unlikely(!moi))
                    return -1;

            dst->metrics[RTAX_MTU-1] = moi->moi_mtu;
            dst->child = dst_clone(&moi->moi_dst);
            MPLS_DEBUG("moi: %p mtu: %d dst: %p\n", moi, moi->moi_mtu,
                    &moi->moi_dst);
            MPLS_EXIT;
            return 0;
    }

mpls_set_nexthop is called from ipv4:rt_set_nexthop and from
ipv6:ip6_route_add (I have a 'special nexthop' system developed which would
be replaced by XFRM). It is very similar to your RTA_MPLS_FEC, but has 2
pieces of data, a RTA_SPEC_PROTO and a RTA_SPEC_DATA. It is intended for
multiple protocols to be able to register a special nexthop. Right now only
MPLS registers :-) Again, I have every intention of ripping it out in favor
of XFRM.

Using the dst stack
-------------------

net/ipv4/ip_output.c

    static inline int ip_finish_output2(struct sk_buff *skb)
    {
            struct dst_entry *dst = skb->dst;
            struct hh_cache *hh = dst->hh;
            struct net_device *dev = dst->dev;
            int hh_len = LL_RESERVED_SPACE(dev);

            if (dst->child) {
                    skb->dst = dst_pop(skb->dst);
                    return skb->dst->output(skb);
            }
            ...

Something very similar exists in net/ipv6/ip6_output.c ip6_output_finish().

> I may be a bit slow responding now since I am at work.

-- 
James R. Leu
jl...@mi...
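To complete the picture James describes - the MPLS entry point recovering the
NHLFE from the stacked dst via container_of - here is a small illustrative
sketch. The real layout of mpls_out_info is not shown in the thread, so the
fields and the output routine below are assumptions.

    #include <linux/skbuff.h>
    #include <linux/kernel.h>   /* container_of */
    #include <net/dst.h>

    /* Assumed shape: the NHLFE ('outgoing label info') embeds its dst,
     * matching "the 'child' dst is actually a static member of the NHLFE". */
    struct mpls_out_info {
            struct dst_entry moi_dst;   /* embedded child dst */
            u32              moi_label; /* hypothetical outgoing label */
            u32              moi_mtu;
    };

    /* Installed as moi->moi_dst.output; runs after ip_finish_output2 pops
     * the parent dst and calls the child's output hook. */
    static int mpls_output(struct sk_buff *skb)
    {
            struct mpls_out_info *moi =
                    container_of(skb->dst, struct mpls_out_info, moi_dst);

            /* push moi->moi_label, set the MPLS ethertype and hand the
             * packet to the outgoing device ... (omitted) */
            return 0;
    }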
From: Jamal H. S. <ha...@zn...> - 2004-02-13 23:00:11
On Fri, 2004-02-13 at 12:12, James R. Leu wrote:
> > From user space this would look like:
> >
> >     l2c mpls ilm add dev eth0 label 22 nhalg roundrobin nhid 2 nhid 3 nhid 4
>
> What about adding a new func ptr to the protocol driver? Then we could do
> protocol dependent stuff like hashing the IPv4|6 header or ethernet header
> (ethernet over MPLS).

Ok, so you are looking at only IP packets at the edge of an MPLS network?
Describe a little packet walk. Are you planning to not use the ECMP features?

> The task is trivial if the stack only has one label; for more than one
> label we would have to be creative: hashing the label stack, or using the
> PW ID (a suggestion in the PWE3 WG which adds a word after the label stack
> to indicate what protocol lies below). The PW ID could be used to look up
> the protocol driver to generate the hash.

Point me to some doc if you don't mind. Is this for some of the VPN
encapsulations?

> Or of course we could just add an option for which algo to use.

Note that what I suggested is only for the ILM level, and there you could
add any algorithms you want. With the protocol driver are you suggesting to
do something at the IPv4/6 FTN level only?

> Here are some snippets. I think XFRM may remove the need for these, but
> for now it works.
>
> Setup the dst stacking
> ----------------------
>
> net/mpls/mpls_output.c
>
>     int
>     mpls_set_nexthop (struct dst_entry *dst, u32 nh_data, struct spec_nh *spec)
>     {
>             struct mpls_out_info *moi = NULL;

I take it mpls_out_info is an NHLFE entry?

>             MPLS_ENTER;
>             moi = mpls_get_moi(nh_data);
>             if (unlikely(!moi))
>                     return -1;
>
>             dst->metrics[RTAX_MTU-1] = moi->moi_mtu;
>             dst->child = dst_clone(&moi->moi_dst);
>             MPLS_DEBUG("moi: %p mtu: %d dst: %p\n", moi, moi->moi_mtu,
>                     &moi->moi_dst);
>             MPLS_EXIT;
>             return 0;
>     }
>
> mpls_set_nexthop is called from ipv4:rt_set_nexthop and from
> ipv6:ip6_route_add (I have a 'special nexthop' system developed which
> would be replaced by XFRM). It is very similar to your RTA_MPLS_FEC, but
> has 2 pieces of data, a RTA_SPEC_PROTO and a RTA_SPEC_DATA. It is intended
> for multiple protocols to be able to register a special nexthop. Right now
> only MPLS registers :-) Again, I have every intention of ripping it out in
> favor of XFRM.
>
> Using the dst stack
> -------------------
>
> net/ipv4/ip_output.c
>
>     static inline int ip_finish_output2(struct sk_buff *skb)
>     {
>             struct dst_entry *dst = skb->dst;
>             struct hh_cache *hh = dst->hh;
>             struct net_device *dev = dst->dev;
>             int hh_len = LL_RESERVED_SPACE(dev);
>
>             if (dst->child) {
>                     skb->dst = dst_pop(skb->dst);
>                     return skb->dst->output(skb);
>             }
>             ...
>
> Something very similar exists in net/ipv6/ip6_output.c ip6_output_finish().

At the outset this does look a bit cleaner, but I would have to ping my
brain on Dave's approach. Take a look at his code.

Q: Can you stack more than one of those dsts? If yes, then it may be even
safer to have the nhlfe_route in the dst instead, no? I.e. how sure can you
be that the child will be MPLS related; in the other case it is guaranteed
to be (it does say dst->xxmplsxx).

There are a few pieces of the current approach that I didn't like; for
example the net_output_maybe_reroute() thing, or having to mod dst.c to add
ifdefs for MPLS. There could be a marriage of the two approaches, maybe?

cheers,
jamal
From: James R. L. <jl...@mi...> - 2004-02-15 07:28:25
On Fri, Feb 13, 2004 at 05:58:03PM -0500, Jamal Hadi Salim wrote:
> On Fri, 2004-02-13 at 12:12, James R. Leu wrote:
> > What about adding a new func ptr to the protocol driver? Then we could
> > do protocol dependent stuff like hashing the IPv4|6 header or ethernet
> > header (ethernet over MPLS).
>
> Ok, so you are looking at only IP packets at the edge of an MPLS network?
> Describe a little packet walk. Are you planning to not use the ECMP
> features?

It could be any protocol we map onto an LSP (i.e. ethernet/atm/fr over
MPLS); you just have to add a protocol driver for it.

The ECMP feature only helps you at the ingress LER. You need something to
handle load balancing in the core of the MPLS domain.

ECMP example:

                  -------                   -------
                 |       |                 |       |
       .--1G-----| LSR 1 |------100M-------| LSR 2 |----1G---.
      /          |       |                 |       |          \
 ----------       -------                   -------         ----------
 | Ingress |                                                 |  Egress |
 |   LER   |                                                 |   LER   |
 ----------       -------                   -------         ----------
      \          |       |                 |       |          /
       `--1G-----| LSR 3 |------100M-------| LSR 4 |----1G---'
                 |       |                 |       |
                  -------                   -------

In the above case ECMP will allow a max traffic of 200M between ingress and
egress.

Load balancing example:

 ----------       -------                 -------        ----------
 |         |      |       |---100M--------|       |      |         |
 | Ingress |--1G--| LSR 1 |---100M--------| LSR 2 |--1G--|  Egress |
 |   LER   |      |       |---100M--------|       |      |   LER   |
 ----------       -------                 -------        ----------

Without load balancing LDP would create 1 LSP for traffic going from ingress
to egress. The max traffic you could send from ingress to egress is 100M.
With load balancing LDP still sets up 1 LSP from ingress to egress, but when
LSR2 advertises a label to LSR1, LSR1 realizes it has 3 adjacencies to LSR2
and creates 3 NHLFEs, one on each of the links. It then uses some mechanism
to load balance traffic arriving on its 1 ILM onto the 3 NHLFEs. In the
single label case, looking at the protocol ID associated with the ILM and
doing a little layer violation ;-) we can do per-flow hashing and map flows
to the various NHLFEs. Now the max traffic between ingress and egress is
300M.

> > The task is trivial if the stack only has one label; for more than one
> > label we would have to be creative: hashing the label stack, or using
> > the PW ID (a suggestion in the PWE3 WG which adds a word after the label
> > stack to indicate what protocol lies below). The PW ID could be used to
> > look up the protocol driver to generate the hash.
>
> Point me to some doc if you don't mind. Is this for some of the VPN
> encapsulations?

http://www.ietf.org/internet-drafts/draft-allan-mpls-pid-00.txt

> > Or of course we could just add an option for which algo to use.
>
> Note that what I suggested is only for the ILM level, and there you could
> add any algorithms you want. With the protocol driver are you suggesting
> to do something at the IPv4/6 FTN level only?

To be able to load balance and guarantee packet order, you need to know what
is underneath the label stack. With just one label it is trivial to figure
out what is under the label stack. With more than one, it isn't so easy (the
LSR that needs to do the load balancing was not involved in the signaling of
any of the labels past the first one). Currently vendors do some nasty
hacking: look at the first nibble after the label stack, and if it is a 4,
they assume IPv4. They build the appropriate hash and use that to select the
outgoing NHLFE.

> > Here are some snippets. I think XFRM may remove the need for these, but
> > for now it works.
> [..]
> At the outset this does look a bit cleaner, but I would have to ping my
> brain on Dave's approach. Take a look at his code.
> Q: Can you stack more than one of those dsts? If yes, then it may be even
> safer to have the nhlfe_route in the dst instead, no? I.e. how sure can
> you be that the child will be MPLS related; in the other case it is
> guaranteed to be (it does say dst->xxmplsxx).

Since we use the child's output pointer, IPv4|6 don't care if it is MPLS. I
suppose the same check for a child could be made in MPLS output, then yes,
you could have more than one child stacked. I'm not sure this would be very
optimal for creating hierarchical LSPs (I think that is what you're alluding
to).

> There are a few pieces of the current approach that I didn't like; for
> example the net_output_maybe_reroute() thing, or having to mod dst.c to
> add ifdefs for MPLS. There could be a marriage of the two approaches,
> maybe?

After getting the feedback from David, XFRM will have to wait, and I think
the dst stacking is cleaner.

-- 
James R. Leu
jl...@mi...
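A sketch of the per-flow hashing James describes - the "first nibble"
heuristic used to keep packets of one flow on one NHLFE at an LSR. This is an
illustration of the technique only, not code from the implementation, and the
helper name and parameters are assumptions.

    #include <linux/ip.h>
    #include <linux/types.h>

    /* Peek below the label stack; a leading nibble of 4 suggests an IPv4
     * header, so hash its addresses so a given flow always lands on the
     * same of nh_count NHLFEs.  Anything else falls back to a single path
     * rather than risk re-ordering an unknown payload. */
    static u32 mpls_flow_select(const unsigned char *below_stack,
                                unsigned int len, u32 nh_count)
    {
            u32 hash = 0;

            if (nh_count <= 1)
                    return 0;

            if (len >= sizeof(struct iphdr) && (below_stack[0] >> 4) == 4) {
                    const struct iphdr *iph =
                            (const struct iphdr *)below_stack;

                    hash = (u32)(iph->saddr ^ iph->daddr) ^ iph->protocol;
            }

            return hash % nh_count;    /* index into the ILM's NHLFE set */
    }

This is exactly the layer violation being debated: it only works when the
guess about what sits under the stack is right, which is why the PW ID draft
and a per-protocol driver hook are being discussed as cleaner alternatives.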
From: Jamal H. S. <ha...@zn...> - 2004-02-16 14:23:34
On Sun, 2004-02-15 at 02:25, James R. Leu wrote:
> It could be any protocol we map onto an LSP (i.e. ethernet/atm/fr over
> MPLS); you just have to add a protocol driver for it.

And the reason you want to do it at the protocol level is because you can
classify better?

> The ECMP feature only helps you at the ingress LER. You need something to
> handle load balancing in the core of the MPLS domain.

Agreed; so in my earlier email I said we had no control over ECMP, i.e. we
are at the mercy of Linux v4/6 ECMP. At the ILM level, on the other hand
(for LSRs), we do have more control.

> ECMP example:
> [..]
> In the above case ECMP will allow a max traffic of 200M between ingress
> and egress.

Ok.

> Load balancing example:
> [..]
> Without load balancing LDP would create 1 LSP for traffic going from
> ingress to egress. The max traffic you could send from ingress to egress
> is 100M. With load balancing LDP still sets up 1 LSP from ingress to
> egress, but when LSR2 advertises a label to LSR1, LSR1 realizes it has 3
> adjacencies to LSR2 and creates 3 NHLFEs, one on each of the links. It
> then uses some mechanism to load balance traffic arriving on its 1 ILM
> onto the 3 NHLFEs. In the single label case, looking at the protocol ID
> associated with the ILM and doing a little layer violation ;-) we can do
> per-flow hashing and map flows to the various NHLFEs. Now the max traffic
> between ingress and egress is 300M.

Gotcha. So that balancing is done at the ILM level, correct?

So that little violation or peeking is, I take it, the reason you want the
protocol extension to be added?

> > > The task is trivial if the stack only has one label; for more than one
> > > label we would have to be creative: hashing the label stack, or using
> > > the PW ID (a suggestion in the PWE3 WG which adds a word after the
> > > label stack to indicate what protocol lies below). The PW ID could be
> > > used to look up the protocol driver to generate the hash.
> >
> > Point me to some doc if you don't mind. Is this for some of the VPN
> > encapsulations?
>
> http://www.ietf.org/internet-drafts/draft-allan-mpls-pid-00.txt

I'll read the draft; I know the author from my Nortel days. If I understood
correctly, this is now introducing an extra piece of data in the packet?

Note, as I described earlier, we should be able to just look at anything on
the packet with the u32 classifier, which can be activated before the MPLS
ILM is consulted. Also, based on the top label we can do a classification
again to peek into further packet data before making a decision on the next
hop.

> > Note that what I suggested is only for the ILM level, and there you
> > could add any algorithms you want. With the protocol driver are you
> > suggesting to do something at the IPv4/6 FTN level only?
>
> To be able to load balance and guarantee packet order, you need to know
> what is underneath the label stack. With just one label it is trivial to
> figure out what is under the label stack. With more than one, it isn't so
> easy (the LSR that needs to do the load balancing was not involved in the
> signaling of any of the labels past the first one). Currently vendors do
> some nasty hacking: look at the first nibble after the label stack, and if
> it is a 4, they assume IPv4. They build the appropriate hash and use that
> to select the outgoing NHLFE.

Why can't you look? Is this because ASICs are already built? You know
precisely where the label stack is going to end, no? Can you not then offset
to that position and figure out what the next data level is?

> Since we use the child's output pointer, IPv4|6 don't care if it is MPLS.
> I suppose the same check for a child could be made in MPLS output, then
> yes, you could have more than one child stacked. I'm not sure this would
> be very optimal for creating hierarchical LSPs (I think that is what
> you're alluding to).

Ok, that sounds reasonable. For starters don't even talk about hierarchical
LSPs ;-> Our challenge is to get rid of dst->mpls .. then go to David with
this one change - I think it's above 5% value add ;->.
Are you going to make the change?

cheers,
jamal
From: James R. L. <jl...@mi...> - 2004-02-19 04:33:58
Comments in line.

On Mon, Feb 16, 2004 at 09:19:49AM -0500, Jamal Hadi Salim wrote:
> On Sun, 2004-02-15 at 02:25, James R. Leu wrote:
> > It could be any protocol we map onto an LSP (i.e. ethernet/atm/fr over
> > MPLS); you just have to add a protocol driver for it.
>
> And the reason you want to do it at the protocol level is because you can
> classify better?

To avoid packet re-ordering. By using IPv4 header info, packets that are
part of the same flow will take the same path. Similarly with IPv6 and
ethernet over MPLS (use src/dst MAC addrs for the hash).

> > The ECMP feature only helps you at the ingress LER. You need something
> > to handle load balancing in the core of the MPLS domain.
>
> Agreed; so in my earlier email I said we had no control over ECMP, i.e. we
> are at the mercy of Linux v4/6 ECMP. At the ILM level, on the other hand
> (for LSRs), we do have more control.

We have the opportunity for more control :-)

> > ECMP example:
> > [..]
> > Load balancing example:
> > [..]
>
> Gotcha. So that balancing is done at the ILM level, correct?

Yes.

> So that little violation or peeking is, I take it, the reason you want the
> protocol extension to be added?

It makes the layer violation less of a hack, and more deterministic. I
actually have my own idea, which still has a layer violation, but is not
nearly as nasty as what the draft states.

> > http://www.ietf.org/internet-drafts/draft-allan-mpls-pid-00.txt
>
> I'll read the draft; I know the author from my Nortel days. If I
> understood correctly, this is now introducing an extra piece of data in
> the packet?

Yes.

> Note, as I described earlier, we should be able to just look at anything
> on the packet with the u32 classifier, which can be activated before the
> MPLS ILM is consulted. Also, based on the top label we can do a
> classification again to peek into further packet data before making a
> decision on the next hop.

Why do that work for an ILM which is just a swap? It should only be done
when needed, otherwise what's the point of the LS in MPLS?

> > To be able to load balance and guarantee packet order, you need to know
> > what is underneath the label stack. With just one label it is trivial to
> > figure out what is under the label stack. With more than one, it isn't
> > so easy (the LSR that needs to do the load balancing was not involved in
> > the signaling of any of the labels past the first one). Currently
> > vendors do some nasty hacking: look at the first nibble after the label
> > stack, and if it is a 4, they assume IPv4. They build the appropriate
> > hash and use that to select the outgoing NHLFE.
>
> Why can't you look? Is this because ASICs are already built?
> You know precisely where the label stack is going to end, no?
> Can you not then offset to that position and figure out what the next
> data level is?

You can look, but how do you know what is there? It could be a MAC address,
it could be voice data, it could be anything. If you mis-interpret it, you
could end up re-ordering voice packets ... not good. So in simple terms it
comes down to deterministically _not_ re-ordering packets :-)

> > Since we use the child's output pointer, IPv4|6 don't care if it is
> > MPLS. I suppose the same check for a child could be made in MPLS output,
> > then yes, you could have more than one child stacked. I'm not sure this
> > would be very optimal for creating hierarchical LSPs (I think that is
> > what you're alluding to).
>
> Ok, that sounds reasonable. For starters don't even talk about
> hierarchical LSPs ;-> Our challenge is to get rid of dst->mpls .. then go
> to David with this one change - I think it's above 5% value add ;->.
> Are you going to make the change?

I'll make the change. I'll send it to the list for review. I'd just like to
note that by ignoring hierarchy, we're designing/developing by looking at
only about 25% of the requirements. As I'm sure you're familiar with, it
usually requires a re-spin to support the other 75% :-)

-- 
James R. Leu
jl...@mi...
From: Jamal H. S. <ha...@zn...> - 2004-02-19 14:31:36
On Wed, 2004-02-18 at 23:28, James R. Leu wrote:
[..]
> > Note, as I described earlier, we should be able to just look at anything
> > on the packet with the u32 classifier, which can be activated before the
> > MPLS ILM is consulted. Also, based on the top label we can do a
> > classification again to peek into further packet data before making a
> > decision on the next hop.
>
> Why do that work for an ILM which is just a swap? It should only be done
> when needed, otherwise what's the point of the LS in MPLS?

I was thinking more that you would use u32 to detect which flows, and use
the flowid to help in the multipath selection. We could always do this
later, in addition to what you are proposing. Your proposal is the quickest
way to get us there.

BTW, look at my earlier email (Ramon reached a similar conclusion) on the
algorithm plugins for selecting the next hops. If we agree on that algorithm
plugin, let's have someone raise their hand to implement it. If neither of
you guys raises your hand, I can take it.

> I'll make the change. I'll send it to the list for review. I'd just like
> to note that by ignoring hierarchy, we're designing/developing by looking
> at only about 25% of the requirements. As I'm sure you're familiar with,
> it usually requires a re-spin to support the other 75% :-)

Sorry, I didn't mean ignore it in the design - unless it overcomplicates
things immensely. What I meant is that when we sell it to Davem (which I
don't see as a complication, given the dst stacking is useful), we should
not try to sell the LSP hierarchies to start with. But let's see the code,
then we can make the call. Please do factor in the hierarchies.

cheers,
jamal
From: James R. L. <jl...@mi...> - 2004-02-19 16:24:48
|
On Thu, Feb 19, 2004 at 09:26:01AM -0500, Jamal Hadi Salim wrote:
> On Wed, 2004-02-18 at 23:28, James R. Leu wrote:
> [..]
> > > Note, as I described earlier, we should be able to just look at
> > > anything on the packet with the u32 classifier, which can be activated
> > > before the MPLS ILM is consulted. Also, based on the top label we can
> > > do a classification again to peek into further packet data before making
> > > a decision on the next hop.
> >
> > Why do that work for an ILM which is just a swap? It should only be done
> > when needed, otherwise what's the point of the LS in MPLS?
> >
>
> I was thinking more that you would use u32 to detect the flows and
> use the flowid to help in the m-hop selection.
> We could always do this later in addition to what you are proposing.
> Your proposal is the quickest way to get us there.
> BTW, look at my earlier email (Ramon reached a similar conclusion)
> on the algorithm plugins for selecting the next hops. If we agree on that
> algorithm plugin, let's have someone raise their hand to implement it.
> If neither of you guys raises your hand, I can take it.

Since it is a nice modular piece, maybe Ramon or you can work on it.
I'll focus on the dst stacking stuff.

> > I'll make the change. I'll send it to the list for review. I'd just
> > like to note that by ignoring hierarchy, we're designing/developing by only
> > looking at about 25% of the requirements. As I'm sure you're familiar,
> > it usually requires a re-spin to support the other 75% :-)
>
> Sorry, I didn't mean to ignore it in the design - unless it overcomplicates
> things immensely. What I meant is that when we sell it to Davem (which I don't
> see as a complication given that the dst stacking is useful) we should not sell
> the LSP hierarchies to start with. But let's see the code, then we can make
> the call. Please do factor in the hierarchies.

The key to implementing hierarchy is indirection, and the realization that not
all traffic that arrives on a particular in-segment gets the same actions
applied (this goes back to the idea that the meaning of 'pop' is determined by
which position in the stack it is being applied to).

> cheers,
> jamal

--
James R. Leu
jl...@mi... |
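The indirection point can be shown with a toy model of stacked dsts: each level of the hierarchy pushes only its own label and hands the packet to its child, without knowing whether that child is another MPLS level or the final transmit step. This is purely illustrative user-space C under assumed names (toy_dst, mpls_output, wire_output); the real 2.6 dst_entry and output path are considerably more involved.

/*
 * Toy model of LSP hierarchy via stacked dsts.  All names are made up;
 * this is not kernel code, only a picture of the indirection.
 */
#include <stdio.h>

struct toy_dst {
        struct toy_dst *child;                  /* next level, or the wire */
        int label;                              /* label this level pushes */
        int (*output)(struct toy_dst *dst, const char *pkt);
};

static int wire_output(struct toy_dst *dst, const char *pkt)
{
        (void)dst;
        printf("transmit %s on the outgoing device\n", pkt);
        return 0;
}

static int mpls_output(struct toy_dst *dst, const char *pkt)
{
        /* "Push" prepends a label entry, so the label pushed last ends
         * up on top of the stack on the wire. */
        printf("push label %d\n", dst->label);

        /* Indirection: this level neither knows nor cares whether its
         * child is another MPLS level or the final transmit step. */
        return dst->child->output(dst->child, pkt);
}

int main(void)
{
        struct toy_dst wire   = { NULL,    -1, wire_output };
        struct toy_dst tunnel = { &wire,  200, mpls_output };  /* top label    */
        struct toy_dst lsp    = { &tunnel, 100, mpls_output }; /* bottom label */

        /* What an FTN lookup would attach to the packet: the inner LSP,
         * stacked on the tunnel, stacked on the wire. */
        return lsp.output(&lsp, "payload");
}

Adding or removing a level of hierarchy is then just adding or removing one element of the chain; no level has to treat all traffic on an in-segment the same way, because each dst carries its own actions.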
From: James R. L. <jl...@mi...> - 2004-02-13 14:48:49
|
Last time CC'ing David.

I just wanted to get David's take on using XFRM for the Layer 3 to MPLS
mapping, which would utilize dst stacking.

Is XFRM capable of doing this? Any pointers as to where to start?

On Fri, Feb 13, 2004 at 09:09:08AM -0500, Jamal Hadi Salim wrote:
[..]
> > I understand that using your approach it is easier to get MPLS
> > information from skb->dst->mpls but I don't know, it seems too strong
> > a coupling between MPLS and generic dst management. Well, just food
> > for thought.
>
> dsts are still managed from the MPLS code. There is some generic stuff
> (create, destroy, gc etc) for which there is no point in recreating in
> the MPLS code.
> The way it is right now works fine. What could probably have been a
> better approach is to stack dsts. It would require some surgery and I am
> not sure I have the patience for it. Maybe we can ask Dave for his
> thoughts on this.
>
> cheers,
> jamal

--
James R. Leu
jl...@mi... |
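The argument for keeping MPLS dsts inside the generic dst machinery is essentially about not re-implementing the create/hold/release/GC lifecycle. The sketch below is a rough user-space caricature of that bookkeeping with made-up names (toy_entry, entry_gc); it is not the kernel's dst core, only a picture of what the MPLS code gets for free by reusing it.

/*
 * Caricature of a generic dst-style lifecycle: allocate, drop a
 * reference, and let a shared GC pass reclaim unused, obsolete entries.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_entry {
        struct toy_entry *next;   /* GC list linkage */
        int refcnt;
        int obsolete;             /* e.g. the underlying route went away */
        int id;
};

static struct toy_entry *gc_list;

static struct toy_entry *entry_alloc(int id)
{
        struct toy_entry *e = calloc(1, sizeof(*e));
        if (!e)
                return NULL;
        e->id = id;
        e->refcnt = 1;
        e->next = gc_list;
        gc_list = e;
        return e;
}

static void entry_put(struct toy_entry *e) { e->refcnt--; }

/* Periodic pass: free entries that are obsolete and no longer referenced. */
static void entry_gc(void)
{
        struct toy_entry **pp = &gc_list;
        while (*pp) {
                struct toy_entry *e = *pp;
                if (e->obsolete && e->refcnt <= 0) {
                        *pp = e->next;
                        printf("gc: freeing entry %d\n", e->id);
                        free(e);
                } else {
                        pp = &e->next;
                }
        }
}

int main(void)
{
        struct toy_entry *e = entry_alloc(1);
        if (!e)
                return 1;
        e->obsolete = 1;     /* the route it hangs off disappeared */
        entry_put(e);        /* last user drops its reference */
        entry_gc();          /* shared machinery reclaims it */
        return 0;
}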
From: David S. M. <da...@re...> - 2004-02-13 17:10:00
|
On Fri, 13 Feb 2004 08:46:53 -0600
"James R. Leu" <jl...@mi...> wrote:

> I just wanted to get David's take on using XFRM for the Layer 3 to MPLS
> mapping, which would utilize dst stacking.
>
> Is XFRM capable of doing this? Any pointers as to where to start?

XFRM wants to work with protocol stacking at the protocol level (i.e. things
within ipv4, or ipv6).

We could tweak it to do this, but I advise against this initially because
this way we can stick the MPLS stack more simply into 2.4.x if we wanted
to (and I certainly might want to do that).

After we're done, and have done a 2.4.x backport if desired, we can look into
using XFRM. But I don't advise this now. |
From: James R. L. <jl...@mi...> - 2004-02-13 17:21:13
|
Thanks for the feedback. We'll leave you alone now :-)

On Fri, Feb 13, 2004 at 09:07:53AM -0800, David S. Miller wrote:
> On Fri, 13 Feb 2004 08:46:53 -0600
> "James R. Leu" <jl...@mi...> wrote:
>
> > I just wanted to get David's take on using XFRM for the Layer 3 to MPLS
> > mapping, which would utilize dst stacking.
> >
> > Is XFRM capable of doing this? Any pointers as to where to start?
>
> XFRM wants to work with protocol stacking at the protocol level (i.e. things
> within ipv4, or ipv6).
>
> We could tweak it to do this, but I advise against this initially because
> this way we can stick the MPLS stack more simply into 2.4.x if we wanted
> to (and I certainly might want to do that).
>
> After we're done, and have done a 2.4.x backport if desired, we can look into
> using XFRM. But I don't advise this now.

--
James R. Leu
jl...@mi... |