Thread: [mpls-linux-general] Better to use nfmark vs tc_index?
From: James R. L. <jl...@mi...> - 2001-11-29 04:20:45
After looking at iptables a bit more I see that it can set an nfmark via the MARK rule. Should I use this as opposed to tc_index to influence the LSP and EXP? (Note that DSCP will still be an option.)

I think I now understand what Olivier did: he created something similar to MARK but for MPLS. If we are going to continue to use that, I would like to change it a little. Instead of storing the mpls_index, I think it should build a dst and store it with the rule. This dst will direct the skb to mpls_output() and will have the outgoing label info attached. When it gets to mpls_output(), MPLS processing will occur like normal. The dst will be slapped onto any packet that matches the rule.

Do you think that by using nfmark we can accomplish the same thing? We would have to rely upon another means of getting data to mpls_output(), like an MPLS tunnel interface or an entry in the FIB that has been marked for MPLS. Once it gets to mpls_output(), the nfmark could be used to influence the LSP or EXP.

Of course, maybe it's just safer to have both options available :-)

Now to the matter of tc_index. It seems that nfmark can be used by a scheduling classifier, but it looks like the classifier for tc_index is better. So it might be that nfmark (or an MPLS mark) is used to influence LSP and EXP decisions (note that DSCP will still be an option) and that tc_index is used to influence scheduling.

Does that make any sense? It's late. I'm going to sleep and think about it some more.

Jim

--
James R. Leu
jl...@mi...
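For readers following along, the idea of letting a mark pick the LSP and EXP can be sketched roughly as below. This is only an illustrative user-space model, not code from the mpls-linux patches: the LabelChoice type, the MARK_TO_LSP table and the label numbers are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelChoice:
    """Outgoing label and EXP bits picked for a packet (hypothetical type)."""
    label: int  # 20-bit MPLS label
    exp: int    # 3-bit EXP field


# Hypothetical policy table: nfmark value -> outgoing label and EXP.
MARK_TO_LSP = {
    1: LabelChoice(label=1001, exp=5),
    2: LabelChoice(label=1002, exp=0),
}

# Used when the packet carries no mark, or a mark with no table entry.
DEFAULT_CHOICE = LabelChoice(label=1000, exp=0)


def choose_lsp(nfmark: Optional[int]) -> LabelChoice:
    """Pick the LSP and EXP for a packet from its netfilter mark."""
    if nfmark is None:
        return DEFAULT_CHOICE
    return MARK_TO_LSP.get(nfmark, DEFAULT_CHOICE)
```

In this model the mark is set earlier (by something like a MARK rule) and is only consulted at output time, which matches the "mark first, decide at mpls_output()" split discussed above.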
From: Steven V. d. B. <ste...@in...> - 2001-11-29 07:53:14
On Thu, 2001-11-29 at 05:20, James R. Leu wrote:
> After looking at iptables a bit more I see that it can set an nfmark via
> the MARK rule. Should I use this as opposed to tc_index to influence the
> LSP and EXP? (note that DSCP will still be an option)

Would be a clean choice (note: you'll still need part of Olivier's patch to implement FEC-like behavior per iptables result).

> I think I now understand what Olivier did, he created something similar
> to MARK but for MPLS. If we are going to continue to use that I would
> like to change it a little. Instead of storing the mpls_index, I think it
> should build a dst and store it with the rule. This dst will direct the
> skb to mpls_output() and will have the outgoing label info attached.
> When it gets to mpls_output() MPLS processing will occur like normal.
> The dst will be slapped on to any packet that matches the rule.

Iptables has hooks in several places along the forwarding path: before, in, or after the "routing". Is there no possibility that dev will be overwritten by the IP code? I still feel we should be able to bypass routing completely (I'm looking into what can be done to handle fragmentation etc.).

> Do you think that by using nfmark we can accomplish the same thing?
> We would have to rely upon another means of getting data to mpls_output()
> like an MPLS tunnel interface or an entry in the FIB that has been marked
> for MPLS. Once it gets to mpls_output() the nfmark could be used to influence
> the LSP or EXP.

First guess (warning: statistically proven to be mostly wrong): possible approach.

> Of course maybe it's just safer to have both options available :-)
>
> Now to the matter of tc_index. It seems that nfmark can be used by a
> scheduling classifier, but it looks like the classifier for tc_index is
> better. So it might be that nfmark (or an MPLS mark) is used to influence
> LSP and EXP decisions (note that DSCP will still be an option) and that
> tc_index is used to influence scheduling.

tc_index is indeed no "must". There are other solutions possible to get the same effect, especially at the ingress (in the core, we don't want to use iptables for anything). The only things the sch_dsmark + tc_index combination from the original DiffServ for IP offers you are:

1 you can read the (IP) DSCP and put it in tc_index (enqueue operation)
2 you can classify based on tc_index
3 you can modify the value of tc_index during traffic control
4 you can rewrite the DSCP in the IP header based on tc_index when exiting sch_dsmark (dequeue operation)

1 and 2 can be done using iptables and the fw_mark classifier.
3 and 4 need more study. At first sight, I'd say we can do this either in ingress policing or by adding it to iptables (configuration would be something like: if offered load <1Mbps set fwmark 2, if offered load <2Mbps set fwmark 1, else set fwmark 0).

Just a personal opinion: I would keep the sch_dsmark approach in the LSR and egress, but for ingress (where the more complex TC functions should be performed, to achieve scalability), forgetting about sch_dsmark and tc_index would be just fine.

> Does that make any sense? It's late. I'm going to sleep and think about
> it some more.

why, i've just woken up :)

cheers,
Steven

--
Steven Van den Berghe
ste...@in...
Workgroup Broadband Communication Networks
Department Information Technology
Ghent University - Belgium
Phone: +32 (0)9 267 35 86 | Fax : +32 (0)9 267 35 99
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
DiffServ over MPLS for Linux: http://dsmpls.atlantis.rug.ac.be
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
A computer is like an Old Testament god, with a lot of
rules and no mercy. - Joseph Campbell
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
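The load-based marking and the enqueue/dequeue steps listed above can be modelled in a few lines. This is a simplified sketch, not the sch_dsmark source: the function names are made up, the enqueue step is reduced to "take the upper six bits of the DS field", and the dequeue step is modelled as a mask-and-value rewrite of the DS byte.

```python
def mark_for_load(offered_bps: float) -> int:
    """Map offered load to a fwmark using the thresholds from the message."""
    if offered_bps < 1_000_000:   # below 1 Mbps
        return 2
    if offered_bps < 2_000_000:   # below 2 Mbps
        return 1
    return 0


def dsmark_enqueue(ds_field: int) -> int:
    """Step 1 (simplified): derive a tc_index from the DSCP.

    The DSCP is the upper six bits of the one-byte DS field.
    """
    return (ds_field >> 2) & 0x3F


def dsmark_dequeue(ds_field: int, mask: int, value: int) -> int:
    """Step 4 (simplified): rewrite the DS field when leaving the qdisc.

    Bits selected by `mask` are kept from the old field, and `value`
    is OR-ed in on top of them.
    """
    return ((ds_field & mask) | value) & 0xFF
```

For example, a packet arriving with DS field 0xB8 (DSCP 46, EF) would get tc_index 46 in this model, and a best-effort packet could be re-marked to EF on the way out with mask 0x03 and value 0xB8.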
From: Olivier D. <Oli...@rd...> - 2001-11-29 22:25:26
Hi Jim,

James R. Leu wrote:
> After looking at iptables a bit more I see that it can set an nfmark via
> the MARK rule. Should I use this as opposed to tc_index to influence the
> LSP and EXP? (note that DSCP will still be an option)

Look at our patch. We have posted a full one and a small one. The small one uses nfmark and is very close to the actual kernel. The only reason we developed a similar approach with mpls_index is the ability to use both nfmark routing and mpls_index iptables classification: our small patch, which uses nfmark as the MPLS key, always overrides the nfmark route selection.

With iptables, you can mark a packet (i.e. set the nfmark field in the skbuff) and use this nfmark to enhance routing. In the ip_route_input() routine (net/ipv4/route.c), only the IP destination address, IP source address, interface number (input or output) and tos field are used to compute the route hash table key. Look at the CONFIG_IP_ROUTE_FWMARK flag (lines 1686 to 1688) and you can see that nfmark can be used to enhance this key calculation. We have mimicked this for the mpls_index mark. So with the full patch you can use mpls_index and/or nfmark as an enhancement for the route hash table key computation.

> I think I now understand what Olivier did, he created something similar
> to MARK but for MPLS. If we are going to continue to use that I would
> like to change it a little. Instead of storing the mpls_index, I think it
> should build a dst and store it with the rule. This dst will direct the
> skb to mpls_output() and will have the outgoing label info attached.
> When it gets to mpls_output() MPLS processing will occur like normal.
> The dst will be slapped on to any packet that matches the rule.
>
> Do you think that by using nfmark we can accomplish the same thing?
> We would have to rely upon another means of getting data to mpls_output()
> like an MPLS tunnel interface or an entry in the FIB that has been marked
> for MPLS. Once it gets to mpls_output() the nfmark could be used to influence
> the LSP or EXP.
>
> Of course maybe it's just safer to have both options available :-)
>
> Now to the matter of tc_index. It seems that nfmark can be used by a
> scheduling classifier, but it looks like the classifier for tc_index is
> better. So it might be that nfmark (or an MPLS mark) is used to influence
> LSP and EXP decisions (note that DSCP will still be an option) and that
> tc_index is used to influence scheduling.

Actually, for the TC part, we use mpls_index. My latest patch (not published yet) uses mpls_index when it has been configured in the kernel config, and the label directly otherwise. We can copy this index into tc_index. The TC mpls classifier has been written for this purpose. Originally we wanted to use the u32 classifier, but there are two problems:

1/ The label is not accessible to the u32 classifier. It only starts at the IP header, not the shim header.

2/ Why classify the packet again (CPU power ...) if it has already been classified with iptables or another process? The shim header (formally, the label and/or the EXP fields of the shim header) can be used as a filter mark.

> Does that make any sense? It's late. I'm going to sleep and think about
> it some more.
>
> Jim

Hope this helps.

Olivier

--
FTR&D/DAC/CPN
Technopole Anticipa | mailto:Oli...@fr...
2, Avenue Pierre Marzin | Phone: +(33) 2 96 05 28 80
F-22307 LANNION | Fax: +(33) 2 96 05 18 52
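Since the label and EXP fields of the shim header come up as a possible filter mark, here is a small sketch of how a single 32-bit shim entry is laid out: 20 bits of label, 3 bits of EXP, 1 bottom-of-stack bit and 8 bits of TTL. The Shim type and the two helper names are made up for illustration and are not part of any patch discussed here.

```python
import struct
from typing import NamedTuple


class Shim(NamedTuple):
    """One decoded MPLS label stack entry (hypothetical helper type)."""
    label: int             # 20 bits
    exp: int               # 3 bits
    bottom_of_stack: bool  # 1 bit
    ttl: int               # 8 bits


def parse_shim(data: bytes) -> Shim:
    """Decode the first four bytes of `data` as a shim header entry."""
    if len(data) < 4:
        raise ValueError("a shim header entry is four bytes long")
    (word,) = struct.unpack("!I", data[:4])  # network byte order
    return Shim(
        label=word >> 12,
        exp=(word >> 9) & 0x7,
        bottom_of_stack=bool((word >> 8) & 0x1),
        ttl=word & 0xFF,
    )


def build_shim(label: int, exp: int, bottom_of_stack: bool, ttl: int) -> bytes:
    """Encode one shim header entry; the inverse of parse_shim()."""
    word = ((label & 0xFFFFF) << 12) | ((exp & 0x7) << 9) \
        | (int(bottom_of_stack) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)
```

A classifier that can see these four bytes can filter on label or EXP directly, which is the point of problem 1/ above: a classifier that only starts at the IP header never sees them.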
From: Steven V. d. B. <ste...@in...> - 2001-11-29 22:52:25
Hi Olivier, Jim,

On Thu, 2001-11-29 at 17:32, Olivier Dugeon wrote:
> Hi Jim,

<did some cutting here>

> > Now to the matter of tc_index. It seems that nfmark can be used by a
> > scheduling classifier, but it looks like the classifier for tc_index is
> > better. So it might be that nfmark (or an MPLS mark) is used to influence
> > LSP and EXP decisions (note that DSCP will still be an option) and that
> > tc_index is used to influence scheduling.
>
> Actually, for the TC part, we use mpls_index. My latest patch (not
> published yet) uses mpls_index when it has been configured in the
> kernel config, and the label directly otherwise. We can copy
> this index into tc_index. The TC mpls classifier has been written
> for this purpose. Originally we wanted to use the u32
> classifier, but there are two problems:
>
> 1/ The label is not accessible to the u32 classifier. It only starts at the
> IP header, not the shim header.
>
> 2/ Why classify the packet again (CPU power ...) if it has already been
> classified with iptables or another process? The shim header (formally,
> the label and/or the EXP fields of the shim header) can be used as a
> filter mark.

Agreed, there is no use in doing the same job twice. I think for TC, the bottom line is that we want to be flexible on the type of classifier to use (especially at the ingress): tc_index, fw_mark, ...

Cheers,
Steven
From: James R. L. <jl...@mi...> - 2001-11-30 04:15:32
Comments at the bottom:

On Thu, Nov 29, 2001 at 11:53:01PM +0100, Steven Van den Berghe wrote:
> Hi Olivier, Jim,
>
> On Thu, 2001-11-29 at 17:32, Olivier Dugeon wrote:
> > Hi Jim,
>
> <did some cutting here>
>
> > > Now to the matter of tc_index. It seems that nfmark can be used by a
> > > scheduling classifier, but it looks like the classifier for tc_index is
> > > better. So it might be that nfmark (or an MPLS mark) is used to influence
> > > LSP and EXP decisions (note that DSCP will still be an option) and that
> > > tc_index is used to influence scheduling.
> >
> > Actually, for the TC part, we use mpls_index. My latest patch (not
> > published yet) uses mpls_index when it has been configured in the
> > kernel config, and the label directly otherwise. We can copy
> > this index into tc_index. The TC mpls classifier has been written
> > for this purpose. Originally we wanted to use the u32
> > classifier, but there are two problems:
> >
> > 1/ The label is not accessible to the u32 classifier. It only starts at the
> > IP header, not the shim header.
> >
> > 2/ Why classify the packet again (CPU power ...) if it has already been
> > classified with iptables or another process? The shim header (formally,
> > the label and/or the EXP fields of the shim header) can be used as a
> > filter mark.
>
> Agreed, there is no use in doing the same job twice. I think for TC, the
> bottom line is that we want to be flexible on the type of classifier to
> use (especially at the ingress): tc_index, fw_mark, ...

It seems you and Olivier have something against the Linux IPv4 stack; you guys want to bypass it in any way possible :-) I'm going to hop up on my soap box and give you my opinion about this. Mind you, this is just opinion, and further education about iptables and TC may change my mind, but:

MPLS is not meant to take the place of IP routing; it is meant to augment and work with IP routing to do path selection up front, and not per packet. The same is true of traffic engineering and differentiated services: they mean nothing without traditional IPv4 routing. So to say "we want to bypass IP routing" is not a statement that is true to the definitions of MPLS or TE or DiffServ. We need to figure out the correct way to take what IP routing tells us, augment it, and then find the correct place to apply it.

Granted, I'm definitely not a TE or DiffServ expert, but I have learned the underlying mechanics of a number of (commercial) MPLS implementations. I tend to think that marking route entries and using MPLS tunnel interfaces are the only mechanisms you need to direct traffic into the MPLS stack. We do need additional info per packet, provided via iptables, but iptables cannot replace the IPv4 stack.

Anyway, I'll get off my soap box now.

Jim

--
James R. Leu
jl...@mi...
From: James R. L. <jl...@mi...> - 2001-11-30 03:42:19
Comments within ...

On Thu, Nov 29, 2001 at 05:32:17PM +0100, Olivier Dugeon wrote:
> Hi Jim,
>
> James R. Leu wrote:
>
> > After looking at iptables a bit more I see that it can set an nfmark via
> > the MARK rule. Should I use this as opposed to tc_index to influence the
> > LSP and EXP? (note that DSCP will still be an option)
>
> Look at our patch. We have posted a full one and a small one. The small one
> uses nfmark and is very close to the actual kernel. The only reason
> we developed a similar approach with mpls_index is the ability to
> use both nfmark routing and mpls_index iptables classification:
>
> Our small patch, which uses nfmark as the MPLS key, always overrides the
> nfmark route selection. With iptables, you can mark a packet (i.e. set the
> nfmark field in the skbuff) and use this nfmark to enhance routing. In
> the ip_route_input() routine (net/ipv4/route.c), only the IP destination
> address, IP source address, interface number (input or output) and tos
> field are used to compute the route hash table key. Look at the
> CONFIG_IP_ROUTE_FWMARK flag (lines 1686 to 1688) and you can see that
> nfmark can be used to enhance this key calculation. We have mimicked this
> for the mpls_index mark. So with the full patch you can use mpls_index
> and/or nfmark as an enhancement for the route hash table key computation.

So you're saying that by using nfmark for MPLS we'd be overloading nfmark, and wouldn't be able to both do specialized route lookups (with nfmark) and have nfmark choose the LSP. I guess I understand that. So we need something like MARK (for MPLS) that stores the value on the skb (mpls_index). I might agree with that, but the details of what it stores in the skb are still unclear to me (as in, I need to think about it more).

> > I think I now understand what Olivier did, he created something similar
> > to MARK but for MPLS. If we are going to continue to use that I would
> > like to change it a little. Instead of storing the mpls_index, I think it
> > should build a dst and store it with the rule. This dst will direct the
> > skb to mpls_output() and will have the outgoing label info attached.
> > When it gets to mpls_output() MPLS processing will occur like normal.
> > The dst will be slapped on to any packet that matches the rule.
> >
> > Do you think that by using nfmark we can accomplish the same thing?
> > We would have to rely upon another means of getting data to mpls_output()
> > like an MPLS tunnel interface or an entry in the FIB that has been marked
> > for MPLS. Once it gets to mpls_output() the nfmark could be used to influence
> > the LSP or EXP.
> >
> > Of course maybe it's just safer to have both options available :-)
> >
> > Now to the matter of tc_index. It seems that nfmark can be used by a
> > scheduling classifier, but it looks like the classifier for tc_index is
> > better. So it might be that nfmark (or an MPLS mark) is used to influence
> > LSP and EXP decisions (note that DSCP will still be an option) and that
> > tc_index is used to influence scheduling.
>
> Actually, for the TC part, we use mpls_index. My latest patch (not
> published yet) uses mpls_index when it has been configured in the
> kernel config, and the label directly otherwise. We can copy
> this index into tc_index. The TC mpls classifier has been written
> for this purpose. Originally we wanted to use the u32
> classifier, but there are two problems:
>
> 1/ The label is not accessible to the u32 classifier. It only starts at the
> IP header, not the shim header.
>
> 2/ Why classify the packet again (CPU power ...) if it has already been
> classified with iptables or another process? The shim header (formally,
> the label and/or the EXP fields of the shim header) can be used as a
> filter mark.

Would using the tc_index sched classifier solve #1? As for #2, you need both. iptables simply marks the packet; no scheduling is actually done. The sched classifier is merely trying to sort the marked packets into the appropriate "queues".

Jim

--
James R. Leu
jl...@mi...
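The split between "marking" and "sorting marked packets into queues" can be illustrated with a toy model. This is not how the kernel's classifiers are implemented; the MarkScheduler class, the mark_packet() rule and the port number used in it are assumptions made up for the example, and strict-priority dequeue is just one simple scheduling choice.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def mark_packet(packet: dict) -> dict:
    """Stand-in for a marking rule: tag traffic to port 5060 with mark 1."""
    marked = dict(packet)
    marked["mark"] = 1 if packet.get("dport") == 5060 else 0
    return marked


class MarkScheduler:
    """Sorts already-marked packets into bands; band 0 is served first."""

    def __init__(self, mark_to_band: Dict[int, int], bands: int) -> None:
        self.mark_to_band = mark_to_band
        self.queues: List[Deque[dict]] = [deque() for _ in range(bands)]

    def enqueue(self, packet: dict) -> int:
        # Unknown marks fall into the last (lowest priority) band.
        default_band = len(self.queues) - 1
        band = self.mark_to_band.get(packet.get("mark", 0), default_band)
        self.queues[band].append(packet)
        return band

    def dequeue(self) -> Optional[dict]:
        # Strict priority: always drain lower-numbered bands first.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None
```

The marking step never looks at queues and the scheduler never looks at ports, so the packet is classified once and the result is reused, which is the division of labour described above.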
From: Olivier D. <Oli...@rd...> - 2001-11-30 16:14:06
Hi Jim,

James R. Leu wrote:
> Comments within ...
>
> On Thu, Nov 29, 2001 at 05:32:17PM +0100, Olivier Dugeon wrote:
>
>>Hi Jim,
>>
>>James R. Leu wrote:
>>
>>>After looking at iptables a bit more I see that it can set an nfmark via
>>>the MARK rule. Should I use this as opposed to tc_index to influence the
>>>LSP and EXP? (note that DSCP will still be an option)
>>
>>Look at our patch. We have posted a full one and a small one. The small one
>>uses nfmark and is very close to the actual kernel. The only reason
>>we developed a similar approach with mpls_index is the ability to
>>use both nfmark routing and mpls_index iptables classification:
>>
>>Our small patch, which uses nfmark as the MPLS key, always overrides the
>>nfmark route selection. With iptables, you can mark a packet (i.e. set the
>>nfmark field in the skbuff) and use this nfmark to enhance routing. In
>>the ip_route_input() routine (net/ipv4/route.c), only the IP destination
>>address, IP source address, interface number (input or output) and tos
>>field are used to compute the route hash table key. Look at the
>>CONFIG_IP_ROUTE_FWMARK flag (lines 1686 to 1688) and you can see that
>>nfmark can be used to enhance this key calculation. We have mimicked this
>>for the mpls_index mark. So with the full patch you can use mpls_index
>>and/or nfmark as an enhancement for the route hash table key computation.
>
> So you're saying that by using nfmark for MPLS we'd be overloading nfmark,
> and wouldn't be able to both do specialized route lookups (with nfmark) and
> have nfmark choose the LSP. I guess I understand that. So we need something
> like MARK (for MPLS) that stores the value on the skb (mpls_index). I might
> agree with that, but the details of what it stores in the skb are still
> unclear to me (as in, I need to think about it more).

mpls_index in the original patch (up to v0.3) stores the RADIX_TREE index. From v0.4 onward we store the label. This is more user-friendly: we no longer need to retrieve the RADIX_TREE key from /proc/net/mpls_xxx. From the label we recompute the RADIX_TREE key in the rt_set_next_hop routine (net/ipv4/route.c). So we are able to retrieve the moi from the RADIX_TREE and set up the ops_data field of the route key. This is executed only once per flow. After rt_set_next_hop has run, the ip_route_input_slow routine (net/ipv4/route.c) finishes computing the route hash key. So the moi is stored in this structure, and the following packets are processed directly.

The mpls_index is used like the nfmark to set up different route hash keys and to distinguish different packet labellings coming from or going to the same IP address.

To convince Steven: the long MPLS processing is done only once per flow. Activate the debug and look at the trace. You can see that the MPLS part of rt_set_next_hop is called only once per flow. I ran some tests. A ping without MPLS between two nodes takes around 80 microseconds. With MPLS + iptables + TC, the first ping packet takes around 150-180 microseconds and the subsequent ones around 120-130 microseconds.

As you noted in your previous mail, I don't want to bypass the IPv4 stack. I think it's a bad thing, and we can't do it because in MPLS the box is first of all a router. So it must process the packet as a normal router would. It's only at the end that we decide to label the packet. You and I both respect this, with the FIB and the iptables stuff.

Olivier

--
FTR&D/DAC/CPN
Technopole Anticipa | mailto:Oli...@fr...
2, Avenue Pierre Marzin | Phone: +(33) 2 96 05 28 80
F-22307 LANNION | Fax: +(33) 2 96 05 18 52
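The "once per flow" behaviour described above is essentially a cache keyed on the route hash key, with the mark and mpls_index included in the key. Below is a rough user-space sketch of that idea; the FlowKey fields mirror the ones named in the thread, but the RouteCache class and its slow_lookup callback are invented for illustration and do not correspond to kernel structures.

```python
from typing import Any, Callable, Dict, NamedTuple


class FlowKey(NamedTuple):
    """Fields that feed the route cache key in this model."""
    dst: str
    src: str
    iif: int
    tos: int
    nfmark: int = 0
    mpls_index: int = 0


class RouteCache:
    """Runs the expensive lookup once per distinct key, then reuses it."""

    def __init__(self, slow_lookup: Callable[[FlowKey], Any]) -> None:
        self._slow_lookup = slow_lookup
        self._entries: Dict[FlowKey, Any] = {}

    def lookup(self, key: FlowKey) -> Any:
        try:
            return self._entries[key]          # fast path: later packets
        except KeyError:
            result = self._slow_lookup(key)    # slow path: first packet
            self._entries[key] = result
            return result
```

Because nfmark and mpls_index are separate fields of the key, two flows between the same pair of addresses can still end up with different cached entries, and the mark stays free for its ordinary routing use.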
From: James R. L. <jl...@mi...> - 2001-11-30 03:45:28
I want to make sure I've got your opinion correct:

On an ingress LER you think iptables -> nfmark is sufficient. On a transit LSR and egress LER tc_index is needed (to augment dsmark and scheduling).

In addition you would like the option of using iptables to do all marking/classifying AND bypass normal routing lookups, but it may be possible to use the existing mechanisms (aux_proto on a FIB entry or an MPLS tunnel interface) to achieve the same functionality, though not necessarily with the same level of efficiency.

Jim

--
James R. Leu
jl...@mi...
From: Steven V. d. B. <ste...@in...> - 2001-11-30 08:20:45
OK, it seems it's time for a change in terminology: when I say bypass IPv4 routing, I mean it in an engineering way, not a conceptual way.

The basic scheme we want from a Linux ingress box is:

+------------------+   +----------+   +--------------+
|MF Classification +-->+Forwarding+-->+get ip nexthop|
|Policing/Shaping  |   |selection ++  +------------+-+
+------------------+   +----------+ \               \
 (MF=multifield)       (MPLS or ip)  \  +----------+ \   +--+
                                      +>+push label+--+->+TC|
                                        +----------+     +--+
        (1)                         (2)                  (3)

E.g. one classification in phase (1), to be used wherever needed. Now what I would like (personal opinion) is to have to go through only one of the two phases in (2). So I don't want to harm the hybrid MPLS/IP approach; I just want to limit the amount of processing and interference with the IP path.

Now, about TC: in the core, normally (1) is absent, since no complex multi-field classifications are needed, just a DSCP lookup, which can be done in (3).

cheers,
Steven

PS: anybody feel like hacking together an ASCII-art drawing plugin for mail clients? Would be a great help :)
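The three-phase ingress scheme (classify once, pick exactly one forwarding path, then hand the result to TC) can be sketched as follows. This is only a model of the control flow; the ingress() function, its callback parameters and the class names used in the usage note are hypothetical.

```python
from typing import Any, Callable, Dict


def ingress(
    packet: Dict[str, Any],
    classify: Callable[[Dict[str, Any]], str],
    lsp_by_class: Dict[str, int],
    ip_next_hop: Callable[[str], str],
) -> Dict[str, Any]:
    """Run a packet through phases (1) and (2) and return what (3) needs."""
    # Phase (1): a single multi-field classification, reused afterwards.
    traffic_class = classify(packet)

    # Phase (2): take exactly one of the two branches, never both.
    label = lsp_by_class.get(traffic_class)
    if label is not None:
        forwarding = ("mpls", label)                    # push label
    else:
        forwarding = ("ip", ip_next_hop(packet["dst"]))  # plain IP next hop

    # Phase (3): TC can schedule on the class decided in phase (1).
    return {"class": traffic_class, "forwarding": forwarding}
```

For instance, with a classifier that returns "voice" for port 5060 and an lsp_by_class table of {"voice": 1001}, voice packets take the label branch while everything else falls through to the IP next-hop lookup.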