Re: [mpls-linux-general] Better to use nfmark vs tc_index?
From: Olivier D. <Oli...@rd...> - 2001-11-30 16:14:06
Hi Jim,

James R. Leu wrote:
> Comments within ...
>
> On Thu, Nov 29, 2001 at 05:32:17PM +0100, Olivier Dugeon wrote:
>
>> Hi Jim,
>>
>> James R. Leu wrote:
>>
>>> After looking at iptables a bit more I see that it can set an nfmark via
>>> the MARK rule. Should I use this as opposed to tc_index to influence the
>>> LSP and EXP? (note that DSCP will still be an option)
>>
>> Look at our patch. We have posted a full one and a small one. The small
>> one uses nfmark and stays very close to the stock kernel. The only reason
>> we developed a similar approach with mpls_index is the ability to use
>> nfmark-based routing and mpls_index iptables classification at the same
>> time:
>>
>> Our small patch, which uses nfmark as the MPLS key, always overrides the
>> nfmark route selection. With iptables you can mark a packet, i.e. set the
>> nfmark field in the skbuff, and then use that nfmark to refine routing.
>> In the ip_route_input() routine (net/ipv4/route.c), only the destination
>> address, source address, interface number (input or output) and TOS field
>> are used to compute the route hash table key. Look at the
>> CONFIG_IP_ROUTE_FWMARK flag (lines 1686 to 1688) and you can see that
>> nfmark can be used to extend this key calculation. We have mimicked this
>> for the mpls_index mark. So with the full patch you can use mpls_index
>> and/or nfmark to extend the route hash table key computation.
>
> So you're saying that by using nfmark for MPLS we'd be overloading nfmark
> and wouldn't be able to do specialized route lookups (with nfmark) and
> have nfmark choose the LSP. I guess I understand that. So we need
> something like MARK (MPLS) that stores the value on the skb (mpls_index).
> I might agree with that, but the details of what it stores in the skb are
> still unclear to me. (as in, I need to think about it more)

mpls_index in the original patch (up to v0.3) stored the RADIX_TREE index.
From v0.4 on we store the label instead. This is more user friendly: there
is no longer any need to retrieve the RADIX_TREE key from
/proc/net/mpls_xxx. From the label we recompute the RADIX_TREE key in the
rt_set_next_hop routine (net/ipv4/route.c), so we are able to retrieve the
moi (MPLS output info) from the RADIX_TREE and set up the ops_data field of
the route key. This is executed only once per flow. After rt_set_next_hop,
the ip_route_input_slow routine (net/ipv4/route.c) finishes computing the
route hash key. The moi is stored in that structure, and subsequent packets
are processed directly. The mpls_index is used like the nfmark to set up
distinct route hash keys and so distinguish differently labelled packets
coming from, or going to, the same IP address.

To convince Steven: the expensive MPLS work is done only once per flow.
Activate the debug output and look at the trace; you can see that the MPLS
part of rt_set_next_hop is called only once per flow. I ran some tests. A
ping without MPLS between two nodes takes around 80 microseconds. With
MPLS + iptables + TC, the first ping packet takes around 150-180
microseconds and the subsequent ones around 120-130 microseconds.

As you noted in your previous mail, I don't want to bypass the IPv4 stack.
I think that would be a bad thing, and we can't do it anyway because in
MPLS the box is first of all a router. It must process the packet as a
normal router does; only at the end do we decide to label the packet. Both
your FIB approach and my iptables approach respect this.
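To make the key extension concrete, here is a minimal sketch (not the
actual patch code) of how the 2.4 route cache key can carry the extra
marks. The fwmark field under CONFIG_IP_ROUTE_FWMARK is what the stock
kernel does; the mpls_index field and the CONFIG_IP_ROUTE_MPLS flag are
illustrative stand-ins for what the patch adds:

    /* Sketch of the rt_key used for route cache lookups in
     * net/ipv4/route.c (2.4 kernels); two flows that differ only in
     * fwmark or mpls_index get distinct cache entries, so each can be
     * bound to a different LSP. */
    struct rt_key {
            __u32   dst;            /* destination address */
            __u32   src;            /* source address */
            int     iif;            /* input interface */
            int     oif;            /* output interface */
            __u8    tos;            /* type of service */
    #ifdef CONFIG_IP_ROUTE_FWMARK
            __u32   fwmark;         /* nfmark copied from the skbuff */
    #endif
    #ifdef CONFIG_IP_ROUTE_MPLS     /* hypothetical flag name */
            __u32   mpls_index;     /* label set by the iptables rule */
    #endif
    };

    /* Cache hit test in the spirit of ip_route_input(): the extra
     * fields simply join the comparison (and the hash computation). */
    static inline int rt_key_match(const struct rt_key *a,
                                   const struct rt_key *b)
    {
            return  a->dst == b->dst && a->src == b->src &&
                    a->iif == b->iif && a->tos == b->tos
    #ifdef CONFIG_IP_ROUTE_FWMARK
                    && a->fwmark == b->fwmark
    #endif
    #ifdef CONFIG_IP_ROUTE_MPLS
                    && a->mpls_index == b->mpls_index
    #endif
                    ;
    }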
>>> I think I now understand what Olivier did, he created something similar
>>> to MARK but for MPLS. If we are going to continue to use that I would
>>> like to change it a little. Instead of storing the mpls_index, I think
>>> it should build a dst and store it with the rule. This dst will direct
>>> the skb to mpls_output() and will have the outgoing label info attached.
>>> When it gets to mpls_output() MPLS processing will occur like normal.
>>> The dst will be slapped on to any packet that matches the rule.
>>>
>>> Do you think that by using nfmark we can accomplish the same thing?
>>> We would have to rely upon another means of getting data to
>>> mpls_output(), like an MPLS tunnel interface or an entry in the FIB
>>> that has been marked for MPLS. Once it gets to mpls_output() the nfmark
>>> could be used to influence the LSP or EXP.
>>>
>>> Of course maybe it's just safer to have both options available :-)
>>>
>>> Now to the matter of tc_index. It seems that nfmark can be used by a
>>> scheduling classifier, but it looks like the classifier for tc_index is
>>> better. So it might be that nfmark (or an MPLS mark) is used to
>>> influence LSP and EXP decisions (note that DSCP will still be an
>>> option) and that tc_index is used to influence scheduling.
>>
>> Actually for the TC part, we use mpls_index. My latest patch (not
>> published yet) uses mpls_index when it is enabled in the kernel config,
>> and the label directly otherwise. We can copy this index into tc_index;
>> the TC mpls classifier has been written for this purpose. Our original
>> plan was to use the u32 classifier, but there are two problems:
>>
>> 1/ the label is not accessible to the u32 classifier, which starts at
>> the IP header, not the shim header.
>>
>> 2/ why classify the packet again (CPU power ...) if it has already been
>> classified by iptables or another process? The shim header (more
>> precisely the label and/or EXP fields of the shim header) can be used
>> as a filter mark.
>
> Would using the tc_index sched classifier solve #1? As for number 2 you
> need both. iptables simply marks it, no scheduling is actually done. The
> sched classifier is merely trying to sort the marked packets into the
> appropriate "queues".
>
> Jim

--
FTR&D/DAC/CPN Technopole Anticipa | mailto:Oli...@fr...
2, Avenue Pierre Marzin           | Phone: +(33) 2 96 05 28 80
F-22307 LANNION                   | Fax:   +(33) 2 96 05 18 52
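The reuse Olivier describes in point 2/ (and that Jim's closing question is
about) can be pictured with a minimal sketch in the spirit of the tcindex
classifier. It assumes the mpls_index or EXP bits were already copied into
skb->tc_index on the output path; the function name mpls_mark_classify and
the 1:x class numbering are illustrative, not the actual classifier from
the patch:

    /* Sketch of a 2.4-style classify routine: no header parsing at
     * all, just a lookup on the mark that iptables/MPLS already left
     * in skb->tc_index.  (u32 could not see the shim header anyway,
     * since it starts matching at the IP header.) */
    static int mpls_mark_classify(struct sk_buff *skb,
                                  struct tcf_proto *tp,
                                  struct tcf_result *res)
    {
            __u16 exp = skb->tc_index & 0x7;   /* e.g. the 3 EXP bits */

            res->class   = 0;
            res->classid = TC_H_MAKE(0x10000, exp + 1); /* class 1:1..1:8 */
            return 0;                          /* match */
    }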