Thread: [mpls-linux-general] Looking for examples to test TC/DS/IPTABLE
From: James R. L. <jl...@mi...> - 2001-12-06 15:55:27
Hello,

I have (re)implemented my version of TC/DS enhanced mpls-linux. I would
like to describe what I have created so I can get feedback. Also
I would like to get examples from people of how to use TC and iptables
to actually exercise this new code and to show me what this implementation
cannot do.

The first thing to keep in mind is that 'outgoing info' can no longer be
interpreted as 'outgoing label'. A particular 'outgoing info' can fwd on to
another 'outgoing info' which may do a 'push'. 'incoming labels', aux_proto,
and mpls tunnels all point to 'outgoing info' (in addition 'outgoing info' can
point to other 'outgoing info'). I will refer to 'outgoing info' as
'MOI' (in the code it stands for the mpls_outgoing_info structure).

Incoming labels and MOIs have an array of 'instructions' associated
with them. Each instruction has a 'data block' associated with it.
The original set of instructions has changed very little:

MPLS_OP_POP  -> IN:  pop off top label (no data)
MPLS_OP_PEEK -> IN:  make the top label the active 'incoming label' (no data)
MPLS_OP_PUSH -> OUT: push a label on to the top of the label stack
                     (label to push)
MPLS_OP_DLV  -> IN:  deliver the packet to a specific protocol handler
                     (protocol id to send the packet to, i.e. IPv4, IPv6)
MPLS_OP_FWD  -> IN:  transfer control to mpls_output() (pointer to the MOI)
                OUT: start processing the instructions with the new MOI
                     (pointer to the new MOI)
MPLS_OP_SET  -> IN:  set the incoming interface
                OUT: set the dst_entry on the skb [last step before TXing an
                     MPLS packet] (pointer to the dst_entry)

These are the new instructions:

*nfmark comes from skb
#dsmark comes from IP header
*tc_index comes from the skb
*EXP comes from the active incoming label

MPLS_OP_NF_FWD  -> IN/OUT: index into the data block using (nfmark & mask),
                           start processing the MOI that was found.
                           (array of MOIs)
MPLS_OP_DS_FWD  -> IN:     index into the data block using (dsmark & mask),
                           start processing the MOI that was found.
                           (array of MOIs)
MPLS_OP_TC_FWD  -> OUT:    index into the data block using (tc_index & mask),
                           start processing the MOI that was found.
                           (array of MOIs)
MPLS_OP_EXP_FWD -> IN:     index into the data block using (EXP),
                           start processing the MOI that was found.
                           (array of MOIs)
MPLS_OP_SET_TC  -> IN/OUT: set the tc_index (tc_index to use)
MPLS_OP_SET_DS  -> IN/OUT: set the dsmark (DSCP to use)
MPLS_OP_SET_EXP -> IN/OUT: set EXP on the top label (EXP to use)
MPLS_OP_EXP2TC  -> IN:     index into the data block using (EXP) and set the
                           tc_index to the value found
MPLS_OP_EXP2DS  -> IN:     index into the data block using (EXP) and set the
                           dsmark to the value found

So here are some examples:

Egress LER:

On input of label 100, EXP 1 gets DSCP 0x4, EXP 4 gets DSCP 0x7

MII(100) -> PEEK POP EXP2DS(1->0x4,4->0x7) DLV(IPv4)

Ingress LER (DSCP):

Packets going to 11.0.0.0/16 go out with label 100; DSCP 0x4 gets EXP 1,
DSCP 0x7 gets EXP 4

IPROUTE(11.0.0.0/16) -> MOI(1000)
MOI(1000) DS_FWD(0x4->MOI(500), 0x7->MOI(2000))
MOI(500)  SET_EXP (1) PUSH(100) SET(next hop info)
MOI(2000) SET_EXP (4) PUSH(100) SET(next hop info)

IP routing transfers control to mpls_output, which starts processing MOI(1000).
MOI(1000) looks at the DSCP and starts processing either MOI(500) or
MOI(2000). MOI(500) and MOI(2000) set the EXP, push the label,
set the dst_entry and then send the packet.

(You could implement L-LSPs in a similar way: push a different label in
MOI(500) and MOI(2000) and do not set the EXP value.)

Alternative:

IPROUTE(11.0.0.0/16) -> MPLS_TUNNEL(mpls0)
mpls0 -> MOI(1000)
MOI(500)  SET_EXP (1) PUSH(100) SET(next hop info)
MOI(2000) SET_EXP (4) PUSH(100) SET(next hop info)

IP routing sends the packet out interface mpls0. Interface mpls0
transfers control to mpls_output, which starts processing MOI(1000).
MOI(1000) looks at the DSCP and starts processing either MOI(500) or
MOI(2000). MOI(500) and MOI(2000) set the EXP, push the label,
set the dst_entry and then send the packet.

Ingress LER with NFMARK and TCINDEX works similarly.

Transit:

INCOMING_LABEL(100) PEEK POP EXP2TC(1->0xF,4->0xE) -> MOI(10000)
MOI(10000) PUSH(100) SET(next hop info)

Incoming label 100 looks at the EXP bits and sets tc_index to 0xF when
EXP is 1 and to 0xE when EXP is 4. MOI(10000) is responsible for
transmitting the packet. It pushes on label 100 (with the same EXP bits)
and sends it on its way. As it leaves via the physical interface, a packet
scheduler can look at the tc_index and schedule it appropriately.

If you want to translate the EXP then you could use an EXP forward
to different MOIs that push on the same label, but set different EXP bits.

Additional instructions?

MPLS_OP_TC2EXP -> could be used in a MOI to translate the tc_index set on
                  input to different EXP values. This would avoid having
                  to do an EXP FWD just to set different EXP values.

MPLS_OP_DS2EXP -> same as above, but would look at the DSCP in the IP header
                  and could only be executed on packets that came directly
                  from the IP layer. It would avoid having to have
                  separate MOIs to implement E-LSPs.

Comments, questions, political statements?

Jim
-- 
James R. Leu
jl...@mi...
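The ingress LER (DSCP) example above can be sketched as a small instruction interpreter. This is only an illustrative model, not the mpls-linux code: every type and function name here (struct instr, struct packet, moi_run, the OP_* values) is hypothetical, the dst_entry is reduced to a number, and it assumes that SET_EXP stores an EXP value that the following PUSH writes into the new label.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the MOI instruction chain.
 * None of these names come from the mpls-linux source. */
enum op { OP_SET_EXP, OP_PUSH, OP_SET, OP_DS_FWD };

struct moi;

struct instr {
    enum op op;
    uint32_t value;                  /* EXP for SET_EXP, label for PUSH, next hop id for SET */
    uint8_t mask;                    /* DS_FWD only: mask applied to the DSCP */
    const struct moi *const *table;  /* DS_FWD only: MOIs indexed by (dscp & mask) */
};

struct moi {
    const struct instr *instrs;
    size_t count;
};

struct packet {
    uint8_t dscp;          /* taken from the IP header */
    uint8_t exp;           /* EXP applied to the next pushed label (assumption) */
    uint32_t label[4];     /* label stack, top is label[depth - 1] */
    uint8_t label_exp[4];  /* EXP bits stored with each label */
    size_t depth;
    uint32_t next_hop;     /* stands in for the dst_entry */
};

/* Run the instructions of a MOI. A DS_FWD restarts processing with the MOI
 * selected by the DSCP. Returns 0 once SET is reached, -1 on any error. */
int moi_run(const struct moi *m, struct packet *p)
{
    for (int hops = 0; m != NULL && hops < 8; hops++) {  /* bound the FWD chain */
        const struct moi *next = NULL;

        for (size_t i = 0; i < m->count && next == NULL; i++) {
            const struct instr *in = &m->instrs[i];

            switch (in->op) {
            case OP_SET_EXP:
                p->exp = (uint8_t)(in->value & 0x7);
                break;
            case OP_PUSH:
                if (p->depth == 4)
                    return -1;            /* label stack full */
                p->label[p->depth] = in->value;
                p->label_exp[p->depth] = p->exp;
                p->depth++;
                break;
            case OP_SET:
                p->next_hop = in->value;
                return 0;                 /* last step before transmit */
            case OP_DS_FWD:
                next = in->table[p->dscp & in->mask];
                if (next == NULL)
                    return -1;            /* no MOI configured for this DSCP */
                break;
            }
        }
        m = next;
    }
    return -1;  /* ran out of instructions without reaching SET */
}
```

With a DS_FWD table of eight entries (mask 0x7) holding MOI(500) at index 0x4 and MOI(2000) at index 0x7, a packet with DSCP 0x4 would leave with label 100 and EXP 1, and a packet with DSCP 0x7 with label 100 and EXP 4. What should happen to a DSCP with no table entry is not stated in the message; returning an error here is a placeholder choice.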
From: Olivier D. <Oli...@rd...> - 2001-12-11 14:34:38
Hi Jim,

Can you describe the MPLS_OP_NF_FWD action a little more? If I understand
correctly, the steps are:

- mark the packet with iptables
- use MPLS_OP_NF_FWD to retrieve the MOI info (and so the outgoing
  label) from the nfmark value
- possibly use nfmark as a filter criterion for TC as usual

It looks good to me because it makes only a very small change in the
kernel, and there is no need to keep an mpls_index in sync when the nfmark
changes.

One point of clarification about using nfmark instead of mpls_index, for
the reason given previously (incompatibility with the nfmark-only routing
stuff): do you have a different way to use this action that lets the normal
nfmark behaviour work as usual? I suppose mplsadm needs to be modified to
set up this action with the right nfmark value? If so, I suppose the normal
and MPLS uses of nfmark can coexist. The difference comes from the nfmark
value: only values set up with mplsadm result in MPLS processing.

Regards,

Olivier
-- 
FTR&D/DAC/CPN
Technopole Anticipa     | mailto:Oli...@fr...
2, Avenue Pierre Marzin | Phone: +(33) 2 96 05 28 80
F-22307 LANNION         | Fax: +(33) 2 96 05 18 52
From: James R. L. <jl...@mi...> - 2001-12-18 14:51:29
I see the light now.....

Last night I dug around the kernel and I now see why you think that
netfilter is the best way to interact with the ipv4 routing table.
I will look more at your work and see how I can make it less MPLS specific.

Right now I think I am going to add a new netlink, POST_ROUTING_SLOW.
This will allow a netlink to modify the route cache entry. The result will
be that the netlink code for this will only be run for the first packet
in the "flow". The rest will hit the entry in the route cache and
will be redirected to the MPLS layer.

Will this satisfy your hope of avoiding double lookups (for every packet
except the first)?

Jim

On Tue, Dec 11, 2001 at 01:52:39PM +0100, Olivier Dugeon wrote:
> hi Jim,
>
> Can you describe a little more the MPLS_OP_NF_FWD action ?

It doesn't work, because I cannot bind an LSP to a route in the mangle table.

Jim

-- 
James R. Leu
jl...@mi...
From: Olivier D. <Oli...@rd...> - 2001-12-18 23:03:42
Hi Jim,

James R. Leu wrote:
> I see the light now.....
>
> Last night I dug around the kernel and I now see why you think that
> netfilter is the best way for interacting with the ipv4 routing table.
> I will look more at your work and see how I can make it less MPLS specific.
>
> Right now I think I am going to add a new netlink. POST_ROUTING_SLOW.
> This will allow a netlink to modify the route cache entry. The result will
> be that the netlink code for this will only be run for the first packet
> in the "flow". The rest will hit the entry in the route cache and
> will be redirected to the MPLS layer.

OK, I think that is a good way to set up the processing of the first packet.
It is what we do by hacking the rt_set_nexthop code. This function is called
only for the first packet of a flow (e.g. the first ping packet). After that,
the entry is kept in the route cache for a while (so stopping a ping and
starting a second one some time later does not call rt_set_nexthop again,
because the route cache entry is still valid).

What do you intend to do with this new POST_ROUTING_SLOW netlink?

- directly set up the route cache entry, or
- set up mpls_index (or nfmark)?

> Will this satisfy your hope of avoiding double lookups (for every packet
> except the first)?

Yes.

Olivier
-- 
FTR&D/DAC/CPN
Technopole Anticipa     | mailto:Oli...@fr...
2, Avenue Pierre Marzin | Phone: +(33) 2 96 05 28 80
F-22307 LANNION         | Fax: +(33) 2 96 05 18 52
From: James R. L. <jl...@mi...> - 2001-12-18 23:02:00
Comments below ....

On Tue, Dec 18, 2001 at 04:21:48PM +0100, Olivier Dugeon wrote:
> What do you intend to do with this new POST_ROUTING_SLOW netlink ?

- change the pmtu
- use some index on the skb to look up the moi and attach it to the route
  cache entry
- change rt->dst.output to be mpls_output

This will require that every time a moi is deleted, the route cache must
be cleared. :-(

What do you think of this?

What should be done with an skb that comes through POST_ROUTING_SLOW when
a moi cannot be found?

1. I could drop it. Then a route cache entry will not be created, and the
   next time a packet for this flow is seen it will be slow routed again,
   giving another chance for a valid moi to be found.

2. I could let it pass and go via IP. Subsequent packets for this flow
   will then be fast routed, and thus not given a chance to find a valid
   moi. To get around this, every time a new moi is created the route cache
   could be cleared, thus forcing packets to go through the slow path again.

What do you suggest?

Jim

> - directly setup the route cache entry or
> - setup mpls_index (or nfmark) ?

-- 
James R. Leu
jl...@mi...
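The trade-off in option 2 can be illustrated with a toy route cache. This is a sketch under stated assumptions, not kernel code: struct router, route_packet, moi_set and the table sizes are all hypothetical, the "index on the skb" is passed in as a plain number, and the cache is a trivial direct-mapped array rather than the real route cache.

```c
#include <stdint.h>

#define CACHE_SLOTS 8u   /* hypothetical sizes, chosen only for the sketch */
#define MOI_SLOTS   16u

/* Which output function a cached route would use. */
enum output { OUT_IP, OUT_MPLS };

struct cache_entry {
    uint32_t dst;
    enum output out;
    int valid;
};

struct router {
    struct cache_entry cache[CACHE_SLOTS];
    uint8_t moi_present[MOI_SLOTS];  /* 1 if a MOI exists for that index */
    unsigned slow_path_runs;         /* how often the slow path was taken */
};

static void cache_flush(struct router *r)
{
    for (unsigned i = 0; i < CACHE_SLOTS; i++)
        r->cache[i].valid = 0;
}

/* Creating or deleting a MOI clears the cache, so every flow goes through
 * the slow path once more and picks up the change. */
void moi_set(struct router *r, unsigned index, int present)
{
    r->moi_present[index % MOI_SLOTS] = present ? 1 : 0;
    cache_flush(r);
}

/* Option 2: when no MOI is found on the slow path, the packet goes out via
 * plain IP and that decision is cached for the rest of the flow. */
enum output route_packet(struct router *r, uint32_t dst, unsigned index)
{
    struct cache_entry *e = &r->cache[dst % CACHE_SLOTS];

    if (e->valid && e->dst == dst)
        return e->out;               /* fast path: cached decision */

    r->slow_path_runs++;             /* slow path: first packet of the flow */
    e->dst = dst;
    e->out = r->moi_present[index % MOI_SLOTS] ? OUT_MPLS : OUT_IP;
    e->valid = 1;
    return e->out;
}
```

In this model a flow that starts before its MOI exists stays on plain IP until moi_set() flushes the cache, which is the cost described above: each MOI change sends every flow through the slow path once more.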
From: Olivier D. <Oli...@rd...> - 2001-12-19 07:51:56
Hi Jim,

Comments below ...

James R. Leu wrote:
> -change the pmtu
> -use the some index on the skb to lookup the moi, attach it to the route cache
>  entry
> -change rt->dst.output to be mpls_output

OK, fine.

> This will require that everytime a moi is deleted, the route cache must
> be cleared. :-(

In our original code, the moi is not deleted. The moi was set up with
mplsadm but no route entry was added (i.e. no FEC was given to mplsadm and
dst_aux_proto was not set).

The packet goes to the ip_route_input function first. In the case of the
first packet, no fast route is found (i.e. there is no route cache entry
corresponding to this flow). So ip_route_input calls the
ip_route_input_slow function, which calls rt_set_nexthop, where the MPLS
processing resides. After rt_set_nexthop, ip_route_input_slow finishes
computing a new route cache entry and stores it in the route cache table.
The subsequent packets are then processed only by ip_route_input, because a
valid route cache entry is found. So I think the route cache entry is not
cleared; only a new one is created.

> What do you think of this?
>
> What should be done with skb that come through POST_ROUTING_SLOW but
> a moi could not be found?
>
> 1. I could drop it, then a route cache entry will not be created and
> next time a packet for this flow is seen it will be slow routed again
> giving another chance for a valid moi to be found.
>
> 2. I could let it pass and let it go via IP. Also subsequent packets for
> this flow will be fast routed, thus not given a chance to find valid moi.
> To get around this everytime a new moi is created the route cache could
> be cleared, thuis forcing packets to go through the slow path again.
>
> What do you suggest?

2. If a moi cannot be found for the first packet, it will not be found for
the subsequent packets either. So the better choice is to process this
packet as usual, i.e. normal IP routing without MPLS processing. But don't
drop it and don't clear the route cache entry. I think that if the first
packet doesn't activate the MPLS processing, that is because it is not
meant to. If it failed because the values are not correct (mplsadm,
iptables ... misconfiguration), it is an operator problem, and reporting it
with an MPLS_DEBUG output is sufficient. I would like to add the iptables
and tc configuration to mplsadm to avoid such a situation, but I haven't
found the time to do it.

I think POST_ROUTING_SLOW must be set up if you want to use another value
(i.e. nfmark or mpls_index) to retrieve the moi from this skb entry. I
don't know whether it needs to be dynamic as you suggest, or just a compile
flag for the kernel like the other CONFIG_xxx options.

Your previous post about MPLS_OP_NF_FWD seems better. The kernel code
(inside rt_set_nexthop) can use an mpls mark coming from the skb to
retrieve the moi information. This mpls mark can be set up by whatever you
want (iptables, ipchains, tc ...). The purpose of MPLS_OP_NF_FWD is just
for mplsadm to set up a moi entry in the RADIX_TREE which corresponds to
this mpls mark (a label value is user friendly). So the normal processing
becomes (pseudo code):

    if (mpls_mark)
        moi = retrieve_moi_from_mpls_mark(mpls_mark);
    else
        moi = proto_data[AUX_PROTO_DATA_MPLS];

Now, a better one is to merge the two approaches and only use mpls_mark.
This mpls_mark can be set up by iptables, tc or something else, AND it can
be set up by the normal MPLS stuff based only on the IP dst address. So the
process becomes:

    moi = retrieve_moi(mpls_mark ? mpls_mark : proto_data[AUX_PROTO_DATA_MPLS]);

In that case proto_data doesn't contain a pointer to the moi, but the key
of the corresponding entry (i.e. the label value).

What do you think of this?

Olivier
-- 
FTR&D/DAC/CPN
Technopole Anticipa     | mailto:Oli...@fr...
2, Avenue Pierre Marzin | Phone: +(33) 2 96 05 28 80
F-22307 LANNION         | Fax: +(33) 2 96 05 18 52
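The merged lookup proposed above could look roughly like the following. It is a minimal sketch with assumptions spelled out: struct moi, struct moi_table, retrieve_moi and select_moi are hypothetical stand-ins, a linear array replaces the radix tree, a mark of 0 is taken to mean "no mark set", and route_key plays the role of the value stored in proto_data[AUX_PROTO_DATA_MPLS].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mpls_outgoing_info, reduced to a key and the
 * label it would push. */
struct moi {
    uint32_t key;
    uint32_t out_label;
};

/* A linear table stands in for the radix tree mentioned in the message. */
struct moi_table {
    const struct moi *entries;
    size_t count;
};

/* Look up a MOI by key; NULL if none is configured. */
const struct moi *retrieve_moi(const struct moi_table *t, uint32_t key)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].key == key)
            return &t->entries[i];
    return NULL;
}

/* Single lookup path: a non-zero mark set by iptables/tc wins, otherwise
 * the key attached to the route by the destination-based MPLS setup is
 * used. Assumption: mark 0 means "no mark". */
const struct moi *select_moi(const struct moi_table *t,
                             uint32_t mpls_mark, uint32_t route_key)
{
    return retrieve_moi(t, mpls_mark ? mpls_mark : route_key);
}
```

Note one consequence of this form: a mark that is set but has no matching MOI yields NULL rather than falling back to the route key, so the caller would hand the packet to plain IP routing, in line with the preference for option 2 above.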