Thread: [mpls-linux-devel] Jamal's MPLS design document
From: James R. L. <jl...@mi...> - 2003-12-05 16:07:03
Here is the design doc that Jamal has created. Jamal: if there is a newer version, please post it in this thread.

-- 
James R. Leu
jl...@mi...

--------------------------------------------------------------------
1. Terminology:

LER (Label Edge Router): Router which sits at the edge of the IP(v4/6) and MPLS network.
LSR: Router/switch which sits inside the MPLS domain.
FEC: Forwarding Equivalence Class - This is like the "classid" concept we have in the QoS code. In QoS it essentially refers to a queue; in MPLS it will refer to the MPLS LSP/tunnel/label operations to use.

1.1 Ingress LER:
Router at the ingress of the MPLS domain from the IP cloud (often confused with the ingress device ;->). Unlabelled packets arrive at the ingress LER and get labelled based on:
a) IPV4/6 route setup
b) ingress classification rules (use the u32 classifier)
c) tunnels like IPSEC, using the SPI mapped to an MPLS label
d) L2 types of technologies, e.g. VLAN, PPP, ATM etc

1.2 Egress LER:
Router at the egress of the MPLS domain towards the IP cloud (not to be confused with the egress device on Linux). Labelled packets come in and get their labels removed based on some rules.

1.3 LSR:
Switching based on labels.

2. Tables involved:
We can't ignore these table names because tons of SNMP MIBs exist which at least talk about them; implementation is a different issue, but at least we should be able to somehow semantically match them. The tables are the NHLFE, FTN and ILM. The code should use similar names when possible.

ILM and FTN derive a FECid from their respective lookups. The result (FECid) is then used to look up the NHLFE to determine how to forward the packet.

2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
This table is looked up using the FEC as the key (maybe + label space), although label spaces are still in the TODO below.

A standard structure for an NHLFE contains:
- FEC id
- neighbor information (IPV4/6 + egress interface)
- MPLS operations to perform

The data in this table is to be used by the other two tables as mentioned earlier.

2.1.1 NHLFE Configuration:
The way I see it being set up is via netlink (this way we can take advantage of distributed architectures later).

  tc l2conf <cmd> dev <devname>
     mpls nhlfe index <val> proto <ipv4|ipv6> nh <neighbor>
     <operation set> fec <FECid>
  operation set := (op <operation>)*

* cmd is one of: <add | del | replace | get>
* devname is the output device to be used
* index could be used to store the LSPid
* protocol to be used is one of IPV4 or V6 (used for neighbor binding)
* neighbor is either an IPV4 or V6 address (for neighbor binding)
* operation is the MPLS operation to perform, followed by its operands if any. Note there could be a series of operations.
* FECid is the FEC identifier to be used as the key for searching.
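To make the table layout above a bit more concrete, here is a minimal sketch of what an NHLFE entry could look like in the kernel. Every name in it (mpls_nhlfe, mpls_op, MPLS_MAX_OPS) is invented for illustration and is not taken from any existing tree.

```c
/* Speculative sketch of an NHLFE entry as described in 2.1 --
 * field and type names are invented for illustration only. */
#include <linux/types.h>
#include <linux/in6.h>

#define MPLS_MAX_OPS 4                      /* arbitrary cap on chained operations */

struct mpls_op {
	__u16 opcode;                       /* e.g. PUSH, POP, SWAP, FWD, DROP */
	__u32 label;                        /* operand: label value, if the opcode needs one */
};

struct mpls_nhlfe {
	__u32 fec_id;                       /* key used by the FTN/ILM to find this entry */
	__u32 lsp_id;                       /* the "index" from the tc grammar above */
	int   family;                       /* AF_INET or AF_INET6, for neighbor binding */
	union {
		__be32          v4;
		struct in6_addr v6;
	} nexthop;                          /* neighbor address */
	int   oif;                          /* egress interface index */
	int   num_ops;
	struct mpls_op ops[MPLS_MAX_OPS];   /* MPLS operations to perform, in order */
};
```

The netlink request sketched in 2.1.1 would essentially carry these same fields as attributes.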
2.2 FEC to NHLFE mapping (FTN) Table

I don't see this table existing by itself. Each MPLS interfacing component will derive a FECid which is used to search the NHLFE table.

2.2.1 IPV4/6 route component FTN
Typically, the FEC will be in the IPV4/6 FIB nexthop entry. This way we can have equal cost multi path entries with different FECids.

2.2.2 ingress classification component:
This has nothing to do with the FTN; rather it provides another mapping to the NHLFE table. (When I port the tc extension code to 2.6 we will need a new skb field called FECid.)
* Ingress code matches a packet description and then sets skb->FECid as an action. We could use skb->FECid to overrule the FIB FEC when we are selecting the fast path.
[The u32 classifier could be used to map based on any header bits and select the FECid.]
skb->FECid could also be used on egress for QoS/TE purposes.

skb->FECid is meaningful even when not set by the tc extension on ingress; so whenever we extract the FECid from the FTN and the lookup operation is successful, you copy the FECid from the FIB/FTN to skb->FECid.

2.2.3 Tunneling and L2 technologies FTN
Revisit this later.
Example: IPSEC, tunnels, VLANs etc. Again, by having the FEC stored in e.g. IPSEC-specific tables, you could easily select NHLFE entries and operate on, say, an IPSEC packet going out. So this is similar to IPV4 and IPV6. Same with the others.

2.2.4 NHLFE packet path:

As in standard Linux, the fast path is accessed first. Two results:
1) On success an MPLS cache entry is found and attached to skb->dst; the skb->dst is used to forward.
2) On failure a slow path is exercised and a new dst cache entry is created from the NHLFE table.

There are two slow path sources: forwarded and locally sourced packets are treated by route_output(), whereas incoming packets are treated by route_input().
On the input slow path, use the label to look up the FEC in the ILM. On an LER, look up the respective service (IPV4/6) to find the FEC.

The FECid is then used to look up the NHLFE for the cache entry creation.
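A rough sketch of this flow, purely to show the fast/slow path split; all the helpers below (mpls_dst_lookup, mpls_ilm_lookup, mpls_nhlfe_lookup, mpls_dst_create) are hypothetical stand-ins, not existing kernel symbols, and dst refcounting is omitted.

```c
/* Illustrative only: the fast-path / slow-path split described in 2.2.4. */
#include <linux/skbuff.h>
#include <net/dst.h>

static int mpls_forward(struct sk_buff *skb, u32 label)
{
	struct dst_entry *dst;
	struct mpls_nhlfe *nhlfe;
	u32 fec_id;

	/* Fast path: a cached MPLS dst entry already exists for this label. */
	dst = mpls_dst_lookup(label, skb->dev);
	if (dst)
		goto forward;

	/* Slow path (input side): label -> ILM -> FECid -> NHLFE -> new dst. */
	if (mpls_ilm_lookup(label, skb->dev, &fec_id) < 0)
		goto drop;
	nhlfe = mpls_nhlfe_lookup(fec_id);
	if (!nhlfe)
		goto drop;                 /* see TODO item 3: could punt to user space */
	dst = mpls_dst_create(nhlfe);      /* build and cache the new entry */
	if (!dst)
		goto drop;

forward:
	skb->dst = dst;                    /* attach, then forward via the cached entry */
	return dst->output(skb);

drop:
	kfree_skb(skb);
	return -1;
}
```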
2.2.5 Configuration, IPV4/6 routing:
The ip tool should allow you to specify the route you want and then specify the FECid for that route, i.e.:
  ip route ... FECid <FECid>
where FECid is the NHLFE key id we want to use.
Note that multiple FECids can be used in conjunction with the "nexthop" parameter for Equal Cost Multi Path.

Of course the route should fail to insert if the NHLFE FECid doesn't exist already.
[??? What would happen if the route nexthop entry and the NHLFE point to different egress devices?]

2.2.6 Configuration for others

They need to be netlink enabled. At the moment only IPSEC is.

2.3 ILM (incoming label mapping):

Typical entries for this table are: label, ingress dev, FECid.
Lookup is based on label.

The ILM is used by both the LSR and the egress LER.

2.3.1 ILM packet processing:

Incoming packets:
- use the label to look up the dst cache via route_input()
- on failure, do an ILM lookup to find the NHLFE entry
- the FECid entry should exist within the ILM table
- create a dst cache entry on success
- drop the packet on failure

2.3.2 Configuration is:

  tc l2conf <cmd> dev <devname>
     mpls ilm index <val> label fec <FECid>

* cmd is one of: <add | del | replace | get>
* devname is the input device to be used
* index is an additional identifier that could be used to store LSP info.
* FECid is the FECid to be used for searching the NHLFE.

3.0 Allowed OPCODEs

At the moment the following look valid:

3.1 Modifying opcodes

- REDIRECT: redirect a packet to a different LSP (useful for testing or redirecting to a control plane)
- MIRROR: send a copy of a packet somewhere else for further processing (useful for LSP pings, traceroute, debug etc)

3.2 Label action opcodes

- POP_AND_LOOKUP
- POP_AND_FORWARD
- NO_POP_AND_FORWARD
- DISCARD

TODO:
1. Look into multi next hop for load balancing for LSRs. Is this necessary? If yes, there has to be multiple FECids in the ILM table.
2. Stats for each table, which may be tricky with caching.
3. Describe the policy for what happens when we have an error (example: the FECid exists in the IPV4 FIB but not in the NHLFE; the current policy is drop, but we could send this packet to user space if there's a listening socket etc). The bad thing about it is that it could be used as a DoS.
4. Label spaces: interfaces vs system.
5. List all netlink events we want to throw.
6. Add the data structures used to represent the tables and other things like IPV4/6 protocol drivers for NH binding (a speculative sketch follows below).
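As a starting point for item 6, one speculative shape for an ILM entry and a per-family NH-binding driver; every name here is invented and nothing below is taken from an existing implementation.

```c
/* Speculative data structures for TODO item 6 -- invented names only. */
#include <linux/types.h>
#include <linux/list.h>

struct mpls_nhlfe;                  /* from the NHLFE sketch earlier in this mail */

struct mpls_ilm_entry {
	struct list_head list;      /* chained in a per-labelspace table */
	__u32 label;                /* incoming (topmost) label, the lookup key */
	int   ifindex;              /* ingress device, if per-interface label space */
	__u32 fec_id;               /* key into the NHLFE table */
	__u32 lsp_id;               /* the "index" from the tc ilm grammar */
};

/* Per-address-family driver used by the NHLFE for nexthop binding
 * (IPv4, IPv6, possibly others later). */
struct mpls_proto_driver {
	int family;                                    /* AF_INET / AF_INET6 */
	int (*resolve_nh)(struct mpls_nhlfe *nhlfe);   /* bind the neighbor entry */
	void (*release_nh)(struct mpls_nhlfe *nhlfe);  /* drop the binding */
	struct list_head list;                         /* registered drivers */
};
```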
From: Ramon C. <cas...@in...> - 2003-12-05 16:15:47
Hi all,

I'll be reading this doc over the weekend. Also, please note that a (lame && ongoing) effort to document the current implementation is at:
http://perso.enst.fr/~casellas/mpls-linux/index.html
It's a work in progress, but I'll be working on it this W.E.

R.
From: James R. L. <jl...@mi...> - 2003-12-05 16:49:41
I would like to start with one comment that has a cascading effect throughout the document. I would like to clarify it, then either update the document to reflect it, or go through each use of it and modify if needed.

FECid - maybe this is just a naming thing, but the term FEC should never show up in any table other than the FTN, nor should it show up in any part of the MPLS forwarding plane. FECs are only valid in the FTN (duh, FEC to NHLFE) and in the 'services' that use MPLS. For example, in IPv[4|6] you can define a FEC as a specific entry in the FIB. Thus it makes sense to use the term FEC when talking about how IPv[4|6] will 'bind' to NHLFEs. Although, as Jamal stated, there will probably never exist a single FTN table. Each 'service' that utilizes MPLS will store its own FEC to NHLFE mapping. Thus we can change the IPv[4|6] example from above to say that a specific entry in the FIB will refer to an NHLFE.

So to sum it all up, I don't think we should use FECids in any part of the net or MPLS stack. Instead we should be using the NHLFEid. Remember that NHLFEs are indexed independent of the label value(s) they hold and independent of the nexthop they point to. In the LSR MIB, out-segments have an index which is generated by the system (most of the time sequentially increasing).

So in section 2 we should change:

  ILM and FTN derive a FECid from their respective lookups
  The result (FECid) is then used to look up
  the NHLFE to determine how to forward the packet.

to:

  ILM and FTN derive a NHLFEid from their respective lookups
  the resulting NHLFEid is used to look up the NHLFE
  to determine how to forward the packet.

Again, this might all be naming issues, but to me the confusion of FEC vs NHLFE is a fundamental one.
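To illustrate the point with made-up names (this is not code from either tree): the FEC-to-NHLFE binding would live in the service's own table, here an IPv4 FIB nexthop extension, and only an NHLFE index would ever cross into the MPLS forwarding code.

```c
/* Hypothetical illustration only.  The FEC itself lives in the per-service
 * table (the IPv4 FIB nexthop); the MPLS code only sees an NHLFE index. */
#include <linux/types.h>

struct mpls_nhlfe;
extern struct mpls_nhlfe *mpls_nhlfe_lookup(__u32 nhlfe_id);   /* hypothetical */

struct fib_nh_mpls_ext {
	__u32 nhlfe_id;              /* index into the NHLFE table; 0 = unbound */
};

static struct mpls_nhlfe *ipv4_nh_to_nhlfe(const struct fib_nh_mpls_ext *ext)
{
	/* no FECid crosses this boundary -- just the NHLFE index */
	return mpls_nhlfe_lookup(ext->nhlfe_id);
}
```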
-- 
James R. Leu
jl...@mi...
From: Ramon C. <cas...@in...> - 2003-12-07 14:09:38
James/Jamal/All,

Thank you for submitting this document for review/comments.

Some thoughts right away (more to follow). For the moment, I'll be giving my impressions w.r.t. the existing implementation and the patches posted on this list.

Disclaimer: These are my own opinions, and in no way are they written in stone. You'll see that I do not agree with some aspects of the document, but I am open to discussion.

A general comment (maybe I'm wrong, let me dig a little more into this) is that it is too hasty to discard the existing implementation without taking the time to understand it (which is logical, since there was no documentation, not even code comments, but that is something I'm working on), but I would like to know in which particular parts the existing implementation is flawed and cannot be corrected / extended / reviewed. For the moment, I have not seen a design flaw or major valid reason to start from scratch.

> 2. Tables involved:
> We can't ignore these table names because tons of SNMP MIBs exist
> which at least talk about them; implementation is a different
> issue, but at least we should be able to somehow semantically match
> them. The tables are the NHLFE, FTN and ILM.
> The code should use similar names when possible.

Agreed. One of the first things I noticed was the (apparent) lack of these tables in Jim's implementation. Hopefully, the devel-guide will explain this. What I have understood:

- The NHLFE is in fact split (which I consider a good design) into two parts: MPLS Incoming Information (MII for short), a struct that has the set of opcodes to execute upon arrival of the incoming packet, and MPLS Outgoing Information (MOI for short), containing the set of opcodes to execute when forwarding the packet. This also decouples things w.r.t. labelled packets that are delivered to the host.

- The ILM table exists (it is in fact the incoming Radix Tree). The "ILM" lookup is indeed mpls_get_mii_by_label. It looks up the corresponding NHLFE from a label (which is the MII object, see my previous paragraph).

So, yes:

- Some existing symbols should be renamed. For example, the function mpls_opcode_peek has been renamed (in my tree anyway) to mpls_label_entry_peek. In a similar way, we should define a high-level function like "mpls_ilm_lookup" which takes the topmost label entry and gets the corresponding MII. These changes are trivial.

- The FTN table is indeed somehow missing, and is basically done in parts using mplsadm: add an outgoing label (with an associated MOI, see below) and then map some traffic to that label (this was mainly done by hacking some net/core parts). So the big missing part is FEC management (at the per-MPLS-domain ingress node). We *should not* assume that a FEC is always an IPv4/IPv6 address prefix:
  * We need a "generic" FEC encoding, so later we can also integrate L2 LSPs, EoMPLS, etc.
  * A new "classifier" part that maps data objects (and not only address prefixes) to FECs.
  These points are discussed later in this document.
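For reference, a rough sketch of the MII/MOI split and the suggested mpls_ilm_lookup() wrapper; the field names here are invented and the real structs in the tree look different.

```c
/* Rough sketch only -- illustrates the MII/MOI split described above and the
 * suggested mpls_ilm_lookup() wrapper; not the actual structs from the tree. */
#include <linux/types.h>

#define MPLS_MAX_OPS 4

struct mpls_moi {                     /* MPLS Outgoing Information */
	int   oif;                    /* outgoing interface */
	__u16 ops[MPLS_MAX_OPS];      /* opcodes run when forwarding (PUSH, SET, FWD...) */
	int   num_ops;
};

struct mpls_mii {                     /* MPLS Incoming Information */
	__u32 label;                  /* incoming (topmost) label */
	int   labelspace;             /* or incoming interface, see the end of this mail */
	__u16 ops[MPLS_MAX_OPS];      /* opcodes run on arrival (POP, PEEK, DLV...) */
	int   num_ops;
	struct mpls_moi *moi;         /* only needed when the packet is forwarded */
};

/* High-level lookup suggested above: topmost label (+ labelspace) -> MII.
 * A locally delivered packet never needs to touch the MOI half. */
struct mpls_mii *mpls_ilm_lookup(__u32 label, int labelspace);
```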
>
> ILM and FTN derive a FECid from their respective lookups.
> The result (FECid) is then used to look up
> the NHLFE to determine how to forward the packet.
>
> 2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
> This table is looked up using the FEC as the key (maybe
> + label space), although label spaces are still in the TODO below.
>
> A standard structure for an NHLFE contains:
> - FEC id

See my comments (*) below.

> - neighbor information (IPV4/6 + egress interface)

Yes. It is/should be in the MOI part (that is, the "second half" of the NHLFE).

> - MPLS operations to perform

The MII/MOI opcodes. With the benefit that if the packet is locally delivered, there's no need to check the MOI.

(*) Comments:
I'm afraid that I don't agree here. IMHO, I think we should not add the "FECid" indirection here. It has several drawbacks:

- NHLFEs are FEC agnostic. The same NHLFE could be reused for different FECs. This is necessary, for example, for LSP merging.

- The notion of FEC should only be defined at ingress LSRs.

- W.r.t. the ILM table, the FECid *is* the topmost label itself! Explicitly, the label represents the FEC w.r.t. a pair of upstream/downstream LSRs. The lookup should be label -> NHLFE (MII object). No need to manage FECids (allocation/removal/etc.)

- In some cases, it is necessary to establish cross-connects without knowing the FEC that will be transported over the LSP (e.g. when working at > 2 hierarchy levels): e.g. incoming label (+ labelspace + interface) -> outgoing label + outgoing interface. No need to know the FEC here.

With the notion of FECid, you have two issues: label management and FECid management. Let me explain myself a little more here: imagine we have a simple FEC F, 'all packets with @IPdest = A.B.C.D/N', well defined, so it should have a 'locally' unique FECid. This FECid cannot "non-ambiguously" be used to look up a NHLFE (e.g. when received over two different interfaces). Of course the same argument applies to labels, but my point is "let the label itself identify the FEC"; do not add another indirection.

> 2.1.1 NHLFE Configuration:
> The way I see it being set up is via netlink (this way we can take
> advantage of distributed architectures later).

Definitely :). For updates/requests. However, this does not preclude the use of procfs/sysfs for exposing read-only simple objects like attributes/labelspaces/etc.

> tc l2conf <cmd> dev <devname>
>    mpls nhlfe index <val> proto <ipv4|ipv6> nh <neighbor>

It is too soon to define a grammar for a userspace application. I would work on the protocol between the kernel MPLS subsystem and userspace, defining which information objects are required and the netlink datagram format, and then define a userspace app. Moreover, I would propose having a new userspace app (something like mplsadm) rather than patching tc, ip route etc, given the fact that most users won't be using MPLS at all.

> 2.2 FEC to NHLFE mapping (FTN) Table
>
> I don't see this table existing by itself.

It doesn't :)

> Each MPLS interfacing component will derive a FECid which is used
> to search the NHLFE table.

See my previous comment. The topmost label + the labelspace (or incoming device) + optionally the upstream router (in the case that the downstream cannot know the upstream router *and* has allocated the same label for the same FEC) should suffice to derive the NHLFE (MII) to apply.

> 2.2.1 IPV4/6 route component FTN
> Typically, the FEC will be in the IPV4/6 FIB nexthop entry.
> This way we can have equal cost multi path entries
> with different FECids.

I'll discuss load sharing later in this mail.

> 2.2.2 ingress classification component:
> This has nothing to do with the FTN; rather it provides another mapping to
> the NHLFE table.
> (When I port the tc extension code to 2.6 we will need a new
> skb field called FECid.)
> * Ingress code matches a packet description and then sets skb->FECid
> as an action. We could use skb->FECid to overrule the FIB FEC
> when we are selecting the fast path.

Good point.
But it should not manage FECids, it should manage NHLFE entries.

> [The u32 classifier could be used to map based on any header bits and select
> the FECid.]

Semantically, it does map the "data object" to a FEC, but that does not mean it needs to explicitly manage FECids. The FECids are the labels. If you add the FECid notion you have two problems: label management and FECid management. At most, "FECids" (if you really really want to add them) should be managed only at the per-domain I-LSR.

> 2.2.3 Tunneling and L2 technologies FTN
> Revisit this later.

Yes! :) Ethernet over MPLS should now be a primary objective.

> 2.2.4 NHLFE packet path:
>
> As in standard Linux, the fast path is accessed first. Two
> results:
> 1) On success an MPLS cache entry is found and attached to skb->dst;
> the skb->dst is used to forward.
> 2) On failure a slow path is exercised and a new dst cache entry is created
> from the NHLFE table.

Agreed. Nothing to add here.

> The FECid is then used to look up the NHLFE for the cache entry creation.

:) Nope!! The "label" should be used.

> 2.2.5 Configuration, IPV4/6 routing:
> The ip tool should allow you to specify the route you want and then
> specify the FECid for that route, i.e.:

Again, too soon to focus on userspace / control plane.

> [??? What would happen if the route nexthop entry and the NHLFE point
> to different egress devices?]

The NHLFE overrides the route nexthop. This is the basis of MPLS Traffic Engineering: LSPs are not IGP constrained.

> 2.2.6 Configuration for others
>
> They need to be netlink enabled. At the moment only IPSEC is.

What others? Anyway, a well-defined netlink-based protocol is, as you state, much needed.

> 2.3 ILM (incoming label mapping):
>
> Typical entries for this table are: label, ingress dev, FECid.
> Lookup is based on label.

And this is the Radix Tree.

> The ILM is used by both the LSR and the egress LER.
>
> 2.3.1 ILM packet processing:
>
> Incoming packets:
> - use the label to look up the dst cache via route_input()

The label is used to look up an MII object from a Radix Tree that holds information regarding what to do with the packet. Only in the case that the packet needs to be forwarded do we obtain a MOI object. It is then that we could:
* check the dst_cache as you state. Right, this fits well in the mpls_output* family.

> 3.2 Label action opcodes

What's wrong with the existing opcodes? I see little performance gain in having X_AND_Y opcodes rather than X, Y sequentially. Atomic opcodes are (IMHO) the way to go. Otherwise we'll end up with POP_AND_SWAP_AND_PUSH_AND_MAPEXP. However, I do not want to state that we need not try to optimize performance later, but chaining opcodes adds great flexibility.

> - POP_AND_LOOKUP

POP & DLV ?

> - POP_AND_FORWARD
>
> - NO_POP_AND_FORWARD

FWD

> - DISCARD

DROP
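To show what I mean by chaining atomic opcodes, a toy executor; the opcode names come from this discussion, and the helpers are illustrative stand-ins, not the existing opcode code.

```c
/* Toy illustration of chained atomic opcodes (POP, SWAP, PUSH, DLV, FWD, DROP).
 * Not the existing opcode engine -- just the shape of a sequential executor.
 * mpls_pop/swap/push/peek/deliver/fwd are hypothetical helpers. */
#include <linux/skbuff.h>

enum mpls_opcode {
	MPLS_OP_POP,        /* remove the topmost label entry */
	MPLS_OP_SWAP,       /* replace the topmost label */
	MPLS_OP_PUSH,       /* push a new label entry */
	MPLS_OP_PEEK,       /* re-examine the (new) topmost label */
	MPLS_OP_DLV,        /* deliver locally */
	MPLS_OP_FWD,        /* forward out of the MOI's interface */
	MPLS_OP_DROP,       /* discard */
};

struct mpls_instr {
	enum mpls_opcode op;
	u32 data;           /* label operand for SWAP/PUSH, unused otherwise */
};

static int mpls_run_ops(struct sk_buff *skb,
			const struct mpls_instr *instr, int num)
{
	int i;

	for (i = 0; i < num; i++) {
		switch (instr[i].op) {
		case MPLS_OP_POP:  mpls_pop(skb);                  break;
		case MPLS_OP_SWAP: mpls_swap(skb, instr[i].data);  break;
		case MPLS_OP_PUSH: mpls_push(skb, instr[i].data);  break;
		case MPLS_OP_PEEK: return mpls_peek(skb);     /* restart the lookup */
		case MPLS_OP_DLV:  return mpls_deliver(skb);
		case MPLS_OP_FWD:  return mpls_fwd(skb);
		case MPLS_OP_DROP: kfree_skb(skb);            return -1;
		}
	}
	return 0;
}
```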
> TODO:
> 1. Look into multi next hop for load balancing for LSRs.
> Is this necessary? If yes, there has to be multiple FECids
> in the ILM table.

I've been working on load balancing in MPLS networks. The "right" approach is, as you state, to have several pointed NHLFEs in the ILM table for a given label (+ labelspace + ...). However, another nice approach is to set up tunnels ("mpls%d") and then use an equalizer algorithm to split the load. This decouples the algorithm from the implementation. To test this, I played with having two mpls%d tunnels and used teql, which is interface "agnostic", and it worked well.

In academic research and the IETF w.g. we have discussed load sharing several times, and the "current" consensus is that it is difficult to implement L.S. at split points other than the "per domain" ingress LSRs. Since intermediate LSRs are not allowed to look at the IP header, no hash techniques (or not with enough granularity) can be used to make sure that packets belonging to the same microflow are forwarded over the same physical route.

In my opinion, what needs to be done:

- Define a complete framework for FEC management at ingress LSRs, with policies to define:
  - How to classify "L2/L3 data objects" into FECs, without limiting ourselves only to IPvX address prefixes as FECs. How do we encode "if the Ethernet dst address is a:b:c:d:e:f and the ethertype is IPX then this data object belongs to FEC F"? Define the related FEC encodings. Let the control plane apps distribute FEC information *as labels*. The non-ambiguous incoming label conveys all FEC information; do not add the FECid indirection.

- Define the protocol between the MPLS subsystem and userspace.

- Develop (or adapt) a new userspace app (mplsnl in my previous mail) that communicates with the MPLS kernel subsystem in order to get/update the MPLS tables.

- Rewrite (I agree with previous comments that this is the least elegant part of the existing implementation) the dst/neigh/hh cache management, once we have the whole outgoing MPLS PDU rebuilt.

- Multicast "barebones" support (or, more adequately, point-to-multipoint LSP support), conceptually similar to load sharing: the incoming label + labelspace + interface + ... (this all means "a non-ambiguous incoming label") should determine a set of MIIs to apply. The implementation-dependent details are, for example: are the MIIs to be applied iteratively (e.g. point to multipoint) or one among all (load sharing)?

Things that I don't like from the existing implementation:

- The Radix Tree / label space implementation. Since mpls_recv_pckt (the callback registered as the packet handler) receives the incoming device, I'm still analyzing the drawbacks/advantages of having the "ILM global table" split into "per interface" ILM tables. That is, the incoming interface + the topmost incoming label are the "key" to find the MII object in a hash table.

Thank you for reading.

Best regards,
Ramon
From: Vilyan D. <vdi...@ne...> - 2003-12-08 08:58:43
Hello,

I have comments on the last e-mails here (general notes, not regarding the current implementation).

First, what is a FEC? RFC 3031:

  forwarding equivalence class: a group of IP packets which are forwarded
  in the same manner (e.g., over the same path, with the same forwarding
  treatment)

  Most commonly, a packet is assigned to a FEC based (completely or
  partially) on its network layer destination address. However, the label
  is never an encoding of that address.

So, we are free to decide how to classify packets into FECs. We should use at least the network layer destination address - the so-called basic MPLS implementation.

Label bindings are advertised to other LSRs via some label distribution protocol. This protocol should describe FECs to its neighbours. Currently LDP (RFC 3036) supports 3 types of FEC elements: wildcard, host and prefix (IPv4 and IPv6). Destination-prefix-only classification is implemented by many vendors; for example, see the NPF MPLS Implementation Agreement. On the other hand, draft-ietf-mpls-ftn-mib-09 defines more complex classification rules: mplsFTNTable allows 6-tuple matching rules based on one or more of source address range, destination address range, source port range, destination port range, IPv4 Protocol field [RFC791] or IPv6 next-header field [RFC2460] and the DiffServ Code Point (DSCP, [RFC2474]) to be specified. But I don't see support in label distribution protocols for such FEC elements.

Using only destination prefixes for classification will allow us to implement the FTN table as an extension of the IPv4/IPv6 FIB. In this case we don't need a separate classification engine. So, we should define the classification rules.

> c) tunnels like IPSEC, using the SPI mapped to an MPLS label
> d) L2 types of technologies, e.g. VLAN, PPP, ATM etc

For transport of L2 frames we should use Pseudo Wires (see the IETF PWE3 w.g. documents). We don't need additional classification rules.

-- 
Regards,
Vilyan Dimitrov
Network Administrator
Net Is Sat Ltd.
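A generic in-kernel FEC element along the lines of the LDP element types mentioned above might be sketched as follows; this is an internal representation only, not the RFC 3036 wire encoding, and all names are illustrative.

```c
/* Sketch of a generic in-kernel FEC element, loosely following the three LDP
 * (RFC 3036) element types mentioned above.  Internal representation only. */
#include <linux/types.h>
#include <linux/in6.h>

enum fec_type {
	FEC_WILDCARD,
	FEC_PREFIX,                     /* address prefix, IPv4 or IPv6 */
	FEC_HOST,                       /* full host address */
};

struct fec_element {
	enum fec_type type;
	int family;                     /* AF_INET or AF_INET6 */
	union {
		__be32          v4;
		struct in6_addr v6;
	} addr;
	__u8 prefix_len;                /* only meaningful for FEC_PREFIX */
};
```

With destination-prefix-only classification this collapses to a field on the FIB entry, which is why the FTN could live inside the IPv4/IPv6 FIB rather than as a separate table.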
From: James R. L. <jl...@mi...> - 2003-12-08 17:31:27
See my comments within:

On Sun, Dec 07, 2003 at 03:08:44PM +0100, Ramon Casellas wrote:
>
> James/Jamal/All,
>
> Thank you for submitting this document for review/comments.
>
> Some thoughts right away (more to follow). For the moment, I'll be giving
> my impressions w.r.t. the existing implementation and the patches posted
> on this list.
>
> Disclaimer: These are my own opinions, and in no way are they
> written in stone. You'll see that I do not agree with some aspects of the
> document, but I am open to discussion.
>
> A general comment (maybe I'm wrong, let me dig a little more into this)
> is that it is too hasty to discard the existing implementation without
> taking the time to understand it (which is logical, since there was no
> documentation, not even code comments, but that is something I'm working
> on), but I would like to know in which particular parts the existing
> implementation is flawed and cannot be corrected / extended / reviewed.

I agree. Thank you for being the one to state this.

> For the moment, I have not seen a design flaw or major valid reason to
> start from scratch.
>
> > 2. Tables involved:
> > We can't ignore these table names because tons of SNMP MIBs exist
> > which at least talk about them; implementation is a different
> > issue, but at least we should be able to somehow semantically match
> > them. The tables are the NHLFE, FTN and ILM.
> > The code should use similar names when possible.

One thing to remember is that the names NHLFE, ILM and FTN are terms used when talking about MPLS architectures. These are just logical 'tables' that encompass required functionality. A great example of this is that the LSR MIB does not have ILM or NHLFE tables, but refers to these logical entities in the explanation of the in-segment and out-segment tables. So to use the names ILM and NHLFE in any implementation is misleading unless you limit the implementation to be _only_ what is referred to by the MPLS architecture RFC. As everyone knows, the architecture RFC is way too generic to try and use as the sole basis of a forwarding plane.

> Agreed. One of the first things I noticed was the (apparent) lack of these
> tables in Jim's implementation. Hopefully, the devel-guide will explain
> this. What I have understood:

<snip>

> * A new "classifier" part that maps data objects (and not only address
> prefixes) to FECs. These points are discussed later in this document.

The FTN again is a logical block of functionality. The only reason a FTN table should exist is so that a particular service can register a binding with it. In other words, it is informational only. I do not think we want to implement a generic FEC registration mechanism, because no matter what we do it will never be flexible enough to handle new FEC definitions without re-writing all of the old FEC definitions. Leave FEC binding to extensions of the existing tools which already deal with that type of traffic (i.e. iproute2, iptables, tc, brctl etc).

> > ILM and FTN derive a FECid from their respective lookups.
> > The result (FECid) is then used to look up
> > the NHLFE to determine how to forward the packet.
> > 2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
> > This table is looked up using the FEC as the key (maybe
> > + label space), although label spaces are still in the TODO below.
> >
> > A standard structure for an NHLFE contains:
> > - FEC id
>
> See my comments (*) below.
>
> > - neighbor information (IPV4/6 + egress interface)
>
> Yes. It is/should be in the MOI part (that is, the "second half" of the
> NHLFE)
>
> > - MPLS operations to perform
>
> The MII/MOI opcodes. With the benefit that if the packet is locally
> delivered, there's no need to check the MOI.
>
> (*) Comments:
> I'm afraid that I don't agree here. IMHO, I think we should not add the
> "FECid" indirection here. It has several drawbacks:
>
> - NHLFEs are FEC agnostic. The same NHLFE could be reused for different
> FECs. This is necessary, for example, for LSP merging.
>
> - The notion of FEC should only be defined at ingress LSRs.
>
> - W.r.t. the ILM table, the FECid *is* the topmost label itself!
> Explicitly, the label represents the FEC w.r.t. a pair of
> upstream/downstream LSRs. The lookup should be label -> NHLFE (MII
> object). No need to manage FECids (allocation/removal/etc.)
>
> - In some cases, it is necessary to establish cross-connects without
> knowing the FEC that will be transported over the LSP (e.g. when working
> at > 2 hierarchy levels): e.g. incoming label (+ labelspace + interface) ->
> outgoing label + outgoing interface. No need to know the FEC here.
>
> With the notion of FECid, you have two issues: label management and FECid
> management. Let me explain myself a little more here: imagine we have a
> simple FEC F, 'all packets with @IPdest = A.B.C.D/N', well defined, so it
> should have a 'locally' unique FECid. This FECid cannot "non-ambiguously"
> be used to look up a NHLFE (e.g. when received over two different
> interfaces). Of course the same argument applies to labels, but my point
> is "let the label itself identify the FEC"; do not add another indirection.

I agree. This is kind of the point I was getting at in my previous e-mail.

<snip>

> > 2.2.3 Tunneling and L2 technologies FTN
> > Revisit this later.
>
> Yes! :) Ethernet over MPLS should now be a primary objective.

Take a look at my l2cc code (in my p4 tree). If you ask me, this is the correct separation. l2cc is more than just being able to transport L2 frames over MPLS. It is a generic mechanism for implementing L2 switching and splicing for Linux. If you look at any mature L2 over MPLS implementation, it also has the ability to do local L2 switching/splicing.

<snip>

> > 3.2 Label action opcodes
>
> What's wrong with the existing opcodes? I see little performance gain in
> having X_AND_Y opcodes rather than X, Y sequentially. Atomic opcodes are
> (IMHO) the way to go. Otherwise we'll end up with
> POP_AND_SWAP_AND_PUSH_AND_MAPEXP. However, I do not want to state that we
> need not try to optimize performance later, but chaining opcodes adds
> great flexibility.

I agree that trying to make single OPs that do multiple things is a little silly. I guess you can just call me a RISC type of guy.

> > - POP_AND_LOOKUP
>
> POP & DLV ?

POP and PEEK

> > - POP_AND_FORWARD
> >
> > - NO_POP_AND_FORWARD
>
> FWD
>
> > - DISCARD
>
> DROP

> > TODO:
> > 1. Look into multi next hop for load balancing for LSRs.
> > Is this necessary? If yes, there has to be multiple FECids
> > in the ILM table.
>
> I've been working on load balancing in MPLS networks. The "right" approach
> is, as you state, to have several pointed NHLFEs in the ILM table for a
> given label (+ labelspace + ...). However, another nice approach is to set
> up tunnels ("mpls%d") and then use an equalizer algorithm to split the
> load. This decouples the algorithm from the implementation. To test this,
> I played with having two mpls%d tunnels and used teql, which is interface
> "agnostic", and it worked well.
This approach only works for end-to-end LSP load balancing, which should not even be an issue if we expose each possible LSP as a nexthop. Then standard techniques for load balancing can be used.

> In academic research and the IETF w.g. we have discussed load sharing
> several times, and the "current" consensus is that it is difficult to
> implement L.S. at split points other than the "per domain" ingress LSRs.
> Since intermediate LSRs are not allowed to look at the IP header, no hash
> techniques (or not with enough granularity) can be used to make sure that
> packets belonging to the same microflow are forwarded over the same
> physical route.

This has been a pretty hot debate as of late on the MPLS-WG mailing list (as part of the OAM framework draft). In general the accepted technique is to do a layer violation and look past the MPLS shim: if the first nibble is 4 or 6, assume it's IP and do typical microflow load balancing; otherwise use the label stack to create a hash. I have some interesting ideas for areas of experimentation with regards to this. For now I think we should make the statement that end-to-end and mid-stream load balancing is needed, and should be configured by having multiple outgoing labels configured (perhaps the FWD instruction could be an array of outgoing labels?).
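Roughly what that heuristic could look like (a sketch only; mpls_walk_stack and ip_microflow_hash are hypothetical helpers, and jhash just stands in for whatever hash we settle on):

```c
/* Sketch of the layer-violation heuristic: peek past the bottom of the label
 * stack; if the payload looks like IPv4/IPv6, hash the IP header, otherwise
 * fall back to hashing the label stack itself. */
#include <linux/skbuff.h>
#include <linux/jhash.h>

static u32 mpls_lb_hash(const struct sk_buff *skb)
{
	const u8 *payload;
	u32 labels[8];
	int depth;

	/* hypothetical: collect the label stack and find the payload start */
	depth = mpls_walk_stack(skb, labels, 8, &payload);

	switch (payload[0] >> 4) {      /* first nibble after the shim */
	case 4:                         /* looks like IPv4 */
	case 6:                         /* looks like IPv6 */
		return ip_microflow_hash(payload);
	default:
		return jhash(labels, depth * sizeof(u32), 0);
	}
}
```

The resulting hash would then select one of the multiple outgoing labels carried by the FWD instruction.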
> In my opinion, what needs to be done:
>
> - Define a complete framework for FEC management at ingress LSRs, with
> policies to define:
>   - How to classify "L2/L3 data objects" into FECs, without limiting
> ourselves only to IPvX address prefixes as FECs. How do we encode "if the
> Ethernet dst address is a:b:c:d:e:f and the ethertype is IPX then this
> data object belongs to FEC F"? Define the related FEC encodings. Let the
> control plane apps distribute FEC information *as labels*. The
> non-ambiguous incoming label conveys all FEC information; do not add the
> FECid indirection.

See my above comment on this. I don't think we want to do this.

> - Define the protocol between the MPLS subsystem and userspace.

Agreed.

> - Develop (or adapt) a new userspace app (mplsnl in my previous
> mail) that communicates with the MPLS kernel subsystem in order to
> get/update the MPLS tables.
>
> - Rewrite (I agree with previous comments that this is the least
> elegant part of the existing implementation) the dst/neigh/hh cache
> management, once we have the whole outgoing MPLS PDU rebuilt.

I agree that the dst/neigh/hh stuff is not pretty. I think this is definitely a place where some kernel gurus could provide some help.

> - Multicast "barebones" support (or, more adequately, point-to-multipoint
> LSP support), conceptually similar to load sharing: the incoming label +
> labelspace + interface + ... (this all means "a non-ambiguous incoming
> label") should determine a set of MIIs to apply. The implementation-
> dependent details are, for example: are the MIIs to be applied iteratively
> (e.g. point to multipoint) or one among all (load sharing)?
>
> Things that I don't like from the existing implementation:
>
> - The Radix Tree / label space implementation. Since mpls_recv_pckt (the
> callback registered as the packet handler) receives the incoming
> device, I'm still analyzing the drawbacks/advantages of having the "ILM
> global table" split into "per interface" ILM tables. That is, the incoming
> interface + the topmost incoming label are the "key" to find the MII
> object in a hash table.

The ramification here is whether or not you want to expose all of the 'label programming' to the applications. Just think of signaling protocols that allocate a label in labelspace 0. Or think about how you go about allocating an application label (think of the 2nd label used with L2CC a la Martini). I'm not saying the technique I used is the best, but it at least handles all of the possible uses of labels.

This is a good conversation we have going. I hope others join in ;-)

-- 
James R. Leu
jl...@mi...