[mpls-linux-devel] Jamal's MPLS design document
From: James R. L. <jl...@mi...> - 2003-12-05 16:07:03
Here is the design doc that Jamal has created. Jamal: if there is a newer version, please post it in this thread.
-- 
James R. Leu
jl...@mi...
--------------------------------------------------------------------

1. Terminology:

LER (Label Edge Router): Router which sits at the edge of the IP (v4/v6) and MPLS networks.
LSR (Label Switching Router): Router/switch which sits inside the MPLS domain.
FEC: Forwarding Equivalence Class. This is like the "classid" concept we have in the QoS code. In QoS it essentially refers to a queue; in MPLS it will refer to an MPLS LSP/tunnel/set of label operations to use.

1.1 Ingress LER:
Router at the ingress of the MPLS domain from the IP cloud (often confused with the ingress device ;->). Unlabelled packets arrive at the ingress LER and get labelled based on:
a) IPv4/v6 route setup
b) ingress classification rules (using the u32 classifier)
c) tunnels such as IPsec, using the SPI mapped to an MPLS label
d) L2 technologies, e.g. VLAN, PPP, ATM, etc.

1.2 Egress LER:
Router at the egress of the MPLS domain towards the IP cloud (not to be confused with the egress device on Linux). Labelled packets come in and get their labels removed based on some rules.

1.3 LSR:
Switching based on labels.

2. Tables involved:

We can't ignore these table names because tons of SNMP MIBs exist which at least talk about them; implementation is a different issue, but we should at least be able to match them semantically. The tables are the NHLFE, FTN and ILM. The code should use similar names when possible.
ILM and FTN derive a FECid from their respective lookups. The result (FECid) is then used to look up the NHLFE to determine how to forward the packet.

2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
This table is looked up using the FEC as the key (maybe plus the label space, although label spaces are still in the TODO list below).
A standard structure for an NHLFE contains:
- FEC id
- neighbor information (IPv4/v6 address + egress interface)
- MPLS operations to perform
The data in this table is used by the other two tables, as mentioned earlier.
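As a minimal sketch of the NHLFE table described in 2.1, the C fragment below stores entries keyed by FECid in a small open-addressing (linear probing) hash table. All names (struct nhlfe_entry, nhlfe_add(), nhlfe_lookup()), the table size and the op encoding are hypothetical and not taken from any kernel API. Deletion is deliberately omitted, since linear probing would then need tombstones.

```c
#include <stddef.h>       /* NULL */
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical sketch: none of these names exist in the Linux kernel. */
#define NHLFE_SLOTS   64  /* table size; must be a power of two */
#define NHLFE_MAX_OPS 4   /* cap on the "operation set" of one entry */

struct mpls_op {
    int      code;        /* an opcode such as those listed in section 3 */
    uint32_t operand;     /* operand value, if the opcode takes one */
};

struct nhlfe_entry {
    int      in_use;      /* slot occupied? */
    uint32_t fec_id;      /* lookup key */
    uint32_t index;       /* optional LSP id ("index <val>") */
    int      family;      /* AF_INET or AF_INET6, for neighbor binding */
    uint8_t  nh_addr[16]; /* neighbor address: 4 or 16 bytes used */
    int      oif;         /* egress interface index */
    int      n_ops;
    struct mpls_op ops[NHLFE_MAX_OPS];
};

static struct nhlfe_entry nhlfe_table[NHLFE_SLOTS];

/* i-th probe slot for a FECid: linear probing from its low bits. */
static struct nhlfe_entry *nhlfe_slot(uint32_t fec_id, unsigned i)
{
    return &nhlfe_table[(fec_id + i) & (NHLFE_SLOTS - 1)];
}

/* "add": returns 0 on success, -1 if the FECid exists or the table is full. */
static int nhlfe_add(const struct nhlfe_entry *e)
{
    for (unsigned i = 0; i < NHLFE_SLOTS; i++) {
        struct nhlfe_entry *s = nhlfe_slot(e->fec_id, i);
        if (!s->in_use) {
            *s = *e;
            s->in_use = 1;
            return 0;
        }
        if (s->fec_id == e->fec_id)
            return -1;
    }
    return -1;
}

/* Used by the FTN and ILM paths once they have derived a FECid. */
static struct nhlfe_entry *nhlfe_lookup(uint32_t fec_id)
{
    for (unsigned i = 0; i < NHLFE_SLOTS; i++) {
        struct nhlfe_entry *s = nhlfe_slot(fec_id, i);
        if (!s->in_use)
            return NULL;  /* an empty slot ends the probe: not present */
        if (s->fec_id == fec_id)
            return s;
    }
    return NULL;
}
```

Rejecting a duplicate FECid on add mirrors the "add" vs. "replace" distinction of the proposed configuration command.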
2.1.1 NHLFE Configuration:
The way I see it being set up is via netlink (this way we can take advantage of distributed architectures later).

tc l2conf <cmd> dev <devname> mpls nhlfe index <val> proto <ipv4|ipv6> nh <neighbor> <operation set> fec <FECid>

operation set := (op <operation>)*

* cmd is one of: <add | del | replace | get>
* devname is the output device to be used
* index could be used to store the LSPid
* proto is one of ipv4 or ipv6 (used for neighbor binding)
* neighbor is either an IPv4 or IPv6 address (for neighbor binding)
* operation is the MPLS operation to perform, followed by its operands if it has any. Note there could be a series of operations.
* FECid is the FEC identifier to be used as the key for searching.

2.2 FEC to NHLFE mapping (FTN) Table:
I don't see this table existing by itself. Each MPLS interfacing component will derive a FECid which is used to search the NHLFE table.

2.2.1 IPv4/v6 route component FTN:
Typically, the FEC will be stored in the IPv4/v6 FIB nexthop entry. This way we can have equal cost multipath entries with different FECids.

2.2.2 Ingress classification component:
This has nothing to do with FTN; rather, it provides another mapping to the NHLFE table. (When I port the tc extension code to 2.6, we will need a new skb field called FECid.)
* The ingress code matches a packet description and then sets skb->FECid as an action. We could use skb->FECid to overrule the FIB FEC when we are selecting the fast path. [The u32 classifier could be used to map based on any header bits and select the FECid.]
skb->FECid could also be used on egress for QoS/TE purposes.
skb->FECid is meaningful even when not set by the tc extension on ingress: whenever we extract the FECid from the FTN and the lookup operation is successful, we copy the FECid from the FIB/FTN to skb->FECid.

2.2.3 Tunneling and L2 technologies FTN:
Revisit this later.
Examples are IPsec, tunnels, VLANs, etc. Again, by having the FEC stored in, e.g., IPsec-specific tables, you could easily select NHLFE entries and operate on, say, an IPsec packet going out. So this is similar to IPv4 and IPv6. The same goes for the others.

2.2.4 NHLFE packet path:
As in standard Linux, the fast path is accessed first. There are two possible results:
1) On success, an MPLS cache entry is found and attached to skb->dst; skb->dst is then used to forward.
2) On failure, a slow path is exercised and a new dst cache entry is created from the NHLFE table.
There are two slow path sources: forwarded and locally sourced packets are handled by route_output(), whereas incoming packets are handled by route_input().
On the input slow path, use the label to look up the FEC in the ILM. On an LER, look up the respective service (IPv4/v6) to find the FEC. The FECid is then used to look up the NHLFE for cache entry creation.

2.2.5 Configuration of IPv4/v6 routing:
The ip tool should allow you to specify the route you want and then the FECid for that route, i.e.:

ip route ... FECid <FECid>

where FECid is the NHLFE key id we want to use.
Note that multiple FECids can be used in conjunction with the "nexthop" parameter for Equal Cost Multipath. Of course, the route should fail to insert if the NHLFE FECid doesn't already exist.
[??? What would happen if the route nexthop entry and the NHLFE point to different egress devices?]

2.2.6 Configuration for others:
They need to be netlink enabled. At the moment only IPsec is.

2.3 ILM (Incoming Label Mapping):
Typical entries for this table are: label, ingress dev, FECid.
Lookup is based on the label. The ILM is used by both the LSR and the egress LER.
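A minimal sketch of the ILM table from 2.3, assuming a single system-wide label space (per-interface label spaces are still an open TODO). Each entry holds the label, ingress device and FECid, and the table is kept sorted by label so lookups can use the standard C bsearch(). The names and the capacity are hypothetical; the 20-bit limit reflects the size of the label field in the MPLS shim header.

```c
#include <stdint.h>
#include <stdlib.h>   /* bsearch, NULL */
#include <string.h>   /* memmove */

#define ILM_MAX       256        /* hypothetical capacity */
#define MPLS_LABEL_MAX 0xFFFFFu  /* labels are 20 bits wide */

/* One ILM entry as described in 2.3: label, ingress dev, FECid. */
struct ilm_entry {
    uint32_t label;   /* incoming label */
    int      iif;     /* ingress device index */
    uint32_t fec_id;  /* key into the NHLFE table */
};

static struct ilm_entry ilm[ILM_MAX];   /* kept sorted by label */
static size_t ilm_len;

/* bsearch comparator: first argument is the key, second an array element. */
static int ilm_cmp(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t l = ((const struct ilm_entry *)elem)->label;
    return (k > l) - (k < l);
}

/* Returns 0 on success, -1 if the label is invalid, already mapped,
 * or the table is full. */
static int ilm_add(uint32_t label, int iif, uint32_t fec_id)
{
    size_t pos = 0;

    if (label > MPLS_LABEL_MAX || ilm_len == ILM_MAX)
        return -1;
    while (pos < ilm_len && ilm[pos].label < label)
        pos++;
    if (pos < ilm_len && ilm[pos].label == label)
        return -1;
    /* shift the tail up by one to keep the array sorted */
    memmove(&ilm[pos + 1], &ilm[pos], (ilm_len - pos) * sizeof ilm[0]);
    ilm[pos] = (struct ilm_entry){ label, iif, fec_id };
    ilm_len++;
    return 0;
}

/* Lookup is based on the label only (system-wide label space). */
static const struct ilm_entry *ilm_lookup(uint32_t label)
{
    return bsearch(&label, ilm, ilm_len, sizeof ilm[0], ilm_cmp);
}
```

The ingress device is stored but not used as part of the key; supporting per-interface label spaces would mean adding it to the comparison.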
2.3.1 ILM packet processing:
Incoming packets:
- use the label to look up the dst cache via route_input()
- on failure, do an ILM lookup to find the NHLFE entry
- the FECid entry should exist within the ILM table
- create a dst cache entry on success
- drop the packet on failure

2.3.2 Configuration:

tc l2conf <cmd> dev <devname> mpls ilm index <val> label <label> fec <FECid>

* cmd is one of: <add | del | replace | get>
* devname is the input device to be used
* index is an additional identifier that could be used to store LSP info
* FECid is the FECid to be used for searching the NHLFE

3.0 Allowed OPCODEs:
At the moment the following look valid:

3.1 Modifying opcodes
- REDIRECT: redirect a packet to a different LSP (useful for testing or redirecting to a control plane)
- MIRROR: send a copy of a packet somewhere else for further processing (useful for LSP pings, traceroute, debugging, etc.)

3.2 Label action opcodes
- POP_AND_LOOKUP
- POP_AND_FORWARD
- NO_POP_AND_FORWARD
- DISCARD

TODO:
1. Look into multiple next hops for load balancing for LSRs. Is this necessary? If yes, there have to be multiple FECids in the ILM table.
2. Stats for each table, which may be tricky with caching.
3. Describe the policy for what happens when we have an error (example: the FECid exists in the IPv4 FIB but not in the NHLFE; the current policy is to drop, but we could send this packet to user space if there's a listening socket, etc.). The bad thing about that is it could be used for a DoS.
4. Label spaces: interface vs. system.
5. List all netlink events we want to throw.
6. Add the data structures used to represent the tables and other things, like the IPv4/v6 protocol drivers for NH binding.
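To show how the ILM processing steps in 2.3.1 fit together with the label-action opcodes of 3.2, here is a self-contained userspace sketch. It assumes a direct-mapped array standing in for the dst cache and small static ILM/NHLFE tables standing in for netlink-configured state. mpls_input_lookup() and all type names are hypothetical; this is not route_input() itself. Both a missing ILM entry and a FECid with no NHLFE entry result in a drop, matching the current policy noted in TODO item 3.

```c
#include <stddef.h>
#include <stdint.h>

/* Label-action opcodes from section 3.2 (hypothetical C names). */
enum mpls_label_op {
    MPLS_POP_AND_LOOKUP,
    MPLS_POP_AND_FORWARD,
    MPLS_NO_POP_AND_FORWARD,
    MPLS_DISCARD,
};

struct nhlfe { uint32_t fec_id; enum mpls_label_op op; int oif; };
struct ilm   { uint32_t label; uint32_t fec_id; };
struct dst   { int valid; uint32_t label; const struct nhlfe *nh; };

/* Static tables standing in for state configured via netlink. */
static const struct ilm ilm_tab[] = { { 100, 7 }, { 200, 9 } };
static const struct nhlfe nhlfe_tab[] = { { 7, MPLS_POP_AND_FORWARD, 2 } };

#define DST_SLOTS 16
static struct dst dst_cache[DST_SLOTS];   /* direct-mapped by label */
static unsigned slow_path_lookups;        /* counts cache misses */

static const struct ilm *ilm_find(uint32_t label)
{
    for (size_t i = 0; i < sizeof ilm_tab / sizeof ilm_tab[0]; i++)
        if (ilm_tab[i].label == label)
            return &ilm_tab[i];
    return NULL;
}

static const struct nhlfe *nhlfe_find(uint32_t fec_id)
{
    for (size_t i = 0; i < sizeof nhlfe_tab / sizeof nhlfe_tab[0]; i++)
        if (nhlfe_tab[i].fec_id == fec_id)
            return &nhlfe_tab[i];
    return NULL;
}

/* Steps of 2.3.1. Returns the NHLFE whose op the caller then applies,
 * or NULL to drop the packet. */
static const struct nhlfe *mpls_input_lookup(uint32_t label)
{
    struct dst *d = &dst_cache[label % DST_SLOTS];
    const struct ilm *entry;
    const struct nhlfe *nh;

    if (d->valid && d->label == label)    /* fast path: cached dst */
        return d->nh;

    slow_path_lookups++;
    entry = ilm_find(label);              /* slow path: ILM lookup */
    if (entry == NULL)
        return NULL;                      /* unknown label: drop */
    nh = nhlfe_find(entry->fec_id);       /* FECid -> NHLFE */
    if (nh == NULL)
        return NULL;                      /* dangling FECid: drop */

    d->valid = 1;                         /* create the cache entry */
    d->label = label;
    d->nh = nh;
    return nh;
}
```

Only successful lookups are cached, so a repeated lookup for a label that was dropped goes through the slow path again.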