[mpls-linux-devel] Jamal's MPLS design document

Here is the design doc that Jamal has created.

Jamal:  if there is a newer version please post it in this thread.

-- 
James R. Leu
jl...@mi...

--------------------------------------------------------------------
1. Terminology:

LER (Label Edge Router): Router which sits at the edge between an
IP(v4/6) network and an MPLS network.
LSR (Label Switching Router): Router/switch which sits inside the MPLS domain.
FEC: Forwarding Equivalence Class - This is like the "classid" concept
we have in the QoS code. In QoS it essentially refers to a queue;
in MPLS it will refer to an MPLS LSP/tunnel/label-operations to use.

1.1 Ingress LER:
Router at ingress of MPLS domain from IP cloud
(often confused as Ingress device ;->). Unlabelled
packets arrive at Ingress LER and get labelled based on:
a) IPV4/6 route setup
b) ingress classification rules (use u32 classifier)
c) tunnels like IPSEC using the SPI mapped to an MPLS label
d) L2 technologies, e.g. VLAN, PPP, ATM etc

1.2 Egress LER: 
Router at egress of MPLS domain towards IP cloud
(not to be confused with egress device on Linux).
Labelled packets come in and get their labels removed 
based on some rules.

1.3 LSR:
Switching based on labels 

2. Tables involved:
We can't ignore these table names because tons of SNMP MIBs exist
which at least talk about them; implementation is a different
issue but at least we should be able to somehow semantically match
them. The tables are the NHLFE, FTN and ILM.
The code should use similar names when possible.

ILM and FTN each derive a FECid from their respective lookups.
The result (FECid) is then used to look up
the NHLFE to determine how to forward the packet.
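
To make the two-step lookup concrete, here is a toy user-space model in Python. It is only a sketch: the dict-based tables, the function names and all addresses, labels and FECids are made up for illustration, and the FTN is modelled as a simple prefix-to-FECid map.

```python
import ipaddress

# NHLFE: FECid -> how to forward (values are illustrative only)
nhlfe = {
    10: {"nh": "192.0.2.1", "dev": "eth1", "ops": ["NO_POP_AND_FORWARD"]},
    20: {"nh": "192.0.2.9", "dev": "eth2", "ops": ["POP_AND_FORWARD"]},
}
# ILM: incoming label -> FECid
ilm = {16: 20}
# FTN, modelled here as route prefix -> FECid (an assumption of this sketch)
ftn = {ipaddress.ip_network("10.1.0.0/16"): 10}


def fec_from_label(label):
    """ILM lookup: labelled packet -> FECid, or None if the label is unknown."""
    return ilm.get(label)


def fec_from_dst(dst):
    """FTN lookup: longest-prefix match on the destination -> FECid or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ftn if addr in net]
    if not matches:
        return None
    return ftn[max(matches, key=lambda net: net.prefixlen)]


def forward_info(fec_id):
    """NHLFE lookup: FECid -> forwarding entry, or None if it does not exist."""
    return nhlfe.get(fec_id)
```

Either first-stage lookup feeds the same second stage, e.g. forward_info(fec_from_label(16)) or forward_info(fec_from_dst("10.1.2.3")).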

2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
This table is looked up using the FEC as the key (maybe
+ label space) although label spaces are still in the TODO below.

A standard structure for NHLFE contains:
- FEC id
- neighbor information (IPV4/6 + egress interface)
- MPLS operations to perform

The data in this table is to be used by the other two tables as mentioned
earlier.
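
As a rough illustration of that structure (not the kernel data structure, which is still a TODO item), an entry could be modelled like this in Python. The class name, field names and the dict keyed by FECid are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NhlfeEntry:
    fec_id: int                 # key used by ILM/FTN to find this entry
    proto: str                  # "ipv4" or "ipv6", used for neighbor binding
    neighbor: str               # next hop address
    egress_dev: str             # egress interface name
    ops: List[str] = field(default_factory=list)  # MPLS operations, in order


# The table itself: FECid -> entry (hypothetical in-memory representation).
nhlfe_table: Dict[int, NhlfeEntry] = {}


def nhlfe_insert(entry: NhlfeEntry) -> None:
    nhlfe_table[entry.fec_id] = entry


def nhlfe_lookup(fec_id: int) -> Optional[NhlfeEntry]:
    return nhlfe_table.get(fec_id)
```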

2.1.1 NHLFE Configuration:
The way I see it being set up is via netlink (this way we can take
advantage of distributed architectures later).

tc l2conf <cmd> dev <devname>
mpls nhlfe index <val> proto <ipv4|ipv6> nh <neighbor> 
<operation set> fec <FECid>
operation set := (op <operation>)* 

* cmd is one of:  <add | del | replace | get> 
* devname is the output device to be used
* index could be used to store the LSPid
* protocol to be used is one of IPV4 or V6 (used for neighbor binding) 
* neighbor is either an IPV4 or V6 address; (for neighbor binding)
* operation is the MPLS operation to perform followed by its
operands, if any. Note there could be a series of operations.
* FECid is the FEC identifier to be used as the key for searching.
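
To show how the "(op <operation>)*" part of the grammar could be read, here is a small Python sketch that parses the words following "mpls nhlfe". It is not the tc parser; the function name is hypothetical, and it assumes an operation's operands run until the next "op" or "fec" keyword.

```python
def parse_nhlfe(args):
    """Parse 'index <val> proto <p> nh <addr> (op <operation>)* fec <FECid>'."""
    tokens = args.split()
    out = {"ops": []}
    i = 0
    while i < len(tokens):
        key = tokens[i]
        if key in ("index", "proto", "nh", "fec"):
            if i + 1 >= len(tokens):
                raise ValueError(f"missing value for {key}")
            out[key] = tokens[i + 1]
            i += 2
        elif key == "op":
            # Collect the operation name plus any operands.
            j = i + 1
            while j < len(tokens) and tokens[j] not in ("op", "fec"):
                j += 1
            if j == i + 1:
                raise ValueError("op needs an operation name")
            out["ops"].append(tokens[i + 1:j])
            i = j
        else:
            raise ValueError(f"unexpected token {key!r}")
    return out
```

For example, parse_nhlfe("index 1 proto ipv4 nh 192.0.2.1 op POP_AND_FORWARD fec 10") yields one operation with no operands; an operation with operands would simply carry them in the same list.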

2.2 FEC to NHLFE mapping (FTN) Table

I don't see this table existing by itself.
Each MPLS interfacing component will derive a FECid which is used
to search the NHLFE table.

2.2.1 IPV4/6 route component FTN
Typically, the FEC will be in the IPV4/6 FIB nexthop entry. 
This way we can have equal cost multi path entries 
with different FECids.
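
A sketch of what per-nexthop FECids could look like, in Python. The route layout, the field names and the hash-based nexthop choice are assumptions for illustration; the text above only says the FEC lives in the FIB nexthop entry.

```python
import zlib

# One route with two equal-cost nexthops, each carrying its own FECid.
route = {
    "prefix": "10.1.0.0/16",
    "nexthops": [
        {"via": "192.0.2.1", "dev": "eth1", "fec_id": 10},
        {"via": "192.0.2.5", "dev": "eth2", "fec_id": 11},
    ],
}


def pick_nexthop(route, src, dst):
    """Pick a nexthop per flow; hashing keeps one flow on one path/FECid."""
    key = f"{src}>{dst}".encode()
    idx = zlib.crc32(key) % len(route["nexthops"])
    return route["nexthops"][idx]
```

The selected nexthop's fec_id is what would then be used to search the NHLFE table.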

2.2.2 ingress classification component:
This has nothing to do with the FTN; rather, it provides another mapping to
the NHLFE table.
(When I port the tc extension code to 2.6, we will need a new
skb field called FECid.)
*ingress code matches a packet description and then sets the skb->FECid
as an action. We could use the skb->FECid to overrule the FIB FEC
when we are selecting the fast path.
[The u32 classifier could be used to map based on any header bits and select
the FECid.]
skb->FECid could also be used on egress for QoS/TE purposes.

skb->FECid is meaningful even when not set by the tc-extension on ingress:
whenever we extract the FECid from the FTN and the lookup operation
is successful, we copy the FECid from the FIB/FTN to the skb->FECid.
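
The precedence described here could look roughly like the following Python sketch. The Skb class and select_fec() are hypothetical stand-ins; in particular, letting a classifier-set FECid win over the FIB value is the "could" from the text, not a settled decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skb:
    dst_addr: str
    fec_id: Optional[int] = None   # stand-in for the proposed skb->FECid


def select_fec(skb: Skb, fib_fec: Optional[int]) -> Optional[int]:
    """fib_fec is the FECid from the FIB/FTN lookup, or None if it failed."""
    if skb.fec_id is not None:
        # Set earlier by the ingress classifier: overrules the FIB FEC.
        return skb.fec_id
    if fib_fec is not None:
        # Successful FTN lookup: copy the FECid into the skb so that
        # later stages (e.g. egress QoS/TE) can see it.
        skb.fec_id = fib_fec
    return skb.fec_id
```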

2.2.3 Tunneling and L2 technologies FTN
Revisit this later.
Examples: IPSEC, tunnels, VLANs etc.
Again, by having the FEC stored in e.g. IPSEC-specific tables
you could easily select NHLFE entries and operate on, say,
an IPSEC packet going out. So this is similar to IPV4 and IPV6.
Same with the others.

2.2.4 NHLFE packet path:

As in standard Linux, the fast path is accessed first. Two
results are possible:
1) On success an MPLS cache entry is found and attached to skb->dst;
the skb->dst is then used to forward.
2) On failure a slow path is exercised and a new dst cache entry is created
from the NHLFE table.

There are two slow path sources: forwarded and locally sourced packets
are treated by route_output() whereas incoming packets are treated
by route_input().
On the input slow path, use the label to look up the FEC in the ILM.
On an LER, look up the respective service (IPV4/6) to find the FEC.

The FECid is then used to look up the NHLFE for the cache entry creation.
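
The fast path / slow path split can be sketched in Python as below. This is a toy model only: the dict-based cache, the exact-match "fib" table and the function names are assumptions, and dropping on a failed lookup follows the current policy mentioned in the TODO list.

```python
# Toy tables; every value here is made up for illustration.
nhlfe = {20: {"nh": "192.0.2.9", "dev": "eth2"}}  # FECid -> forwarding info
ilm = {16: 20}                                    # label -> FECid (input path)
fib = {"10.1.2.3": 20}                            # dst -> FECid (exact match, simplified)
dst_cache = {}                                    # stand-in for the dst cache


def slow_path_fec(direction, key):
    """Stand-in for route_input()/route_output(): find the FECid."""
    table = ilm if direction == "in" else fib
    return table.get(key)


def get_dst(direction, key):
    """Return (dst_entry, path); dst_entry is None when the packet is dropped."""
    cache_key = (direction, key)
    entry = dst_cache.get(cache_key)
    if entry is not None:
        return entry, "fast"
    # Slow path: derive the FECid, then build the cache entry from the NHLFE.
    entry = nhlfe.get(slow_path_fec(direction, key))
    if entry is None:
        return None, "drop"
    dst_cache[cache_key] = entry
    return entry, "slow"
```

The first packet for a given label or destination takes the slow path and populates the cache; later packets with the same key hit the fast path.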

2.2.5 Configuration IPV4/6 routing:
The ip tool should allow you to specify the route you want and then
specify the FECid for that route, i.e.:
ip route ... FECid <FECid>
where FECid is the NHLFE key id we want to use.
Note that multiple FECids can be given in conjunction with the "nexthop"
parameter for Equal Cost Multi Path.

Of course the route should fail to insert if the NHLFE FECid doesn't exist
already.
[??? What would happen if the route nexthop entry and the NHLFE point
to different egress devices?]

2.2.6 Configuration for others

They need to be netlink enabled. At the moment only ipsec is.

2.3 ILM (incoming label mapping):

Typical entries for this table are: label, ingress dev, FECid
Lookup is based on label.

ILM is used by both LSRs and egress LERs.

2.3.1 ILM packet processing:

Incoming packets:
- use label to lookup the dst cache via route_input()
- on failure, ILM lookup to find the NHLFE entry
	- FECid entry should exist within the ILM table
	- create dst cache entry on success
	- drop packet on failure

2.3.2 Configuration is:

tc l2conf <cmd> dev <devname>
mpls ilm index <val> label fec <FECid>

* cmd is one of:  <add | del | replace | get>
* devname is the input device to be used
* Index is an additional identifier that could be used to
store LSP info.
* FECid is the FECid to be used for searching the NHLFE.

3.0 Allowed OPCODEs

At the moment the following look valid:

3.1 Modifying opcodes

- REDIRECT: redirect a packet to a different LSP
(useful for testing or redirecting to a control plane)
- MIRROR: send a copy of a packet somewhere else for further
processing (useful for LSP pings, traceroute, debug etc)

3.2 Label action opcodes

- POP_AND_LOOKUP
- POP_AND_FORWARD
- NO_POP_AND_FORWARD
- DISCARD
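
One possible reading of these four opcodes, as a Python sketch operating on a label stack. The semantics are my interpretation of the names (the list above does not define them), and the enum, function name and verdict strings are hypothetical.

```python
from enum import Enum, auto


class Op(Enum):
    POP_AND_LOOKUP = auto()
    POP_AND_FORWARD = auto()
    NO_POP_AND_FORWARD = auto()
    DISCARD = auto()


def apply_op(op, label_stack):
    """Return (verdict, new_stack); label_stack[0] is the top label."""
    if op is Op.DISCARD:
        return "drop", []
    if op is Op.NO_POP_AND_FORWARD:
        return "forward", list(label_stack)
    if not label_stack:
        # Nothing to pop: treated as an error here (assumption).
        return "drop", []
    rest = list(label_stack[1:])
    if op is Op.POP_AND_FORWARD:
        return "forward", rest
    # POP_AND_LOOKUP: pop, then look up again on what is left
    # (next label, or the IP header if the stack is now empty).
    return "lookup", rest
```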

TODO:
1.  Look into multi next hop for load balancing for LSRs.
Is this necessary? If yes, there have to be multiple FECids
in the ILM table.
2.  Stats for each table, which may be tricky with caching.
3.  Describe policy for what happens when we have an error
(example: FECid exists in the IPV4 FIB but not in NHLFE;
current policy is drop but we could send this packet to
user space if there's a listening socket etc). The bad
thing about it is it could be used as a DoS.
4.  Label spaces: interfaces vs system.
5.  List all netlink events we want to throw.
6.  Add used data structures representing tables and other
things like IPV4/6 protocol drivers for NH binding.