mpls-linux-devel Mailing List for MPLS for Linux (Page 31)
Status: Beta
Brought to you by:
jleu
From: James R. L. <jl...@mi...> - 2003-12-08 17:31:27
See my comments inline.

On Sun, Dec 07, 2003 at 03:08:44PM +0100, Ramon Casellas wrote:
> James/Jamal/All,
>
> Thank you for submitting this document for review/comments.
>
> Some thoughts right away (more to follow). For the moment, I'll be giving
> my impressions w.r.t. the existing implementation and the patches posted
> on this list.
>
> Disclaimer: These are my own opinions, and they are in no way written
> in stone. You'll see that I do not agree with some aspects of the
> document, but I am open to discussion.
>
> A general comment (maybe I'm wrong, let me dig a little more into this)
> is that it is too hasty to discard the existing implementation without
> taking the time to understand it (which is understandable, since there was
> no documentation, not even code comments, but that is something I'm
> working on). I would like to know in which particular parts the
> existing implementation is flawed and cannot be corrected / extended /
> reviewed.

I agree. Thank you for being the one to state this.

> For the moment, I have not seen a design flaw or major valid reason to
> start from scratch.
>
> > 2. Tables involved:
> > We can't ignore these table names because tons of SNMP MIBs exist
> > which at least talk about them; implementation is a different
> > issue but at least we should be able to somehow semantically match
> > them. The tables are the NHLFE, FTN and ILM.
> > The code should use similar names when possible.

One thing to remember is that the names NHLFE, ILM and FTN are terms used
when talking about MPLS architectures. These are just logical 'tables' that
encompass required functionality. A great example of this is that the LSR MIB
does not have ILM or NHLFE tables, but refers to these logical entities in
the explanation of the in-segment and out-segment tables. So using the names
ILM and NHLFE in any implementation is misleading unless you limit the
implementation to be _only_ what is referred to by the MPLS architecture RFC.
As everyone knows, the architecture RFC is way too generic to use as the sole
basis of a forwarding plane.

> Agreed. One of the first things I noticed was the (apparent) lack of these
> tables in Jim's implementation. Hopefully, the devel-guide will explain
> this. What I have understood:

<snip>

> * A new part "classifier" that maps data objects (and not only address
> prefixes) to FECs. These points are discussed later in this document.

The FTN again is a logical block of functionality. The only reason an FTN
table should exist is so that a particular service can register a binding
with it. In other words, it is informational only. I do not think we want to
implement a generic FEC registration mechanism, because no matter what we do
it will never be flexible enough to handle new FEC definitions without
rewriting all of the old FEC definitions. Leave FEC binding to extensions of
existing tools which already deal with that type of traffic (i.e. iproute2,
iptables, tc, brctl, etc.).

> > ILM and FTN derive a FECid from their respective lookups
> > The result (FECid) is then used to lookup
> > the NHLFE to determine how to forward the packet.
> > 2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
> > This table is looked up using the FEC as the key (maybe
> > + label space) although label spaces are still in the TODO below.
> >
> > A standard structure for NHLFE contains:
> > - FEC id
>
> See my comments (*) below.
>
> > - neighbor information (IPV4/6 + egress interface)
>
> Yes. It is/should be in the MOI part (that is, the "second half" of the
> NHLFE).
>
> > - MPLS operations to perform
>
> The MII/MOI opcodes. With the benefit that if it is locally delivered,
> there's no need to check the MOI.
>
> (*) Comments:
> I'm afraid that I don't agree here. IMHO, we should not add the
> "FECid" indirection here. It has several drawbacks:
>
> - NHLFEs are FEC agnostic. The same NHLFE could be reused for different
> FECs. This is necessary, for example, for LSP merging.
>
> - The notion of FEC should only be defined at Ingress LSRs.
>
> - W.r.t. the ILM table, the FECid *is* the topmost label itself!
> Explicitly, the label represents the FEC w.r.t. a pair of
> upstream/downstream LSRs. The lookup should be label -> NHLFE (MII
> object). No need to manage FECids (allocation/removal/etc.).
>
> - In some cases, it is necessary to establish cross-connects without
> knowing the FEC that will be transported over the LSP (e.g. when working
> at more than 2 hierarchy levels): e.g. Incoming label
> (+labelspace+interface) -> Outgoing label + outgoing interface. No need
> to know the FEC here.
>
> With the notion of FECid, you have two issues: label management and FECid
> management. Let me explain myself a little more here: imagine we have a
> simple FEC F, 'all packets with @IPdest = A.B.C.D/N', well defined, so it
> should have a 'locally' unique FECid. This FECid cannot "non ambiguously"
> be used to look up a NHLFE (e.g. when received over two different
> interfaces). Of course the same argument applies to labels,
> but my point is "let the label itself identify the FEC"; do not add
> another indirection.

I agree. This is kind of the point I was getting at in my previous e-mail.

<snip>

> > 2.2.3 Tunneling and L2 technologies FTN
> > Revisit this later.
> Yes! :) Ethernet over MPLS should now be a primary objective.

Take a look at my l2cc code (in my p4 tree). If you ask me, this is the
correct separation. l2cc is more than just being able to transport L2 frames
over MPLS. It is a generic mechanism for implementing L2 switching and
splicing for Linux. If you look at any mature L2 over MPLS implementation, it
also has the ability to do local L2 switching/splicing.

<snip>

> > 3.2 Label action opcodes
>
> What's wrong with the existing opcodes? I see little performance gain in
> having X_AND_Y opcodes rather than X Y, sequentially. Atomic opcodes are
> (IMHO) the way to go. Otherwise we'll end up with
> POP_AND_SWAP_AND_PUSH_AND_MAPEXP. I am not saying that we
> need not try to optimize performance later, but chaining opcodes adds
> great flexibility.

I agree that trying to make single OPs that do multiple things is a little
silly. I guess you can just call me a RISC type of guy.

> > - POP_AND_LOOKUP
>
> POP & DLV ?

POP and PEEK

> > - POP_AND_FORWARD
> > - NO_POP_AND_FORWARD
> FWD
> > - DISCARD
> DROP
>
> > TODO:
> > 1. look into multi next hop for loadbalancing For LSRs.
> > Is this necessary? If yes, there has to be multiple FECids
> > in the ILM table.
>
> I've been working on load balancing in MPLS networks. The "right" approach
> is, as you state, to have several NHLFEs pointed to in the ILM table for a
> given label (+labelspace+..). However, another nice approach is to set up
> tunnels ("mpls%d") and then use an equalizer algorithm to split the load.
> This decouples the algorithm from the implementation. To test this, I played
> with having two mpls%d tunnels and using teql, which is interface
> "agnostic", and it worked well.

This approach only works for end-to-end LSP load balancing, which should not
even be an issue if we expose each possible LSP as a nexthop. Then standard
techniques for load balancing can be used.

> In academic research and IETF w.g. we have discussed Load Sharing several
> times, and the "current" consensus is that it is difficult to implement
> L.S. in split points other than the "per domain" Ingress LSRs. Since
> intermediate LSRs are not allowed to look at the IP header, no hash
> techniques (or not with enough granularity) can be used to make sure that
> packets belonging to the same microflow are forwarded over the same
> physical route.

This has been a pretty hot debate as of late on the MPLS-WG mailing list (as
part of the OAM framework draft). In general the accepted technique is to do
a layer violation and look past the MPLS shim. If the first nibble of the
payload is 4 or 6, then assume it's IP and do typical microflow load
balancing; otherwise use the label stack to create a hash. I have some
interesting ideas for areas of experimentation with regard to this. For now
I think we should make the statement that end-to-end and mid-stream load
balancing is needed, and should be configured by having multiple outgoing
labels configured (perhaps the FWD instruction could be an array of outgoing
labels?).

> In my opinion, what needs to be done:
>
> - Define a complete framework for FEC management at ingress LSRs, with
> policies to define:
> - How to classify "L2/L3 data objects" into FECs, without limiting
> FECs to IPvX address prefixes. How do we encode "if the Ethernet
> dst address is a:b:c:d:e:f and the ethertype is IPX then this data object
> belongs to FEC F"? Define related FEC encodings. Let the control plane
> apps distribute FEC information *as labels*. The non-ambiguous incoming
> label conveys all FEC information; do not add the FECid indirection.

See my above comment on this. I don't think we want to do this.

> - Define the protocol between MPLS subsystem and userspace

Agreed.

> - Develop (or adapt) a new userspace app (mplsnl in my previous
> mail) that communicates with the MPLS kernel subsystem in order to
> Get/Update MPLS tables.
>
> - Rewrite (I agree with previous comments that this is the least
> elegant part of the existing implementation) dst/neigh/hh cache
> management, once we have the whole outgoing MPLS PDU rebuilt.

I agree that the dst/neigh/hh stuff is not pretty. I think this is definitely
a place where some kernel gurus could provide some help.

> - Multicast "barebones" support (or, more adequately, point to
> multipoint LSP support), conceptually similar to Load Sharing: the
> incoming label + labelspaces + interface + ... (this all means "a non
> ambiguous incoming label") should determine a set of MIIs to apply. The
> implementation dependent details are, for example: are the MIIs to be
> applied iteratively (e.g. point to multipoint) or one among all (Load
> Sharing)?
>
> Things that I don't like from the existing implementation:
>
> - The Radix Tree/Label space implementation. Since mpls_recv_pckt (the
> callback when registering the packet handler) contains the incoming
> device, I'm still analyzing the drawbacks/advantages of having the "ILM
> global table" split into "per interface" ILM tables. That is, the incoming
> interface + the topmost incoming label are the "key" to find the MII
> object in a hash table.

The ramification here is whether or not you want to expose all of the 'label
programming' to the applications. Just think of signaling protocols that
allocate a label in labelspace 0. Or think about how you go about allocating
an application label (think of the 2nd label used with L2CC a la Martini).
I'm not saying the technique I used is the best, but it at least handles all
of the possible uses of labels.

This is a good conversation we have going. I hope others join in ;-)

--
James R. Leu
jl...@mi...
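A minimal sketch of the mid-stream load balancing heuristic described in the message above: walk the label stack to the bottom-of-stack entry, peek at the first nibble of the payload, hash the IP addresses if it looks like IPv4/IPv6, and otherwise fall back to a hash of the label values. The function name `lb_pick`, the use of FNV-1a as the hash, and the idea of returning an index into an array of outgoing labels are assumptions made for illustration; none of this is taken from the mpls-linux code. The shim layout (20-bit label, 3-bit EXP, S bit, 8-bit TTL) follows RFC 3032.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

/* FNV-1a over n bytes, continuing from hash state h (hash choice is arbitrary). */
static uint32_t fnv1a(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/*
 * Hypothetical helper: pick one of n_out outgoing labels for a labelled
 * packet. pkt points at the first shim entry, len is the remaining length.
 */
unsigned lb_pick(const uint8_t *pkt, size_t len, unsigned n_out)
{
    uint32_t label_hash = FNV_OFFSET;
    size_t off = 0;
    int bottom = 0;

    if (n_out == 0)
        return 0;

    /* Walk the stack; hash only the 20 label bits of each 4-byte entry. */
    while (!bottom && len - off >= 4) {
        uint8_t label[3] = { pkt[off], pkt[off + 1],
                             (uint8_t)(pkt[off + 2] & 0xF0) };
        label_hash = fnv1a(label_hash, label, sizeof label);
        bottom = pkt[off + 2] & 0x01;          /* S (bottom of stack) bit */
        off += 4;
    }

    /* Layer violation: guess the payload type from its first nibble. */
    if (bottom && off < len) {
        const uint8_t *ip = pkt + off;
        size_t left = len - off;
        unsigned version = ip[0] >> 4;

        if (version == 4 && left >= 20)        /* IPv4 src+dst: bytes 12..19 */
            return fnv1a(FNV_OFFSET, ip + 12, 8) % n_out;
        if (version == 6 && left >= 40)        /* IPv6 src+dst: bytes 8..39 */
            return fnv1a(FNV_OFFSET, ip + 8, 32) % n_out;
    }

    /* Not recognisably IP (or truncated): fall back to the label stack. */
    return label_hash % n_out;
}
```

With this shape, two packets of the same IP microflow map to the same outgoing label even if they arrive with different labels, while non-IP payloads (for example L2 frames) are still spread by label stack. A real implementation would likely also mix in ports or the IPv6 flow label; that is left out here.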
From: Vilyan D. <vdi...@ne...> - 2003-12-08 08:58:43
Hello,

I have comments on the last e-mails here (general notes, not regarding the
current implementation).

First, what is a FEC? RFC 3031:

   forwarding equivalence class   a group of IP packets which are
                                  forwarded in the same manner (e.g.,
                                  over the same path, with the same
                                  forwarding treatment)

   Most commonly, a packet is assigned to a FEC based (completely or
   partially) on its network layer destination address. However, the
   label is never an encoding of that address.

--------------------------------------------------------------------

So, we are free to decide how to classify packets into FECs. We should use at
least the network layer destination address, the so-called basic MPLS
implementation. Label bindings are advertised to other LSRs via some label
distribution protocol. This protocol should describe FECs to its neighbours.
Currently LDP (RFC 3036) supports 3 types of FEC elements: wildcard, host and
prefix (IPv4 and IPv6).

Classification by destination prefix only is implemented by many vendors. For
example, see the NPF MPLS Implementation Agreement. On the other hand,
draft-ietf-mpls-ftn-mib-09 defines more complex classification rules:

   mplsFTNTable allows 6-tuple matching rules based on one or more of
   source address range, destination address range, source port range,
   destination port range, IPv4 Protocol field [RFC791] or IPv6
   next-header field [RFC2460] and the DiffServ Code Point (DSCP,
   [RFC2474]) to be specified.

But I don't see support in label distribution protocols for such FEC elements.
Using only destination prefixes for classification will allow us to implement
the FTN table as an extension of the IPv4/IPv6 FIB. In this case we don't need
a separate Classification Engine. So, we should define classification rules.

> c) tunnels like IPSEC using the SPI mapped to an MPLS label
> d) L2 type of technologies ex VLAN, PPP, ATM etc

For transport of L2 frames we should use Pseudo Wires (see IETF PWE3 w.g.
documents). We don't need additional classification rules.

--
Regards,
Vilyan Dimitrov
Network Administrator
Net Is Sat Ltd.
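To illustrate the "FTN as an extension of the FIB" idea from the message above, here is a minimal sketch in which each IPv4 FIB entry simply carries an NHLFE index, so the ordinary longest-prefix match doubles as the FEC-to-NHLFE lookup. The struct layout, the name `ftn_lookup`, and the convention that `nhlfe_id == 0` means "no MPLS binding" are assumptions for illustration. The linear scan stands in for whatever trie or hash a real FIB uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIB entry extended with an MPLS binding. */
struct fib_entry {
    uint32_t prefix;      /* IPv4 prefix, host byte order */
    uint8_t  prefix_len;  /* 0..32 */
    uint32_t nhlfe_id;    /* assumed convention: 0 = plain IP, no label */
};

/* Netmask for a prefix length; avoids an undefined 32-bit shift for len 0. */
static uint32_t prefix_mask(uint8_t len)
{
    if (len == 0)
        return 0;
    if (len >= 32)
        return 0xFFFFFFFFu;
    return 0xFFFFFFFFu << (32 - len);
}

/*
 * Longest-prefix match over the FIB. The matching entry's nhlfe_id is the
 * FTN result: the destination prefix is the FEC, and no separate
 * classification engine is consulted.
 */
const struct fib_entry *ftn_lookup(const struct fib_entry *fib, size_t n,
                                   uint32_t dst)
{
    const struct fib_entry *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(fib[i].prefix_len);

        if ((dst & mask) != (fib[i].prefix & mask))
            continue;
        if (best == NULL || fib[i].prefix_len > best->prefix_len)
            best = &fib[i];
    }
    return best;
}
```

The trade-off raised elsewhere in the thread still applies: this covers prefix FECs only, so L2 or multi-field classification would need a binding stored in the tool that owns that traffic.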
From: Ramon C. <cas...@in...> - 2003-12-07 14:09:38
James/Jamal/All,

Thank you for submitting this document for review/comments.

Some thoughts right away (more to follow). For the moment, I'll be giving
my impressions w.r.t. the existing implementation and the patches posted
on this list.

Disclaimer: These are my own opinions, and they are in no way written
in stone. You'll see that I do not agree with some aspects of the
document, but I am open to discussion.

A general comment (maybe I'm wrong, let me dig a little more into this)
is that it is too hasty to discard the existing implementation without
taking the time to understand it (which is understandable, since there was
no documentation, not even code comments, but that is something I'm
working on). I would like to know in which particular parts the
existing implementation is flawed and cannot be corrected / extended /
reviewed.

For the moment, I have not seen a design flaw or major valid reason to
start from scratch.

> 2. Tables involved:
> We can't ignore these table names because tons of SNMP MIBs exist
> which at least talk about them; implementation is a different
> issue but at least we should be able to somehow semantically match
> them. The tables are the NHLFE, FTN and ILM.
> The code should use similar names when possible.

Agreed. One of the first things I noticed was the (apparent) lack of these
tables in Jim's implementation. Hopefully, the devel-guide will explain
this. What I have understood:

- The NHLFE is in fact split (which I consider a good design) into two
parts: MPLS Incoming Information (MII for short), a struct that has the
set of opcodes to execute upon arrival of the incoming packet, and MPLS
Outgoing Information (MOI for short), containing the set of opcodes to
execute when forwarding the packet. This also decouples forwarding from
labelled packets that are delivered to the host.

- The ILM table exists (it is in fact the Incoming Radix Tree). The "ILM"
lookup is indeed mpls_get_mii_by_label. It looks up the corresponding NHLFE
from a label (which is the MII object, see my previous paragraph).

So, yes:

- Some existing symbols should be renamed. For example the function
mpls_opcode_peek has been renamed (in my tree anyway) to
mpls_label_entry peek. In a similar way, we should define a high level
function like "mpls_ilm_lookup" which takes the topmost label entry and gets
the corresponding MII. These changes are trivial.

- The FTN table is indeed somehow missing, and is basically done in parts
using mplsadm: add an outgoing label (with an associated MOI, see below) and
then map some traffic to that label (this was mainly done by hacking some
net/core parts).

So the big missing part is FEC management (at the per MPLS-domain ingress
node). We *should not* assume that a FEC is always an IPv4/IPv6 address
prefix:

* We need a "generic" FEC encoding, so later we can also integrate L2 LSPs,
EoMPLS, etc.

* A new part "classifier" that maps data objects (and not only address
prefixes) to FECs. These points are discussed later in this document.

> ILM and FTN derive a FECid from their respective lookups
> The result (FECid) is then used to lookup
> the NHLFE to determine how to forward the packet.
> 2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
> This table is looked up using the FEC as the key (maybe
> + label space) although label spaces are still in the TODO below.
>
> A standard structure for NHLFE contains:
> - FEC id

See my comments (*) below.

> - neighbor information (IPV4/6 + egress interface)

Yes. It is/should be in the MOI part (that is, the "second half" of the
NHLFE).

> - MPLS operations to perform

The MII/MOI opcodes. With the benefit that if it is locally delivered,
there's no need to check the MOI.

(*) Comments:
I'm afraid that I don't agree here. IMHO, we should not add the
"FECid" indirection here. It has several drawbacks:

- NHLFEs are FEC agnostic. The same NHLFE could be reused for different
FECs. This is necessary, for example, for LSP merging.

- The notion of FEC should only be defined at Ingress LSRs.

- W.r.t. the ILM table, the FECid *is* the topmost label itself!
Explicitly, the label represents the FEC w.r.t. a pair of
upstream/downstream LSRs. The lookup should be label -> NHLFE (MII
object). No need to manage FECids (allocation/removal/etc.).

- In some cases, it is necessary to establish cross-connects without
knowing the FEC that will be transported over the LSP (e.g. when working
at more than 2 hierarchy levels): e.g. Incoming label
(+labelspace+interface) -> Outgoing label + outgoing interface. No need
to know the FEC here.

With the notion of FECid, you have two issues: label management and FECid
management. Let me explain myself a little more here: imagine we have a
simple FEC F, 'all packets with @IPdest = A.B.C.D/N', well defined, so it
should have a 'locally' unique FECid. This FECid cannot "non ambiguously"
be used to look up a NHLFE (e.g. when received over two different
interfaces). Of course the same argument applies to labels,
but my point is "let the label itself identify the FEC"; do not add
another indirection.

> 2.1.1 NHLFE Configuration:
> The way i see it being setup is via netlink (this way we can take
> advantage of distributed architectures later).

Definitely :). For updates/requests. However, this does not preclude the
use of procfs/sysfs for exposing read-only simple objects like
attributes/labelspaces/etc.

> tc l2conf <cmd> dev <devname>
> mpls nhlfe index <val> proto <ipv4|ipv6> nh <neighbor>

It is too soon to define a grammar for a userspace application. I would
work on the protocol between the kernel MPLS subsystem and userspace,
defining which information objects are required: define the netlink datagram
format first, and then define a userspace app. Moreover, I would propose
having a new userspace app (something like mplsadm) rather than patching tc,
ip route, etc., given the fact that most users won't be using MPLS at all.

> 2.2 FEC to NHLFE mapping (FTN) Table
>
> I dont see this table existing by itself.

It doesn't :)

> Each MPLS interfacing component will derive a FECid which is used
> to search the NHLFE table.

See my previous comment. The topmost label + the labelspace (or incoming
device) + optionally the upstream router (in the case that the downstream
cannot know the upstream router *and* has allocated the same label for the
same FEC) should suffice to derive the NHLFE (MII) to apply.

> 2.2.1 IPV4/6 route component FTN
> Typically, the FEC will be in the IPV4/6 FIB nexthop entry.
> This way we can have equal cost multi path entries
> with different FECids.

I'll discuss load sharing later in this mail.

> 2.2.2 ingress classification component:
> This has nothing to do with FTN rather it provides another mapping to
> the NHLFE table.
> (when i port tc extension code to 2.6 - we will need a new
> skb field called FECid);
> *ingress code matches a packet description and then sets the skb->FECid
> as an action. We could use the skb->FECid to overrule the FIB FEC
> when we are selecting the fast path.

Good point. But it should not manage FECids, it should manage NHLFE
entries.

> [The u32 classifier could be used to map based on any header bits and select
> the FECid.]

Semantically, it does map the "data object" to a FEC, but that does not
mean that it needs to explicitly manage FECids. FECids are the labels. If
you add the FECid notion you have two problems: label management and FECid
management. At most, "FECids" (if you really really want to add them)
should be managed only at the "per domain" I-LSR.

> 2.2.3 Tunneling and L2 technologies FTN
> Revisit this later.

Yes! :) Ethernet over MPLS should now be a primary objective.

> 2.2.4 NHLFE packet path:
>
> As in standard Linux, the fast path is first accessed. Two
> results:
> 1) On success a MPLS cache entry is found and attached to the skb->dst
> the skb->dst is used to forward.
> 2) On failure a slow path is exercised and a new dst cache is created
> from the NHLFE table.

Agreed. Nothing to add here.

> the FECid used to lookup the NHLFE for the cache entry creation.

:) Nope!! The "label" should be used.

> 2.2.5 Configuration IPV4/6 routing:
> The ip tool should allow you specify route you want then
> specify the FECid for that route, i.e:

Again, too soon to focus on userspace / control plane.

> [??? What would happen if the route nexthop entry and the NHLFE point
> to different egress devices?]

The NHLFE overrides the route nexthop. This is the basis of MPLS Traffic
Engineering. LSPs are not IGP constrained.

> 2.2.6 Configuration for others
>
> They need to be netlink enabled. At the moment only ipsec is.

What others? Anyway, a well defined netlink based protocol is, as you
state, much needed.

> 2.3 ILM (incoming label mapping):
>
> Typical entries for this table are: label, ingress dev, FECid
> Lookup is based on label.

And this is the Radix Tree.

> ILM is used by both LSR or egress LER.
>
> 2.3.1 ILM packet processing:
>
> Incoming packets:
> - use label to lookup the dst cache via route_input()

The label is used to look up a MII object from a Radix Tree that holds
information regarding what to do with the packet. Only in the case that the
packet needs to be forwarded do we obtain a MOI object. It is then that we
could:
* check the dst_cache as you state. Right, this fits well in the
mpls_output* family.

> 3.2 Label action opcodes

What's wrong with the existing opcodes? I see little performance gain in
having X_AND_Y opcodes rather than X Y, sequentially. Atomic opcodes are
(IMHO) the way to go. Otherwise we'll end up with
POP_AND_SWAP_AND_PUSH_AND_MAPEXP. I am not saying that we
need not try to optimize performance later, but chaining opcodes adds
great flexibility.

> - POP_AND_LOOKUP

POP & DLV ?

> - POP_AND_FORWARD
> - NO_POP_AND_FORWARD

FWD

> - DISCARD

DROP

> TODO:
> 1. look into multi next hop for loadbalancing For LSRs.
> Is this necessary? If yes, there has to be multiple FECids
> in the ILM table.

I've been working on load balancing in MPLS networks. The "right" approach
is, as you state, to have several NHLFEs pointed to in the ILM table for a
given label (+labelspace+..). However, another nice approach is to set up
tunnels ("mpls%d") and then use an equalizer algorithm to split the load.
This decouples the algorithm from the implementation. To test this, I played
with having two mpls%d tunnels and using teql, which is interface
"agnostic", and it worked well.

In academic research and IETF w.g. we have discussed Load Sharing several
times, and the "current" consensus is that it is difficult to implement
L.S. in split points other than the "per domain" Ingress LSRs. Since
intermediate LSRs are not allowed to look at the IP header, no hash
techniques (or not with enough granularity) can be used to make sure that
packets belonging to the same microflow are forwarded over the same
physical route.

In my opinion, what needs to be done:

- Define a complete framework for FEC management at ingress LSRs, with
policies to define:
- How to classify "L2/L3 data objects" into FECs, without limiting
FECs to IPvX address prefixes. How do we encode "if the Ethernet
dst address is a:b:c:d:e:f and the ethertype is IPX then this data object
belongs to FEC F"? Define related FEC encodings. Let the control plane
apps distribute FEC information *as labels*. The non-ambiguous incoming
label conveys all FEC information; do not add the FECid indirection.

- Define the protocol between MPLS subsystem and userspace.

- Develop (or adapt) a new userspace app (mplsnl in my previous
mail) that communicates with the MPLS kernel subsystem in order to
Get/Update MPLS tables.

- Rewrite (I agree with previous comments that this is the least
elegant part of the existing implementation) dst/neigh/hh cache
management, once we have the whole outgoing MPLS PDU rebuilt.

- Multicast "barebones" support (or, more adequately, point to
multipoint LSP support), conceptually similar to Load Sharing: the
incoming label + labelspaces + interface + ... (this all means "a non
ambiguous incoming label") should determine a set of MIIs to apply. The
implementation dependent details are, for example: are the MIIs to be
applied iteratively (e.g. point to multipoint) or one among all (Load
Sharing)?

Things that I don't like from the existing implementation:

- The Radix Tree/Label space implementation. Since mpls_recv_pckt (the
callback when registering the packet handler) contains the incoming
device, I'm still analyzing the drawbacks/advantages of having the "ILM
global table" split into "per interface" ILM tables. That is, the incoming
interface + the topmost incoming label are the "key" to find the MII
object in a hash table.

Thank you for reading.

Best regards,
Ramon
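The argument above for atomic, chainable opcodes (POP, PUSH, FWD, DLV, DROP executed one after another, instead of fused X_AND_Y opcodes) can be sketched as a tiny interpreter over a label stack. The opcode names loosely follow the ones mentioned in the thread, but the enum, the struct layouts and `mpls_run` are hypothetical and are not the mpls-linux MII/MOI definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8

/* Hypothetical atomic opcodes; a "swap" is simply POP followed by PUSH. */
enum mpls_op { OP_POP, OP_PUSH, OP_FWD, OP_DLV, OP_DROP };

struct mpls_instr {
    enum mpls_op op;
    uint32_t     label;   /* operand, only used by OP_PUSH */
};

struct label_stack {
    uint32_t labels[MAX_DEPTH];  /* labels[depth - 1] is the topmost label */
    int      depth;
};

enum mpls_verdict { MPLS_FORWARD, MPLS_DELIVER, MPLS_DROP };

/*
 * Run a chain of atomic opcodes against the packet's label stack.
 * FWD, DLV and DROP are terminal; POP and PUSH fall through to the
 * next instruction. Stack underflow or overflow drops the packet.
 */
enum mpls_verdict mpls_run(const struct mpls_instr *prog, size_t n,
                           struct label_stack *st)
{
    for (size_t i = 0; i < n; i++) {
        switch (prog[i].op) {
        case OP_POP:
            if (st->depth == 0)
                return MPLS_DROP;
            st->depth--;
            break;
        case OP_PUSH:
            if (st->depth == MAX_DEPTH)
                return MPLS_DROP;
            st->labels[st->depth++] = prog[i].label;
            break;
        case OP_FWD:
            return MPLS_FORWARD;
        case OP_DLV:
            return MPLS_DELIVER;
        case OP_DROP:
            return MPLS_DROP;
        }
    }
    /* A chain without a terminal opcode is treated as a drop here. */
    return MPLS_DROP;
}
```

In this shape the document's POP_AND_FORWARD is the chain {POP, FWD} and NO_POP_AND_FORWARD is just {FWD}, so new combinations need no new opcodes. Whether the per-instruction dispatch costs enough to justify fused opcodes is the performance question the thread leaves open.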
From: James R. L. <jl...@mi...> - 2003-12-05 16:49:41
I would like to start with one comment that has a cascading effect throughout
the document. I would like to clarify it, then either update the document
to reflect it, or go through each use of it and modify it if needed.

FECid - maybe this is just a naming thing, but the term FEC should never
show up in any table other than the FTN, nor should it show up in any part
of the MPLS forwarding plane. FECs are only valid in the FTN (duh, FEC to
NHLFE) and in the 'services' that use MPLS. For example, in IPv[4|6] you can
define a FEC as a specific entry in the FIB. Thus it makes sense to use the
term FEC when talking about how IPv[4|6] will 'bind' to NHLFEs. Although, as
Jamal stated, there will probably never exist a single FTN table. Each
'service' that utilizes MPLS will store its own FEC to NHLFE mapping. Thus
we can change the IPv[4|6] example from above to say that a specific entry
in the FIB will refer to a NHLFE.

So to sum it all up, I don't think we should use FECids in any part of the
net or MPLS stack. Instead we should be using the NHLFEid. Remember that
NHLFEs are indexed independently of the label value(s) they hold and
independently of the nexthop they point to. In the LSR MIB, out-segments
have an index which is generated by the system (most of the time
sequentially increasing).

So in section 2 we should change:

  ILM and FTN derive a FECid from their respective lookups
  The result (FECid) is then used to lookup
  the NHLFE to determine how to forward the packet.

to:

  ILM and FTN derive a NHLFEid from their respective lookups
  The resulting NHLFEid is used to lookup
  the NHLFE to determine how to forward the packet.

Again, this might all be naming issues, but to me the confusion of FEC vs
NHLFE is a fundamental one.

On Fri, Dec 05, 2003 at 10:06:00AM -0600, James R. Leu wrote:
> Here is the design doc that Jamal has created.
>
> Jamal: if there is a newer version please post it in this thread.
>
> --
> James R. Leu
> jl...@mi...
>
> --------------------------------------------------------------------
> 1. Terminology:
>
> LER (Label Edge Router): Router which sits at edge of
> IP(v4/6) and MPLS network.
> LSR: Router/switch which sits inside MPLS domain.
> FEC: Forwarding Equivalence Class - This is like the "classid" concept
> we have in the QoS code. In QoS it essentially refers to a queue;
> in MPLS it will refer to a MPLS LSP/tunnel/label-operations to use.
>
> 1.1 Ingress LER:
> Router at ingress of MPLS domain from IP cloud
> (often confused as Ingress device ;->). Unlabelled
> packets arrive at Ingress LER and get labelled based on:
> a) IPV4/6 route setup
> b) ingress classification rules (use u32 classifier)
> c) tunnels like IPSEC using the SPI mapped to an MPLS label
> d) L2 type of technologies ex VLAN, PPP, ATM etc
>
> 1.2 Egress LER:
> Router at egress of MPLS domain towards IP cloud
> (not to be confused with egress device on Linux).
> Labelled packets come in and get their labels removed
> based on some rules.
>
> 1.3 LSR:
> Switching based on labels
>
> 2. Tables involved:
> We can't ignore these table names because tons of SNMP MIBs exist
> which at least talk about them; implementation is a different
> issue but at least we should be able to somehow semantically match
> them. The tables are the NHLFE, FTN and ILM.
> The code should use similar names when possible.
>
> ILM and FTN derive a FECid from their respective lookups
> The result (FECid) is then used to lookup
> the NHLFE to determine how to forward the packet.
>
> 2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
> This table is looked up using the FEC as the key (maybe
> + label space) although label spaces are still in the TODO below.
>
> A standard structure for NHLFE contains:
> - FEC id
> - neighbor information (IPV4/6 + egress interface)
> - MPLS operations to perform
>
> The data in this table is to be used by the other two tables as mentioned
> earlier.
>
> 2.1.1 NHLFE Configuration:
> The way i see it being setup is via netlink (this way we can take
> advantage of distributed architectures later).
>
> tc l2conf <cmd> dev <devname>
> mpls nhlfe index <val> proto <ipv4|ipv6> nh <neighbor>
> <operation set> fec <FECid>
> operation set := (op <operation>)*
>
> * cmd is one of: <add | del | replace | get>
> * devname is the output device to be used
> * index could be used to store the LSPid
> * protocol to be used is one of IPV4 or V6 (used for neighbor binding)
> * neighbor is either an IPV4 or V6 address; (for neighbor binding)
> * operation is the MPLS operation to perform followed by its
> operands, if any. Note there could be a series of operations.
> * FECid is the FEC identifier to be used as the key for searching.
>
> 2.2 FEC to NHLFE mapping (FTN) Table
>
> I don't see this table existing by itself.
> Each MPLS interfacing component will derive a FECid which is used
> to search the NHLFE table.
>
> 2.2.1 IPV4/6 route component FTN
> Typically, the FEC will be in the IPV4/6 FIB nexthop entry.
> This way we can have equal cost multi path entries
> with different FECids.
>
> 2.2.2 ingress classification component:
> This has nothing to do with FTN; rather it provides another mapping to
> the NHLFE table.
> (when i port tc extension code to 2.6 - we will need a new
> skb field called FECid);
> *ingress code matches a packet description and then sets the skb->FECid
> as an action. We could use the skb->FECid to overrule the FIB FEC
> when we are selecting the fast path.
> [The u32 classifier could be used to map based on any header bits and select
> the FECid.]
> skb->FECid could also be used on egress for QoS/TE purposes.
>
> skb->FECid is meaningful even when not set by the tc-extension on ingress;
> So whenever we extract the FECid from the FTN and the lookup operation
> is successful you copy FECid from the FIB/FTN to the skb->FECid.
>
> 2.2.3 Tunneling and L2 technologies FTN
> Revisit this later.
> Example IPSEC, tunnels, VLANs etc etc:
> Again by having the FEC stored in e.g. IPSEC specific tables etc
> you could easily select NHLFE entries and operate on say
> an IPSEC packet going out. So this is similar to IPV4 and IPV6.
> Same with the others.
>
> 2.2.4 NHLFE packet path:
>
> As in standard Linux, the fast path is first accessed. Two
> results:
> 1) On success a MPLS cache entry is found and attached to the skb->dst
> the skb->dst is used to forward.
> 2) On failure a slow path is exercised and a new dst cache is created
> from the NHLFE table.
>
> There are two slow path sources: forwarded and locally sourced packets
> are treated by route_output() whereas incoming packets are treated
> by route_input()
> On input slow path use the label to lookup the FEC in the ILM.
> On LER lookup the respective service (IPV4/6) to find the FEC.
>
> the FECid used to lookup the NHLFE for the cache entry creation.
>
> 2.2.5 Configuration IPV4/6 routing:
> The ip tool should allow you to specify the route you want, then
> specify the FECid for that route, i.e:
> ip route ... FECid <FECid>
> where FECid is the NHLFE keyid we want to use
> Note that multiple FECids can be used in conjunction with the "nexthop"
> parameter for Equal Cost Multi Path.
>
> Of course the route should fail to insert if the NHLFE FECid doesn't exist
> already.
> [??? What would happen if the route nexthop entry and the NHLFE point
> to different egress devices?]
>
> 2.2.6 Configuration for others
>
> They need to be netlink enabled. At the moment only ipsec is.
>
> 2.3 ILM (incoming label mapping):
>
> Typical entries for this table are: label, ingress dev, FECid
> Lookup is based on label.
>
> ILM is used by both LSR or egress LER.
>
> 2.3.1 ILM packet processing:
>
> Incoming packets:
> - use label to lookup the dst cache via route_input()
> - on failure, ILM lookup to find the NHLFE entry
> - FECid entry should exist within the ILM table
> - create dst cache entry on success
> - drop packet on failure
>
> 2.3.2 Configuration is:
>
> tc l2conf <cmd> dev <devname>
> mpls ilm index <val> label fec <FECid>
>
> * cmd is one of: <add | del | replace | get>
> * devname is the input device to be used
> * Index is an additional identifier that could be used to
> store LSP info.
> * FECid is the FECid to be used for searching the NHLFE.
>
> 3.0 Allowed OPCODEs
>
> At the moment the following look valid:
>
> 3.1 Modifying opcodes
>
> - REDIRECT: redirect a packet to a different LSP
> (useful for testing or redirecting to a control plane)
> - MIRROR: send a copy of a packet somewhere else for further
> processing (useful for LSP pings, traceroute, debug etc)
>
> 3.2 Label action opcodes
>
> - POP_AND_LOOKUP
> - POP_AND_FORWARD
> - NO_POP_AND_FORWARD
> - DISCARD
>
> TODO:
> 1. look into multi next hop for loadbalancing For LSRs.
> Is this necessary? If yes, there has to be multiple FECids
> in the ILM table.
> 2. Stats for each table which may be tricky with caching.
> 3. describe policy for what happens when we have an error.
> (example FECid exists in the IPV4 FIB but not in NHLFE;
> current policy is drop but we could send this packet to
> user space if there's a listening socket etc). The bad
> thing about it is it could be used as a DOS.
> 4. Label spaces: Interfaces vs system
> 5. List all netlink events we want to throw.
> 6. Add used data structures representing tables and other
> things like IPV4/6 protocol drivers for NH binding.
Learn everything from the bash shell to sys admin. > Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click > _______________________________________________ > mpls-linux-devel mailing list > mpl...@li... > https://lists.sourceforge.net/lists/listinfo/mpls-linux-devel -- James R. Leu jl...@mi... |
From: Ramon C. <cas...@in...> - 2003-12-05 16:15:47
|
Hi all,

I'll be reading this doc over the weekend. Also, please note that a (lame && ongoing) effort to document the current implementation is at:
http://perso.enst.fr/~casellas/mpls-linux/index.html

It's a work in progress, but I'll be working on it this weekend.

R. |
From: James R. L. <jl...@mi...> - 2003-12-05 16:07:03
|
Here is the design doc that Jamal has created.

Jamal: if there is a newer version please post it in this thread.

--
James R. Leu
jl...@mi...

--------------------------------------------------------------------
1. Terminology:

LER (Label Edge Router): Router which sits at edge of
IP(v4/6) and MPLS network.
LSR: Router/switch which sits inside MPLS domain.
FEC: Forwarding Equivalence Class - This is like the "classid" concept
we have in the QoS code. In QoS it essentially refers to a queue;
in MPLS it will refer to a MPLS LSP/tunnel/label-operations to use.

1.1 Ingress LER:
Router at ingress of MPLS domain from IP cloud
(often confused as Ingress device ;->). Unlabelled
packets arrive at Ingress LER and get labelled based on:
a) IPV4/6 route setup
b) ingress classification rules (use u32 classifier)
c) tunnels like IPSEC using the SPI mapped to an MPLS label
d) L2 type of technologies, e.g. VLAN, PPP, ATM etc.

1.2 Egress LER:
Router at egress of MPLS domain towards IP cloud
(not to be confused with egress device on Linux).
Labelled packets come in and get their labels removed
based on some rules.

1.3 LSR:
Switching based on labels

2. Tables involved:
We can't ignore these table names because tons of SNMP MIBs exist
which at least talk about them; implementation is a different
issue but at least we should be able to somehow semantically match
them. The tables are the NHLFE, FTN and ILM.
The code should use similar names when possible.

ILM and FTN derive a FECid from their respective lookups.
The result (FECid) is then used to look up
the NHLFE to determine how to forward the packet.

2.1 Next Hop Label Forwarding Entry (NHLFE) Table:
This table is looked up using the FEC as the key (maybe
+ label space) although label spaces are still in the TODO below.

A standard structure for NHLFE contains:
- FEC id
- neighbor information (IPV4/6 + egress interface)
- MPLS operations to perform

The data on this table is to be used by the other two tables as mentioned
earlier.

2.1.1 NHLFE Configuration:
The way I see it being set up is via netlink (this way we can take
advantage of distributed architectures later).

tc l2conf <cmd> dev <devname>
mpls nhlfe index <val> proto <ipv4|ipv6> nh <neighbor>
<operation set> fec <FECid>
operation set := (op <operation>)*

* cmd is one of: <add | del | replace | get>
* devname is the output device to be used
* index could be used to store the LSPid
* protocol to be used is one of IPV4 or V6 (used for neighbor binding)
* neighbor is either an IPV4 or V6 address; (for neighbor binding)
* operation is the MPLS operation to perform followed by its
operands, if any. Note there could be a series of operations.
* FECid is the FEC identifier to be used as the key for searching.


2.2 FEC to NHLFE mapping (FTN) Table

I don't see this table existing by itself.
Each MPLS interfacing component will derive a FECid which is used
to search the NHLFE table.

2.2.1 IPV4/6 route component FTN
Typically, the FEC will be in the IPV4/6 FIB nexthop entry.
This way we can have equal cost multi path entries
with different FECids.

2.2.2 ingress classification component:
This has nothing to do with FTN; rather it provides another mapping to
the NHLFE table.
(when I port tc extension code to 2.6 - we will need a new
skb field called FECid);
*ingress code matches a packet description and then sets the skb->FECid
as an action. We could use the skb->FECid to overrule the FIB FEC
when we are selecting the fast path.
[The u32 classifier could be used to map based on any header bits and select
the FECid.]
skb->FECid could also be used on egress for QoS/TE purposes.

skb->FECid is meaningful even when not set by the tc-extension on ingress;
So whenever we extract the FECid from the FTN and the lookup operation
is successful you copy FECid from the FIB/FTN to the skb->FECid.

2.2.3 Tunneling and L2 technologies FTN
Revisit this later.
Example IPSEC, tunnels, VLANs etc etc:
Again by having the FEC stored in e.g. IPSEC specific tables etc
you could easily select NHLFE entries and operate on say
an IPSEC packet going out. So this is similar to IPV4 and IPV6.
Same with the others.

2.2.4 NHLFE packet path:

As in standard Linux, the fast path is first accessed. Two
results:
1) On success a MPLS cache entry is found and attached to the skb->dst;
the skb->dst is used to forward.
2) On failure a slow path is exercised and a new dst cache is created
from the NHLFE table.

There are two slow path sources: forwarded and locally sourced packets
are treated by route_output() whereas incoming packets are treated
by route_input()
On input slow path use the label to look up the FEC in the ILM.
On LER look up the respective service (IPV4/6) to find the FEC.

The FECid is then used to look up the NHLFE for the cache entry creation.

2.2.5 Configuration IPV4/6 routing:
The ip tool should allow you to specify the route you want, then
specify the FECid for that route, i.e.:
ip route ... FECid <FECid>
where FECid is the NHLFE keyid we want to use
Note that multiple FECids can be given in conjunction with the "nexthop"
parameter for Equal Cost Multi Path.

Of course the route should fail to insert if NHLFE FECid doesn't exist
already.
[??? What would happen if the route nexthop entry and the NHLFE point
to different egress devices?]

2.2.6 Configuration for others

They need to be netlink enabled. At the moment only ipsec is.

2.3 ILM (incoming label mapping):

Typical entries for this table are: label, ingress dev, FECid
Lookup is based on label.

ILM is used by both LSR and egress LER.

2.3.1 ILM packet processing:

Incoming packets:
- use label to look up the dst cache via route_input()
- on failure, ILM lookup to find the NHLFE entry
- FECid entry should exist within the ILM table
- create dst cache entry on success
- drop packet on failure

2.3.2 Configuration is:

tc l2conf <cmd> dev <devname>
mpls ilm index <val> label fec <FECid>

* cmd is one of: <add | del | replace | get>
* devname is the input device to be used
* Index is an additional identifier that could be used to
store LSP info.
* FECid is the FECid to be used for searching the NHLFE.

3.0 Allowed OPCODEs

At the moment the following look valid:

3.1 Modifying opcodes

- REDIRECT: redirect a packet to a different LSP
(useful for testing or redirecting to a control plane)
- MIRROR: send a copy of a packet somewhere else for further
processing (useful for LSP pings, traceroute, debug etc)

3.2 Label action opcodes

- POP_AND_LOOKUP
- POP_AND_FORWARD
- NO_POP_AND_FORWARD
- DISCARD

TODO:
1. look into multi next hop for load balancing for LSRs.
Is this necessary? If yes, there have to be multiple FECids
in the ILM table.
2. Stats for each table which may be tricky with caching.
3. describe policy for what happens when we have an error.
(example FECid exists in the IPV4 FIB but not in NHLFE;
current policy is drop but we could send this packet to
user space if there's a listening socket etc). The bad
thing about it is it could be used as a DoS.
4. Label spaces: Interfaces vs system
5. List all netlink events we want to throw.
6. Add used data structures representing tables and other
things like IPV4/6 protocol drivers for NH binding. |
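
The lookup chain in sections 2 through 2.3.1 (label -> ILM -> FECid -> NHLFE, drop on failure) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the struct layouts, the fixed-size tables and the names nhlfe_lookup, ilm_lookup and lsr_forward are hypothetical and do not come from the posted patches; only the opcode names are taken from section 3.2. The dst cache fast path is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Label action opcodes as named in section 3.2 of the design doc. */
enum mpls_op { POP_AND_LOOKUP, POP_AND_FORWARD, NO_POP_AND_FORWARD, DISCARD };

/* Hypothetical NHLFE entry: FECid key, egress interface, one operation. */
struct nhlfe { uint32_t fecid; int out_ifindex; enum mpls_op op; };

/* Hypothetical ILM entry: incoming label and ingress device -> FECid. */
struct ilm { uint32_t label; int in_ifindex; uint32_t fecid; };

static const struct nhlfe nhlfe_table[] = {
    { 10, 2, POP_AND_FORWARD },
    { 20, 3, NO_POP_AND_FORWARD },
};

static const struct ilm ilm_table[] = {
    { 1000, 1, 10 },
    { 2000, 1, 20 },
    { 3000, 1, 99 },   /* FECid 99 has no NHLFE entry on purpose */
};

/* Linear search keyed on FECid; a real table would be hashed. */
static const struct nhlfe *nhlfe_lookup(uint32_t fecid)
{
    for (size_t i = 0; i < sizeof(nhlfe_table) / sizeof(nhlfe_table[0]); i++)
        if (nhlfe_table[i].fecid == fecid)
            return &nhlfe_table[i];
    return NULL;
}

/* Linear search keyed on the incoming label only (section 2.3). */
static const struct ilm *ilm_lookup(uint32_t label)
{
    for (size_t i = 0; i < sizeof(ilm_table) / sizeof(ilm_table[0]); i++)
        if (ilm_table[i].label == label)
            return &ilm_table[i];
    return NULL;
}

/* Slow path for a labelled packet: label -> ILM -> FECid -> NHLFE.
 * Returns the egress ifindex, or -1 when the packet would be dropped
 * (unknown label, FECid without NHLFE entry, or DISCARD opcode). */
int lsr_forward(uint32_t label, enum mpls_op *op_out)
{
    const struct ilm *in = ilm_lookup(label);
    if (in == NULL)
        return -1;

    const struct nhlfe *out = nhlfe_lookup(in->fecid);
    if (out == NULL || out->op == DISCARD)
        return -1;

    if (op_out != NULL)
        *op_out = out->op;
    return out->out_ifindex;
}
```

With these sample tables, lsr_forward(1000, NULL) yields interface 2, while label 3000 is dropped because its FECid has no NHLFE entry, which is the error case TODO item 3 asks a policy for.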
From: Harald W. <la...@ne...> - 2003-12-02 07:08:33
|
On Sun, Nov 23, 2003 at 02:13:32PM +0100, Ramon Casellas wrote:
> Hi Harald,
>
> Thanks for your answer.

Thanks for creating the new mpls list ;)

> * Jamal and Dave are working on something. As soon as they release the
> code we can get a whole picture of the state of the project. I don't know
> the extent of their rewrite. Maybe jamal and/or Dave can give us an
> overview of their changes. We'll see.

that would definitely be interesting.

>
> * In the meantime, we are syncing efforts with James. Right now we have a
> patch that applies to test9. It has mostly been a RFC (Rewrite, Format and
> Comment). What I did was:
> - Format the existing code, add comments for each function, start
> documenting, clean up unneeded local variables, conform to C90
> - Rewrite the procfs implementation to use seq_files (I've tested a
> little and works)
> - Clean up the "tunnel" module.
> it compiles, boots and the entries are created in /proc/net/mpls/*
>
> * Your netfilter and LSP Ping impl. patch would be most welcome. IIRC,
> James was also working on netfilter (I haven't, so I CC'ed him
> here). James, can you give us an overview of that?

Ok. As this was written on contract, I am waiting for that company to
give permission to publish that code (any day now).

> * I'm waiting (should come in some days) for James to commit my latest
> changes to the P4 repository.

P4 repository? Could you please enlighten me on what this is?

> * CVS and sourceforge releases are out of date. We have two
> possibilities: either you set up a p4 client and wait for James's commit, or
> I can provide you with a unified diff for 2.6.0-test9 using my private
> copy.

thanks, I've read the patch you've posted

> However, I would be glad to take a look at your patches, even if they
> apply to the old code.

They would apply to some quite old version of James R. Leu's code.
Anyway, I'll get back to you as soon as I have the permission to publish
that code.

> Best regards

--
- Harald Welte <la...@ne...> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going on
while IP was being designed." -- Paul Vixie |
From: James R. L. <jl...@mi...> - 2003-12-01 01:21:26
|
Sorry for lack of activity over the last week. I'm helping build/run the
network for a tradeshow in Chicago, IL, USA. I'll be done with my part of
this gig on Tuesday. My first order of business is to catch up with this
list and the e-mail surrounding it.

Laters.

On Sun, Nov 30, 2003 at 12:49:29PM -0500, jamal wrote:
> Hi Ramon,
> I guess this is the first email on that list? I am not even sure if I am
> on it or not. I sent a design description to James last week and was
> hoping to receive some feedback. Are you guys discussing in the
> background? I haven't seen anything back.
>
> cheers,
> jamal
>
> On Sun, 2003-11-30 at 10:36, Ramon Casellas wrote:
> > Hi James/Jamal/Harald/all,
> >
> > Well, yet another experimental release.
> >
> > BIG FAT WARNING:
> > =====================
> >
> > The kernel compiles and boots. proc entries are created and show the mpls
> > subsystem status, tunnels can be created, and netlink "link layer" is set
> > up, but the user plane (forwarding) implementation does not work (most
> > notably, it is impossible to add MOIs, given the latest changes in
> > dst/neigh/etc.). Although ioctls are enabled for debug purposes,
> > work is in progress to port to netlink (we need to define the MPLS
> > kernel-userspace protocol, but a first step would be to re-use the
> > existing ioctl data structures).
> >
> > I have (non-officially!) #define'd NETLINK_MPLS 9. Maybe the big guys from
> > netdev@ can allocate a number for us :)
> >
> > Regards,
> > Ramon
> >
> > // -------------------------------------------------------------------
> > // Ramon Casellas - GET/ENST/INFRES/RHD/A508 - cas...@in...
> >
> > PS:
> > * Further discussion on mpls-linux-devel only. Sorry if you received
> > several copies of this email.
> >
> > Releases:
> > =====================
> > I am not proj. admin, and I cannot perform file release operations, so in
> > order not to fill up your inboxes, releases can be found in:
> >
> > * Latest diff (2.6.0-test11 -> 2.6.0-test11-mpls)
> >
> > http://perso.enst.fr/~casellas/mpls-linux/linux-2.6.0-test11-mpls-1.179-rcas1.diff
> >
> > * I have started porting mplsadm -> mplsnl to use netlink.
> >
> > http://perso.enst.fr/~casellas/mpls-linux/mplsnl-0.1.tar.gz
> >
> > Overview:
> > =====================
> > - I ***'up yesterday's diff (again, I know!, I know, bear with
> > me, I'm a little braind..) missing define
> >
> > - Fixed a bug that I introduced in mpls labelspace handling.
> >
> > - Added basic infrastructure for MPLS netlink. Userspace apps can
> > communicate with MPLS subsystem. The "link layer" is set up. We need
> > to define a Userspace <-> Kernel Protocol (we can start porting the
> > ioctl mechanism to netlink). cf. the userspace mplsnl app. for
> > details. Tested
> >
> > Changes w.r.t yesterday's release: (cf. yesterday's mail for details)
> > ====================================================================
> >
> > * linux-2.6.0-test11-mpls/net/mpls/mpls_netlink.c
> > - Added file
> > - Modified net/mpls/mpls.h (add prototypes).
> >
> > * linux-2.6.0-test11-mpls/include/linux/mpls.h
> > - Added #define mir_set_tc mir_data.set_tc which was missing in
> > previous diff
> >
> > * linux-2.6.0-test11/include/linux/netlink.h
> > - Allocated a new Netlink for MPLS (non official!)
> > #ifdef CONFIG_MPLS
> > #define NETLINK_MPLS 9
> > #endif
> >
> > * linux-2.6.0-test11/net/netlink/netlink_dev.c 2003-11-27
> > + {
> > .name = "mpls",
> > .minor = NETLINK_MPLS,
> > },
> >
> > ...
> >
> > Some tests:
> > =====================
> > ------------------------------------------
> > gandalf mplsadm# dmesg | grep MPLS
> > MPLS Support v1.0 James R. Leu <jl...@mi...> -- Ramon Casellas <cas...@in...>
> > MPLS Initializing Input Radix Tree
> > MPLS Initializing Output Radix Tree
> > MPLS Initializing MPLS ProcFS Interface
> > MPLS MultiProtocol Label Switching Tunnel Module
> > MPLS (c) 1999-2003 JLeu -- (c) 2003 RCas
> > MPLS version 1.179 2003/11/20
> > MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35844
> > MPLS DEBUG net/mpls/mpls_ioctls.c:93:mpls_ioctl: IOCTL Add Tunnel
> > MPLS DEBUG net/mpls/mpls_tunnel.c:366:mpls_tunnel_create: Allocated MPLS tunnel mpls0
> > MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35842
> > MPLS DEBUG net/mpls/mpls_ioctls.c:55:mpls_ioctl: IOCTL Set LabelSpace
> >
> > ------------------------------------------
> > gandalf mplsadm# cat /proc/net/mpls/*
> > Debug: 1
> > Label spaces
> > ---------------
> > ath0 5 3
> > mpls0 -1 2
> >
> > mpls0 0x00000000
> > 01010709
> >
> > ------------------------------------------
> > gandalf mplsadm# cat /sys/class/net/mpls0/*
> > 4
> > 00:00:00:00
> > 00:00:00:00
> > 0x0
> > 0x90
> > 6
> > 6
> > 1500
> > cat: /sys/class/net/mpls0/statistics: Is a directory
> > 0
> > 899
> >
> > ------------------------------------------
> > netlink
> > gandalf mplsnl# ./mplsnl
> > MPLSNETLINK mplsnl.h:284:open: Socket descriptor 3
> > MPLSNETLINK mplsnl.h:285:open: Local Addr 11857 (len 12)
> > MPLSNETLINK mplsnl.h:286:open: Peer Addr 0 (len 12)
> > MPLSNETLINK mplsnl.h:315:send_msg: SizeOf nlmsg_len 24 (24)
> > MPLSNETLINK answers: Invalid argument
> >
> > dmesg
> > MPLS DEBUG net/mpls/mpls_netlink.c:204:mpls_netlink_rcv: Enter
> > MPLS DEBUG net/mpls/mpls_netlink.c:137:mpls_netlink_rcv_msg: Received netlink request 1 of len 24 from pid 11854
> > MPLS DEBUG net/mpls/mpls_netlink.c:226:mpls_netlink_rcv: Exit

--
James R. Leu
jl...@mi... |
From: jamal <ha...@cy...> - 2003-11-30 17:50:38
|
Hi Ramon,
I guess this is the first email on that list? I am not even sure if I am
on it or not. I sent a design description to James last week and was
hoping to receive some feedback. Are you guys discussing in the
background? I haven't seen anything back.

cheers,
jamal

On Sun, 2003-11-30 at 10:36, Ramon Casellas wrote:
> Hi James/Jamal/Harald/all,
>
> Well, yet another experimental release.
>
> BIG FAT WARNING:
> =====================
>
> The kernel compiles and boots. proc entries are created and show the mpls
> subsystem status, tunnels can be created, and netlink "link layer" is set
> up, but the user plane (forwarding) implementation does not work (most
> notably, it is impossible to add MOIs, given the latest changes in
> dst/neigh/etc.). Although ioctls are enabled for debug purposes,
> work is in progress to port to netlink (we need to define the MPLS
> kernel-userspace protocol, but a first step would be to re-use the
> existing ioctl data structures).
>
> I have (non-officially!) #define'd NETLINK_MPLS 9. Maybe the big guys from
> netdev@ can allocate a number for us :)
>
> Regards,
> Ramon
>
> // -------------------------------------------------------------------
> // Ramon Casellas - GET/ENST/INFRES/RHD/A508 - cas...@in...
>
> PS:
> * Further discussion on mpls-linux-devel only. Sorry if you received
> several copies of this email.
>
> Releases:
> =====================
> I am not proj. admin, and I cannot perform file release operations, so in
> order not to fill up your inboxes, releases can be found in:
>
> * Latest diff (2.6.0-test11 -> 2.6.0-test11-mpls)
>
> http://perso.enst.fr/~casellas/mpls-linux/linux-2.6.0-test11-mpls-1.179-rcas1.diff
>
> * I have started porting mplsadm -> mplsnl to use netlink.
>
> http://perso.enst.fr/~casellas/mpls-linux/mplsnl-0.1.tar.gz
>
> Overview:
> =====================
> - I ***'up yesterday's diff (again, I know!, I know, bear with
> me, I'm a little braind..) missing define
>
> - Fixed a bug that I introduced in mpls labelspace handling.
>
> - Added basic infrastructure for MPLS netlink. Userspace apps can
> communicate with MPLS subsystem. The "link layer" is set up. We need
> to define a Userspace <-> Kernel Protocol (we can start porting the
> ioctl mechanism to netlink). cf. the userspace mplsnl app. for
> details. Tested
>
> Changes w.r.t yesterday's release: (cf. yesterday's mail for details)
> ====================================================================
>
> * linux-2.6.0-test11-mpls/net/mpls/mpls_netlink.c
> - Added file
> - Modified net/mpls/mpls.h (add prototypes).
>
> * linux-2.6.0-test11-mpls/include/linux/mpls.h
> - Added #define mir_set_tc mir_data.set_tc which was missing in
> previous diff
>
> * linux-2.6.0-test11/include/linux/netlink.h
> - Allocated a new Netlink for MPLS (non official!)
> #ifdef CONFIG_MPLS
> #define NETLINK_MPLS 9
> #endif
>
> * linux-2.6.0-test11/net/netlink/netlink_dev.c 2003-11-27
> + {
> .name = "mpls",
> .minor = NETLINK_MPLS,
> },
>
> ...
>
> Some tests:
> =====================
> ------------------------------------------
> gandalf mplsadm# dmesg | grep MPLS
> MPLS Support v1.0 James R. Leu <jl...@mi...> -- Ramon Casellas <cas...@in...>
> MPLS Initializing Input Radix Tree
> MPLS Initializing Output Radix Tree
> MPLS Initializing MPLS ProcFS Interface
> MPLS MultiProtocol Label Switching Tunnel Module
> MPLS (c) 1999-2003 JLeu -- (c) 2003 RCas
> MPLS version 1.179 2003/11/20
> MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35844
> MPLS DEBUG net/mpls/mpls_ioctls.c:93:mpls_ioctl: IOCTL Add Tunnel
> MPLS DEBUG net/mpls/mpls_tunnel.c:366:mpls_tunnel_create: Allocated MPLS tunnel mpls0
> MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35842
> MPLS DEBUG net/mpls/mpls_ioctls.c:55:mpls_ioctl: IOCTL Set LabelSpace
>
> ------------------------------------------
> gandalf mplsadm# cat /proc/net/mpls/*
> Debug: 1
> Label spaces
> ---------------
> ath0 5 3
> mpls0 -1 2
>
> mpls0 0x00000000
> 01010709
>
> ------------------------------------------
> gandalf mplsadm# cat /sys/class/net/mpls0/*
> 4
> 00:00:00:00
> 00:00:00:00
> 0x0
> 0x90
> 6
> 6
> 1500
> cat: /sys/class/net/mpls0/statistics: Is a directory
> 0
> 899
>
> ------------------------------------------
> netlink
> gandalf mplsnl# ./mplsnl
> MPLSNETLINK mplsnl.h:284:open: Socket descriptor 3
> MPLSNETLINK mplsnl.h:285:open: Local Addr 11857 (len 12)
> MPLSNETLINK mplsnl.h:286:open: Peer Addr 0 (len 12)
> MPLSNETLINK mplsnl.h:315:send_msg: SizeOf nlmsg_len 24 (24)
> MPLSNETLINK answers: Invalid argument
>
> dmesg
> MPLS DEBUG net/mpls/mpls_netlink.c:204:mpls_netlink_rcv: Enter
> MPLS DEBUG net/mpls/mpls_netlink.c:137:mpls_netlink_rcv_msg: Received netlink request 1 of len 24 from pid 11854
> MPLS DEBUG net/mpls/mpls_netlink.c:226:mpls_netlink_rcv: Exit |
From: Ramon C. <cas...@in...> - 2003-11-30 15:36:09
|
Hi James/Jamal/Harald/all,

Well, yet another experimental release.

BIG FAT WARNING:
=====================

The kernel compiles and boots. proc entries are created and show the mpls
subsystem status, tunnels can be created, and netlink "link layer" is set
up, but the user plane (forwarding) implementation does not work (most
notably, it is impossible to add MOIs, given the latest changes in
dst/neigh/etc.). Although ioctls are enabled for debug purposes,
work is in progress to port to netlink (we need to define the MPLS
kernel-userspace protocol, but a first step would be to re-use the
existing ioctl data structures).

I have (non-officially!) #define'd NETLINK_MPLS 9. Maybe the big guys from
netdev@ can allocate a number for us :)

Regards,
Ramon

// -------------------------------------------------------------------
// Ramon Casellas - GET/ENST/INFRES/RHD/A508 - cas...@in...

PS:
* Further discussion on mpls-linux-devel only. Sorry if you received
several copies of this email.

Releases:
=====================
I am not proj. admin, and I cannot perform file release operations, so in
order not to fill up your inboxes, releases can be found in:

* Latest diff (2.6.0-test11 -> 2.6.0-test11-mpls)

http://perso.enst.fr/~casellas/mpls-linux/linux-2.6.0-test11-mpls-1.179-rcas1.diff

* I have started porting mplsadm -> mplsnl to use netlink.

http://perso.enst.fr/~casellas/mpls-linux/mplsnl-0.1.tar.gz

Overview:
=====================
- I ***'up yesterday's diff (again, I know!, I know, bear with
me, I'm a little braind..) missing define

- Fixed a bug that I introduced in mpls labelspace handling.

- Added basic infrastructure for MPLS netlink. Userspace apps can
communicate with MPLS subsystem. The "link layer" is set up. We need
to define a Userspace <-> Kernel Protocol (we can start porting the
ioctl mechanism to netlink). cf. the userspace mplsnl app. for
details. Tested

Changes w.r.t yesterday's release: (cf. yesterday's mail for details)
====================================================================

* linux-2.6.0-test11-mpls/net/mpls/mpls_netlink.c
- Added file
- Modified net/mpls/mpls.h (add prototypes).

* linux-2.6.0-test11-mpls/include/linux/mpls.h
- Added #define mir_set_tc mir_data.set_tc which was missing in
previous diff

* linux-2.6.0-test11/include/linux/netlink.h
- Allocated a new Netlink for MPLS (non official!)
#ifdef CONFIG_MPLS
#define NETLINK_MPLS 9
#endif

* linux-2.6.0-test11/net/netlink/netlink_dev.c 2003-11-27
+ {
.name = "mpls",
.minor = NETLINK_MPLS,
},

...

Some tests:
=====================
------------------------------------------
gandalf mplsadm# dmesg | grep MPLS
MPLS Support v1.0 James R. Leu <jl...@mi...> -- Ramon Casellas <cas...@in...>
MPLS Initializing Input Radix Tree
MPLS Initializing Output Radix Tree
MPLS Initializing MPLS ProcFS Interface
MPLS MultiProtocol Label Switching Tunnel Module
MPLS (c) 1999-2003 JLeu -- (c) 2003 RCas
MPLS version 1.179 2003/11/20
MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35844
MPLS DEBUG net/mpls/mpls_ioctls.c:93:mpls_ioctl: IOCTL Add Tunnel
MPLS DEBUG net/mpls/mpls_tunnel.c:366:mpls_tunnel_create: Allocated MPLS tunnel mpls0
MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35842
MPLS DEBUG net/mpls/mpls_ioctls.c:55:mpls_ioctl: IOCTL Set LabelSpace

------------------------------------------
gandalf mplsadm# cat /proc/net/mpls/*
Debug: 1
Label spaces
---------------
ath0 5 3
mpls0 -1 2

mpls0 0x00000000
01010709

------------------------------------------
gandalf mplsadm# cat /sys/class/net/mpls0/*
4
00:00:00:00
00:00:00:00
0x0
0x90
6
6
1500
cat: /sys/class/net/mpls0/statistics: Is a directory
0
899

------------------------------------------
netlink
gandalf mplsnl# ./mplsnl
MPLSNETLINK mplsnl.h:284:open: Socket descriptor 3
MPLSNETLINK mplsnl.h:285:open: Local Addr 11857 (len 12)
MPLSNETLINK mplsnl.h:286:open: Peer Addr 0 (len 12)
MPLSNETLINK mplsnl.h:315:send_msg: SizeOf nlmsg_len 24 (24)
MPLSNETLINK answers: Invalid argument

dmesg
MPLS DEBUG net/mpls/mpls_netlink.c:204:mpls_netlink_rcv: Enter
MPLS DEBUG net/mpls/mpls_netlink.c:137:mpls_netlink_rcv_msg: Received netlink request 1 of len 24 from pid 11854
MPLS DEBUG net/mpls/mpls_netlink.c:226:mpls_netlink_rcv: Exit |
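
The "nlmsg_len 24" in the trace above would correspond, under the usual netlink framing, to a 16-byte message header followed by 8 bytes of payload. Below is a minimal sketch of that arithmetic, assuming the standard struct nlmsghdr field layout and 4-byte alignment. The sketch_ names are made up for this illustration; they mirror, but are not, the kernel's NLMSG_ALIGN / NLMSG_LENGTH / NLMSG_SPACE macros, and nothing here is taken from mplsnl or mpls_netlink.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 16-byte netlink message header, so the sketch
 * compiles without Linux headers (assumption: same field order and
 * widths as struct nlmsghdr). */
struct sketch_nlmsghdr {
    uint32_t nlmsg_len;    /* total message length, header included */
    uint16_t nlmsg_type;   /* message type */
    uint16_t nlmsg_flags;  /* request flags */
    uint32_t nlmsg_seq;    /* sequence number */
    uint32_t nlmsg_pid;    /* sender port id */
};

#define SKETCH_ALIGNTO 4u

/* Round len up to the next multiple of 4 bytes. */
static size_t sketch_align(size_t len)
{
    return (len + SKETCH_ALIGNTO - 1) & ~(size_t)(SKETCH_ALIGNTO - 1);
}

/* Value to store in nlmsg_len for a payload of `payload` bytes:
 * aligned header size plus the unpadded payload. */
size_t sketch_msg_length(size_t payload)
{
    return payload + sketch_align(sizeof(struct sketch_nlmsghdr));
}

/* Buffer space the message occupies once its tail is padded. */
size_t sketch_msg_space(size_t payload)
{
    return sketch_align(sketch_msg_length(payload));
}
```

For example, sketch_msg_length(8) gives 24, matching the length printed by both mplsnl and the kernel side; which 8 bytes the test program actually sent is not shown in the mail.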
From: Ramon C. <cas...@in...> - 2003-11-29 16:15:57
|
Hi James/all, As requested by private email, and given that James must still be eating turkey and that some of you do not have access to the p4 depot, I'm sending you a developer snapshot of my current tree, with the patch for linux-2.6.0-test11. Please note that this is only my private tree and not the official p4 head version. The may goal is to expose current status of the project, and eventually get some feedback. BIG FAT WARNING: The kernel compiles and boots. proc entries are created and show the mpls subsystem status, but the user plane implementation does not work (most notably, it is impossible to add MOIs, given the lates changes in dst/neigh/etc.) The main changes are: ------------------------ * Ported to test11. * W.R.T p4 head, I have re-enabled the ioctls, maily for debugging. * Latest p4 sync version is out of date and does not apply cleanly to test1= 1. * Code has been cleaned up, rewritten in some cases and commented * the mpls_instruction_copy method has been removed. Now, to commit a well formed instruction to a MOI or MII we just use memcpy (since James told me that this was the main goal of instructio_copy and that there is no pointer aliasing). * the mpls_tunnel (to create virtual point to point unidirectional interfaces) has seen some changes, the API has been unified. * New macros for debugging. This means that each debug message is prefixed with __FILE__ __FUNCTION__ __LINE__. All the const char* fn_name hacks are deprecated. * The procfs has been rewritten to use seq_files. Output is more verbose, and files are created under /proc/net/mpls. This change will break current userspace applications. * bug fixes setting labelspaces per interface (introduced by myself :) * misc changes... Status ----------------------- * The kernel compiles and boots, and has minimal functionality. As of today, only built-in works (there are some core parts that are protected by #ifdef CONFIG_MPLS, which should also consider the CONFIG_MPLS_MODULE option). 
* There are showstopper bugs. For now, most are due to the IOCTL interface, since we allocate labels (kmem_cache_alloc) and we might sleep. BUGS --------------------------------------------- MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35847 Debug: sleeping function called from invalid context at mm/slab.c:1856 in_atomic():1, irqs_disabled():0 Call Trace: [<c012a74c>] __might_sleep+0xac/0xe0 [<c0161d1a>] kmem_cache_alloc+0x1da/0x1e0 [<c03a53d5>] mpls_insert_moi+0x185/0x1e0 [<c03a5857>] mpls_add_out_label+0xa7/0x180 [<c03a3f58>] mpls_ioctl+0x658/0x9e0 MPLS DEBUG net/mpls/mpls_instr.c:171:mpls_instruction_clear: exit Debug: sleeping function called from invalid context at mm/slab.c:1856 in_atomic():1, irqs_disabled():0 Call Trace: [<c012a74c>] __might_sleep+0xac/0xe0 [<c0161d1a>] kmem_cache_alloc+0x1da/0x1e0 [<c03a1edb>] mpls_insert_mii+0x17b/0x1a0 [<c03a213f>] mpls_add_in_label+0xdf/0x170 MPLS DEBUG net/mpls/mpls_ioctls.c:38:mpls_ioctl: IOCTL 35864 MPLS DEBUG net/mpls/mpls_out_info.c:210:mpls_set_out_label_instructions: en= ter MPLS DEBUG net/mpls/mpls_instr.c:219:mpls_instruction_build: enter MPLS DEBUG net/mpls/mpls_utils.c:150:mpls_make_dst: enter ------------[ cut here ]------------ kernel BUG at net/mpls/mpls_utils.c:151! 
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c03a68ed>] Tainted: P
EFLAGS: 00010246
EIP is at mpls_make_dst+0x2dd/0x330
eax: 00000000 ebx: 00000000 ecx: 00000001 edx: c041ddd8
esi: ec26b65c edi: ec26b65c ebp: ec26b568 esp: ec26b544
	ds: 007b es: 007b ss: 0068
Process mplsadm2 (pid: 9146, threadinfo=ec26a000 task=ed98c800)
	Stack: c03e9184 c04079cf 00000096 c03d0f34 ec26b568 00000000 c1bda074 ec26b65c
	ec26b65c ec26b5bc c03a81cd 00000002 ec26b76c 00000000 c03d1004 ec26b5c8
	f79621a0 00000001 00000001 00000009 00000001 00000001 00000000 00000000
	Call Trace:
	[<c03a81cd>] mpls_instruction_build+0x8dd/0x10c0
	[<c03a54b5>] mpls_set_out_label_instructions+0x75/0x1b0
	[<c0120003>] wakeup_pmode_return+0x3/0x81
	[<c0230766>] __copy_from_user_ll+0x66/0x70
	[<c03a4275>] mpls_ioctl+0x975/0x9e0
	[<c016e672>] handle_mm_fault+0x132/0x320
	[<c0325a98>] dev_ioctl+0x348/0x430
	[<c037b9bc>] inet_ioctl+0x10c/0x120
	[<c031a904>] sock_ioctl+0x2c4/0x4a0
	[<c019e367>] sys_ioctl+0x217/0x420
	[<c018357f>] sys_read+0x3f/0x60
	[<c010d26b>] syscall_call+0x7/0xb

where IS THE moi ?????? (NULL pointer deref in make_dst!!!)

case MPLS_OP_SET: {
...
	/* NOTE: mpls_make_dst holds the dev,
	 * so release the hold from dev lookup*/
	 *************************************************** HERE v
	md = mpls_make_dst(mir->mir_instruction[i].mir_data.set.mni_if,
			&mir->mir_instruction[i].mir_data.set.mni_addr,moi);
	dev_put(dev);

What remains to be done
-----------------------
* The IOCTL interface is a little cumbersome and buggy. It should be
  replaced by a new api (there are several possibilities but, IMVHO, we
  should use netlink, like the routing tables, and develop a new mplsadm
  that should be like zebra), and with a mplslib to be used by userspace
  control plane tools.
* I'm still somewhat unconvinced of the need for labelspaces/keys/radix
  trees. What are the real advantages of this?
  Why not plainly a hash table with key being (iface index + incoming
  label)?
* The mpls_instruction_build needs a clean up.
* We should focus on two branches for 2.6: a stable one (even using old
  IOCTLS) that works, and a new one with netlink (netfilter hooks
  eventually?)
* Massive debugging and testing :) audit for synchronization and SMP
  correctness.

Regards,
R.

// -------------------------------------------------------------------
// Ramon Casellas - GET/ENST/INFRES/RHD/A508 - cas...@in...

8<--------------8<---------------------------------8<
diff -urN linux-2.6.0-test11/Makefile linux-2.6.0-test11-mpls/Makefile
--- linux-2.6.0-test11/Makefile	2003-11-27 14:24:41.000000000 +0100
+++ linux-2.6.0-test11-mpls/Makefile	2003-11-29 16:11:26.358615360 +0100
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 0
-EXTRAVERSION = -test11
+EXTRAVERSION = -test11-mpls
 # *DOCUMENTATION*
 # To see a list of typical targets execute "make help"
diff -urN linux-2.6.0-test11/include/linux/if_arp.h linux-2.6.0-test11-mpls/include/linux/if_arp.h
--- linux-2.6.0-test11/include/linux/if_arp.h	2003-11-27 14:24:00.000000000 +0100
+++ linux-2.6.0-test11-mpls/include/linux/if_arp.h	2003-11-29 16:11:26.358615360 +0100
@@ -84,6 +84,7 @@
 #define ARPHRD_IEEE802_TR 800		/* Magic type ident for TR	*/
 #define ARPHRD_IEEE80211 801		/* IEEE 802.11			*/
 #define ARPHRD_IEEE80211_PRISM 802	/* IEEE 802.11 + Prism2 header */
+#define ARPHRD_MPLS_TUNNEL 899		/* MPLS Tunnel Interface	*/
 #define ARPHRD_VOID	 0xFFFF	/* Void type, nothing is known */
diff -urN linux-2.6.0-test11/include/linux/ipv6_route.h linux-2.6.0-test11-mpls/include/linux/ipv6_route.h
--- linux-2.6.0-test11/include/linux/ipv6_route.h	2003-11-27 14:24:00.000000000 +0100
+++ linux-2.6.0-test11-mpls/include/linux/ipv6_route.h	2003-11-29 16:11:26.359615208 +0100
@@ -39,6 +39,8 @@
 	unsigned long		rtmsg_info;
 __u32			rtmsg_flags;
 	int			rtmsg_ifindex;
+	__u32			rtmsg_spec_nh_proto;
+	__u32			rtmsg_spec_nh_data;
 };
 #define RTMSG_NEWDEVICE		0x11
diff -urN linux-2.6.0-test11/include/linux/mpls.h linux-2.6.0-test11-mpls/include/linux/mpls.h
--- linux-2.6.0-test11/include/linux/mpls.h	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test11-mpls/include/linux/mpls.h	2003-11-29 16:57:05.242241760 +0100
@@ -0,0 +1,266 @@
+/*****************************************************************************
+ * MPLS
+ * An implementation of the MPLS (MultiProtocol Label
+ * Switching Architecture) for Linux.
+ *
+ * Authors:
+ * James Leu <jl...@mi...>
+ * Ramon Casellas <cas...@in...>
+ *
+ * (c) 1999-2003 James Leu <jl...@mi...>
+ * (c) 2003 Ramon Casellas <cas...@in...>
+ *
+ * _THIS_FILE_
+ * Data types and structs used by userspace programs to access MPLS
+ * forwarding. Most interface with the MPLS subsystem is IOCTL based
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ ****************************************************************************/
+
+#ifndef _LINUX_MPLS_H_
+#define _LINUX_MPLS_H_
+
+#ifdef __KERNEL__
+#include <linux/socket.h>
+#else
+#include <sys/socket.h>
+#endif
+
+#define MPLS_NUM_OPS		8
+
+#define MPLS_LINUX_VERSION	0x01010709
+
+#define SIOCMPLSFIRST		0x8C00
+#define SIOCGLABELSPACEMPLS	0x8C01
+#define SIOCSLABELSPACEMPLS	0x8C02
+#define SIOCMPLSDEBUG		0x8C03
+#define SIOCMPLSTUNNELADD	0x8C04
+#define SIOCMPLSTUNNELDEL	0x8C05
+#define SIOCMPLSTUNNELGET	0x8C06
+#define SIOCMPLSNHLFEADD	0x8C07
+#define SIOCMPLSNHLFEDEL	0x8C08
+#define SIOCMPLSNHLFEGET	0x8C09
+#define SIOCMPLSILMADD		0x8C0A
+#define SIOCMPLSILMDEL		0x8C0B
+#define SIOCMPLSILMGET		0x8C0C
+#define SIOCMPLSXCADD		0x8C0D
+#define SIOCMPLSXCDEL		0x8C0E
+#define SIOCMPLSXCGET		0x8C0F
+#define SIOCMPLSNHLFESETMTU	0x8C10
+#define SIOCMPLSILMSETPROTO	0x8C11
+#define SIOCMPLSILMFLUSH	0x8C12
+#define SIOCMPLSNHLFEFLUSH	0x8C13
+#define SIOCSMPLSININSTR	0x8C16
+#define SIOCGMPLSININSTR	0x8C17
+#define SIOCSMPLSOUTINSTR	0x8C18
+#define SIOCGMPLSOUTINSTR	0x8C19
+#define SIOCMPLSLAST		0x8C3F
+
+#define SIOCMPLSTUNNELADDOUT	(SIOCDEVPRIVATE + 1)
+#define SIOCMPLSTUNNELDELOUT	(SIOCDEVPRIVATE + 2)
+
+#define MPLS_IPV4_EXPLICIT_NULL	0 /* only valid as sole label stack entry
+					 Pop label and send to IPv4 stack */
+#define MPLS_ROUTER_ALERT	1 /* anywhere except bottom, packet it is
+					 forwared to a software module
+					 determined by the next label,
+					 if the packet is forwarded, push this
+					 label back on */
+#define MPLS_IPV6_EXPLICIT_NULL	2 /* only valid as sole label stack entry
+					 Pop label and send to IPv6 stack */
+#define MPLS_IMPLICIT_NULL	3 /* a LIB with this, signifies to pop
+					 the next label and use that */
+
+enum mpls_direction_enum {
+	MPLS_IN = 0x10,
+	MPLS_OUT = 0x20
+=09MPLS_OUT =3D 0x20 +}; + +enum mpls_opcode_enum { +=09MPLS_OP_NOP =3D 0x00, +=09MPLS_OP_POP, +=09MPLS_OP_PEEK, +=09MPLS_OP_PUSH, +=09MPLS_OP_DLV, +=09MPLS_OP_FWD, +=09MPLS_OP_NF_FWD, +=09MPLS_OP_DS_FWD, +=09MPLS_OP_EXP_FWD, +=09MPLS_OP_SET, +=09MPLS_OP_SET_RX, +=09MPLS_OP_SET_TC, +=09MPLS_OP_SET_DS, +=09MPLS_OP_SET_EXP, +=09MPLS_OP_EXP2TC, +=09MPLS_OP_EXP2DS, +=09MPLS_OP_TC2EXP, +=09MPLS_OP_DS2EXP, +=09MPLS_OP_NF2EXP, +=09MPLS_OP_SET_NF, +=09MPLS_OP_MAX +}; + +enum mpls_label_type_enum { +=09MPLS_LABEL_GEN =3D 1, +=09MPLS_LABEL_ATM, +=09MPLS_LABEL_FR, +=09MPLS_LABEL_KEY +}; + +struct mpls_label_atm { +=09unsigned short mla_vpi; +=09unsigned short mla_vci; +}; + +struct mpls_label { +#ifdef __KERNEL__ +=09atomic_t __refcnt; +#else +=09int __refcnt; +#endif +=09enum mpls_label_type_enum ml_type; +=09union { +=09=09unsigned int ml_key; +=09=09unsigned int ml_gen; +=09=09unsigned int ml_fr; +=09=09struct mpls_label_atm ml_atm; +=09} u; +=09int ml_index; +}; + +struct mpls_in_label_req { +=09unsigned int mil_age; +=09unsigned int mil_proto; +=09struct mpls_label mil_label; +}; + +#define MPLS_LABELSPACE_MAX=09255 + +struct mpls_labelspace_req { +=09int mls_ifindex; /* Index to the MPLS-enab. 
interface*= / +=09int mls_labelspace; /* Labelspace IN/SET -- OUT/GET *= / +}; + +struct mpls_nexthop_info { +=09unsigned int mni_if; +=09struct sockaddr mni_addr; +}; + +struct mpls_out_label_req { +=09unsigned int mol_age; +=09struct mpls_label mol_label; +=09u_int32_t mol_mtu; +=09u_int8_t mol_propogate_ttl; +}; + +struct mpls_xconnect_req { +=09struct mpls_label mx_in; +=09struct mpls_label mx_out; +}; + +#define MPLS_NFMARK_NUM 64 + +struct mpls_nfmark_fwd { +=09unsigned int nf_key[MPLS_NFMARK_NUM]; +=09unsigned short nf_mask; +}; + +#define MPLS_DSMARK_NUM 64 + +struct mpls_dsmark_fwd { +=09unsigned int df_key[MPLS_DSMARK_NUM]; +=09unsigned char df_mask; +}; + +#define MPLS_TCINDEX_NUM 64 + +struct mpls_tcindex_fwd { +=09unsigned int tc_key[MPLS_TCINDEX_NUM]; +=09unsigned short tc_mask; +}; + +#define MPLS_EXP_NUM 8 + +struct mpls_exp_fwd { +=09unsigned int ef_key[MPLS_EXP_NUM]; +}; + +struct mpls_exp2tcindex { +=09unsigned short e2t[MPLS_EXP_NUM]; +}; + +struct mpls_exp2dsmark { +=09unsigned char e2d[MPLS_EXP_NUM]; +}; + +struct mpls_tcindex2exp { +=09unsigned char t2e_mask; +=09unsigned char t2e[MPLS_TCINDEX_NUM]; +}; + +struct mpls_dsmark2exp { +=09unsigned char d2e_mask; +=09unsigned char d2e[MPLS_DSMARK_NUM]; +}; + +struct mpls_nfmark2exp { +=09unsigned char n2e_mask; +=09unsigned char n2e[MPLS_NFMARK_NUM]; +}; + +struct mpls_instruction_elem { +=09unsigned short mir_opcode; +=09unsigned char mir_direction; +=09union { +=09=09struct mpls_label push; +=09=09struct mpls_label fwd; +=09=09struct mpls_nfmark_fwd nf_fwd; +=09=09struct mpls_dsmark_fwd ds_fwd; +=09=09struct mpls_exp_fwd exp_fwd; +=09=09struct mpls_nexthop_info set; +=09=09unsigned int set_rx; +=09=09unsigned short set_tc; +=09=09unsigned short set_ds; +=09=09unsigned char set_exp; +=09=09struct mpls_exp2tcindex exp2tc; +=09=09struct mpls_exp2dsmark exp2ds; +=09=09struct mpls_tcindex2exp tc2exp; +=09=09struct mpls_dsmark2exp ds2exp; +=09=09struct mpls_nfmark2exp nf2exp; +=09=09unsigned long 
set_nf; +=09} mir_data; +}; + +/* Standard shortcuts */ +#define mir_push mir_data.push +#define mir_fwd mir_data.fwd +#define mir_nf_fwd mir_data.nf_fwd +#define mir_ds_fwd mir_data.ds_fwd +#define mir_exp_fwd mir_data.exp_fwd +#define mir_set mir_data.set +#define mir_set_rx mir_data.set_rx +#define mir_set_tx mir_data.set_tx +#define mir_set_ds mir_data.set_ds +#define mir_set_exp mir_data.set_exp +#define mir_set_nf mir_data.set_nf +#define mir_exp2tc mir_data.exp2tc +#define mir_exp2ds mir_data.exp2ds +#define mir_tc2exp mir_data.tc2exp +#define mir_ds2exp mir_data.ds2exp +#define mir_nf2exp mir_data.nf2exp + + + + +struct mpls_instruction_req { +=09struct mpls_instruction_elem mir_instruction[MPLS_NUM_OPS]; +=09struct mpls_label mir_label; +=09unsigned char mir_instruction_length; +=09unsigned char mir_direction; +=09int mir_index; +}; + +#endif diff -urN linux-2.6.0-test11/include/linux/netdevice.h linux-2.6.0-test11-m= pls/include/linux/netdevice.h --- linux-2.6.0-test11/include/linux/netdevice.h=092003-11-27 14:24:00.0000= 00000 +0100 +++ linux-2.6.0-test11-mpls/include/linux/netdevice.h=092003-11-29 16:11:26= =2E361614904 +0100 @@ -352,6 +352,7 @@ =09void *ip6_ptr; /* IPv6 specific data */ =09void=09=09=09*ec_ptr;=09/* Econet specific data=09*/ =09void=09=09=09*ax25_ptr;=09/* AX.25 specific data */ +=09void=09=09=09*mpls_ptr;=09/* MPLS specific data */ =09struct list_head=09poll_list;=09/* Link to poll list=09*/ =09int=09=09=09quota; diff -urN linux-2.6.0-test11/include/linux/netfilter_ipv4/ipt_MPLS.h linux-= 2.6.0-test11-mpls/include/linux/netfilter_ipv4/ipt_MPLS.h --- linux-2.6.0-test11/include/linux/netfilter_ipv4/ipt_MPLS.h=091970-01-01= 01:00:00.000000000 +0100 +++ linux-2.6.0-test11-mpls/include/linux/netfilter_ipv4/ipt_MPLS.h=092003-= 11-29 16:11:26.362614752 +0100 @@ -0,0 +1,8 @@ +#ifndef _IPT_MPLS_H_target +#define _IPT_MPLS_H_target + +struct ipt_mpls_target_info { +=09unsigned int key; +}; + +#endif /*_IPT_MPLS_H_target*/ diff -urN 
linux-2.6.0-test11/include/linux/ppp_defs.h linux-2.6.0-test11-mp= ls/include/linux/ppp_defs.h --- linux-2.6.0-test11/include/linux/ppp_defs.h=092003-11-27 14:24:00.00000= 0000 +0100 +++ linux-2.6.0-test11-mpls/include/linux/ppp_defs.h=092003-11-29 16:11:26.= 363614600 +0100 @@ -82,7 +82,7 @@ #define PPP_IPV6CP=090x8057=09/* IPv6 Control Protocol */ #define PPP_CCPFRAG=090x80fb=09/* CCP at link level (below MP bundle) */ #define PPP_CCP=09=090x80fd=09/* Compression Control Protocol */ -#define PPP_MPLSCP=090x80fd=09/* MPLS Control Protocol */ +#define PPP_MPLSCP=090x8281=09/* MPLS Control Protocol */ #define PPP_LCP=09=090xc021=09/* Link Control Protocol */ #define PPP_PAP=09=090xc023=09/* Password Authentication Protocol */ #define PPP_LQR=09=090xc025=09/* Link Quality Report protocol */ diff -urN linux-2.6.0-test11/include/linux/rtnetlink.h linux-2.6.0-test11-m= pls/include/linux/rtnetlink.h --- linux-2.6.0-test11/include/linux/rtnetlink.h=092003-11-27 14:24:00.0000= 00000 +0100 +++ linux-2.6.0-test11-mpls/include/linux/rtnetlink.h=092003-11-29 16:11:26= =2E364614448 +0100 @@ -199,9 +199,11 @@ =09RTA_FLOW, =09RTA_CACHEINFO, =09RTA_SESSION, +=09RTA_SPEC_PROTO, +=09RTA_SPEC_DATA, }; -#define RTA_MAX RTA_SESSION +#define RTA_MAX RTA_SPEC_DATA #define RTM_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(st= ruct rtmsg)))) #define RTM_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct rtmsg)) @@ -221,6 +223,8 @@ =09unsigned char=09=09rtnh_flags; =09unsigned char=09=09rtnh_hops; =09int=09=09=09rtnh_ifindex; +=09unsigned int=09=09rtnh_spec_proto; +=09unsigned int=09=09rtnh_spec_data; }; /* rtnh_flags */ @@ -314,7 +318,6 @@ =09} u; }; - /********************************************************* *=09=09Interface address. 
  ****/
diff -urN linux-2.6.0-test11/include/linux/sockios.h linux-2.6.0-test11-mpls/include/linux/sockios.h
--- linux-2.6.0-test11/include/linux/sockios.h	2003-11-27 14:23:59.000000000 +0100
+++ linux-2.6.0-test11-mpls/include/linux/sockios.h	2003-11-29 16:11:26.364614448 +0100
@@ -107,6 +107,11 @@
 #define SIOCGIFVLAN	0x8982		/* 802.1Q VLAN support		*/
 #define SIOCSIFVLAN	0x8983		/* Set 802.1Q VLAN options 	*/
+/* MPLS configuration calls */
+
+#define SIOCADDMPLS	0x8984		/* Create MPLS objects		*/
+#define SIOCDELMPLS	0x8985		/* Delete MPLS objects		*/
+
 /* bonding calls */
 #define SIOCBONDENSLAVE	0x8990		/* enslave a device to the bond */
diff -urN linux-2.6.0-test11/include/net/dst.h linux-2.6.0-test11-mpls/include/net/dst.h
--- linux-2.6.0-test11/include/net/dst.h	2003-11-27 14:24:01.000000000 +0100
+++ linux-2.6.0-test11-mpls/include/net/dst.h	2003-11-29 16:11:26.365614296 +0100
@@ -14,6 +14,9 @@
 #include <linux/jiffies.h>
 #include <net/neighbour.h>
 #include <asm/processor.h>
+#ifdef CONFIG_MPLS
+#include <net/mpls.h>
+#endif
 /*
  * 0 - no debugging messages
@@ -65,6 +68,7 @@
 	struct neighbour	*neighbour;
 	struct hh_cache		*hh;
 	struct xfrm_state	*xfrm;
+	void			*spec_nh_data;
 	int			(*input)(struct sk_buff*);
 	int			(*output)(struct sk_buff*);
@@ -74,8 +78,11 @@
 #endif
 	struct dst_ops	 *ops;
-	struct rcu_head		rcu_head;
+	struct rcu_head		rcu_head;
+#ifdef CONFIG_MPLS
+	struct mpls_out_info	*mpls_moi;
+#endif
 	char			info[0];
 };
diff -urN linux-2.6.0-test11/include/net/ip_fib.h linux-2.6.0-test11-mpls/include/net/ip_fib.h
--- linux-2.6.0-test11/include/net/ip_fib.h	2003-11-27 14:24:01.000000000 +0100
+++ linux-2.6.0-test11-mpls/include/net/ip_fib.h	2003-11-29 16:11:26.366614144 +0100
@@ -38,11 +38,13 @@
 	u32		*rta_flow;
 	struct rta_cacheinfo *rta_ci;
 	struct rta_session *rta_sess;
+	u32		*rta_spec_proto;
+	u32		*rta_spec_data;
 };
 struct fib_nh {
-	struct net_device		*nh_dev;
+	struct net_device	*nh_dev;
 	unsigned		nh_flags;
 	unsigned char		nh_scope;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
@@ -54,6 +56,8 @@
 #endif
 	int			nh_oif;
 	u32			nh_gw;
+	u32			nh_spec_data;
+	unsigned short		nh_spec_proto;
 };
 /*
@@ -118,6 +122,8 @@
 #define FIB_RES_GW(res)			(FIB_RES_NH(res).nh_gw)
 #define FIB_RES_DEV(res)		(FIB_RES_NH(res).nh_dev)
 #define FIB_RES_OIF(res)		(FIB_RES_NH(res).nh_oif)
+#define FIB_RES_SPEC_PROTO(res)		(FIB_RES_NH(res).nh_spec_proto)
+#define FIB_RES_SPEC_DATA(res)		(FIB_RES_NH(res).nh_spec_data)
 struct fib_table {
diff -urN linux-2.6.0-test11/include/net/mpls.h linux-2.6.0-test11-mpls/include/net/mpls.h
--- linux-2.6.0-test11/include/net/mpls.h	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test11-mpls/include/net/mpls.h	2003-11-29 16:11:26.368613840 +0100
@@ -0,0 +1,652 @@
+/*****************************************************************************
+ * MPLS
+ * An implementation of the MPLS (MultiProtocol Label
+ * Switching Architecture) for Linux.
+ *
+ * File: linux/include/net/mpls.h
+ *
+ * Authors:
+ * James Leu <jl...@mi...>
+ * Ramon Casellas <cas...@in...>
+ *
+ * (c) 1999-2003 James Leu <jl...@mi...>
+ * (c) 2003 Ramon Casellas <cas...@in...>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Changes:
+ * 20031126 RCAS:
+ * - Rewrite the debugging macros.
+ *****************************************************************************
+ */
+#ifndef __LINUX_NET_MPLS__H_
+#define __LINUX_NET_MPLS__H_
+
+#include <net/spec_nh.h>
+#include <net/dst.h>
+#include <asm/atomic.h>
+#include <linux/mpls.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <net/mpls_radix.h>
+#define MPLS_ERR KERN_ERR
+#define MPLS_INF KERN_ALERT
+#define MPLS_DBG KERN_ALERT
+
+/*
+ * Forward declarations
+ */
+
+struct fib_result;
+struct rtable;
+struct spec_nh;
+
+extern int mpls_debug;
+
+/* It is not defined in net/dst.h, it is declared as follows:*/
+extern struct dst_entry *dst_destroy(struct dst_entry * dst);
+
+
+/* Comment this to suppress MPLS_DEBUG calls */
+#define MPLS_ENABLE_DEBUG
+
+
+#ifdef MPLS_ENABLE_DEBUG
+#define MPLS_DEBUG(f, a...) \
+{ \
+	if (mpls_debug) {\
+		printk (KERN_DEBUG "MPLS DEBUG %s:%d:%s: ", \
+			__FILE__, __LINE__, __FUNCTION__); \
+		printk (f, ##a); \
+	}\
+}
+#else
+#define MPLS_DEBUG(f, a...) /**/
+#endif /* MPLS_ENABLE_DEBUG */
+
+#define MPLS_INFO(f, a...) printk (KERN_INFO "MPLS INFO " f, ##a);
+
+#ifdef MPLS_ENABLE_DEBUG
+#define MPLS_ASSERT(expr) \
+if(unlikely(!(expr))) { \
+	printk(KERN_ERR "MPLS Assertion failed! %s,%s,%s,line=%d\n",#expr,\
+	__FILE__,__FUNCTION__,__LINE__); \
+}
+#else
+#define MPLS_ASSERT(expr) /**/
+#endif /* MPLS_ENABLE_DEBUG */
+
+
+/****************************************************************************
+ * MPLS Interface "Extension"
+ * In the current implementation the "all loved" net_device struct is
+ * extended with one field struct mpls_interface (cast'd to void) called
+ * mpls_ptr; This holds basically the "per interface" labelspace.
+ *
+ ****************************************************************************/
+
+struct mpls_interface {
+	/*
+	 * (any mif object)->list_out is a circular d-linked list. Each node
+	 * of this list is a MOI. MOI's are added to this list when adding a
+	 * OP_SET opcode to a moi instruction array.
+	 *
+	 * list_add(&moi->dev_entry, &mpls_if->list_out) : adds moi to this
+	 * list.
+	 *
+	 * "List of all MOIs that use this device (e.g. eth0) as output"
+	 * cf. mpls_init.c
+	 */
+	struct list_head list_out;
+
+
+	/*
+	 * (any mif object)->list_in is a circular d-linked list. Each node
+	 * of this list is a MII. MII's are added to this list when
+	 */
+	struct list_head list_in;
+
+	int labelspace; /* Label Space for this interface */
+};
+
+typedef struct mpls_interface mpls_interface_t;
+
+
+extern mpls_interface_t* mpls_create_if_info(void);
+extern void mpls_delete_if_info(mpls_interface_t* mpls_if);
+
+
+
+typedef struct mpls_label mpls_label_t;
+
+
+
+struct mpls_push_data {
+	unsigned int label:20;
+	unsigned int ttl:8;
+	unsigned int exp:3;
+	unsigned int bos:1;
+	unsigned char flag;
+	char popped_bos;
+};
+
+typedef struct mpls_push_data mpls_push_data_t;
+
+/****************************************************************************
+ * Socket Buffer Mangement & Hacks...
+ *
+ *
+ *
+ ****************************************************************************/
+
+struct mpls_skb_parm {
+	unsigned int gap;
+};
+
+#define MPLSCB(skb) ((struct mpls_skb_parm*)((skb)->cb))
+
+
+
+
+
+
+/****************************************************************************
+ * Result codes for Input/Output Opcodes.
+ * net/mpls/{mpls_opcode,mpls_opcode_in,mpls_opcode_out}.c
+ *
+ *
+ *
+ ****************************************************************************/
+
+#define MPLS_RESULT_SUCCESS	0
+#define MPLS_RESULT_RECURSE	1
+#define MPLS_RESULT_DROP	2
+#define MPLS_RESULT_DLV		3
+#define MPLS_RESULT_FWD		4
+
+
+
+struct mpls_instruction {
+	unsigned short mi_opcode;
+	void *mi_data;
+};
+typedef struct mpls_instruction mpls_instruction_t;
+
+struct mpls_nfmark_fwd_info {
+	struct mpls_out_info *nfi_moi[MPLS_NFMARK_NUM];
+	unsigned short nfi_mask;
+};
+
+struct mpls_dsmark_fwd_info {
+	struct mpls_out_info *dfi_moi[MPLS_DSMARK_NUM];
+	unsigned char dfi_mask;
+};
+
+struct mpls_tcindex_fwd_info {
+	struct mpls_out_info *tfi_moi[MPLS_TCINDEX_NUM];
+	unsigned short tfi_mask;
+};
+
+struct mpls_exp_fwd_info {
+	struct mpls_out_info *efi_moi[MPLS_EXP_NUM];
+};
+
+struct mpls_exp2dsmark_info {
+	unsigned char e2d[MPLS_EXP_NUM];
+};
+
+struct mpls_exp2tcindex_info {
+	unsigned short e2t[MPLS_EXP_NUM];
+};
+
+struct mpls_tcindex2exp_info {
+	unsigned char t2e_mask;
+	unsigned char t2e[MPLS_TCINDEX_NUM];
+};
+
+struct mpls_dsmark2exp_info {
+	unsigned char d2e_mask;
+	unsigned char d2e[MPLS_DSMARK_NUM];
+};
+
+struct mpls_nfmark2exp_info {
+	unsigned char n2e_mask;
+	unsigned char n2e[MPLS_NFMARK_NUM];
+};
+
+
+
+/****************************************************************************
+ * Instruction (OPCODEs) Management
+ *
+ *
+ ****************************************************************************/
+extern void
+mpls_instruction_clear (struct mpls_instruction*,int,enum mpls_direction_enum,void*);
+
+extern int
+mpls_instruction_build (struct mpls_instruction_req*, struct mpls_instruction*,int, void*);
+
+
+
+
+
+
+/****************************************************************************
+ * MPLS INPUT INFO (MII) OBJECT MANAGEMENT
+ * net/mpls/mpls_in_info.c
+ *
+ *
+ ****************************************************************************/
+
+struct mpls_in_info {
+	/* Ref count */
+	atomic_t __refcnt;
+	/* List of Devices */
+	struct list_head dev_entry;
+	/* List of MOI */
+	struct list_head moi_entry;
+	/* Array of Instructions to execute for this MII */
+	struct mpls_instruction mii_instruction[MPLS_NUM_OPS];
+	/* size of the array */
+	unsigned short mii_instruction_length;
+	struct mpls_label mii_label;
+	unsigned int mii_key;
+	unsigned int mii_age;
+	unsigned short mii_proto;
+	unsigned short mii_labelspace;
+	unsigned long mii_packets;
+	unsigned long mii_bytes;
+	unsigned long mii_drops;
+};
+
+typedef struct mpls_in_info mpls_in_info_t;
+
+extern mpls_in_info_t ipv4_explicit_null;
+#ifdef CONFIG_IPV6
+extern mpls_in_info_t ipv6_explicit_null;
+#endif
+
+/****
+ * Input Radix Tree Management
+ **/
+struct mpls_in_info_node {
+	RADIX_ENTRY(mpls_in_info_node,MPLS_TREE_BITS) next;
+	struct mpls_in_info *mii;
+};
+
+extern struct mpls_in_info_tree mii_tree;
+RADIX_HEAD(mpls_in_info_tree,mpls_in_info_node);
+
+extern void mpls_flush_in_tree(void);
+extern int mpls_insert_mii(unsigned int key, mpls_in_info_t* mii);
+extern mpls_in_info_t* mpls_get_mii (unsigned int key);
+extern mpls_in_info_t* mpls_get_mii_by_label(struct mpls_push_data *mpr,
+		struct mpls_label *label, int labelspace);
+
+
+/****************************************************************************
+ * MPLS OUTPUT INFO (MOI) OBJECT MANAGEMENT
+ * net/mpls/mpls_in_info.c
+ *
+ *
+ ****************************************************************************/
+
+struct mpls_out_info {
+	atomic_t __refcnt;
+	struct notifier_block* moi_notifier_list;
+	struct list_head list_out;
+	struct list_head list_in;
+	struct list_head dev_entry;
+	struct list_head moi_entry;
+	struct mpls_instruction moi_instruction[MPLS_NUM_OPS];
+	unsigned short moi_instruction_length;
+	int moi_ifindex;
+	unsigned int moi_key;
+	unsigned int moi_age;
+	unsigned short moi_mtu_limit;
+	unsigned short moi_mtu;
+	unsigned char moi_propogate_ttl;
+	unsigned long moi_packets;
+	unsigned long moi_bytes;
+	unsigned long moi_drops;
+	struct dst_entry* moi_dst;
+
+};
+
+typedef struct mpls_out_info mpls_out_info_t;
+
+struct mpls_fwd_block {
+	struct notifier_block notifier_block;
+	struct mpls_out_info *owner;
+	struct mpls_out_info *fwd;
+};
+
+
+
+/****
+ * Output Radix Tree Management
+ **/
+struct mpls_out_info_node {
+	RADIX_ENTRY(mpls_out_info_node,MPLS_TREE_BITS) next;
+	struct mpls_out_info *moi;
+};
+RADIX_HEAD(mpls_out_info_tree,mpls_out_info_node);
+extern struct mpls_out_info_tree moi_tree;
+
+extern mpls_out_info_t* mpls_get_moi(unsigned int key);
+extern int mpls_insert_moi(unsigned int key, mpls_out_info_t* moi);
+extern void mpls_flush_out_tree(void);
+
+
+
+/****************************************************************************
+ * Helper Functions
+ *
+ *
+ *
+ ****************************************************************************/
+
+extern void mpls_skb_dump (struct sk_buff* sk);
+extern unsigned int mpls_label2key (int, const struct mpls_label*);
+extern struct net_device* mpls_dev_by_labelspace(int);
+
+
+/****************************************************************************
+ * INCOMING (INPUT) LABELLED PACKET MANAGEMENT
+ * net/mpls/mpls_input.c
+ *
+ *
+ *
+ ****************************************************************************/
+
+extern int mpls_input_init(void);
+
+extern void mpls_input_exit(void);
+
+extern int mpls_dlv (struct sk_buff*,struct mpls_push_data*);
+
+extern int mpls_skb_recv(struct sk_buff *skb, struct net_device *dev,
+		struct packet_type* ptype);
+
+extern int mpls_skb_recv_mc(struct sk_buff *skb, struct net_device *dev,
+		struct packet_type* ptype);
+
+extern int mpls_input(struct sk_buff*,struct net_device*, struct packet_type*,
+		struct mpls_label* label,struct mpls_push_data* mpr,
+		int labelspace);
+
+
+/****************************************************************************
+ * OUTGOING (OUTPUT) LABELLED PACKET MANAGEMENT
+ * net/mpls/mpls_output.c
+ *
+ *
+ *
+ ****************************************************************************/
+
+
+struct mpls_dst {
+	atomic_t __refcnt;
+	struct dst_entry *md_dst;
+	struct sockaddr md_nh;
+};
+
+extern int mpls_output_init(void);
+extern void mpls_output_exit(void);
+
+extern int mpls_send (struct sk_buff *skb);
+extern int mpls_bogus_output(struct sk_buff *skb);
+extern int mpls_set_nexthop(struct dst_entry *dst, u32 nh_data, struct spec_nh *spec);
+extern int mpls_output(struct sk_buff *skb);
+extern int mpls_output_shim (struct sk_buff *skb, struct mpls_out_info *moi);
+extern int mpls_output2(struct sk_buff *skb,struct mpls_out_info *moi,
+		struct mpls_push_data *mpr);
+
+/****************************************************************************
+ * Next Hop & dst Management.
+ *************************************************************************= ***/ + +extern int +mpls_spec_nh(struct dst_entry *, u32, struct spec_nh *); + +extern struct mpls_dst * +mpls_make_dst(unsigned int ifi,struct sockaddr *nh, struct mpls_out_info *= moi); + + + + + +/*************************************************************************= *** + * INPUT/OUTPUT INSTRUCTION OPCODES + * net/mpls/{mpls_opcode,mpls_opcode_in,mpls_opcode_out}.c + * + *************************************************************************= ***/ +#define MPLS_OPCODE_PROTOTYPE(NAME) \ +int (NAME) (struct sk_buff** skb,struct mpls_in_info *mii, \ +=09=09struct mpls_out_info **moi, struct mpls_push_data *mpr, void *data) + +#define MPLS_IN_OPCODE_PROTOTYPE(NAME) MPLS_OPCODE_PROTOTYPE(NAME) +#define MPLS_OUT_OPCODE_PROTOTYPE(NAME) MPLS_OPCODE_PROTOTYPE(NAME) + +struct mpls_ops { +=09MPLS_OPCODE_PROTOTYPE(*fn); +=09char *msg; +=09int extra; +}; + +/* Array holding input opcodes */ +extern struct mpls_ops mpls_in_ops [MPLS_OP_MAX]; + +/* Array holding output opcodes */ +extern struct mpls_ops mpls_out_ops[MPLS_OP_MAX]; + + +extern void mpls_finish(struct sk_buff *skb); +extern int mpls_opcode_peek(struct sk_buff *skb, mpls_push_data_t *mpr)= ; +extern int mpls_push(struct sk_buff **skb,mpls_label_t *ml,mpls_push_da= ta_t *mpr); + + + +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_nop); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_pop); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_peek); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_push); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_dlv); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_fwd); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_exp_fwd); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_set_rx); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_set_tc); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_set_ds); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_set_exp); +extern MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_exp2tc); +extern 
MPLS_IN_OPCODE_PROTOTYPE(mpls_in_op_exp2ds); + +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_nop); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_push); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_fwd); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_nf_fwd); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_ds_fwd); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_exp_fwd); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_set); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_set_tc); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_set_exp); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_tc2exp); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_ds2exp); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_nf2exp); +extern MPLS_OUT_OPCODE_PROTOTYPE(mpls_out_op_exp2tc); + + + +/*************************************************************************= *** + * IOCTL/System Calls Management + * net/mpls/mpls_ioctl.c + * + * (Cf. include/linux/mpls.h for IOCTLS) + *************************************************************************= ***/ + +struct mpls_ioctl_args { + +}; + +extern int mpls_ioctl (unsigned int cmd,void *arg); +extern void mpls_ioctl_set(int (*hook)(unsigned int, void *)); + + +/* Query/Update Incoming Labels */ +extern int mpls_add_in_label (const struct mpls_in_label_req *in); +extern int mpls_get_in_label (struct mpls_in_label_req *in); +extern int mpls_del_in_label (struct mpls_in_label_req *in); +extern int mpls_set_in_label_proto (struct mpls_in_label_req *in); + +/* Query/Update Outgoing Labels */ +extern int mpls_add_out_label (struct mpls_out_label_req *out); +extern int mpls_get_out_label (struct mpls_out_label_req *out); +extern int mpls_del_out_label (struct mpls_out_label_req *out); +extern int mpls_set_out_label_mtu (struct mpls_out_label_req *out); + +/* Query/Update Crossconnects */ +extern int mpls_attach_in2out (struct mpls_xconnect_req *req); +extern int mpls_detach_in2out (struct mpls_xconnect_req *req); +extern int mpls_get_in2out (struct 
mpls_xconnect_req *req); + +/* Instruction Management */ +extern int mpls_info_default_in_instruction(struct mpls_in_info*); +extern int mpls_set_default_out_instruction(struct mpls_out_info*); +extern int mpls_set_in_label_instructions (struct mpls_instruction_req *i= n); +extern int mpls_get_in_label_instructions (struct mpls_instruction_req *i= n); +extern int mpls_set_out_label_instructions (struct mpls_instruction_req *o= ut); +extern int mpls_get_out_label_instructions (struct mpls_instruction_req *o= ut); + +/* Query/Update Labelspaces*/ +extern int mpls_get_labelspace (struct mpls_labelspace_req *req); +extern int mpls_set_labelspace (struct mpls_labelspace_req *req); + +/* Auxiliary */ +extern void __mpls_del_in_label(struct mpls_in_info*); +extern void __mpls_del_out_label(struct mpls_out_info*); + + +/*************************************************************************= *** + * REFERENCE COUNT MANAGEMENT + * + * + * + * + *************************************************************************= ***/ + +/* Hold */ +static inline void mpls_in_info_hold(struct mpls_in_info* mii) +{ +=09BUG_ON(!mii); +=09atomic_inc(&mii->__refcnt); +} + +static inline void mpls_out_info_hold(struct mpls_out_info* moi) +{ +=09BUG_ON(!moi); +=09atomic_inc(&moi->__refcnt); +} + +static inline void mpls_label_hold(struct mpls_label* label) +{ +=09BUG_ON(!label); +=09atomic_inc(&label->__refcnt); +} + +static inline void mpls_dst_hold(struct mpls_dst* md) +{ +=09BUG_ON(!md); +=09atomic_inc(&md->__refcnt); +} + + +/* Release */ +static inline void mpls_in_info_release(struct mpls_in_info* mii) +{ +=09BUG_ON(!mii); +=09if (atomic_dec_and_test(&mii->__refcnt)) +=09=09__mpls_del_in_label(mii); +} + +static inline void mpls_out_info_release(struct mpls_out_info* moi) +{ +=09BUG_ON(!moi); +=09if (atomic_dec_and_test(&moi->__refcnt)) +=09=09__mpls_del_out_label(moi); +} + +static inline void mpls_label_release(struct mpls_label* label) +{ +=09extern kmem_cache_t* 
mpls_label_cachep;
+	BUG_ON(!label);
+	if (atomic_dec_and_test(&label->__refcnt))
+		kmem_cache_free(mpls_label_cachep,label);
+}
+
+static inline void mpls_dst_release(struct mpls_dst* md)
+{
+	BUG_ON(!md);
+	/* Free md only when the last reference is dropped */
+	if (atomic_dec_and_test(&md->__refcnt)) {
+		if (md->md_dst)
+			dst_destroy(md->md_dst);
+		kfree(md);
+	}
+}
+
+
+/****************************************************************************
+ * PROCFS Implementation
+ * net/mpls/mpls_proc.c
+ *
+ *
+ *
+ ****************************************************************************/
+
+extern int mpls_procfs_init (void);
+extern void mpls_procfs_exit (void);
+
+
+
+
+/****************************************************************************
+ * Virtual Interfaces (Tunnel) Management
+ * (e.g. mpls0, mpls1, TXXethN, etc.)
+ * net/mpls/mpls_tunnel.c
+ *
+ *
+ ****************************************************************************/
+
+struct mpls_tunnel_private {
+	struct mpls_out_info *mtp_moi;
+	struct net_device *mtp_dev;
+	struct sockaddr mtp_dest;
+	struct mpls_tunnel_private* next;
+	struct net_device_stats stat;
+};
+
+extern struct net_device* mpls_tunnel_get_by_name (const char* name);
+extern struct net_device* mpls_tunnel_get (struct ifreq *ifr);
+extern void mpls_tunnel_put (struct net_device *dev);
+extern struct net_device* mpls_tunnel_create (struct ifreq *ifr);
+extern void mpls_tunnel_destroy (struct ifreq *ifr);
+
+/* these should be moved when we fix the tunnel module */
+extern void __exit mpls_tunnel_exit_module(void);
+extern int __init mpls_tunnel_init_module(void);
+
+
+/* Casts */
+#define _mpls_as_label(PTR) ((struct mpls_label*)(PTR))
+#define _mpls_as_mii(PTR) ((struct mpls_in_info*)(PTR))
+#define _mpls_as_moi(PTR) ((struct mpls_out_info*)(PTR))
+#define _mpls_as_dfi(PTR) ((struct mpls_dsmark_fwd_info*)(PTR))
+#define _mpls_as_nfi(PTR) ((struct mpls_nfmark_fwd_info*)(PTR))
+#define _mpls_as_efi(PTR) ((struct mpls_exp_fwd_info*)(PTR))
+#define _mpls_as_netdev(PTR)((struct net_device*)(PTR)) +#define _mpls_as_dst(PTR) ((struct mpls_dst*)(PTR)) + +#endif diff -urN linux-2.6.0-test11/include/net/mpls_radix.h linux-2.6.0-test11-mp= ls/include/net/mpls_radix.h --- linux-2.6.0-test11/include/net/mpls_radix.h=091970-01-01 01:00:00.00000= 0000 +0100 +++ linux-2.6.0-test11-mpls/include/net/mpls_radix.h=092003-11-29 16:11:26.= 369613688 +0100 @@ -0,0 +1,167 @@ +/*************************************************************************= **** + * MPLS + * An implementation of the MPLS (MultiProtocol Label + * Switching Architecture) for Linux. + * + * File: linux/include/net/mpls_radix.h + * + * Authors: + * James Leu <jl...@mi...> + * Ramon Casellas <cas...@in...> + * + * (c) 1999-2003 James Leu <jl...@mi...> + * (c) 2003 Ramon Casellas <cas...@in...> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ *************************************************************************= **** + */ + + +#ifndef _NET_MPLS_RADIX_H_ +#define _NET_MPLS_RADIX_H_ + + +#define RADIX_INIT(head)=09=09=09=09=09=09\ +{=09=09=09=09=09=09=09=09=09\ +=09(head)->RdX_root =3D NULL;=09=09=09=09=09=09\ +} + +#define RADIX_ENTRY(type,bits)=09=09=09=09=09=09\ +struct {=09=09=09=09=09=09=09=09\ +=09struct type *RdX_next[(0x01 << bits)];=09=09=09=09\ +} + +#define RADIX_HEAD(name,type)=09=09=09=09=09=09\ +struct name {=09=09=09=09=09=09=09=09\ +=09struct type *RdX_root;=09=09=09=09=09=09\ +}; + +#define RADIX_INSERT(head,type,field,bits,tdef,key,klen,result)=09=09\ +{=09=09=09=09=09=09=09=09=09\ +=09struct type **RdX_node =3D &((head)->RdX_root);=09=09=09=09\ +=09=09tdef RdX_mask;=09=09=09=09=09=09=09\ +=09=09int RdX_index;=09=09=09=09=09=09=09\ +=09=09int RdX_size =3D sizeof(tdef)*8;=09=09=09=09=09\ +=09=09char RdX_status =3D 0;=09=09=09=09=09=09=09\ +=09=09int RdX_count =3D 0;=09=09=09=09=09=09=09\ +=09=09\ +=09=09if(klen % bits) result =3D -1;=09=09=09=09=09=09\ +=09=09else {=09=09=09=09=09=09=09=09\ +=09=09=09RdX_mask =3D (0x01 << bits)-1;=09=09=09=09=09\ +=09=09=09=09while((RdX_count*bits) < klen) {=09=09=09=09=09\ +=09=09=09=09=09RdX_status =3D 1;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09if(!(*RdX_node)) {=09=09=09=09=09=09\ +=09=09=09=09=09=09=09RdX_status =3D 2;=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09(*RdX_node) =3D (struct type *)kmalloc(sizeof(stru= ct type),=09\ +=09=09=09=09=09=09=09=09=09=09=09=09 GFP_KERNEL);=09=09=09=09=09=09=09= \ +=09=09=09=09=09=09=09=09if(!(*RdX_node)) {=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09=09result =3D -2;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09=09=09break;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09}=09=09=09=09=09=09=09=09\ +=09=09=09=09=09=09=09RdX_status =3D 3;=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09memset((*RdX_node),0,sizeof(struct type));=09=09= =09\ +=09=09=09=09=09=09=09=09RdX_status =3D 4;=09=09=09=09=09=09\ 
+=09=09=09=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09=09=09=09RdX_index =3D (key >> (RdX_size - bits - (RdX_count * bits)= )) & RdX_mask; \ +=09=09=09=09=09=09RdX_status =3D 5;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09RdX_node =3D &((*RdX_node)->field.RdX_next[RdX_index]);= =09=09\ +=09=09=09=09=09=09RdX_status =3D 6;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09RdX_count++;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09result =3D 0;=09=09=09=09=09=09=09\ +=09=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09=09if(result !=3D -2 && (!(*RdX_node))) {=09=09=09=09\ +=09=09=09=09(*RdX_node) =3D (struct type *)kmalloc(sizeof(struct type),=09= =09\ +=09=09=09=09=09=09=09=09 GFP_KERNEL);=09=09=09=09=09=09=09\ +=09=09=09=09=09if(!(*RdX_node)) {=09=09=09=09=09=09\ +=09=09=09=09=09=09result =3D -2;=09=09=09=09=09=09=09\ +=09=09=09=09=09} else {=09=09=09=09=09=09=09=09\ +=09=09=09=09=09=09memset((*RdX_node),0,sizeof(struct type));=09=09=09\ +=09=09=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09}=09=09=09=09=09=09=09=09=09\ +} + +#define RADIX_DO(head,type,field,bits,tdef,key,klen,flm,elem,result,RxD_se= t) \ +{=09=09=09=09=09=09=09=09=09\ +=09struct type *RdX_node =3D (head)->RdX_root;=09=09=09=09\ +=09=09tdef RdX_mask;=09=09=09=09=09=09=09\ +=09=09int RdX_index;=09=09=09=09=09=09=09\ +=09=09int RdX_size =3D sizeof(tdef)*8;=09=09=09=09=09\ +=09=09int RdX_count =3D 0;=09=09=09=09=09=09=09\ +=09=09\ +=09=09if(klen % bits) result =3D -1;=09=09=09=09=09=09\ +=09=09else {=09=09=09=09=09=09=09=09\ +=09=09=09RdX_mask =3D (0x01 << bits)-1;=09=09=09=09=09\ +=09=09=09=09while((RdX_count*bits) < klen) {=09=09=09=09=09\ +=09=09=09=09=09if(!RdX_node) {=09=09=09=09=09=09=09\ +=09=09=09=09=09=09result =3D -2;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09=09break;=09=09=09=09=09=09=09=09\ +=09=09=09=09=09} else {=09=09=09=09=09=09=09=09\ +=09=09=09=09=09=09RdX_index =3D (key >> (RdX_size - bits - (RdX_count * bi= ts))) & RdX_mask;\ +=09=09=09=09=09=09=09RdX_node =3D 
RdX_node->field.RdX_next[RdX_index];=09= =09=09\ +=09=09=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09=09=09=09RdX_count++;=09=09=09=09=09=09=09\ +=09=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09=09if(!RdX_node) {=09=09=09=09=09=09=09\ +=09=09=09=09result =3D -3;=09=09=09=09=09=09=09\ +=09=09=09} else {=09=09=09=09=09=09=09=09\ +=09=09=09=09if(RxD_set) {=09=09=09=09=09=09=09\ +=09=09=09=09=09RdX_node->flm =3D elem;=09=09=09=09=09=09\ +=09=09=09=09} else {=09=09=09=09=09=09=09=09\ +=09=09=09=09=09elem =3D RdX_node->flm;=09=09=09=09=09=09\ +=09=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09=09=09result =3D 0;=09=09=09=09=09=09=09\ +=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09}=09=09=09=09=09=09=09=09=09\ +} + +#define RADIX_SET(head,type,field,bits,tdef,key,klen,flm,elem,result)=09\ +RADIX_DO(head,type,field,bits,tdef,key,klen,flm,elem,result,1) +#define RADIX_GET(head,type,field,bits,tdef,key,klen,flm,elem,result)=09\ +RADIX_DO(head,type,field,bits,tdef,key,klen,flm,elem,result,0) + +#define RADIX_VISIT_ALL(head,type,field,bits,visit,extra)=09=09\ +{=09=09=09=09=09=09=09=09=09\ +=09int RdX_node_count =3D (0x1 << bits);=09=09=09=09=09\ +=09=09struct type *RdX_stack[RdX_node_count];=09=09=09=09\ +=09=09int RdX_stack_count[RdX_node_count];=09=09=09=09=09\ +=09=09int RdX_stack_depth =3D 0;=09=09=09=09=09=09\ +=09=09struct type *RdX_node;=09=09=09=09=09=09\ +=09=09int RdX_done =3D 0;=09=09=09=09=09=09=09\ +=09=09int R =3D 0;=09=09=09=09=09=09=09=09\ +=09=09\ +=09=09if((RdX_node =3D (head)->RdX_root)) {=09=09=09=09=09\ +=09=09=09while(!RdX_done) {=09=09=09=09=09=09=09\ +=09=09=09=09if(RdX_node->field.RdX_next[R]) {=09=09=09=09=09\ +=09=09=09=09=09RdX_stack[RdX_stack_depth] =3D RdX_node;=09=09=09=09\ +=09=09=09=09=09=09RdX_stack_count[RdX_stack_depth] =3D R + 1;=09=09=09\ +=09=09=09=09=09=09RdX_stack_depth++;=09=09=09=09=09=09\ +=09=09=09=09=09=09RdX_node =3D RdX_node->field.RdX_next[R];=09=09=09=09\ +=09=09=09=09=09=09R =3D 0;=09=09=09=09=09=09=09=09\ +=09=09=09=09} 
else {=09=09=09=09=09=09=09=09\ +=09=09=09=09=09R++;=09=09=09=09=09=09=09=09\ +=09=09=09=09=09=09if(R >=3D RdX_node_count) {=09=09=09=09=09\ +=09=09=09=09=09=09=09if(visit(RdX_node,extra)) {=09=09=09=09=09\ +=09=09=09=09=09=09=09=09RdX_done =3D 1;=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09=09break;=09=09=09=09=09=09=09\ +=09=09=09=09=09=09=09}=09=09=09=09=09=09=09=09\ +=09=09=09=09=09=09=09RdX_stack_depth--;=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09if(RdX_stack_depth >=3D 0) {=09=09=09=09=09\ +=09=09=09=09=09=09=09=09=09RdX_node =3D RdX_stack[RdX_stack_depth];=09=09= =09\ +=09=09=09=09=09=09=09=09=09=09R =3D RdX_stack_count[RdX_stack_depth];=09= =09=09\ +=09=09=09=09=09=09=09=09} else {=09=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09=09RdX_done =3D 1;=09=09=09=09=09=09\ +=09=09=09=09=09=09=09=09}=09=09=09=09=09=09=09=09\ +=09=09=09=09=09=09}=09=09=09=09=09=09=09=09\ +=09=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09=09}=09=09=09=09=09=09=09=09=09\ +=09=09}=09=09=09=09=09=09=09=09=09\ +} + +#define MPLS_TREE_BITS=09=094 +#define MPLS_TREE_DEPTH=09=0932 + +#endif diff -urN linux-2.6.0-test11/include/net/spec_nh.h linux-2.6.0-test11-mpls/= include/net/spec_nh.h --- linux-2.6.0-test11/include/net/spec_nh.h=091970-01-01 01:00:00.00000000= 0 +0100 +++ linux-2.6.0-test11-mpls/include/net/spec_nh.h=092003-11-29 16:11:26.370= 613536 +0100 @@ -0,0 +1,32 @@ +/* + *=09SPEC NH Interface for special nexthop handling + * + *=09=09This program is free software; you can redistribute it and/or + *=09=09modify it under the terms of the GNU General Public License + *=09=09as published by the Free Software Foundation; either version + *=09=092 of the License, or (at your option) any later version. + * + *=09Authors:=09James R. 
Leu <jl...@mi...> + */ +#ifndef _LINUX_SPEC_NH_H +#define _LINUX_SPEC_NH_H + +#include <linux/netdevice.h> +#include <linux/skbuff.h> +#include <net/dst.h> +#include <linux/list.h> + +struct spec_nh +{ +=09char *name; +=09int (*func)(struct dst_entry *, u32, struct spec_nh *); +=09struct list_head list; +=09unsigned short type; +}; + +extern void spec_nh_add(struct spec_nh *spec); +extern void spec_nh_remove(struct spec_nh *spec); +extern struct spec_nh *spec_nh_find(unsigned short proto); +extern void spec_nh_init(void); + +#endif diff -urN linux-2.6.0-test11/net/Kconfig linux-2.6.0-test11-mpls/net/Kconfi= g --- linux-2.6.0-test11/net/Kconfig=092003-11-27 14:23:54.000000000 +0100 +++ linux-2.6.0-test11-mpls/net/Kconfig=092003-11-29 16:11:26.371613384 +01= 00 @@ -37,6 +37,36 @@ =09 If unsure, say Y. +config MPLS +=09tristate "Multiprotocol Label Switching" +=09depends on INET && EXPERIMENTAL +=09---help--- +=09 In conventional IP forwarding, a particular router will typically + =09 consider two packets to be in the same FEC if there is some address + =09 prefix X in that router's routing tables such that X is the "longes= t +=09 match" for each packet's destination address. As the packet +=09 traverses the network, each hop in turn reexamines the packet and +=09 assigns it to a FEC. + +=09 In MPLS, the assignment of a particular packet to a particular FEC is +=09 done just once, as the packet enters the network. The FEC to which +=09 the packet is assigned is encoded as a short fixed length value known +=09 as a "label". When a packet is forwarded to its next hop, the label +=09 is sent along with it; that is, the packets are "labeled" before they +=09 are forwarded. + +=09 At subsequent hops, there is no further analysis of the packet's +=09 network layer header. Rather, the label is used as an index into a +=09 table which specifies the next hop, and a new label. The old label +=09 is replaced with the new label, and the packet is forwarded to its +=09 next hop. 
+ +=09 In the MPLS forwarding paradigm, once a packet is assigned to a FEC, +=09 no further header analysis is done by subsequent routers; all +=09 forwarding is driven by the labels. + +=09 If unsure, say N. + config PACKET_MMAP =09bool "Packet socket: mmapped IO" =09depends on PACKET diff -urN linux-2.6.0-test11/net/Makefile linux-2.6.0-test11-mpls/net/Makef= ile --- linux-2.6.0-test11/net/Makefile=092003-11-27 14:23:54.000000000 +0100 +++ linux-2.6.0-test11-mpls/net/Makefile=092003-11-29 16:11:26.371613384 +0= 100 @@ -18,6 +18,7 @@ obj-$(CONFIG_UNIX)=09=09+=3D unix/ obj-$(CONFIG_IPV6)=09=09+=3D ipv6/ obj-$(CONFIG_PACKET)=09=09+=3D packet/ +obj-$(CONFIG_MPLS)=09=09+=3D mpls/ obj-$(CONFIG_NET_KEY)=09=09+=3D key/ obj-$(CONFIG_NET_SCHED)=09=09+=3D sched/ obj-$(CONFIG_BRIDGE)=09=09+=3D bridge/ @@ -38,6 +39,7 @@ obj-$(CONFIG_ECONET)=09=09+=3D econet/ obj-$(CONFIG_VLAN_8021Q)=09+=3D 8021q/ obj-$(CONFIG_IP_SCTP)=09=09+=3D sctp/ +obj-$(CONFIG_MPLS)=09=09+=3D mpls/ ifeq ($(CONFIG_NET),y) obj-$(CONFIG_SYSCTL)=09=09+=3D sysctl_net.o diff -urN linux-2.6.0-test11/net/core/Makefile linux-2.6.0-test11-mpls/net/= core/Makefile --- linux-2.6.0-test11/net/core/Makefile=092003-11-27 14:23:54.000000000 +0= 100 +++ linux-2.6.0-test11-mpls/net/core/Makefile=092003-11-29 16:11:26.3726132= 32 +0100 @@ -7,7 +7,8 @@ obj-$(CONFIG_SYSCTL) +=3D sysctl_net_core.o obj-y=09=09 +=3D flow.o dev.o ethtool.o net-sysfs.o dev_mcast.o dst.o = \ -=09=09=09neighbour.o rtnetlink.o utils.o link_watch.o filter.o +=09=09=09neighbour.o rtnetlink.o utils.o link_watch.o filter.o \ +=09=09=09spec_nh.o obj-$(CONFIG_NETFILTER) +=3D netfilter.o obj-$(CONFIG_NET_DIVERT) +=3D dv.o diff -urN linux-2.6.0-test11/net/core/dev.c linux-2.6.0-test11-mpls/net/cor= e/dev.c --- linux-2.6.0-test11/net/core/dev.c=092003-11-27 14:23:54.000000000 +0100 +++ linux-2.6.0-test11-mpls/net/core/dev.c=092003-11-29 16:11:26.374612928 = +0100 @@ -110,6 +110,7 @@ #include <net/iw_handler.h> #endif=09/* CONFIG_NET_RADIO */ #include 
<asm/current.h> +#include <net/spec_nh.h> /* This define, if set, will randomly drop a packet when congestion * is more than moderate. It helps fairness in the multi-interface @@ -2581,6 +2582,15 @@ =09=09=09=09return ret; =09=09=09} #endif=09/* WIRELESS_EXT */ + +#ifdef CONFIG_MPLS +/* RCAS Added 20031128 for 2.4 Compat. We are moving to netlink...*/ +=09=09=09if (cmd >=3D SIOCMPLSFIRST && cmd <=3D SIOCMPLSLAST) { +=09=09=09=09ret =3D mpls_ioctl(cmd,(void *) arg); +=09=09=09=09return ret; +=09=09=09} +#endif /* CONFIG_MPLS */ + =09=09=09return -EINVAL; =09} } @@ -3038,6 +3048,7 @@ =09dst_init(); =09dev_mcast_init(); +=09spec_nh_init(); #ifdef CONFIG_NET_SCHED =09pktsched_init(); diff -urN linux-2.6.0-test11/net/core/dst.c linux-2.6.0-test11-mpls/net/cor= e/dst.c --- linux-2.6.0-test11/net/core/dst.c=092003-11-27 14:23:54.000000000 +0100 +++ linux-2.6.0-test11-mpls/net/core/dst.c=092003-11-29 16:11:26.376612624 = +0100 @@ -19,6 +19,10 @@ #include <net/dst.h> +#ifdef CONFIG_MPLS +#include <net/mpls.h> +#endif + /* Locking strategy: * 1) Garbage collection state of dead destination cache * entries is protected by dst_lock. 
@@ -192,6 +196,10 @@ #if RT_CACHE_DEBUG >=3D 2 =09atomic_dec(&dst_total); #endif +#ifdef CONFIG_MPLS +=09if (dst->mpls_moi) +=09=09mpls_out_info_release(dst->mpls_moi); +#endif =09kmem_cache_free(dst->ops->kmem_cachep, dst); =09dst =3D child; diff -urN linux-2.6.0-test11/net/core/neighbour.c linux-2.6.0-test11-mpls/n= et/core/neighbour.c --- linux-2.6.0-test11/net/core/neighbour.c=092003-11-27 14:23:54.000000000= +0100 +++ linux-2.6.0-test11-mpls/net/core/neighbour.c=092003-11-29 16:11:26.3776= 12472 +0100 @@ -999,7 +999,7 @@ =09=09if (dev->hard_header_cache && !dst->hh) { =09=09=09write_lock_bh(&neigh->lock); =09=09=09if (!dst->hh) -=09=09=09=09neigh_hh_init(neigh, dst, dst->ops->protocol); +=09=09=09=09neigh_hh_init(neigh, dst, skb->protocol); =09=09=09err =3D dev->hard_header(skb, dev, ntohs(skb->protocol), =09=09=09=09=09 neigh->ha, NULL, skb->len); =09=09=09write_unlock_bh(&neigh->lock); diff -urN linux-2.6.0-test11/net/core/spec_nh.c linux-2.6.0-test11-mpls/net= /core/spec_nh.c --- linux-2.6.0-test11/net/core/spec_nh.c=091970-01-01 01:00:00.000000000 += 0100 +++ linux-2.6.0-test11-mpls/net/core/spec_nh.c=092003-11-29 16:11:26.378612= 320 +0100 @@ -0,0 +1,282 @@ +/* + * SPEC NH Interface for special nexthops + * + *=09=09This program is free software; you can redistribute it and/or + *=09=09modify it under the terms of the GNU General Public License + *=09=09as published by the Free Software Foundation; either version + *=09=092 of the License, or (at your option) any later version. + * + *=09Heavily borrowed from dev_remove_pack/dev_add_pack + * + *=09Authors:=09James R. 
Leu <jl...@mi...>
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <asm/byteorder.h>
+#include <linux/list.h>
+#include <net/spec_nh.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+
+static spinlock_t spec_nh_lock = SPIN_LOCK_UNLOCKED;
+static struct list_head spec_nh_base[16]; /* 16 way hashed list */
+
+/**
+ *	spec_nh_add - add a special nexthop handler
+ *	@spec: special nexthop declaration
+ *
+ *	Add a special nexthop handler to the networking stack. The
+ *	passed &spec_nh is linked into the kernel list and may not be
+ *	freed until it has been removed from the kernel list.
+ *
+ *	This call does not sleep, so it cannot guarantee that all
+ *	CPUs that are in the middle of processing packets will see the
+ *	new special nexthop handler (until they process another packet)
+ */
+
+void spec_nh_add(struct spec_nh *spec)
+{
+	int hash;
+
+	spin_lock_bh(&spec_nh_lock);
+
+	hash = ntohs(spec->type) & 15;
+	list_add_rcu(&spec->list, &spec_nh_base[hash]);
+
+	spin_unlock_bh(&spec_nh_lock);
+}
+
+/**
+ *	spec_nh_remove - remove a special nexthop handler
+ *	@spec: special nexthop declaration
+ *
+ *	Remove a special nexthop handler that was previously added to the
+ *	kernel's list of special nexthop handlers by spec_nh_add(). The
+ *	passed &spec_nh is removed from the kernel's list and can be freed
+ *	or reused once this function returns.
+ *
+ *	This call sleeps to guarantee that no CPU is looking at the
+ *	special nexthop handler after return.
+ */
+
+void spec_nh_remove(struct spec_nh *spec)
+{
+	struct list_head *head;
+	struct spec_nh *spec1;
+
+	spin_lock_bh(&spec_nh_lock);
+	head = &spec_nh_base[ntohs(spec->type) & 15];
+
+	list_for_each_entry(spec1, head, list) {
+		if (spec == spec1) {
+			list_del_rcu(&spec->list);
+			goto out;
+		}
+	}
+	printk(KERN_WARNING "spec_nh_remove: %p not found.\n", spec);
+out:
+	spin_unlock_bh(&spec_nh_lock);
+
+	synchronize_net();
+
+}
+
+/**
+ *	spec_nh_find - find a special nexthop handler by its protocol type
+ *	@proto: protocol type declaration
+ *
+ *	Search the kernel's list of special nexthop handlers looking for
+ *	a handler for this specific protocol.
+ */
+struct spec_nh *spec_nh_find(unsigned short proto)
+{
+	struct list_head *head;
+	struct spec_nh *spec;
+
+	spin_lock_bh(&spec_nh_lock);
+	head = &spec_nh_base[ntohs(proto) & 15];
+
+	list_for_each_entry(spec, head, list) {
+		if (proto == spec->type) {
+			goto out;
+		}
+	}
+	spec = NULL;
+out:
+	spin_unlock_bh(&spec_nh_lock);
+
+	return spec;
+}
+
+/*
+ * Proc filesystem directory entries.
+ */ + +/* + * /proc/net/spec_nh + */ + +static struct proc_dir_entry *proc_spec_nh_dir; + +/* + * /proc/net/spec_nh/config + */ + +static struct proc_dir_entry *proc_spec_nh_conf; + +/* + * Names of the proc directory entries + */ + +static const char name_root[] =3D "spec_nh"; +static const char name_conf[] =3D "config"; + +/* + * The following few functions build the content of /proc/net/spec_nh/conf= ig + */ + +/* starting at spec, find the next registered protocol */ +struct spec_nh *spec_nh_skip(struct spec_nh *spec) +{ +=09struct list_head *head; +=09struct spec_nh *spec1; +=09int next =3D 0; +=09int slot =3D 0; + +=09if (spec) +=09=09slot =3D ntohs(spec->type) & 15; +=09else +=09=09next =3D 1; + +=09for (;slot < 16;slot++) { +=09=09head =3D &spec_nh_base[slot]; +=09=09list_for_each_entry(spec1, head, list) { +=09=09=09if (next) +=09=09=09=09return spec1; + +=09=09=09if (spec1 =3D=3D spec) +=09=09=09=09next =3D 1; +=09=09} +=09} + +=09return NULL; +} + + +/* start read of /proc/net/spec_nh/config */ +static void *spec_nh_seq_start(struct seq_file *seq, loff_t *pos) +{ +=09struct spec_nh *spec; +=09loff_t i =3D 1; + +=09spin_lock_bh(&spec_nh_lock); + +=09if (*pos =3D=3D 0) +=09=09return SEQ_START_TOKEN; + +=09for (spec =3D spec_nh_ski... [truncated message content] |
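The RADIX_INSERT and RADIX_DO macros in the patch above consume the key `bits` at a time, most significant bits first, using `(key >> (RdX_size - bits - (RdX_count * bits))) & RdX_mask` to pick the child slot at each level. A minimal user-space sketch of that index computation, assuming a 32-bit key and 4 bits per level (matching MPLS_TREE_BITS); the names `radix_index` and `TREE_BITS` are hypothetical and are not part of the patch:

```c
#include <stdint.h>

#define TREE_BITS 4 /* assumed to mirror MPLS_TREE_BITS from the patch */

/*
 * Return the child slot selected at a given depth of the tree.
 * Depth 0 uses the most significant TREE_BITS bits of the key.
 * The caller must keep depth below 32 / TREE_BITS, otherwise the
 * shift amount would underflow.
 */
unsigned radix_index(uint32_t key, unsigned depth)
{
    const unsigned size = sizeof(key) * 8;           /* key width in bits */
    const uint32_t mask = (1u << TREE_BITS) - 1;     /* 0xF for 4 bits */

    return (key >> (size - TREE_BITS - depth * TREE_BITS)) & mask;
}
```

For the key 0x12345678, depths 0 through 7 select child slots 1, 2, 3, 4, 5, 6, 7 and 8 in turn, so a full 32-bit lookup walks eight levels.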
From: James R. L. <jl...@mi...> - 2003-11-25 14:53:53
|
Test

--
James R. Leu
jl...@mi... |
From: jamal <ha...@cy...> - 2003-11-25 11:12:30
|
Well, I didn't see the echo of my message back. Am I subscribed?

cheers,
jamal

On Tue, 2003-11-25 at 05:27, Ramon Casellas wrote:
> On 24 Nov 2003, jamal wrote:
>
> > Do I get a pong back?
>
> Pong!
|
From: Ramon C. <cas...@in...> - 2003-11-25 10:28:03
|
On 24 Nov 2003, jamal wrote:

> Do I get a pong back?

Pong! |
From: jamal <ha...@cy...> - 2003-11-25 03:41:23
|
Do I get a pong back?

cheers,
jamal |