Thread: [mpls-linux-devel] Jamal's kernel patch
From: Ramon C. <cas...@in...> - 2004-02-13 22:24:40
Jamal,

I am still in the middle of understanding your patch. One of the things that worries me (most probably due to my lack of understanding) is that it seems quite intrusive w.r.t. other parts of the stack. IMVHO, I often consider strong coupling not_a_so_good_thing, and I defend duplicating some parts of code for the sake of clarity and modularity. So some ideas/questions:

* I appreciated your effort with the design document. I am a paranoid guy regarding documentation (that's why I wrote down the devel guide on James' implementation). A design document stating the required changes of core parts for MPLS support and the reasons would be much welcome, and it would allow further discussion (you stated in a previous mail, that this time, as a premiere in Linux, you wanted to do things right :)). Do you plan to write something about that? I know it is the most ungrateful part..

* In this sense, to truly modularize the MPLS implementation, I think it would be appropriate to make things in such a way that the user could select "Core MPLS support" and "Full MPLS Support" (or something like that) when configuring the kernel. Core LSRs would only be able to forward mpls labelled packets without knowledge of L3 protocols (think of a BGP/MPLS VPN 'P' router that is used to forward L3 and L2 frames) and only a minimal set of modifications to IPv4/IPv6 would be compiled in (in other words, the FIB Table need only be extended in the second case). Is this level of granularity common practice in the Linux kernel?

* It's just a simple question, take no offense :) but do you consider the patch you sent quite "feature freeze" and "written in stone" or are you willing to open development and allow changes *iff* common consensus justifies it? I think this is an important point for us.

Thoughts?

R.
From: Jamal H. S. <ha...@zn...> - 2004-02-14 00:32:53
On Fri, 2004-02-13 at 17:22, Ramon Casellas wrote:
> Jamal,
>
> I am still in the middle of understanding your patch. One of the things
> that worries me (most probably due to my lack of understanding) is that it
> seems quite intrusive w.r.t other parts of the stack. IMVHO, I often
> consider strong coupling not_a_so_good_thing, and I defend duplicating
> some parts of code in the sake of clarity and modularity. So some
> ideas/questions:

There are certain things that you cant avoid. Example you will have ifdefs in the v6 and v4 for FTN support. The less ifdefs the better. I think once you start attaching IPSEC to MPLS, same thing will happen. There are things which are v4 and v6 specific that are totally abstracted out but dependent on those protocols - example neighbor binding. But this is really clean right now. look at the mpls_prot_driver code. I think this code is as decoupled as you can go but i may have missed your point.

> * I appreciated your effort with the design document. I am a paranoid guy
> regarding documentation (that's why I wrote down the devel guide on James'
> implementation). A design document stating the required changes of core
> parts for MPLS support and the reasons would be much welcome, and it would
> allow further discussion (you stated in a previous mail, that this time,
> as a premiere in Linux, you wanted to do things right :)) . Do you plan to
> write something about that? I know it is the most ungrateful part..

I am capable of writing good doc with proper motivation. I dont have it right now but you could do that ;-> If you want you can take over the spec doc. I will try to clarify things when i can.

> * In this sense, to truly modularize the MPLS implementation, I think it
> would be appropriate to make things in such a way that the user could be
> able to select "Core MPLS support" and "Full MPLS Support" (or something
> like that) when configuring the kernel. Core LSRs would only be able to
> forward mpls labelled packets without knowledge of L3 protocols (think of
> a BGP/MPLS VPN 'P' router that is used to forward L3 and L2 frames) and
> only a minimal set modifications to IPv4/IPv6 would be compiled in (in
> other words, the FIB Table need only be extended in the second case). Is
> this level of granularity common practice in the Linux kernel?

Are you referring to being able to compile out FTN support? I think this is doable; you just need to introduce a config, probably one for each of v4 or v6.

> * It's just a simple question, take no offense :) but do you consider the
> patch you sent quite "feature freeze" and "written in stone" or are you
> willing to open development and allow changes *iff* common consensus
> justifies it? I think this is an important point for us.
>

Consensus is key between us at least. caveat: What i would like though is to avoid having to stress Dave when theres no clear win in some change to be made. I would like to make it easy for him to accept things - so lets discuss changes first like the dst changes then have some good reasons before we talk to him.

cheers,
jamal
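A minimal sketch of the "one config for each of v4 or v6" idea follows. The symbols CONFIG_MPLS_FTN_IPV4 and CONFIG_MPLS_FTN_IPV6, the enum and the function name are all hypothetical and do not come from the patch; in a kernel tree such symbols would be set by the build configuration rather than by a #define in the source.

```c
/* Hypothetical config symbols, defined by hand for illustration only. */
#define CONFIG_MPLS_FTN_IPV4 1
/* CONFIG_MPLS_FTN_IPV6 is left undefined here, so IPv6 FTN is compiled out. */

enum mpls_l3_proto { MPLS_L3_IPV4, MPLS_L3_IPV6 };   /* illustrative names */

/* Returns 1 if FTN (FEC-to-NHLFE) support for proto was compiled in, else 0. */
static int mpls_ftn_compiled_in(enum mpls_l3_proto proto)
{
    switch (proto) {
#ifdef CONFIG_MPLS_FTN_IPV4
    case MPLS_L3_IPV4:
        return 1;
#endif
#ifdef CONFIG_MPLS_FTN_IPV6
    case MPLS_L3_IPV6:
        return 1;
#endif
    default:
        return 0;
    }
}
```

With both symbols undefined the function always returns 0, which would correspond to a label-forwarding-only build in the sense discussed above.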
From: Ramon C. <cas...@in...> - 2004-02-14 22:08:51
Jamal, All,

Please, find my comments inline below. They concern userspace app grammar and syntax, as well as discussing opcodes.

GENERIC COMMENTS (no flaming intended).
########################################################

Although I think that we are indeed on the right path, IMHO, for the moment, your implementation is lagging functionality w.r.t. James' one (available userspace apps, diffserv mapping, tunnels, procfs, sysfs, etc. although some are not strictly required), although I admit that there are still serious issues with James' impl (locking and SMP safeness are the most notorious ones), and that it is just a matter of time and work.

Given the fact that DaveM explicitly supports yours, it seems clear to me that we should focus on it (James? yours is the last word... I have been working on yours for only several months, you have spent the last five years), and avoid any other fork. So the question is "what can we do now?". When I started working on James implementation, I appreciated being almost immediately given write access, so I could do some documenting tasks while I was understanding the inner workings..., and I was trusted. I understand that you may see things differently. What is your position on this? Neither James nor I have (for the moment?) access to the patch/CVS repository, l2c userspace application.... Somehow I feel hand tied :) I have spare time and I'm afraid that you may want a centralized approach, which may have some inconveniences (although you have all right to).

In other words, if you were to be this project manager ;). How would you define the tasks so everyone may contribute to the project, see the others recognize his work, etc? Personally, I am interested and I would like to play a nice part on this. What is missing?

QUESTIONS
########################################################

question: Are there performance studies regarding radix trees w.r.t Hash buckets and linked lists?
If the number of labels is large, isn't the O(N) walk op going to slow things down? How many labels are managed on average? for example, if we assume 100000 BGP prefixes and (why not) a label per prefix, with hash&walk (1024 hash buckets) it makes 100 entries (average) per bucket vs approx log2(N) with binary trees? what about other advanced ADT, like Hash buckets and radix trees or similar? Thoughts?

question: regarding dst management. Maybe Alexey could enlighten us. It may be interesting to know his point of view about adding a specific mpls ptr to the generic dst struct, or he may even propose alternate solutions...

COMMENTS ON USERSPACE APP
########################################################

Jamal's proposal:

l2c mpls nhlfe <cmd> dev <devname>
index <val> proto <ipv4|ipv6> nh <neighbor>
<operation set> fec <FECid>
operation set := (op <operation>)*
* cmd is one of: <add | del | replace | get>

20040214-RCAS- We should work on both grammars. I understand that they are a work in progress, but they are imprecise and inconvenient. for example the "del" operation should not require the user to give the neighbour. OTOH, I think we don't need replace (simple remove and add)

* index could be used to store the LSPid
20040214-RCAS- (I don't understand this. ???)

* FECid is the FEC identifier to be used as the key for searching.

20040214-RCAS- nhlfe_id :)

Well, IMHO, I think the grammar can be improved. All the opcodes need to be defined, with their arguments (get them from James Implementation. They are quite complete and comprehensive)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SUGGESTION What about this ?

l2c mpls nhlfe COMMAND <nhlfe_id>

COMMAND := [add | del | get | SETCOMMAND]
SETCOMMAND := set proto <ipv4|ipv6> nh <neighbour> "OPERATIONSET"
OPERATIONSET := [swap SWAPARGS | pop | dlv | mapexp....],+
SWAPARGS := labelvalue[:labelspace]..
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Examples

# l2c mpls nhlfe add <nhlfe_id>
Add an empty entry (default, drop). Error if exists.

# l2c mpls nhlfe del <nhlfe_id>
Remove the given entry. Ignore if not exists.

# l2c mpls nhlfe get <nhlfe_id>
Dump entry. Ignore if not exists.

# l2c mpls nhlfe set proto ipv4 nh 10.0.0.2 "swap 20,push 50" <nhlfe_id>

##############################################################
The ip tool should allow you to specify the route you want then
specify the FECid for that route, i.e:
ip route ... FECid <FECid>
where FECid is the NHLFE keyid we want to use
Example:
ip route add 10.0.0.21/32 via 10.0.0.9 dev eth0 fecid 1

20040214-RCAS all occurrences of FECId should be changed to nhlfe_id. :)

##############################################################################
JAMAL:
l2c mpls ilm <cmd> dev <devname>
index <val> label fec <FECid>

RCAS : This should be <label> otherwise it looks like a keyword.

* cmd is one of: <add | del | replace | get>

RCAS: I think we don't need replace. Let the user del and add. Too
many commands are cumbersome.

RCAS: Let's work on the grammar. The user should only need to give the
incoming label to remove, not the nhlfe_id that it points to.

* devname is the input device to be used
RCAS: right, but we need more flexibility.
RCAS: one option would be to use wildcards, e.g.
RCAS: l2c mpls ilm add "eth0:15"
RCAS: l2c mpls ilm add "*:15"
RCAS: but, I do think that the labelspace approach in james impl.
RCAS: is better. Let the user set a labelspace as a netdevice
RCAS: attribute and let the user define ILM entries as
RCAS: labelspace+value.

* Index is an additional identifier that could be used to
store LSP info.

RCAS : What is val?
RCAS: I still don't understand this. Could you please give examples?

* FECid is the FECid to be used for searching the NHLFE.

RCAS: nhlfe_id is the NHLFE id to use.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SUGGESTION What about this ?

l2c mpls ilm COMMAND <ls:label>

COMMAND := [add | del | get | BINDCOMMAND ]
BINDCOMMAND := bind <nhlfe_id> to
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
well, or something like that

######################################################################

3.0 Allowed OPCODEs

20040214-RCAS: We need more advanced (DiffServ, etc.) opcodes. We can leverage James implementation for this.

3.1 Modifying opcodes

- REDIRECT: redirect a packet to a different LSP
(useful for testing or redirecting to a control plane)

20040214-RCAS: this can be useful. Nice.

- MIRROR: send a copy of a packet somewhere else for further
processing (useful for LSP pings, traceroute, debug etc)

20040214-RCAS: Idem.

3.2 Label action opcodes

20040214-RCAS: Why do we need to introduce two concepts "Label action opcodes" and "Atomic operations"? aren't all "Atomic operations" "label actions" and vice versa???

The atomic operations are:

- POP
- PUSH
- REPLACE

20040214-RCAS: Whats wrong with the standard name "SWAP"???

Note:
a stack consisting of atomic operations can be implemented; example:
a pop followed by several pushes.

20040214-RCAS: Ein??? Be more specific.

Well, I have some urgent boring things to do, more comments to follow,

Thanks,
Ramon

// -------------------------------------------------------------------
// Ramon Casellas - GET/ENST/INFRES/RHD/A508 - cas...@in...
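To make the `<ls:label>` key from the suggested ILM grammar concrete, here is a small parsing sketch. It is only an illustration: the function name, the decimal "labelspace:label" text form and the error convention are assumptions, not part of l2c. The one hard constraint used is that an MPLS label is a 20-bit value.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MPLS_LABEL_MAX 0xFFFFFu   /* largest value of a 20-bit label */

/*
 * Parse "<labelspace>:<label>" in decimal, e.g. "0:15".
 * Hypothetical helper: returns 0 on success, -1 on malformed input
 * or an out-of-range label.
 */
static int parse_ilm_key(const char *s, uint32_t *ls, uint32_t *label)
{
    char *end;
    unsigned long v;

    if (*s < '0' || *s > '9')          /* reject signs, spaces, wildcards */
        return -1;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != ':' || v > UINT32_MAX)
        return -1;
    *ls = (uint32_t)v;

    s = end + 1;
    if (*s < '0' || *s > '9')
        return -1;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > MPLS_LABEL_MAX)
        return -1;
    *label = (uint32_t)v;
    return 0;
}
```

A wildcard form such as "*:15" is rejected here on purpose; supporting it would be a separate design decision.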
From: Jamal H. S. <ha...@zn...> - 2004-02-15 02:38:17
On Sat, 2004-02-14 at 17:04, Ramon Casellas wrote:
>
> Although I think that we are indeed on the right path, IMHO, for the
> moment, your implementation is lagging functionality w.r.t. James' one
> (available userspace apps, diffserv mapping, tunnels, procfs, sysfs,etc.

User space app for static management is already there - thats what "l2c mpls" is. Maybe i didnt make myself clear before, diffserv and more is there; they are just not locked into mpls; they can be associated but are independent apps. Tunnels, whatever James has can be merged. Procfs or sysfs i care less about and wouldnt lose sleep if they didnt exist - we use netlink which should cover most of what these things try to do. If someone wants to add that go ahead.

I apologize i never got around to looking at James code (and weekend is mostly for a pregnant woman who occasionally looks away and i sneak to check mail), but theres a lot of stuff that could be merged in (i noticed PPP, ATM, FR for example). IIRC correctly from those old days, there exists some form of LDP implementation. That could be part of the userspace tools or safer separate.

> although some are not strictly required), although I admit that there are
> still serious issues with James' impl (locking and SMP safeness are the
> most notorious ones), and that it is just a matter of time and work.
>
> Given the fact that DaveM explicitly supports yours, it seems clear to me

As i said before this is NOT my implementation. I tried to document and sanitize what it does - mostly so we can have a useful discussion. My piece is user space to kernel. If you look at the code you will see my name appearing in only about two files or so.
My preference is to use this implementation and to have as little fight with Dave as possible and only make changes to the base when we see it appropriate (if theres a 5% improvement, we could think about talking to him, if theres a > 20% improvement we have more strength);-> I wanna see MPLS in 2.6 soon and i think this is the fastest way to get there.

> that we should focus on it (James? yours is the last word... I have
> been working on yours for only several months, you have spent the last
> five years), and avoid any other fork. So the question is "what can we do
> now?". When I started working on James implementation, I appreciated being
> almost immediately given write access, so I could do some documenting
> tasks while I was understanding the inner works..., and I was trusted. I
> understand that you may see things differently. What is your position on
> this? Neither James nor I have (for the moment?) access to the patch/CVS
> repository, l2c userspace application.... Somehow I feel hand tied :)

You have access to the patches. What more do you want?

> I
> have spare time and I'm afraid that you may want a centralized approach,
> which may have some inconveniences (although you have all right to).

OK, Set up a CVS repository. I am old fashioned and dont use it very much. I still refuse to use bitkeeper. The easiest thing for me is people send me patches and i merge them. Again i dont care if its CVS. Maybe we can try something more exciting like that competition to bitkeeper;->

> In other words, if you were to be this project manager ;). How would you define
> the tasks so everyone may contribute to the project, see the others recognize
> his work, etc? Personally, I am interested and I would like to play a nice part
> on this. What is missing?
>

I think we need to discuss then someone codes or merges. For example, we need to settle on the multihop; i think the idea i suggested is the way to go for ILM - i dont care who codes; i could.
Also we need to settle the dst issue and that may result in coding. Like i pointed out i think the ATM, PPP and FR features are missing. Someone adventurous could get some LDP code ported over or document the API so LDP porters could run over it. Look at the todo list and maybe add more to it - and lets start there.

>
> QUESTIONS
> ########################################################
>
> question: Are there performance studies regarding radix trees w.r.t Hash
> buckets and linked lists? If the number of labels is large, isn't the O(N) walk
> op going to slow things down? How many labels are managed in average? for
> example, if we assume 100000 BGP prefixes and (why not) a label per prefix,
> with hash&walk (1024 hash buckets) it makes 100 entries (average) per bucket vs
> approx log2(N) with binary trees?

I am indifferent and frankly dont care how it is done. If someone needs to change code like that (which is Daves) just come with some justification. I will support it if it looks valuable. Like i said hit that 20% threshold.

> what about other advanced ADT, like Hash buckets and radix trees or similar?
> Thoughts?

For example for something like the ILM, where lookup is based on a 12 bit label, then i would think making it anything more than a hash and walk is overkill. If you can put 64 hash buckets thats already taking off 6 bits; which means worst case you will walk is 64. make it 256 buckets and suddenly you are looking at 16 worst case. So evaluate for each table what needs to be done then make a call.

> question: regarding dst management. Maybe Alexey could enlighten us. It may be
> interesting to know his point of view about adding a specific mpls ptr to the
> generic dst struct, or he may even propose alternate solutions...

I think dst is the way to go; whether we end up using a child or a ptr is something we need to settle first. James hasnt responded to my last email. I am also confident Davem knows this space well.
Alexey we can use at the end so he can spit at the code.

>
> COMMENTS ON USERSPACE APP
> ########################################################
>
> Jamal's proposal:
>
> l2c mpls nhlfe <cmd> dev <devname>
> index <val> proto <ipv4|ipv6> nh <neighbor>
> <operation set> fec <FECid>
> operation set := (op <operation>)*
> * cmd is one of: <add | del | replace | get>
>
>
> 20040214-RCAS- We should work on both grammars. I understand that they are a
> work in progress, but they are imprecise and inconvenient. for example the
> "del" operation should not require the user to give the neighbour. OTOH, I
> think we don't need replace (simple remove and add)

You can leave out the other parts on del and it would work. That idea is consistent with tc structure. Replace is an atomic del/add. A lot of table management has it. Imagine many applications trying to manage the same table.

> * index could be used to store the LSPid
> 20040214-RCAS- (I don't understand this. ???)

NHLFE_id aka fecid is local i.e not spreadable over LDP for example. A management application such as a dynamic daemon which has a bigger view of the world may wish to identify further by LSPid - hence the existence of "index". If it doesnt make sense we could remove it; right now i see no harm in it. If you dont specify it, it gets zeroed.

> * FECid is the FEC identifier to be used as the key for searching.
>
> 20040214-RCAS- nhlfe_id :)

yep ;->

>
> Well, IMHO, I think the grammar can be improved. All the opcodes
> need to be defined, with their arguments (get them from James
> Implementation. They are quite complete and comprehensive)

Do you see anything in the base set other than push and pop? Everything else is a combination of these. I am trying to remember what James did - i think he had opcodes like multi-push (in the above case you just specify as many pushes as you want). Actually i am guilty of influencing this piece in Daves code.
I was influenced by what i saw from the ASICs i looked at and the two implementations. Lets discuss.

>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> SUGGESTION What about this ?
>
> l2c mpls nhlfe COMMAND <nhlfe_id>
>
> COMMAND := [add | del | get | SETCOMMAND]
> SETCOMMAND := set proto <ipv4|ipv6> nh <neighbour> "OPERATIONSET"
> OPERATIONSET := [swap SWAPARGS | pop | dlv | mapexp....],+
> SWAPARGS := labelvalue[:labelspace]..
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Examples
>
> # l2c mpls nhlfe add <nhlfe_id>
> Add an empty entry (default, drop). Error if exists.
>
> # l2c mpls nhlfe del <nhlfe_id>
> Remove the given entry. Ignore if not exists.
>
> # l2c mpls nhlfe get <nhlfe_id>
> Dump entry. Ignore if not exists.
>
> # l2c mpls nhlfe set proto ipv4 nh 10.0.0.2 "swap 20,push 50" <nhlfe_id>
>

Is there a reason to make it two separate updates? Is the command too long maybe?

>
>
> ##############################################################
> The ip tool should allow you specify route you want then
> specify the FECid for that route, i.e:
> ip route ... FECid <FECid>
> where FECid is the NHLFE keyid we want to use
> Example:
> ip route add 10.0.0.21/32 via 10.0.0.9 dev eth0 fecid 1
>
> 20040214-RCAS all occurrences of FECId should be changed to nhlfe_id.
> :)
>

Edit the doc and send an update ;->

> ##############################################################################
> JAMAL:
> l2c mpls ilm <cmd> dev <devname>
> index <val> label fec <FECid>
>
>
> RCAS : This should be <label> otherwise it looks like a keyword.

thats a typo; should be:
index <val> label <labelvalue> nhlfe_id <nhval>

> * cmd is one of: <add | del | replace | get>
>
> RCAS: I think we don't need replace. Let the user del and add. Too
> many commands are cumbersome.

Like i said replace is there for atomicity of the two operations. All database operations typically have the above four commands.
Look at this as a table that will be manipulated by many users concurrently.

> RCAS: Let's work on the grammar. The user should only need to give the
> incoming label to remove, not the nhlfe_id that it points to.

?? The nhlfe_id must exist before the entry is allowed. Look at the architecture of the tables in the doc. All roads lead to the NHLFE table.

> * devname is the input device to be used
> RCAS: right, but we need more flexibility.
> RCAS: one option would be to use wildcards, e.g.
> RCAS: l2c mpls ilm add "eth0:15"
> RCAS: l2c mpls ilm add "*:15"
> RCAS: but, I do think that the labelspace approach in james impl.
> RCAS: is better. Let the user set a labelspace as a netdevice
> RCAS: attribute and let the user define ILM entries as
> RCAS: labelspace+value.

The labelspace issue is still open. I can see its value in the L2VPN where an additional VPNid comes in with the labelspace. I am really struggling trying to see its value here. James and I had a small discussion; we need to revive that. If you look at the code you will see, at the moment the label space is zero always. If there is something clear in the incoming packet that can be used to map to a device, then using labelspace becomes valuable.

>
> * Index is an additional identifier that could be used to
> store LSP info.
>
> RCAS : What is val?
> RCAS: I still don't understand this. Could you please give examples?

Same idea as in the NHLFE. If it doesnt prove valuable we could remove it.

> * FECid is the FECid to be used for searching the NHLFE.
>
> RCAS: nhlfe_id is the NHLFE id to use.
>

yep.

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> SUGGESTION What about this ?
>
> l2c mpls ilm COMMAND <ls:label>
>
> COMMAND := [add | del | get | BINDCOMMAND ]
> BINDCOMMAND := bind <nhlfe_id> to
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> well, or something like that
>

I have issues with the labelspace as i described above.
>
> ######################################################################
>
>
> 3.0 Allowed OPCODEs
>
> 20040214-RCAS: We need more advanced (DiffServ, etc.) opcodes. We can leverage
> James implementation for this.
>
>
> 3.1 Modifying opcodes
>
> - REDIRECT: redirect a packet to a different LSP
> (useful for testing or redirecting to a control plane)
>
> 20040214-RCAS: this can be useful. Nice.
>
>
> - MIRROR: send a copy of a packet somewhere else for further
> processing (useful for LSP pings, traceroute, debug etc)
>
> 20040214-RCAS: Idem.
>
>
> 3.2 Label action opcodes
>
>
> 20040214-RCAS: Why do we need to introduce two concepts "Label action opcodes"
> and "Atomic operations"? aren't all "Atomic operations" "label actions" and
> viceversa???

The way i saw it (or was influenced to think of it) is as follows:
- there are three basic operations (just like there are basic types in C programming eg integer, char, short). Then you can build complex compositions from them (just like you can build data structures in C from the atomic data types)

>
> The atomic operations are:"
>
> - POP
> - PUSH
> - REPLACE
>
> 20040214-RCAS: Whats wrong with the standard name "SWAP"???

sure, swap it is.

>
> Note:
> a stack of consisting atomic operations can be implemented; example:
> a pop followed by several pushes.
>
> 20040214-RCAS: Ein??? Be more specific.
>

The analogy of atomic data types and structures i described above applies.

>
>
> Well, I have some urgent boring things to do, more comments to follow,

And i have someone who is looking for me right now - i took too long to go to the washroom ;->

cheers,
jamal
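The "atomic operations compose into larger ones" analogy can be sketched as a tiny interpreter over a label stack. Everything here is illustrative: the opcode names, struct layouts and the depth limit are assumptions and are not taken from the patch or from James' implementation. Swap is written as the top label being replaced, which is equivalent to a pop followed by a push.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8                     /* arbitrary limit for the sketch */

enum op_code { OP_POP, OP_PUSH, OP_SWAP };   /* hypothetical opcodes */

struct label_op {
    enum op_code op;
    uint32_t operand;                   /* label value; ignored by OP_POP */
};

struct label_stack {
    uint32_t label[MAX_DEPTH];          /* label[depth - 1] is the top */
    size_t depth;
};

/* Apply one atomic operation; returns 0 on success, -1 on under/overflow. */
static int apply_op(struct label_stack *s, const struct label_op *o)
{
    switch (o->op) {
    case OP_POP:
        if (s->depth == 0)
            return -1;
        s->depth--;
        return 0;
    case OP_PUSH:
        if (s->depth == MAX_DEPTH)
            return -1;
        s->label[s->depth++] = o->operand;
        return 0;
    case OP_SWAP:                       /* same effect as pop then push */
        if (s->depth == 0)
            return -1;
        s->label[s->depth - 1] = o->operand;
        return 0;
    }
    return -1;
}

/* Apply a whole operation set in order, stopping at the first failure. */
static int apply_ops(struct label_stack *s, const struct label_op *ops, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (apply_op(s, &ops[i]) != 0)
            return -1;
    return 0;
}
```

With this shape, an operation set such as "swap 20, push 50" is just a two-element array passed to apply_ops, and "a pop followed by several pushes" is another such array.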
From: Ramon C. <cas...@in...> - 2004-02-15 07:45:32
Jamal,

Glad to know your wife is pregnant. Best wishes :)

On 14 Feb 2004, Jamal Hadi Salim wrote:

> IIRC correctly from those old days, there exists some form of LDP
> implementation. That could be part of the userspace tools or safer
> separate.

James,

How do you see porting LDP portable to this version?

> You have access to the patches. What more do you want?
> OK, Set up a CVS repository. I am old fashioned and dont use it very
> much. I still refuse to use bitkeeper. The easiest thing for me is
> people send me patches and i merge them. Again i dont care if its CVS.
> Maybe we can try something more exciting like that competition to
> bitkeeper;->

James,

What about adding a new kernel version to the p4 repository? something like mpls-kernel-dm with the docs and patches, giving write access to J.R.L., J.H.S, D.S.M, R.C ? I am working on the spec doc and other docs. Later I want to start documenting DaveM with kerneldoc. Thoughts?

> Edit the doc and send an update ;->

Working on it

R.
From: James R. L. <jl...@mi...> - 2004-02-19 05:24:32
Comments in line

On Sun, Feb 15, 2004 at 08:41:51AM +0100, Ramon Casellas wrote:
>
> Jamal,
>
> Glad to know your wife is pregnant. Best wishes :)
>
> On 14 Feb 2004, Jamal Hadi Salim wrote:
>
> > IIRC correctly from those old days, there exists some form of LDP
> > implementation. That could be part of the userspace tools or safer
> > separate.
>
> James,
>
> How do you see porting LDP portable to this version?

My LDP implementation has an abstraction layer. It is just a matter of porting the abstraction layer to the new API.

> > You have access to the patches. What more do you want?
> > OK, Set up a CVS repository. I am old fashioned and dont use it very
> > much. I still refuse to use bitkeeper. The easiest thing for me is
> > people send me patches and i merge them. Again i dont care if its CVS.
> > Maybe we can try something more exciting like that competition to
> > bitkeeper;->
>
> James,
>
> What about adding a new kernel version to the p4 repository?
> something like mpls-kernel-dm with the docs and patches, giving
> write access to J.R.L., J.H.S, D.S.M, R.C ? I am working on the spec doc
> and other docs. Later I want to start documenting DaveM with kerneldoc.
> Thoughts?
>
> > Edit the doc and send an update ;->
>
> Working on it
>
> R.

--
James R. Leu
jl...@mi...
From: Ramon C. <cas...@in...> - 2004-02-15 10:39:44
re-hi,

FYI:
http://www.enst.fr/~casellas/mpls-linux-2.6/spec/spec.pdf
http://www.enst.fr/~casellas/mpls-linux-2.6/spec/index.html

Work in progress. Things to fix.

Regards,
Ramon

w.r.t Downstream on demand:

I *do* think it's valuable and we *must* support it (e.g RSVP-TE). If I cannot use RSVP-TE to setup LSPs in Linux I'm going back right now to James implementation ;-)

RCAS: I think we are confusing label distribution modes with implementation details. RSVP-TE uses downstream on demand as a label distribution mode, and it could be implemented as a two step process where a dummy NHLFE is created during the RSVP_PATH message so the ILM may point to it and then replaced by the right one upon reception of the RSVP_RESV.

What we do not support is sending orphan packets to userspace, and that when an entry is added in the ILM or FTN there must be an existing NHLFEid. I'm not saying that we need to match the exact words of the RFC, but we can (and must) support downstream on demand.

On 14 Feb 2004, Jamal Hadi Salim wrote:

> As i said before this is NOT my implementation. I tried to document and
> sanitize what it does - mostly so we can have a useful discussion. My

right, sorry. It's DaveM's implementation.

> him, if theres a > 20% improvement we have more strength);-> I wanna see
> MPLS in 2.6 soon and i think this is the fastest way to get there.

Well, me too :) but I'd rather see it in 2.6 when it's ready. Most probably you're right and it is....

> For example for something like the ILM, where lookup is based on a 12
> bit label, then i would think making it anything more than a hash and
(...)
> make it 256 buckets and suddenly you are looking at 16 worst case.

Where do you get these numbers ? :)
I thought the label was 20 bits, and with 256 buckets (2**8) you have 2**12 = 4096 worst case. even with 1024 buckets you get 1024 worst case.

am I missing something? Are these values acceptable?
In that case I'll shut up :)

> So evaluate for each table what needs to be done then make a call.

You are right. A choice should be made with performance numbers around.

R.
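The bucket arithmetic in this exchange can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and it assumes a hash that spreads labels perfectly evenly over a power-of-two number of buckets, which a real table would only approximate. A poor hash could do much worse than these figures.

```c
#include <stdint.h>

#define MPLS_LABEL_BITS 20u   /* an MPLS label is a 20-bit value */

/*
 * Chain length per bucket when all 2^20 labels are installed and spread
 * evenly over 2^bucket_bits buckets.
 */
static uint32_t full_table_chain_len(unsigned bucket_bits)
{
    if (bucket_bits >= MPLS_LABEL_BITS)
        return 1;
    return 1u << (MPLS_LABEL_BITS - bucket_bits);
}

/* Average chain length for n_labels installed labels, rounded up. */
static uint32_t avg_chain_len(uint32_t n_labels, uint32_t n_buckets)
{
    return (n_labels + n_buckets - 1) / n_buckets;
}
```

Under those assumptions, full_table_chain_len(8) gives 4096 and full_table_chain_len(10) gives 1024, matching the figures above, while avg_chain_len(100000, 1024) gives 98, close to the "100 entries (average) per bucket" estimate from the earlier mail.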
From: Jamal H. S. <ha...@zn...> - 2004-02-16 14:25:38
On Sun, 2004-02-15 at 05:36, Ramon Casellas wrote:
> re-hi,
>
> FYI:
> http://www.enst.fr/~casellas/mpls-linux-2.6/spec/spec.pdf
> http://www.enst.fr/~casellas/mpls-linux-2.6/spec/index.html
>
> Work in progress. Things to fix.

I will print this and look at it then respond to the rest of your email. Off to the office so response may be a little slow.

cheers,
jamal
From: Jamal H. S. <ha...@zn...> - 2004-02-16 15:54:15
On Sun, 2004-02-15 at 05:36, Ramon Casellas wrote:
> re-hi,
>
> FYI:
> http://www.enst.fr/~casellas/mpls-linux-2.6/spec/spec.pdf
> http://www.enst.fr/~casellas/mpls-linux-2.6/spec/index.html
>
> Work in progress. Things to fix.

That document looks pretty now ;->
Some comments:

- Names: List by last name first (Casellas, Hadi Salim, Leu, Miller). This way no one interprets the documentation to mean it is listed by contribution.
- General comments: Originally the doc was written informally with "I" meaning myself. Its all over the doc. You may wanna fix that.
- 1.2.1: The TODO is a separate document now.
- 1.2.2.5: You still have that fecid in there. We may also need to provide an example on ECMP using "ip route nexthop .."
- General: there should be consistency with the name for nhid - at times it reads nhlfe_id and others nhlfeid.
- 1.2.3.1: Your comment on downstream on demand; i will respond below since you have that comment in this email as well.
- figure 1.1: Logically you should draw an arrow from the route cache to the NHLFE entry without anything in between. Implementation wise at the moment there is a dst->mpls_route; but lets ignore that since we are not talking implementation here.

What would be useful in this document as well is to describe the interface between the kernel and user space.
i.e describe the packets used, events generated (at the moment any addition to NHLFE or ILM will generate an event); start by looking at: include/linux/l2cnetlink.h; to cutnpaste from there:
----------------------------------
/* ILM related */
struct ilmmsg {
	__u32 in_fecid;
	__u32 in_ifindex;
	__u32 in_space;
	__u32 in_label;
	__u8 in_owner;
};

/* ILM attributes */
enum {
	ILM_UNSPEC,
	ILM_STATS,
};

/* NHLFE related */
struct nhlfemsg {
	__u32 nh_fecid;
	__u32 nh_index;
	__u32 nh_ifindex;
	__u32 nh_space;
	__u32 nh_class;
	__u32 nh_flags;
	__u8 nh_owner;
	__u8 nh_proto;
	__u8 nh_dscp;
	__u8 nh_ttl;
	__u32 nh_ltype;
};

/* owner - who installed the rule */
enum {
	L2C, /* the l2c tool */
};

/* nh_proto choices */
enum {
	MPLS_IPV4,
	MPLS_IPV6,
};

/* nh_flags */
#define MPLS_FLAG_I_TC_INDEX 0x01 /* Input: Classify packet */
#define MPLS_FLAG_I_DIFFSERV 0x02 /* Input: Propagate diffserv bits */
#define MPLS_FLAG_O_TC_INDEX 0x04 /* Output: Classify packet */
#define MPLS_FLAG_O_DIFFSERV 0x08 /* Output: Propagate diffserv bits */
#define MPLS_FLAG_TTL_PROPAGATE 0x10 /* Input/Output: TTL propagation */
#define MIR_FLAG_TTL_PROPAGATE MPLS_FLAG_TTL_PROPAGATE

/* NHLFE attributes */
enum {
	NH_UNSPEC,
	NH_OP_INS,
	NH_STATS,
	NH_NEIGH_IP,
};

struct mpls_op_u {
	__u32 op;
	__u32 operand;
};
----------------------------------------
For how we document this typically look at: http://www.faqs.org/rfcs/rfc3549.html > > w.r.t Downstream on demand: > > I *do* think it's valuable and we *must* support it (e.g RSVP-TE). If I > cannot use RSVP-TE to setup LSPs in Linux I'm going back right now to > James implementation ;-) > > RCAS: I think we are confusing label distribution modes with > implementation details. RSVP-TE uses downstream on demand as a label > distribution mode, and it could be implemented as a two step process where > a dummy NHLFE is created during the RSVP_PATH message so the ILM may point > to it and then replaced by the right one upon reception of the RSVP_RESV. Ok.
So we may need some extra specialized NHLFE entries. I am not a big fan of the two step process unless you guys really insist - then we can go and convince davem. My opinion is let's have 3 new special NHLFEs: - something that sends the packet to a blackhole which will work for such a scenario as above. - Another one will send the packet to user space via netlink. This may also be used for resolving what you have above. - A third one is for locally destined packets. I was not sure whether this should just be a flag which says neighbor = local or not. > What we do not support is sending orphan packets to userspace, and that > when an entry is added in the ILM or FTN there must be an existing > NHLFEid. I'm not saying that we need to match the exact words of the RFC, > but we can (and must) support downstream on demand. > sure. Let me know what you think of the above. > > him, if there's a > 20% improvement we have more strength);-> I wanna see > > MPLS in 2.6 soon and i think this is the fastest way to get there. > > Well, me too :) but I'd rather see it in 2.6 when it's ready. Most > probably you're right and it is.... > As you can see, we are fixing things; good it didn't go in right away. > > For example for something like the ILM, where lookup is based on a 12 > > bit label, then i would think making it anything more than a hash and > (...) > > make it 256 buckets and suddenly you are looking at 16 worst case. > > Where do you get these numbers? :) > I thought the label was 20 bits, and > with 256 buckets (2**8) you have 2**12 = 4096 worst case. > Even with 1024 buckets you get 1024 worst case. Never mind - too many things being computed in my brain. I was thinking of VLAN tags. > Am I missing something? Are these values acceptable? In that case I'll > shut up :) Well, have some student do a project ;-> Let them measure the performance differences under different scenarios with hash-and-walk vs radix tree or another funky lookup scheme for say many many entries.. 
With data we can challenge the current scheme. cheers, jamal |
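As a rough illustration of how the `nhlfemsg` payload and the `nh_flags` bits quoted above might be filled in from user space, here is a minimal sketch. The struct layout and constants are copied from the header excerpt in the message, with `uint32_t`/`uint8_t` standing in for `__u32`/`__u8`; the helper functions are hypothetical, and the netlink framing around the payload is not shown in the thread and is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* nh_flags values, as listed in the quoted header excerpt. */
#define MPLS_FLAG_I_TC_INDEX    0x01
#define MPLS_FLAG_I_DIFFSERV    0x02
#define MPLS_FLAG_O_TC_INDEX    0x04
#define MPLS_FLAG_O_DIFFSERV    0x08
#define MPLS_FLAG_TTL_PROPAGATE 0x10

enum { L2C };                   /* owner: the l2c tool */
enum { MPLS_IPV4, MPLS_IPV6 };  /* nh_proto choices */

/* Same field order as the quoted struct, with fixed-width userspace types. */
struct nhlfemsg {
	uint32_t nh_fecid;
	uint32_t nh_index;
	uint32_t nh_ifindex;
	uint32_t nh_space;
	uint32_t nh_class;
	uint32_t nh_flags;
	uint8_t  nh_owner;
	uint8_t  nh_proto;
	uint8_t  nh_dscp;
	uint8_t  nh_ttl;
	uint32_t nh_ltype;
};

/* Hypothetical helper: zero the payload, then set the fields we know. */
static struct nhlfemsg nhlfe_request(uint32_t index, uint32_t ifindex,
				     uint8_t proto, uint32_t flags)
{
	struct nhlfemsg m;

	memset(&m, 0, sizeof(m));
	m.nh_index = index;
	m.nh_ifindex = ifindex;
	m.nh_proto = proto;
	m.nh_flags = flags;
	m.nh_owner = L2C;
	return m;
}

/* Hypothetical helper: test a single flag bit. */
static int nhlfe_propagates_ttl(const struct nhlfemsg *m)
{
	return (m->nh_flags & MPLS_FLAG_TTL_PROPAGATE) != 0;
}
```

Combining `MPLS_FLAG_O_DIFFSERV | MPLS_FLAG_TTL_PROPAGATE` yields `0x18`, since each flag occupies its own bit.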
From: Ramon C. <cas...@in...> - 2004-02-16 17:52:01
|
On 16 Feb 2004, Jamal Hadi Salim wrote: > On Sun, 2004-02-15 at 05:36, Ramon Casellas wrote: > Some comments: Ok. I'll do it asap. (nhlfeid ;) so when we prefix it there is just one underscore... agreed? :) > > What would be useful in this document as well is to describe the > interface between the kernel and user space. i.e describe the packets Agreed. I'll take care of this. > For how we document this typically look at: > http://www.faqs.org/rfcs/rfc3549.html ok. > Ok. So we may need some extra specialized NHLFE entries. I am not a big > fan of the two step process unless you guys really insist - then we can > go and convince davem. Well, the problem with CR-LDP and/or RSVP is that it is a 'ping-pong' set up process, and you usually need to define a 'prestate'. Another possibility is to consider RSVP as using the unsolicited downstream label distribution and only process the RSVP-RESV message from control space (when the message comes up from your downstream router), I am not sure about this though. > My opinion is let's have 3 new special NHLFEs: > - something that sends the packet to a blackhole which will work for > such a scenario as above. A 'disabled' NHLFE. I think that this can be useful, for example for liberal retention mode. > - Another one will send the packet to user space via netlink. This may > also be used for resolving what you have above. So we can conform to the RFC (although sometimes it is just IETF jargon) But the question is 'which packet?' I assume that it is the first packet that according to the FIB_RES should be mapped to a NHLFEid that just does not exist. Don't we risk flooding userspace? Should it be only the first packet? What about a single netlink event (in plain English: hey, I don't know what to do with this FEC, can you do something about it?) > - A third one is for locally destined packets. I was not sure whether > this should just be a flag which says neighbor = local or not. 
IIRC, locally destined packets means that the LSR is egress (for all hierarchical levels) and pops the last packet. As one possibility, the default action should be just call IP module packet reception if we just popped the last label, so the packet is locally delivered or forwarded per dest address. Thanks, R. |
From: Jamal H. S. <ha...@zn...> - 2004-02-16 21:45:33
|
BTW, I am fine with whatever you guys end up picking for the code repository; you will have to teach me about its usage. p4 sounds good. Also James i know you are a big fan of UML - i am trying to see if it is valuable - getting tired of hanging my laptop (though i run ext3 these days ;->); so if you can share your setup on a test environment i would appreciate it. I looked at Qemu; it does look very interesting; any thoughts on that? On Mon, 2004-02-16 at 12:47, Ramon Casellas wrote: > On 16 Feb 2004, Jamal Hadi Salim wrote: > > > > Ok. So we may need some extra specialized NHLFE entries. I am not a big > > fan of the two step process unless you guys really insist - then we can > > go and convince davem. > > > Well, the problem with CR-LDP and/or RSVP is that it is a 'ping-pong' set > up process, and you usually need to define a 'prestate'. Another > possibility is to consider RSVP as using the unsolicited downstream > label distribution and only process the RSVP-RESV message from control > space (when the message comes up from your downstream router), I am not > sure about this though. > Could something in user space be responsible for maintaining the prestate? When full state is available, it gets downloaded to the kernel. > > My opinion is let's have 3 new special NHLFEs: > > > - something that sends the packet to a blackhole which will work for > > such a scenario as above. > > A 'disabled' NHLFE. I think that this can be useful, for example for > liberal retention mode. > Ok, so we could add something like this: l2c mpls nhlfe add dev eth0 proto ipv4 nhlfeid 3 blackhole > > > - Another one will send the packet to user space via netlink. This may > > also be used for resolving what you have above. > > So we can conform to the RFC (although sometimes it is just IETF jargon) > But the question is 'which packet?' I assume that it is the first packet > that according to the FIB_RES should be mapped to a NHLFEid that just does > not exist. 
Don't we risk flooding userspace? Should it be only the first > packet? What about a single netlink event (in plain English: hey, I don't > know what to do with this FEC, can you do something about it?) Well, something along the same lines. Example:
    l2c mpls nhlfe add dev eth0 proto ipv4 nhlfeid 4 control-redirect
The above could be a result of intentional policy such as preceded by:
    l2c mpls ilm add dev eth0 label 9 nhlfeid 4
or as a result of it being the default NHLFE rule which gets consulted because nothing else was found, example:
    l2c mpls nhlfe add dev eth0 nhlfeid 4 default control-redirect
Thoughts? > > - A third one is for locally destined packets. I was not sure whether > > this should just be a flag which says neighbor = local or not. > > IIRC, locally destined packets means that the LSR is egress (for all > hierarchical levels) and pops the last packet. As one possibility, the > default action should be just call IP module packet reception if we just > popped the last label, so the packet is locally delivered or forwarded per > dest address. If the last label has been popped then it would make sense to redirect to the stack. The one that i was worried about is it having a stack of labels hiding a local host IP packet. Can we assume that the user can shoot themselves in the feet and we wouldn't care? cheers, jamal |
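The three proposed special NHLFEs and the default-rule fallback can be pictured with a small dispatch sketch. Everything here is hypothetical: the enum names, the table layout, and the choice to drop when nothing matches are assumptions made for illustration, not the behaviour of the patch or of the `l2c` tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NHLFE kinds mirroring the proposals in the message. */
enum nhlfe_kind {
	NHLFE_FORWARD,          /* ordinary entry: transmit to the next hop */
	NHLFE_BLACKHOLE,        /* silently drop */
	NHLFE_CONTROL_REDIRECT, /* hand the packet to user space via netlink */
	NHLFE_LOCAL             /* locally destined: deliver to the IP stack */
};

enum verdict { V_TRANSMIT, V_DROP, V_TO_USERSPACE, V_TO_IP_STACK };

struct ilm_entry   { uint32_t label;   uint32_t nhlfeid; };
struct nhlfe_entry { uint32_t nhlfeid; enum nhlfe_kind kind; };

struct tables {
	const struct ilm_entry   *ilm;   size_t n_ilm;
	const struct nhlfe_entry *nhlfe; size_t n_nhlfe;
	int      has_default;     /* non-zero if a default NHLFE is configured */
	uint32_t default_nhlfeid;
};

static const struct nhlfe_entry *find_nhlfe(const struct tables *t, uint32_t id)
{
	for (size_t i = 0; i < t->n_nhlfe; i++)
		if (t->nhlfe[i].nhlfeid == id)
			return &t->nhlfe[i];
	return NULL;
}

/* A miss in either table falls back to the default NHLFE, if there is one. */
static enum verdict classify(const struct tables *t, uint32_t label)
{
	const struct nhlfe_entry *nh = NULL;

	for (size_t i = 0; i < t->n_ilm; i++) {
		if (t->ilm[i].label == label) {
			nh = find_nhlfe(t, t->ilm[i].nhlfeid);
			break;
		}
	}
	if (!nh && t->has_default)
		nh = find_nhlfe(t, t->default_nhlfeid);
	if (!nh)
		return V_DROP;

	switch (nh->kind) {
	case NHLFE_FORWARD:          return V_TRANSMIT;
	case NHLFE_BLACKHOLE:        return V_DROP;
	case NHLFE_CONTROL_REDIRECT: return V_TO_USERSPACE;
	case NHLFE_LOCAL:            return V_TO_IP_STACK;
	}
	return V_DROP;
}
```

With label 9 mapped to a control-redirect NHLFE with id 4, and the same id configured as the default, both label 9 and any unknown label end up in user space, which is the flooding concern raised in the quoted text.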
From: James R. L. <jl...@mi...> - 2004-02-16 22:07:18
|
On Mon, Feb 16, 2004 at 04:41:37PM -0500, Jamal Hadi Salim wrote: > BTW, I am fine with whatever you guys end up picking for the > code repository; you will have to teach me about its usage. > p4 sounds good. > Also James i know you are a big fan of UML - i am trying to see if it > is valuable - getting tired of hanging my laptop (though i run ext3 these > days ;->); so if you can share your setup on a test environment i would > appreciate it. > I looked at Qemu; it does look very interesting; any thoughts on that? I never tried Qemu, but you're right, it does look interesting. I'll put my UML environment on an ftp server some place for you to download. It consists of a couple of scripts I threw together and some rh8.0 file systems. I'll arrange that this evening. > On Mon, 2004-02-16 at 12:47, Ramon Casellas wrote: > > On 16 Feb 2004, Jamal Hadi Salim wrote: > > > > > > > > Ok. So we may need some extra specialized NHLFE entries. I am not a big > > > fan of the two step process unless you guys really insist - then we can > > > go and convince davem. > > > > > > Well, the problem with CR-LDP and/or RSVP is that it is a 'ping-pong' set > > up process, and you usually need to define a 'prestate'. Another > > possibility is to consider RSVP as using the unsolicited downstream > > label distribution and only process the RSVP-RESV message from control > > space (when the message comes up from your downstream router), I am not > > sure about this though. > > > > Could something in user space be responsible for maintaining the > prestate? When full state is available, it gets downloaded to the > kernel. > > > > My opinion is let's have 3 new special NHLFEs: > > > > > - something that sends the packet to a blackhole which will work for > > > such a scenario as above. > > > > A 'disabled' NHLFE. I think that this can be useful, for example for > > liberal retention mode. 
> > > > Ok, so we could add something like this: > l2c mpls nhlfe add dev eth0 proto ipv4 nhlfeid 3 blackhole > > > > > > - Another one will send the packet to user space via netlink. This may > > > also be used for resolving what you have above. > > > > So we can conform to the RFC (although sometimes it is just IETF jargon) > > But the question is 'which packet?' I assume that it is the first packet > > that according to the FIB_RES should be mapped to a NHLFEid that just does > > not exist. Don't we risk flooding userspace? Should it be only the first > > packet? What about a single netlink event (in plain English: hey, I don't > > know what to do with this FEC, can you do something about it?) > > Well, something along the same lines. Example: > > l2c mpls nhlfe add dev eth0 proto ipv4 nhlfeid 4 control-redirect > > The above could be a result of intentional policy such as preceded by: > > l2c mpls ilm add dev eth0 label 9 nhlfeid 4 > > or as a result of it being the default NHLFE rule which gets consulted > because nothing else was found, example: > l2c mpls nhlfe add dev eth0 nhlfeid 4 default control-redirect > > Thoughts? > > > > - A third one is for locally destined packets. I was not sure whether > > > this should just be a flag which says neighbor = local or not. > > > > IIRC, locally destined packets means that the LSR is egress (for all > > hierarchical levels) and pops the last packet. As one possibility, the > > default action should be just call IP module packet reception if we just > > popped the last label, so the packet is locally delivered or forwarded per > > dest address. > > If the last label has been popped then it would make sense to redirect > to the stack. The one that i was worried about is it having a stack of > labels hiding a local host IP packet. Can we assume that the user can > shoot themselves in the feet and we wouldn't care? 
> > cheers, > jamal > > > > ------------------------------------------------------- > SF.Net is sponsored by: Speed Start Your Linux Apps Now. > Build and deploy apps & Web services for Linux with > a free DVD software kit from IBM. Click Now! > http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click > _______________________________________________ > mpls-linux-devel mailing list > mpl...@li... > https://lists.sourceforge.net/lists/listinfo/mpls-linux-devel -- James R. Leu jl...@mi... |
From: James R. L. <jl...@mi...> - 2004-02-19 04:26:21
|
Comments in line. On Mon, Feb 16, 2004 at 06:47:27PM +0100, Ramon Casellas wrote: > On 16 Feb 2004, Jamal Hadi Salim wrote: > > > On Sun, 2004-02-15 at 05:36, Ramon Casellas wrote: > > > > Some comments: > > Ok. I'll do it asap. > > (nhlfeid ;) so when we prefix it there is just one underscore... agreed? > :) > > > > > > > > What would be useful in this document as well is to describe the > > interface between the kernel and user space. i.e describe the packets > > Agreed. I'll take care of this. > > > > > > For how we document this typically look at: > > http://www.faqs.org/rfcs/rfc3549.html :-) Nice example :-) > > ok. > > > > > > > > Ok. So we may need some extra specialized NHLFE entries. I am not a big > > fan of the two step process unless you guys really insist - then we can > > go and convince davem. > > > Well, the problem with CR-LDP and/or RSVP is that it is a 'ping-pong' set > up process, and you usually need to define a 'prestate'. Another > possibility is to consider RSVP as using the unsolicited downstream > label distribution and only process the RSVP-RESV message from control > space (when the message comes up from your downstream router), I am not > sure about this though. The state that is being stored is only in the control plane, not the forwarding plane. The real reason you want to be able to modify existing entries is for the fail-over cases. This is also a reason why a clean layer of indirection is required. Imagine 1000's of VC or VPN labels associated with one tunnel label. Now imagine that tunnel label changing (fast re-route, primary/backup tunnel, etc). In our implementation VC and VPN labels are out-labels which have a FWD instruction, all pointing to the same out-label. That out-label contains a PUSH instruction. By changing just one PUSH instruction you in essence fail over to another tunnel label. 
> > > > My opinion is let's have 3 new special NHLFEs: > > > - something that sends the packet to a blackhole which will work for > > such a scenario as above. > > A 'disabled' NHLFE. I think that this can be useful, for example for > liberal retention mode. Not needed. Just because the signaling protocol is holding label state does not mean it must be installed in the forwarding plane. Only active segments and cross connects should be installed. > > - Another one will send the packet to user space via netlink. This may > > also be used for resolving what you have above. > > So we can conform to the RFC (although sometimes it is just IETF jargon) > But the question is 'which packet?' I assume that it is the first packet > that according to the FIB_RES should be mapped to a NHLFEid that just does > not exist. Don't we risk flooding userspace? Should it be only the first > packet? What about a single netlink event (in plain English: hey, I don't > know what to do with this FEC, can you do something about it?) Why would you want to do this? Are you trying to enable flow based label allocation? Everyone has decided this is a bad idea (example NHRP). I could see needing to support MPLS sockets, where the sock addr is an in or out segment (or both) and all packets rx'd on the in segment go to the socket or all data written to the socket gets tx'd on the out-segment. > > - A third one is for locally destined packets. I was not sure whether > > this should just be a flag which says neighbor = local or not. The correct way is to utilize the same instruction for pop/lookup and pop/rx locally. That way tunnel in segments do not need to change when VC or VPN labels are associated with them. Plus it is not always a clear case of always being stacked or not. > IIRC, locally destined packets means that the LSR is egress (for all > hierarchical levels) and pops the last packet. 
As one possibility, the > default action should be just call IP module packet reception if we just > popped the last label, so the packet is locally delivered or forwarded per > dest address. The lowest level label cannot dictate that (except for router alert, but in that case the stack above the RA is sent up as data). If the lowest level label say pop, you MUST pop and lookup the next level. The only time you can pop-all is in the error cases (and that is even questionable). > > Thanks, > > R. -- James R. Leu jl...@mi... |
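The indirection argument in this message (many VC or VPN out-labels forwarding to one shared tunnel out-label, so that a single PUSH change fails them all over) can be sketched as follows. This is a simplified model with hypothetical type and field names; it is not the data layout of either implementation discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Shared tunnel out-label: holds the single PUSH operand. */
struct tunnel_out {
	uint32_t push_label;
};

/* Per-VC/VPN out-label: pushes its own label, then FWDs to the tunnel. */
struct vc_out {
	uint32_t vc_label;
	const struct tunnel_out *fwd; /* the indirection: not a copy of the label */
};

/* Writes the resulting label stack, outermost label first, into stack[2]. */
static void build_stack(const struct vc_out *vc, uint32_t stack[2])
{
	stack[0] = vc->fwd->push_label; /* tunnel label, read at send time */
	stack[1] = vc->vc_label;        /* inner VC/VPN label */
}

/* Fail-over touches one object, however many vc_out entries point at it. */
static void tunnel_failover(struct tunnel_out *t, uint32_t backup_label)
{
	t->push_label = backup_label;
}
```

Because each `vc_out` stores a pointer rather than the tunnel label itself, the cost of a fail-over is one write instead of one update per VC or VPN label.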
From: Jamal H. S. <ha...@zn...> - 2004-02-19 14:23:01
|
On Wed, 2004-02-18 at 23:14, James R. Leu wrote: > > > For how we document this typically look at: > > > http://www.faqs.org/rfcs/rfc3549.html > > :-) Nice example :-) ;-> We could have written a better draft; maybe a revision of that to include the MPLS messages. > > Well, the problem with CR-LDP and/or RSVP is that it is a 'ping-pong' set > > up process, and you usually need to define a 'prestate'. Another > > possibility is to consider RSVP as using the unsolicited downstream > > label distribution and only process the RSVP-RESV message from control > > space (when the message comes up from your downstream router), I am not > > sure about this though. > > The state that is being stored is only in the control plane, not the > forwarding plane. The real reason you want to be able to modify > existing entries is for the fail-over cases. This is also a reason why > a clean layer of indirection is required. Imagine 1000's of VC or VPN > labels associated with one tunnel label. Now imagine that tunnel label > changing (fast re-route, primary/backup tunnel, etc). In our > implementation VC and VPN labels are out-labels which have a FWD instruction, > all pointing to the same out-label. That out-label contains a PUSH > instruction. By changing just one PUSH instruction you in essence fail over > to another tunnel label. > But how much execution advantage would you really gain by only changing one piece at a time? The most expensive thing in updating that table would be crossing from user space to kernel. i.e it doesn't matter how much data you are sending. Am i off? > > > > > - something that sends the packet to a blackhole which will work for > > > such a scenario as above. > > > > A 'disabled' NHLFE. I think that this can be useful, for example for > > liberal retention mode. > > Not needed. Just because the signaling protocol is holding label > state does not mean it must be installed in the forwarding plane. Only > active segments and cross connects should be installed. 
Explain the cross-connect part. Is this related to the indirection you are referring to? Leaving label retention for a second: Is the idea of a blackhole neighbor useful? > > > - Another one will send the packet to user space via netlink. This may > > > also be used for resolving what you have above. > > > > So we can conform to the RFC (although sometimes it is just IETF jargon) > > But the question is 'which packet?' I assume that it is the first packet > > that according to the FIB_RES should be mapped to a NHLFEid that just does > > not exist. Don't we risk flooding userspace? Should it be only the first > > packet? What about a single netlink event (in plain English: hey, I don't > > know what to do with this FEC, can you do something about it?) > > Why would you want to do this? Are you trying to enable flow based > label allocation? Everyone has decided this is a bad idea (example NHRP). > I could see needing to support MPLS sockets, where the sock addr is an in > or out segment (or both) and all packets rx'd on the in segment > go to the socket or all data written to the socket gets tx'd on the > out-segment. I think ability to program this is valuable. One good reason could be for debugging or handling exceptions. Of course such a feature could be (ab)used like you say for flow based label allocation (in which case - bless those who want to use a misfeature). > > > - A third one is for locally destined packets. I was not sure whether > > > this should just be a flag which says neighbor = local or not. > > The correct way is to utilize the same instruction for pop/lookup > and pop/rx locally. That way tunnel in segments do not need to > change when VC or VPN labels are associated with them. Plus it is > not always a clear case of always being stacked or not. so a pop/rx locally would be equivalent to remove all labels if there's more than one, correct? 
> > IIRC, locally destined packets means that the LSR is egress (for all > > hierarchical levels) and pops the last packet. As one possibility, the > > default action should be just call IP module packet reception if we just > > popped the last label, so the packet is locally delivered or forwarded per > > dest address. > > The lowest level label cannot dictate that (except for router alert, but > in that case the stack above the RA is sent up as data). If the lowest > level label say pop, you MUST pop and lookup the next level. The only time > you can pop-all is in the error cases (and that is even questionable). So what you are saying is let whoever programmed the instructions shoot themselves. i.e they could have specified pop, rx-local, am i correct? BTW, you mention RAs above - which would be considered exceptions. I think this is an example of a packet that could be sent via netlink as well. Note with distributed control where the control plane may be one ethernet hop away, this is useful (wrap the RA into a netlink packet and shove it onto the control board - at least that's what netlink2 is preaching) cheers, jamal |
From: James R. L. <jl...@mi...> - 2004-02-19 16:17:24
|
Comments in line On Thu, Feb 19, 2004 at 09:17:26AM -0500, Jamal Hadi Salim wrote: > On Wed, 2004-02-18 at 23:14, James R. Leu wrote: > > > > > For how we document this typically look at: > > > > http://www.faqs.org/rfcs/rfc3549.html > > > > :-) Nice example :-) > > ;-> We could have written a better draft; maybe a revision of that to > include the MPLS messages. > > > > Well, the problem with CR-LDP and/or RSVP is that it is a 'ping-pong' set > > > up process, and you usually need to define a 'prestate'. Another > > > possibility is to consider RSVP as using the unsolicited downstream > > > label distribution and only process the RSVP-RESV message from control > > > space (when the message comes up from your downstream router), I am not > > > sure about this though. > > > > The state that is being stored is only in the control plane, not the > > forwarding plane. The real reason you want to be able to modify > > existing entries is for the fail-over cases. This is also a reason why > > a clean layer of indirection is required. Imagine 1000's of VC or VPN > > labels associated with one tunnel label. Now imagine that tunnel label > > changing (fast re-route, primary/backup tunnel, etc). In our > > implementation VC and VPN labels are out-labels which have a FWD instruction, > > all pointing to the same out-label. That out-label contains a PUSH > > instruction. By changing just one PUSH instruction you in essence fail over > > to another tunnel label. > > > > But how much execution advantage would you really gain by only changing > one piece at a time? > The most expensive thing in updating that table would be > crossing from user space to kernel. i.e it doesn't matter how much > data you are sending. Am i off? I think you missed the point. A single instruction change would fail the 1000's of VC or VPN labels over to the new tunnel. Think how you handle a BGP next hop change when 100K routes are using that same BGP next hop. 
> > > > > > > - something that sends the packet to a blackhole which will work for > > > > such a scenario as above. > > > > > > A 'disabled' NHLFE. I think that this can be useful, for example for > > > liberal retention mode. > > > > Not needed. Just because the signaling protocol is holding label > > state does not mean it must be installed in the forwarding plane. Only > > active segments and cross connects should be installed. > > Explain the cross-connect part. Is this related to the indirection you > are referring to? > Leaving label retention for a second: Is the idea of a blackhole > neighbor useful? Signaling protocols running on an LSR need to keep track of how in-segments and out-segments are related. The cross connect is the term used to refer to that relationship (I'm using terms from the LSR MIB). Protocols that run in DoD ordered control will not issue an in-segment until they have received an out-segment or have determined they are the egress of the LSP. At the time the in-segment is issued the forwarding plane is installed and the cross connect is made. So no, I do not think a blackhole is needed, but having it can't hurt. > > > > > - Another one will send the packet to user space via netlink. This may > > > > also be used for resolving what you have above. > > > > > > So we can conform to the RFC (although sometimes it is just IETF jargon) > > > But the question is 'which packet?' I assume that it is the first packet > > > that according to the FIB_RES should be mapped to a NHLFEid that just does > > > not exist. Don't we risk flooding userspace? Should it be only the first > > > packet? What about a single netlink event (in plain English: hey, I don't > > > know what to do with this FEC, can you do something about it?) > > > > Why would you want to do this? Are you trying to enable flow based > > label allocation? Everyone has decided this is a bad idea (example NHRP). 
> > I could see needing to support MPLS sockets, where the sock addr is an in > > or out segment (or both) and all packets rx'd on the in segment > > go to the socket or all data written to the socket gets tx'd on the > > out-segment. > > I think ability to program this is valuable. > One good reason could be for debugging or handling exceptions. Of course > such a feature could be (ab)used like you say for flow based label > allocation (in which case - bless those who want to use a misfeature). I think the best way to handle this and RA is via MPLS sockets. How does IPv4 handle RA? Userland has to create a socket which registers for it. I think the MPLS RA should be handled the same. How do other L2ish protocols handle the passing of PDUs to userland? The only example I can think of is ATM. It uses sockets to accomplish this. I have nothing against using netlink, but I just think we should use mechanisms that people are used to. > > > > - A third one is for locally destined packets. I was not sure whether > > > > this should just be a flag which says neighbor = local or not. > > > > The correct way is to utilize the same instruction for pop/lookup > > and pop/rx locally. That way tunnel in segments do not need to > > change when VC or VPN labels are associated with them. Plus it is > > not always a clear case of always being stacked or not. > > so a pop/rx locally would be equivalent to remove all labels if there's > more than one, correct? You can only pop/rx locally for the label with the BOS. All others must be pop lookup. (of course the lookup could say swap, or RA in which case we are no longer in the 'pop/lookup' loop) > > > IIRC, locally destined packets means that the LSR is egress (for all > > > hierarchical levels) and pops the last packet. As one possibility, the > > > default action should be just call IP module packet reception if we just > > > popped the last label, so the packet is locally delivered or forwarded per > > > dest address. 
> > > > The lowest level label cannot dictate that (except for router alert, but > > in that case the stack above the RA is sent up as data). If the lowest > > level label say pop, you MUST pop and lookup the next level. The only time > > you can pop-all is in the error cases (and that is even questionable). > > So what you are saying is let whoever programmed the instructions shoot > themselves. i.e they could have specified pop, rx-local, am i correct? What I'm saying is that the meaning of 'pop' is derived from which position in the stack it is being applied to. > > BTW, you mention RAs above - which would be considered exceptions. I > think this is an example of a packet that could be sent via netlink as > well. > Note with distributed control where the control plane may be one > ethernet hop away, this is useful (wrap the RA into a netlink packet and > shove it onto the control board - at least that's what netlink2 is > preaching) See my comments above. > > cheers, > jamal > In general I have the feeling something isn't clicking. Am I explaining these issues well or should I back up and approach each one in depth? -- James R. Leu jl...@mi... |
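The point that the meaning of 'pop' depends on stack position can be illustrated with the standard 32-bit label stack entry layout (20-bit label, 3-bit EXP, 1-bit bottom-of-stack flag, 8-bit TTL). The decoder below follows that layout for a host-order word; the `after_pop` helper and its enum are hypothetical names for the rule stated in the message, not code from either implementation.

```c
#include <assert.h>
#include <stdint.h>

/* One decoded label stack entry. */
struct lse {
	uint32_t label; /* 20 bits */
	uint8_t  exp;   /* 3 bits  */
	uint8_t  bos;   /* 1 bit: set on the bottom-of-stack entry */
	uint8_t  ttl;   /* 8 bits  */
};

/* Decode a label stack entry already converted to host byte order. */
static struct lse lse_decode(uint32_t w)
{
	struct lse e;

	e.label = w >> 12;
	e.exp = (uint8_t)((w >> 9) & 0x7);
	e.bos = (uint8_t)((w >> 8) & 0x1);
	e.ttl = (uint8_t)(w & 0xff);
	return e;
}

enum pop_next { POP_THEN_LOOKUP, POP_THEN_RX_LOCAL };

/*
 * What a 'pop' leads to, per the rule in the message: only the entry
 * carrying the BOS flag may be followed by local reception; any other
 * entry must be followed by a lookup on the next label.
 */
static enum pop_next after_pop(const struct lse *popped)
{
	return popped->bos ? POP_THEN_RX_LOCAL : POP_THEN_LOOKUP;
}
```

The same pop instruction therefore yields pop/lookup on an outer tunnel label and pop/rx-local on the innermost label, without the tunnel in-segment needing to know whether anything is stacked beneath it.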