Re: [mpls-linux-devel] More comments on Jamal's spec

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Sat, 2004-02-14 at 17:04, Ramon Casellas wrote:

> 
> Although I think that we are indeed on the right path, IMHO, for the
> moment, your implementation is lagging functionnality w.r.t. James' one
> (available userspace apps, diffserv mapping, tunnels, procfs, sysfs,etc.

User space app for static management is already there - thats what "l2c
mpls" is. Maybe i didnt make myself clear before, diffserv and more is
there they are just not locked into mpls; they can be associated but are
independent apps. Tunels whatever James has can be merged. Procfs or
sysfs i care less about and wouldnt loose sleep if they didnt exists -
we use netlink which should cover most of what these things try to do.
If someone wants to add that go ahead.
I apologize i never got around to looking at James code (and weekend is
mostly for a pregnant woman who occasionaly looks away and i sneak to
check mail), but theres a lot of stuff that could be merged in (i
noticed PPP, ATM, FR for example).
IIRC correclty from those old days, there exists some form of LDP
implementation. That could be part of the userspace tools or safer
separate.

> although some are not strictly required), although I admit that there are
> still serious issues with James' impl (locking and SMP safeness are the
> most notorious ones), and that it is just a matter of time and work.
> 
> Given the fact that DaveM explicitely supports yours, it seems clear to me

As i said before this is NOT my implementation. I tried to document and
sanitize what it does - mostly so we can  have a useful discussion. My
piece is user space to kernel. If you look at the code you will see my
name appearing in only about two files or so.
My preference is to use this implemantation and to have as little fight
with Dave as possible and only make changes to the base when we see it
appropriate (if theres a 5% improvement, we could think about talking to
him, if theres a > 20% improvement we have more strength);-> I wanna see
MPLS in 2.6 soon and i think this is the fastest way to get there.

> that we should focus on it (James? yours is the last word... I have
> been working on yours for only several months, you have spent the last
> five years), and avoid any other fork. So the question is "what can we do
> now?". When I started working on James implementation, I appreciated being
> almost immediately given write access, so I could do some documenting
> tasks while I was understanding the inner works..., and I was trusted. I
> understand that you may see things differently. What is your position on
> this? Neither James nor I have (for the moment?) access to the patch/CVS
> repository, l2c userspace application.... Somehow I feel hand tied :) 

You have access to the patches. What more do you want?

> I
> have spare time and I'm afraid that you may want a centralized approach,
> which may have some inconvenients (although you have all right to).

OK, Set up a CVS repository. I am old fashioned and dont use it very
much. I still refuse to use bitkeeper. The easiest thing for me is
people send me patches and i merge them. Again i dont care if its CVS.
Maybe we can try something more exciting like that competition to
bitkeeper;->

> In other words, if you were to be this project manager ;). How would you define
> the tasks so everyone may contribute to the project, see the others recognize
> his work, etc? Personally, I am interested and I would like to play a nice part
> on this. What is missing?
> 

I think we need to discuss then someone codes or merges. 
For example, we need to settle on the multihop; i think the idea i
suggested is the way to go for ILM - i dont care who codes; i could.
Also we need to settle the dst issue and that may result in coding.
Like i pointed out i think the ATM, PPP and FR features are missing.
Someone adventorous could get some LDP code ported over or document the
API so LDP porters could run over it.
Look at the tod list and maybe add more to it - and lest start there.

> 
> QUESTIONS
> ########################################################
> 
> question: Are there performance studies regarding radix trees w.r.t Hash
> buckets and linked lists? If the number of labels is large, isn't the O(N) walk
> op going to slow things down?  How many labels are managed in average? for
> example, if we assume 100000 BGP prefixes and (why not) a label per prefix,
> with hash&walk (1024 hash buckets) it makes 100 entries (average) per bucket vs
> approx log2(N) with binary trees?

I am indifferent and frankly dont care how it is done. If someone needs
to change code like that (which is Daves) just come with some
justification. I will support it if it looks valuable.
Like i said hit that 20% threshold.

> what about other advanced ADT, like Hash buckets and radix trees or similar?
> Thoughts?

For example for something like the ILM, where lookup is based on a 12
bit label, then i would think making it anything more than a hash and
walk is overkill. If you can put 64 hash buckets thats already taking
off 6 bits; which means worst case you will walk is 64. make it 256
buckets and suddenly you are looking at 16 worst case.
So evaluate for each table what needs to be done then make a call.

> question: regarding dst management. Maybe Alexey could enlighten us. It may be
> interesting to know his point of view about adding a specific mpls ptr to the
> generic dst struct, or he may even propose alternate solutions...

I think dst is the way to go; whether we end up using a child or a ptr
is something we need to settle first. James hasnt repsonded to my last
email. I am also confident Davem knows this space well. Alexey we can
use at the end so he can spit at the code.

> 
> COMMENTS ON USERSPACE APP
> ########################################################
> 
> Jamal's proposal:
> 
> l2c mpls nhlfe <cmd> dev <devname>
> index <val> proto <ipv4|ipv6> nh <neighbor>
> <operation set> fec <FECid>
> operation set := (op <operation>)*
> * cmd is one of:  <add | del | replace | get>
> 
> 
> 20040214-RCAS- We should work on both grammars. I understand that they are a
> work in progress, but they are imprecise and inconvenient. for example the
> "del" operation should not require the user to give the  neighbour. OTOH,  I
> think we don't need replace (simple remove and add)

You can leave out the other parts on del and it would work.
That idea is consistent with tc structure. 
Replace is an atomic del/add. A lot of table management has it.
Imagine many applications trying to manage the same table.

> * index could be used to store the LSPid
> 20040214-RCAS- (I don't understand this. ???)

NHLFE_id aka fecid is local i.e not spreadable over LDP for example. 
A management application such as a dynamic daemon which has a bigger
view of the world may wish to identify further by LSPid - hence the
existence of "index". If it doesnt make sense we could remove it; right
now i see no harm in it. If you dont specify it, it gets zeroed.

> * FECid is the FEC identifier to be used as the key for searching.
> 
> 20040214-RCAS- nhlfe_id :)

yep ;->

> 
> Well, IMHO, I think the grammar can be improved. All the opcodes
> need to be defined, with their arguments (get them from James
> Implementation. They are quite complete and comprehensive)

Do you see anything in the base set other than push and pop?
Everything else is a combination of these.
I am trying to remember what James did - i think he had opcodes like
multi-push (in the above case you just specify as many pushes as you
want). Actually i am guilty of influencing this piece in Daves code.
I was influenced by what i saw from the ASICs i looked at and the
two implementations. 
Lets discuss.

> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> SUGGESTION What about this ?
> 
> l2c mpls nhlfe COMMAND <nhlfe_id>
> 
> COMMAND      := [add | del | get | SETCOMMAND]
> SETCOMMAND   := set proto <ipv4|ipv6> nh <neighbour> "OPERATIONSET"
> OPERATIONSET := [swap SWAPARGS | pop | dlv | mapexp....],+
> SWAPARGS     := labelvalue[:labelspace]..
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Examples
> 
> # l2c mpls nhlfe add <nhlfe_id>
> Add an empty entry (default, drop). Error if exists.
> 
> # l2c mpls nhlfe del <nhlfe_id>
> Remove the given entry. Ignore if not exists.
> 
> # l2c mpls nhlfe get <nhlfe_id>
> Dump entry. Ignore if not exists.
> 
> # l2c mpls nhlfe set proto ipv4 nh 10.0.0.2 "swap 20,push 50" <nhlfe_id>
> 

Is there a reason to make it two separate updates?
Is the command too long maybe? 

> 
> 
> ##############################################################
> The ip tool should allow you specify route you want then
> specify the FECid for that route, i.e:
> ip route ... FECid <FECid>
> where FECid is the NHLFE keyid we want to use
> Example:
> ip route add 10.0.0.21/32 via 10.0.0.9 dev eth0 fecid 1
> 
> 20040214-RCAS all occurrences of FECId should be changed to nhlfe_id.
> :)
> 

Edit the doc and send an update ;->

> ##############################################################################
> JAMAL:
> l2c mpls ilm <cmd> dev <devname>
> index <val> label fec <FECid>
> 
> 
> RCAS : This should be <label> otherwise it looks like a keyword.

thats a typo; should be:
index <val> label <labelvalue> nhlfe_id <nhval>

> * cmd is one of:  <add | del | replace | get>
> 
> RCAS: I think we don't need replace. Let the user del and add. Too
> many commands are cumbersome.

Like i said replace is there for atomicity of the two operations.
All database operations typically have the above four commands. Look at
this as a table that will be manipulated by many users concurently.

> RCAS: Let's work on the grammar. The user should only need to give the
> incoming label to remove, not the nhlfe_id that it points to.

?? The nhlfe_id must exist before the entry is allowed.
Look at the architecture of the tables in the doc. All roads lead to the
NHLFE table.

> * devname is the input device to be used
> RCAS: right, but we need more flexibility.
> RCAS: one option would be to use wilcards, e.g.
> RCAS: l2c mpls ilm add "ethO:15"
> RCAS: l2c mpls ilm add "*:15"
> RCAS: but, I do think that the labelspace approach in james impl.
> RCAS: is better. Let the user set a labelspace as a netdevice
> RCAS: attribute and let the user define ILM entries as
> RCAS: labelspace+value.

The labelsapce issue is still open. I can see its value in the L2VPN
where an additional VPNid comes in with the labelspace. I am really
struggling trying to see its value here. James and I had a small
discussion we need to revive that. If you look at the code you will see,
at the moment the label space is zero always.
If there is something clear in the incoming packet that can be used to
map to a device, then using labelspace becomes valuable.

> 
> * Index is an additional identifier that could be used to
> store LSP info.
> 
> RCAS : What is val?
> RCAS: I still don't understand this. Could you please give examples?

Same idea as in the NHLFE. If it doesnt prove valuable we could remove
it.

> * FECid is the FECid to be used for searching the NHLFE.
> 
> RCAS:  nhlfe_id is the NHLFE id to use.
> 

yep.

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> SUGGESTION What about this ?
> 
> l2c mpls ilm COMMAND <ls:label>
> 
> COMMAND      := [add | del | get | BINDCOMMAND ]
> BINDCOMMAND   := bind <nhlfe_id> to
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> well, or something like that
> 

I have issues with the labelspace as i described above.

> 
> ######################################################################
> 
> 
> 3.0 Allowed OPCODEs
> 
> 20040214-RCAS: We need more advanced (DiffServ, etc.) opcodes. We can leverage
> James implementation for this.
> 
> 
> 
> 
> 3.1 Modifiying opcodes
> 
> - REDIRECT: redirect a packet to a different LSP
> (useful for testing or redirecting to a control plane)
> 
> 20040214-RCAS: this can be useful. Nice.
> 
> 
> 
> - MIRROR: send a copy of a packet somewhere else for further
> processing (useful for LSP pings, traceroute, debug etc)
> 
> 20040214-RCAS: Idem.
> 
> 
> 3.2 Label action opcodes
> 
> 
> 20040214-RCAS: Why do we need to introduce two concepts "Label action opcodes"
> and "Atomic operations"?  aren't all "Atomic operations" "label actions" and
> viceversa???

The way i saw it (or was influenced to think of it is as follows):
- there are three basic operations (just like there basic types in
C prgramming eg integer, char, short). Then you can build complex
compositions from the rest of them (just like you can build data
structures in C froim teh atomic data types)
> 
> The atomic operations are:"
> 
> - POP
> - PUSH
> - REPLACE
> 
> 20040214-RCAS: Whats wrong with the standard name "SWAP"???

sure, swap it is.

> 
> Note:
> a stack of consisting atomic operations can be implemented; example:
> a pop followed by several pushes.
> 
> 20040214-RCAS: Ein??? Be more specific.
> 

The analogy of atomic data types and structures i described above
applies.

> 
> 
> Well, I have some urgent boring things to do,  more comments to follow,

And i have someone who is looking for me right now - i took too long to
go to the washroom ;->

cheers,
jamal