Re: [mpls-linux-devel] Current state of dst stacking on davem implementation

Jamal,

Before we can talk about LSP hierarchy, we need to discuss a single
array of instructions on the NHLFE versus an array of instructions
on the ILM plus an array on the NHLFE (the 'dual' model).

There is one main reason for the 'dual' model: NHLFE re-use.
NHLFE re-use comes into play for an LSP for which an LSR is both ingress
and transit, and for LSP hierarchy.

In the case where an LSR is both ingress and transit for an LSP, a single set
of instructions on the NHLFE means having to create two NHLFEs:
one with an ADD instruction, used by the ingress code path, and
one with an XCHANGE instruction, used by the transit code path.  With
the 'dual' model, by contrast, the instructions on the ILM contain a POP and the
instructions on the NHLFE contain a PUSH.  The ingress code path only hits
the PUSH on the NHLFE, while the transit code path hits the POP on the ILM
and then the PUSH on the same NHLFE, so one NHLFE serves both paths.

The reasoning for the 'dual' model with respect to LSP hierarchy is similar,
but pertains to NHLFEs that are 'stacked'.  (An LSR that adds hierarchy
is really just a special case of an ingress LER/LSR; remember that hierarchy
can be added at any point in the MPLS domain, not just at the edge.)  The NHLFE
for the hierarchy label (a VPN label, for example)
contains a PUSH and a FWD to another NHLFE, which contains a PUSH for the
tunnel label.  With this separation the tunnel NHLFE can be used for a
transit LSP as well.  This separation also lends itself well to fast
failover between primary and backup LSPs.  Imagine 1000s of VPN NHLFEs
pointing to the same tunnel NHLFE: if the tunnel NHLFE needs to be
changed (to fail over to a backup tunnel), we can change just that one NHLFE
and all of the VPN NHLFEs start using the backup tunnel.

That is enough for now.  I'm not ignoring the rest of your email, but
we can address the other issues/questions/points after we talk about
this one.

-- 
James R. Leu
jl...@mi...

On Tue, Mar 23, 2004 at 10:15:06PM -0500, Jamal Hadi Salim wrote:
> Hi James,
> 
> You will have to forgive my delayed responses; seems like time is not
> my best friend right now.
> 
> On Mon, 2004-03-22 at 12:29, James R. Leu wrote:
> > See comments in line.
> > 
> 
> [..] 
> > > To be fair, all the above are resolvable issues. Some are even
> > > mentioned in the TODO list.
> > 
> > Some are quite trivial, but some require changes to the architecture.
> 
> Ok.
> 
> > > > -no support for LSP hierarchy (ingress or egress)
> > > >
> > > > The "no support for LSP hierarchy" is only one element in the list, but
> > > > it is a fairly large issue.
> > > 
> > > I didn't understand this one. Did we discuss this?
> > 
> > I mentioned in a previous email that evaluating the current architecture
> > without considering LSP hierarchy was a bit foolish.
> 
> So this seems to be the big one. I am trying to grasp it myself, so if
> you can explain it, we can take it from there.
> 
> 
> > > Maybe we can do parallel approach and have
> > > both coming in together?
> > 
> > In reality the implementations are already converging because I'm flexible
> > with my implementation and willing to learn from others, and I have thus
> > implemented what I see as the positive points of the DaveM code.
> 
> This is good.
> 
> > The remaining areas of difference are:
> > 
> > -instructions - I have 'ILM' and 'NHLFE' instructions, the DaveM code only
> >  has 'NHLFE' instructions.  Separating 'ILM' and 'NHLFE' processing in my
> >  mind is one of the key requirements for supporting scalable LSP hierarchy.
> >  If the 'single instruction' model is desirable for some configurations, it
> >  can be accomplished with my 'dual' instructions model by giving every
> >  'ILM' a single instruction that FWDs the packet to an NHLFE for processing.
> 
> Isn't the separation as it is right now in the DaveM code OK?
> 
> > -storage - I use a generated key for storing 'NHLFE' info; the DaveM code
> >  uses the label value itself (see the above list of issues as to why this
> >  is bad).  I store the 'ILM' and 'NHLFE' info in a radix tree; the DaveM
> >  code uses a hash table.
> 
> I see no issues changing to a radix tree; it's probably a lesser concern
> right now.
> I think we discussed the NHLFE key a while back; can you elaborate?
> 
> 
> > -user input - my implementation does not have a netlink interface (yet),
> >  I use IOCTLs().  The DaveM code uses netlink.
> 
> This is my domain. Netlink is definitely the way to go. I am working on
> some distributed stuff and I really don't see anything else going in.
> 
> > -labelspace - I have thought out the issues with respect to labelspace
> >  and have implemented a 'best of both worlds' scheme which allows the user's
> >  application to implement whatever type of labelspace management it chooses.
> >  The DaveM code only has labelspace 0.
> 
> That can be fixed. On a slightly different topic, there's a huge push
> to virtualize the Linux stack (actually all of Linux, but I care about
> the stack). I know you have some interest in VRs; I will ping you on
> this.
>  
> > -labeling types - I have support for all labeling types in my code (ATM, FR,
> >  and generic) and support for multiple interface types (ethernet, PPP, GRE).
> >  At one point I even had direct support for ATM interfaces (it has
> >  since been removed because the ATM stack for Linux is not designed for
> >  routers or switches).  The DaveM code only supports ethernet.
> 
> Yes, this is also in the TODO.
> 
> > -availability of kernel information - My implementation has
> >  crude yet effective PROCFS and SYSFS interfaces; the DaveM code has neither.
> 
> I puke on both of them. Actually I was supposed to add procfs to it but
> didn't see the need for it. I think having them is gravy, but not a huge
> value differentiator. Maintaining large tables using sysfs is a
> guaranteed disaster.
> I could see a use for sysfs for something like "turn on the debug flag for
> nhlfe".
> 
> > Basically, if you rip out support for LSP hierarchy, PROCFS, SYSFS, and
> > a couple other features and add netlink support and convert from radix
> > to a hash table, you could convert my implementation into the DaveM
> > implementation.
> 
> Let's do that.
> The way I see it at this point is as follows:
> We need to have a convergence between your code and Dave's. DaveM will
> take care of making sure things move smoothly from one kernel to a new
> one (e.g. 2.6->2.7->2.8), since he is the maintainer of the net code. You
> will be the other guy that owns this code. You know more about MPLS than
> Dave does; the architecture convergence is the marriage of the two
> styles (yours and Dave's). Both of you have to be comfortable with the changes.
> There really has to be a convergence, otherwise this is going to become
> another FreeS/WAN divergence. My role here is not much more than to
> shepherd and make sure that the code makes it in. My interest is really
> in the netlink code. I can make calls on any of the netlink stuff.
> 
> 
> > I think we should identify the desirable features of the DaveM code and
> > my code and decide whether it is easier to propagate the changes from mine
> > to DaveM or vice versa.  That is just my $.02.
> >
> > > Thoughts?
> > 
> > If we are still dead set on enhancing the DaveM code, let's decide
> > which area to focus on first and fix it.  I think the most
> > important area to look at is support for LSP hierarchy.
> 
> OK, let's discuss the LSP hierarchy thing. Note I am not dead set on
> anything. I want to make sure that we do what's best for Linux. I think
> that in the interest of a fast merge, we need to take the best of both
> and fuse them into one. As an example, the dst stacking from your code
> was a great change. The radix tree we can do, although I would think it
> is a lower priority compared to, say, the hierarchy issue. Let's
> discuss the changes, make the code changes, put your name in the
> authorship, and get the code merged.
> 
> cheers,
> jamal