Re: [mpls-linux-devel] More comments on Jamal's spec

Comments inline.

On Thu, Feb 19, 2004 at 09:17:26AM -0500, Jamal Hadi Salim wrote:
> On Wed, 2004-02-18 at 23:14, James R. Leu wrote:
>  
> > > > For how we document this typically look at:
> > > > http://www.faqs.org/rfcs/rfc3549.html
> > 
> > :-)  Nice example :-)
> 
> ;-> We could have written a better draft; maybe a revision of that to
> include the MPLS messages.
> 
> > > Well, the problem with CR-LDP and/or RSVP is that it is a 'ping-pong' set
> > > up process, and you usually need to define a 'prestate'. Another
> > > possibility is to consider RSVP as using the unsolicited downstream
> > > label distribution and only process the RSVP-RESV message from control
> > > space (when the message comes up from your downstream router), I am not
> > > sure about this though.
> > 
> > The state that is being stored is only in the control plane, not the
> > forwarding plane.  The real reason you want to be able to modify
> > existing entries is for the fail-over cases.  This is also a reason why
> > a clean layer of indirection is required.  Imagine 1000's of VC or VPN
> > labels associated with one tunnel label. Now imagine that tunnel label
> > changing (fast re-route, primary/backup tunnel, etc).  In our
> > implementation VC and VPN labels are out-labels which have a FWD instruction
> > which all point to the same out-label. That out-label contains a PUSH
> > instruction.  By changing just one PUSH instruction you in essence fail over
> > to another tunnel label.
> > 
> 
> But how much execution advantage would you really gain by only changing
> one piece at a time?
> The most expensive thing in updating that table would be
> crossing from user space to kernel, i.e. it doesn't matter how much
> data you are sending. Am I off?

I think you missed the point.  A single instruction change would fail the
1000's of VC or VPN labels over to the new tunnel.  Think of how you handle a
BGP next hop change when 100K routes are using that same BGP next hop.
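
To make the indirection concrete, here is a minimal sketch in Python. It is
only a model of the idea, not the actual implementation: the OutLabel class,
its field names, and the label values are all hypothetical. Each VC out-label
carries its own PUSH plus a FWD to one shared tunnel out-label, so rewriting
the tunnel's single PUSH changes the outer label for every VC at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutLabel:
    """Hypothetical out-label entry: one PUSH plus an optional FWD."""
    push: int                          # label pushed by this entry
    fwd: Optional["OutLabel"] = None   # FWD: continue at another out-label


def label_stack(entry):
    """Follow the FWD chain and return the stack, outermost label first."""
    stack = []
    while entry is not None:
        stack.insert(0, entry.push)    # later pushes end up further out
        entry = entry.fwd
    return stack


# One shared tunnel out-label, 1000 VC out-labels that FWD to it.
tunnel = OutLabel(push=100)
vcs = [OutLabel(push=1000 + i, fwd=tunnel) for i in range(1000)]

before = [label_stack(vc) for vc in vcs]
tunnel.push = 200                      # fail-over: a single PUSH is changed
after = [label_stack(vc) for vc in vcs]
```

None of the 1000 VC entries is touched during the fail-over; only the shared
entry changes, which is the same trick as resolving 100K routes through one
BGP next hop object.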

> > > 
> > > > - something that sends the packet to a blackhole which will work for
> > > > such a scenario as above.
> > > 
> > > A 'disabled' NHLFE. I think that this can be useful, for example for
> > > liberal retention mode.
> > 
> > Not needed.  Just because the signaling protocol is holding label
> > state does not mean it must be installed in the forwarding plane.  Only
> > active segments and cross connects should be installed.
> 
> Explain the cross-connect part. Is this related to the indirection you
> are referring to?
> Leaving label retention for a second: Is the idea of a blackhole
> neighbor useful?

Signaling protocols running on an LSR need to keep track of how
in-segments and out-segments are related.  The cross connect is the term
used to refer to that relationship (I'm using terms from the LSR MIB).
A protocol that runs in DoD ordered control will not issue an in-segment
until it has received an out-segment or has determined it is the egress of
the LSP.  At the time the in-segment is issued the forwarding plane is
installed and the cross connect is made.  So no, I do not think a blackhole
is needed, but having it can't hurt.
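
As a rough illustration of that ordering, here is a small Python model. It is
a sketch under assumptions, not any real signaling code: the Lsr class, its
method names, and the FEC strings are made up for the example. Requests that
cannot be answered yet live only in control-plane state; a cross connect
shows up in the forwarding table only when the in-segment is issued.

```python
class Lsr:
    """Hypothetical LSR running downstream-on-demand, ordered control."""

    def __init__(self, is_egress_for=()):
        self.egress_fecs = set(is_egress_for)
        self.out_segments = {}    # fec -> label learned from downstream
        self.pending = {}         # fec -> waiting requests (control plane only)
        self.cross_connects = {}  # in-label -> out-label (None means egress)
        self._next_label = 16     # labels 0-15 are reserved

    def _issue_in_segment(self, fec):
        in_label = self._next_label
        self._next_label += 1
        # The forwarding entry is installed at the moment the label is issued.
        self.cross_connects[in_label] = self.out_segments.get(fec)
        return in_label

    def request_from_upstream(self, fec):
        """Return an in-label, or None if we must wait for downstream."""
        if fec in self.egress_fecs or fec in self.out_segments:
            return self._issue_in_segment(fec)
        self.pending[fec] = self.pending.get(fec, 0) + 1
        return None

    def mapping_from_downstream(self, fec, out_label):
        """Record the out-segment and answer any requests that were waiting."""
        self.out_segments[fec] = out_label
        waiting = self.pending.pop(fec, 0)
        return [self._issue_in_segment(fec) for _ in range(waiting)]


transit = Lsr()
first_try = transit.request_from_upstream("10.0.0.0/8")      # must wait
issued = transit.mapping_from_downstream("10.0.0.0/8", 300)  # now answered
```

In this model there is never a forwarding entry without a usable out-segment
(or egress status) behind it, which is why a blackhole entry is not strictly
required.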

> 
> > > > - Another one will send the packet to user space via netlink. This may
> > > > also be used for resolving what you have above.
> > > 
> > > So we can conform to the RFC (although sometimes it is just IETF jargon)
> > > But the question is 'which packet?' I assume that it is the first packet
> > > that according to the FIB_RES should be mapped to a NHLFEid that just does
> > > not exist. Don't we risk flooding userspace? Should it be only the first
> > > packet? What about a single netlink event (in plain English: hey, I don't
> > > know what to do with this FEC, can you do something about it?)
> > 
> > Why would you want to do this?  Are you trying to enable flow based
> > label allocation?  Everyone has decided this is a bad idea (example NHRP).
> > I could see needing to support MPLS sockets, where the sock addr is an in
> > or out segment (or both) and all packets rx'd on the in segment
> > go to the socket or all data written to the socket gets tx'd on the
> > out-segment.
> 
> I think ability to program this is valuable.
> One good reason could be for debugging or handling exceptions. Of course
> such a feature could be (ab)used like you say for flow based label
> allocation (in which case - bless those who want to use a misfeature).

I think the best way to handle this and RA is via MPLS sockets.  How does
IPv4 handle RA?  Userland has to create a socket which registers for it.
I think the MPLS RA should be handled the same way.  How do other L2ish protocols
handle the passing of PDUs to userland?  The only example I can think of is
ATM.  It uses sockets to accomplish this.  I have nothing against using
netlink, but I just think we should use mechanisms that people are used to.

> > > > - A third one is for locally destined packets. I was not sure whether
> > > > this should just be a flag which says neighbor = local or not.
> > 
> > The correct way is to utilize the same instruction for pop/lookup
> > and pop/rx locally.  That way tunnel in segments do not need to
> > change when VC or VPN labels are associated with them.  Plus it is
> > not always a clear case of always being stacked or not.
> 
> so a pop/rx locally would be equivalent to remove all labels if there's
> more than one, correct?

You can only pop/rx locally for the label with the BOS.  All others must be
pop/lookup (of course the lookup could say swap, or RA, in which case we
are no longer in the 'pop/lookup' loop).

> > > IIRC, locally destined packets means that the LSR is egress (for all
> > > hierarchical levels) and pops the last label. As one possibility, the
> > > default action should be just call IP module packet reception if we just
> > > popped the last label, so the packet is locally delivered or forwarded per
> > > dest address.
> > 
> > The lowest level label cannot dictate that (except for router alert, but
> > in that case the stack above the RA is sent up as data).  If the lowest
> > level label says pop, you MUST pop and lookup the next level.  The only time
> > you can pop-all is in the error cases (and that is even questionable).
> 
> So what you are saying is let whoever programmed the instructions shoot
> themselves, i.e. they could have specified pop, rx-local, am I correct?

What I'm saying is that the meaning of 'pop' is derived from which
position in the stack it is being applied to.
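
A small Python sketch of what I mean, with the caveat that it is only an
illustration: the process() function, the table layout, and the label values
are hypothetical, and RA handling is left out. The same POP instruction
results in a local receive when the popped label carried the BOS bit, and in
another lookup otherwise.

```python
def process(stack, ilm):
    """Apply incoming-label instructions to a label stack.

    stack: list of (label, bos) tuples, outermost label first.
    ilm:   dict mapping label -> "POP" or ("SWAP", new_label).
    Returns (action, remaining_stack).
    """
    stack = list(stack)
    while stack:
        label, bos = stack[0]
        op = ilm.get(label)
        if op == "POP":
            stack.pop(0)
            if bos:
                # Popped the bottom of stack: hand the payload up locally.
                return ("RX_LOCAL", stack)
            continue  # not BOS: pop means "look up the next label"
        if isinstance(op, tuple) and op[0] == "SWAP":
            stack[0] = (op[1], bos)
            return ("FORWARD", stack)  # leaves the pop/lookup loop
        return ("DROP", stack)         # no entry for this label
    return ("DROP", stack)


ilm = {100: "POP", 1000: "POP", 2000: ("SWAP", 2001)}
tunnel_then_vc = process([(100, False), (1000, True)], ilm)
tunnel_then_swap = process([(100, False), (2000, True)], ilm)
```

Note that the entry for the tunnel label (100) is identical in both cases; it
does not need to know whether anything is stacked beneath it.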

> 
> BTW, you mention RAs above - which would be considered exceptions. I
> think this is an example of a packet that could be sent via netlink as
> well.
> Note with distributed control where the control plane may be one
> ethernet hop away, this is useful (wrap the RA into a netlink packet and
> shove it onto the control board - at least that's what netlink2 is
> preaching)

See my comments above.

> 
> cheers,
> jamal
> 

In general I have the feeling something isn't clicking.  Am I explaining
these issues well, or should I back up and approach each one in depth?

-- 
James R. Leu
jl...@mi...