Re: [mpls-linux-devel] Merging into the kernel?

Hello Steven,

On Sun, Feb 26, 2006 at 05:47:22PM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Wed, Feb 15, 2006 at 09:40:57PM -0600, James R. Leu wrote:

<snip original XFRM discussion>

> Yes - I wonder though whether we could use a different selector mechanism
> but keeping some of the general framework. When I looked at it, the main
> thing which struck me was that the difficulty in changing the selector
> mechanism was mostly down to the interface (via netlink) to userland.
> Actually changing it on the kernel side is not impossible I think.

If this can be done in an efficient manner, I think it would be more readily
accepted by the powers that be.  Although, every time I look at the problem
it still comes down to this: something has to be attached to a node in the
L3 routing table.  I thought about adding an XFRM reference to the IPv4|6
nodes, but came to the conclusion that a generic system (the shim layer)
would be more flexible for other protocols.  Also, I thought it would be
easier to implement a new 'shim' hook in other L3 protocols, as opposed to
implementing an XFRM interface for them.

Perhaps I've overlooked something.  Let me know if you have any ideas
about how to go about this.

<snip XC/radix discussion>

> And it occurred to me that in order to find this out we'd need some
> tools to test against. Please find attached a patch for pktgen (as
> current in davem's net-2.6.17 git tree at kernel.org) to generate
> MPLS packets.
>
> The extension allows you to add a stack of labels onto the packets it's
> sending out. There is one extra hack which I included: since we know
> how many labels there are in the stack, I've used the bottom of stack
> bit to indicate whether the label should be randomly generated or not.
>
> You can thus push a stack of up to 16 labels, where each label in the
> stack is either a fixed value or random.
>
> pgset "mpls 0001000a,0002000a,0000000a"
>
> for example pushes labels 16, 32 and 0 (ipv4 null) each with a ttl of 10.
> If you set the bottom of stack bit in one of the labels it will turn on
> the MPLS_RND flag. You can also set and/or reset that flag in the
> normal way as well.
>
> Patches to pktgen have become very popular of late it seems
> so I'm going to wait until the latest set which are pending at the
> moment have made it into Dave's tree before making a final diff to send
> to Robert Olsson, the maintainer of pktgen.
>
> Also if anyone has feedback about this feature, please let me know.

Excellent!  I will play around with this.
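
For anyone else trying the patch, here is a minimal sketch (Python, not part
of the patch) of how the hex words in the pgset example break down.  It
assumes the standard 32-bit MPLS label stack entry layout: 20-bit label,
3-bit EXP, 1-bit bottom-of-stack, 8-bit TTL.  The function name is made up
for illustration.

```python
def decode_mpls_entry(word_hex):
    """Split one 32-bit MPLS label stack entry (given as hex) into its fields."""
    word = int(word_hex, 16)
    return {
        "label": (word >> 12) & 0xFFFFF,  # top 20 bits
        "exp": (word >> 9) & 0x7,         # 3 experimental bits
        "bos": (word >> 8) & 0x1,         # bottom-of-stack bit; the patch
                                          # described above reuses it to mean
                                          # "randomize this label"
        "ttl": word & 0xFF,               # low 8 bits
    }

# The stack from the quoted pgset example.
stack = [decode_mpls_entry(w) for w in "0001000a,0002000a,0000000a".split(",")]
for entry in stack:
    print(entry)
```

Under that layout the three words decode to labels 16, 32 and 0 with a TTL
of 10, matching the description above.  A word such as 0001010a would carry
label 16 with the bottom-of-stack bit set.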

> > > I have been giving some thought as to the efficiency of the forwarding
> > > process itself recently, with the idea of "transcoding" the instructions
> > > as provided via netlink into an efficient byte code to allow faster
> > > execution. There would appear to be considerable scope for merging certain
> > > instructions (e.g. a pull followed by a push) into one internal instruction
> > > (i.e. the interface would be the same and the effect the same so it
> > > wouldn't break the protocol at all).
> >
> > I like the idea.  This is much like what I'm used to in the hardware
> > forwarding world. What you're kind of hinting at is a packet translation
> > engine; this would make it easier to map the forwarding of packets onto FPGA
> > or ASIC based hardware (aren't there a couple of projects doing this
> > for packet filtering? nf-HIPAC)
> >
> It's possible it might make it easier. I have to say that although I'm a
> hardware engineer by training I've never really got into details of
> network interfaces and what it's possible to do on the cards. I wouldn't
> be at all surprised if it was the case though and it would be nice to
> do :-)

I think this is a great idea, but would like to worry about getting the base
MPLS code accepted first.  After that we can work on the optimizations.
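
To make the merging idea concrete for later, here is a rough Python sketch of
a peephole pass that folds a pull (pop) immediately followed by a push into a
single swap.  The opcode names (POP, PUSH, SWAP, FWD) and the tuple format
are hypothetical, chosen for illustration only; they are not the netlink
instruction encoding, and the sketch ignores TTL and EXP handling.

```python
def merge_pop_push(instrs):
    """Fold each POP directly followed by PUSH(label) into one SWAP(label)."""
    out = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur[0] == "POP" and nxt is not None and nxt[0] == "PUSH":
            out.append(("SWAP", nxt[1]))  # one internal instruction replaces two
            i += 2
        else:
            out.append(cur)               # anything else passes through unchanged
            i += 1
    return out

# Hypothetical instruction list as it might arrive from userland.
program = [("POP",), ("PUSH", 32), ("FWD", "eth1")]
print(merge_pop_push(program))
```

The list seen by userland stays the same; only the internal form executed
per packet gets shorter.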

> > > The various instructions to set/get tcindex and nfmark seem like a
> > > very good plan. I'm considering writing a patch to add setting nfmark
> > > through the ipv4/6/decnet routing tables which I think would be a
> > > generally useful plan. I wonder also if using one or the other or both
> > > of nfmark and/or tcindex as a key in looking up the nhlfe and/or ilm
> > > isn't a bad idea either.
> >
> > That might be against the RFCs.  I know I'm already overstepping the
> > RFCs by allowing the EXP bits to determine a NHLFE.
> >
> I wouldn't worry too much about overstepping what the RFCs say so long
> as the result makes sense and the stack can still comply with them on
> all the required points. The main worry with schemes like this is really
> just a question of forwarding speed and whether it will slow things down
> too much.
>
> > > If nfmark could be 1:1 with mpls fec, then it might be possible to use
> > > it together with xfrm as the interface for higher level protocols.
> >
> > Not sure I follow you here.  Currently with the shim setup there is no
> > NHLFE lookup in the forward path, the NHLFE is bound to the IPv4|6 route or
> > the eb|iptables rule.
> >
> Ok, let me explain a bit more then.... I'm assuming a scenario where the
> NHLFE is determined based upon nfmark and nfmark is set in the route
> (of whatever protocol). If nfmark were also a key for xfrm then it
> should be possible to "bundle" a set of dst_entry with the MPLS nhlfe
> as the last entry in the stack.

OK.  I understand now.  The existing nffwd instruction handles this but via
a second lookup.  Your idea would eliminate the second lookup.  My
technique allows the nfmark to be used at any node in a LSP, not just on
the ingress LER.
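
A toy Python sketch of the difference, as I understand it.  All table
contents and names here (nfmark_to_nhlfe, routes_with_mark and so on) are
invented for illustration and do not correspond to kernel structures.

```python
# Hypothetical mapping from nfmark to an NHLFE handle.
nfmark_to_nhlfe = {7: "nhlfe-A"}

# Two-step scheme: the route yields an nfmark, which is then looked up again.
routes_with_mark = {"10.0.0.0/8": 7}

def forward_two_lookups(prefix):
    mark = routes_with_mark[prefix]   # first lookup: route -> nfmark
    return nfmark_to_nhlfe[mark]      # second lookup: nfmark -> NHLFE

# Bundled scheme: the NHLFE is resolved once, when the route is installed.
routes_bundled = {p: nfmark_to_nhlfe[m] for p, m in routes_with_mark.items()}

def forward_bundled(prefix):
    return routes_bundled[prefix]     # single lookup in the forwarding path
```

Both paths end at the same NHLFE; the bundled form just moves the second
lookup out of the per-packet path.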

> I haven't got any further with the DECnet interface since I last posted
> but I may well make that my next project,

Let me know if I can be of assistance.

> Steve.

-- 
James R. Leu
jl...@mi...