Re: [mpls-linux-devel] Re: 2.6 Spec: Random comments.

On Fri, 2004-02-13 at 12:12, James R. Leu wrote:

> > From user space this would look like:
> > 
> > l2c mpls ilm add dev eth0 label 22 nhalg roundrobin nhid 2 nhid 3 nhid 4
> 
> What about adding a new func ptr to the protocol driver.  Then we could
> do protocol dependent stuff like hashing the IPv4|6 header or ethernet
> header (ethernet over MPLS).

Ok, so you are looking at only IP packets at the edge of an MPLS
network. Describe a little packet walk. Are you planning to 
not use the ECMP features?
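A minimal user-space sketch of the function-pointer idea quoted above, assuming each protocol driver exposes a single hash hook. The names proto_driver, flow_hash and pick_nexthop are hypothetical and are not taken from the mpls-linux tree, and FNV-1a is only a stand-in for whatever hash the stack would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

/* Hypothetical per-protocol driver: only the hash hook is modelled here. */
struct proto_driver {
    const char *name;
    /* Hash the payload that follows the label stack. */
    uint32_t (*flow_hash)(const uint8_t *payload, size_t len);
};

/* FNV-1a over a byte range (hash choice is an assumption). */
static uint32_t fnv1a(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* IPv4: hash source and destination address (header bytes 12..19). */
static uint32_t ipv4_flow_hash(const uint8_t *payload, size_t len)
{
    if (len < 20)
        return 0;               /* truncated header: fall back to bucket 0 */
    return fnv1a(FNV_OFFSET, payload + 12, 8);
}

/* Ethernet over MPLS: hash destination and source MAC (first 12 bytes). */
static uint32_t eth_flow_hash(const uint8_t *payload, size_t len)
{
    if (len < 12)
        return 0;
    return fnv1a(FNV_OFFSET, payload, 12);
}

const struct proto_driver ipv4_driver = { "ipv4", ipv4_flow_hash };
const struct proto_driver eth_driver  = { "ethernet", eth_flow_hash };

/* Map the driver's flow hash onto one of n_nh next hops. */
static unsigned int pick_nexthop(const struct proto_driver *drv,
                                 const uint8_t *payload, size_t len,
                                 unsigned int n_nh)
{
    assert(n_nh > 0);
    return drv->flow_hash(payload, len) % n_nh;
}
```

Because the hash only covers address fields, all packets of one flow map to the same next hop, which avoids reordering within the flow.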

> The task is trivial if the stack only has one label; for more than one label
> we would have to be creative: hashing the label stack, or using the PW ID
> (a suggestion in the PWE3 WG which adds a word after the label stack to indicate
> what protocol lies below).  The PW ID could be used to look up the protocol
> driver to generate the hash.

Point me to some docs if you don't mind. Is this for some of the VPN
encapsulations?
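For the multi-label case, hashing the label stack could look roughly like the sketch below. It assumes the entries are already in host byte order and uses the RFC 3032 entry layout (20-bit label, 3-bit EXP, bottom-of-stack bit, 8-bit TTL). The helper names mpls_stack_hash and mpls_entry are hypothetical, and the PW ID variant is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 3032 label stack entry: label(20) | exp(3) | S(1) | ttl(8). */
#define MPLS_LABEL(e) ((uint32_t)(e) >> 12)
#define MPLS_BOS(e)   (((uint32_t)(e) >> 8) & 1u)

/*
 * Hash the label values of a stack (host byte order).
 * Stops after the entry with the bottom-of-stack bit set, or after
 * max entries. EXP and TTL are left out so that a flow keeps the
 * same hash as its TTL is decremented hop by hop.
 */
static uint32_t mpls_stack_hash(const uint32_t *stack, size_t max)
{
    uint32_t h = 2166136261u;   /* FNV-1a offset basis (assumed hash) */

    assert(stack != NULL);
    for (size_t i = 0; i < max; i++) {
        uint32_t label = MPLS_LABEL(stack[i]);

        /* Feed the 20-bit label in as three bytes. */
        for (int shift = 0; shift < 24; shift += 8) {
            h ^= (label >> shift) & 0xffu;
            h *= 16777619u;     /* FNV prime */
        }
        if (MPLS_BOS(stack[i]))
            break;
    }
    return h;
}

/* Build one stack entry; for illustration only. */
static uint32_t mpls_entry(uint32_t label, unsigned int exp,
                           unsigned int bos, unsigned int ttl)
{
    return (label << 12) | ((exp & 7u) << 9) | ((bos & 1u) << 8) | (ttl & 0xffu);
}
```

The result would then be reduced modulo the number of next hops, the same way as for a single-label hash.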

> Or of course we could just add an option for which algo to use.

Note that what I suggested is only for the ILM level, and there you could add
any algorithms you want. With the protocol driver, are you suggesting doing
something at the IPv4/6 FTN level only?
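To make the ILM-level idea concrete, here is a small user-space sketch of an ILM entry with a pluggable next-hop selection algorithm, using the values from the l2c example (label 22, nhid 2, 3 and 4, roundrobin). The struct layout and the names ilm_entry, nh_select_fn, nh_roundrobin and ilm_pick_nhid are assumptions made for illustration, not the actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define ILM_MAX_NH 8

struct ilm_entry;

/* Hypothetical selection hook: returns an index into ilm->nhid[]. */
typedef unsigned int (*nh_select_fn)(struct ilm_entry *ilm);

struct ilm_entry {
    unsigned int label;
    unsigned int nhid[ILM_MAX_NH];  /* next-hop ids attached to this label */
    unsigned int n_nh;              /* number of valid entries in nhid[] */
    unsigned int rr_next;           /* round-robin cursor */
    nh_select_fn select;            /* the "nhalg" chosen from user space */
};

/* Round robin: hand out the next hops in order, wrapping at the end. */
static unsigned int nh_roundrobin(struct ilm_entry *ilm)
{
    unsigned int idx = ilm->rr_next;

    ilm->rr_next = (ilm->rr_next + 1) % ilm->n_nh;
    return idx;
}

/* Pick the next-hop id for one packet arriving with this label. */
static unsigned int ilm_pick_nhid(struct ilm_entry *ilm)
{
    assert(ilm->n_nh > 0 && ilm->n_nh <= ILM_MAX_NH);
    return ilm->nhid[ilm->select(ilm)];
}
```

Other algorithms would plug in by providing a different nh_select_fn. The cursor is not protected against concurrent updates here; in a kernel fast path it would need to be atomic or per-CPU.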

> Here are some snippets.  I think XFRM may remove the need for these,
> but for now it works.

> Setup the dst stacking
> ----------------------
> 
>     net/mpls/mpls_output.c
> 
>     int
>     mpls_set_nexthop (struct dst_entry *dst, u32 nh_data, struct spec_nh *spec)
>     {
>             struct mpls_out_info *moi = NULL;

I take it mpls_out_info is an nhlfe entry?

>             MPLS_ENTER;
>             moi = mpls_get_moi(nh_data);
>             if (unlikely(!moi))
>                     return -1;
>
>             dst->metrics[RTAX_MTU-1] = moi->moi_mtu;
>             dst->child = dst_clone(&moi->moi_dst);
>             MPLS_DEBUG("moi: %p mtu: %d dst: %p\n", moi, moi->moi_mtu,
>                     &moi->moi_dst);
>             MPLS_EXIT;
>             return 0;
>     }
> 
>     mpls_set_nexthop is called from ipv4:rt_set_nexthop and from
>     ipv6:ip6_route_add (I have a 'special nexthop' system developed which
>     would be replaced by XFRM).  It is very similar to your RTA_MPLS_FEC,
>     but has two pieces of data: RTA_SPEC_PROTO and RTA_SPEC_DATA.  It is
>     intended to let multiple protocols register a special nexthop.
>     Right now only MPLS registers :-)  Again I have every intention of
>     ripping it out in favor of XFRM.
> 
> Using the dst stack
> -------------------
> 
>     net/ipv4/ip_output.c
> 
>     static inline int ip_finish_output2(struct sk_buff *skb)
>     {
>             struct dst_entry *dst = skb->dst;
>             struct hh_cache *hh = dst->hh;
>             struct net_device *dev = dst->dev;
>             int hh_len = LL_RESERVED_SPACE(dev);
> 
>             if (dst->child) {
>                     skb->dst = dst_pop(skb->dst);
>                     return skb->dst->output(skb);
>             }
>     ...
> 
>     Something very similar exists in net/ipv6/ip6_output.c ip6_output_finish() 
> 

At the outset this does look a bit cleaner, but I would have to ping my
brain on Dave's approach. Take a look at his code.
Q: Can you stack more than one of those dsts? If yes, then it may be
even safer to have the nhlfe_route in the dst instead, no?
i.e. how sure can you be that the child will be MPLS related? In the
other case it is guaranteed to be (it does say dst->xxmplsxx).
There are a few pieces of the current approach that I didn't like;
for example the net_output_maybe_reroute() thing, or having to mod dst.c
to add ifdefs for MPLS. There could be a marriage of the two approaches
maybe?
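For reference, the dst stacking in the quoted snippets can be modelled in user space as below. This is only a sketch: struct dst and struct pkt are cut-down stand-ins for struct dst_entry and struct sk_buff, reference counting (what dst_clone and dst_pop take care of) is left out, and mpls_out and ip_out are hypothetical names for the two output handlers.

```c
#include <assert.h>
#include <stddef.h>

struct pkt;

/* Cut-down stand-in for struct dst_entry: only what the walk needs. */
struct dst {
    struct dst *child;              /* stacked dst, NULL if none */
    int (*output)(struct pkt *p);   /* output handler for this layer */
};

/* Cut-down stand-in for struct sk_buff. */
struct pkt {
    struct dst *dst;
    int labels_pushed;
    int transmitted;
};

/* MPLS layer: push a label, then hand the packet to the device. */
static int mpls_out(struct pkt *p)
{
    p->labels_pushed++;
    p->transmitted = 1;
    return 0;
}

/*
 * IP layer: if a child dst is stacked, pop to it and let its output
 * handler finish the job; otherwise transmit as plain IP.
 */
static int ip_out(struct pkt *p)
{
    assert(p->dst != NULL);
    if (p->dst->child) {
        p->dst = p->dst->child;     /* models skb->dst = dst_pop(skb->dst) */
        return p->dst->output(p);
    }
    p->transmitted = 1;
    return 0;
}
```

Nothing in this model checks that the child really belongs to MPLS, which is exactly the concern raised above; the IP layer simply trusts whatever output handler the child carries.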

cheers,
jamal