From: Jon M. <jm...@re...> - 2020-05-05 00:15:16
Hi all,

I was pondering a little more about this possible feature.

First of all, I realized that the following test

bool tipc_msg_validate(struct sk_buff **_skb)
{
        [...]
        if (unlikely(msg_version(hdr) != TIPC_VERSION))
                return false;
        [...]
}

makes it very hard to update the version number in a backwards
compatible way. Even discovery messages will be rejected by v2 nodes,
and we don't get around that unless we do discovery with v2 messages,
or send out a duplicate set (v2 + v3) of discovery messages.

And, we can actually achieve exactly what we want by just setting
another capability bit. So, we set bit #12 to mean "TIPC_EXTENDED",
and also to mean "all previous capabilities are valid if this bit is
set, no need to test for them". That way, we can zero out these bits
and start reusing them for new capabilities when we need to.

AF_TIPC3 now becomes AF_TIPCE, tipc_addr becomes tipce_addr etc.

The binding table needs to be updated the following way:

union publ_item {
        struct {
                __be32 type;
                __be32 lower;
                __be32 upper;
        } legacy;
        struct {
                u8 type[16];
                u8 lower[16];
                u8 upper[16];
        } extended;
};

struct publication {
        u8 extended;
        u8 scope;       /* This can only take values [0:3] */
        u8 spare[2];
        union publ_item publ;
        u8 node[16];
        u32 port;
        u32 key;
        struct list_head binding_node;
        struct list_head binding_sock;
        struct list_head local_publ;
        struct list_head all_publ;
        struct rcu_head rcu;
};

struct distr_item {
        union publ_item publ;
        __be32 port;
        __be32 key;
};

The NAME_DISTR protocol must be extended with a field indicating
whether a message contains legacy publication(s) or extended
publication(s). 'Extended' nodes receive separate bulks for legacy and
extended publications, since it is hard to mix them in the same
message. Legacy nodes only receive legacy publications, so in this
case the distributor just sends a bulk for those.
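To make the capability bit and the bulk rule above a bit more concrete, here is a minimal user-space sketch. Only the bit number (12) and the legacy/extended split come from the description above. The names TIPC_CAP_EXTENDED, struct publ_stub, peer_is_extended() and count_bulk_items() are made up for illustration and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the proposal only says bit #12 means "TIPC_EXTENDED". */
#define TIPC_CAP_EXTENDED (1u << 12)

/* Simplified stand-in for struct publication: only the flag matters here. */
struct publ_stub {
        uint8_t extended;       /* 0 = legacy publication, 1 = extended */
};

/* A peer is treated as 'extended' iff it advertises the extended bit. */
static bool peer_is_extended(uint16_t peer_caps)
{
        return (peer_caps & TIPC_CAP_EXTENDED) != 0;
}

/*
 * Count the items that would go into one bulk for a peer.
 * Legacy peers never get an extended bulk; extended peers get one
 * bulk per kind, so the caller asks for each kind separately.
 */
static size_t count_bulk_items(const struct publ_stub *tbl, size_t n,
                               uint16_t peer_caps, bool want_extended)
{
        size_t i, cnt = 0;

        if (want_extended && !peer_is_extended(peer_caps))
                return 0;
        for (i = 0; i < n; i++)
                if (!!tbl[i].extended == want_extended)
                        cnt++;
        return cnt;
}
```

With a table holding one legacy and two extended publications, a legacy peer would get a single bulk with one item, while an extended peer would get a legacy bulk with one item and an extended bulk with two.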
The topology subscriber must be updated in a similar manner, but we
can assume that the same socket cannot issue two types of
subscriptions and receive two types of events; it has to be one or the
other. This should simplify the task somewhat.

The user message header format needs to be changed for Service Address
(Port Name) messages:

- Type occupies words [8:B], i.e. bytes [32:47]
- Instance occupies words [C:F], i.e. bytes [48:63]

This is where it gets tricky. The 'header size' field is only 4 bits
and counts 32-bit words. This means that the current max header size
that can be indicated is 60 bytes, while the layout above needs 64.
A simple way might be to just extend the field with one of the three
unused bits [16:18] in word 1 as msb. That would be backwards
compatible since those bits are currently 0, and no special tricks are
needed.

Regarding TIPC_MCAST_MSG we need yet another 16 bytes, [64:79], if we
want to preserve the current semantics on [lower,upper]. However, I am
highly uncertain if that feature is ever used and needed. We may be
fine just keeping one 'instance' field, just as in NAMED messages.

The group cast protocol could be left for later, once we understand
the consequences better than now, but semantically it should work just
like now, except with a longer header and type/instance fields.

It would also be nice if the 16 byte node identity replaces the
current 4 byte node address/number in all interaction with user land,
including the presentation of the neighbor monitoring status in
monitor.c. That can possibly also be left for later.

Finally, would it be possible to mark a socket as 'legacy' or
'extended' without adding a new AF_TIPCE value? If this can be done in
a not-too-ugly way it might be worth considering.

///jon

On 4/27/20 9:53 PM, jm...@re... wrote:
> From: Jon Maloy <jm...@re...>
>
> TIPC would be more attractive in a modern user environment such
> as Kubernetes if it could provide a larger address range.
>
> Advantages:
> - Users could directly use UUIDs, strings or other values as service
>   instances types and instances.
> - No more risk of collisions between randomly selected service types
>
> The effect on the TIPC implementation and protocol would be significant,
> but this is still worth considering.
> ---
>  include/linux/socket.h     |  5 ++-
>  include/uapi/linux/tipc3.h | 79 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 2 deletions(-)
>  create mode 100644 include/uapi/linux/tipc3.h
>
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index 54338fac45cb..ff2268ceedaf 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -209,8 +209,8 @@ struct ucred {
>   * reuses AF_INET address family
>   */
>  #define AF_XDP 44 /* XDP sockets */
> -
> -#define AF_MAX 45 /* For now.. */
> +#define AF_TIPC3 45 /* TIPC version 3 sockets */
> +#define AF_MAX 46 /* For now.. */
>
>  /* Protocol families, same as address families. */
>  #define PF_UNSPEC AF_UNSPEC
> @@ -260,6 +260,7 @@ struct ucred {
>  #define PF_QIPCRTR AF_QIPCRTR
>  #define PF_SMC AF_SMC
>  #define PF_XDP AF_XDP
> +#define PF_TIPC3 AF_TIPC3
>  #define PF_MAX AF_MAX
>
>  /* Maximum queue length specifiable by listen. */
> diff --git a/include/uapi/linux/tipc3.h b/include/uapi/linux/tipc3.h
> new file mode 100644
> index 000000000000..0d385bc41b66
> --- /dev/null
> +++ b/include/uapi/linux/tipc3.h
> @@ -0,0 +1,79 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
> +/*
> + * include/uapi/linux/tipc3.h: Header for TIPC v3 socket interface
> + *
> + * Copyright (c) 2020 Red Hat Inc
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the names of the copyright holders nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * Alternatively, this software may be distributed under the terms of the
> + * GNU General Public License ("GPL") version 2 as published by the Free
> + * Software Foundation.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _LINUX_TIPC3_H_
> +#define _LINUX_TIPC3_H_
> +
> +#include <linux/types.h>
> +#include <linux/sockios.h>
> +#include <linux/tipc.h>
> +
> +struct tipc3_addr {
> +        __u8[16] type; /* zero if socket address */
> +        __u8[16] instance; /* port if socket address */
> +        __u8[16] node; /* zero if whole cluster */
> +};
> +
> +struct tipc3_subscr {
> +        __u8[16] type;
> +        __u8[16] lower;
> +        __u8[16] upper;
> +        __u8[16] node;
> +        __u32 timeout; /* subscription duration (in ms) */
> +        __u32 filter; /* bitmask of filter options */
> +        __u8 usr_handle[16]; /* available for subscriber use */
> +};
> +
> +struct tipc3_event {
> +        __u8[16] lower; /* matching range */
> +        __u8[16] upper; /* " " */
> +        struct tipc3_addr socket; /* associated socket */
> +        struct tipc2_subscr sub; /* associated subscription */
> +        __u32 event; /* event type */
> +};
> +
> +struct sockaddr_tipc3 {
> +        unsigned short family;
> +        bool mcast;
> +        struct tipc3_addr addr;
> +};
> +
> +struct tipc3_group_req {
> +        struct tipc3_addr addr;
> +        __u32 flags;
> +};
> +
> +#endif
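
As a footnote to the header size question in the reply text: a minimal sketch of the arithmetic, assuming one spare bit is used as msb of the 4-bit size field. The helper names are hypothetical, and the sketch works on already-extracted field values rather than on real header words, so it says nothing about where the bits actually sit.

```c
#include <stdint.h>

/* Header size in bytes from the existing 4-bit field (counts 32-bit words). */
static unsigned int hdr_bytes_legacy(uint8_t size_field)
{
        return (size_field & 0xfu) * 4;
}

/*
 * Hypothetical extension: one spare bit, currently always 0, becomes the
 * msb of a 5-bit word count. Old senders leave it 0, so their headers
 * decode exactly as before.
 */
static unsigned int hdr_bytes_extended(uint8_t size_field, uint8_t spare_bit)
{
        unsigned int words = ((spare_bit & 1u) << 4) | (size_field & 0xfu);

        return words * 4;
}
```

The 4-bit field tops out at 15 words (60 bytes). A header ending at byte 63 is 16 words, which would be encoded as spare bit 1 with size field 0, and the 5-bit count reaches 31 words (124 bytes), enough for the extra 16 bytes in the multicast case.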