From: <eri...@er...> - 2014-12-18 16:48:23
|
From: Erik Hugne <eri...@er...>

This has probably been idling for a few years by now, and I just stumbled
upon this, which prompted me to resend the UDP set:
https://www.buckhill.co.uk/blog/how-to-enable-broadcast-and-multicast-on-amazon-aws-ec2/2#.VJMCGB-c1hE

I haven't changed much (hardly anything) since the last round.
Ying wanted to have the UDP bearer opt-in, but I disagree with that and kept
it always-on. He also had a point about the deferred setup work that I have
not addressed yet.
http://thread.gmane.org/gmane.network.tipc.general/7238/focus=7244

I know IPv6 is missing, but I think that adding it (and all the MLD stuff
needed for mcast discovery) is best saved for a follow-up patchset.

Going on Christmas holiday tomorrow, back on the 7th of January.
If net-next is open by then I will send it off.

Erik Hugne (5):
  tipc: rename media/msg related definitions
  tipc: make media address offset a common define
  tipc: allow ':' character in link names
  tipc: expand interface/bearer/link name limits
  tipc: add ip/udp media type

 include/uapi/linux/tipc.h |  12 +-
 net/tipc/Makefile         |   2 +-
 net/tipc/bearer.c         |   1 +
 net/tipc/bearer.h         |  11 +-
 net/tipc/core.c           |   5 +
 net/tipc/eth_media.c      |   8 +-
 net/tipc/ib_media.c       |   2 +-
 net/tipc/link.c           |   6 +-
 net/tipc/msg.h            |   4 +-
 net/tipc/udp_media.c      | 503 ++++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 534 insertions(+), 20 deletions(-)
 create mode 100644 net/tipc/udp_media.c

--
2.1.3
|
From: <eri...@er...> - 2014-12-18 16:48:25
|
From: Erik Hugne <eri...@er...>

With the exception of infiniband media which does not use media
offsets, the media address is always located at offset 4 in the
media info field as defined by the protocol, so we move the
definition to the generic bearer.h

Signed-off-by: Erik Hugne <eri...@er...>
---
 net/tipc/bearer.h    | 2 ++
 net/tipc/eth_media.c | 6 ++----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index caa29d4..4f0907e 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -51,6 +51,8 @@
  */
 #define TIPC_MEDIA_INFO_SIZE	32
 #define TIPC_MEDIA_TYPE_OFFSET	3
+#define TIPC_MEDIA_ADDR_OFFSET	4
+

 /*
  * Identifiers of supported TIPC media types
diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 085d3a0..f69a2fd 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -37,8 +37,6 @@
 #include "core.h"
 #include "bearer.h"

-#define ETH_ADDR_OFFSET	4	/* MAC addr position inside address field */
-
 /* Convert Ethernet address (media address format) to string */
 static int tipc_eth_addr2str(struct tipc_media_addr *addr,
 			     char *strbuf, int bufsz)
@@ -55,7 +53,7 @@ static int tipc_eth_addr2msg(char *msg, struct tipc_media_addr *addr)
 {
 	memset(msg, 0, TIPC_MEDIA_INFO_SIZE);
 	msg[TIPC_MEDIA_TYPE_OFFSET] = TIPC_MEDIA_TYPE_ETH;
-	memcpy(msg + ETH_ADDR_OFFSET, addr->value, ETH_ALEN);
+	memcpy(msg + TIPC_MEDIA_ADDR_OFFSET, addr->value, ETH_ALEN);
 	return 0;
 }

@@ -79,7 +77,7 @@ static int tipc_eth_msg2addr(struct tipc_bearer *b,
 			     char *msg)
 {
 	/* Skip past preamble: */
-	msg += ETH_ADDR_OFFSET;
+	msg += TIPC_MEDIA_ADDR_OFFSET;
 	return tipc_eth_raw2addr(b, addr, msg);
 }

--
2.1.3
|
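The layout this patch relies on can be sketched in userspace as follows. This is only an illustration under stated assumptions: the three TIPC_* values are the ones from the patch, ETH_ALEN mirrors the kernel's 6-byte MAC length, and eth_addr_to_info() / info_to_eth_addr() are hypothetical stand-ins for tipc_eth_addr2msg() / tipc_eth_msg2addr(), not the kernel functions themselves.

```c
#include <stdint.h>
#include <string.h>

/* Values from the patch; ETH_ALEN mirrors the kernel's MAC length. */
#define TIPC_MEDIA_INFO_SIZE	32
#define TIPC_MEDIA_TYPE_OFFSET	3
#define TIPC_MEDIA_ADDR_OFFSET	4
#define TIPC_MEDIA_TYPE_ETH	1
#define ETH_ALEN		6

/* Hypothetical helper: write a MAC address into a 32-byte media info field. */
static void eth_addr_to_info(uint8_t info[TIPC_MEDIA_INFO_SIZE],
			     const uint8_t mac[ETH_ALEN])
{
	memset(info, 0, TIPC_MEDIA_INFO_SIZE);
	info[TIPC_MEDIA_TYPE_OFFSET] = TIPC_MEDIA_TYPE_ETH;
	memcpy(info + TIPC_MEDIA_ADDR_OFFSET, mac, ETH_ALEN);
}

/* Hypothetical helper: read the MAC back; returns -1 on a media type mismatch. */
static int info_to_eth_addr(const uint8_t info[TIPC_MEDIA_INFO_SIZE],
			    uint8_t mac[ETH_ALEN])
{
	if (info[TIPC_MEDIA_TYPE_OFFSET] != TIPC_MEDIA_TYPE_ETH)
		return -1;
	memcpy(mac, info + TIPC_MEDIA_ADDR_OFFSET, ETH_ALEN);
	return 0;
}
```

With the offset shared in bearer.h, any media that follows the same layout (type byte at offset 3, address from offset 4) can reuse the two constants instead of defining its own.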
From: <eri...@er...> - 2014-12-18 16:48:24
|
From: Erik Hugne <eri...@er...>

The TIPC_MEDIA_ADDR_SIZE and TIPC_MEDIA_ADDR_OFFSET names
are misleading, as they actually define the size and offset of
the whole media info field and not the address part. This patch
does not have any functional changes.

Signed-off-by: Erik Hugne <eri...@er...>
---
 net/tipc/bearer.h    | 4 ++--
 net/tipc/eth_media.c | 2 +-
 net/tipc/ib_media.c  | 2 +-
 net/tipc/msg.h       | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 2c1230a..caa29d4 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -49,7 +49,7 @@
  *   - the field's actual content and length is defined per media
  *   - remaining unused bytes in the field are set to zero
  */
-#define TIPC_MEDIA_ADDR_SIZE	32
+#define TIPC_MEDIA_INFO_SIZE	32
 #define TIPC_MEDIA_TYPE_OFFSET	3

 /*
@@ -65,7 +65,7 @@
  * @broadcast: non-zero if address is a broadcast address
  */
 struct tipc_media_addr {
-	u8 value[TIPC_MEDIA_ADDR_SIZE];
+	u8 value[TIPC_MEDIA_INFO_SIZE];
 	u8 media_id;
 	u8 broadcast;
 };
diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 5e1426f..085d3a0 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -53,7 +53,7 @@ static int tipc_eth_addr2str(struct tipc_media_addr *addr,
 /* Convert from media address format to discovery message addr format */
 static int tipc_eth_addr2msg(char *msg, struct tipc_media_addr *addr)
 {
-	memset(msg, 0, TIPC_MEDIA_ADDR_SIZE);
+	memset(msg, 0, TIPC_MEDIA_INFO_SIZE);
 	msg[TIPC_MEDIA_TYPE_OFFSET] = TIPC_MEDIA_TYPE_ETH;
 	memcpy(msg + ETH_ADDR_OFFSET, addr->value, ETH_ALEN);
 	return 0;
diff --git a/net/tipc/ib_media.c b/net/tipc/ib_media.c
index 8522eef..e8c1671 100644
--- a/net/tipc/ib_media.c
+++ b/net/tipc/ib_media.c
@@ -57,7 +57,7 @@ static int tipc_ib_addr2str(struct tipc_media_addr *a, char *str_buf,
 /* Convert from media address format to discovery message addr format */
 static int tipc_ib_addr2msg(char *msg, struct tipc_media_addr *addr)
 {
-	memset(msg, 0, TIPC_MEDIA_ADDR_SIZE);
+	memset(msg, 0, TIPC_MEDIA_INFO_SIZE);
 	memcpy(msg, addr->value, INFINIBAND_ALEN);
 	return 0;
 }
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index d5c83d7..2d36a29 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -75,7 +75,7 @@

 #define MAX_MSG_SIZE (MAX_H_SIZE + TIPC_MAX_USER_MSG_SIZE)

-#define TIPC_MEDIA_ADDR_OFFSET	5
+#define TIPC_MEDIA_INFO_OFFSET	5


 struct tipc_msg {
@@ -661,7 +661,7 @@ static inline void msg_set_redundant_link(struct tipc_msg *m, u32 r)

 static inline char *msg_media_addr(struct tipc_msg *m)
 {
-	return (char *)&m->hdr[TIPC_MEDIA_ADDR_OFFSET];
+	return (char *)&m->hdr[TIPC_MEDIA_INFO_OFFSET];
 }

 /*
--
2.1.3
|
From: <eri...@er...> - 2014-12-18 16:48:30
|
From: Erik Hugne <eri...@er...>

The string representation of a tipc link is in the form:
<local node>:<local bearer>-<remote node>:<remote bearer>
For an Ethernet bearer, this can look like
1.1.1:eth0-1.1.2:eth0

The remote bearer name is learned during link activation, and
the link layer code incorrectly assumes that this name shall
be stored after the last ':' character in the link. If a link
is reset/reactivated, this will cause the link name to be
corrupted if the remote bearer name contains a ':' character.

We solve this by first isolating the remote endpoint part of
the link name, and then append the remote bearer name after
the remote node address.

Signed-off-by: Erik Hugne <eri...@er...>
---
 net/tipc/link.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 23bcc11..8c45c4c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1461,6 +1461,7 @@ static void tipc_link_proto_rcv(struct tipc_link *l_ptr, struct sk_buff *buf)
 	u32 max_pkt_info;
 	u32 max_pkt_ack;
 	u32 msg_tol;
+	char *remote;
 	struct tipc_msg *msg = buf_msg(buf);

 	/* Discard protocol message during link changeover */
@@ -1494,8 +1495,9 @@ static void tipc_link_proto_rcv(struct tipc_link *l_ptr, struct sk_buff *buf)
 		/* fall thru' */
 	case ACTIVATE_MSG:
 		/* Update link settings according other endpoint's values */
-		strcpy((strrchr(l_ptr->name, ':') + 1), (char *)msg_data(msg));
-
+		remote = strchr(l_ptr->name, '-');
+		strlcpy((strchr(remote, ':') + 1), (char *)msg_data(msg),
+			TIPC_MAX_BEARER_NAME);
 		msg_tol = msg_link_tolerance(msg);
 		if (msg_tol > l_ptr->tolerance)
 			link_set_supervision_props(l_ptr, msg_tol);
--
2.1.3
|
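The corruption described in this commit message can be reproduced with a small userspace sketch. The helper names set_remote_old() and set_remote_new() are hypothetical, the initial "unknown" placeholder and the example bearer names are assumptions made for illustration, and snprintf stands in for the kernel's strcpy/strlcpy.

```c
#include <stdio.h>
#include <string.h>

/* Old behaviour: overwrite whatever follows the LAST ':' in the link name. */
static void set_remote_old(char *name, size_t size, const char *remote_bearer)
{
	char *p = strrchr(name, ':');

	if (!p)
		return;
	p++;
	snprintf(p, size - (size_t)(p - name), "%s", remote_bearer);
}

/* New behaviour: isolate the remote endpoint (after '-'), then write the
 * bearer name after the first ':' that follows the remote node address.
 */
static void set_remote_new(char *name, size_t size, const char *remote_bearer)
{
	char *remote = strchr(name, '-');
	char *p = remote ? strchr(remote, ':') : NULL;

	if (!p)
		return;
	p++;
	snprintf(p, size - (size_t)(p - name), "%s", remote_bearer);
}
```

Starting from "1.1.1:10.0.0.1:6118-1.1.2:unknown" and applying the remote bearer name "10.0.0.2:6118" twice (activation followed by a reset and reactivation), the old variant ends up with "...-1.1.2:10.0.0.2:10.0.0.2:6118", while the new variant keeps "...-1.1.2:10.0.0.2:6118". Unlike the patch, the sketch also checks for missing separators before dereferencing.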
From: <eri...@er...> - 2014-12-18 16:48:33
|
From: Erik Hugne <eri...@er...>

A TIPC link over IP/UDP media will have a substantially longer
name including the ip/port for both endpoints. The string limits
are increased to avoid truncation of these.

Signed-off-by: Erik Hugne <eri...@er...>
---
 include/uapi/linux/tipc.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
index 6f71b9b..5e50358 100644
--- a/include/uapi/linux/tipc.h
+++ b/include/uapi/linux/tipc.h
@@ -212,15 +212,15 @@ struct sockaddr_tipc {
  * Maximum sizes of TIPC bearer-related names (including terminating NULL)
  * The string formatting for each name element is:
  * media: media
- * interface: media:interface name
- * link: Z.C.N:interface-Z.C.N:interface
- *
+ * interface: interface/ip[:port:ip:port]
+ * bearer: media:interface/ip[:port:ip:port]
+ * link: Z.C.N:interface/ip[:port]-Z.C.N:interface/ip[:port]
  */

 #define TIPC_MAX_MEDIA_NAME	16
-#define TIPC_MAX_IF_NAME	16
-#define TIPC_MAX_BEARER_NAME	32
-#define TIPC_MAX_LINK_NAME	60
+#define TIPC_MAX_IF_NAME	96
+#define TIPC_MAX_BEARER_NAME	128
+#define TIPC_MAX_LINK_NAME	256

 #define SIOCGETLINKNAME		SIOCPROTOPRIVATE
--
2.1.3
|
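A rough way to see why the old 60-byte link name limit is too small is to compute the space a link name needs in the "Z.C.N:interface/ip[:port]-Z.C.N:interface/ip[:port]" format from the patch. The helper link_name_len() is hypothetical and the OLD_/NEW_ constants simply restate the limits before and after this patch.

```c
#include <stdio.h>

#define OLD_TIPC_MAX_LINK_NAME	60
#define NEW_TIPC_MAX_LINK_NAME	256

/* Hypothetical helper: bytes needed for a link name, including the
 * terminating NUL, given both node addresses and interface parts.
 */
static int link_name_len(const char *local_node, const char *local_if,
			 const char *remote_node, const char *remote_if)
{
	return snprintf(NULL, 0, "%s:%s-%s:%s",
			local_node, local_if, remote_node, remote_if) + 1;
}
```

An Ethernet link such as 1.1.1:eth0-1.1.2:eth0 needs 22 bytes, well within the old limit. With ip:port endpoints and large node addresses on both sides, for example 255.4095.4095:255.255.255.255:65535, the name needs 72 bytes, which no longer fits in 60 but fits comfortably in 256.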
From: <eri...@er...> - 2014-12-18 16:48:45
|
From: Erik Hugne <eri...@er...>

The ip/udp bearer can be configured in a point-to-point mode by specifying a
remote ip/hostname:
tipc-config -be=udp:eth0[:<port>:<remip>:<remport>]
If not specified, the UDP port used is 6118 (iana assigned).
Or, it can be enabled in multicast mode:
tipc-config -be=udp:eth0
In this mode, links will be established to all tipc nodes that are member in the
multicast group. The multicast group is generated based on the TIPC network
ID but can be overridden by specifying a multicast address as remote ip.

Signed-off-by: Erik Hugne <eri...@er...>
---
 net/tipc/Makefile    |   2 +-
 net/tipc/bearer.c    |   1 +
 net/tipc/bearer.h    |   5 +-
 net/tipc/core.c      |   5 +
 net/tipc/udp_media.c | 503 +++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 514 insertions(+), 2 deletions(-)
 create mode 100644 net/tipc/udp_media.c

diff --git a/net/tipc/Makefile b/net/tipc/Makefile
index 333e459..ad50a84 100644
--- a/net/tipc/Makefile
+++ b/net/tipc/Makefile
@@ -8,7 +8,7 @@ tipc-y	+= addr.o bcast.o bearer.o config.o \
 	   core.o link.o discover.o msg.o \
 	   name_distr.o subscr.o name_table.o net.o \
 	   netlink.o node.o socket.o log.o eth_media.o \
-	   server.o
+	   server.o udp_media.o

 tipc-$(CONFIG_TIPC_MEDIA_IB)	+= ib_media.o
 tipc-$(CONFIG_SYSCTL)		+= sysctl.o
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 463db5b..20d7773 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -47,6 +47,7 @@ static struct tipc_media * const media_info_array[] = {
 #ifdef CONFIG_TIPC_MEDIA_IB
 	&ib_media_info,
 #endif
+	&udp_media_info,
 	NULL
 };

diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 4f0907e..fdaba9b 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -42,7 +42,7 @@
 #include <net/genetlink.h>

 #define MAX_BEARERS	2
-#define MAX_MEDIA	2
+#define MAX_MEDIA	3

 /* Identifiers associated with TIPC message header media address info
  * - address info field is 32 bytes long
@@ -59,6 +59,7 @@
  */
 #define TIPC_MEDIA_TYPE_ETH	1
 #define TIPC_MEDIA_TYPE_IB	2
+#define TIPC_MEDIA_TYPE_UDP	3

 /**
  * struct tipc_media_addr - destination address used by TIPC bearers
@@ -179,6 +180,8 @@ extern struct tipc_media eth_media_info;
 #ifdef CONFIG_TIPC_MEDIA_IB
 extern struct tipc_media ib_media_info;
 #endif
+extern struct tipc_media udp_media_info;
+extern struct workqueue_struct *tipc_udp_send_wq;

 int tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info);
 int tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/tipc/core.c b/net/tipc/core.c
index a5737b8..e1ba0cd 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -79,6 +79,7 @@ struct sk_buff *tipc_buf_acquire(u32 size)
  */
 static void tipc_core_stop(void)
 {
+	destroy_workqueue(tipc_udp_send_wq);
 	tipc_net_stop();
 	tipc_bearer_cleanup();
 	tipc_netlink_stop();
@@ -98,6 +99,10 @@ static int tipc_core_start(void)

 	get_random_bytes(&tipc_random, sizeof(tipc_random));

+	tipc_udp_send_wq = alloc_workqueue("tipc_udp_send", 0, 0);
+	if (IS_ERR(tipc_udp_send_wq))
+		return -ENOMEM;
+
 	err = tipc_sk_ref_table_init(tipc_max_ports, tipc_random);
 	if (err)
 		goto out_reftbl;
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
new file mode 100644
index 0000000..8ddb163
--- /dev/null
+++ b/net/tipc/udp_media.c
@@ -0,0 +1,503 @@
+/* net/tipc/udp_media.c: IP bearer support for TIPC
+ *
+ * Copyright (c) 2013, Ericsson AB
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/socket.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <linux/inet.h>
+#include <linux/inetdevice.h>
+#include <linux/kernel.h>
+#include <linux/workqueue.h>
+#include <linux/list.h>
+#include <net/sock.h>
+#include "core.h"
+#include "bearer.h"
+
+/* IANA assigned UDP port */
+#define UDP_PORT_DEFAULT	6118
+
+/**
+ * struct udp_skb_parms - bearer representation of a packet
+ * @dst:	the remote bearer address
+ * @skb:	tipc packet payload
+ * @next:	list pointer
+ */
+struct udp_skb_parms {
+	struct sockaddr_in dst;
+	struct sk_buff *skb;
+	struct list_head next;
+};
+
+/**
+ * struct udp_bearer - ip/udp bearer data structure
+ * @txq_lock:	transmit queue lock
+ * @txq:	transmit queue
+ * @bearer:	associated generic tipc bearer
+ * @listen:	bearer listener socket
+ * @transmit:	transmit socket
+ * @next:	list pointer
+ * @work:	used to schedule deferred work on a bearer
+ * @tx_work:	used to defer packet transmission to socket
+ */
+struct udp_bearer {
+	spinlock_t txq_lock;
+	struct list_head txq;
+	struct tipc_bearer __rcu *bearer;
+	struct socket *listen;
+	struct socket *transmit;
+	struct work_struct work;
+	struct work_struct tx_work;
+};
+
+struct workqueue_struct *tipc_udp_send_wq = NULL;
+
+/* udp_media_addr_set - convert a ip/udp address to a TIPC media address */
+static void tipc_udp_media_addr_set(struct tipc_media_addr *addr,
+				    struct sockaddr_in *sin)
+{
+	memset(addr, 0, sizeof(struct tipc_media_addr));
+	addr->media_id = TIPC_MEDIA_TYPE_UDP;
+	memcpy(addr->value, sin, sizeof(struct sockaddr_in));
+	if (ipv4_is_multicast(sin->sin_addr.s_addr))
+		addr->broadcast = 1;
+}
+
+/* tipc_udp_addr2str - convert ip/udp address to string */
+static int tipc_udp_addr2str(struct tipc_media_addr *a, char *buf, int size)
+{
+	struct sockaddr_in *sin = (struct sockaddr_in *)&a->value;
+
+	snprintf(buf, size, "%pI4:%u", &sin->sin_addr,
+		 htons(sin->sin_port));
+	return 0;
+}
+
+/* tipc_udp_msg2addr - extract an ip/udp address from a TIPC ndisc message */
+static int tipc_udp_msg2addr(struct tipc_bearer *b, struct tipc_media_addr *a,
+			     char *msg)
+{
+	struct sockaddr_in *sin;
+
+	sin = (struct sockaddr_in *)(msg + TIPC_MEDIA_ADDR_OFFSET);
+	if (msg[TIPC_MEDIA_TYPE_OFFSET] != TIPC_MEDIA_TYPE_UDP)
+		return -EINVAL;
+	tipc_udp_media_addr_set(a, sin);
+	return 0;
+}
+
+/* tipc_udp_addr2msg - write an ip/udp address to a TIPC ndisc message */
+static int tipc_udp_addr2msg(char *msg, struct tipc_media_addr *a)
+{
+	memset(msg, 0, TIPC_MEDIA_INFO_SIZE);
+	msg[TIPC_MEDIA_TYPE_OFFSET] = TIPC_MEDIA_TYPE_UDP;
+	memcpy(msg + TIPC_MEDIA_ADDR_OFFSET, &a->value,
+	       sizeof(struct sockaddr_in));
+	return 0;
+}
+
+/* tipc_udp_send - send a TIPC buffer to the bearer socket */
+static void tipc_udp_send(struct work_struct *work)
+{
+	struct msghdr msg;
+	struct kvec iov;
+	struct sk_buff *skb;
+	struct udp_skb_parms *parms;
+	struct udp_bearer *ub;
+	int err;
+
+	ub = container_of(work, struct udp_bearer, tx_work);
+	memset(&msg, 0, sizeof(struct msghdr));
+	while (!list_empty(&ub->txq)) {
+		spin_lock_bh(&ub->txq_lock);
+		parms = list_first_entry(&ub->txq, struct udp_skb_parms, next);
+		list_del(&parms->next);
+		spin_unlock_bh(&ub->txq_lock);
+		skb = parms->skb;
+		iov.iov_base = skb->data;
+		iov.iov_len = skb->len;
+		msg.msg_iter.iov = (struct iovec *)&iov;
+		msg.msg_name = &parms->dst;
+		msg.msg_namelen = sizeof(struct sockaddr_in);
+		err = kernel_sendmsg(ub->transmit, &msg, &iov, 1, iov.iov_len);
+		/* no route to host, interface down or other underlying error */
+		if (unlikely(err < 0))
+			pr_warn_ratelimited("sendmsg failed with error: %d\n",
+					    err);
+		kfree(parms);
+		consume_skb(skb);
+		cond_resched();
+	}
+}
+
+/**
+ * tipc_send_msg - enqueue a send request
+ *
+ * The send request need to be deferred since we cannot call kernel
+ * socket API functions while holding the node spinlock.
+ */
+static int tipc_udp_send_msg(struct sk_buff *skb, struct tipc_bearer *b,
+			     struct tipc_media_addr *dest)
+{
+	struct udp_bearer *ub = b->media_ptr;
+	struct sk_buff *clone;
+	struct udp_skb_parms *parms;
+
+	clone = skb_clone(skb, GFP_ATOMIC);
+	if (!clone)
+		return -ENOMEM;
+	parms = kmalloc(sizeof(*parms), GFP_ATOMIC);
+	if (!parms) {
+		kfree_skb(clone);
+		return -ENOMEM;
+	}
+
+	/* Ndisc code uses temporary stack allocated media_addr
+	 * so we must copy it before deferring
+	 */
+	memcpy(&parms->dst, dest, sizeof(struct sockaddr_in));
+	parms->skb = clone;
+	spin_lock_bh(&ub->txq_lock);
+	list_add_tail(&parms->next, &ub->txq);
+	spin_unlock_bh(&ub->txq_lock);
+	queue_work(tipc_udp_send_wq, &ub->tx_work);
+	return 0;
+}
+
+/* tipc_udp_recv - read data from bearer socket */
+static void tipc_udp_recv(struct sock *sk)
+{
+	struct sk_buff *skb;
+	struct udp_bearer *ub;
+	struct tipc_bearer *b;
+	int err;
+
+	skb = skb_recv_datagram(sk, 0, MSG_DONTWAIT, &err);
+	if (err == -EAGAIN)
+		return;
+	read_lock(&sk->sk_callback_lock);
+	ub = sk->sk_user_data;
+	read_unlock(&sk->sk_callback_lock);
+	if (!ub) {
+		pr_err_ratelimited("failed to get UDP bearer reference");
+		kfree_skb(skb);
+		return;
+	}
+	skb_pull(skb, sizeof(struct udphdr));
+	skb->next = NULL;
+	skb_orphan(skb);
+	rcu_read_lock();
+	b = rcu_dereference_rtnl(ub->bearer);
+	if (b) {
+		tipc_rcv(skb, b);
+		rcu_read_unlock();
+		return;
+	}
+	rcu_read_unlock();
+	kfree_skb(skb);
+}
+
+/* init_mcast_bearer - set up igmp membership for a bearer */
+static int init_mcast_bearer(struct sockaddr_in *ifaddr,
+			     struct udp_bearer *ub)
+{
+	struct ip_mreq mreq;
+	struct tipc_bearer *b;
+	struct sockaddr_in *bcast;
+	const int off = 0;
+	int err;
+
+	b = ub->bearer;
+	bcast = (struct sockaddr_in *)&b->bcast_addr.value;
+	mreq.imr_multiaddr.s_addr = bcast->sin_addr.s_addr;
+	mreq.imr_interface.s_addr = ifaddr->sin_addr.s_addr;
+	/* Join the multicast group for our TIPC network. */
+	err = kernel_setsockopt(ub->transmit, IPPROTO_IP, IP_ADD_MEMBERSHIP,
+				(char *)&mreq, sizeof(mreq));
+	if (err) {
+		pr_err("Failed to join multicast group\n");
+		return err;
+	}
+	/* Turn off multicast loop to avoid getting our own ndisc messages */
+	err = kernel_setsockopt(ub->transmit, IPPROTO_IP, IP_MULTICAST_LOOP,
+				(char *)&off, sizeof(off));
+	if (err) {
+		pr_err("Failed to disable multicast loop\n");
+		return err;
+	}
+	/* Assign the egress interface for tipc discovery mcast messages */
+	err = kernel_setsockopt(ub->transmit, IPPROTO_IP, IP_MULTICAST_IF,
+				(char *)&mreq.imr_interface.s_addr,
+				sizeof(struct in_addr));
+	if (err) {
+		pr_err("Failed to set egress multicast interface\n");
+		return err;
+	}
+	err = kernel_bind(ub->transmit, (struct sockaddr *)bcast,
+			  sizeof(struct sockaddr_in));
+	if (err)
+		pr_err("Failed to bind multicast bearer socket\n");
+	return err;
+}
+
+/**
+ * setup_bearer - deferred udp bearer initialization
+ * @work:	work struct holding the udp bearer pointer
+ *
+ * create and initialize the listen and transmit udp sockets
+ */
+static void setup_bearer(struct work_struct *work)
+{
+	struct net_device *dev;
+	struct sockaddr_in *listen;
+	struct sockaddr_in *bcast;
+	struct udp_bearer *ub;
+	struct tipc_bearer *b;
+	int err;
+
+	ub = container_of(work, struct udp_bearer, work);
+	rcu_read_lock();
+	b = rcu_dereference_rtnl(ub->bearer);
+	listen = (struct sockaddr_in *)&b->addr;
+	dev = __ip_dev_find(&init_net, listen->sin_addr.s_addr, false);
+	if (!dev) {
+		pr_err("Device lookup for %pI4 failed\n",
+		       &listen->sin_addr.s_addr);
+		err = -ENODEV;
+		goto out;
+	}
+	b->mtu = dev->mtu - sizeof(struct iphdr) - sizeof(struct udphdr);
+	err = sock_create_kern(AF_INET, SOCK_DGRAM, 0, &ub->listen);
+	if (err) {
+		pr_err("Failed to create bearer listen socket");
+		goto out;
+	}
+	err = kernel_bind(ub->listen, (struct sockaddr *)listen,
+			  sizeof(struct sockaddr_in));
+	if (err) {
+		pr_err("Failed to bind bearer listen socket");
+		goto out;
+	}
+	if (sock_create_kern(AF_INET, SOCK_DGRAM, 0, &ub->transmit))
+		goto out;
+	bcast = (struct sockaddr_in *)&b->bcast_addr.value;
+	if (ipv4_is_multicast(bcast->sin_addr.s_addr)) {
+		err = init_mcast_bearer(listen, ub);
+		if (err)
+			goto out;
+	}
+	write_lock_bh(&ub->transmit->sk->sk_callback_lock);
+	ub->transmit->sk->sk_data_ready = tipc_udp_recv;
+	ub->transmit->sk->sk_user_data = ub;
+	write_unlock_bh(&ub->transmit->sk->sk_callback_lock);
+
+	write_lock_bh(&ub->listen->sk->sk_callback_lock);
+	ub->listen->sk->sk_data_ready = tipc_udp_recv;
+	ub->listen->sk->sk_user_data = ub;
+	write_unlock_bh(&ub->listen->sk->sk_callback_lock);
+
+	rcu_read_unlock();
+	return;
+out:
+	pr_err("UDP bearer setup failed (errno=%d)\n", err);
+	rtnl_lock();
+	tipc_disable_bearer(b->name);
+	rtnl_unlock();
+}
+
+/**
+ * parse_options - parse udp bearer configuration
+ * @arg:	bearer configuration string, including media name
+ * @local:	output struct holding local ip/port
+ * @remote:	output struct holding remote ip/port
+ */
+static int parse_options(char *arg, struct sockaddr_in *local,
+			 struct sockaddr_in *remote)
+{
+	char *opt = NULL;
+	char str[TIPC_MAX_BEARER_NAME];
+	char *tmp;
+	unsigned long port;
+
+	local->sin_family = AF_INET;
+	remote->sin_family = AF_INET;
+	strlcpy(str, arg, TIPC_MAX_BEARER_NAME);
+	tmp = str;
+	/* Skip media name */
+	opt = strsep(&tmp, ":");
+	if (!opt)
+		return -EINVAL;
+	/* Get the local address (mandatory) */
+	opt = strsep(&tmp, ":");
+	if (!opt ||
+	    !in4_pton(opt, -1, (u8 *)&local->sin_addr.s_addr, 0, NULL)) {
+		pr_err("Invalid local address %s\n", opt);
+		return -EINVAL;
+	}
+	/* Get the local port (optional) */
+	opt = strsep(&tmp, ":");
+	if (opt) {
+		if (0 == kstrtoul(opt, 10, &port) && port > 0 && port < 65535) {
+			local->sin_port = htons(port);
+		} else {
+			pr_err("Invalid local port %s\n", opt);
+			return -EINVAL;
+		}
+	} else {
+		local->sin_port = htons(UDP_PORT_DEFAULT);
+	}
+	/* Get the discovery/peer address (optional) */
+	opt = strsep(&tmp, ":");
+	if (opt) {
+		if (!in4_pton(opt, -1, (u8 *)&remote->sin_addr.s_addr,
+			      0, NULL)) {
+			pr_err("Invalid discover/peer address %s\n", opt);
+			return -EINVAL;
+		}
+	} else {
+		/* Generate discovery address based on network ID */
+		remote->sin_addr.s_addr = htonl((228 << 24) |
+						((tipc_net_id >> 8) << 8) |
+						(tipc_net_id & 0xFF));
+	}
+	/* Get the remote port (optional) */
+	opt = strsep(&tmp, ":");
+	if (opt) {
+		if (0 == kstrtoul(opt, 10, &port) && port > 0 && port < 65535) {
+			remote->sin_port = htons(port);
+		} else {
+			pr_err("Invalid remote port %s\n", opt);
+			return -EINVAL;
+		}
+	} else {
+		remote->sin_port = htons(UDP_PORT_DEFAULT);
+	}
+	return 0;
+}
+
+/**
+ * tipc_udp_enable - callback to create a new udp bearer instance
+ * @b:	pointer to generic tipc_bearer
+ *
+ * validate the bearer parameters and perform basic initialization of the
+ * udp_bearer, the kernel socket setup is deferred
+ */
+static int tipc_udp_enable(struct tipc_bearer *b)
+{
+	struct udp_bearer *ub;
+	struct sockaddr_in listen;
+	struct sockaddr_in *bcast;
+
+	ub = kzalloc(sizeof(*ub), GFP_ATOMIC);
+	if (!ub)
+		return -ENOMEM;
+
+	bcast = (struct sockaddr_in *)&b->bcast_addr.value;
+	memset(bcast, 0, sizeof(b->bcast_addr.value));
+	memset(&listen, 0, sizeof(listen));
+	if (parse_options(b->name, &listen, bcast) == -EINVAL) {
+		kfree(ub);
+		return -EINVAL;
+	}
+	b->bcast_addr.media_id = TIPC_MEDIA_TYPE_UDP;
+	b->bcast_addr.broadcast = 1;
+	INIT_LIST_HEAD(&ub->txq);
+	spin_lock_init(&ub->txq_lock);
+	INIT_WORK(&ub->tx_work, tipc_udp_send);
+	b->media_ptr = ub;
+	rcu_assign_pointer(ub->bearer, b);
+	tipc_udp_media_addr_set(&b->addr, &listen);
+
+	INIT_WORK(&ub->work, setup_bearer);
+	schedule_work(&ub->work);
+	return 0;
+}
+
+static void clean_bearer_txq(struct udp_bearer *ub)
+{
+	struct udp_skb_parms *parm, *safe;
+
+	spin_lock_bh(&ub->txq_lock);
+	list_for_each_entry_safe(parm, safe, &ub->txq, next) {
+		list_del(&parm->next);
+		kfree_skb(parm->skb);
+		kfree(parm);
+	}
+	spin_unlock_bh(&ub->txq_lock);
+}
+
+/* cleanup_bearer - break the socket/bearer association */
+static void cleanup_bearer(struct work_struct *work)
+{
+	struct udp_bearer *ub = container_of(work, struct udp_bearer, work);
+
+	if (ub->listen)
+		sock_release(ub->listen);
+	if (ub->transmit)
+		sock_release(ub->transmit);
+	clean_bearer_txq(ub);
+	flush_work(&ub->tx_work);
+	kfree(ub);
+}
+
+/* tipc_udp_disable - detach bearer from socket */
+static void tipc_udp_disable(struct tipc_bearer *b)
+{
+	struct udp_bearer *ub = b->media_ptr;
+
+	if (ub->listen)
+		sock_set_flag(ub->listen->sk, SOCK_DEAD);
+	if (ub->transmit)
+		sock_set_flag(ub->transmit->sk, SOCK_DEAD);
+	b->media_ptr = NULL;
+	rcu_assign_pointer(ub->bearer, NULL);
+	INIT_WORK(&ub->work, cleanup_bearer);
+	schedule_work(&ub->work);
+}
+
+struct tipc_media udp_media_info = {
+	.send_msg	= tipc_udp_send_msg,
+	.enable_media	= tipc_udp_enable,
+	.disable_media	= tipc_udp_disable,
+	.addr2str	= tipc_udp_addr2str,
+	.addr2msg	= tipc_udp_addr2msg,
+	.msg2addr	= tipc_udp_msg2addr,
+	.priority	= TIPC_DEF_LINK_PRI,
+	.tolerance	= TIPC_DEF_LINK_TOL,
+	.window		= TIPC_DEF_LINK_WIN,
+	.type_id	= TIPC_MEDIA_TYPE_UDP,
+	.hwaddr_len	= 20,
+	.name		= "udp"
+};
--
2.1.3
|
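The default discovery group that parse_options() derives from the network ID can be worked through in userspace. This is a sketch under stated assumptions: default_mcast_group() and format_group() are hypothetical helper names, the value is kept in host byte order (the patch applies htonl() before storing it), and an unsigned constant is used for the shift.

```c
#include <stdint.h>
#include <stdio.h>

/* Same expression as in parse_options(), in host byte order. For network
 * IDs below 2^24 this is simply 228.x.y.z with the ID in the low 24 bits.
 */
static uint32_t default_mcast_group(uint32_t net_id)
{
	return (228u << 24) | ((net_id >> 8) << 8) | (net_id & 0xFF);
}

/* Hypothetical helper: render a host-order IPv4 address as dotted quad. */
static void format_group(uint32_t group, char *buf, size_t size)
{
	snprintf(buf, size, "%u.%u.%u.%u",
		 (unsigned)(group >> 24) & 0xFFu,
		 (unsigned)(group >> 16) & 0xFFu,
		 (unsigned)(group >> 8) & 0xFFu,
		 (unsigned)group & 0xFFu);
}
```

For network ID 4711 (0x1267) this gives 228.0.18.103. Since every node in the same TIPC network computes the same group, enabling the bearer without a remote address is enough for discovery over multicast, and a different group can still be forced by passing a multicast address as the remote ip.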
From: Jon M. <jon...@er...> - 2015-01-05 17:11:55
|
> -----Original Message-----
> From: Erik Hugne
> Sent: December-18-14 11:45 AM
> To: tip...@li...; Jon Maloy;
> yin...@wi...; Richard Alpe
> Cc: Erik Hugne
> Subject: [PATCH 5/5] tipc: add ip/udp media type
>
> From: Erik Hugne <eri...@er...>
>
> The ip/udp bearer can be configured in a point-to-point mode by specifying a
> remote ip/hostname:
> tipc-config -be=udp:eth0[:<port>:<remip>:<remport>]
> If not specified, the UDP port used is 6118 (iana assigned).
> Or, it can be enabled in multicast mode:
> tipc-config -be=udp:eth0
> In this mode, links will be established to all tipc nodes that are member in the
> multicast group. The multicast group is generated based on the TIPC network
> ID but can be overridden by specifying a multicast address as remote ip.
>
> Signed-off-by: Erik Hugne <eri...@er...>
> ---
> net/tipc/Makefile | 2 +-
> net/tipc/bearer.c | 1 +
> net/tipc/bearer.h | 5 +-
> net/tipc/core.c | 5 +
> net/tipc/udp_media.c | 503
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 514 insertions(+), 2 deletions(-) create mode 100644
> net/tipc/udp_media.c
>
> diff --git a/net/tipc/Makefile b/net/tipc/Makefile index 333e459..ad50a84
> 100644
> --- a/net/tipc/Makefile
> +++ b/net/tipc/Makefile
> @@ -8,7 +8,7 @@ tipc-y += addr.o bcast.o bearer.o config.o \
> 	   core.o link.o discover.o msg.o \
> 	   name_distr.o subscr.o name_table.o net.o \
> 	   netlink.o node.o socket.o log.o eth_media.o \
> -	   server.o
> +	   server.o udp_media.o
>
> tipc-$(CONFIG_TIPC_MEDIA_IB) += ib_media.o
> tipc-$(CONFIG_SYSCTL) += sysctl.o
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 463db5b..20d7773
> 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -47,6 +47,7 @@ static struct tipc_media * const media_info_array[] = {
> #ifdef CONFIG_TIPC_MEDIA_IB
> 	&ib_media_info,
> #endif
> +	&udp_media_info,
> 	NULL
> };
>
> diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index 4f0907e..fdaba9b
> 100644
> --- a/net/tipc/bearer.h
> +++ b/net/tipc/bearer.h
> @@ -42,7 +42,7 @@
> #include <net/genetlink.h>
>
> #define MAX_BEARERS 2

This has already been extended to 8 upstream. You will have a conflict.

> -#define MAX_MEDIA 2
> +#define MAX_MEDIA 3
>
> /* Identifiers associated with TIPC message header media address info
> * - address info field is 32 bytes long @@ -59,6 +59,7 @@
> */
> #define TIPC_MEDIA_TYPE_ETH 1
> #define TIPC_MEDIA_TYPE_IB 2
> +#define TIPC_MEDIA_TYPE_UDP 3
>
> /**
> * struct tipc_media_addr - destination address used by TIPC bearers @@ -
> 179,6 +180,8 @@ extern struct tipc_media eth_media_info; #ifdef
> CONFIG_TIPC_MEDIA_IB extern struct tipc_media ib_media_info; #endif
> +extern struct tipc_media udp_media_info; extern struct workqueue_struct
> +*tipc_udp_send_wq;
>
> int tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info); int
> tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info); diff --git
> a/net/tipc/core.c b/net/tipc/core.c index a5737b8..e1ba0cd 100644
> --- a/net/tipc/core.c
> +++ b/net/tipc/core.c
> @@ -79,6 +79,7 @@ struct sk_buff *tipc_buf_acquire(u32 size)
> */
> static void tipc_core_stop(void)
> {
> +	destroy_workqueue(tipc_udp_send_wq);
> 	tipc_net_stop();
> 	tipc_bearer_cleanup();
> 	tipc_netlink_stop();
> @@ -98,6 +99,10 @@ static int tipc_core_start(void)
>
> 	get_random_bytes(&tipc_random, sizeof(tipc_random));
>
> +	tipc_udp_send_wq = alloc_workqueue("tipc_udp_send", 0, 0);
> +	if (IS_ERR(tipc_udp_send_wq))
> +		return -ENOMEM;
> +
> 	err = tipc_sk_ref_table_init(tipc_max_ports, tipc_random);
> 	if (err)
> 		goto out_reftbl;
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c new file mode
> 100644 index 0000000..8ddb163
> --- /dev/null
> +++ b/net/tipc/udp_media.c
> @@ -0,0 +1,503 @@
> +/* net/tipc/udp_media.c: IP bearer support for TIPC
> + *
> + * Copyright (c) 2013, Ericsson AB
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are
> met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. Neither the names of the copyright holders nor the names of its
> + * contributors may be used to endorse or promote products derived from
> + * this software without specific prior written permission.
> + *
> + * Alternatively, this software may be distributed under the terms of
> +the
> + * GNU General Public License ("GPL") version 2 as published by the
> +Free
> + * Software Foundation.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> LIMITED
> +TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> PARTICULAR
> +PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
> +CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
> OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
> PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> +BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
> WHETHER
> +IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
> +OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
> ADVISED
> +OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/socket.h>
> +#include <linux/ip.h>
> +#include <linux/udp.h>
> +#include <linux/inet.h>
> +#include <linux/inetdevice.h>
> +#include <linux/kernel.h>
> +#include <linux/workqueue.h>
> +#include <linux/list.h>
> +#include <net/sock.h>
> +#include "core.h"
> +#include "bearer.h"
> +
> +/* IANA assigned UDP port */
> +#define UDP_PORT_DEFAULT 6118
> +
> +/**
> + * struct udp_skb_parms - bearer representation of a packet
> + * @dst: the remote bearer address
> + * @skb: tipc packet payload
> + * @next: list pointer
> + */
> +struct udp_skb_parms {
> +	struct sockaddr_in dst;
> +	struct sk_buff *skb;
> +	struct list_head next;
> +};
> +
> +/**
> + * struct udp_bearer - ip/udp bearer data structure
> + * @txq_lock: transmit queue lock
> + * @txq: transmit queue
> + * @bearer: associated generic tipc bearer
> + * @listen: bearer listener socket
> + * @transmit: transmit socket
> + * @next: list pointer
> + * @work: used to schedule deferred work on a bearer
> + * @tx_work: used to defer packet transmission to socket
> + */
> +struct udp_bearer {
> +	spinlock_t txq_lock;
> +	struct list_head txq;
> +	struct tipc_bearer __rcu *bearer;
> +	struct socket *listen;
> +	struct socket *transmit;
> +	struct work_struct work;
> +	struct work_struct tx_work;
> +};
> +
> +struct workqueue_struct *tipc_udp_send_wq = NULL;
> +
> +/* udp_media_addr_set - convert a ip/udp address to a TIPC media
> +address */ static void tipc_udp_media_addr_set(struct tipc_media_addr
> *addr,
> +				    struct sockaddr_in *sin)
> +{
> +	memset(addr, 0, sizeof(struct tipc_media_addr));
> +	addr->media_id = TIPC_MEDIA_TYPE_UDP;
> +	memcpy(addr->value, sin, sizeof(struct sockaddr_in));
> +	if (ipv4_is_multicast(sin->sin_addr.s_addr))
> +		addr->broadcast = 1;
> +}
> +
> +/* tipc_udp_addr2str - convert ip/udp address to string */ static int
> +tipc_udp_addr2str(struct tipc_media_addr *a, char *buf, int size) {
> +	struct sockaddr_in *sin = (struct sockaddr_in *)&a->value;
> +
> +	snprintf(buf, size, "%pI4:%u", &sin->sin_addr,
> +		 htons(sin->sin_port));
> +	return 0;
> +}
> +
> +/* tipc_udp_msg2addr - extract an ip/udp address from a TIPC ndisc
> +message */ static int tipc_udp_msg2addr(struct tipc_bearer *b, struct
> tipc_media_addr *a,
> +			     char *msg)
> +{
> +	struct sockaddr_in *sin;
> +
> +	sin = (struct sockaddr_in *)(msg + TIPC_MEDIA_ADDR_OFFSET);
> +	if (msg[TIPC_MEDIA_TYPE_OFFSET] != TIPC_MEDIA_TYPE_UDP)
> +		return -EINVAL;
> +	tipc_udp_media_addr_set(a, sin);
> +	return 0;
> +}
> +
> +/* tipc_udp_addr2msg - write an ip/udp address to a TIPC ndisc message
> +*/ static int tipc_udp_addr2msg(char *msg, struct tipc_media_addr *a) {
> +	memset(msg, 0, TIPC_MEDIA_INFO_SIZE);
> +	msg[TIPC_MEDIA_TYPE_OFFSET] = TIPC_MEDIA_TYPE_UDP;
> +	memcpy(msg + TIPC_MEDIA_ADDR_OFFSET, &a->value,
> +	       sizeof(struct sockaddr_in));
> +	return 0;
> +}
> +
> +/* tipc_udp_send - send a TIPC buffer to the bearer socket */ static
> +void tipc_udp_send(struct work_struct *work) {
> +	struct msghdr msg;
> +	struct kvec iov;
> +	struct sk_buff *skb;
> +	struct udp_skb_parms *parms;
> +	struct udp_bearer *ub;
> +	int err;
> +
> +	ub = container_of(work, struct udp_bearer, tx_work);
> +	memset(&msg, 0, sizeof(struct msghdr));
> +	while (!list_empty(&ub->txq)) {
> +		spin_lock_bh(&ub->txq_lock);
> +		parms = list_first_entry(&ub->txq, struct udp_skb_parms,
> next);

Same objection as for the previous version: dynamic allocations on the critical data path should be avoided. Why not use TIPC_SKB_CB for this, as I suggested?
> + list_del(&parms->next); > + spin_unlock_bh(&ub->txq_lock); > + skb = parms->skb; > + iov.iov_base = skb->data; > + iov.iov_len = skb->len; > + msg.msg_iter.iov = (struct iovec *)&iov; > + msg.msg_name = &parms->dst; > + msg.msg_namelen = sizeof(struct sockaddr_in); > + err = kernel_sendmsg(ub->transmit, &msg, &iov, 1, > iov.iov_len); > + /* no route to host, interface down or other underlying error > */ > + if (unlikely(err < 0)) > + pr_warn_ratelimited("sendmsg failed with error: > %d\n", > + err); > + kfree(parms); > + consume_skb(skb); > + cond_resched(); > + } > +} > + > +/** > + * tipc_send_msg - enqueue a send request > + * > + * The send request need to be deferred since we cannot call kernel > + * socket API functions while holding the node spinlock. > + */ Yes. Unfortunately this is necessary for now, but we should aim at releasing the node spinlock before sending the buffers, on all bearers. This will probably have some positive impact on performance. ///jon > +static int tipc_udp_send_msg(struct sk_buff *skb, struct tipc_bearer *b, > + struct tipc_media_addr *dest) > +{ > + struct udp_bearer *ub = b->media_ptr; > + struct sk_buff *clone; > + struct udp_skb_parms *parms; > + > + clone = skb_clone(skb, GFP_ATOMIC); > + if (!clone) > + return -ENOMEM; > + parms = kmalloc(sizeof(*parms), GFP_ATOMIC); > + if (!parms) { > + kfree_skb(clone); > + return -ENOMEM; > + } > + > + /* Ndisc code uses temporary stack allocated media_addr > + * so we must copy it before deferring > + */ > + memcpy(&parms->dst, dest, sizeof(struct sockaddr_in)); > + parms->skb = clone; > + spin_lock_bh(&ub->txq_lock); > + list_add_tail(&parms->next, &ub->txq); > + spin_unlock_bh(&ub->txq_lock); > + queue_work(tipc_udp_send_wq, &ub->tx_work); > + return 0; > +} > + > +/* tipc_udp_recv - read data from bearer socket */ static void > +tipc_udp_recv(struct sock *sk) { > + struct sk_buff *skb; > + struct udp_bearer *ub; > + struct tipc_bearer *b; > + int err; > + > + skb = 
skb_recv_datagram(sk, 0, MSG_DONTWAIT, &err); > + if (err == -EAGAIN) > + return; > + read_lock(&sk->sk_callback_lock); > + ub = sk->sk_user_data; > + read_unlock(&sk->sk_callback_lock); > + if (!ub) { > + pr_err_ratelimited("failed to get UDP bearer reference"); > + kfree_skb(skb); > + return; > + } > + skb_pull(skb, sizeof(struct udphdr)); > + skb->next = NULL; > + skb_orphan(skb); > + rcu_read_lock(); > + b = rcu_dereference_rtnl(ub->bearer); > + if (b) { > + tipc_rcv(skb, b); > + rcu_read_unlock(); > + return; > + } > + rcu_read_unlock(); > + kfree_skb(skb); > +} > + > +/* init_mcast_bearer - set up igmp membership for a bearer */ static > +int init_mcast_bearer(struct sockaddr_in *ifaddr, > + struct udp_bearer *ub) > +{ > + struct ip_mreq mreq; > + struct tipc_bearer *b; > + struct sockaddr_in *bcast; > + const int off = 0; > + int err; > + > + b = ub->bearer; > + bcast = (struct sockaddr_in *)&b->bcast_addr.value; > + mreq.imr_multiaddr.s_addr = bcast->sin_addr.s_addr; > + mreq.imr_interface.s_addr = ifaddr->sin_addr.s_addr; > + /* Join the multicast group for our TIPC network. 
*/ > + err = kernel_setsockopt(ub->transmit, IPPROTO_IP, > IP_ADD_MEMBERSHIP, > + (char *)&mreq, sizeof(mreq)); > + if (err) { > + pr_err("Failed to join multicast group\n"); > + return err; > + } > + /* Turn off multicast loop to avoid getting our own ndisc messages */ > + err = kernel_setsockopt(ub->transmit, IPPROTO_IP, > IP_MULTICAST_LOOP, > + (char *)&off, sizeof(off)); > + if (err) { > + pr_err("Failed to disable multicast loop\n"); > + return err; > + } > + /* Assign the egress interface for tipc discovery mcast messages */ > + err = kernel_setsockopt(ub->transmit, IPPROTO_IP, > IP_MULTICAST_IF, > + (char *)&mreq.imr_interface.s_addr, > + sizeof(struct in_addr)); > + if (err) { > + pr_err("Failed to set egress multicast interface\n"); > + return err; > + } > + err = kernel_bind(ub->transmit, (struct sockaddr *)bcast, > + sizeof(struct sockaddr_in)); > + if (err) > + pr_err("Failed to bind multicast bearer socket\n"); > + return err; > +} > + > +/** > + * setup_bearer - deferred udp bearer initialization > + * @work: work struct holding the udp bearer pointer > + * > + * create and initialize the listen and transmit udp sockets */ static > +void setup_bearer(struct work_struct *work) { > + struct net_device *dev; > + struct sockaddr_in *listen; > + struct sockaddr_in *bcast; > + struct udp_bearer *ub; > + struct tipc_bearer *b; > + int err; > + > + ub = container_of(work, struct udp_bearer, work); > + rcu_read_lock(); > + b = rcu_dereference_rtnl(ub->bearer); > + listen = (struct sockaddr_in *)&b->addr; > + dev = __ip_dev_find(&init_net, listen->sin_addr.s_addr, false); > + if (!dev) { > + pr_err("Device lookup for %pI4 failed\n", > + &listen->sin_addr.s_addr); > + err = -ENODEV; > + goto out; > + } > + b->mtu = dev->mtu - sizeof(struct iphdr) - sizeof(struct udphdr); > + err = sock_create_kern(AF_INET, SOCK_DGRAM, 0, &ub->listen); > + if (err) { > + pr_err("Failed to create bearer listen socket"); > + goto out; > + } > + err = kernel_bind(ub->listen, (struct 
sockaddr *)listen, > + sizeof(struct sockaddr_in)); > + if (err) { > + pr_err("Failed to bind bearer listen socket"); > + goto out; > + } > + if (sock_create_kern(AF_INET, SOCK_DGRAM, 0, &ub->transmit)) > + goto out; > + bcast = (struct sockaddr_in *)&b->bcast_addr.value; > + if (ipv4_is_multicast(bcast->sin_addr.s_addr)) { > + err = init_mcast_bearer(listen, ub); > + if (err) > + goto out; > + } > + write_lock_bh(&ub->transmit->sk->sk_callback_lock); > + ub->transmit->sk->sk_data_ready = tipc_udp_recv; > + ub->transmit->sk->sk_user_data = ub; > + write_unlock_bh(&ub->transmit->sk->sk_callback_lock); > + > + write_lock_bh(&ub->listen->sk->sk_callback_lock); > + ub->listen->sk->sk_data_ready = tipc_udp_recv; > + ub->listen->sk->sk_user_data = ub; > + write_unlock_bh(&ub->listen->sk->sk_callback_lock); > + > + rcu_read_unlock(); > + return; > +out: > + pr_err("UDP bearer setup failed (errno=%d)\n", err); > + rtnl_lock(); > + tipc_disable_bearer(b->name); > + rtnl_unlock(); > +} > + > +/** > + * parse_options - parse udp bearer configuration > + * @arg: bearer configuration string, including media name > + * @local: output struct holding local ip/port > + * @remote: output struct holding remote ip/port > + */ > +static int parse_options(char *arg, struct sockaddr_in *local, > + struct sockaddr_in *remote) > +{ > + char *opt = NULL; > + char str[TIPC_MAX_BEARER_NAME]; > + char *tmp; > + unsigned long port; > + > + local->sin_family = AF_INET; > + remote->sin_family = AF_INET; > + strlcpy(str, arg, TIPC_MAX_BEARER_NAME); > + tmp = str; > + /* Skip media name */ > + opt = strsep(&tmp, ":"); > + if (!opt) > + return -EINVAL; > + /* Get the local address (mandatory) */ > + opt = strsep(&tmp, ":"); > + if (!opt || > + !in4_pton(opt, -1, (u8 *)&local->sin_addr.s_addr, 0, NULL)) { > + pr_err("Invalid local address %s\n", opt); > + return -EINVAL; > + } > + /* Get the local port (optional) */ > + opt = strsep(&tmp, ":"); > + if (opt) { > + if (0 == kstrtoul(opt, 10, &port) && 
port > 0 && port < 65535) { > + local->sin_port = htons(port); > + } else { > + pr_err("Invalid local port %s\n", opt); > + return -EINVAL; > + } > + } else { > + local->sin_port = htons(UDP_PORT_DEFAULT); > + } > + /* Get the discovery/peer address (optional) */ > + opt = strsep(&tmp, ":"); > + if (opt) { > + if (!in4_pton(opt, -1, (u8 *)&remote->sin_addr.s_addr, > + 0, NULL)) { > + pr_err("Invalid discover/peer address %s\n", opt); > + return -EINVAL; > + } > + } else { > + /* Generate discovery address based on network ID */ > + remote->sin_addr.s_addr = htonl((228 << 24) | > + ((tipc_net_id >> 8) << 8) | > + (tipc_net_id & 0xFF)); > + } > + /* Get the remote port (optional) */ > + opt = strsep(&tmp, ":"); > + if (opt) { > + if (0 == kstrtoul(opt, 10, &port) && port > 0 && port < 65535) { > + remote->sin_port = htons(port); > + } else { > + pr_err("Invalid remote port %s\n", opt); > + return -EINVAL; > + } > + } else { > + remote->sin_port = htons(UDP_PORT_DEFAULT); > + } > + return 0; > +} > + > +/** > + * tipc_udp_enable - callback to create a new udp bearer instance > + * @b: pointer to generic tipc_bearer > + * > + * validate the bearer parameters and perform basic initialization of > +the > + * udp_bearer, the kernel socket setup is deferred */ static int > +tipc_udp_enable(struct tipc_bearer *b) { > + struct udp_bearer *ub; > + struct sockaddr_in listen; > + struct sockaddr_in *bcast; > + > + ub = kzalloc(sizeof(*ub), GFP_ATOMIC); > + if (!ub) > + return -ENOMEM; > + > + bcast = (struct sockaddr_in *)&b->bcast_addr.value; > + memset(bcast, 0, sizeof(b->bcast_addr.value)); > + memset(&listen, 0, sizeof(listen)); > + if (parse_options(b->name, &listen, bcast) == -EINVAL) { > + kfree(ub); > + return -EINVAL; > + } > + b->bcast_addr.media_id = TIPC_MEDIA_TYPE_UDP; > + b->bcast_addr.broadcast = 1; > + INIT_LIST_HEAD(&ub->txq); > + spin_lock_init(&ub->txq_lock); > + INIT_WORK(&ub->tx_work, tipc_udp_send); > + b->media_ptr = ub; > + rcu_assign_pointer(ub->bearer, 
b); > + tipc_udp_media_addr_set(&b->addr, &listen); > + > + INIT_WORK(&ub->work, setup_bearer); > + schedule_work(&ub->work); > + return 0; > +} > + > +static void clean_bearer_txq(struct udp_bearer *ub) { > + struct udp_skb_parms *parm, *safe; > + > + spin_lock_bh(&ub->txq_lock); > + list_for_each_entry_safe(parm, safe, &ub->txq, next) { > + list_del(&parm->next); > + kfree_skb(parm->skb); > + kfree(parm); > + } > + spin_unlock_bh(&ub->txq_lock); > +} > + > +/* cleanup_bearer - break the socket/bearer association */ static void > +cleanup_bearer(struct work_struct *work) { > + struct udp_bearer *ub = container_of(work, struct udp_bearer, > work); > + > + if (ub->listen) > + sock_release(ub->listen); > + if (ub->transmit) > + sock_release(ub->transmit); > + clean_bearer_txq(ub); > + flush_work(&ub->tx_work); > + kfree(ub); > +} > + > +/* tipc_udp_disable - detach bearer from socket */ static void > +tipc_udp_disable(struct tipc_bearer *b) { > + struct udp_bearer *ub = b->media_ptr; > + > + if (ub->listen) > + sock_set_flag(ub->listen->sk, SOCK_DEAD); > + if (ub->transmit) > + sock_set_flag(ub->transmit->sk, SOCK_DEAD); > + b->media_ptr = NULL; > + rcu_assign_pointer(ub->bearer, NULL); > + INIT_WORK(&ub->work, cleanup_bearer); > + schedule_work(&ub->work); > +} > + > +struct tipc_media udp_media_info = { > + .send_msg = tipc_udp_send_msg, > + .enable_media = tipc_udp_enable, > + .disable_media = tipc_udp_disable, > + .addr2str = tipc_udp_addr2str, > + .addr2msg = tipc_udp_addr2msg, > + .msg2addr = tipc_udp_msg2addr, > + .priority = TIPC_DEF_LINK_PRI, > + .tolerance = TIPC_DEF_LINK_TOL, > + .window = TIPC_DEF_LINK_WIN, > + .type_id = TIPC_MEDIA_TYPE_UDP, > + .hwaddr_len = 20, > + .name = "udp" > +}; > -- > 2.1.3 |
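Editorial note on the fallback discovery address in parse_options() above: for non-negative ids the expression `(228 << 24) | ((tipc_net_id >> 8) << 8) | (tipc_net_id & 0xFF)` reduces to `(228 << 24) | tipc_net_id`, so the network id ends up in the low octets of a 228.x.y.z multicast address. The userspace sketch below only illustrates that arithmetic. The helper names and the sample id 4711 are assumptions made for the example and are not part of the patch. Unsigned arithmetic is used so that shifting 228 cannot overflow a signed int.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper mirroring the fallback in parse_options().
 * Returns the address in host byte order; the patch applies htonl()
 * before storing it in sin_addr.s_addr.
 */
static uint32_t tipc_udp_default_mcast(uint32_t net_id)
{
	return (UINT32_C(228) << 24) | ((net_id >> 8) << 8) | (net_id & 0xFF);
}

/* Hypothetical helper: print a host-order IPv4 address as dotted quad. */
static void format_ipv4(uint32_t addr, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u.%u",
		 (unsigned)(addr >> 24) & 0xFF, (unsigned)(addr >> 16) & 0xFF,
		 (unsigned)(addr >> 8) & 0xFF, (unsigned)addr & 0xFF);
}
```

One consequence visible in the sketch: an id larger than 0xFFFFFF would spill into the first octet and change the 228 prefix, so a range check on the id may be worth considering if such values are ever allowed.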
From: Ying X. <yin...@wi...> - 2015-01-06 08:24:41
|
>> +/**
>> + * tipc_send_msg - enqueue a send request
>> + *
>> + * The send request need to be deferred since we cannot call kernel
>> + * socket API functions while holding the node spinlock.
>> + */
>
> Yes. Unfortunately this is necessary for now, but we should aim at
> releasing the node spinlock before sending the buffers, on all
> bearers. This will probably have some positive impact on performance.
>

I have an idea for avoiding holding the node lock throughout the send
path. Currently it is hard for us to release the node lock before
sending SKBs to the bearer. If we instead model the node lock exactly on
the socket lock, we should be able to remove the restriction that the
bearer send function must be atomic.

More precisely, on the send path tipc_node_lock() would mimic the
implementation of lock_sock(). It first checks, under the protection of
the node spinlock, whether the owner flag is zero. If the flag is set,
the caller of tipc_node_lock() blocks until the owner releases the node;
otherwise the caller can go on to operate on any member of the node
structure, and carry out other operations, without holding the node
spinlock. Since the spinlock is not held at that point, it no longer
matters if the send function of the UDP bearer blocks. Of course,
tipc_node_unlock() also needs to wake up the processes that blocked
because they could not become the node's owner.

On the receive path, however, we have to introduce a backlog queue for
the link. If the node's owner flag is set when a packet arrives, the
received SKB should be queued on the backlog queue; otherwise it can be
put on the deferred_queue or delivered directly to the socket layer.

As a new input buffer queue is introduced in Jon's series ("tipc:
resolve message disordering problem"), maybe it would be easier to
implement the idea on top of that :)

Regards,
Ying
|
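Editorial note: the owner-flag scheme described in the message above can be sketched in userspace roughly as follows. This is a simplified model under stated assumptions, not TIPC or kernel code: a pthread mutex stands in for the node spinlock, a condition variable stands in for the wait queue, packets are plain ints, and all names (`node_model`, `node_lock`, `node_rcv`, and so on) are hypothetical. The real lock_sock()/release_sock() pair handles the backlog more carefully than this.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BACKLOG_MAX 16

struct node_model {
	pthread_mutex_t slock;     /* stands in for the node spinlock */
	pthread_cond_t wq;         /* senders blocked in node_lock() */
	bool owned;                /* owner flag */
	int backlog[BACKLOG_MAX];  /* packets received while owned */
	size_t backlog_len;
	int delivered;             /* packets handed to the upper layer */
};

static void node_init(struct node_model *n)
{
	pthread_mutex_init(&n->slock, NULL);
	pthread_cond_init(&n->wq, NULL);
	n->owned = false;
	n->backlog_len = 0;
	n->delivered = 0;
}

/* Send path: may sleep until the current owner releases the node. */
static void node_lock(struct node_model *n)
{
	pthread_mutex_lock(&n->slock);
	while (n->owned)
		pthread_cond_wait(&n->wq, &n->slock);
	n->owned = true;
	pthread_mutex_unlock(&n->slock);  /* short lock dropped, owner kept */
}

/* Release ownership, drain the backlog and wake one blocked sender. */
static void node_unlock(struct node_model *n)
{
	pthread_mutex_lock(&n->slock);
	n->delivered += (int)n->backlog_len;
	n->backlog_len = 0;
	n->owned = false;
	pthread_cond_signal(&n->wq);
	pthread_mutex_unlock(&n->slock);
}

/* Receive path: never sleeps. Returns true if the packet was delivered
 * directly, false if it was queued on the backlog (or dropped because
 * the backlog was full).
 */
static bool node_rcv(struct node_model *n, int pkt)
{
	bool direct = false;

	pthread_mutex_lock(&n->slock);
	if (!n->owned) {
		n->delivered++;
		direct = true;
	} else if (n->backlog_len < BACKLOG_MAX) {
		n->backlog[n->backlog_len++] = pkt;
	}
	pthread_mutex_unlock(&n->slock);
	return direct;
}
```

The point of the design is that the code between node_lock() and node_unlock() runs without the short lock held, so a bearer send function called there would be allowed to block.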
From: Erik H. <eri...@er...> - 2015-01-16 11:51:20
|
On Mon, Jan 05, 2015 at 06:11:45PM +0100, Jon Maloy wrote:
> > #define MAX_BEARERS 2
>
> This has already been extended to 8 upstream.
> You will have a conflict.

No, it's still 2 there.

> > +void tipc_udp_send(struct work_struct *work) {
> > +	struct msghdr msg;
> > +	struct kvec iov;
> > +	struct sk_buff *skb;
> > +	struct udp_skb_parms *parms;
> > +	struct udp_bearer *ub;
> > +	int err;
> > +
> > +	ub = container_of(work, struct udp_bearer, tx_work);
> > +	memset(&msg, 0, sizeof(struct msghdr));
> > +	while (!list_empty(&ub->txq)) {
> > +		spin_lock_bh(&ub->txq_lock);
> > +		parms = list_first_entry(&ub->txq, struct udp_skb_parms,
> > next);
>
> Same objection as for the previous version: dynamic allocations on the
> critical data path should be avoided. Why not use TIPC_SKB_CB for this,
> as I suggested?

Fixed, thanks.

> > +/**
> > + * tipc_send_msg - enqueue a send request
> > + *
> > + * The send request need to be deferred since we cannot call kernel
> > + * socket API functions while holding the node spinlock.
> > + */
>
> Yes. Unfortunately this is necessary for now, but we should aim at
> releasing the node spinlock before sending the buffers, on all
> bearers. This will probably have some positive impact on performance.
>

What's holding me back from sending another version now is the skb
reallocation in kernel_sendmsg().
On lowmem systems this causes the kworker thread to spin heavily on
memory allocation because it has run out of memory in the kmalloc-2048
pool.
It should be possible to reuse the TIPC skb and pass that to the IP
stack, but I don't know exactly how to do this.

//E
|
From: Ying X. <yin...@wi...> - 2015-01-19 02:44:11
|
On 01/16/2015 07:49 PM, Erik Hugne wrote:
>>> +/**
>>> + * tipc_send_msg - enqueue a send request
>>> + *
>>> + * The send request need to be deferred since we cannot call kernel
>>> + * socket API functions while holding the node spinlock.
>>> + */
>>
>> Yes. Unfortunately this is necessary for now, but we should aim at
>> releasing the node spinlock before sending the buffers, on all
>> bearers. This will probably have some positive impact on performance.
>>
> What's holding me from sending another version now is the skb reallocation
> from kernel_sendmsg().
> On lowmem systems this is causing the kworker thread to spin heavily on memory
> allocation because it has run out of memory in the kmalloc-2048 pool.
> It should be possible to reuse the TIPC skb and pass that to the IP stack, but
> i don't know exactly how to do this.

If this is another reason why we have to send TIPC buffers in a separate
thread, maybe we should seriously consider the idea I proposed before.
If we are really able to make the TIPC sending path non-atomic, this is
not only of benefit to the UDP bearer, but it also gives TIPC extreme
flexibility. For example, we could use all kinds of tunnels implemented
in the networking stack as TIPC bearers, such as ipip, vti, vxlan,
ipv6_tunnels etc.

Regards,
Ying
|
From: Erik H. <eri...@er...> - 2015-01-19 08:48:37
|
On Mon, Jan 19, 2015 at 10:43:56AM +0800, Ying Xue wrote:
> On 01/16/2015 07:49 PM, Erik Hugne wrote:
> >>> +/**
> >>> + * tipc_send_msg - enqueue a send request
> >>> + *
> >>> + * The send request need to be deferred since we cannot call kernel
> >>> + * socket API functions while holding the node spinlock.
> >>> + */
> >>
> >> Yes. Unfortunately this is necessary for now, but we should aim at
> >> releasing the node spinlock before sending the buffers, on all
> >> bearers. This will probably have some positive impact on performance.
> >>
> > What's holding me from sending another version now is the skb reallocation
> > from kernel_sendmsg().
> > On lowmem systems this is causing the kworker thread to spin heavily on memory
> > allocation because it has run out of memory in the kmalloc-2048 pool.
> > It should be possible to reuse the TIPC skb and pass that to the IP stack, but
> > i don't know exactly how to do this.
>
>
> If this is another reason why we have to send TIPC buffers in a separate
> thread, maybe we should seriously consider the idea I proposed before.
> If we are really able to make the TIPC sending path non-atomic, this is
> not only of benefit to UDP bearer, but also it gives TIPC an extreme
> flexibility, for example, we can use all kinds of tunnels implemented in
> networking stack as TIPC bearers, such as, ipip, vti, vxlan,
> ipv6_tunnels etc.
>

Making the send path non-atomic would help in the sense that we'd get
rid of the bearer-level send workqueue and related stuff, but there's
still the issue that each packet going through kernel_sendmsg() is
reallocated/copied to a new skb.

I'm wondering if this would be feasible:
1. Export the udp_send_skb() function.
2. skb_push(sizeof udphdr) on the TIPC SKB.
3. Do a routing lookup for the destination IP and pass that, together
   with the TIPC SKB, to udp_send_skb().

//E
|
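Editorial note on step 2 of the proposal above: pushing a UDP header in front of the payload works without a copy only if the buffer was allocated with enough headroom. The userspace sketch below models that idea with a plain byte array. It is an illustration under assumptions, not kernel code: `pkt_buf`, `buf_push` and `buf_add_udp` are hypothetical names, and the 28-byte headroom assumes a 20-byte IPv4 header without options plus the 8-byte UDP header.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HEADROOM 28   /* assumed: 20-byte IPv4 header + 8-byte UDP header */
#define UDP_HLEN 8
#define BUF_SIZE 256

struct pkt_buf {
	uint8_t storage[BUF_SIZE];
	uint8_t *data;   /* start of valid data, like skb->data */
	size_t len;      /* length of valid data, like skb->len */
};

/* Place the payload after the reserved headroom. Returns 0 or -1. */
static int buf_init(struct pkt_buf *b, const void *payload, size_t len)
{
	if (len > BUF_SIZE - HEADROOM)
		return -1;
	b->data = b->storage + HEADROOM;
	memcpy(b->data, payload, len);
	b->len = len;
	return 0;
}

/* Model of skb_push(): grow the data area towards the front. Returns
 * NULL instead of panicking when the headroom is exhausted.
 */
static uint8_t *buf_push(struct pkt_buf *b, size_t n)
{
	if ((size_t)(b->data - b->storage) < n)
		return NULL;
	b->data -= n;
	b->len += n;
	return b->data;
}

/* Prepend a UDP header in network byte order; the payload is not moved. */
static int buf_add_udp(struct pkt_buf *b, uint16_t sport, uint16_t dport)
{
	uint8_t *h = buf_push(b, UDP_HLEN);

	if (!h)
		return -1;
	h[0] = (uint8_t)(sport >> 8);  h[1] = (uint8_t)(sport & 0xFF);
	h[2] = (uint8_t)(dport >> 8);  h[3] = (uint8_t)(dport & 0xFF);
	h[4] = (uint8_t)(b->len >> 8); h[5] = (uint8_t)(b->len & 0xFF);
	h[6] = 0;                      h[7] = 0;  /* checksum left unset */
	return 0;
}
```

If the buffer had been allocated without the reserved headroom, buf_push() would fail and the only option left would be to allocate a larger buffer and copy, which is the cost the thread is trying to avoid.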
From: Ying X. <yin...@wi...> - 2015-01-19 10:39:16
|
On 01/19/2015 04:47 PM, Erik Hugne wrote:
> On Mon, Jan 19, 2015 at 10:43:56AM +0800, Ying Xue wrote:
>> On 01/16/2015 07:49 PM, Erik Hugne wrote:
>>>>>>> +/**
>>>>>>> + * tipc_send_msg - enqueue a send request
>>>>>>> + *
>>>>>>> + * The send request need to be deferred since we cannot call kernel
>>>>>>> + * socket API functions while holding the node spinlock.
>>>>>>> + */
>>>>>
>>>>> Yes. Unfortunately this is necessary for now, but we should aim at
>>>>> releasing the node spinlock before sending the buffers, on all
>>>>> bearers. This will probably have some positive impact on performance.
>>>>>
>>> What's holding me from sending another version now is the skb reallocation
>>> from kernel_sendmsg().
>>> On lowmem systems this is causing the kworker thread to spin heavily on memory
>>> allocation because it has run out of memory in the kmalloc-2048 pool.
>>> It should be possible to reuse the TIPC skb and pass that to the IP stack, but
>>> i don't know exactly how to do this.
>>
>>
>> If this is another reason why we have to send TIPC buffers in a separate
>> thread, maybe we should seriously consider the idea I proposed before.
>> If we are really able to make the TIPC sending path non-atomic, this is
>> not only of benefit to UDP bearer, but also it gives TIPC an extreme
>> flexibility, for example, we can use all kinds of tunnels implemented in
>> networking stack as TIPC bearers, such as, ipip, vti, vxlan,
>> ipv6_tunnels etc.
>>
>
> Making the send path non-atomic would help in the sense that we'd get rid of the
> bearer-level send workqueue and related stuff, but there's still the issue with
> each packet going through kernel_sendmsg is reallocated/copied to a new skb.
>

Now I finally understand the issue you ran into.

> I'm wondering if this would be feasible:
> 1. export the udp_send_skb() function
> 2. skb_push(sizeof udphdr) on the TIPC SKB.
> 3. Do a routing lookup for the destination ip and pass that, together with the
> TIPC SKB to udp_send_skb()
>

That sounds like a hack. Instead, we can learn how the existing tunnels
deal with this issue by looking through the tunnel interfaces in the
networking stack that are built on UDP sockets. For example, a closer
look at the VXLAN code shows a better approach to solving our issue.
VXLAN uses udp_tunnel to get around the difficulty:

1. Create UDP socket:

   vxlan_create_sock()
     udp_sock_create()

2. Deliver SKB:

   vxlan_xmit_one()
     vxlan_xmit_skb()
       udp_tunnel_xmit_skb()

In the entire sending path, no SKB copy happens. Moreover, when we use
the interfaces provided by udp_tunnel, both IPv4 and IPv6 are supported :)

It seems that this is a perfect solution for us :)

Maybe my analysis is not completely right, as I have only skimmed the
VXLAN code. If I am wrong, please correct me :)

Regards,
Ying

>
> //E
>
|
From: Erik H. <eri...@er...> - 2015-02-06 11:52:51
|
On Mon, Jan 19, 2015 at 06:38:56PM +0800, Ying Xue wrote:
> > I'm wondering if this would be feasible:
> > 1. export the udp_send_skb() function
> > 2. skb_push(sizeof udphdr) on the TIPC SKB.
> > 3. Do a routing lookup for the destination ip and pass that, together with the
> > TIPC SKB to udp_send_skb()
> >
>
> It sounds like this is a hack method. Instead, we can understand how
> these tunnels deal with our encountered issue as long as we look through
> the tunnel interfaces implemented in networking with UDP socket. For
> example, if we take a closer look at VXLAN code, we will find a better
> approach to solve our issue. Actually VXLAN uses udp_tunnel to overcome
> the difficulty:
>
> 1. Create UDP socket:
>
> vxlan_create_sock()
> udp_sock_create()
>
> 2. Deliver SKB:
>
> vxlan_xmit_one()
> vxlan_xmit_skb()
> udp_tunnel_xmit_skb()
>
> In the entire sending path, no SKB copy happens. Moreover, when we use
> the interfaces provided by udp_tunnel, both IPV4 and IPv6 are supported :)
>
> It seems that this is a perfect solution for us :)
>
> Maybe my analysis is not completely right as I just look through the
> code of VXLAN. If I am wrong, please correct me :)

Thanks for the tip!
I replaced all deferred sending with something like this:

	clone = skb_clone(skb, GFP_ATOMIC);
	/*Routing lookup*/
	memset(&fl, 0, sizeof(fl));
	fl.daddr = dst->sin_addr.s_addr;
	fl.saddr = src->sin_addr.s_addr;
	fl.flowi4_mark = clone->mark;
	fl.flowi4_proto = IPPROTO_UDP;

	rt = ip_route_output_key(net, &fl);
	if (IS_ERR(rt)) {
		err = PTR_ERR(rt);
		pr_err("routing lookup failed\n");
	}

	df = htons(IP_DF);
	clone->ignore_df = 0;
	skb_set_inner_protocol(clone, htons(ETH_P_TIPC));

	sent = udp_tunnel_xmit_skb(ub->transmit, rt, clone, src->sin_addr.s_addr,
				   dst->sin_addr.s_addr, 0, 255, df, src->sin_port,
				   dst->sin_port, false);

Works perfectly :)
(I had to expand BUF_HEADROOM for TIPC messages to have enough headroom
for an IP+UDP header.)

//E
|
From: Ying X. <yin...@wi...> - 2015-02-09 02:15:42
|
On 02/06/2015 07:50 PM, Erik Hugne wrote:
> On Mon, Jan 19, 2015 at 06:38:56PM +0800, Ying Xue wrote:
>>> I'm wondering if this would be feasible:
>>> 1. export the udp_send_skb() function
>>> 2. skb_push(sizeof udphdr) on the TIPC SKB.
>>> 3. Do a routing lookup for the destination ip and pass that, together with the
>>> TIPC SKB to udp_send_skb()
>>>
>>
>> It sounds like this is a hack method. Instead, we can understand how
>> these tunnels deal with our encountered issue as long as we look through
>> the tunnel interfaces implemented in networking with UDP socket. For
>> example, if we take a closer look at VXLAN code, we will find a better
>> approach to solve our issue. Actually VXLAN uses udp_tunnel to overcome
>> the difficulty:
>>
>> 1. Create UDP socket:
>>
>> vxlan_create_sock()
>> udp_sock_create()
>>
>> 2. Deliver SKB:
>>
>> vxlan_xmit_one()
>> vxlan_xmit_skb()
>> udp_tunnel_xmit_skb()
>>
>> In the entire sending path, no SKB copy happens. Moreover, when we use
>> the interfaces provided by udp_tunnel, both IPV4 and IPv6 are supported :)
>>
>> It seems that this is a perfect solution for us :)
>>
>> Maybe my analysis is not completely right as I just look through the
>> code of VXLAN. If I am wrong, please correct me :)
>
> Thanks for the tip!
> I replaced all deferred sending with something like this:
> 	clone = skb_clone(skb, GFP_ATOMIC);
> 	/*Routing lookup*/
> 	memset(&fl, 0, sizeof(fl));
> 	fl.daddr = dst->sin_addr.s_addr;
> 	fl.saddr = src->sin_addr.s_addr;
> 	fl.flowi4_mark = clone->mark;
> 	fl.flowi4_proto = IPPROTO_UDP;
>
> 	rt = ip_route_output_key(net, &fl);
> 	if (IS_ERR(rt)) {
> 		err = PTR_ERR(rt);
> 		pr_err("routing lookup failed\n");
> 	}
>
> 	df = htons(IP_DF);
> 	clone->ignore_df = 0;
> 	skb_set_inner_protocol(clone, htons(ETH_P_TIPC));
>
> 	sent = udp_tunnel_xmit_skb(ub->transmit, rt, clone, src->sin_addr.s_addr,
> 				   dst->sin_addr.s_addr, 0, 255, df, src->sin_port,
> 				   dst->sin_port, false);
>
> Works perfectly :)
> (i had to expand the BUF_HEADROM for tipc messages to have enough headroom for
> an IP+UDP header)
>

Good news! :)

Regards,
Ying

> //E
>
|