You can subscribe to this list here.
2003 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
(6) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2004 |
Jan
(9) |
Feb
(11) |
Mar
(22) |
Apr
(73) |
May
(78) |
Jun
(146) |
Jul
(80) |
Aug
(27) |
Sep
(5) |
Oct
(14) |
Nov
(18) |
Dec
(27) |
2005 |
Jan
(20) |
Feb
(30) |
Mar
(19) |
Apr
(28) |
May
(50) |
Jun
(31) |
Jul
(32) |
Aug
(14) |
Sep
(36) |
Oct
(43) |
Nov
(74) |
Dec
(63) |
2006 |
Jan
(34) |
Feb
(32) |
Mar
(21) |
Apr
(76) |
May
(106) |
Jun
(72) |
Jul
(70) |
Aug
(175) |
Sep
(130) |
Oct
(39) |
Nov
(81) |
Dec
(43) |
2007 |
Jan
(81) |
Feb
(36) |
Mar
(20) |
Apr
(43) |
May
(54) |
Jun
(34) |
Jul
(44) |
Aug
(55) |
Sep
(44) |
Oct
(54) |
Nov
(43) |
Dec
(41) |
2008 |
Jan
(42) |
Feb
(84) |
Mar
(73) |
Apr
(30) |
May
(119) |
Jun
(54) |
Jul
(54) |
Aug
(93) |
Sep
(173) |
Oct
(130) |
Nov
(145) |
Dec
(153) |
2009 |
Jan
(59) |
Feb
(12) |
Mar
(28) |
Apr
(18) |
May
(56) |
Jun
(9) |
Jul
(28) |
Aug
(62) |
Sep
(16) |
Oct
(19) |
Nov
(15) |
Dec
(17) |
2010 |
Jan
(14) |
Feb
(36) |
Mar
(37) |
Apr
(30) |
May
(33) |
Jun
(53) |
Jul
(42) |
Aug
(50) |
Sep
(67) |
Oct
(66) |
Nov
(69) |
Dec
(36) |
2011 |
Jan
(52) |
Feb
(45) |
Mar
(49) |
Apr
(21) |
May
(34) |
Jun
(13) |
Jul
(19) |
Aug
(37) |
Sep
(43) |
Oct
(10) |
Nov
(23) |
Dec
(30) |
2012 |
Jan
(42) |
Feb
(36) |
Mar
(46) |
Apr
(25) |
May
(96) |
Jun
(146) |
Jul
(40) |
Aug
(28) |
Sep
(61) |
Oct
(45) |
Nov
(100) |
Dec
(53) |
2013 |
Jan
(79) |
Feb
(24) |
Mar
(134) |
Apr
(156) |
May
(118) |
Jun
(75) |
Jul
(278) |
Aug
(145) |
Sep
(136) |
Oct
(168) |
Nov
(137) |
Dec
(439) |
2014 |
Jan
(284) |
Feb
(158) |
Mar
(231) |
Apr
(275) |
May
(259) |
Jun
(91) |
Jul
(222) |
Aug
(215) |
Sep
(165) |
Oct
(166) |
Nov
(211) |
Dec
(150) |
2015 |
Jan
(164) |
Feb
(324) |
Mar
(299) |
Apr
(214) |
May
(111) |
Jun
(109) |
Jul
(105) |
Aug
(36) |
Sep
(58) |
Oct
(131) |
Nov
(68) |
Dec
(30) |
2016 |
Jan
(46) |
Feb
(87) |
Mar
(135) |
Apr
(174) |
May
(132) |
Jun
(135) |
Jul
(149) |
Aug
(125) |
Sep
(79) |
Oct
(49) |
Nov
(95) |
Dec
(102) |
2017 |
Jan
(104) |
Feb
(75) |
Mar
(72) |
Apr
(53) |
May
(18) |
Jun
(5) |
Jul
(14) |
Aug
(19) |
Sep
(2) |
Oct
(13) |
Nov
(21) |
Dec
(67) |
2018 |
Jan
(56) |
Feb
(50) |
Mar
(148) |
Apr
(41) |
May
(37) |
Jun
(34) |
Jul
(34) |
Aug
(11) |
Sep
(52) |
Oct
(48) |
Nov
(28) |
Dec
(46) |
2019 |
Jan
(29) |
Feb
(63) |
Mar
(95) |
Apr
(54) |
May
(14) |
Jun
(71) |
Jul
(60) |
Aug
(49) |
Sep
(3) |
Oct
(64) |
Nov
(115) |
Dec
(57) |
2020 |
Jan
(15) |
Feb
(9) |
Mar
(38) |
Apr
(27) |
May
(60) |
Jun
(53) |
Jul
(35) |
Aug
(46) |
Sep
(37) |
Oct
(64) |
Nov
(20) |
Dec
(25) |
2021 |
Jan
(20) |
Feb
(31) |
Mar
(27) |
Apr
(23) |
May
(21) |
Jun
(30) |
Jul
(30) |
Aug
(7) |
Sep
(18) |
Oct
|
Nov
(15) |
Dec
(4) |
2022 |
Jan
(3) |
Feb
(1) |
Mar
(10) |
Apr
|
May
(2) |
Jun
(26) |
Jul
(5) |
Aug
|
Sep
(1) |
Oct
(2) |
Nov
(9) |
Dec
(2) |
2023 |
Jan
(4) |
Feb
(4) |
Mar
(5) |
Apr
(10) |
May
(29) |
Jun
(17) |
Jul
|
Aug
|
Sep
(1) |
Oct
(1) |
Nov
(2) |
Dec
|
2024 |
Jan
|
Feb
(6) |
Mar
|
Apr
(1) |
May
(6) |
Jun
|
Jul
(5) |
Aug
|
Sep
(3) |
Oct
|
Nov
|
Dec
|
2025 |
Jan
|
Feb
(3) |
Mar
|
Apr
|
May
|
Jun
|
Jul
(6) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: Jon M. <jon...@er...> - 2019-06-20 13:09:24
|
Acked-by: Jon Maloy <jon...@er...> > -----Original Message----- > From: net...@vg... <net...@vg...> On > Behalf Of Xin Long > Sent: 20-Jun-19 06:39 > To: network dev <ne...@vg...> > Cc: da...@da...; Jon Maloy <jon...@er...>; Ying Xue > <yin...@wi...>; tip...@li... > Subject: [PATCH net] tipc: change to use register_pernet_device > > This patch is to fix a dst defcnt leak, which can be reproduced by doing:.ericsson.com> > > # ip net a c; ip net a s; modprobe tipc > # ip net e s ip l a n eth1 type veth peer n eth1 netns c > # ip net e c ip l s lo up; ip net e c ip l s eth1 up > # ip net e s ip l s lo up; ip net e s ip l s eth1 up > # ip net e c ip a a 1.1.1.2/8 dev eth1 > # ip net e s ip a a 1.1.1.1/8 dev eth1 > # ip net e c tipc b e m udp n u1 localip 1.1.1.2 > # ip net e s tipc b e m udp n u1 localip 1.1.1.1 > # ip net d c; ip net d s; rmmod tipc > > and it will get stuck and keep logging the error: > > unregister_netdevice: waiting for lo to become free. Usage count = 1 > > The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx path > with udp_early_demux == 1, and this dst (eventually holding lo dev) can't be > released as bearer's removal in tipc pernet .exit happens after lo dev's > removal, default_device pernet .exit. > > "There are two distinct types of pernet_operations recognized: subsys and > device. At creation all subsys init functions are called before device > init functions, and at destruction all device exit functions are called > before subsys exit function." > > So by calling register_pernet_device instead to register tipc_net_ops, the > pernet .exit() will be invoked earlier than loopback dev's removal when a > netns is being destroyed, as fou/gue does. > > Note that vxlan and geneve udp tunnels don't have this issue, as the udp sock > is released in their device ndo_stop(). > > This fix is also necessary for tipc dst_cache, which will hold dsts on tx path and > I will introduce in my next patch. > > Reported-by: Li Shuang <sh...@re...> > Signed-off-by: Xin Long <luc...@gm...> > --- > net/tipc/core.c | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/net/tipc/core.c b/net/tipc/core.c index ed536c0..c837072 100644 > --- a/net/tipc/core.c > +++ b/net/tipc/core.c > @@ -134,7 +134,7 @@ static int __init tipc_init(void) > if (err) > goto out_sysctl; > > - err = register_pernet_subsys(&tipc_net_ops); > + err = register_pernet_device(&tipc_net_ops); > if (err) > goto out_pernet; > > @@ -142,7 +142,7 @@ static int __init tipc_init(void) > if (err) > goto out_socket; > > - err = register_pernet_subsys(&tipc_topsrv_net_ops); > + err = register_pernet_device(&tipc_topsrv_net_ops); > if (err) > goto out_pernet_topsrv; > > @@ -153,11 +153,11 @@ static int __init tipc_init(void) > pr_info("Started in single node mode\n"); > return 0; > out_bearer: > - unregister_pernet_subsys(&tipc_topsrv_net_ops); > + unregister_pernet_device(&tipc_topsrv_net_ops); > out_pernet_topsrv: > tipc_socket_stop(); > out_socket: > - unregister_pernet_subsys(&tipc_net_ops); > + unregister_pernet_device(&tipc_net_ops); > out_pernet: > tipc_unregister_sysctl(); > out_sysctl: > @@ -172,9 +172,9 @@ static int __init tipc_init(void) static void __exit > tipc_exit(void) { > tipc_bearer_cleanup(); > - unregister_pernet_subsys(&tipc_topsrv_net_ops); > + unregister_pernet_device(&tipc_topsrv_net_ops); > tipc_socket_stop(); > - unregister_pernet_subsys(&tipc_net_ops); > + unregister_pernet_device(&tipc_net_ops); > tipc_netlink_stop(); > tipc_netlink_compat_stop(); > tipc_unregister_sysctl(); > -- > 2.1.0 |
From: Jon M. <jon...@er...> - 2019-06-20 12:13:16
|
Acked-by: Jon Maloy jon...@er... > -----Original Message----- > From: John Rutherford <joh...@de...> > Sent: 18-Jun-19 23:44 > To: tip...@li... > Subject: [tipc-discussion] [net-next] tipc: fix missing indentation in source code > > Fix misalignment of policy statement in netlink.c due to automatic spatch code > transformation. > > Fixes: 3b0f31f2b8c9 ("genetlink: make policy common to family") > Signed-off-by: John Rutherford <joh...@de...> > --- > net/tipc/netlink.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c index 99bd166..d6165ad > 100644 > --- a/net/tipc/netlink.c > +++ b/net/tipc/netlink.c > @@ -261,7 +261,7 @@ struct genl_family tipc_genl_family __ro_after_init = > { > .version = TIPC_GENL_V2_VERSION, > .hdrsize = 0, > .maxattr = TIPC_NLA_MAX, > - .policy = tipc_nl_policy, > + .policy = tipc_nl_policy, > .netnsok = true, > .module = THIS_MODULE, > .ops = tipc_genl_v2_ops, > -- > 2.11.0 > > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Jon M. <jon...@er...> - 2019-06-20 12:10:48
|
Acked-by: Jon Maloy jon...@er... > -----Original Message----- > From: John Rutherford <joh...@de...> > Sent: 18-Jun-19 20:11 > To: tip...@li... > Subject: [tipc-discussion] [net-next v2] tipc: add loopback device tracking > > Since node internal messages are passed directly to socket it is not possible to > observe this message exchange via tcpdump or wireshark. > > We now remedy this by making it possible to clone such messages and send > the clones to the loopback interface. The clones are dropped at reception and > have no functional role except making the traffic visible. > > The feature is turned on/off by enabling/disabling the loopback "bearer" > "eth:lo". > > Signed-off-by: John Rutherford <joh...@de...> > --- > net/tipc/bcast.c | 4 +++- > net/tipc/bearer.c | 67 > +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > net/tipc/bearer.h | 3 +++ > net/tipc/core.c | 5 ++++- > net/tipc/core.h | 12 ++++++++++ > net/tipc/node.c | 1 + > net/tipc/topsrv.c | 2 ++ > 7 files changed, 92 insertions(+), 2 deletions(-) > > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 6c997d4..235331d > 100644 > --- a/net/tipc/bcast.c > +++ b/net/tipc/bcast.c > @@ -406,8 +406,10 @@ int tipc_mcast_xmit(struct net *net, struct > sk_buff_head *pkts, > rc = tipc_bcast_xmit(net, pkts, cong_link_cnt); > } > > - if (dests->local) > + if (dests->local) { > + tipc_loopback_trace(net, &localq); > tipc_sk_mcast_rcv(net, &localq, &inputq); > + } > exit: > /* This queue should normally be empty by now */ > __skb_queue_purge(pkts); > diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 2bed658..27b4fd7 > 100644 > --- a/net/tipc/bearer.c > +++ b/net/tipc/bearer.c > @@ -836,6 +836,12 @@ int __tipc_nl_bearer_disable(struct sk_buff *skb, > struct genl_info *info) > > name = nla_data(attrs[TIPC_NLA_BEARER_NAME]); > > + if (!strcmp(name, "eth:lo")) { > + tipc_net(net)->loopback_trace = false; > + pr_info("Disabled packet tracing on loopback interface\n"); > + return 0; > + } > + > bearer = tipc_bearer_find(net, name); > if (!bearer) > return -EINVAL; > @@ -881,6 +887,12 @@ int __tipc_nl_bearer_enable(struct sk_buff *skb, > struct genl_info *info) > > bearer = nla_data(attrs[TIPC_NLA_BEARER_NAME]); > > + if (!strcmp(bearer, "eth:lo")) { > + tipc_net(net)->loopback_trace = true; > + pr_info("Enabled packet tracing on loopback interface\n"); > + return 0; > + } > + > if (attrs[TIPC_NLA_BEARER_DOMAIN]) > domain = nla_get_u32(attrs[TIPC_NLA_BEARER_DOMAIN]); > > @@ -1021,6 +1033,61 @@ int tipc_nl_bearer_set(struct sk_buff *skb, struct > genl_info *info) > return err; > } > > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head > +*xmitq) { > + struct net_device *dev = net->loopback_dev; > + struct sk_buff *skb, *_skb; > + int exp; > + > + skb_queue_walk(xmitq, _skb) { > + skb = pskb_copy(_skb, GFP_ATOMIC); > + if (!skb) > + continue; > + exp = SKB_DATA_ALIGN(dev->hard_header_len - > skb_headroom(skb)); > + if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) { > + kfree_skb(skb); > + continue; > + } > + skb_reset_network_header(skb); > + skb->dev = dev; > + skb->protocol = htons(ETH_P_TIPC); > + dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr, > + dev->dev_addr, skb->len); > + dev_queue_xmit(skb); > + } > +} > + > +static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev, > + struct packet_type *pt, struct net_device *od) { > + consume_skb(skb); > + return NET_RX_SUCCESS; > +} > + > +int tipc_attach_loopback(struct net *net) { > + struct net_device *dev = net->loopback_dev; > + struct tipc_net *tn = tipc_net(net); > + > + if (!dev) > + return -ENODEV; > + dev_hold(dev); > + tn->loopback_pt.dev = dev; > + tn->loopback_pt.type = htons(ETH_P_TIPC); > + tn->loopback_pt.func = tipc_loopback_rcv_pkt; > + tn->loopback_trace = false; > + dev_add_pack(&tn->loopback_pt); > + return 0; > +} > + > +void tipc_detach_loopback(struct net *net) { > + struct tipc_net *tn = tipc_net(net); > + > + dev_remove_pack(&tn->loopback_pt); > + dev_put(net->loopback_dev); > +} > + > static int __tipc_nl_add_media(struct tipc_nl_msg *msg, > struct tipc_media *media, int nlflags) { diff --git > a/net/tipc/bearer.h b/net/tipc/bearer.h index 7f4c569..ef7fad9 100644 > --- a/net/tipc/bearer.h > +++ b/net/tipc/bearer.h > @@ -232,6 +232,9 @@ void tipc_bearer_xmit(struct net *net, u32 > bearer_id, > struct tipc_media_addr *dst); > void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id, > struct sk_buff_head *xmitq); > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head > +*xmitq); int tipc_attach_loopback(struct net *net); void > +tipc_detach_loopback(struct net *net); > > /* check if device MTU is too low for tipc headers */ static inline bool > tipc_mtu_bad(struct net_device *dev, unsigned int reserve) diff --git > a/net/tipc/core.c b/net/tipc/core.c index ed536c0..1867687 100644 > --- a/net/tipc/core.c > +++ b/net/tipc/core.c > @@ -81,7 +81,9 @@ static int __net_init tipc_init_net(struct net *net) > err = tipc_bcast_init(net); > if (err) > goto out_bclink; > - > + err = tipc_attach_loopback(net); > + if (err) > + goto out_bclink; > return 0; > > out_bclink: > @@ -94,6 +96,7 @@ static int __net_init tipc_init_net(struct net *net) > > static void __net_exit tipc_exit_net(struct net *net) { > + tipc_detach_loopback(net); > tipc_net_stop(net); > tipc_bcast_stop(net); > tipc_nametbl_stop(net); > diff --git a/net/tipc/core.h b/net/tipc/core.h index 7a68e1b..c1c2906 100644 > --- a/net/tipc/core.h > +++ b/net/tipc/core.h > @@ -67,6 +67,7 @@ struct tipc_link; > struct tipc_name_table; > struct tipc_topsrv; > struct tipc_monitor; > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head > +*pkts); > > #define TIPC_MOD_VER "2.0.0" > > @@ -125,6 +126,10 @@ struct tipc_net { > > /* Cluster capabilities */ > u16 capabilities; > + > + /* Tracing of node internal messages */ > + struct packet_type loopback_pt; > + bool loopback_trace; > }; > > static inline struct tipc_net *tipc_net(struct net *net) @@ -152,6 +157,13 > @@ static inline struct tipc_topsrv *tipc_topsrv(struct net *net) > return tipc_net(net)->topsrv; > } > > +static inline void tipc_loopback_trace(struct net *net, > + struct sk_buff_head *pkts) > +{ > + if (unlikely(tipc_net(net)->loopback_trace)) > + tipc_clone_to_loopback(net, pkts); > +} > + > static inline unsigned int tipc_hashfn(u32 addr) { > return addr & (NODE_HTABLE_SIZE - 1); > diff --git a/net/tipc/node.c b/net/tipc/node.c index 9e106d3..7e58831 > 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -1439,6 +1439,7 @@ int tipc_node_xmit(struct net *net, struct > sk_buff_head *list, > int rc; > > if (in_own_node(net, dnode)) { > + tipc_loopback_trace(net, list); > tipc_sk_rcv(net, list); > return 0; > } > diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index f345662..e3a6ba1 > 100644 > --- a/net/tipc/topsrv.c > +++ b/net/tipc/topsrv.c > @@ -40,6 +40,7 @@ > #include "socket.h" > #include "addr.h" > #include "msg.h" > +#include "bearer.h" > #include <net/sock.h> > #include <linux/module.h> > > @@ -608,6 +609,7 @@ static void tipc_topsrv_kern_evt(struct net *net, > struct tipc_event *evt) > memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt)); > skb_queue_head_init(&evtq); > __skb_queue_tail(&evtq, skb); > + tipc_loopback_trace(net, &evtq); > tipc_sk_rcv(net, &evtq); > } > > -- > 2.11.0 > > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Xin L. <luc...@gm...> - 2019-06-20 11:26:24
|
On Mon, Jun 17, 2019 at 10:28 PM Jon Maloy <jon...@er...> wrote: > > Hi Xin, > As I remember the discussion around introduction of UDP media a few years ago, the developer, Erik Huge, only chose to register TIPC as a udp tunnel user instead of regular udp user because it provides a more efficient way to receive packet in kernel space. > With UDP tunnel, we could receive packet directly in a callback, while TIPC had to run in a work queue thread in order to read packets from the socket. So, in reality we don't need any tunnel at all. Another upside is that it is possible to hook in a GSO callback function from the tunnel user, something I am uncertain if we can do as a regular UDP user. Right, udp tunnel was invented for this kind of encapsulation. To implement this gso callback, we need to require an ipproto number for TIPC, and register the callback into inet_offloads by inet_add_offload(). And on tx path set: skb->encapsulation = 1, skb_shinfo(skb)->gso_type|= SKB_GSO_UDP_TUNNEL, skb->inner_protocol_type = ENCAP_TYPE_IPPROTO. Then it will be called by: dev_queue_xmit() .. -> skb_mac_gso_segment() ... -> udp4_ufo_fragment() -> skb_udp_tunnel_segment() -> skb_udp_tunnel_segment() -> tipc_gso_fragment() btw, do we have an official ipproto number for TIPC already? > Do you have any comments on this? Could it possibly be done differently? > > ///jon > > > > -----Original Message----- > > From: net...@vg... <net...@vg...> On > > Behalf Of Xin Long > > Sent: 17-Jun-19 09:34 > > To: network dev <ne...@vg...> > > Cc: da...@da...; Jon Maloy <jon...@er...>; Ying Xue > > <yin...@wi...>; tip...@li...; Marcelo > > Ricardo Leitner <mar...@gm...>; Neil Horman > > <nh...@tu...>; Su Yanjun <suy...@cn...>; David > > Ahern <ds...@gm...>; syz...@go...; Dmitry > > Vyukov <dv...@go...>; Pravin B Shelar <ps...@ni...> > > Subject: [PATCH net 0/3] net: fix quite a few dst_cache crashes reported by > > syzbot > > > > There are two kinds of crashes reported many times by syzbot with no > > reproducer. Call Traces are like: > > > > BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 > > net/ipv4/route.c:1556 > > rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 > > __mkroute_output net/ipv4/route.c:2332 [inline] > > ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 > > ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 > > __ip_route_output_key include/net/route.h:125 [inline] > > ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 > > ip_route_output_key include/net/route.h:135 [inline] > > ... > > > > or: > > > > kasan: GPF could be caused by NULL-ptr deref or user memory access > > RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 > > <IRQ> > > rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] > > free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 > > __rcu_reclaim kernel/rcu/rcu.h:240 [inline] > > rcu_do_batch kernel/rcu/tree.c:2437 [inline] > > invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] > > rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 > > ... > > > > They were caused by the fib_nh_common percpu member > > 'nhc_pcpu_rth_output' > > overwritten by another percpu variable 'dev->tstats' access overflow in tipc > > udp media xmit path when counting packets on a non tunnel device. > > > > The fix is to make udp tunnel work with no tunnel device by allowing not to > > count packets on the tstats when the tunnel dev is NULL in Patches 1/3 and > > 2/3, then pass a NULL tunnel dev in tipc_udp_tunnel() in Patch 3/3. > > > > Xin Long (3): > > ip_tunnel: allow not to count pkts on tstats by setting skb's dev to > > NULL > > ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL > > tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb > > > > include/net/ip6_tunnel.h | 9 ++++++--- net/ipv4/ip_tunnel_core.c | 9 > > ++++++--- > > net/tipc/udp_media.c | 8 +++----- > > 3 files changed, 15 insertions(+), 11 deletions(-) > > > > -- > > 2.1.0 > |
From: Xin L. <luc...@gm...> - 2019-06-20 11:03:55
|
As other udp/ip tunnels do, tipc udp media should also have a lockless dst_cache supported on its tx path. Here we add dst_cache into udp_replicast to support dst cache for both rmcast and rcast, and rmcast uses ub->rcast and each rcast uses its own node in ub->rcast.list. Signed-off-by: Xin Long <luc...@gm...> --- net/tipc/udp_media.c | 72 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 47 insertions(+), 25 deletions(-) diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index 1405ccc..b8962df 100644 --- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -76,6 +76,7 @@ struct udp_media_addr { /* struct udp_replicast - container for UDP remote addresses */ struct udp_replicast { struct udp_media_addr addr; + struct dst_cache dst_cache; struct rcu_head rcu; struct list_head list; }; @@ -158,22 +159,27 @@ static int tipc_udp_addr2msg(char *msg, struct tipc_media_addr *a) /* tipc_send_msg - enqueue a send request */ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb, struct udp_bearer *ub, struct udp_media_addr *src, - struct udp_media_addr *dst) + struct udp_media_addr *dst, struct dst_cache *cache) { + struct dst_entry *ndst = dst_cache_get(cache); int ttl, err = 0; - struct rtable *rt; if (dst->proto == htons(ETH_P_IP)) { - struct flowi4 fl = { - .daddr = dst->ipv4.s_addr, - .saddr = src->ipv4.s_addr, - .flowi4_mark = skb->mark, - .flowi4_proto = IPPROTO_UDP - }; - rt = ip_route_output_key(net, &fl); - if (IS_ERR(rt)) { - err = PTR_ERR(rt); - goto tx_error; + struct rtable *rt = (struct rtable *)ndst; + + if (!rt) { + struct flowi4 fl = { + .daddr = dst->ipv4.s_addr, + .saddr = src->ipv4.s_addr, + .flowi4_mark = skb->mark, + .flowi4_proto = IPPROTO_UDP + }; + rt = ip_route_output_key(net, &fl); + if (IS_ERR(rt)) { + err = PTR_ERR(rt); + goto tx_error; + } + dst_cache_set_ip4(cache, &rt->dst, fl.saddr); } ttl = ip4_dst_hoplimit(&rt->dst); @@ -182,17 +188,19 @@ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb, dst->port, false, true); #if IS_ENABLED(CONFIG_IPV6) } else { - struct dst_entry *ndst; - struct flowi6 fl6 = { - .flowi6_oif = ub->ifindex, - .daddr = dst->ipv6, - .saddr = src->ipv6, - .flowi6_proto = IPPROTO_UDP - }; - err = ipv6_stub->ipv6_dst_lookup(net, ub->ubsock->sk, &ndst, - &fl6); - if (err) - goto tx_error; + if (!ndst) { + struct flowi6 fl6 = { + .flowi6_oif = ub->ifindex, + .daddr = dst->ipv6, + .saddr = src->ipv6, + .flowi6_proto = IPPROTO_UDP + }; + err = ipv6_stub->ipv6_dst_lookup(net, ub->ubsock->sk, + &ndst, &fl6); + if (err) + goto tx_error; + dst_cache_set_ip6(cache, ndst, &fl6.saddr); + } ttl = ip6_dst_hoplimit(ndst); err = udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb, NULL, &src->ipv6, &dst->ipv6, 0, ttl, 0, @@ -230,7 +238,8 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb, } if (addr->broadcast != TIPC_REPLICAST_SUPPORT) - return tipc_udp_xmit(net, skb, ub, src, dst); + return tipc_udp_xmit(net, skb, ub, src, dst, + &ub->rcast.dst_cache); /* Replicast, send an skb to each configured IP address */ list_for_each_entry_rcu(rcast, &ub->rcast.list, list) { @@ -242,7 +251,8 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb, goto out; } - err = tipc_udp_xmit(net, _skb, ub, src, &rcast->addr); + err = tipc_udp_xmit(net, _skb, ub, src, &rcast->addr, + &rcast->dst_cache); if (err) goto out; } @@ -286,6 +296,11 @@ static int tipc_udp_rcast_add(struct tipc_bearer *b, if (!rcast) return -ENOMEM; + if (dst_cache_init(&rcast->dst_cache, GFP_ATOMIC)) { + kfree(rcast); + return -ENOMEM; + } + memcpy(&rcast->addr, addr, sizeof(struct udp_media_addr)); if (ntohs(addr->proto) == ETH_P_IP) @@ -742,6 +757,10 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b, tuncfg.encap_destroy = NULL; setup_udp_tunnel_sock(net, ub->ubsock, &tuncfg); + err = dst_cache_init(&ub->rcast.dst_cache, GFP_ATOMIC); + if (err) + goto err; + /** * The bcast media address port is used for all peers and the ip * is used if it's a multicast address. @@ -756,6 +775,7 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b, return 0; err: + dst_cache_destroy(&ub->rcast.dst_cache); if (ub->ubsock) udp_tunnel_sock_release(ub->ubsock); kfree(ub); @@ -769,10 +789,12 @@ static void cleanup_bearer(struct work_struct *work) struct udp_replicast *rcast, *tmp; list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) { + dst_cache_destroy(&rcast->dst_cache); list_del_rcu(&rcast->list); kfree_rcu(rcast, rcu); } + dst_cache_destroy(&ub->rcast.dst_cache); if (ub->ubsock) udp_tunnel_sock_release(ub->ubsock); synchronize_net(); -- 2.1.0 |
From: Xin L. <luc...@gm...> - 2019-06-20 10:39:44
|
This patch is to fix a dst defcnt leak, which can be reproduced by doing: # ip net a c; ip net a s; modprobe tipc # ip net e s ip l a n eth1 type veth peer n eth1 netns c # ip net e c ip l s lo up; ip net e c ip l s eth1 up # ip net e s ip l s lo up; ip net e s ip l s eth1 up # ip net e c ip a a 1.1.1.2/8 dev eth1 # ip net e s ip a a 1.1.1.1/8 dev eth1 # ip net e c tipc b e m udp n u1 localip 1.1.1.2 # ip net e s tipc b e m udp n u1 localip 1.1.1.1 # ip net d c; ip net d s; rmmod tipc and it will get stuck and keep logging the error: unregister_netdevice: waiting for lo to become free. Usage count = 1 The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx path with udp_early_demux == 1, and this dst (eventually holding lo dev) can't be released as bearer's removal in tipc pernet .exit happens after lo dev's removal, default_device pernet .exit. "There are two distinct types of pernet_operations recognized: subsys and device. At creation all subsys init functions are called before device init functions, and at destruction all device exit functions are called before subsys exit function." So by calling register_pernet_device instead to register tipc_net_ops, the pernet .exit() will be invoked earlier than loopback dev's removal when a netns is being destroyed, as fou/gue does. Note that vxlan and geneve udp tunnels don't have this issue, as the udp sock is released in their device ndo_stop(). This fix is also necessary for tipc dst_cache, which will hold dsts on tx path and I will introduce in my next patch. Reported-by: Li Shuang <sh...@re...> Signed-off-by: Xin Long <luc...@gm...> --- net/tipc/core.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/tipc/core.c b/net/tipc/core.c index ed536c0..c837072 100644 --- a/net/tipc/core.c +++ b/net/tipc/core.c @@ -134,7 +134,7 @@ static int __init tipc_init(void) if (err) goto out_sysctl; - err = register_pernet_subsys(&tipc_net_ops); + err = register_pernet_device(&tipc_net_ops); if (err) goto out_pernet; @@ -142,7 +142,7 @@ static int __init tipc_init(void) if (err) goto out_socket; - err = register_pernet_subsys(&tipc_topsrv_net_ops); + err = register_pernet_device(&tipc_topsrv_net_ops); if (err) goto out_pernet_topsrv; @@ -153,11 +153,11 @@ static int __init tipc_init(void) pr_info("Started in single node mode\n"); return 0; out_bearer: - unregister_pernet_subsys(&tipc_topsrv_net_ops); + unregister_pernet_device(&tipc_topsrv_net_ops); out_pernet_topsrv: tipc_socket_stop(); out_socket: - unregister_pernet_subsys(&tipc_net_ops); + unregister_pernet_device(&tipc_net_ops); out_pernet: tipc_unregister_sysctl(); out_sysctl: @@ -172,9 +172,9 @@ static int __init tipc_init(void) static void __exit tipc_exit(void) { tipc_bearer_cleanup(); - unregister_pernet_subsys(&tipc_topsrv_net_ops); + unregister_pernet_device(&tipc_topsrv_net_ops); tipc_socket_stop(); - unregister_pernet_subsys(&tipc_net_ops); + unregister_pernet_device(&tipc_net_ops); tipc_netlink_stop(); tipc_netlink_compat_stop(); tipc_unregister_sysctl(); -- 2.1.0 |
From: John R. <joh...@de...> - 2019-06-19 03:44:41
|
Fix misalignment of policy statement in netlink.c due to automatic spatch code transformation. Fixes: 3b0f31f2b8c9 ("genetlink: make policy common to family") Signed-off-by: John Rutherford <joh...@de...> --- net/tipc/netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c index 99bd166..d6165ad 100644 --- a/net/tipc/netlink.c +++ b/net/tipc/netlink.c @@ -261,7 +261,7 @@ struct genl_family tipc_genl_family __ro_after_init = { .version = TIPC_GENL_V2_VERSION, .hdrsize = 0, .maxattr = TIPC_NLA_MAX, - .policy = tipc_nl_policy, + .policy = tipc_nl_policy, .netnsok = true, .module = THIS_MODULE, .ops = tipc_genl_v2_ops, -- 2.11.0 |
From: John R. <joh...@de...> - 2019-06-19 00:11:49
|
Since node internal messages are passed directly to socket it is not possible to observe this message exchange via tcpdump or wireshark. We now remedy this by making it possible to clone such messages and send the clones to the loopback interface. The clones are dropped at reception and have no functional role except making the traffic visible. The feature is turned on/off by enabling/disabling the loopback "bearer" "eth:lo". Signed-off-by: John Rutherford <joh...@de...> --- net/tipc/bcast.c | 4 +++- net/tipc/bearer.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ net/tipc/bearer.h | 3 +++ net/tipc/core.c | 5 ++++- net/tipc/core.h | 12 ++++++++++ net/tipc/node.c | 1 + net/tipc/topsrv.c | 2 ++ 7 files changed, 92 insertions(+), 2 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 6c997d4..235331d 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -406,8 +406,10 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts, rc = tipc_bcast_xmit(net, pkts, cong_link_cnt); } - if (dests->local) + if (dests->local) { + tipc_loopback_trace(net, &localq); tipc_sk_mcast_rcv(net, &localq, &inputq); + } exit: /* This queue should normally be empty by now */ __skb_queue_purge(pkts); diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 2bed658..27b4fd7 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -836,6 +836,12 @@ int __tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info) name = nla_data(attrs[TIPC_NLA_BEARER_NAME]); + if (!strcmp(name, "eth:lo")) { + tipc_net(net)->loopback_trace = false; + pr_info("Disabled packet tracing on loopback interface\n"); + return 0; + } + bearer = tipc_bearer_find(net, name); if (!bearer) return -EINVAL; @@ -881,6 +887,12 @@ int __tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info) bearer = nla_data(attrs[TIPC_NLA_BEARER_NAME]); + if (!strcmp(bearer, "eth:lo")) { + tipc_net(net)->loopback_trace = true; + pr_info("Enabled packet tracing on loopback interface\n"); + return 0; + } + if (attrs[TIPC_NLA_BEARER_DOMAIN]) domain = nla_get_u32(attrs[TIPC_NLA_BEARER_DOMAIN]); @@ -1021,6 +1033,61 @@ int tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info) return err; } +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq) +{ + struct net_device *dev = net->loopback_dev; + struct sk_buff *skb, *_skb; + int exp; + + skb_queue_walk(xmitq, _skb) { + skb = pskb_copy(_skb, GFP_ATOMIC); + if (!skb) + continue; + exp = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb)); + if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) { + kfree_skb(skb); + continue; + } + skb_reset_network_header(skb); + skb->dev = dev; + skb->protocol = htons(ETH_P_TIPC); + dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr, + dev->dev_addr, skb->len); + dev_queue_xmit(skb); + } +} + +static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, struct net_device *od) +{ + consume_skb(skb); + return NET_RX_SUCCESS; +} + +int tipc_attach_loopback(struct net *net) +{ + struct net_device *dev = net->loopback_dev; + struct tipc_net *tn = tipc_net(net); + + if (!dev) + return -ENODEV; + dev_hold(dev); + tn->loopback_pt.dev = dev; + tn->loopback_pt.type = htons(ETH_P_TIPC); + tn->loopback_pt.func = tipc_loopback_rcv_pkt; + tn->loopback_trace = false; + dev_add_pack(&tn->loopback_pt); + return 0; +} + +void tipc_detach_loopback(struct net *net) +{ + struct tipc_net *tn = tipc_net(net); + + dev_remove_pack(&tn->loopback_pt); + dev_put(net->loopback_dev); +} + static int __tipc_nl_add_media(struct tipc_nl_msg *msg, struct tipc_media *media, int nlflags) { diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index 7f4c569..ef7fad9 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -232,6 +232,9 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id, struct tipc_media_addr *dst); void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id, struct sk_buff_head *xmitq); +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq); +int tipc_attach_loopback(struct net *net); +void tipc_detach_loopback(struct net *net); /* check if device MTU is too low for tipc headers */ static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve) diff --git a/net/tipc/core.c b/net/tipc/core.c index ed536c0..1867687 100644 --- a/net/tipc/core.c +++ b/net/tipc/core.c @@ -81,7 +81,9 @@ static int __net_init tipc_init_net(struct net *net) err = tipc_bcast_init(net); if (err) goto out_bclink; - + err = tipc_attach_loopback(net); + if (err) + goto out_bclink; return 0; out_bclink: @@ -94,6 +96,7 @@ static int __net_init tipc_init_net(struct net *net) static void __net_exit tipc_exit_net(struct net *net) { + tipc_detach_loopback(net); tipc_net_stop(net); tipc_bcast_stop(net); tipc_nametbl_stop(net); diff --git a/net/tipc/core.h b/net/tipc/core.h index 7a68e1b..c1c2906 100644 --- a/net/tipc/core.h +++ b/net/tipc/core.h @@ -67,6 +67,7 @@ struct tipc_link; struct tipc_name_table; struct tipc_topsrv; struct tipc_monitor; +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts); #define TIPC_MOD_VER "2.0.0" @@ -125,6 +126,10 @@ struct tipc_net { /* Cluster capabilities */ u16 capabilities; + + /* Tracing of node internal messages */ + struct packet_type loopback_pt; + bool loopback_trace; }; static inline struct tipc_net *tipc_net(struct net *net) @@ -152,6 +157,13 @@ static inline struct tipc_topsrv *tipc_topsrv(struct net *net) return tipc_net(net)->topsrv; } +static inline void tipc_loopback_trace(struct net *net, + struct sk_buff_head *pkts) +{ + if (unlikely(tipc_net(net)->loopback_trace)) + tipc_clone_to_loopback(net, pkts); +} + static inline unsigned int tipc_hashfn(u32 addr) { return addr & (NODE_HTABLE_SIZE - 1); diff --git a/net/tipc/node.c b/net/tipc/node.c index 9e106d3..7e58831 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1439,6 +1439,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list, int rc; if (in_own_node(net, dnode)) { + tipc_loopback_trace(net, list); tipc_sk_rcv(net, list); return 0; } diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index f345662..e3a6ba1 100644 --- a/net/tipc/topsrv.c +++ b/net/tipc/topsrv.c @@ -40,6 +40,7 @@ #include "socket.h" #include "addr.h" #include "msg.h" +#include "bearer.h" #include <net/sock.h> #include <linux/module.h> @@ -608,6 +609,7 @@ static void tipc_topsrv_kern_evt(struct net *net, struct tipc_event *evt) memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt)); skb_queue_head_init(&evtq); __skb_queue_tail(&evtq, skb); + tipc_loopback_trace(net, &evtq); tipc_sk_rcv(net, &evtq); } -- 2.11.0 |
From: David M. <da...@da...> - 2019-06-18 17:04:11
|
From: Tuong Lien <tuo...@de...> Date: Mon, 17 Jun 2019 11:56:12 +0700 > It appears that a FAILOVER_MSG can come from peer even when the failure > link is resetting (i.e. just after the 'node_write_unlock()'...). This > means the failover procedure on the node has not been started yet. > The situation is as follows: ... > Once this happens, the link failover procedure will be triggered > wrongly on the receiving node since the node isn't in FAILINGOVER state > but then another link failover will be carried out. > The consequences are: > > 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' > was set, reset and set incorrectly, the criteria to end the failover > would not be met, it could keep waiting for a message that has already > received. > > 2) The early FAILOVER_MSG(s) could be queued in the link failover > deferdq but would be purged or not pulled out because the 'drop_point' > was not set correctly. > > 3) The early FAILOVER_MSG(s) could be dropped too. > > 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state > shortly, but later on it would be restarted. > > The same situation can also happen when the link is in PEER_RESET state > and a FAILOVER_MSG arrives. > > The commit resolves the issues by forcing the link down immediately, so > the failover procedure will be started normally (which is the same as > when receiving a FAILOVER_MSG and the link is in up state). > > Also, the function "tipc_node_link_failover()" is toughen to avoid such > a situation from happening. > > Acked-by: Jon Maloy <jon...@er...> > Signed-off-by: Tuong Lien <tuo...@de...> Applied, thank you. |
From: Xue, Y. <Yin...@wi...> - 2019-06-18 12:13:07
|
Hi Tuong. Thank you for your explanation . It makes sense to me. Please go ahead. Thanks, Ying -----Original Message----- From: Tuong Lien Tong [mailto:tuo...@de...] Sent: Monday, June 17, 2019 4:49 PM To: Xue, Ying; tip...@li...; jon...@er...; ma...@do... Subject: RE: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet Hi Ying, Thanks for your comments! Regarding your last statement, yes when making the patch, I noticed that the "tipc_msg_build()" and "tipc_msg_fragment()" do a similar task, also I tried to think a way to combine them but didn't because of the reasons: 1- The "core" functions to copy the data are different since the "tipc_msg_build()" plays with user data in the iov buffers, whereas, for the other, it's skb data. Also, the outputs are different, the first function will set the messages' type in header such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but not with the other because it will overwrite the tunnel messages' type... that I had to use the other field (fragm_no/nof_fragms) to determine this at the receiving side... 2- I don't want to touch the old code that can be risky :( BR/Tuong -----Original Message----- From: Ying Xue <yin...@wi...> Sent: Sunday, June 16, 2019 1:42 PM To: Tuong Lien <tuo...@de...>; tip...@li...; jon...@er...; ma...@do... Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet > 2) The same scenario above can happen more easily in case the MTU of > the links is set differently or when changing. In that case, as long as > a large message in the failure link's transmq queue was built and > fragmented with its link's MTU > the other link's one, the issue will > happen (there is no need of a link synching in advance). > > 3) The link synching procedure also faces with the same issue but since > the link synching is only started upon receipt of a SYNCH_MSG, dropping > the message will not result in a state deadlock, but it is not expected > as design. > > The 1) & 3) issues are resolved by the previous commit 81e4dd94b214 This is the same as previous commit. The commit ID might be invalid after it's merged into upstream. > ("tipc: optimize link synching mechanism") by generating only a dummy > SYNCH_MSG (i.e. without data) at the link synching, so the size of a > FAILOVER_MSG if any then will never exceed the link's MTU. > /** > + * tipc_msg_fragment - build a fragment skb list for TIPC message > + * > + * @skb: TIPC message skb > + * @hdr: internal msg header to be put on the top of the fragments > + * @pktmax: max size of a fragment incl. the header > + * @frags: returned fragment skb list > + * > + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL > + * or -ENOMEM > + */ > +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr, > + int pktmax, struct sk_buff_head *frags) > +{ > + int pktno, nof_fragms, dsz, dmax, eat; > + struct tipc_msg *_hdr; > + struct sk_buff *_skb; > + u8 *data; > + > + /* Non-linear buffer? */ > + if (skb_linearize(skb)) > + return -ENOMEM; > + > + data = (u8 *)skb->data; > + dsz = msg_size(buf_msg(skb)); > + dmax = pktmax - INT_H_SIZE; > + > + if (dsz <= dmax || !dmax) > + return -EINVAL; > + > + nof_fragms = dsz / dmax + 1; > + > + for (pktno = 1; pktno <= nof_fragms; pktno++) { > + if (pktno < nof_fragms) > + eat = dmax; > + else > + eat = dsz % dmax; > + > + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC); > + if (!_skb) > + goto error; > + > + skb_orphan(_skb); > + __skb_queue_tail(frags, _skb); > + > + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE); > + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat); > + data += eat; > + > + _hdr = buf_msg(_skb); > + msg_set_fragm_no(_hdr, pktno); > + msg_set_nof_fragms(_hdr, nof_fragms); > + msg_set_size(_hdr, INT_H_SIZE + eat); > + } > + return 0; > + In fact we have similar code in tipc_msg_build() where we also fragment packet if necessary. In order to eliminate redundant code, I suggest we should extract the common code into a separate function and then tipc_msg_build() and tipc_msg_fragment() call it. |
From: Erik H. <eri...@gm...> - 2019-06-18 05:14:15
|
Change in netlink.c unrelated to the commit //E On Tue, 18 Jun 2019, 02:23 John Rutherford, <joh...@de...> wrote: > Since node internal messages are passed directly to socket it is not > possible to observe this message exchange via tcpdump or wireshark. > > We now remedy this by making it possible to clone such messages and send > the clones to the loopback interface. The clones are dropped at reception > and have no functional role except making the traffic visible. > > The feature is turned on/off by enabling/disabling the loopback "bearer" > "eth:lo". > > Signed-off-by: John Rutherford <joh...@de...> > --- > net/tipc/bcast.c | 4 +++- > net/tipc/bearer.c | 67 > ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > net/tipc/bearer.h | 3 +++ > net/tipc/core.c | 5 +++- > net/tipc/core.h | 12 ++++++++++ > net/tipc/netlink.c | 2 +- > net/tipc/node.c | 1 + > net/tipc/topsrv.c | 2 ++ > 8 files changed, 93 insertions(+), 3 deletions(-) > > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c > index 6c997d4..235331d 100644 > --- a/net/tipc/bcast.c > +++ b/net/tipc/bcast.c > @@ -406,8 +406,10 @@ int tipc_mcast_xmit(struct net *net, struct > sk_buff_head *pkts, > rc = tipc_bcast_xmit(net, pkts, cong_link_cnt); > } > > - if (dests->local) > + if (dests->local) { > + tipc_loopback_trace(net, &localq); > tipc_sk_mcast_rcv(net, &localq, &inputq); > + } > exit: > /* This queue should normally be empty by now */ > __skb_queue_purge(pkts); > diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c > index 2bed658..27b4fd7 100644 > --- a/net/tipc/bearer.c > +++ b/net/tipc/bearer.c > @@ -836,6 +836,12 @@ int __tipc_nl_bearer_disable(struct sk_buff *skb, > struct genl_info *info) > > name = nla_data(attrs[TIPC_NLA_BEARER_NAME]); > > + if (!strcmp(name, "eth:lo")) { > + tipc_net(net)->loopback_trace = false; > + pr_info("Disabled packet tracing on loopback interface\n"); > + return 0; > + } > + > bearer = tipc_bearer_find(net, name); > if (!bearer) > return -EINVAL; > @@ -881,6 +887,12 @@ int __tipc_nl_bearer_enable(struct sk_buff *skb, > struct genl_info *info) > > bearer = nla_data(attrs[TIPC_NLA_BEARER_NAME]); > > + if (!strcmp(bearer, "eth:lo")) { > + tipc_net(net)->loopback_trace = true; > + pr_info("Enabled packet tracing on loopback interface\n"); > + return 0; > + } > + > if (attrs[TIPC_NLA_BEARER_DOMAIN]) > domain = nla_get_u32(attrs[TIPC_NLA_BEARER_DOMAIN]); > > @@ -1021,6 +1033,61 @@ int tipc_nl_bearer_set(struct sk_buff *skb, struct > genl_info *info) > return err; > } > > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq) > +{ > + struct net_device *dev = net->loopback_dev; > + struct sk_buff *skb, *_skb; > + int exp; > + > + skb_queue_walk(xmitq, _skb) { > + skb = pskb_copy(_skb, GFP_ATOMIC); > + if (!skb) > + continue; > + exp = SKB_DATA_ALIGN(dev->hard_header_len - > skb_headroom(skb)); > + if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) { > + kfree_skb(skb); > + continue; > + } > + skb_reset_network_header(skb); > + skb->dev = dev; > + skb->protocol = htons(ETH_P_TIPC); > + dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr, > + dev->dev_addr, skb->len); > + dev_queue_xmit(skb); > + } > +} > + > +static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device > *dev, > + struct packet_type *pt, struct net_device > *od) > +{ > + consume_skb(skb); > + return NET_RX_SUCCESS; > +} > + > +int tipc_attach_loopback(struct net *net) > +{ > + struct net_device *dev = net->loopback_dev; > + struct tipc_net *tn = tipc_net(net); > + > + if (!dev) > + return -ENODEV; > + dev_hold(dev); > + tn->loopback_pt.dev = dev; > + tn->loopback_pt.type = htons(ETH_P_TIPC); > + tn->loopback_pt.func = tipc_loopback_rcv_pkt; > + tn->loopback_trace = false; > + dev_add_pack(&tn->loopback_pt); > + return 0; > +} > + > +void tipc_detach_loopback(struct net *net) > +{ > + struct tipc_net *tn = tipc_net(net); > + > + dev_remove_pack(&tn->loopback_pt); > + dev_put(net->loopback_dev); > +} > + > static int __tipc_nl_add_media(struct tipc_nl_msg *msg, > struct tipc_media *media, int nlflags) > { > diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h > index 7f4c569..ef7fad9 100644 > --- a/net/tipc/bearer.h > +++ b/net/tipc/bearer.h > @@ -232,6 +232,9 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id, > struct tipc_media_addr *dst); > void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id, > struct sk_buff_head *xmitq); > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq); > +int tipc_attach_loopback(struct net *net); > +void tipc_detach_loopback(struct net *net); > > /* check if device MTU is too low for tipc headers */ > static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int > reserve) > diff --git a/net/tipc/core.c b/net/tipc/core.c > index ed536c0..1867687 100644 > --- a/net/tipc/core.c > +++ b/net/tipc/core.c > @@ -81,7 +81,9 @@ static int __net_init tipc_init_net(struct net *net) > err = tipc_bcast_init(net); > if (err) > goto out_bclink; > - > + err = tipc_attach_loopback(net); > + if (err) > + goto out_bclink; > return 0; > > out_bclink: > @@ -94,6 +96,7 @@ static int __net_init tipc_init_net(struct net *net) > > static void __net_exit tipc_exit_net(struct net *net) > { > + tipc_detach_loopback(net); > tipc_net_stop(net); > tipc_bcast_stop(net); > tipc_nametbl_stop(net); > diff --git a/net/tipc/core.h b/net/tipc/core.h > index 7a68e1b..c1c2906 100644 > --- a/net/tipc/core.h > +++ b/net/tipc/core.h > @@ -67,6 +67,7 @@ struct tipc_link; > struct tipc_name_table; > struct tipc_topsrv; > struct tipc_monitor; > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts); > > #define TIPC_MOD_VER "2.0.0" > > @@ -125,6 +126,10 @@ struct tipc_net { > > /* Cluster capabilities */ > u16 capabilities; > + > + /* Tracing of node internal messages */ > + struct packet_type loopback_pt; > + bool loopback_trace; > }; > > static inline struct tipc_net *tipc_net(struct net *net) > @@ -152,6 +157,13 @@ static inline struct tipc_topsrv *tipc_topsrv(struct > net *net) > return tipc_net(net)->topsrv; > } > > +static inline void tipc_loopback_trace(struct net *net, > + struct sk_buff_head *pkts) > +{ > + if (unlikely(tipc_net(net)->loopback_trace)) > + tipc_clone_to_loopback(net, pkts); > +} > + > static inline unsigned int tipc_hashfn(u32 addr) > { > return addr & (NODE_HTABLE_SIZE - 1); > diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c > index 99bd166..d6165ad 100644 > --- a/net/tipc/netlink.c > +++ b/net/tipc/netlink.c > @@ -261,7 +261,7 @@ struct genl_family tipc_genl_family __ro_after_init = { > .version = TIPC_GENL_V2_VERSION, > .hdrsize = 0, > .maxattr = TIPC_NLA_MAX, > - .policy = tipc_nl_policy, > + .policy = tipc_nl_policy, > .netnsok = true, > .module = THIS_MODULE, > .ops = tipc_genl_v2_ops, > diff --git a/net/tipc/node.c b/net/tipc/node.c > index 9e106d3..7e58831 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -1439,6 +1439,7 @@ int tipc_node_xmit(struct net *net, struct > sk_buff_head *list, > int rc; > > if (in_own_node(net, dnode)) { > + tipc_loopback_trace(net, list); > tipc_sk_rcv(net, list); > return 0; > } > diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c > index f345662..e3a6ba1 100644 > --- a/net/tipc/topsrv.c > +++ b/net/tipc/topsrv.c > @@ -40,6 +40,7 @@ > #include "socket.h" > #include "addr.h" > #include "msg.h" > +#include "bearer.h" > #include <net/sock.h> > #include <linux/module.h> > > @@ -608,6 +609,7 @@ static void tipc_topsrv_kern_evt(struct net *net, > struct tipc_event *evt) > memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt)); > skb_queue_head_init(&evtq); > __skb_queue_tail(&evtq, skb); > + tipc_loopback_trace(net, &evtq); > tipc_sk_rcv(net, &evtq); > } > > -- > 2.11.0 > > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > |
From: John R. <joh...@de...> - 2019-06-18 00:22:35
|
Since node internal messages are passed directly to socket it is not possible to observe this message exchange via tcpdump or wireshark. We now remedy this by making it possible to clone such messages and send the clones to the loopback interface. The clones are dropped at reception and have no functional role except making the traffic visible. The feature is turned on/off by enabling/disabling the loopback "bearer" "eth:lo". Signed-off-by: John Rutherford <joh...@de...> --- net/tipc/bcast.c | 4 +++- net/tipc/bearer.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ net/tipc/bearer.h | 3 +++ net/tipc/core.c | 5 +++- net/tipc/core.h | 12 ++++++++++ net/tipc/netlink.c | 2 +- net/tipc/node.c | 1 + net/tipc/topsrv.c | 2 ++ 8 files changed, 93 insertions(+), 3 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 6c997d4..235331d 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -406,8 +406,10 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts, rc = tipc_bcast_xmit(net, pkts, cong_link_cnt); } - if (dests->local) + if (dests->local) { + tipc_loopback_trace(net, &localq); tipc_sk_mcast_rcv(net, &localq, &inputq); + } exit: /* This queue should normally be empty by now */ __skb_queue_purge(pkts); diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 2bed658..27b4fd7 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -836,6 +836,12 @@ int __tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info) name = nla_data(attrs[TIPC_NLA_BEARER_NAME]); + if (!strcmp(name, "eth:lo")) { + tipc_net(net)->loopback_trace = false; + pr_info("Disabled packet tracing on loopback interface\n"); + return 0; + } + bearer = tipc_bearer_find(net, name); if (!bearer) return -EINVAL; @@ -881,6 +887,12 @@ int __tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info) bearer = nla_data(attrs[TIPC_NLA_BEARER_NAME]); + if (!strcmp(bearer, "eth:lo")) { + tipc_net(net)->loopback_trace = true; + pr_info("Enabled packet tracing on loopback interface\n"); + return 0; + } + if (attrs[TIPC_NLA_BEARER_DOMAIN]) domain = nla_get_u32(attrs[TIPC_NLA_BEARER_DOMAIN]); @@ -1021,6 +1033,61 @@ int tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info) return err; } +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq) +{ + struct net_device *dev = net->loopback_dev; + struct sk_buff *skb, *_skb; + int exp; + + skb_queue_walk(xmitq, _skb) { + skb = pskb_copy(_skb, GFP_ATOMIC); + if (!skb) + continue; + exp = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb)); + if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) { + kfree_skb(skb); + continue; + } + skb_reset_network_header(skb); + skb->dev = dev; + skb->protocol = htons(ETH_P_TIPC); + dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr, + dev->dev_addr, skb->len); + dev_queue_xmit(skb); + } +} + +static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, struct net_device *od) +{ + consume_skb(skb); + return NET_RX_SUCCESS; +} + +int tipc_attach_loopback(struct net *net) +{ + struct net_device *dev = net->loopback_dev; + struct tipc_net *tn = tipc_net(net); + + if (!dev) + return -ENODEV; + dev_hold(dev); + tn->loopback_pt.dev = dev; + tn->loopback_pt.type = htons(ETH_P_TIPC); + tn->loopback_pt.func = tipc_loopback_rcv_pkt; + tn->loopback_trace = false; + dev_add_pack(&tn->loopback_pt); + return 0; +} + +void tipc_detach_loopback(struct net *net) +{ + struct tipc_net *tn = tipc_net(net); + + dev_remove_pack(&tn->loopback_pt); + dev_put(net->loopback_dev); +} + static int __tipc_nl_add_media(struct tipc_nl_msg *msg, struct tipc_media *media, int nlflags) { diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index 7f4c569..ef7fad9 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -232,6 +232,9 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id, struct tipc_media_addr *dst); void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id, struct sk_buff_head *xmitq); +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq); +int tipc_attach_loopback(struct net *net); +void tipc_detach_loopback(struct net *net); /* check if device MTU is too low for tipc headers */ static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve) diff --git a/net/tipc/core.c b/net/tipc/core.c index ed536c0..1867687 100644 --- a/net/tipc/core.c +++ b/net/tipc/core.c @@ -81,7 +81,9 @@ static int __net_init tipc_init_net(struct net *net) err = tipc_bcast_init(net); if (err) goto out_bclink; - + err = tipc_attach_loopback(net); + if (err) + goto out_bclink; return 0; out_bclink: @@ -94,6 +96,7 @@ static int __net_init tipc_init_net(struct net *net) static void __net_exit tipc_exit_net(struct net *net) { + tipc_detach_loopback(net); tipc_net_stop(net); tipc_bcast_stop(net); tipc_nametbl_stop(net); diff --git a/net/tipc/core.h b/net/tipc/core.h index 7a68e1b..c1c2906 100644 --- a/net/tipc/core.h +++ b/net/tipc/core.h @@ -67,6 +67,7 @@ struct tipc_link; struct tipc_name_table; struct tipc_topsrv; struct tipc_monitor; +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts); #define TIPC_MOD_VER "2.0.0" @@ -125,6 +126,10 @@ struct tipc_net { /* Cluster capabilities */ u16 capabilities; + + /* Tracing of node internal messages */ + struct packet_type loopback_pt; + bool loopback_trace; }; static inline struct tipc_net *tipc_net(struct net *net) @@ -152,6 +157,13 @@ static inline struct tipc_topsrv *tipc_topsrv(struct net *net) return tipc_net(net)->topsrv; } +static inline void tipc_loopback_trace(struct net *net, + struct sk_buff_head *pkts) +{ + if (unlikely(tipc_net(net)->loopback_trace)) + tipc_clone_to_loopback(net, pkts); +} + static inline unsigned int tipc_hashfn(u32 addr) { return addr & (NODE_HTABLE_SIZE - 1); diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c index 99bd166..d6165ad 100644 --- a/net/tipc/netlink.c +++ b/net/tipc/netlink.c @@ -261,7 +261,7 @@ struct genl_family tipc_genl_family __ro_after_init = { .version = TIPC_GENL_V2_VERSION, .hdrsize = 0, .maxattr = TIPC_NLA_MAX, - .policy = tipc_nl_policy, + .policy = tipc_nl_policy, .netnsok = true, .module = THIS_MODULE, .ops = tipc_genl_v2_ops, diff --git a/net/tipc/node.c b/net/tipc/node.c index 9e106d3..7e58831 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1439,6 +1439,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list, int rc; if (in_own_node(net, dnode)) { + tipc_loopback_trace(net, list); tipc_sk_rcv(net, list); return 0; } diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index f345662..e3a6ba1 100644 --- a/net/tipc/topsrv.c +++ b/net/tipc/topsrv.c @@ -40,6 +40,7 @@ #include "socket.h" #include "addr.h" #include "msg.h" +#include "bearer.h" #include <net/sock.h> #include <linux/module.h> @@ -608,6 +609,7 @@ static void tipc_topsrv_kern_evt(struct net *net, struct tipc_event *evt) memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt)); skb_queue_head_init(&evtq); __skb_queue_tail(&evtq, skb); + tipc_loopback_trace(net, &evtq); tipc_sk_rcv(net, &evtq); } -- 2.11.0 |
From: David M. <da...@da...> - 2019-06-17 20:29:27
|
From: Tuong Lien <tuo...@de...> Date: Mon, 17 Jun 2019 12:15:42 +0700 > In patch series, commit 9195948fbf34 ("tipc: improve TIPC throughput by > Gap ACK blocks"), as for simplicity, the repeated retransmit failures' > detection in the function - "tipc_link_retrans()" was kept there for > broadcast retransmissions only. > > This commit now reapplies this feature for link unicast retransmissions > that has been done via the function - "tipc_link_advance_transmq()". > > Also, the "tipc_link_retrans()" is renamed to "tipc_link_bc_retrans()" > as it is used only for broadcast. > > Acked-by: Jon Maloy <jon...@er...> > Signed-off-by: Tuong Lien <tuo...@de...> Applied, thank you. |
From: Jon M. <jon...@er...> - 2019-06-17 14:44:03
|
Hi Xin, As I remember the discussion around introduction of UDP media a few years ago, the developer, Erik Huge, only chose to register TIPC as a udp tunnel user instead of regular udp user because it provides a more efficient way to receive packet in kernel space. With UDP tunnel, we could receive packet directly in a callback, while TIPC had to run in a work queue thread in order to read packets from the socket. So, in reality we don't need any tunnel at all. Another upside is that it is possible to hook in a GSO callback function from the tunnel user, something I am uncertain if we can do as a regular UDP user. Do you have any comments on this? Could it possibly be done differently? ///jon > -----Original Message----- > From: net...@vg... <net...@vg...> On > Behalf Of Xin Long > Sent: 17-Jun-19 09:34 > To: network dev <ne...@vg...> > Cc: da...@da...; Jon Maloy <jon...@er...>; Ying Xue > <yin...@wi...>; tip...@li...; Marcelo > Ricardo Leitner <mar...@gm...>; Neil Horman > <nh...@tu...>; Su Yanjun <suy...@cn...>; David > Ahern <ds...@gm...>; syz...@go...; Dmitry > Vyukov <dv...@go...>; Pravin B Shelar <ps...@ni...> > Subject: [PATCH net 0/3] net: fix quite a few dst_cache crashes reported by > syzbot > > There are two kinds of crashes reported many times by syzbot with no > reproducer. Call Traces are like: > > BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 > net/ipv4/route.c:1556 > rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 > __mkroute_output net/ipv4/route.c:2332 [inline] > ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 > ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 > __ip_route_output_key include/net/route.h:125 [inline] > ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 > ip_route_output_key include/net/route.h:135 [inline] > ... > > or: > > kasan: GPF could be caused by NULL-ptr deref or user memory access > RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 > <IRQ> > rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] > free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 > __rcu_reclaim kernel/rcu/rcu.h:240 [inline] > rcu_do_batch kernel/rcu/tree.c:2437 [inline] > invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] > rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 > ... > > They were caused by the fib_nh_common percpu member > 'nhc_pcpu_rth_output' > overwritten by another percpu variable 'dev->tstats' access overflow in tipc > udp media xmit path when counting packets on a non tunnel device. > > The fix is to make udp tunnel work with no tunnel device by allowing not to > count packets on the tstats when the tunnel dev is NULL in Patches 1/3 and > 2/3, then pass a NULL tunnel dev in tipc_udp_tunnel() in Patch 3/3. > > Xin Long (3): > ip_tunnel: allow not to count pkts on tstats by setting skb's dev to > NULL > ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL > tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb > > include/net/ip6_tunnel.h | 9 ++++++--- net/ipv4/ip_tunnel_core.c | 9 > ++++++--- > net/tipc/udp_media.c | 8 +++----- > 3 files changed, 15 insertions(+), 11 deletions(-) > > -- > 2.1.0 |
From: Tuong L. T. <tuo...@de...> - 2019-06-17 08:49:29
|
Hi Ying, Thanks for your comments! Regarding your last statement, yes when making the patch, I noticed that the "tipc_msg_build()" and "tipc_msg_fragment()" do a similar task, also I tried to think a way to combine them but didn't because of the reasons: 1- The "core" functions to copy the data are different since the "tipc_msg_build()" plays with user data in the iov buffers, whereas, for the other, it's skb data. Also, the outputs are different, the first function will set the messages' type in header such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but not with the other because it will overwrite the tunnel messages' type... that I had to use the other field (fragm_no/nof_fragms) to determine this at the receiving side... 2- I don't want to touch the old code that can be risky :( BR/Tuong -----Original Message----- From: Ying Xue <yin...@wi...> Sent: Sunday, June 16, 2019 1:42 PM To: Tuong Lien <tuo...@de...>; tip...@li...; jon...@er...; ma...@do... Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet > 2) The same scenario above can happen more easily in case the MTU of > the links is set differently or when changing. In that case, as long as > a large message in the failure link's transmq queue was built and > fragmented with its link's MTU > the other link's one, the issue will > happen (there is no need of a link synching in advance). > > 3) The link synching procedure also faces with the same issue but since > the link synching is only started upon receipt of a SYNCH_MSG, dropping > the message will not result in a state deadlock, but it is not expected > as design. > > The 1) & 3) issues are resolved by the previous commit 81e4dd94b214 This is the same as previous commit. The commit ID might be invalid after it's merged into upstream. > ("tipc: optimize link synching mechanism") by generating only a dummy > SYNCH_MSG (i.e. without data) at the link synching, so the size of a > FAILOVER_MSG if any then will never exceed the link's MTU. > /** > + * tipc_msg_fragment - build a fragment skb list for TIPC message > + * > + * @skb: TIPC message skb > + * @hdr: internal msg header to be put on the top of the fragments > + * @pktmax: max size of a fragment incl. the header > + * @frags: returned fragment skb list > + * > + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL > + * or -ENOMEM > + */ > +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr, > + int pktmax, struct sk_buff_head *frags) > +{ > + int pktno, nof_fragms, dsz, dmax, eat; > + struct tipc_msg *_hdr; > + struct sk_buff *_skb; > + u8 *data; > + > + /* Non-linear buffer? */ > + if (skb_linearize(skb)) > + return -ENOMEM; > + > + data = (u8 *)skb->data; > + dsz = msg_size(buf_msg(skb)); > + dmax = pktmax - INT_H_SIZE; > + > + if (dsz <= dmax || !dmax) > + return -EINVAL; > + > + nof_fragms = dsz / dmax + 1; > + > + for (pktno = 1; pktno <= nof_fragms; pktno++) { > + if (pktno < nof_fragms) > + eat = dmax; > + else > + eat = dsz % dmax; > + > + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC); > + if (!_skb) > + goto error; > + > + skb_orphan(_skb); > + __skb_queue_tail(frags, _skb); > + > + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE); > + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat); > + data += eat; > + > + _hdr = buf_msg(_skb); > + msg_set_fragm_no(_hdr, pktno); > + msg_set_nof_fragms(_hdr, nof_fragms); > + msg_set_size(_hdr, INT_H_SIZE + eat); > + } > + return 0; > + In fact we have similar code in tipc_msg_build() where we also fragment packet if necessary. In order to eliminate redundant code, I suggest we should extract the common code into a separate function and then tipc_msg_build() and tipc_msg_fragment() call it. |
From: Tuong L. <tuo...@de...> - 2019-06-17 05:16:01
|
In patch series, commit 9195948fbf34 ("tipc: improve TIPC throughput by Gap ACK blocks"), as for simplicity, the repeated retransmit failures' detection in the function - "tipc_link_retrans()" was kept there for broadcast retransmissions only. This commit now reapplies this feature for link unicast retransmissions that has been done via the function - "tipc_link_advance_transmq()". Also, the "tipc_link_retrans()" is renamed to "tipc_link_bc_retrans()" as it is used only for broadcast. Acked-by: Jon Maloy <jon...@er...> Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/link.c | 106 +++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 70 insertions(+), 36 deletions(-) diff --git a/net/tipc/link.c b/net/tipc/link.c index f5cd986e1e50..d5ed509e0660 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -249,9 +249,9 @@ static void tipc_link_build_bc_init_msg(struct tipc_link *l, struct sk_buff_head *xmitq); static bool tipc_link_release_pkts(struct tipc_link *l, u16 to); static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data); -static void tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, - struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq); +static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, + struct tipc_gap_ack_blks *ga, + struct sk_buff_head *xmitq); /* * Simple non-static link routines (i.e. referenced outside this file) @@ -1044,32 +1044,69 @@ static void tipc_link_advance_backlog(struct tipc_link *l, l->snd_nxt = seqno; } -static void link_retransmit_failure(struct tipc_link *l, struct sk_buff *skb) +/** + * link_retransmit_failure() - Detect repeated retransmit failures + * @l: tipc link sender + * @r: tipc link receiver (= l in case of unicast) + * @from: seqno of the 1st packet in retransmit request + * @rc: returned code + * + * Return: true if the repeated retransmit failures happens, otherwise + * false + */ +static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, + u16 from, int *rc) { - struct tipc_msg *hdr = buf_msg(skb); + struct sk_buff *skb = skb_peek(&l->transmq); + struct tipc_msg *hdr; + + if (!skb) + return false; + hdr = buf_msg(skb); + + /* Detect repeated retransmit failures on same packet */ + if (r->prev_from != from) { + r->prev_from = from; + r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance); + r->stale_cnt = 0; + } else if (++r->stale_cnt > 99 && time_after(jiffies, r->stale_limit)) { + pr_warn("Retransmission failure on link <%s>\n", l->name); + link_print(l, "State of link "); + pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n", + msg_user(hdr), msg_type(hdr), msg_size(hdr), + msg_errcode(hdr)); + pr_info("sqno %u, prev: %x, src: %x\n", + msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr)); + + trace_tipc_list_dump(&l->transmq, true, "retrans failure!"); + trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!"); + trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!"); + + if (link_is_bc_sndlink(l)) + *rc = TIPC_LINK_DOWN_EVT; + + *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); + return true; + } - pr_warn("Retransmission failure on link <%s>\n", l->name); - link_print(l, "State of link "); - pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n", - msg_user(hdr), msg_type(hdr), msg_size(hdr), msg_errcode(hdr)); - pr_info("sqno %u, prev: %x, src: %x\n", - msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr)); + return false; } -/* tipc_link_retrans() - retransmit one or more packets +/* tipc_link_bc_retrans() - retransmit zero or more packets * @l: the link to transmit on * @r: the receiving link ordering the retransmit. Same as l if unicast * @from: retransmit from (inclusive) this sequence number * @to: retransmit to (inclusive) this sequence number * xmitq: queue for accumulating the retransmitted packets */ -static int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r, - u16 from, u16 to, struct sk_buff_head *xmitq) +static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, + u16 from, u16 to, struct sk_buff_head *xmitq) { struct sk_buff *_skb, *skb = skb_peek(&l->transmq); u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; u16 ack = l->rcv_nxt - 1; struct tipc_msg *hdr; + int rc = 0; if (!skb) return 0; @@ -1077,20 +1114,9 @@ static int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r, return 0; trace_tipc_link_retrans(r, from, to, &l->transmq); - /* Detect repeated retransmit failures on same packet */ - if (r->prev_from != from) { - r->prev_from = from; - r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance); - r->stale_cnt = 0; - } else if (++r->stale_cnt > 99 && time_after(jiffies, r->stale_limit)) { - link_retransmit_failure(l, skb); - trace_tipc_list_dump(&l->transmq, true, "retrans failure!"); - trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!"); - trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!"); - if (link_is_bc_sndlink(l)) - return TIPC_LINK_DOWN_EVT; - return tipc_link_fsm_evt(l, LINK_FAILURE_EVT); - } + + if (link_retransmit_failure(l, r, from, &rc)) + return rc; skb_queue_walk(&l->transmq, skb) { hdr = buf_msg(skb); @@ -1324,17 +1350,23 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data) * @gap: # of gap packets * @ga: buffer pointer to Gap ACK blocks from peer * @xmitq: queue for accumulating the retransmitted packets if any + * + * In case of a repeated retransmit failures, the call will return shortly + * with a returned code (e.g. TIPC_LINK_DOWN_EVT) */ -static void tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, - struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq) +static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, + struct tipc_gap_ack_blks *ga, + struct sk_buff_head *xmitq) { struct sk_buff *skb, *_skb, *tmp; struct tipc_msg *hdr; u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; u16 ack = l->rcv_nxt - 1; - u16 seqno; - u16 n = 0; + u16 seqno, n = 0; + int rc = 0; + + if (gap && link_retransmit_failure(l, l, acked + 1, &rc)) + return rc; skb_queue_walk_safe(&l->transmq, skb, tmp) { seqno = buf_seqno(skb); @@ -1369,6 +1401,8 @@ static void tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, goto next_gap_ack; } } + + return 0; } /* tipc_link_build_state_msg: prepare link state message for transmission @@ -1919,7 +1953,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, rcvgap, 0, 0, xmitq); - tipc_link_advance_transmq(l, ack, gap, ga, xmitq); + rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); /* If NACK, retransmit will now start at right position */ if (gap) @@ -2036,7 +2070,7 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, if (more(peers_snd_nxt, l->rcv_nxt + l->window)) return rc; - rc = tipc_link_retrans(snd_l, l, from, to, xmitq); + rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); l->snd_nxt = peers_snd_nxt; if (link_bc_rcv_gap(l)) @@ -2132,7 +2166,7 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb, if (dnode == tipc_own_addr(l->net)) { tipc_link_bc_ack_rcv(l, acked, xmitq); - rc = tipc_link_retrans(l->bc_sndlink, l, from, to, xmitq); + rc = tipc_link_bc_retrans(l->bc_sndlink, l, from, to, xmitq); l->stats.recv_nacks++; return rc; } -- 2.13.7 |
From: Tuong L. <tuo...@de...> - 2019-06-17 04:56:34
|
It appears that a FAILOVER_MSG can come from peer even when the failure link is resetting (i.e. just after the 'node_write_unlock()'...). This means the failover procedure on the node has not been started yet. The situation is as follows: node1 node2 linkb linka linka linkb | | | | | | x failure | | | RESETTING | | | | | | x failure RESET | | RESETTING FAILINGOVER | | | (FAILOVER_MSG) | | |<-------------------------------------------------| | *FAILINGOVER | | | | | (dummy FAILOVER_MSG) | | |------------------------------------------------->| | RESET | | FAILOVER_END | FAILINGOVER RESET | . . . . . . . . . . . . Once this happens, the link failover procedure will be triggered wrongly on the receiving node since the node isn't in FAILINGOVER state but then another link failover will be carried out. The consequences are: 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' was set, reset and set incorrectly, the criteria to end the failover would not be met, it could keep waiting for a message that has already received. 2) The early FAILOVER_MSG(s) could be queued in the link failover deferdq but would be purged or not pulled out because the 'drop_point' was not set correctly. 3) The early FAILOVER_MSG(s) could be dropped too. 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state shortly, but later on it would be restarted. The same situation can also happen when the link is in PEER_RESET state and a FAILOVER_MSG arrives. The commit resolves the issues by forcing the link down immediately, so the failover procedure will be started normally (which is the same as when receiving a FAILOVER_MSG and the link is in up state). Also, the function "tipc_node_link_failover()" is toughen to avoid such a situation from happening. Acked-by: Jon Maloy <jon...@er...> Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/link.c | 1 - net/tipc/node.c | 10 +++++++--- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/net/tipc/link.c b/net/tipc/link.c index d5ed509e0660..bcfb0a4ab485 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -1762,7 +1762,6 @@ void tipc_link_failover_prepare(struct tipc_link *l, struct tipc_link *tnl, * node has entered SELF_DOWN_PEER_LEAVING and both peer nodes * would have to start over from scratch instead. */ - WARN_ON(l && tipc_link_is_up(l)); tnl->drop_point = 1; tnl->failover_reasm_skb = NULL; diff --git a/net/tipc/node.c b/net/tipc/node.c index e4dba865105e..65644642c091 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -777,9 +777,9 @@ static void tipc_node_link_up(struct tipc_node *n, int bearer_id, * disturbance, wrong session, etc.) * 3. Link <1B-2B> up * 4. Link endpoint 2A down (e.g. due to link tolerance timeout) - * 5. Node B starts failover onto link <1B-2B> + * 5. Node 2 starts failover onto link <1B-2B> * - * ==> Node A does never start link/node failover! + * ==> Node 1 does never start link/node failover! * * @n: tipc node structure * @l: link peer endpoint failingover (- can be NULL) @@ -794,6 +794,10 @@ static void tipc_node_link_failover(struct tipc_node *n, struct tipc_link *l, if (!tipc_link_is_up(tnl)) return; + /* Don't rush, failure link may be in the process of resetting */ + if (l && !tipc_link_is_reset(l)) + return; + tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT); tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT); @@ -1719,7 +1723,7 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb, /* Initiate or update failover mode if applicable */ if ((usr == TUNNEL_PROTOCOL) && (mtyp == FAILOVER_MSG)) { syncpt = oseqno + exp_pkts - 1; - if (pl && tipc_link_is_up(pl)) { + if (pl && !tipc_link_is_reset(pl)) { __tipc_node_link_down(n, &pb_id, xmitq, &maddr); trace_tipc_node_link_down(n, true, "node link down <- failover!"); -- 2.13.7 |
From: David M. <da...@da...> - 2019-06-17 03:42:55
|
From: Xin Long <luc...@gm...> Date: Sun, 16 Jun 2019 17:24:07 +0800 > Syzbot reported a memleak caused by grp members' deferredq list not > purged when the grp is be deleted. > > The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in > tipc_group_filter_msg() and the skb will stay in deferredq. > > So fix it by calling __skb_queue_purge for each member's deferredq > in tipc_group_delete() when a tipc sk leaves the grp. > > Fixes: b87a5ea31c93 ("tipc: guarantee group unicast doesn't bypass group broadcast") > Reported-by: syz...@sy... > Signed-off-by: Xin Long <luc...@gm...> Applied, thanks. |
From: Ying X. <yin...@wi...> - 2019-06-17 02:54:27
|
On 6/16/19 5:24 PM, Xin Long wrote: > Syzbot reported a memleak caused by grp members' deferredq list not > purged when the grp is be deleted. > > The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in > tipc_group_filter_msg() and the skb will stay in deferredq. > > So fix it by calling __skb_queue_purge for each member's deferredq > in tipc_group_delete() when a tipc sk leaves the grp. > > Fixes: b87a5ea31c93 ("tipc: guarantee group unicast doesn't bypass group broadcast") > Reported-by: syz...@sy... > Signed-off-by: Xin Long <luc...@gm...> Acked-by: Ying Xue <yin...@wi...> > --- > net/tipc/group.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/tipc/group.c b/net/tipc/group.c > index 992be61..5f98d38 100644 > --- a/net/tipc/group.c > +++ b/net/tipc/group.c > @@ -218,6 +218,7 @@ void tipc_group_delete(struct net *net, struct tipc_group *grp) > > rbtree_postorder_for_each_entry_safe(m, tmp, tree, tree_node) { > tipc_group_proto_xmit(grp, m, GRP_LEAVE_MSG, &xmitq); > + __skb_queue_purge(&m->deferredq); > list_del(&m->list); > kfree(m); > } > |
From: Xin L. <luc...@gm...> - 2019-06-16 09:24:24
|
Syzbot reported a memleak caused by grp members' deferredq list not purged when the grp is be deleted. The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in tipc_group_filter_msg() and the skb will stay in deferredq. So fix it by calling __skb_queue_purge for each member's deferredq in tipc_group_delete() when a tipc sk leaves the grp. Fixes: b87a5ea31c93 ("tipc: guarantee group unicast doesn't bypass group broadcast") Reported-by: syz...@sy... Signed-off-by: Xin Long <luc...@gm...> --- net/tipc/group.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/group.c b/net/tipc/group.c index 992be61..5f98d38 100644 --- a/net/tipc/group.c +++ b/net/tipc/group.c @@ -218,6 +218,7 @@ void tipc_group_delete(struct net *net, struct tipc_group *grp) rbtree_postorder_for_each_entry_safe(m, tmp, tree, tree_node) { tipc_group_proto_xmit(grp, m, GRP_LEAVE_MSG, &xmitq); + __skb_queue_purge(&m->deferredq); list_del(&m->list); kfree(m); } -- 2.1.0 |
From: Ying X. <yin...@wi...> - 2019-06-16 07:46:02
|
Hi John, Can you please submit the patch with "git send-email" command again? The patch has been totally messed under Linux environment. Thanks, Ying On 6/14/19 9:58 AM, John Rutherford wrote: > Since node internal messages are passed directly to socket it is not > possible to observe > > this message exchange via tcpdump or wireshark. > > > > We now remedy this by making it possible to clone such messages and send the > clones to the > > loopback interface. The clones are dropped at reception and have no > functional role except > > making the traffic visible. > > > > The feature is turned on/off by enabling/disabling the loopback "bearer" > "eth:lo". > > Signed-off-by: John Rutherford <joh...@de...> > > Signed-off-by: Jon Maloy <jon...@er...> > > --- > > net/tipc/bcast.c | 5 +++- > > net/tipc/bearer.c | 67 > ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > net/tipc/bearer.h | 3 +++ > > net/tipc/core.c | 5 +++- > > net/tipc/core.h | 9 ++++++++ > > net/tipc/netlink.c | 2 +- > > net/tipc/node.c | 2 ++ > > net/tipc/topsrv.c | 3 +++ > > 8 files changed, 93 insertions(+), 3 deletions(-) > > > > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c > > index 6c997d4..db40129 100644 > > --- a/net/tipc/bcast.c > > +++ b/net/tipc/bcast.c > > @@ -406,8 +406,11 @@ int tipc_mcast_xmit(struct net *net, struct > sk_buff_head *pkts, > > rc = tipc_bcast_xmit(net, > pkts, cong_link_cnt); > > } > > - if (dests->local) > > + if (dests->local) { > > + if (unlikely(tipc_loopback_trace(net))) > > + tipc_clone_to_loopback(net, > &localq); > > tipc_sk_mcast_rcv(net, &localq, &inputq); > > + } > > exit: > > /* This queue should normally be empty by now */ > > __skb_queue_purge(pkts); > > diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c > > index 2bed658..27b4fd7 100644 > > --- a/net/tipc/bearer.c > > +++ b/net/tipc/bearer.c > > @@ -836,6 +836,12 @@ int __tipc_nl_bearer_disable(struct sk_buff *skb, > struct genl_info *info) > > name = nla_data(attrs[TIPC_NLA_BEARER_NAME]); > > + if (!strcmp(name, "eth:lo")) { > > + tipc_net(net)->loopback_trace = false; > > + pr_info("Disabled packet tracing on loopback > interface\n"); > > + return 0; > > + } > > + > > bearer = tipc_bearer_find(net, name); > > if (!bearer) > > return -EINVAL; > > @@ -881,6 +887,12 @@ int __tipc_nl_bearer_enable(struct sk_buff *skb, struct > genl_info *info) > > bearer = nla_data(attrs[TIPC_NLA_BEARER_NAME]); > > + if (!strcmp(bearer, "eth:lo")) { > > + tipc_net(net)->loopback_trace = true; > > + pr_info("Enabled packet tracing on loopback > interface\n"); > > + return 0; > > + } > > + > > if (attrs[TIPC_NLA_BEARER_DOMAIN]) > > domain = > nla_get_u32(attrs[TIPC_NLA_BEARER_DOMAIN]); > > @@ -1021,6 +1033,61 @@ int tipc_nl_bearer_set(struct sk_buff *skb, struct > genl_info *info) > > return err; > > } > > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq) > > +{ > > + struct net_device *dev = net->loopback_dev; > > + struct sk_buff *skb, *_skb; > > + int exp; > > + > > + skb_queue_walk(xmitq, _skb) { > > + skb = pskb_copy(_skb, GFP_ATOMIC); > > + if (!skb) > > + continue; > > + exp = SKB_DATA_ALIGN(dev->hard_header_len - > skb_headroom(skb)); > > + if (exp > 0 && pskb_expand_head(skb, exp, 0, > GFP_ATOMIC)) { > > + kfree_skb(skb); > > + continue; > > + } > > + skb_reset_network_header(skb); > > + skb->dev = dev; > > + skb->protocol = htons(ETH_P_TIPC); > > + dev_hard_header(skb, dev, ETH_P_TIPC, > dev->dev_addr, > > + dev->dev_addr, > skb->len); > > + dev_queue_xmit(skb); > > + } > > +} > > + > > +static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device > *dev, > > + struct > packet_type *pt, struct net_device *od) > > +{ > > + consume_skb(skb); > > + return NET_RX_SUCCESS; > > +} > > + > > +int tipc_attach_loopback(struct net *net) > > +{ > > + struct net_device *dev = net->loopback_dev; > > + struct tipc_net *tn = tipc_net(net); > > + > > + if (!dev) > > + return -ENODEV; > > + dev_hold(dev); > > + tn->loopback_pt.dev = dev; > > + tn->loopback_pt.type = htons(ETH_P_TIPC); > > + tn->loopback_pt.func = tipc_loopback_rcv_pkt; > > + tn->loopback_trace = false; > > + dev_add_pack(&tn->loopback_pt); > > + return 0; > > +} > > + > > +void tipc_detach_loopback(struct net *net) > > +{ > > + struct tipc_net *tn = tipc_net(net); > > + > > + dev_remove_pack(&tn->loopback_pt); > > + dev_put(net->loopback_dev); > > +} > > + > > static int __tipc_nl_add_media(struct tipc_nl_msg *msg, > > struct tipc_media > *media, int nlflags) > > { > > diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h > > index 7f4c569..ef7fad9 100644 > > --- a/net/tipc/bearer.h > > +++ b/net/tipc/bearer.h > > @@ -232,6 +232,9 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id, > > struct tipc_media_addr *dst); > > void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id, > > struct sk_buff_head *xmitq); > > +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *xmitq); > > +int tipc_attach_loopback(struct net *net); > > +void tipc_detach_loopback(struct net *net); > > /* check if device MTU is too low for tipc headers */ > > static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int > reserve) > > diff --git a/net/tipc/core.c b/net/tipc/core.c > > index ed536c0..1867687 100644 > > --- a/net/tipc/core.c > > +++ b/net/tipc/core.c > > @@ -81,7 +81,9 @@ static int __net_init tipc_init_net(struct net *net) > > err = tipc_bcast_init(net); > > if (err) > > goto out_bclink; > > - > > + err = tipc_attach_loopback(net); > > + if (err) > > + goto out_bclink; > > return 0; > > out_bclink: > > @@ -94,6 +96,7 @@ static int __net_init tipc_init_net(struct net *net) > > static void __net_exit tipc_exit_net(struct net *net) > > { > > + tipc_detach_loopback(net); > > tipc_net_stop(net); > > tipc_bcast_stop(net); > > tipc_nametbl_stop(net); > > diff --git a/net/tipc/core.h b/net/tipc/core.h > > index 7a68e1b..de164ac 100644 > > --- a/net/tipc/core.h > > +++ b/net/tipc/core.h > > @@ -125,6 +125,10 @@ struct tipc_net { > > /* Cluster capabilities */ > > u16 capabilities; > > + > > + /* Tracing of node internal messages */ > > + struct packet_type loopback_pt; > > + bool loopback_trace; > > }; > > static inline struct tipc_net *tipc_net(struct net *net) > > @@ -152,6 +156,11 @@ static inline struct tipc_topsrv *tipc_topsrv(struct > net *net) > > return tipc_net(net)->topsrv; > > } > > +static inline bool tipc_loopback_trace(struct net *net) > > +{ > > + return tipc_net(net)->loopback_trace; > > +} > > + > > static inline unsigned int tipc_hashfn(u32 addr) > > { > > return addr & (NODE_HTABLE_SIZE - 1); > > diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c > > index 99bd166..d6165ad 100644 > > --- a/net/tipc/netlink.c > > +++ b/net/tipc/netlink.c > > @@ -261,7 +261,7 @@ struct genl_family tipc_genl_family __ro_after_init = { > > .version = TIPC_GENL_V2_VERSION, > > .hdrsize = 0, > > .maxattr = TIPC_NLA_MAX, > > - .policy = tipc_nl_policy, > > + .policy = tipc_nl_policy, > > .netnsok = true, > > .module = THIS_MODULE, > > .ops = tipc_genl_v2_ops, > > diff --git a/net/tipc/node.c b/net/tipc/node.c > > index 9e106d3..415d02d 100644 > > --- a/net/tipc/node.c > > +++ b/net/tipc/node.c > > @@ -1439,6 +1439,8 @@ int tipc_node_xmit(struct net *net, struct > sk_buff_head *list, > > int rc; > > if (in_own_node(net, dnode)) { > > + if (unlikely(tipc_loopback_trace(net))) > > + tipc_clone_to_loopback(net, > list); > > tipc_sk_rcv(net, list); > > return 0; > > } > > diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c > > index f345662..37f07e7 100644 > > --- a/net/tipc/topsrv.c > > +++ b/net/tipc/topsrv.c > > @@ -40,6 +40,7 @@ > > #include "socket.h" > > #include "addr.h" > > #include "msg.h" > > +#include "bearer.h" > > #include <net/sock.h> > > #include <linux/module.h> > > @@ -608,6 +609,8 @@ static void tipc_topsrv_kern_evt(struct net *net, struct > tipc_event *evt) > > memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt)); > > skb_queue_head_init(&evtq); > > __skb_queue_tail(&evtq, skb); > > + if (unlikely(tipc_loopback_trace(net))) > > + tipc_clone_to_loopback(net, &evtq); > > tipc_sk_rcv(net, &evtq); > > } > |
From: Ying X. <yin...@wi...> - 2019-06-16 07:14:15
|
On 6/10/19 2:44 AM, Xin Long wrote: > Looks we need to purge each member's deferredq list in tipc_group_delete(): > diff --git a/net/tipc/group.c b/net/tipc/group.c > index 992be61..23823eb 100644 > --- a/net/tipc/group.c > +++ b/net/tipc/group.c > @@ -218,6 +218,7 @@ void tipc_group_delete(struct net *net, struct > tipc_group *grp) > > rbtree_postorder_for_each_entry_safe(m, tmp, tree, tree_node) { > tipc_group_proto_xmit(grp, m, GRP_LEAVE_MSG, &xmitq); > + __skb_queue_purge(&m->deferredq); > list_del(&m->list); > kfree(m); > } Good catch! I agree with you. |
From: Ying X. <yin...@wi...> - 2019-06-16 06:53:45
|
> 2) The same scenario above can happen more easily in case the MTU of > the links is set differently or when changing. In that case, as long as > a large message in the failure link's transmq queue was built and > fragmented with its link's MTU > the other link's one, the issue will > happen (there is no need of a link synching in advance). > > 3) The link synching procedure also faces with the same issue but since > the link synching is only started upon receipt of a SYNCH_MSG, dropping > the message will not result in a state deadlock, but it is not expected > as design. > > The 1) & 3) issues are resolved by the previous commit 81e4dd94b214 This is the same as previous commit. The commit ID might be invalid after it's merged into upstream. > ("tipc: optimize link synching mechanism") by generating only a dummy > SYNCH_MSG (i.e. without data) at the link synching, so the size of a > FAILOVER_MSG if any then will never exceed the link's MTU. > /** > + * tipc_msg_fragment - build a fragment skb list for TIPC message > + * > + * @skb: TIPC message skb > + * @hdr: internal msg header to be put on the top of the fragments > + * @pktmax: max size of a fragment incl. the header > + * @frags: returned fragment skb list > + * > + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL > + * or -ENOMEM > + */ > +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr, > + int pktmax, struct sk_buff_head *frags) > +{ > + int pktno, nof_fragms, dsz, dmax, eat; > + struct tipc_msg *_hdr; > + struct sk_buff *_skb; > + u8 *data; > + > + /* Non-linear buffer? */ > + if (skb_linearize(skb)) > + return -ENOMEM; > + > + data = (u8 *)skb->data; > + dsz = msg_size(buf_msg(skb)); > + dmax = pktmax - INT_H_SIZE; > + > + if (dsz <= dmax || !dmax) > + return -EINVAL; > + > + nof_fragms = dsz / dmax + 1; > + > + for (pktno = 1; pktno <= nof_fragms; pktno++) { > + if (pktno < nof_fragms) > + eat = dmax; > + else > + eat = dsz % dmax; > + > + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC); > + if (!_skb) > + goto error; > + > + skb_orphan(_skb); > + __skb_queue_tail(frags, _skb); > + > + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE); > + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat); > + data += eat; > + > + _hdr = buf_msg(_skb); > + msg_set_fragm_no(_hdr, pktno); > + msg_set_nof_fragms(_hdr, nof_fragms); > + msg_set_size(_hdr, INT_H_SIZE + eat); > + } > + return 0; > + In fact we have similar code in tipc_msg_build() where we also fragment packet if necessary. In order to eliminate redundant code, I suggest we should extract the common code into a separate function and then tipc_msg_build() and tipc_msg_fragment() call it. |
From: Ying X. <yin...@wi...> - 2019-06-16 06:47:24
|
On 6/4/19 1:22 PM, Tuong Lien wrote: > This commit is along with the latter commit 4ec6a14c3933 The commit ID might be changed after the commit is merged into upstream, which means the ID will be invalid as well. ("tipc: fix > changeover issues due to large packet") to resolve the issues with the > link changeover mechanism. See that commit for details. > > Basically, for the link synching, from now on, we will send only one > single ("dummy") SYNCH message to peer. The SYNCH message does not > contain any data, just a header conveying the synch point to the peer. > > A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for > backward compatible! > > Suggested-by: Jon Maloy <jon...@er...> > Signed-off-by: Tuong Lien <tuo...@de...> Acked-by: Ying Xue <yin...@wi...> > --- > net/tipc/link.c | 26 ++++++++++++++++++++++++++ > net/tipc/msg.h | 10 ++++++++++ > net/tipc/node.c | 6 ++++-- > net/tipc/node.h | 6 ++++-- > 4 files changed, 44 insertions(+), 4 deletions(-) > > diff --git a/net/tipc/link.c b/net/tipc/link.c > index f5cd986e1e50..6924cf1e526f 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -1637,6 +1637,7 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl, > struct sk_buff_head *queue = &l->transmq; > struct sk_buff_head tmpxq, tnlq; > u16 pktlen, pktcnt, seqno = l->snd_nxt; > + u16 syncpt; > > if (!tnl) > return; > @@ -1656,6 +1657,31 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl, > tipc_link_xmit(l, &tnlq, &tmpxq); > __skb_queue_purge(&tmpxq); > > + /* Link Synching: > + * From now on, send only one single ("dummy") SYNCH message > + * to peer. The SYNCH message does not contain any data, just > + * a header conveying the synch point to the peer. > + */ > + if (mtyp == SYNCH_MSG && (tnl->peer_caps & TIPC_TUNNEL_ENHANCED)) { > + tnlskb = tipc_msg_create(TUNNEL_PROTOCOL, SYNCH_MSG, > + INT_H_SIZE, 0, l->addr, > + tipc_own_addr(l->net), > + 0, 0, 0); > + if (!tnlskb) { > + pr_warn("%sunable to create dummy SYNCH_MSG\n", > + link_co_err); > + return; > + } > + > + hdr = buf_msg(tnlskb); > + syncpt = l->snd_nxt + skb_queue_len(&l->backlogq) - 1; > + msg_set_syncpt(hdr, syncpt); > + msg_set_bearer_id(hdr, l->peer_bearer_id); > + __skb_queue_tail(&tnlq, tnlskb); > + tipc_link_xmit(tnl, &tnlq, xmitq); > + return; > + } > + > /* Initialize reusable tunnel packet header */ > tipc_msg_init(tipc_own_addr(l->net), &tnlhdr, TUNNEL_PROTOCOL, > mtyp, INT_H_SIZE, l->addr); > diff --git a/net/tipc/msg.h b/net/tipc/msg.h > index 8de02ad6e352..baf937bfa702 100644 > --- a/net/tipc/msg.h > +++ b/net/tipc/msg.h > @@ -877,6 +877,16 @@ static inline void msg_set_msgcnt(struct tipc_msg *m, u16 n) > msg_set_bits(m, 9, 16, 0xffff, n); > } > > +static inline u16 msg_syncpt(struct tipc_msg *m) > +{ > + return msg_bits(m, 9, 16, 0xffff); > +} > + > +static inline void msg_set_syncpt(struct tipc_msg *m, u16 n) > +{ > + msg_set_bits(m, 9, 16, 0xffff, n); > +} > + > static inline u32 msg_conn_ack(struct tipc_msg *m) > { > return msg_bits(m, 9, 16, 0xffff); > diff --git a/net/tipc/node.c b/net/tipc/node.c > index 9e106d3ed187..2a8399cf5525 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -1645,7 +1645,6 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb, > int usr = msg_user(hdr); > int mtyp = msg_type(hdr); > u16 oseqno = msg_seqno(hdr); > - u16 iseqno = msg_seqno(msg_get_wrapped(hdr)); > u16 exp_pkts = msg_msgcnt(hdr); > u16 rcv_nxt, syncpt, dlv_nxt, inputq_len; > int state = n->state; > @@ -1744,7 +1743,10 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb, > > /* Initiate synch mode if applicable */ > if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG) && (oseqno == 1)) { > - syncpt = iseqno + exp_pkts - 1; > + if (n->capabilities & TIPC_TUNNEL_ENHANCED) > + syncpt = msg_syncpt(hdr); > + else > + syncpt = msg_seqno(msg_get_wrapped(hdr)) + exp_pkts - 1; > if (!tipc_link_is_up(l)) > __tipc_node_link_up(n, bearer_id, xmitq); > if (n->state == SELF_UP_PEER_UP) { > diff --git a/net/tipc/node.h b/net/tipc/node.h > index c0bf49ea3de4..291d0ecd4101 100644 > --- a/net/tipc/node.h > +++ b/net/tipc/node.h > @@ -53,7 +53,8 @@ enum { > TIPC_NODE_ID128 = (1 << 5), > TIPC_LINK_PROTO_SEQNO = (1 << 6), > TIPC_MCAST_RBCTL = (1 << 7), > - TIPC_GAP_ACK_BLOCK = (1 << 8) > + TIPC_GAP_ACK_BLOCK = (1 << 8), > + TIPC_TUNNEL_ENHANCED = (1 << 9) > }; > > #define TIPC_NODE_CAPABILITIES (TIPC_SYN_BIT | \ > @@ -64,7 +65,8 @@ enum { > TIPC_NODE_ID128 | \ > TIPC_LINK_PROTO_SEQNO | \ > TIPC_MCAST_RBCTL | \ > - TIPC_GAP_ACK_BLOCK) > + TIPC_GAP_ACK_BLOCK | \ > + TIPC_TUNNEL_ENHANCED) > #define INVALID_BEARER_ID -1 > > void tipc_node_stop(struct net *net); > |
From: Jon M. <jon...@er...> - 2019-06-14 13:17:02
|
Acked-by: Jon ///jon > -----Original Message----- > From: Tuong Lien <tuo...@de...> > Sent: 14-Jun-19 06:41 > To: tip...@li...; Jon Maloy > <jon...@er...>; ma...@do...; yin...@wi... > Subject: [net] tipc: fix issues with early FAILOVER_MSG from peer > > It appears that a FAILOVER_MSG can come from peer even when the failure > link is resetting (i.e. just after the 'node_write_unlock()'...). This means the > failover procedure on the node has not been started yet. > The situation is as follows: > > node1 node2 > linkb linka linka linkb > | | | | > | | x failure | > | | RESETTING | > | | | | > | x failure RESET | > | RESETTING FAILINGOVER | > | | (FAILOVER_MSG) | | > |<-------------------------------------------------| > | *FAILINGOVER | | | > | | (dummy FAILOVER_MSG) | | > |------------------------------------------------->| > | RESET | | FAILOVER_END > | FAILINGOVER RESET | > . . . . > . . . . > . . . . > > Once this happens, the link failover procedure will be triggered wrongly on the > receiving node since the node isn't in FAILINGOVER state but then another link > failover will be carried out. > The consequences are: > > 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' > was set, reset and set incorrectly, the criteria to end the failover would not be > met, it could keep waiting for a message that has already received. > > 2) The early FAILOVER_MSG(s) could be queued in the link failover deferdq > but would be purged or not pulled out because the 'drop_point' > was not set correctly. > > 3) The early FAILOVER_MSG(s) could be dropped too. > > 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER > state shortly, but later on it would be restarted. > > The same situation can also happen when the link is in PEER_RESET state and a > FAILOVER_MSG arrives. > > The commit resolves the issues by forcing the link down immediately, so the > failover procedure will be started normally (which is the same as when > receiving a FAILOVER_MSG and the link is in up state). > > Also, the function "tipc_node_link_failover()" is toughen to avoid such a > situation from happening. > > Signed-off-by: Tuong Lien <tuo...@de...> > --- > net/tipc/link.c | 1 - > net/tipc/node.c | 10 +++++++--- > 2 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/net/tipc/link.c b/net/tipc/link.c index > d5ed509e0660..bcfb0a4ab485 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -1762,7 +1762,6 @@ void tipc_link_failover_prepare(struct tipc_link *l, > struct tipc_link *tnl, > * node has entered SELF_DOWN_PEER_LEAVING and both peer nodes > * would have to start over from scratch instead. > */ > - WARN_ON(l && tipc_link_is_up(l)); > tnl->drop_point = 1; > tnl->failover_reasm_skb = NULL; > > diff --git a/net/tipc/node.c b/net/tipc/node.c index > e4dba865105e..65644642c091 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -777,9 +777,9 @@ static void tipc_node_link_up(struct tipc_node *n, > int bearer_id, > * disturbance, wrong session, etc.) > * 3. Link <1B-2B> up > * 4. Link endpoint 2A down (e.g. due to link tolerance timeout) > - * 5. Node B starts failover onto link <1B-2B> > + * 5. Node 2 starts failover onto link <1B-2B> > * > - * ==> Node A does never start link/node failover! > + * ==> Node 1 does never start link/node failover! > * > * @n: tipc node structure > * @l: link peer endpoint failingover (- can be NULL) @@ -794,6 +794,10 @@ > static void tipc_node_link_failover(struct tipc_node *n, struct tipc_link *l, > if (!tipc_link_is_up(tnl)) > return; > > + /* Don't rush, failure link may be in the process of resetting */ > + if (l && !tipc_link_is_reset(l)) > + return; > + > tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT); > tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT); > > @@ -1719,7 +1723,7 @@ static bool tipc_node_check_state(struct > tipc_node *n, struct sk_buff *skb, > /* Initiate or update failover mode if applicable */ > if ((usr == TUNNEL_PROTOCOL) && (mtyp == FAILOVER_MSG)) { > syncpt = oseqno + exp_pkts - 1; > - if (pl && tipc_link_is_up(pl)) { > + if (pl && !tipc_link_is_reset(pl)) { > __tipc_node_link_down(n, &pb_id, xmitq, &maddr); > trace_tipc_node_link_down(n, true, > "node link down <- failover!"); > -- > 2.13.7 |