tipc-discussion Mailing List for TIPC Cluster Domain Sockets (Page 45)

Cluster wide IPC providing datagram, connection, and bus messaging

Brought to you by: goranm, johnr64, jonmaloy, ldcanh, and 6 others

tipc-discussion — User questions & problem reports, project news, developer discussion

You can subscribe to this list here.

2003	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec (6)
2004	Jan (9)	Feb (11)	Mar (22)	Apr (73)	May (78)	Jun (146)	Jul (80)	Aug (27)	Sep (5)	Oct (14)	Nov (18)	Dec (27)
2005	Jan (20)	Feb (30)	Mar (19)	Apr (28)	May (50)	Jun (31)	Jul (32)	Aug (14)	Sep (36)	Oct (43)	Nov (74)	Dec (63)
2006	Jan (34)	Feb (32)	Mar (21)	Apr (76)	May (106)	Jun (72)	Jul (70)	Aug (175)	Sep (130)	Oct (39)	Nov (81)	Dec (43)
2007	Jan (81)	Feb (36)	Mar (20)	Apr (43)	May (54)	Jun (34)	Jul (44)	Aug (55)	Sep (44)	Oct (54)	Nov (43)	Dec (41)
2008	Jan (42)	Feb (84)	Mar (73)	Apr (30)	May (119)	Jun (54)	Jul (54)	Aug (93)	Sep (173)	Oct (130)	Nov (145)	Dec (153)
2009	Jan (59)	Feb (12)	Mar (28)	Apr (18)	May (56)	Jun (9)	Jul (28)	Aug (62)	Sep (16)	Oct (19)	Nov (15)	Dec (17)
2010	Jan (14)	Feb (36)	Mar (37)	Apr (30)	May (33)	Jun (53)	Jul (42)	Aug (50)	Sep (67)	Oct (66)	Nov (69)	Dec (36)
2011	Jan (52)	Feb (45)	Mar (49)	Apr (21)	May (34)	Jun (13)	Jul (19)	Aug (37)	Sep (43)	Oct (10)	Nov (23)	Dec (30)
2012	Jan (42)	Feb (36)	Mar (46)	Apr (25)	May (96)	Jun (146)	Jul (40)	Aug (28)	Sep (61)	Oct (45)	Nov (100)	Dec (53)
2013	Jan (79)	Feb (24)	Mar (134)	Apr (156)	May (118)	Jun (75)	Jul (278)	Aug (145)	Sep (136)	Oct (168)	Nov (137)	Dec (439)
2014	Jan (284)	Feb (158)	Mar (231)	Apr (275)	May (259)	Jun (91)	Jul (222)	Aug (215)	Sep (165)	Oct (166)	Nov (211)	Dec (150)
2015	Jan (164)	Feb (324)	Mar (299)	Apr (214)	May (111)	Jun (109)	Jul (105)	Aug (36)	Sep (58)	Oct (131)	Nov (68)	Dec (30)
2016	Jan (46)	Feb (87)	Mar (135)	Apr (174)	May (132)	Jun (135)	Jul (149)	Aug (125)	Sep (79)	Oct (49)	Nov (95)	Dec (102)
2017	Jan (104)	Feb (75)	Mar (72)	Apr (53)	May (18)	Jun (5)	Jul (14)	Aug (19)	Sep (2)	Oct (13)	Nov (21)	Dec (67)
2018	Jan (56)	Feb (50)	Mar (148)	Apr (41)	May (37)	Jun (34)	Jul (34)	Aug (11)	Sep (52)	Oct (48)	Nov (28)	Dec (46)
2019	Jan (29)	Feb (63)	Mar (95)	Apr (54)	May (14)	Jun (71)	Jul (60)	Aug (49)	Sep (3)	Oct (64)	Nov (115)	Dec (57)
2020	Jan (15)	Feb (9)	Mar (38)	Apr (27)	May (60)	Jun (53)	Jul (35)	Aug (46)	Sep (37)	Oct (64)	Nov (20)	Dec (25)
2021	Jan (20)	Feb (31)	Mar (27)	Apr (23)	May (21)	Jun (30)	Jul (30)	Aug (7)	Sep (18)	Oct	Nov (15)	Dec (4)
2022	Jan (3)	Feb (1)	Mar (10)	Apr	May (2)	Jun (26)	Jul (5)	Aug	Sep (1)	Oct (2)	Nov (9)	Dec (2)
2023	Jan (4)	Feb (4)	Mar (5)	Apr (10)	May (29)	Jun (17)	Jul	Aug	Sep (1)	Oct (1)	Nov (2)	Dec
2024	Jan	Feb (6)	Mar	Apr (1)	May (6)	Jun	Jul (5)	Aug	Sep (3)	Oct	Nov	Dec
2025	Jan	Feb (3)	Mar	Apr	May	Jun	Jul (6)	Aug	Sep	Oct	Nov	Dec

Flat | Threaded

<< < 1 .. 43 44 45 46 47 .. 595 > >> (Page 45 of 595)

[tipc-discussion] [PATCH 0/3] Fix three issues found by syzbot

From: Ying X. <yin...@wi...> - 2019-08-04 12:03:52

In this series, try to fix two memory leak issues and another issue of
calling smp_processor_id() in preemptible context.

Ying Xue (3):
  tipc: fix memory leak issue
  tipc: fix memory leak issue
  tipc: fix issue of calling smp_processor_id() in preemptible

 net/tipc/group.c     | 22 +++++++++++++---------
 net/tipc/node.c      |  7 +++++--
 net/tipc/udp_media.c | 12 +++++++++---
 3 files changed, 27 insertions(+), 14 deletions(-)

-- 
2.7.4

Re: [tipc-discussion] [net] tipc: fix false detection of retransmit failures

From: Ying X. <yin...@wi...> - 2019-08-02 12:34:11

Hi Tuong,

Thank you for your clear explanation.

I am fine to this patch. Good work!

Regards,
Ying


On 8/1/19 10:58 AM, Tuong Lien Tong wrote:
> Hi Ying,
> 
> See below my answers inline.
> 
> BR/Tuong
> 
> -----Original Message-----
> From: Ying Xue <yin...@wi...> 
> Sent: Wednesday, July 31, 2019 8:21 PM
> To: Tuong Lien <tuo...@de...>; tip...@li...; jon...@er...; ma...@do...
> Subject: Re: [net] tipc: fix false detection of retransmit failures
> 
> On 7/19/19 11:56 AM, Tuong Lien wrote:
>> This commit eliminates the use of the link 'stale_limit' & 'prev_from'
>> (besides the already removed - 'stale_cnt') variables in the detection
>> of repeated retransmit failures as there is no proper way to initialize
>> them to avoid a false detection, i.e. it is not really a retransmission
>> failure but due to a garbage values in the variables.
> 
> Sorry, I couldn't understand the reason why we have no proper way to
> initialize 'stale_limit' & 'prev_from' variables of tipc_link struct.
> 
> As far as I understand, the two variables have been set to zero when
> tipc_link object is allocated with kzalloc() in tipc_link_create().
> 
> Can you please help me understand what real scenario we cannot properly
> set them.
> 
> [Tuong]: Yes, these two variables have been initialized to zero at the link create but zero or any other value is not 'safe' for them, the retransmit failure criteria can be met without a real failure. Specifically, the 'time_after()' can return 'true' unexpectedly due to a garbage value (like zero...) of the 'stale_limit' unless it's intentionally set in the 1st condition. However, the 1st condition with the 'prev_from' is not always satisfied. In case the next 'from' is coincidentally identical to the 'prev_from', we will miss it.
> 
>> -	if (r->prev_from != from) {
>> -		r->prev_from = from;
>> -		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
>> -	} else if (time_after(jiffies, r->stale_limit)) {
>> -		pr_warn("Retransmission failure on link <%s>\n", l->name);
> 
> For example:
> 1) n-th retransmit: (prev_from = x, from = 100)
> ==> ok, we set the variables: prev_from = 100, stale_limit = jiffies + 1.5s
> 2) now, this pkt #100 was retransmitted successfully...
> 3) Later on, n+1-th retransmit: (prev_from = 100, from = 100)
> -> We don’t set the 'stale_limit' but do the “else if” and bump!
> 
> Now, we can try to reset or re-initialize the 'prev_from' somehow, e.g. when the pkt #100 is ack-ed & released, but what value will be for it? Note, any value is a valid sequence number, and if the next 'from' equals that value, we will face with the same trouble again.
> Trying to set the 'stale_limit' to very far in the future is irrelevant too because it turns out that we will disable the feature if the same 'from' is really faced with the repeated retransmit failure!
> 
>>
>> Instead, a jiffies variable will be added to individual skbs (like the
>> way we restrict the skb retransmissions) in order to mark the first skb
>> retransmit time. Later on, at the next retransmissions, the timestamp
>> will be checked to see if the skb in the link transmq is "too stale",
>> that is, the link tolerance time has passed, so that a link reset will
>> be ordered. Note, just checking on the first skb in the queue is fine
>> enough since it must be the oldest one.
>> A counter is also added to keep track the actual skb retransmissions'
>> number for later checking when the failure happens.
>>
>> The downside of this approach is that the skb->cb[] buffer is about to
>> be exhausted, however it is always able to allocate another memory area
>> and keep a reference to it when needed.
>>
>> Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
>> Reported-by: Hoang Le <hoa...@de...>
>> Signed-off-by: Tuong Lien <tuo...@de...>
>> ---
>>  net/tipc/link.c | 92 ++++++++++++++++++++++++++++++++-------------------------
>>  net/tipc/msg.h  |  8 +++--
>>  2 files changed, 57 insertions(+), 43 deletions(-)
>>
>> diff --git a/net/tipc/link.c b/net/tipc/link.c
>> index 66d3a07bc571..c2c5c53cad22 100644
>> --- a/net/tipc/link.c
>> +++ b/net/tipc/link.c
>> @@ -106,8 +106,6 @@ struct tipc_stats {
>>   * @transmitq: queue for sent, non-acked messages
>>   * @backlogq: queue for messages waiting to be sent
>>   * @snt_nxt: next sequence number to use for outbound messages
>> - * @prev_from: sequence number of most previous retransmission request
>> - * @stale_limit: time when repeated identical retransmits must force link reset
>>   * @ackers: # of peers that needs to ack each packet before it can be released
>>   * @acked: # last packet acked by a certain peer. Used for broadcast.
>>   * @rcv_nxt: next sequence number to expect for inbound messages
>> @@ -164,9 +162,7 @@ struct tipc_link {
>>  		u16 limit;
>>  	} backlog[5];
>>  	u16 snd_nxt;
>> -	u16 prev_from;
>>  	u16 window;
>> -	unsigned long stale_limit;
>>  
>>  	/* Reception */
>>  	u16 rcv_nxt;
>> @@ -1044,47 +1040,53 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
>>   * link_retransmit_failure() - Detect repeated retransmit failures
>>   * @l: tipc link sender
>>   * @r: tipc link receiver (= l in case of unicast)
>> - * @from: seqno of the 1st packet in retransmit request
>>   * @rc: returned code
>>   *
>>   * Return: true if the repeated retransmit failures happens, otherwise
>>   * false
>>   */
>>  static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r,
>> -				    u16 from, int *rc)
>> +				    int *rc)
>>  {
>>  	struct sk_buff *skb = skb_peek(&l->transmq);
>>  	struct tipc_msg *hdr;
>>  
>>  	if (!skb)
>>  		return false;
>> -	hdr = buf_msg(skb);
>>  
>> -	/* Detect repeated retransmit failures on same packet */
>> -	if (r->prev_from != from) {
>> -		r->prev_from = from;
>> -		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
>> -	} else if (time_after(jiffies, r->stale_limit)) {
>> -		pr_warn("Retransmission failure on link <%s>\n", l->name);
>> -		link_print(l, "State of link ");
>> -		pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
>> -			msg_user(hdr), msg_type(hdr), msg_size(hdr),
>> -			msg_errcode(hdr));
>> -		pr_info("sqno %u, prev: %x, src: %x\n",
>> -			msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr));
>> -
>> -		trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
>> -		trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
>> -		trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
>> +	if (!TIPC_SKB_CB(skb)->retr_cnt)
>> +		return false;
>>  
>> -		if (link_is_bc_sndlink(l))
>> -			*rc = TIPC_LINK_DOWN_EVT;
>> +	if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp +
>> +			msecs_to_jiffies(r->tolerance)))
>> +		return false;
>> +
>> +	hdr = buf_msg(skb);
>> +	if (link_is_bc_sndlink(l) && !less(r->acked, msg_seqno(hdr)))
>> +		return false;
>>  
>> +	pr_warn("Retransmission failure on link <%s>\n", l->name);
>> +	link_print(l, "State of link ");
>> +	pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
>> +		msg_user(hdr), msg_type(hdr), msg_size(hdr), msg_errcode(hdr));
>> +	pr_info("sqno %u, prev: %x, dest: %x\n",
>> +		msg_seqno(hdr), msg_prevnode(hdr), msg_destnode(hdr));
>> +	pr_info("retr_stamp %d, retr_cnt %d\n",
>> +		jiffies_to_msecs(TIPC_SKB_CB(skb)->retr_stamp),
>> +		TIPC_SKB_CB(skb)->retr_cnt);
>> +
>> +	trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
>> +	trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
>> +	trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
>> +
>> +	if (link_is_bc_sndlink(l)) {
>> +		r->state = LINK_RESET;
>> +		*rc = TIPC_LINK_DOWN_EVT;
>> +	} else {
>>  		*rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
>> -		return true;
>>  	}
>>  
>> -	return false;
>> +	return true;
>>  }
>>  
>>  /* tipc_link_bc_retrans() - retransmit zero or more packets
>> @@ -1110,7 +1112,7 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>>  
>>  	trace_tipc_link_retrans(r, from, to, &l->transmq);
>>  
>> -	if (link_retransmit_failure(l, r, from, &rc))
>> +	if (link_retransmit_failure(l, r, &rc))
>>  		return rc;
>>  
>>  	skb_queue_walk(&l->transmq, skb) {
>> @@ -1119,11 +1121,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>>  			continue;
>>  		if (more(msg_seqno(hdr), to))
>>  			break;
>> -		if (link_is_bc_sndlink(l)) {
>> -			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
>> -				continue;
>> -			TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
>> -		}
>> +
>> +		if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
>> +			continue;
>> +		TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
>>  		_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE, GFP_ATOMIC);
>>  		if (!_skb)
>>  			return 0;
>> @@ -1133,6 +1134,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>>  		_skb->priority = TC_PRIO_CONTROL;
>>  		__skb_queue_tail(xmitq, _skb);
>>  		l->stats.retransmitted++;
>> +
>> +		/* Increase actual retrans counter & mark first time */
>> +		if (!TIPC_SKB_CB(skb)->retr_cnt++)
>> +			TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>>  	}
>>  	return 0;
>>  }
>> @@ -1357,12 +1362,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>>  	struct tipc_msg *hdr;
>>  	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>>  	u16 ack = l->rcv_nxt - 1;
>> +	bool passed = false;
>>  	u16 seqno, n = 0;
>>  	int rc = 0;
>>  
>> -	if (gap && link_retransmit_failure(l, l, acked + 1, &rc))
>> -		return rc;
>> -
> 
> I am also unable to understand why we would like to move
> link_retransmit_failure() to internal loop of skb_queue_walk_safe() by
> adding additional "passed" checking.
> 
> [Tuong]: Since the function does not only packet retransmissions but also releases, by moving the failure check into the loop at the 1st retransmission (if any) will allow some packets at the transmq head can be released first, before it would be claimed to be 'too stale'. This is the last chance for it!
> Also, since the 1st skb in the queue must be the oldest, we only need to check once, that's enough.
> 
> Thanks,
> Ying
> 
>>  	skb_queue_walk_safe(&l->transmq, skb, tmp) {
>>  		seqno = buf_seqno(skb);
>>  
>> @@ -1372,12 +1375,17 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>>  			__skb_unlink(skb, &l->transmq);
>>  			kfree_skb(skb);
>>  		} else if (less_eq(seqno, acked + gap)) {
>> -			/* retransmit skb */
>> +			/* First, check if repeated retrans failures occurs? */
>> +			if (!passed && link_retransmit_failure(l, l, &rc))
>> +				return rc;
>> +			passed = true;
>> +
>> +			/* retransmit skb if unrestricted*/
>>  			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
>>  				continue;
>>  			TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME;
>> -
>> -			_skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC);
>> +			_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE,
>> +					   GFP_ATOMIC);
>>  			if (!_skb)
>>  				continue;
>>  			hdr = buf_msg(_skb);
>> @@ -1386,6 +1394,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>>  			_skb->priority = TC_PRIO_CONTROL;
>>  			__skb_queue_tail(xmitq, _skb);
>>  			l->stats.retransmitted++;
>> +
>> +			/* Increase actual retrans counter & mark first time */
>> +			if (!TIPC_SKB_CB(skb)->retr_cnt++)
>> +				TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>>  		} else {
>>  			/* retry with Gap ACK blocks if any */
>>  			if (!ga || n >= ga->gack_cnt)
>> @@ -2577,7 +2589,7 @@ int tipc_link_dump(struct tipc_link *l, u16 dqueues, char *buf)
>>  	i += scnprintf(buf + i, sz - i, " %x", l->peer_caps);
>>  	i += scnprintf(buf + i, sz - i, " %u", l->silent_intv_cnt);
>>  	i += scnprintf(buf + i, sz - i, " %u", l->rst_cnt);
>> -	i += scnprintf(buf + i, sz - i, " %u", l->prev_from);
>> +	i += scnprintf(buf + i, sz - i, " %u", 0);
>>  	i += scnprintf(buf + i, sz - i, " %u", 0);
>>  	i += scnprintf(buf + i, sz - i, " %u", l->acked);
>>  
>> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
>> index da509f0eb9ca..d7ebc9e955f6 100644
>> --- a/net/tipc/msg.h
>> +++ b/net/tipc/msg.h
>> @@ -102,13 +102,15 @@ struct plist;
>>  #define TIPC_MEDIA_INFO_OFFSET	5
>>  
>>  struct tipc_skb_cb {
>> -	u32 bytes_read;
>> -	u32 orig_member;
>>  	struct sk_buff *tail;
>>  	unsigned long nxt_retr;
>> -	bool validated;
>> +	unsigned long retr_stamp;
>> +	u32 bytes_read;
>> +	u32 orig_member;
>>  	u16 chain_imp;
>>  	u16 ackers;
>> +	u16 retr_cnt;
>> +	bool validated;
>>  };
>>  
>>  #define TIPC_SKB_CB(__skb) ((struct tipc_skb_cb *)&((__skb)->cb[0]))
>>
> 
>

Re: [tipc-discussion] memory leak in tipc_group_create_member

From: Ying X. <yin...@wi...> - 2019-08-02 12:31:19

On 8/2/19 3:44 PM, Hillf Danton wrote:
> 
> On Thu, 01 Aug 2019 19:38:06 -0700
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    a9815a4f Merge branch 'x86-urgent-for-linus' of git://git...
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12a6dbf0600000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=37c48fb52e3789e6
>> dashboard link: https://syzkaller.appspot.com/bug?extid=f95d90c454864b3b5bc9
>> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13be3ecc600000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11c992b4600000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syz...@sy...
>>
>> executing program
>> BUG: memory leak
>> unreferenced object 0xffff888122bca200 (size 128):
>>    comm "syz-executor232", pid 7065, jiffies 4294943817 (age 8.880s)
>>    hex dump (first 32 bytes):
>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>      00 00 00 00 00 00 00 00 18 a2 bc 22 81 88 ff ff  ..........."....
>>    backtrace:
>>      [<000000005bada299>] kmemleak_alloc_recursive  include/linux/kmemleak.h:43 [inline]
>>      [<000000005bada299>] slab_post_alloc_hook mm/slab.h:522 [inline]
>>      [<000000005bada299>] slab_alloc mm/slab.c:3319 [inline]
>>      [<000000005bada299>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
>>      [<00000000e7bcdc9f>] kmalloc include/linux/slab.h:552 [inline]
>>      [<00000000e7bcdc9f>] kzalloc include/linux/slab.h:748 [inline]
>>      [<00000000e7bcdc9f>] tipc_group_create_member+0x3c/0x190  net/tipc/group.c:306
>>      [<0000000005f56f40>] tipc_group_add_member+0x34/0x40  net/tipc/group.c:327
>>      [<0000000044406683>] tipc_nametbl_build_group+0x9b/0xf0  net/tipc/name_table.c:600
>>      [<000000009f71e803>] tipc_sk_join net/tipc/socket.c:2901 [inline]
>>      [<000000009f71e803>] tipc_setsockopt+0x170/0x490 net/tipc/socket.c:3006
>>      [<000000007f61cbc2>] __sys_setsockopt+0x10f/0x220 net/socket.c:2084
>>      [<00000000cc630372>] __do_sys_setsockopt net/socket.c:2100 [inline]
>>      [<00000000cc630372>] __se_sys_setsockopt net/socket.c:2097 [inline]
>>      [<00000000cc630372>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2097
>>      [<00000000ec30be33>] do_syscall_64+0x76/0x1a0  arch/x86/entry/common.c:296
>>      [<00000000271be3e6>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>

Acked-by: Ying Xue <yin...@wi...>

>  
> --- a/net/tipc/group.c
> +++ b/net/tipc/group.c
> @@ -273,7 +273,7 @@ static struct tipc_member *tipc_group_fi
>  	return NULL;
>  }
>  
> -static void tipc_group_add_to_tree(struct tipc_group *grp,
> +static struct tipc_member *tipc_group_add_to_tree(struct tipc_group *grp,
>  				   struct tipc_member *m)
>  {
>  	u64 nkey, key = (u64)m->node << 32 | m->port;
> @@ -282,7 +282,6 @@ static void tipc_group_add_to_tree(struc
>  
>  	n = &grp->members.rb_node;
>  	while (*n) {
> -		tmp = container_of(*n, struct tipc_member, tree_node);
>  		parent = *n;
>  		tmp = container_of(parent, struct tipc_member, tree_node);
>  		nkey = (u64)tmp->node << 32 | tmp->port;
> @@ -291,17 +290,18 @@ static void tipc_group_add_to_tree(struc
>  		else if (key > nkey)
>  			n = &(*n)->rb_right;
>  		else
> -			return;
> +			return tmp;
>  	}
>  	rb_link_node(&m->tree_node, parent, n);
>  	rb_insert_color(&m->tree_node, &grp->members);
> +	return m;
>  }
>  
>  static struct tipc_member *tipc_group_create_member(struct tipc_group *grp,
>  						    u32 node, u32 port,
>  						    u32 instance, int state)
>  {
> -	struct tipc_member *m;
> +	struct tipc_member *m, *n;
>  
>  	m = kzalloc(sizeof(*m), GFP_ATOMIC);
>  	if (!m)
> @@ -315,10 +315,14 @@ static struct tipc_member *tipc_group_cr
>  	m->instance = instance;
>  	m->bc_acked = grp->bc_snd_nxt - 1;
>  	grp->member_cnt++;
> -	tipc_group_add_to_tree(grp, m);
> -	tipc_nlist_add(&grp->dests, m->node);
> -	m->state = state;
> -	return m;
> +	n = tipc_group_add_to_tree(grp, m);
> +	if (n == m) {
> +		tipc_nlist_add(&grp->dests, m->node);
> +		m->state = state;
> +	} else {
> +		kfree(m);
> +	}
> +	return n;
>  }
>  
>  void tipc_group_add_member(struct tipc_group *grp, u32 node,
> --
> 
>

Re: [tipc-discussion] [net-next 1/1] tipc: reduce risk of wakeup queue starvation

From: David M. <da...@da...> - 2019-08-01 22:20:13

From: Jon Maloy <jon...@er...>
Date: Tue, 30 Jul 2019 16:23:18 +0200

> In commit 365ad353c256 ("tipc: reduce risk of user starvation during
> link congestion") we allowed senders to add exactly one list of extra
> buffers to the link backlog queues during link congestion (aka
> "oversubscription"). However, the criteria for when to stop adding
> wakeup messages to the input queue when the overload abates is
> inaccurate, and may cause starvation problems during very high load.
> 
> Currently, we stop adding wakeup messages after 10 total failed attempts
> where we find that there is no space left in the backlog queue for a
> certain importance level. The counter for this is accumulated across all
> levels, which may lead the algorithm to leave the loop prematurely,
> although there may still be plenty of space available at some levels.
> The result is sometimes that messages near the wakeup queue tail are not
> added to the input queue as they should be.
> 
> We now introduce a more exact algorithm, where we keep adding wakeup
> messages to a level as long as the backlog queue has free slots for
> the corresponding level, and stop at the moment there are no more such
> slots or when there are no more wakeup messages to dequeue.
> 
> Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
> Reported-by: Tung Nguyen <tun...@de...>
> Acked-by: Ying Xue <yin...@wi...>
> Signed-off-by: Jon Maloy <jon...@er...>

Applied, thank you.

Re: [tipc-discussion] [PATCH] tipc: compat: allow tipc commands without arguments

From: David M. <da...@da...> - 2019-08-01 22:15:25

From: Taras Kondratiuk <tak...@ci...>
Date: Mon, 29 Jul 2019 22:15:07 +0000

> Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
> broke older tipc tools that use compat interface (e.g. tipc-config from
> tipcutils package):
 ...
> This patch relaxes the original fix and rejects messages without
> arguments only if such arguments are expected by a command (reg_type is
> non zero).
> 
> Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
> Signed-off-by: Taras Kondratiuk <tak...@ci...>
> ---
> The patch is based on v5.3-rc2.

Applied and queued up for -stable.

Re: [tipc-discussion] [PATCH] tipc: compat: allow tipc commands without arguments

From: Ying X. <yin...@wi...> - 2019-08-01 13:25:36

On 7/30/19 6:15 AM, Taras Kondratiuk wrote:
> Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
> broke older tipc tools that use compat interface (e.g. tipc-config from
> tipcutils package):
> 
> % tipc-config -p
> operation not supported
> 
> The commit started to reject TIPC netlink compat messages that do not
> have attributes. It is too restrictive because some of such messages are
> valid (they don't need any arguments):
> 
> % grep 'tx none' include/uapi/linux/tipc_config.h
> #define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
> #define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
> #define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
> #define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
> #define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
> #define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
> #define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
> #define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */
> 
> This patch relaxes the original fix and rejects messages without
> arguments only if such arguments are expected by a command (reg_type is
> non zero).
> 
> Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
> Cc: st...@vg...
> Signed-off-by: Taras Kondratiuk <tak...@ci...>

Acked-by: Ying Xue <yin...@wi...>

> ---
> The patch is based on v5.3-rc2.
> 
>  net/tipc/netlink_compat.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
> index d86030ef1232..e135d4e11231 100644
> --- a/net/tipc/netlink_compat.c
> +++ b/net/tipc/netlink_compat.c
> @@ -55,6 +55,7 @@ struct tipc_nl_compat_msg {
>  	int rep_type;
>  	int rep_size;
>  	int req_type;
> +	int req_size;
>  	struct net *net;
>  	struct sk_buff *rep;
>  	struct tlv_desc *req;
> @@ -257,7 +258,8 @@ static int tipc_nl_compat_dumpit(struct tipc_nl_compat_cmd_dump *cmd,
>  	int err;
>  	struct sk_buff *arg;
>  
> -	if (msg->req_type && !TLV_CHECK_TYPE(msg->req, msg->req_type))
> +	if (msg->req_type && (!msg->req_size ||
> +			      !TLV_CHECK_TYPE(msg->req, msg->req_type)))
>  		return -EINVAL;
>  
>  	msg->rep = tipc_tlv_alloc(msg->rep_size);
> @@ -354,7 +356,8 @@ static int tipc_nl_compat_doit(struct tipc_nl_compat_cmd_doit *cmd,
>  {
>  	int err;
>  
> -	if (msg->req_type && !TLV_CHECK_TYPE(msg->req, msg->req_type))
> +	if (msg->req_type && (!msg->req_size ||
> +			      !TLV_CHECK_TYPE(msg->req, msg->req_type)))
>  		return -EINVAL;
>  
>  	err = __tipc_nl_compat_doit(cmd, msg);
> @@ -1278,8 +1281,8 @@ static int tipc_nl_compat_recv(struct sk_buff *skb, struct genl_info *info)
>  		goto send;
>  	}
>  
> -	len = nlmsg_attrlen(req_nlh, GENL_HDRLEN + TIPC_GENL_HDRLEN);
> -	if (!len || !TLV_OK(msg.req, len)) {
> +	msg.req_size = nlmsg_attrlen(req_nlh, GENL_HDRLEN + TIPC_GENL_HDRLEN);
> +	if (msg.req_size && !TLV_OK(msg.req, msg.req_size)) {
>  		msg.rep = tipc_get_err_tlv(TIPC_CFG_NOT_SUPPORTED);
>  		err = -EOPNOTSUPP;
>  		goto send;
>

Re: [tipc-discussion] [net] tipc: fix false detection of retransmit failures

From: Tuong L. T. <tuo...@de...> - 2019-08-01 02:59:08

Hi Ying,

See below my answers inline.

BR/Tuong

-----Original Message-----
From: Ying Xue <yin...@wi...> 
Sent: Wednesday, July 31, 2019 8:21 PM
To: Tuong Lien <tuo...@de...>; tip...@li...; jon...@er...; ma...@do...
Subject: Re: [net] tipc: fix false detection of retransmit failures

On 7/19/19 11:56 AM, Tuong Lien wrote:
> This commit eliminates the use of the link 'stale_limit' & 'prev_from'
> (besides the already removed - 'stale_cnt') variables in the detection
> of repeated retransmit failures as there is no proper way to initialize
> them to avoid a false detection, i.e. it is not really a retransmission
> failure but due to a garbage values in the variables.

Sorry, I couldn't understand the reason why we have no proper way to
initialize 'stale_limit' & 'prev_from' variables of tipc_link struct.

As far as I understand, the two variables have been set to zero when
tipc_link object is allocated with kzalloc() in tipc_link_create().

Can you please help me understand what real scenario we cannot properly
set them.

[Tuong]: Yes, these two variables have been initialized to zero at the link create but zero or any other value is not 'safe' for them, the retransmit failure criteria can be met without a real failure. Specifically, the 'time_after()' can return 'true' unexpectedly due to a garbage value (like zero...) of the 'stale_limit' unless it's intentionally set in the 1st condition. However, the 1st condition with the 'prev_from' is not always satisfied. In case the next 'from' is coincidentally identical to the 'prev_from', we will miss it.

> -	if (r->prev_from != from) {
> -		r->prev_from = from;
> -		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
> -	} else if (time_after(jiffies, r->stale_limit)) {
> -		pr_warn("Retransmission failure on link <%s>\n", l->name);

For example:
1) n-th retransmit: (prev_from = x, from = 100)
==> ok, we set the variables: prev_from = 100, stale_limit = jiffies + 1.5s
2) now, this pkt #100 was retransmitted successfully...
3) Later on, n+1-th retransmit: (prev_from = 100, from = 100)
-> We don’t set the 'stale_limit' but do the “else if” and bump!

Now, we can try to reset or re-initialize the 'prev_from' somehow, e.g. when the pkt #100 is ack-ed & released, but what value will be for it? Note, any value is a valid sequence number, and if the next 'from' equals that value, we will face with the same trouble again.
Trying to set the 'stale_limit' to very far in the future is irrelevant too because it turns out that we will disable the feature if the same 'from' is really faced with the repeated retransmit failure!

> 
> Instead, a jiffies variable will be added to individual skbs (like the
> way we restrict the skb retransmissions) in order to mark the first skb
> retransmit time. Later on, at the next retransmissions, the timestamp
> will be checked to see if the skb in the link transmq is "too stale",
> that is, the link tolerance time has passed, so that a link reset will
> be ordered. Note, just checking on the first skb in the queue is fine
> enough since it must be the oldest one.
> A counter is also added to keep track the actual skb retransmissions'
> number for later checking when the failure happens.
> 
> The downside of this approach is that the skb->cb[] buffer is about to
> be exhausted, however it is always able to allocate another memory area
> and keep a reference to it when needed.
> 
> Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
> Reported-by: Hoang Le <hoa...@de...>
> Signed-off-by: Tuong Lien <tuo...@de...>
> ---
>  net/tipc/link.c | 92 ++++++++++++++++++++++++++++++++-------------------------
>  net/tipc/msg.h  |  8 +++--
>  2 files changed, 57 insertions(+), 43 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 66d3a07bc571..c2c5c53cad22 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -106,8 +106,6 @@ struct tipc_stats {
>   * @transmitq: queue for sent, non-acked messages
>   * @backlogq: queue for messages waiting to be sent
>   * @snt_nxt: next sequence number to use for outbound messages
> - * @prev_from: sequence number of most previous retransmission request
> - * @stale_limit: time when repeated identical retransmits must force link reset
>   * @ackers: # of peers that needs to ack each packet before it can be released
>   * @acked: # last packet acked by a certain peer. Used for broadcast.
>   * @rcv_nxt: next sequence number to expect for inbound messages
> @@ -164,9 +162,7 @@ struct tipc_link {
>  		u16 limit;
>  	} backlog[5];
>  	u16 snd_nxt;
> -	u16 prev_from;
>  	u16 window;
> -	unsigned long stale_limit;
>  
>  	/* Reception */
>  	u16 rcv_nxt;
> @@ -1044,47 +1040,53 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
>   * link_retransmit_failure() - Detect repeated retransmit failures
>   * @l: tipc link sender
>   * @r: tipc link receiver (= l in case of unicast)
> - * @from: seqno of the 1st packet in retransmit request
>   * @rc: returned code
>   *
>   * Return: true if the repeated retransmit failures happens, otherwise
>   * false
>   */
>  static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r,
> -				    u16 from, int *rc)
> +				    int *rc)
>  {
>  	struct sk_buff *skb = skb_peek(&l->transmq);
>  	struct tipc_msg *hdr;
>  
>  	if (!skb)
>  		return false;
> -	hdr = buf_msg(skb);
>  
> -	/* Detect repeated retransmit failures on same packet */
> -	if (r->prev_from != from) {
> -		r->prev_from = from;
> -		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
> -	} else if (time_after(jiffies, r->stale_limit)) {
> -		pr_warn("Retransmission failure on link <%s>\n", l->name);
> -		link_print(l, "State of link ");
> -		pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
> -			msg_user(hdr), msg_type(hdr), msg_size(hdr),
> -			msg_errcode(hdr));
> -		pr_info("sqno %u, prev: %x, src: %x\n",
> -			msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr));
> -
> -		trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
> -		trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
> -		trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
> +	if (!TIPC_SKB_CB(skb)->retr_cnt)
> +		return false;
>  
> -		if (link_is_bc_sndlink(l))
> -			*rc = TIPC_LINK_DOWN_EVT;
> +	if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp +
> +			msecs_to_jiffies(r->tolerance)))
> +		return false;
> +
> +	hdr = buf_msg(skb);
> +	if (link_is_bc_sndlink(l) && !less(r->acked, msg_seqno(hdr)))
> +		return false;
>  
> +	pr_warn("Retransmission failure on link <%s>\n", l->name);
> +	link_print(l, "State of link ");
> +	pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
> +		msg_user(hdr), msg_type(hdr), msg_size(hdr), msg_errcode(hdr));
> +	pr_info("sqno %u, prev: %x, dest: %x\n",
> +		msg_seqno(hdr), msg_prevnode(hdr), msg_destnode(hdr));
> +	pr_info("retr_stamp %d, retr_cnt %d\n",
> +		jiffies_to_msecs(TIPC_SKB_CB(skb)->retr_stamp),
> +		TIPC_SKB_CB(skb)->retr_cnt);
> +
> +	trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
> +	trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
> +	trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
> +
> +	if (link_is_bc_sndlink(l)) {
> +		r->state = LINK_RESET;
> +		*rc = TIPC_LINK_DOWN_EVT;
> +	} else {
>  		*rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
> -		return true;
>  	}
>  
> -	return false;
> +	return true;
>  }
>  
>  /* tipc_link_bc_retrans() - retransmit zero or more packets
> @@ -1110,7 +1112,7 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  
>  	trace_tipc_link_retrans(r, from, to, &l->transmq);
>  
> -	if (link_retransmit_failure(l, r, from, &rc))
> +	if (link_retransmit_failure(l, r, &rc))
>  		return rc;
>  
>  	skb_queue_walk(&l->transmq, skb) {
> @@ -1119,11 +1121,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  			continue;
>  		if (more(msg_seqno(hdr), to))
>  			break;
> -		if (link_is_bc_sndlink(l)) {
> -			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
> -				continue;
> -			TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
> -		}
> +
> +		if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
> +			continue;
> +		TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
>  		_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE, GFP_ATOMIC);
>  		if (!_skb)
>  			return 0;
> @@ -1133,6 +1134,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  		_skb->priority = TC_PRIO_CONTROL;
>  		__skb_queue_tail(xmitq, _skb);
>  		l->stats.retransmitted++;
> +
> +		/* Increase actual retrans counter & mark first time */
> +		if (!TIPC_SKB_CB(skb)->retr_cnt++)
> +			TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>  	}
>  	return 0;
>  }
> @@ -1357,12 +1362,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  	struct tipc_msg *hdr;
>  	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>  	u16 ack = l->rcv_nxt - 1;
> +	bool passed = false;
>  	u16 seqno, n = 0;
>  	int rc = 0;
>  
> -	if (gap && link_retransmit_failure(l, l, acked + 1, &rc))
> -		return rc;
> -

I am also unable to understand why we would like to move
link_retransmit_failure() to internal loop of skb_queue_walk_safe() by
adding additional "passed" checking.

[Tuong]: Since the function does not only packet retransmissions but also releases, by moving the failure check into the loop at the 1st retransmission (if any) will allow some packets at the transmq head can be released first, before it would be claimed to be 'too stale'. This is the last chance for it!
Also, since the 1st skb in the queue must be the oldest, we only need to check once, that's enough.

Thanks,
Ying

>  	skb_queue_walk_safe(&l->transmq, skb, tmp) {
>  		seqno = buf_seqno(skb);
>  
> @@ -1372,12 +1375,17 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  			__skb_unlink(skb, &l->transmq);
>  			kfree_skb(skb);
>  		} else if (less_eq(seqno, acked + gap)) {
> -			/* retransmit skb */
> +			/* First, check if repeated retrans failures occurs? */
> +			if (!passed && link_retransmit_failure(l, l, &rc))
> +				return rc;
> +			passed = true;
> +
> +			/* retransmit skb if unrestricted*/
>  			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
>  				continue;
>  			TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME;
> -
> -			_skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC);
> +			_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE,
> +					   GFP_ATOMIC);
>  			if (!_skb)
>  				continue;
>  			hdr = buf_msg(_skb);
> @@ -1386,6 +1394,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  			_skb->priority = TC_PRIO_CONTROL;
>  			__skb_queue_tail(xmitq, _skb);
>  			l->stats.retransmitted++;
> +
> +			/* Increase actual retrans counter & mark first time */
> +			if (!TIPC_SKB_CB(skb)->retr_cnt++)
> +				TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>  		} else {
>  			/* retry with Gap ACK blocks if any */
>  			if (!ga || n >= ga->gack_cnt)
> @@ -2577,7 +2589,7 @@ int tipc_link_dump(struct tipc_link *l, u16 dqueues, char *buf)
>  	i += scnprintf(buf + i, sz - i, " %x", l->peer_caps);
>  	i += scnprintf(buf + i, sz - i, " %u", l->silent_intv_cnt);
>  	i += scnprintf(buf + i, sz - i, " %u", l->rst_cnt);
> -	i += scnprintf(buf + i, sz - i, " %u", l->prev_from);
> +	i += scnprintf(buf + i, sz - i, " %u", 0);
>  	i += scnprintf(buf + i, sz - i, " %u", 0);
>  	i += scnprintf(buf + i, sz - i, " %u", l->acked);
>  
> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
> index da509f0eb9ca..d7ebc9e955f6 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -102,13 +102,15 @@ struct plist;
>  #define TIPC_MEDIA_INFO_OFFSET	5
>  
>  struct tipc_skb_cb {
> -	u32 bytes_read;
> -	u32 orig_member;
>  	struct sk_buff *tail;
>  	unsigned long nxt_retr;
> -	bool validated;
> +	unsigned long retr_stamp;
> +	u32 bytes_read;
> +	u32 orig_member;
>  	u16 chain_imp;
>  	u16 ackers;
> +	u16 retr_cnt;
> +	bool validated;
>  };
>  
>  #define TIPC_SKB_CB(__skb) ((struct tipc_skb_cb *)&((__skb)->cb[0]))
>

Re: [tipc-discussion] [PATCH] tipc: compat: allow tipc commands without arguments

From: David M. <da...@da...> - 2019-07-31 22:53:05

Jon et al., please review.

Thank you.

Re: [tipc-discussion] [net-next v3] tipc: add loopback device tracking

From: Ying X. <yin...@wi...> - 2019-07-31 13:35:19

On 7/19/19 11:10 AM, joh...@de... wrote:
> From: John Rutherford <joh...@de...>
> 
> Since node internal messages are passed directly to the socket, it is not
> possible to observe those messages via tcpdump or wireshark.
> 
> We now remedy this by making it possible to clone such messages and send
> the clones to the loopback interface.  The clones are dropped at reception
> and have no functional role except making the traffic visible.
> 
> The feature is enabled if network taps are active for the loopback device.
> pcap filtering restrictions require the messages to be presented to the
> receiving side of the loopback device.
> 
> v3 - Function dev_nit_active used to check for network taps.
>    - Procedure netif_rx_ni used to send cloned messages to loopback device.
> 
> Signed-off-by: John Rutherford <joh...@de...>

Acked-by: Ying Xue <yin...@wi...>

> ---
>  net/tipc/bcast.c  |  4 +++-
>  net/tipc/bearer.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/tipc/bearer.h | 10 +++++++++
>  net/tipc/core.c   |  5 +++++
>  net/tipc/core.h   |  3 +++
>  net/tipc/node.c   |  1 +
>  net/tipc/topsrv.c |  2 ++
>  7 files changed, 89 insertions(+), 1 deletion(-)
> 
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index 6c997d4..235331d 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -406,8 +406,10 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
>  			rc = tipc_bcast_xmit(net, pkts, cong_link_cnt);
>  	}
>  
> -	if (dests->local)
> +	if (dests->local) {
> +		tipc_loopback_trace(net, &localq);
>  		tipc_sk_mcast_rcv(net, &localq, &inputq);
> +	}
>  exit:
>  	/* This queue should normally be empty by now */
>  	__skb_queue_purge(pkts);
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index 2bed658..5273ec5 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -389,6 +389,12 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
>  		dev_put(dev);
>  		return -EINVAL;
>  	}
> +	if (dev == net->loopback_dev) {
> +		dev_put(dev);
> +		pr_info("Bearer <%s>: loopback device not permitted\n",
> +			b->name);
> +		return -EINVAL;
> +	}
>  
>  	/* Autoconfigure own node identity if needed */
>  	if (!tipc_own_id(net) && hwaddr_len <= NODE_ID_LEN) {
> @@ -674,6 +680,65 @@ void tipc_bearer_stop(struct net *net)
>  	}
>  }
>  
> +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts)
> +{
> +	struct net_device *dev = net->loopback_dev;
> +	struct sk_buff *skb, *_skb;
> +	int exp;
> +
> +	skb_queue_walk(pkts, _skb) {
> +		skb = pskb_copy(_skb, GFP_ATOMIC);
> +		if (!skb)
> +			continue;
> +
> +		exp = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
> +		if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) {
> +			kfree_skb(skb);
> +			continue;
> +		}
> +
> +		skb_reset_network_header(skb);
> +		dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr,
> +				dev->dev_addr, skb->len);
> +		skb->dev = dev;
> +		skb->pkt_type = PACKET_HOST;
> +		skb->ip_summed = CHECKSUM_UNNECESSARY;
> +		skb->protocol = eth_type_trans(skb, dev);
> +		netif_rx_ni(skb);
> +	}
> +}
> +
> +static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev,
> +				 struct packet_type *pt, struct net_device *od)
> +{
> +	consume_skb(skb);
> +	return NET_RX_SUCCESS;
> +}
> +
> +int tipc_attach_loopback(struct net *net)
> +{
> +	struct net_device *dev = net->loopback_dev;
> +	struct tipc_net *tn = tipc_net(net);
> +
> +	if (!dev)
> +		return -ENODEV;
> +
> +	dev_hold(dev);
> +	tn->loopback_pt.dev = dev;
> +	tn->loopback_pt.type = htons(ETH_P_TIPC);
> +	tn->loopback_pt.func = tipc_loopback_rcv_pkt;
> +	dev_add_pack(&tn->loopback_pt);
> +	return 0;
> +}
> +
> +void tipc_detach_loopback(struct net *net)
> +{
> +	struct tipc_net *tn = tipc_net(net);
> +
> +	dev_remove_pack(&tn->loopback_pt);
> +	dev_put(net->loopback_dev);
> +}
> +
>  /* Caller should hold rtnl_lock to protect the bearer */
>  static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
>  				struct tipc_bearer *bearer, int nlflags)
> diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
> index 7f4c569..ea0f3c4 100644
> --- a/net/tipc/bearer.h
> +++ b/net/tipc/bearer.h
> @@ -232,6 +232,16 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
>  		      struct tipc_media_addr *dst);
>  void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
>  			 struct sk_buff_head *xmitq);
> +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts);
> +int tipc_attach_loopback(struct net *net);
> +void tipc_detach_loopback(struct net *net);
> +
> +static inline void tipc_loopback_trace(struct net *net,
> +				       struct sk_buff_head *pkts)
> +{
> +	if (unlikely(dev_nit_active(net->loopback_dev)))
> +		tipc_clone_to_loopback(net, pkts);
> +}
>  
>  /* check if device MTU is too low for tipc headers */
>  static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve)
> diff --git a/net/tipc/core.c b/net/tipc/core.c
> index c837072..23cb379 100644
> --- a/net/tipc/core.c
> +++ b/net/tipc/core.c
> @@ -82,6 +82,10 @@ static int __net_init tipc_init_net(struct net *net)
>  	if (err)
>  		goto out_bclink;
>  
> +	err = tipc_attach_loopback(net);
> +	if (err)
> +		goto out_bclink;
> +
>  	return 0;
>  
>  out_bclink:
> @@ -94,6 +98,7 @@ static int __net_init tipc_init_net(struct net *net)
>  
>  static void __net_exit tipc_exit_net(struct net *net)
>  {
> +	tipc_detach_loopback(net);
>  	tipc_net_stop(net);
>  	tipc_bcast_stop(net);
>  	tipc_nametbl_stop(net);
> diff --git a/net/tipc/core.h b/net/tipc/core.h
> index 7a68e1b..60d8295 100644
> --- a/net/tipc/core.h
> +++ b/net/tipc/core.h
> @@ -125,6 +125,9 @@ struct tipc_net {
>  
>  	/* Cluster capabilities */
>  	u16 capabilities;
> +
> +	/* Tracing of node internal messages */
> +	struct packet_type loopback_pt;
>  };
>  
>  static inline struct tipc_net *tipc_net(struct net *net)
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index 550581d..16d251b 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -1443,6 +1443,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list,
>  	int rc;
>  
>  	if (in_own_node(net, dnode)) {
> +		tipc_loopback_trace(net, list);
>  		tipc_sk_rcv(net, list);
>  		return 0;
>  	}
> diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
> index f345662..e3a6ba1 100644
> --- a/net/tipc/topsrv.c
> +++ b/net/tipc/topsrv.c
> @@ -40,6 +40,7 @@
>  #include "socket.h"
>  #include "addr.h"
>  #include "msg.h"
> +#include "bearer.h"
>  #include <net/sock.h>
>  #include <linux/module.h>
>  
> @@ -608,6 +609,7 @@ static void tipc_topsrv_kern_evt(struct net *net, struct tipc_event *evt)
>  	memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt));
>  	skb_queue_head_init(&evtq);
>  	__skb_queue_tail(&evtq, skb);
> +	tipc_loopback_trace(net, &evtq);
>  	tipc_sk_rcv(net, &evtq);
>  }
>  
>

Re: [tipc-discussion] [net] tipc: fix false detection of retransmit failures

From: Ying X. <yin...@wi...> - 2019-07-31 13:33:31

On 7/19/19 11:56 AM, Tuong Lien wrote:
> This commit eliminates the use of the link 'stale_limit' & 'prev_from'
> (besides the already removed - 'stale_cnt') variables in the detection
> of repeated retransmit failures as there is no proper way to initialize
> them to avoid a false detection, i.e. it is not really a retransmission
> failure but due to a garbage values in the variables.

Sorry, I couldn't understand the reason why we have no proper way to
initialize 'stale_limit' & 'prev_from' variables of tipc_link struct.

As far as I understand, the two variables have been set to zero when
tipc_link object is allocated with kzalloc() in tipc_link_create().

Can you please help me understand what real scenario we cannot properly
set them.

> 
> Instead, a jiffies variable will be added to individual skbs (like the
> way we restrict the skb retransmissions) in order to mark the first skb
> retransmit time. Later on, at the next retransmissions, the timestamp
> will be checked to see if the skb in the link transmq is "too stale",
> that is, the link tolerance time has passed, so that a link reset will
> be ordered. Note, just checking on the first skb in the queue is fine
> enough since it must be the oldest one.
> A counter is also added to keep track the actual skb retransmissions'
> number for later checking when the failure happens.
> 
> The downside of this approach is that the skb->cb[] buffer is about to
> be exhausted, however it is always able to allocate another memory area
> and keep a reference to it when needed.
> 
> Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
> Reported-by: Hoang Le <hoa...@de...>
> Signed-off-by: Tuong Lien <tuo...@de...>
> ---
>  net/tipc/link.c | 92 ++++++++++++++++++++++++++++++++-------------------------
>  net/tipc/msg.h  |  8 +++--
>  2 files changed, 57 insertions(+), 43 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 66d3a07bc571..c2c5c53cad22 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -106,8 +106,6 @@ struct tipc_stats {
>   * @transmitq: queue for sent, non-acked messages
>   * @backlogq: queue for messages waiting to be sent
>   * @snt_nxt: next sequence number to use for outbound messages
> - * @prev_from: sequence number of most previous retransmission request
> - * @stale_limit: time when repeated identical retransmits must force link reset
>   * @ackers: # of peers that needs to ack each packet before it can be released
>   * @acked: # last packet acked by a certain peer. Used for broadcast.
>   * @rcv_nxt: next sequence number to expect for inbound messages
> @@ -164,9 +162,7 @@ struct tipc_link {
>  		u16 limit;
>  	} backlog[5];
>  	u16 snd_nxt;
> -	u16 prev_from;
>  	u16 window;
> -	unsigned long stale_limit;
>  
>  	/* Reception */
>  	u16 rcv_nxt;
> @@ -1044,47 +1040,53 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
>   * link_retransmit_failure() - Detect repeated retransmit failures
>   * @l: tipc link sender
>   * @r: tipc link receiver (= l in case of unicast)
> - * @from: seqno of the 1st packet in retransmit request
>   * @rc: returned code
>   *
>   * Return: true if the repeated retransmit failures happens, otherwise
>   * false
>   */
>  static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r,
> -				    u16 from, int *rc)
> +				    int *rc)
>  {
>  	struct sk_buff *skb = skb_peek(&l->transmq);
>  	struct tipc_msg *hdr;
>  
>  	if (!skb)
>  		return false;
> -	hdr = buf_msg(skb);
>  
> -	/* Detect repeated retransmit failures on same packet */
> -	if (r->prev_from != from) {
> -		r->prev_from = from;
> -		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
> -	} else if (time_after(jiffies, r->stale_limit)) {
> -		pr_warn("Retransmission failure on link <%s>\n", l->name);
> -		link_print(l, "State of link ");
> -		pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
> -			msg_user(hdr), msg_type(hdr), msg_size(hdr),
> -			msg_errcode(hdr));
> -		pr_info("sqno %u, prev: %x, src: %x\n",
> -			msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr));
> -
> -		trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
> -		trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
> -		trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
> +	if (!TIPC_SKB_CB(skb)->retr_cnt)
> +		return false;
>  
> -		if (link_is_bc_sndlink(l))
> -			*rc = TIPC_LINK_DOWN_EVT;
> +	if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp +
> +			msecs_to_jiffies(r->tolerance)))
> +		return false;
> +
> +	hdr = buf_msg(skb);
> +	if (link_is_bc_sndlink(l) && !less(r->acked, msg_seqno(hdr)))
> +		return false;
>  
> +	pr_warn("Retransmission failure on link <%s>\n", l->name);
> +	link_print(l, "State of link ");
> +	pr_info("Failed msg: usr %u, typ %u, len %u, err %u\n",
> +		msg_user(hdr), msg_type(hdr), msg_size(hdr), msg_errcode(hdr));
> +	pr_info("sqno %u, prev: %x, dest: %x\n",
> +		msg_seqno(hdr), msg_prevnode(hdr), msg_destnode(hdr));
> +	pr_info("retr_stamp %d, retr_cnt %d\n",
> +		jiffies_to_msecs(TIPC_SKB_CB(skb)->retr_stamp),
> +		TIPC_SKB_CB(skb)->retr_cnt);
> +
> +	trace_tipc_list_dump(&l->transmq, true, "retrans failure!");
> +	trace_tipc_link_dump(l, TIPC_DUMP_NONE, "retrans failure!");
> +	trace_tipc_link_dump(r, TIPC_DUMP_NONE, "retrans failure!");
> +
> +	if (link_is_bc_sndlink(l)) {
> +		r->state = LINK_RESET;
> +		*rc = TIPC_LINK_DOWN_EVT;
> +	} else {
>  		*rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
> -		return true;
>  	}
>  
> -	return false;
> +	return true;
>  }
>  
>  /* tipc_link_bc_retrans() - retransmit zero or more packets
> @@ -1110,7 +1112,7 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  
>  	trace_tipc_link_retrans(r, from, to, &l->transmq);
>  
> -	if (link_retransmit_failure(l, r, from, &rc))
> +	if (link_retransmit_failure(l, r, &rc))
>  		return rc;
>  
>  	skb_queue_walk(&l->transmq, skb) {
> @@ -1119,11 +1121,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  			continue;
>  		if (more(msg_seqno(hdr), to))
>  			break;
> -		if (link_is_bc_sndlink(l)) {
> -			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
> -				continue;
> -			TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
> -		}
> +
> +		if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
> +			continue;
> +		TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
>  		_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE, GFP_ATOMIC);
>  		if (!_skb)
>  			return 0;
> @@ -1133,6 +1134,10 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  		_skb->priority = TC_PRIO_CONTROL;
>  		__skb_queue_tail(xmitq, _skb);
>  		l->stats.retransmitted++;
> +
> +		/* Increase actual retrans counter & mark first time */
> +		if (!TIPC_SKB_CB(skb)->retr_cnt++)
> +			TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>  	}
>  	return 0;
>  }
> @@ -1357,12 +1362,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  	struct tipc_msg *hdr;
>  	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>  	u16 ack = l->rcv_nxt - 1;
> +	bool passed = false;
>  	u16 seqno, n = 0;
>  	int rc = 0;
>  
> -	if (gap && link_retransmit_failure(l, l, acked + 1, &rc))
> -		return rc;
> -

I am also unable to understand why we would like to move
link_retransmit_failure() to internal loop of skb_queue_walk_safe() by
adding additional "passed" checking.

Thanks,
Ying

>  	skb_queue_walk_safe(&l->transmq, skb, tmp) {
>  		seqno = buf_seqno(skb);
>  
> @@ -1372,12 +1375,17 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  			__skb_unlink(skb, &l->transmq);
>  			kfree_skb(skb);
>  		} else if (less_eq(seqno, acked + gap)) {
> -			/* retransmit skb */
> +			/* First, check if repeated retrans failures occurs? */
> +			if (!passed && link_retransmit_failure(l, l, &rc))
> +				return rc;
> +			passed = true;
> +
> +			/* retransmit skb if unrestricted*/
>  			if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
>  				continue;
>  			TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME;
> -
> -			_skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC);
> +			_skb = __pskb_copy(skb, LL_MAX_HEADER + MIN_H_SIZE,
> +					   GFP_ATOMIC);
>  			if (!_skb)
>  				continue;
>  			hdr = buf_msg(_skb);
> @@ -1386,6 +1394,10 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  			_skb->priority = TC_PRIO_CONTROL;
>  			__skb_queue_tail(xmitq, _skb);
>  			l->stats.retransmitted++;
> +
> +			/* Increase actual retrans counter & mark first time */
> +			if (!TIPC_SKB_CB(skb)->retr_cnt++)
> +				TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>  		} else {
>  			/* retry with Gap ACK blocks if any */
>  			if (!ga || n >= ga->gack_cnt)
> @@ -2577,7 +2589,7 @@ int tipc_link_dump(struct tipc_link *l, u16 dqueues, char *buf)
>  	i += scnprintf(buf + i, sz - i, " %x", l->peer_caps);
>  	i += scnprintf(buf + i, sz - i, " %u", l->silent_intv_cnt);
>  	i += scnprintf(buf + i, sz - i, " %u", l->rst_cnt);
> -	i += scnprintf(buf + i, sz - i, " %u", l->prev_from);
> +	i += scnprintf(buf + i, sz - i, " %u", 0);
>  	i += scnprintf(buf + i, sz - i, " %u", 0);
>  	i += scnprintf(buf + i, sz - i, " %u", l->acked);
>  
> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
> index da509f0eb9ca..d7ebc9e955f6 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -102,13 +102,15 @@ struct plist;
>  #define TIPC_MEDIA_INFO_OFFSET	5
>  
>  struct tipc_skb_cb {
> -	u32 bytes_read;
> -	u32 orig_member;
>  	struct sk_buff *tail;
>  	unsigned long nxt_retr;
> -	bool validated;
> +	unsigned long retr_stamp;
> +	u32 bytes_read;
> +	u32 orig_member;
>  	u16 chain_imp;
>  	u16 ackers;
> +	u16 retr_cnt;
> +	bool validated;
>  };
>  
>  #define TIPC_SKB_CB(__skb) ((struct tipc_skb_cb *)&((__skb)->cb[0]))
>

Re: [tipc-discussion] [net 1/1] tipc: fix unitilized skb list crash

From: David M. <da...@da...> - 2019-07-30 21:40:37

From: Jon Maloy <jon...@er...>
Date: Tue, 30 Jul 2019 20:19:10 +0200

> Our test suite somtimes provokes the following crash:
> 
> Description of problem:
 ...
> The reason is that the skb list tipc_socket::mc_method.deferredq only
> is initialized for connectionless sockets, while nothing stops arriving
> multicast messages from being filtered by connection oriented sockets,
> with subsequent access to the said list.
> 
> We fix this by initializing the list unconditionally at socket creation.
> This eliminates the crash, while the message still is dropped further
> down in tipc_sk_filter_rcv() as it should be.
> 
> Reported-by: Li Shuang <sh...@re...>
> Signed-off-by: Jon Maloy <jon...@er...>

Applied and queued up for -stable, thank you.

[tipc-discussion] [net 1/1] tipc: fix unitilized skb list crash

From: Jon M. <jon...@er...> - 2019-07-30 18:19:21

Our test suite somtimes provokes the following crash:

Description of problem:
[ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
[ 1092.605072] PGD 0 P4D 0
[ 1092.607620] Oops: 0000 [#1] SMP PTI
[ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1
[ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018
[ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc]
[ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7
[ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282
[ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045
[ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000
[ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a
[ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000
[ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a
[ 1092.692614] FS:  0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000
[ 1092.700702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0
[ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1092.727843] PKRU: 55555554
[ 1092.730556] Call Trace:
[ 1092.733010]  <IRQ>
[ 1092.735034]  tipc_sk_filter_rcv+0x7ca/0xb80 [tipc]
[ 1092.739828]  ? __kmalloc_node_track_caller+0x1cb/0x290
[ 1092.744974]  ? dev_hard_start_xmit+0xa5/0x210
[ 1092.749332]  tipc_sk_rcv+0x389/0x640 [tipc]
[ 1092.753519]  tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc]
[ 1092.758224]  tipc_rcv+0x57a/0xf20 [tipc]
[ 1092.762154]  ? ktime_get_real_ts64+0x40/0xe0
[ 1092.766432]  ? tpacket_rcv+0x50/0x9f0
[ 1092.770098]  tipc_l2_rcv_msg+0x4a/0x70 [tipc]
[ 1092.774452]  __netif_receive_skb_core+0xb62/0xbd0
[ 1092.779164]  ? enqueue_entity+0xf6/0x630
[ 1092.783084]  ? kmem_cache_alloc+0x158/0x1c0
[ 1092.787272]  ? __build_skb+0x25/0xd0
[ 1092.790849]  netif_receive_skb_internal+0x42/0xf0
[ 1092.795557]  napi_gro_receive+0xba/0xe0
[ 1092.799417]  mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core]
[ 1092.804564]  mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core]
[ 1092.809536]  mlx5e_napi_poll+0xb2/0xce0 [mlx5_core]
[ 1092.814415]  ? __wake_up_common_lock+0x89/0xc0
[ 1092.818861]  net_rx_action+0x149/0x3b0
[ 1092.822616]  __do_softirq+0xe3/0x30a
[ 1092.826193]  irq_exit+0x100/0x110
[ 1092.829512]  do_IRQ+0x85/0xd0
[ 1092.832483]  common_interrupt+0xf/0xf
[ 1092.836147]  </IRQ>
[ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0
[ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f
[ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f
[ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000
[ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940
[ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78
[ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000
[ 1092.905196]  ? cpuidle_enter_state+0x92/0x2a0
[ 1092.909555]  do_idle+0x236/0x280
[ 1092.912785]  cpu_startup_entry+0x6f/0x80
[ 1092.916715]  start_secondary+0x1a7/0x200
[ 1092.920642]  secondary_startup_64+0xb7/0xc0
[...]

The reason is that the skb list tipc_socket::mc_method.deferredq only
is initialized for connectionless sockets, while nothing stops arriving
multicast messages from being filtered by connection oriented sockets,
with subsequent access to the said list.

We fix this by initializing the list unconditionally at socket creation.
This eliminates the crash, while the message still is dropped further
down in tipc_sk_filter_rcv() as it should be.

Reported-by: Li Shuang <sh...@re...>
Signed-off-by: Jon Maloy <jon...@er...>
---
 net/tipc/socket.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index dd8537f..83ae41d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -485,9 +485,8 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 		tsk_set_unreturnable(tsk, true);
 		if (sock->type == SOCK_DGRAM)
 			tsk_set_unreliable(tsk, true);
-		__skb_queue_head_init(&tsk->mc_method.deferredq);
 	}
-
+	__skb_queue_head_init(&tsk->mc_method.deferredq);
 	trace_tipc_sk_create(sk, NULL, TIPC_DUMP_NONE, " ");
 	return 0;
 }
-- 
2.1.4

[tipc-discussion] [net-next 1/1] tipc: reduce risk of wakeup queue starvation

From: Jon M. <jon...@er...> - 2019-07-30 14:23:33

In commit 365ad353c256 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.

Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.

We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.

Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tun...@de...>
Acked-by: Ying Xue <yin...@wi...>
Signed-off-by: Jon Maloy <jon...@er...>
---
 net/tipc/link.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 2c27477..dd3155b 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -854,18 +854,31 @@ static int link_schedule_user(struct tipc_link *l, struct tipc_msg *hdr)
  */
 static void link_prepare_wakeup(struct tipc_link *l)
 {
+	struct sk_buff_head *wakeupq = &l->wakeupq;
+	struct sk_buff_head *inputq = l->inputq;
 	struct sk_buff *skb, *tmp;
-	int imp, i = 0;
+	struct sk_buff_head tmpq;
+	int avail[5] = {0,};
+	int imp = 0;
+
+	__skb_queue_head_init(&tmpq);
 
-	skb_queue_walk_safe(&l->wakeupq, skb, tmp) {
+	for (; imp <= TIPC_SYSTEM_IMPORTANCE; imp++)
+		avail[imp] = l->backlog[imp].limit - l->backlog[imp].len;
+
+	skb_queue_walk_safe(wakeupq, skb, tmp) {
 		imp = TIPC_SKB_CB(skb)->chain_imp;
-		if (l->backlog[imp].len < l->backlog[imp].limit) {
-			skb_unlink(skb, &l->wakeupq);
-			skb_queue_tail(l->inputq, skb);
-		} else if (i++ > 10) {
-			break;
-		}
+		if (avail[imp] <= 0)
+			continue;
+		avail[imp]--;
+		__skb_unlink(skb, wakeupq);
+		__skb_queue_tail(&tmpq, skb);
 	}
+
+	spin_lock_bh(&inputq->lock);
+	skb_queue_splice_tail(&tmpq, inputq);
+	spin_unlock_bh(&inputq->lock);
+
 }
 
 void tipc_link_reset(struct tipc_link *l)
-- 
2.1.4

Re: [tipc-discussion] [net-next v3] tipc: add loopback device tracking

From: Jon M. <jon...@er...> - 2019-07-30 12:51:05

Acked-by: Jon Maloy <jon...@er...>

I just have one minor comment. See below.
And forget about what I said about "on the fly" device detection. Since we have to register a handler anyway that wouldn't improve anything.

///jon


> -----Original Message-----
> From: joh...@de... <joh...@de...>
> Sent: 18-Jul-19 23:10
> To: tip...@li...
> Subject: [tipc-discussion] [net-next v3] tipc: add loopback device tracking
> 
> From: John Rutherford <joh...@de...>
> 
> Since node internal messages are passed directly to the socket, it is not
> possible to observe those messages via tcpdump or wireshark.
> 
> We now remedy this by making it possible to clone such messages and send
> the clones to the loopback interface.  The clones are dropped at reception and
> have no functional role except making the traffic visible.
> 
> The feature is enabled if network taps are active for the loopback device.
> pcap filtering restrictions require the messages to be presented to the
> receiving side of the loopback device.
> 
> v3 - Function dev_nit_active used to check for network taps.
>    - Procedure netif_rx_ni used to send cloned messages to loopback device.
> 
> Signed-off-by: John Rutherford <joh...@de...>
> ---
>  net/tipc/bcast.c  |  4 +++-
>  net/tipc/bearer.c | 65
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/tipc/bearer.h | 10 +++++++++
>  net/tipc/core.c   |  5 +++++
>  net/tipc/core.h   |  3 +++
>  net/tipc/node.c   |  1 +
>  net/tipc/topsrv.c |  2 ++
>  7 files changed, 89 insertions(+), 1 deletion(-)
> 
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 6c997d4..235331d
> 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -406,8 +406,10 @@ int tipc_mcast_xmit(struct net *net, struct
> sk_buff_head *pkts,
>  			rc = tipc_bcast_xmit(net, pkts, cong_link_cnt);
>  	}
> 
> -	if (dests->local)
> +	if (dests->local) {
> +		tipc_loopback_trace(net, &localq);
>  		tipc_sk_mcast_rcv(net, &localq, &inputq);
> +	}
>  exit:
>  	/* This queue should normally be empty by now */
>  	__skb_queue_purge(pkts);
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 2bed658..5273ec5
> 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -389,6 +389,12 @@ int tipc_enable_l2_media(struct net *net, struct
> tipc_bearer *b,
>  		dev_put(dev);
>  		return -EINVAL;
>  	}
> +	if (dev == net->loopback_dev) {
> +		dev_put(dev);
> +		pr_info("Bearer <%s>: loopback device not permitted\n",

Or maybe: "Enabling <%s> not permitted\n", b->name);

> +		return -EINVAL;
> +	}
> 
>  	/* Autoconfigure own node identity if needed */
>  	if (!tipc_own_id(net) && hwaddr_len <= NODE_ID_LEN) { @@ -674,6
> +680,65 @@ void tipc_bearer_stop(struct net *net)
>  	}
>  }
> 
> +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts)
> +{
> +	struct net_device *dev = net->loopback_dev;
> +	struct sk_buff *skb, *_skb;
> +	int exp;
> +
> +	skb_queue_walk(pkts, _skb) {
> +		skb = pskb_copy(_skb, GFP_ATOMIC);
> +		if (!skb)
> +			continue;
> +
> +		exp = SKB_DATA_ALIGN(dev->hard_header_len -
> skb_headroom(skb));
> +		if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) {
> +			kfree_skb(skb);
> +			continue;
> +		}
> +
> +		skb_reset_network_header(skb);
> +		dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr,
> +				dev->dev_addr, skb->len);
> +		skb->dev = dev;
> +		skb->pkt_type = PACKET_HOST;
> +		skb->ip_summed = CHECKSUM_UNNECESSARY;
> +		skb->protocol = eth_type_trans(skb, dev);
> +		netif_rx_ni(skb);
> +	}
> +}
> +
> +static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev,
> +				 struct packet_type *pt, struct net_device *od) {
> +	consume_skb(skb);
> +	return NET_RX_SUCCESS;
> +}
> +
> +int tipc_attach_loopback(struct net *net) {
> +	struct net_device *dev = net->loopback_dev;
> +	struct tipc_net *tn = tipc_net(net);
> +
> +	if (!dev)
> +		return -ENODEV;
> +
> +	dev_hold(dev);
> +	tn->loopback_pt.dev = dev;
> +	tn->loopback_pt.type = htons(ETH_P_TIPC);
> +	tn->loopback_pt.func = tipc_loopback_rcv_pkt;
> +	dev_add_pack(&tn->loopback_pt);
> +	return 0;
> +}
> +
> +void tipc_detach_loopback(struct net *net) {
> +	struct tipc_net *tn = tipc_net(net);
> +
> +	dev_remove_pack(&tn->loopback_pt);
> +	dev_put(net->loopback_dev);
> +}
> +
>  /* Caller should hold rtnl_lock to protect the bearer */  static int
> __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
>  				struct tipc_bearer *bearer, int nlflags) diff --git
> a/net/tipc/bearer.h b/net/tipc/bearer.h index 7f4c569..ea0f3c4 100644
> --- a/net/tipc/bearer.h
> +++ b/net/tipc/bearer.h
> @@ -232,6 +232,16 @@ void tipc_bearer_xmit(struct net *net, u32
> bearer_id,
>  		      struct tipc_media_addr *dst);
>  void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
>  			 struct sk_buff_head *xmitq);
> +void tipc_clone_to_loopback(struct net *net, struct sk_buff_head
> +*pkts); int tipc_attach_loopback(struct net *net); void
> +tipc_detach_loopback(struct net *net);
> +
> +static inline void tipc_loopback_trace(struct net *net,
> +				       struct sk_buff_head *pkts)
> +{
> +	if (unlikely(dev_nit_active(net->loopback_dev)))
> +		tipc_clone_to_loopback(net, pkts);
> +}
> 
>  /* check if device MTU is too low for tipc headers */  static inline bool
> tipc_mtu_bad(struct net_device *dev, unsigned int reserve) diff --git
> a/net/tipc/core.c b/net/tipc/core.c index c837072..23cb379 100644
> --- a/net/tipc/core.c
> +++ b/net/tipc/core.c
> @@ -82,6 +82,10 @@ static int __net_init tipc_init_net(struct net *net)
>  	if (err)
>  		goto out_bclink;
> 
> +	err = tipc_attach_loopback(net);
> +	if (err)
> +		goto out_bclink;
> +
>  	return 0;
> 
>  out_bclink:
> @@ -94,6 +98,7 @@ static int __net_init tipc_init_net(struct net *net)
> 
>  static void __net_exit tipc_exit_net(struct net *net)  {
> +	tipc_detach_loopback(net);
>  	tipc_net_stop(net);
>  	tipc_bcast_stop(net);
>  	tipc_nametbl_stop(net);
> diff --git a/net/tipc/core.h b/net/tipc/core.h index 7a68e1b..60d8295 100644
> --- a/net/tipc/core.h
> +++ b/net/tipc/core.h
> @@ -125,6 +125,9 @@ struct tipc_net {
> 
>  	/* Cluster capabilities */
>  	u16 capabilities;
> +
> +	/* Tracing of node internal messages */
> +	struct packet_type loopback_pt;
>  };
> 
>  static inline struct tipc_net *tipc_net(struct net *net) diff --git
> a/net/tipc/node.c b/net/tipc/node.c index 550581d..16d251b 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -1443,6 +1443,7 @@ int tipc_node_xmit(struct net *net, struct
> sk_buff_head *list,
>  	int rc;
> 
>  	if (in_own_node(net, dnode)) {
> +		tipc_loopback_trace(net, list);
>  		tipc_sk_rcv(net, list);
>  		return 0;
>  	}
> diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index f345662..e3a6ba1
> 100644
> --- a/net/tipc/topsrv.c
> +++ b/net/tipc/topsrv.c
> @@ -40,6 +40,7 @@
>  #include "socket.h"
>  #include "addr.h"
>  #include "msg.h"
> +#include "bearer.h"
>  #include <net/sock.h>
>  #include <linux/module.h>
> 
> @@ -608,6 +609,7 @@ static void tipc_topsrv_kern_evt(struct net *net,
> struct tipc_event *evt)
>  	memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt));
>  	skb_queue_head_init(&evtq);
>  	__skb_queue_tail(&evtq, skb);
> +	tipc_loopback_trace(net, &evtq);
>  	tipc_sk_rcv(net, &evtq);
>  }
> 
> --
> 2.11.0
> 
> 
> 
> _______________________________________________
> tipc-discussion mailing list
> tip...@li...
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

From: Hoang L. <hoa...@de...> - 2019-07-30 09:34:06

Hi Jon,

Please ignore previous results because I'm wrong on testing ICMP starved by TIPC.

Regards,
Hoang
-----Original Message-----
From: Hoang Le <hoa...@de...> 
Sent: Tuesday, July 30, 2019 11:24 AM
To: 'tung quang nguyen' <tun...@de...>; tip...@li...; 'Jon Maloy' <jon...@er...>;
'Jon Maloy' <ma...@do...>; yin...@wi...
Subject: Re: [tipc-discussion] [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

Hi Jon,

I combine benchmark test with 50 connections and ping cmd from two nodes.
You can compare results from original code, your fix and Tung's fix as following:

Original code:
node1 ~ # ping -s 1400 10.0.0.2 -c 300
PING 10.0.0.2 (10.0.0.2): 1400 data bytes
1408 bytes from 10.0.0.2: seq=22 ttl=64 time=1.337 ms
1408 bytes from 10.0.0.2: seq=24 ttl=64 time=1.208 ms
1408 bytes from 10.0.0.2: seq=25 ttl=64 time=1.145 ms
1408 bytes from 10.0.0.2: seq=76 ttl=64 time=1.145 ms
1408 bytes from 10.0.0.2: seq=78 ttl=64 time=1.449 ms
1408 bytes from 10.0.0.2: seq=130 ttl=64 time=1.230 ms
1408 bytes from 10.0.0.2: seq=134 ttl=64 time=1.020 ms
1408 bytes from 10.0.0.2: seq=185 ttl=64 time=1.743 ms
1408 bytes from 10.0.0.2: seq=186 ttl=64 time=1.502 ms
1408 bytes from 10.0.0.2: seq=187 ttl=64 time=1.289 ms
1408 bytes from 10.0.0.2: seq=189 ttl=64 time=1.306 ms
1408 bytes from 10.0.0.2: seq=239 ttl=64 time=1.254 ms
1408 bytes from 10.0.0.2: seq=241 ttl=64 time=1.114 ms
1408 bytes from 10.0.0.2: seq=242 ttl=64 time=1.058 ms

--- 10.0.0.2 ping statistics ---
301 packets transmitted, 301 packets received, 0% packet loss
round-trip min/avg/max = 0.077/0.361/1.743 ms

- JON's fix
node1 ~ # ping -s 1400 10.0.0.2 -c 300
1408 bytes from 10.0.0.2: seq=22 ttl=64 time=1.013 ms
1408 bytes from 10.0.0.2: seq=87 ttl=64 time=2.468 ms

--- 10.0.0.2 ping statistics ---
300 packets transmitted, 300 packets received, 0% packet loss
round-trip min/avg/max = 0.119/0.323/2.468 ms
node1 ~ #

- Tung's fix
node1 ~ # ping -s 1400 10.0.0.2 -c 300
--- 10.0.0.2 ping statistics ---
300 packets transmitted, 300 packets received, 0% packet loss
round-trip min/avg/max = 0.101/0.303/0.864 ms

>From ping statistics, I could see your solution starved twice and maximum time is 2.468 ms.
Then, we're not completely solve the issue yet. But test results from Tung's fix, I don't see a starvation happen.  
So, I think we can go ahead with Tung's code fixed. Please give me your idea.

Regards,
Hoang
-----Original Message-----
From: tung quang nguyen <tun...@de...> 
Sent: Thursday, July 25, 2019 5:50 PM
To: 'Jon Maloy' <jon...@er...>; 'Jon Maloy' <ma...@do...>; tip...@li...;
yin...@wi...
Subject: Re: [tipc-discussion] [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

Hi Jon,

Let's go for this way for now.

Thanks.

Best regards,
Tung Nguyen

-----Original Message-----
From: Jon Maloy <jon...@er...> 
Sent: Friday, July 19, 2019 10:06 AM
To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
Cc: moh...@er...;
par...@gm...; tun...@de...;
hoa...@de...; can...@de...;
tuo...@de...; gor...@de...;
yin...@wi...; tip...@li...
Subject: [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

In commit 365ad353c256 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.

Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.

We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.

Fixes: 365ad35 ("tipc: reduce risk of user starvation during link
congestion")
Reported-by: Tung Nguyen <tun...@de...>
Signed-off-by: Jon Maloy <jon...@er...>
---
 net/tipc/link.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 66d3a07..f1d2732 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -853,18 +853,31 @@ static int link_schedule_user(struct tipc_link *l,
struct tipc_msg *hdr)
  */
 static void link_prepare_wakeup(struct tipc_link *l)
 {
+	struct sk_buff_head *wakeupq = &l->wakeupq;
+	struct sk_buff_head *inputq = l->inputq;
 	struct sk_buff *skb, *tmp;
-	int imp, i = 0;
+	struct sk_buff_head tmpq;
+	int avail[5] = {0,};
+	int imp = 0;
+
+	__skb_queue_head_init(&tmpq);
 
-	skb_queue_walk_safe(&l->wakeupq, skb, tmp) {
+	for (; imp <= TIPC_SYSTEM_IMPORTANCE; imp++)
+		avail[imp] = l->backlog[imp].limit - l->backlog[imp].len;
+
+	skb_queue_walk_safe(wakeupq, skb, tmp) {
 		imp = TIPC_SKB_CB(skb)->chain_imp;
-		if (l->backlog[imp].len < l->backlog[imp].limit) {
-			skb_unlink(skb, &l->wakeupq);
-			skb_queue_tail(l->inputq, skb);
-		} else if (i++ > 10) {
-			break;
-		}
+		if (avail[imp] <= 0)
+			continue;
+		avail[imp]--;
+		__skb_unlink(skb, wakeupq);
+		__skb_queue_tail(&tmpq, skb);
 	}
+
+	spin_lock_bh(&inputq->lock);
+	skb_queue_splice_tail(&tmpq, inputq);
+	spin_unlock_bh(&inputq->lock);
+
 }
 
 void tipc_link_reset(struct tipc_link *l)
-- 
2.1.4




_______________________________________________
tipc-discussion mailing list
tip...@li...
https://lists.sourceforge.net/lists/listinfo/tipc-discussion



_______________________________________________
tipc-discussion mailing list
tip...@li...
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

From: Hoang L. <hoa...@de...> - 2019-07-30 04:24:23

Hi Jon,

I combine benchmark test with 50 connections and ping cmd from two nodes.
You can compare results from original code, your fix and Tung's fix as following:

Original code:
node1 ~ # ping -s 1400 10.0.0.2 -c 300
PING 10.0.0.2 (10.0.0.2): 1400 data bytes
1408 bytes from 10.0.0.2: seq=22 ttl=64 time=1.337 ms
1408 bytes from 10.0.0.2: seq=24 ttl=64 time=1.208 ms
1408 bytes from 10.0.0.2: seq=25 ttl=64 time=1.145 ms
1408 bytes from 10.0.0.2: seq=76 ttl=64 time=1.145 ms
1408 bytes from 10.0.0.2: seq=78 ttl=64 time=1.449 ms
1408 bytes from 10.0.0.2: seq=130 ttl=64 time=1.230 ms
1408 bytes from 10.0.0.2: seq=134 ttl=64 time=1.020 ms
1408 bytes from 10.0.0.2: seq=185 ttl=64 time=1.743 ms
1408 bytes from 10.0.0.2: seq=186 ttl=64 time=1.502 ms
1408 bytes from 10.0.0.2: seq=187 ttl=64 time=1.289 ms
1408 bytes from 10.0.0.2: seq=189 ttl=64 time=1.306 ms
1408 bytes from 10.0.0.2: seq=239 ttl=64 time=1.254 ms
1408 bytes from 10.0.0.2: seq=241 ttl=64 time=1.114 ms
1408 bytes from 10.0.0.2: seq=242 ttl=64 time=1.058 ms

--- 10.0.0.2 ping statistics ---
301 packets transmitted, 301 packets received, 0% packet loss
round-trip min/avg/max = 0.077/0.361/1.743 ms

- JON's fix
node1 ~ # ping -s 1400 10.0.0.2 -c 300
1408 bytes from 10.0.0.2: seq=22 ttl=64 time=1.013 ms
1408 bytes from 10.0.0.2: seq=87 ttl=64 time=2.468 ms

--- 10.0.0.2 ping statistics ---
300 packets transmitted, 300 packets received, 0% packet loss
round-trip min/avg/max = 0.119/0.323/2.468 ms
node1 ~ #

- Tung's fix
node1 ~ # ping -s 1400 10.0.0.2 -c 300
--- 10.0.0.2 ping statistics ---
300 packets transmitted, 300 packets received, 0% packet loss
round-trip min/avg/max = 0.101/0.303/0.864 ms

>From ping statistics, I could see your solution starved twice and maximum time is 2.468 ms.
Then, we're not completely solve the issue yet. But test results from Tung's fix, I don't see a starvation happen.  
So, I think we can go ahead with Tung's code fixed. Please give me your idea.

Regards,
Hoang
-----Original Message-----
From: tung quang nguyen <tun...@de...> 
Sent: Thursday, July 25, 2019 5:50 PM
To: 'Jon Maloy' <jon...@er...>; 'Jon Maloy' <ma...@do...>; tip...@li...;
yin...@wi...
Subject: Re: [tipc-discussion] [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

Hi Jon,

Let's go for this way for now.

Thanks.

Best regards,
Tung Nguyen

-----Original Message-----
From: Jon Maloy <jon...@er...> 
Sent: Friday, July 19, 2019 10:06 AM
To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
Cc: moh...@er...;
par...@gm...; tun...@de...;
hoa...@de...; can...@de...;
tuo...@de...; gor...@de...;
yin...@wi...; tip...@li...
Subject: [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

In commit 365ad353c256 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.

Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.

We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.

Fixes: 365ad35 ("tipc: reduce risk of user starvation during link
congestion")
Reported-by: Tung Nguyen <tun...@de...>
Signed-off-by: Jon Maloy <jon...@er...>
---
 net/tipc/link.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 66d3a07..f1d2732 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -853,18 +853,31 @@ static int link_schedule_user(struct tipc_link *l,
struct tipc_msg *hdr)
  */
 static void link_prepare_wakeup(struct tipc_link *l)
 {
+	struct sk_buff_head *wakeupq = &l->wakeupq;
+	struct sk_buff_head *inputq = l->inputq;
 	struct sk_buff *skb, *tmp;
-	int imp, i = 0;
+	struct sk_buff_head tmpq;
+	int avail[5] = {0,};
+	int imp = 0;
+
+	__skb_queue_head_init(&tmpq);
 
-	skb_queue_walk_safe(&l->wakeupq, skb, tmp) {
+	for (; imp <= TIPC_SYSTEM_IMPORTANCE; imp++)
+		avail[imp] = l->backlog[imp].limit - l->backlog[imp].len;
+
+	skb_queue_walk_safe(wakeupq, skb, tmp) {
 		imp = TIPC_SKB_CB(skb)->chain_imp;
-		if (l->backlog[imp].len < l->backlog[imp].limit) {
-			skb_unlink(skb, &l->wakeupq);
-			skb_queue_tail(l->inputq, skb);
-		} else if (i++ > 10) {
-			break;
-		}
+		if (avail[imp] <= 0)
+			continue;
+		avail[imp]--;
+		__skb_unlink(skb, wakeupq);
+		__skb_queue_tail(&tmpq, skb);
 	}
+
+	spin_lock_bh(&inputq->lock);
+	skb_queue_splice_tail(&tmpq, inputq);
+	spin_unlock_bh(&inputq->lock);
+
 }
 
 void tipc_link_reset(struct tipc_link *l)
-- 
2.1.4




_______________________________________________
tipc-discussion mailing list
tip...@li...
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [tipc-tipcutils][PATCH] test: ptts: Set recv buffer size too max to receive as many packets as possible

From: Jon M. <jon...@er...> - 2019-07-29 19:32:56

Hi Zhe,
This looks ok, but you should post this as a merge request at sourceforge.
Then I or Ying can apply it.

BR
///jon

> -----Original Message-----
> From: zh...@wi... <zh...@wi...>
> Sent: 23-Jul-19 05:19
> To: tip...@li...
> Subject: [tipc-discussion] [tipc-tipcutils][PATCH] test: ptts: Set recv buffer size
> too max to receive as many packets as possible
> 
> From: He Zhe <zh...@wi...>
> 
> Flooding multicast may make the rcv buffer overrun and is considered
> premature messages later and thus cause the following error.
> 
> "Ignoring premature msg 16, currently handling 12"
> 
> This patch sets SO_RCVBUF the of socket to max int value to receive as many
> packets as possible, and give a hint to user when possible overrun occurs. Note
> that the value of SO_RCVBUF will be limited up to min(INT_MAX/2,
> sysctl_rmem_max) in kernel.
> 
> Signed-off-by: He Zhe <zh...@wi...>
> ---
>  test/ptts/tipc_ts_server.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/test/ptts/tipc_ts_server.c b/test/ptts/tipc_ts_server.c index
> a286daa..cc69e6e 100644
> --- a/test/ptts/tipc_ts_server.c
> +++ b/test/ptts/tipc_ts_server.c
> @@ -641,8 +641,9 @@ void server_mcast
>  				if (rc < 0)
>  					err("multicast message not received");
>  				if (msgno != *(int*) buf) {
> -					dbg1("Ignoring premature msg %u, currently
> handling %u\n",
> -					       *(int*)buf, msgno);
> +					dbg1("Ignoring premature msg %u, currently
> handling %u\n"
> +                                             "You can enlarge /proc/sys/net/core/rmem_max and
> try again\n",
> +                                             *(int*)buf, msgno);
>  					continue;
>  				}
>  				rc = recvfrom(sd[i], buf, expected_szs[numSubTest],
> @@ -687,8 +688,21 @@ void server_test_multicast(void)
>  	FD_ZERO(&readfds);
> 
>  	for (i = 0; i < TIPC_MCAST_SOCKETS; i++) {
> +		int optval = (int)(~0U >> 1);
> +		socklen_t optlen = sizeof(optval);
> +		int rc = 0;
> +
>  		sd[i] = createSocketTIPC (SOCK_RDM);
>  		FD_SET(sd[i], &readfds);
> +
> +		/*
> +                 * Flooding multicast may make the rcv buffer overrun and is
> considered premature msg later.
> +                 * Set SO_RCVBUF to max int value to receive as many as possible.
> +                 * Note that it will be limited up to min(INT_MAX/2,
> sysctl_rmem_max) in kernel.
> +                 */
> +		rc = setsockopt(sd[i], SOL_SOCKET, SO_RCVBUF, (const
> char*)&optval, optlen);
> +		if(rc != 0)
> +			strerror(errno);
>  	}
> 
>  	server_bindMulticast(  0,  99, sd[0]);
> --
> 2.7.4
> 
> 
> 
> _______________________________________________
> tipc-discussion mailing list
> tip...@li...
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [PATCH] net: tipc: Fix a possible null-pointer dereference in tipc_publ_purge()

From: David M. <da...@da...> - 2019-07-26 21:05:22

From: Jia-Ju Bai <bai...@gm...>
Date: Thu, 25 Jul 2019 17:20:21 +0800

> @@ -223,7 +223,8 @@ static void tipc_publ_purge(struct net *net, struct publication *publ, u32 addr)
>  		       publ->key);
>  	}
>  
> -	kfree_rcu(p, rcu);
> +	if (p)
> +		kfree_rcu(p, rcu);

Please fix your automated tools if that is what found this, because as
others have nodes kfree_rcu() can take a NULL pointer argument just
fine.

Thank you.

Re: [tipc-discussion] [PATCH] net: tipc: Fix a possible null-pointer dereference in tipc_publ_purge()

From: Ying X. <yin...@wi...> - 2019-07-26 14:21:29

On 7/25/19 5:20 PM, Jia-Ju Bai wrote:
> In tipc_publ_purge(), there is an if statement on 215 to 
> check whether p is NULL: 
>     if (p)
> 
> When p is NULL, it is used on line 226:
>     kfree_rcu(p, rcu);
> 
> Thus, a possible null-pointer dereference may occur.
> 
> To fix this bug, p is checked before being used.
> 
> This bug is found by a static analysis tool STCheck written by us.
> 
> Signed-off-by: Jia-Ju Bai <bai...@gm...>
> ---
>  net/tipc/name_distr.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> index 44abc8e9c990..241ed2274473 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -223,7 +223,8 @@ static void tipc_publ_purge(struct net *net, struct publication *publ, u32 addr)
>  		       publ->key);
>  	}
>  
> -	kfree_rcu(p, rcu);
> +	if (p)

No, I don't think so because kfree_rcu() will internally check if "p"
pointer is NULL or not.

> +		kfree_rcu(p, rcu);
>  }
>  
>  /**
>

Re: [tipc-discussion] Slowness forming TIPC cluster with explicit node addresses

From: Jon M. <jon...@er...> - 2019-07-26 13:47:42


> -----Original Message-----
> From: net...@vg... <net...@vg...> On
> Behalf Of Chris Packham
> Sent: 25-Jul-19 19:37
> To: tip...@li...
> Cc: ne...@vg...; lin...@vg...
> Subject: Slowness forming TIPC cluster with explicit node addresses
> 
> Hi,
> 
> I'm having problems forming a TIPC cluster between 2 nodes.
> 
> This is the basic steps I'm going through on each node.
> 
> modprobe tipc
> ip link set eth2 up
> tipc node set addr 1.1.5 # or 1.1.6
> tipc bearer enable media eth dev eth0

eth2, I assume...

> 
> Then to confirm if the cluster is formed I use tipc link list
> 
> [root@node-5 ~]# tipc link list
> broadcast-link: up
> ...
> 
> Looking at tcpdump the two nodes are sending packets
> 
> 22:30:05.782320 TIPC v2.0 1.1.5 > 0.0.0, headerlength 60 bytes, MessageSize
> 76 bytes, Neighbor Detection Protocol internal, messageType Link request
> 22:30:05.863555 TIPC v2.0 1.1.6 > 0.0.0, headerlength 60 bytes, MessageSize
> 76 bytes, Neighbor Detection Protocol internal, messageType Link request
> 
> Eventually (after a few minutes) the link does come up
> 
> [root@node-6 ~]# tipc link list
> broadcast-link: up
> 1001006:eth2-1001005:eth2: up
> 
> [root@node-5 ~]# tipc link list
> broadcast-link: up
> 1001005:eth2-1001006:eth2: up
> 
> When I remove the "tipc node set addr" things seem to kick into life straight
> away
> 
> [root@node-5 ~]# tipc link list
> broadcast-link: up
> 0050b61bd2aa:eth2-0050b61e6dfa:eth2: up
> 
> So there appears to be some difference in behaviour between having an
> explicit node address and using the default. Unfortunately our application
> relies on setting the node addresses.

I do this many times a day, without any problems. If there would be any time difference, I would expect the 'auto configurable' version to be slower, because it involves a DAD step.
Are you sure you don't have any other nodes running in your system?

///jon


> 
> [root@node-5 ~]# uname -a
> Linux linuxbox 5.2.0-at1+ #8 SMP Thu Jul 25 23:22:41 UTC 2019 ppc
> GNU/Linux
> 
> Any thoughts on the problem?

Re: [tipc-discussion] [net-next 0/2] tipc: link changeover issues

From: David M. <da...@da...> - 2019-07-25 22:56:13

From: Tuong Lien <tuo...@de...>
Date: Wed, 24 Jul 2019 08:56:10 +0700

> This patch series is to resolve some issues found with the current link
> changeover mechanism, it also includes an optimization for the link
> synching.

Series applied, thank you.

Re: [tipc-discussion] [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

From: tung q. n. <tun...@de...> - 2019-07-25 10:50:15

Hi Jon,

Let's go for this way for now.

Thanks.

Best regards,
Tung Nguyen

-----Original Message-----
From: Jon Maloy <jon...@er...> 
Sent: Friday, July 19, 2019 10:06 AM
To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
Cc: moh...@er...;
par...@gm...; tun...@de...;
hoa...@de...; can...@de...;
tuo...@de...; gor...@de...;
yin...@wi...; tip...@li...
Subject: [net-next v2 1/1] tipc: reduce risk of wakeup queue starvation

In commit 365ad353c256 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.

Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.

We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.

Fixes: 365ad35 ("tipc: reduce risk of user starvation during link
congestion")
Reported-by: Tung Nguyen <tun...@de...>
Signed-off-by: Jon Maloy <jon...@er...>
---
 net/tipc/link.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 66d3a07..f1d2732 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -853,18 +853,31 @@ static int link_schedule_user(struct tipc_link *l,
struct tipc_msg *hdr)
  */
 static void link_prepare_wakeup(struct tipc_link *l)
 {
+	struct sk_buff_head *wakeupq = &l->wakeupq;
+	struct sk_buff_head *inputq = l->inputq;
 	struct sk_buff *skb, *tmp;
-	int imp, i = 0;
+	struct sk_buff_head tmpq;
+	int avail[5] = {0,};
+	int imp = 0;
+
+	__skb_queue_head_init(&tmpq);
 
-	skb_queue_walk_safe(&l->wakeupq, skb, tmp) {
+	for (; imp <= TIPC_SYSTEM_IMPORTANCE; imp++)
+		avail[imp] = l->backlog[imp].limit - l->backlog[imp].len;
+
+	skb_queue_walk_safe(wakeupq, skb, tmp) {
 		imp = TIPC_SKB_CB(skb)->chain_imp;
-		if (l->backlog[imp].len < l->backlog[imp].limit) {
-			skb_unlink(skb, &l->wakeupq);
-			skb_queue_tail(l->inputq, skb);
-		} else if (i++ > 10) {
-			break;
-		}
+		if (avail[imp] <= 0)
+			continue;
+		avail[imp]--;
+		__skb_unlink(skb, wakeupq);
+		__skb_queue_tail(&tmpq, skb);
 	}
+
+	spin_lock_bh(&inputq->lock);
+	skb_queue_splice_tail(&tmpq, inputq);
+	spin_unlock_bh(&inputq->lock);
+
 }
 
 void tipc_link_reset(struct tipc_link *l)
-- 
2.1.4

[tipc-discussion] [net-next 1/2] tipc: optimize link synching mechanism

From: Tuong L. <tuo...@de...> - 2019-07-24 01:56:38

This commit along with the next one are to resolve the issues with the
link changeover mechanism. See that commit for details.

Basically, for the link synching, from now on, we will send only one
single ("dummy") SYNCH message to peer. The SYNCH message does not
contain any data, just a header conveying the synch point to the peer.

A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for
backward compatible!

Acked-by: Ying Xue <yin...@wi...>
Acked-by: Jon Maloy <jon...@er...>
Suggested-by: Jon Maloy <jon...@er...>
Signed-off-by: Tuong Lien <tuo...@de...>
---
 net/tipc/link.c | 26 ++++++++++++++++++++++++++
 net/tipc/msg.h  | 10 ++++++++++
 net/tipc/node.c |  6 ++++--
 net/tipc/node.h |  6 ++++--
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 66d3a07bc571..e215b4ba6a4b 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1665,6 +1665,7 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
 	struct sk_buff_head *queue = &l->transmq;
 	struct sk_buff_head tmpxq, tnlq;
 	u16 pktlen, pktcnt, seqno = l->snd_nxt;
+	u16 syncpt;
 
 	if (!tnl)
 		return;
@@ -1684,6 +1685,31 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
 	tipc_link_xmit(l, &tnlq, &tmpxq);
 	__skb_queue_purge(&tmpxq);
 
+	/* Link Synching:
+	 * From now on, send only one single ("dummy") SYNCH message
+	 * to peer. The SYNCH message does not contain any data, just
+	 * a header conveying the synch point to the peer.
+	 */
+	if (mtyp == SYNCH_MSG && (tnl->peer_caps & TIPC_TUNNEL_ENHANCED)) {
+		tnlskb = tipc_msg_create(TUNNEL_PROTOCOL, SYNCH_MSG,
+					 INT_H_SIZE, 0, l->addr,
+					 tipc_own_addr(l->net),
+					 0, 0, 0);
+		if (!tnlskb) {
+			pr_warn("%sunable to create dummy SYNCH_MSG\n",
+				link_co_err);
+			return;
+		}
+
+		hdr = buf_msg(tnlskb);
+		syncpt = l->snd_nxt + skb_queue_len(&l->backlogq) - 1;
+		msg_set_syncpt(hdr, syncpt);
+		msg_set_bearer_id(hdr, l->peer_bearer_id);
+		__skb_queue_tail(&tnlq, tnlskb);
+		tipc_link_xmit(tnl, &tnlq, xmitq);
+		return;
+	}
+
 	/* Initialize reusable tunnel packet header */
 	tipc_msg_init(tipc_own_addr(l->net), &tnlhdr, TUNNEL_PROTOCOL,
 		      mtyp, INT_H_SIZE, l->addr);
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index da509f0eb9ca..fca042cdff88 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -877,6 +877,16 @@ static inline void msg_set_msgcnt(struct tipc_msg *m, u16 n)
 	msg_set_bits(m, 9, 16, 0xffff, n);
 }
 
+static inline u16 msg_syncpt(struct tipc_msg *m)
+{
+	return msg_bits(m, 9, 16, 0xffff);
+}
+
+static inline void msg_set_syncpt(struct tipc_msg *m, u16 n)
+{
+	msg_set_bits(m, 9, 16, 0xffff, n);
+}
+
 static inline u32 msg_conn_ack(struct tipc_msg *m)
 {
 	return msg_bits(m, 9, 16, 0xffff);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 324a1f91b394..5d8b48051bb9 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1649,7 +1649,6 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
 	int usr = msg_user(hdr);
 	int mtyp = msg_type(hdr);
 	u16 oseqno = msg_seqno(hdr);
-	u16 iseqno = msg_seqno(msg_inner_hdr(hdr));
 	u16 exp_pkts = msg_msgcnt(hdr);
 	u16 rcv_nxt, syncpt, dlv_nxt, inputq_len;
 	int state = n->state;
@@ -1748,7 +1747,10 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
 
 	/* Initiate synch mode if applicable */
 	if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG) && (oseqno == 1)) {
-		syncpt = iseqno + exp_pkts - 1;
+		if (n->capabilities & TIPC_TUNNEL_ENHANCED)
+			syncpt = msg_syncpt(hdr);
+		else
+			syncpt = msg_seqno(msg_inner_hdr(hdr)) + exp_pkts - 1;
 		if (!tipc_link_is_up(l))
 			__tipc_node_link_up(n, bearer_id, xmitq);
 		if (n->state == SELF_UP_PEER_UP) {
diff --git a/net/tipc/node.h b/net/tipc/node.h
index c0bf49ea3de4..291d0ecd4101 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -53,7 +53,8 @@ enum {
 	TIPC_NODE_ID128       = (1 << 5),
 	TIPC_LINK_PROTO_SEQNO = (1 << 6),
 	TIPC_MCAST_RBCTL      = (1 << 7),
-	TIPC_GAP_ACK_BLOCK    = (1 << 8)
+	TIPC_GAP_ACK_BLOCK    = (1 << 8),
+	TIPC_TUNNEL_ENHANCED  = (1 << 9)
 };
 
 #define TIPC_NODE_CAPABILITIES (TIPC_SYN_BIT           |  \
@@ -64,7 +65,8 @@ enum {
 				TIPC_NODE_ID128        |   \
 				TIPC_LINK_PROTO_SEQNO  |   \
 				TIPC_MCAST_RBCTL       |   \
-				TIPC_GAP_ACK_BLOCK)
+				TIPC_GAP_ACK_BLOCK     |   \
+				TIPC_TUNNEL_ENHANCED)
 #define INVALID_BEARER_ID -1
 
 void tipc_node_stop(struct net *net);
-- 
2.13.7

[tipc-discussion] [net-next 2/2] tipc: fix changeover issues due to large packet

From: Tuong L. <tuo...@de...> - 2019-07-24 01:56:37

In conjunction with changing the interfaces' MTU (e.g. especially in
the case of a bonding) where the TIPC links are brought up and down
in a short time, a couple of issues were detected with the current link
changeover mechanism:

1) When one link is up but immediately forced down again, the failover
procedure will be carried out in order to failover all the messages in
the link's transmq queue onto the other working link. The link and node
state is also set to FAILINGOVER as part of the process. The message
will be transmited in form of a FAILOVER_MSG, so its size is plus of 40
bytes (= the message header size). There is no problem if the original
message size is not larger than the link's MTU - 40, and indeed this is
the max size of a normal payload messages. However, in the situation
above, because the link has just been up, the messages in the link's
transmq are almost SYNCH_MSGs which had been generated by the link
synching procedure, then their size might reach the max value already!
When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size
will exceed the link's MTU. As a result, the messages are dropped
silently and the failover procedure will never end up, the link will
not be able to exit the FAILINGOVER state, so cannot be re-established.

2) The same scenario above can happen more easily in case the MTU of
the links is set differently or when changing. In that case, as long as
a large message in the failure link's transmq queue was built and
fragmented with its link's MTU > the other link's one, the issue will
happen (there is no need of a link synching in advance).

3) The link synching procedure also faces with the same issue but since
the link synching is only started upon receipt of a SYNCH_MSG, dropping
the message will not result in a state deadlock, but it is not expected
as design.

The 1) & 3) issues are resolved by the last commit that only a dummy
SYNCH_MSG (i.e. without data) is generated at the link synching, so the
size of a FAILOVER_MSG if any then will never exceed the link's MTU.

For the 2) issue, the only solution is trying to fragment the messages
in the failure link's transmq queue according to the working link's MTU
so they can be failovered then. A new function is made to accomplish
this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the
original message size is too large, it will be fragmented & reassembled
at the receiving side.

Acked-by: Ying Xue <yin...@wi...>
Acked-by: Jon Maloy <jon...@er...>
Signed-off-by: Tuong Lien <tuo...@de...>
---
 net/tipc/link.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 net/tipc/msg.c  | 59 ++++++++++++++++++++++++++++++++++++
 net/tipc/msg.h  | 18 ++++++++++-
 3 files changed, 155 insertions(+), 15 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index e215b4ba6a4b..2c274777b2dd 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -180,6 +180,7 @@ struct tipc_link {
 
 	/* Fragmentation/reassembly */
 	struct sk_buff *reasm_buf;
+	struct sk_buff *reasm_tnlmsg;
 
 	/* Broadcast */
 	u16 ackers;
@@ -897,8 +898,10 @@ void tipc_link_reset(struct tipc_link *l)
 	l->backlog[TIPC_CRITICAL_IMPORTANCE].len = 0;
 	l->backlog[TIPC_SYSTEM_IMPORTANCE].len = 0;
 	kfree_skb(l->reasm_buf);
+	kfree_skb(l->reasm_tnlmsg);
 	kfree_skb(l->failover_reasm_skb);
 	l->reasm_buf = NULL;
+	l->reasm_tnlmsg = NULL;
 	l->failover_reasm_skb = NULL;
 	l->rcv_unacked = 0;
 	l->snd_nxt = 1;
@@ -940,6 +943,9 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
 	int rc = 0;
 
 	if (unlikely(msg_size(hdr) > mtu)) {
+		pr_warn("Too large msg, purging xmit list %d %d %d %d %d!\n",
+			skb_queue_len(list), msg_user(hdr),
+			msg_type(hdr), msg_size(hdr), mtu);
 		skb_queue_purge(list);
 		return -EMSGSIZE;
 	}
@@ -1233,6 +1239,7 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb,
 			     struct sk_buff_head *inputq)
 {
 	struct sk_buff **reasm_skb = &l->failover_reasm_skb;
+	struct sk_buff **reasm_tnlmsg = &l->reasm_tnlmsg;
 	struct sk_buff_head *fdefq = &l->failover_deferdq;
 	struct tipc_msg *hdr = buf_msg(skb);
 	struct sk_buff *iskb;
@@ -1240,40 +1247,56 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb,
 	int rc = 0;
 	u16 seqno;
 
-	/* SYNCH_MSG */
-	if (msg_type(hdr) == SYNCH_MSG)
-		goto drop;
+	if (msg_type(hdr) == SYNCH_MSG) {
+		kfree_skb(skb);
+		return 0;
+	}
 
-	/* FAILOVER_MSG */
-	if (!tipc_msg_extract(skb, &iskb, &ipos)) {
-		pr_warn_ratelimited("Cannot extract FAILOVER_MSG, defq: %d\n",
-				    skb_queue_len(fdefq));
-		return rc;
+	/* Not a fragment? */
+	if (likely(!msg_nof_fragms(hdr))) {
+		if (unlikely(!tipc_msg_extract(skb, &iskb, &ipos))) {
+			pr_warn_ratelimited("Unable to extract msg, defq: %d\n",
+					    skb_queue_len(fdefq));
+			return 0;
+		}
+		kfree_skb(skb);
+	} else {
+		/* Set fragment type for buf_append */
+		if (msg_fragm_no(hdr) == 1)
+			msg_set_type(hdr, FIRST_FRAGMENT);
+		else if (msg_fragm_no(hdr) < msg_nof_fragms(hdr))
+			msg_set_type(hdr, FRAGMENT);
+		else
+			msg_set_type(hdr, LAST_FRAGMENT);
+
+		if (!tipc_buf_append(reasm_tnlmsg, &skb)) {
+			/* Successful but non-complete reassembly? */
+			if (*reasm_tnlmsg || link_is_bc_rcvlink(l))
+				return 0;
+			pr_warn_ratelimited("Unable to reassemble tunnel msg\n");
+			return tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
+		}
+		iskb = skb;
 	}
 
 	do {
 		seqno = buf_seqno(iskb);
-
 		if (unlikely(less(seqno, l->drop_point))) {
 			kfree_skb(iskb);
 			continue;
 		}
-
 		if (unlikely(seqno != l->drop_point)) {
 			__tipc_skb_queue_sorted(fdefq, seqno, iskb);
 			continue;
 		}
 
 		l->drop_point++;
-
 		if (!tipc_data_input(l, iskb, inputq))
 			rc |= tipc_link_input(l, iskb, inputq, reasm_skb);
 		if (unlikely(rc))
 			break;
 	} while ((iskb = __tipc_skb_dequeue(fdefq, l->drop_point)));
 
-drop:
-	kfree_skb(skb);
 	return rc;
 }
 
@@ -1663,15 +1686,18 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
 	struct sk_buff *skb, *tnlskb;
 	struct tipc_msg *hdr, tnlhdr;
 	struct sk_buff_head *queue = &l->transmq;
-	struct sk_buff_head tmpxq, tnlq;
+	struct sk_buff_head tmpxq, tnlq, frags;
 	u16 pktlen, pktcnt, seqno = l->snd_nxt;
+	bool pktcnt_need_update = false;
 	u16 syncpt;
+	int rc;
 
 	if (!tnl)
 		return;
 
 	skb_queue_head_init(&tnlq);
 	skb_queue_head_init(&tmpxq);
+	skb_queue_head_init(&frags);
 
 	/* At least one packet required for safe algorithm => add dummy */
 	skb = tipc_msg_create(TIPC_LOW_IMPORTANCE, TIPC_DIRECT_MSG,
@@ -1727,6 +1753,39 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
 		if (queue == &l->backlogq)
 			msg_set_seqno(hdr, seqno++);
 		pktlen = msg_size(hdr);
+
+		/* Tunnel link MTU is not large enough? This could be
+		 * due to:
+		 * 1) Link MTU has just changed or set differently;
+		 * 2) Or FAILOVER on the top of a SYNCH message
+		 *
+		 * The 2nd case should not happen if peer supports
+		 * TIPC_TUNNEL_ENHANCED
+		 */
+		if (pktlen > tnl->mtu - INT_H_SIZE) {
+			if (mtyp == FAILOVER_MSG &&
+			    (tnl->peer_caps & TIPC_TUNNEL_ENHANCED)) {
+				rc = tipc_msg_fragment(skb, &tnlhdr, tnl->mtu,
+						       &frags);
+				if (rc) {
+					pr_warn("%sunable to frag msg: rc %d\n",
+						link_co_err, rc);
+					return;
+				}
+				pktcnt += skb_queue_len(&frags) - 1;
+				pktcnt_need_update = true;
+				skb_queue_splice_tail_init(&frags, &tnlq);
+				continue;
+			}
+			/* Unluckily, peer doesn't have TIPC_TUNNEL_ENHANCED
+			 * => Just warn it and return!
+			 */
+			pr_warn_ratelimited("%stoo large msg <%d, %d>: %d!\n",
+					    link_co_err, msg_user(hdr),
+					    msg_type(hdr), msg_size(hdr));
+			return;
+		}
+
 		msg_set_size(&tnlhdr, pktlen + INT_H_SIZE);
 		tnlskb = tipc_buf_acquire(pktlen + INT_H_SIZE, GFP_ATOMIC);
 		if (!tnlskb) {
@@ -1742,6 +1801,12 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
 		goto tnl;
 	}
 
+	if (pktcnt_need_update)
+		skb_queue_walk(&tnlq, skb) {
+			hdr = buf_msg(skb);
+			msg_set_msgcnt(hdr, pktcnt);
+		}
+
 	tipc_link_xmit(tnl, &tnlq, xmitq);
 
 	if (mtyp == FAILOVER_MSG) {
diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index f48e5857210f..e6d49cdc61b4 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -244,6 +244,65 @@ bool tipc_msg_validate(struct sk_buff **_skb)
 }
 
 /**
+ * tipc_msg_fragment - build a fragment skb list for TIPC message
+ *
+ * @skb: TIPC message skb
+ * @hdr: internal msg header to be put on the top of the fragments
+ * @pktmax: max size of a fragment incl. the header
+ * @frags: returned fragment skb list
+ *
+ * Returns 0 if the fragmentation is successful, otherwise: -EINVAL
+ * or -ENOMEM
+ */
+int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr,
+		      int pktmax, struct sk_buff_head *frags)
+{
+	int pktno, nof_fragms, dsz, dmax, eat;
+	struct tipc_msg *_hdr;
+	struct sk_buff *_skb;
+	u8 *data;
+
+	/* Non-linear buffer? */
+	if (skb_linearize(skb))
+		return -ENOMEM;
+
+	data = (u8 *)skb->data;
+	dsz = msg_size(buf_msg(skb));
+	dmax = pktmax - INT_H_SIZE;
+	if (dsz <= dmax || !dmax)
+		return -EINVAL;
+
+	nof_fragms = dsz / dmax + 1;
+	for (pktno = 1; pktno <= nof_fragms; pktno++) {
+		if (pktno < nof_fragms)
+			eat = dmax;
+		else
+			eat = dsz % dmax;
+		/* Allocate a new fragment */
+		_skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC);
+		if (!_skb)
+			goto error;
+		skb_orphan(_skb);
+		__skb_queue_tail(frags, _skb);
+		/* Copy header & data to the fragment */
+		skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE);
+		skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat);
+		data += eat;
+		/* Update the fragment's header */
+		_hdr = buf_msg(_skb);
+		msg_set_fragm_no(_hdr, pktno);
+		msg_set_nof_fragms(_hdr, nof_fragms);
+		msg_set_size(_hdr, INT_H_SIZE + eat);
+	}
+	return 0;
+
+error:
+	__skb_queue_purge(frags);
+	__skb_queue_head_init(frags);
+	return -ENOMEM;
+}
+
+/**
  * tipc_msg_build - create buffer chain containing specified header and data
  * @mhdr: Message header, to be prepended to data
  * @m: User message
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index fca042cdff88..1c8c8dd32a4e 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -721,12 +721,26 @@ static inline void msg_set_last_bcast(struct tipc_msg *m, u32 n)
 	msg_set_bits(m, 4, 16, 0xffff, n);
 }
 
+static inline u32 msg_nof_fragms(struct tipc_msg *m)
+{
+	return msg_bits(m, 4, 0, 0xffff);
+}
+
+static inline void msg_set_nof_fragms(struct tipc_msg *m, u32 n)
+{
+	msg_set_bits(m, 4, 0, 0xffff, n);
+}
+
+static inline u32 msg_fragm_no(struct tipc_msg *m)
+{
+	return msg_bits(m, 4, 16, 0xffff);
+}
+
 static inline void msg_set_fragm_no(struct tipc_msg *m, u32 n)
 {
 	msg_set_bits(m, 4, 16, 0xffff, n);
 }
 
-
 static inline u16 msg_next_sent(struct tipc_msg *m)
 {
 	return msg_bits(m, 4, 0, 0xffff);
@@ -1045,6 +1059,8 @@ bool tipc_msg_bundle(struct sk_buff *skb, struct tipc_msg *msg, u32 mtu);
 bool tipc_msg_make_bundle(struct sk_buff **skb, struct tipc_msg *msg,
 			  u32 mtu, u32 dnode);
 bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos);
+int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr,
+		      int pktmax, struct sk_buff_head *frags);
 int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m,
 		   int offset, int dsz, int mtu, struct sk_buff_head *list);
 bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err);
-- 
2.13.7

[tipc-discussion] [net-next 0/2] tipc: link changeover issues

From: Tuong L. <tuo...@de...> - 2019-07-24 01:56:37

This patch series is to resolve some issues found with the current link
changeover mechanism, it also includes an optimization for the link
synching.

Tuong Lien (2):
  tipc: optimize link synching mechanism
  tipc: fix changeover issues due to large packet

 net/tipc/link.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 net/tipc/msg.c  |  59 ++++++++++++++++++++++++++++
 net/tipc/msg.h  |  28 ++++++++++++-
 net/tipc/node.c |   6 ++-
 net/tipc/node.h |   6 ++-
 5 files changed, 199 insertions(+), 19 deletions(-)

-- 
2.13.7

451 messages has been excluded from this view by a project administrator.

Flat | Threaded

<< < 1 .. 43 44 45 46 47 .. 595 > >> (Page 45 of 595)

tipc-discussion Mailing List for TIPC Cluster Domain Sockets (Page 45)

Cluster wide IPC providing datagram, connection, and bus messaging

tipc-discussion — User questions &amp; problem reports, project news, developer discussion

tipc-discussion — User questions & problem reports, project news, developer discussion