From: Jon M. <jon...@er...> - 2019-06-14 13:17:02
|
Acked-by: Jon ///jon > -----Original Message----- > From: Tuong Lien <tuo...@de...> > Sent: 14-Jun-19 06:41 > To: tip...@li...; Jon Maloy > <jon...@er...>; ma...@do...; yin...@wi... > Subject: [net] tipc: fix issues with early FAILOVER_MSG from peer > > It appears that a FAILOVER_MSG can come from peer even when the failure > link is resetting (i.e. just after the 'node_write_unlock()'...). This means the > failover procedure on the node has not been started yet. > The situation is as follows: > > node1 node2 > linkb linka linka linkb > | | | | > | | x failure | > | | RESETTING | > | | | | > | x failure RESET | > | RESETTING FAILINGOVER | > | | (FAILOVER_MSG) | | > |<-------------------------------------------------| > | *FAILINGOVER | | | > | | (dummy FAILOVER_MSG) | | > |------------------------------------------------->| > | RESET | | FAILOVER_END > | FAILINGOVER RESET | > . . . . > . . . . > . . . . > > Once this happens, the link failover procedure will be triggered wrongly on the > receiving node since the node isn't in FAILINGOVER state but then another link > failover will be carried out. > The consequences are: > > 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' > was set, reset and set incorrectly, the criteria to end the failover would not be > met, it could keep waiting for a message that has already received. > > 2) The early FAILOVER_MSG(s) could be queued in the link failover deferdq > but would be purged or not pulled out because the 'drop_point' > was not set correctly. > > 3) The early FAILOVER_MSG(s) could be dropped too. > > 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER > state shortly, but later on it would be restarted. > > The same situation can also happen when the link is in PEER_RESET state and a > FAILOVER_MSG arrives. > > The commit resolves the issues by forcing the link down immediately, so the > failover procedure will be started normally (which is the same as when > receiving a FAILOVER_MSG and the link is in up state). > > Also, the function "tipc_node_link_failover()" is toughen to avoid such a > situation from happening. > > Signed-off-by: Tuong Lien <tuo...@de...> > --- > net/tipc/link.c | 1 - > net/tipc/node.c | 10 +++++++--- > 2 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/net/tipc/link.c b/net/tipc/link.c index > d5ed509e0660..bcfb0a4ab485 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -1762,7 +1762,6 @@ void tipc_link_failover_prepare(struct tipc_link *l, > struct tipc_link *tnl, > * node has entered SELF_DOWN_PEER_LEAVING and both peer nodes > * would have to start over from scratch instead. > */ > - WARN_ON(l && tipc_link_is_up(l)); > tnl->drop_point = 1; > tnl->failover_reasm_skb = NULL; > > diff --git a/net/tipc/node.c b/net/tipc/node.c index > e4dba865105e..65644642c091 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -777,9 +777,9 @@ static void tipc_node_link_up(struct tipc_node *n, > int bearer_id, > * disturbance, wrong session, etc.) > * 3. Link <1B-2B> up > * 4. Link endpoint 2A down (e.g. due to link tolerance timeout) > - * 5. Node B starts failover onto link <1B-2B> > + * 5. Node 2 starts failover onto link <1B-2B> > * > - * ==> Node A does never start link/node failover! > + * ==> Node 1 does never start link/node failover! > * > * @n: tipc node structure > * @l: link peer endpoint failingover (- can be NULL) @@ -794,6 +794,10 @@ > static void tipc_node_link_failover(struct tipc_node *n, struct tipc_link *l, > if (!tipc_link_is_up(tnl)) > return; > > + /* Don't rush, failure link may be in the process of resetting */ > + if (l && !tipc_link_is_reset(l)) > + return; > + > tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT); > tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT); > > @@ -1719,7 +1723,7 @@ static bool tipc_node_check_state(struct > tipc_node *n, struct sk_buff *skb, > /* Initiate or update failover mode if applicable */ > if ((usr == TUNNEL_PROTOCOL) && (mtyp == FAILOVER_MSG)) { > syncpt = oseqno + exp_pkts - 1; > - if (pl && tipc_link_is_up(pl)) { > + if (pl && !tipc_link_is_reset(pl)) { > __tipc_node_link_down(n, &pb_id, xmitq, &maddr); > trace_tipc_node_link_down(n, true, > "node link down <- failover!"); > -- > 2.13.7 |