From: Stephens, A. <all...@wi...> - 2010-08-31 18:57:10
Hi Jon:

It looks like Howard (the Wind River TIPC developer in China) is working on a solution to the long-standing problem of what to do if TIPC is unable to begin reassembling a fragmented message because there isn't an available sk_buff to hold the reassembled message. Basically, his idea is to ignore the first fragment completely -- as if the message had never been received -- and let the link code take care of retransmitting it later on. I think his solution for unicast links is more or less correct, but his solution for the broadcast link isn't; I've suggested a solution that should work there too.

Can you take a look at Howard's email (at the bottom) and then my comments (above Howard's email), and let me know if you see any problems with either? I can provide more details on the defect Howard is trying to fix, if needed. Basically, someone was blasting large SOCK_RDM messages down a link and found that between 1% and 10% of the messages were being lost because there weren't enough available buffers to allow them to be reassembled.

Regards,
Al

________________________________
From: Stephens, Allan
Sent: Tuesday, August 31, 2010 2:47 PM
To: Xu, Hao (Howard)
Cc: Kang, Qi; Xue, Ying
Subject: RE: WIND00229427

Hi Howard:

[Note: This email contains comments on this defect only; I don't want to discuss multiple different issues in a single email, as it gets too confusing. I'll send separate responses for the other defects you've asked about.]

I like the approach you've taken to dealing with this issue, but more work needs to be done to have a complete solution.

1) It would probably be more Linux-like to have tipc_link_recv_fragment() return -1 if an error occurred, rather than 2. This way, you'd have the following set of return values:

     0 - fragment received, message not complete
     1 - fragment received, message complete
    -1 - fragment not received, message not started

You'll need to modify tipc_recv_msg() accordingly to look for -1 instead of 2.
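A minimal, self-contained sketch of this return convention and the unicast recovery path (mock types and function names throughout -- this is not the actual TIPC code, just a model of the proposed behavior):

```c
#include <assert.h>

/* Illustrative return convention, mirroring the one proposed above:
 *  0 -> fragment consumed, message not yet complete
 *  1 -> fragment consumed, message complete
 * -1 -> fragment dropped, reassembly could not start
 */
enum { FRAG_PENDING = 0, FRAG_COMPLETE = 1, FRAG_NO_MEM = -1 };

struct mock_link {
    unsigned next_in_no;   /* next expected link sequence number */
};

/* Stand-in for tipc_link_recv_fragment(); 'alloc_ok' models whether an
 * sk_buff for the reassembled message could be acquired. */
static int recv_fragment(int first, int last, int alloc_ok)
{
    if (first && !alloc_ok)
        return FRAG_NO_MEM;   /* caller must force a retransmit */
    return last ? FRAG_COMPLETE : FRAG_PENDING;
}

/* Unicast receive path: on -1, step next_in_no back so the link's
 * retransmission protocol asks the peer to resend the fragment. */
static void recv_msg(struct mock_link *l, int first, int last, int alloc_ok)
{
    int ret;

    l->next_in_no++;              /* packet accepted at link level */
    ret = recv_fragment(first, last, alloc_ok);
    if (ret == FRAG_NO_MEM)
        l->next_in_no--;          /* pretend it never arrived */
}
```

The point of the model: when reassembly cannot start, next_in_no is left exactly where it was, so from the link's perspective the fragment was never received and the normal retransmission machinery recovers it.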
2) The change you've made to tipc_bclink_recv_pkt() doesn't do anything to force the discarded message fragment to be retransmitted, which is essential if you don't want to lose messages. However, it doesn't look like this can be done as easily as simply decrementing "next_in_no", as you did for unicast links. Basically, you need to look at all the code that runs after the "if (seqno == next_in)" check and make sure nothing important happens before you check the result of tipc_link_recv_fragment(); this lets you bail out if a problem occurs, with the same effect as if the message had never been received, which is what you want. I think you probably want to code something like this:

    if (seqno == next_in) {
receive:
            /* Deliver message to destination */

            if (likely(msg_isdata(msg))) {
                    spin_lock_bh(&bc_lock);
                    bclink_accept_pkt(...);
                    spin_unlock_bh(&bc_lock);
                    tipc_node_unlock(node);
                    tipc_port_recv_mcast(buf, NULL);
            } else if (msg_user(msg) == MSG_BUNDLER) {
                    spin_lock_bh(&bc_lock);
                    bclink_accept_pkt(...);
                    bcl->stats.recv_bundles++;
                    bcl->stats.recv_bundled += msg_msgcnt(msg);
                    spin_unlock_bh(&bc_lock);
                    tipc_node_unlock(node);
                    tipc_link_recv_bundle(buf);
            } else if (msg_user(msg) == MSG_FRAGMENTER) {
                    ret = tipc_link_recv_fragment(&node->bclink.defragm,
                                                  &buf, &msg);
                    if (ret < 0) {
                            /* unable to start reassembly --
                             * discard fragment & wait for retransmission */
                            goto unlock;
                    }
                    spin_lock_bh(&bc_lock);
                    bclink_accept_pkt(...);
                    bcl->stats.recv_fragments++;
                    if (ret > 0) {
                            bcl->stats.recv_fragmented++;
                            msg_set_destnode_cache(msg, tipc_own_addr);
                    }
                    spin_unlock_bh(&bc_lock);
                    tipc_node_unlock(node);
                    tipc_net_route_msg(buf);
            } else if (msg_user(msg) == NAME_DISTRIBUTOR) {
                    ... <and so on>

where bclink_accept_pkt(...)
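A toy model of why the ordering above matters (all names are illustrative, not TIPC's): the accept step, which advances the broadcast link's in-sequence state, must come after the fragment check, so a failed reassembly leaves the state exactly as if the packet had never arrived and the retransmitted copy is still in sequence:

```c
#include <assert.h>

/* Hypothetical, simplified broadcast-link receive state. */
struct mock_bclink {
    unsigned last_in;          /* last in-sequence packet accepted */
    unsigned recv_fragments;   /* stats counter */
};

static void accept_pkt(struct mock_bclink *b, unsigned seqno)
{
    b->last_in = seqno;        /* models bclink_accept_pkt() */
}

/* Returns 1 if the packet was accepted, 0 if it must be retransmitted.
 * 'frag_ok' models whether tipc_link_recv_fragment() succeeded. */
static int recv_frag_pkt(struct mock_bclink *b, unsigned seqno, int frag_ok)
{
    if (seqno != b->last_in + 1)
        return 0;              /* out of sequence: defer/ignore */
    if (!frag_ok)
        return 0;              /* bail out BEFORE touching link state */
    accept_pkt(b, seqno);      /* side effects only on success */
    b->recv_fragments++;
    return 1;
}
```

If accept_pkt() ran before the fragment check (as in the original patch), last_in would already point past the lost fragment and the retransmitted copy would be treated as a duplicate and dropped.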
does:

    bclink_update_last_sent(node, seqno);
    node->bclink.last_in = seqno;
    node->bclink.oos_state = 0;
    bcl->stats.recv_info++;

    /*
     * Unicast an ACK periodically, ensuring that
     * all nodes in the cluster don't ACK at the same time
     */

    if (((seqno - tipc_own_addr) % TIPC_MIN_LINK_WIN) == 0) {
            tipc_link_send_proto_msg(
                    node->active_links[node->elm.addr & 1],
                    STATE_MSG, 0, 0, 0, 0, 0, 0);
            bcl->stats.sent_acks++;
    }

I realize that this stuff is probably confusing to you -- indeed, I find the broadcast link code confusing too! -- so if you've got any questions, feel free to ask.

-- Al

2. WIND00229427

For this issue, I have already provided a workaround (in red) to the customer and am waiting for feedback. Could you also review the modification? The defect description contains the details, so please refer to the defect.

a. In tipc_link.c

    void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *tb_ptr)
    {
            read_lock_bh(&tipc_net_lock);
            while (head) {
                    struct bearer *b_ptr = (struct bearer *)tb_ptr;
                    struct tipc_node *n_ptr;
                    struct link *l_ptr;
                    struct sk_buff *crs;
                    struct sk_buff *buf;
                    struct tipc_msg *msg;
                    u32 type;
                    u32 seq_no;
                    u32 ackd;
                    u32 released;
                    u32 ret;
    [...]
                    case ROUTE_DISTRIBUTOR:
                            tipc_node_unlock(n_ptr);
                            tipc_route_recv(buf);
                            continue;
                    case CONN_MANAGER:
                            /* route message normally */
                            break;
                    case MSG_FRAGMENTER:
                            l_ptr->stats.recv_fragments++;
                            ret = tipc_link_recv_fragment(&l_ptr->defragm_buf,
                                                          &buf, &msg);
                            if (ret == 1) {
                                    l_ptr->stats.recv_fragmented++;
                                    goto deliver;
                            }
                            if (ret == 2) {
                                    l_ptr->next_in_no--;
                                    ----->>> When memory is exhausted, decrease
                                    the value of l_ptr->next_in_no to request
                                    retransmission of the packet. Thus the
                                    packet will not be lost.
                            }
                            break;
    #ifdef CONFIG_TIPC_MULTIPLE_LINKS
                    case CHANGEOVER_PROTOCOL:
                            type = msg_type(msg);
    [...]
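The ACK-staggering test in bclink_accept_pkt() can be isolated as a one-line predicate. A self-contained sketch (the window constant is an illustrative value, not necessarily TIPC's): offsetting the sequence number by the node's own address means nodes with different addresses ACK on different packets, yet each node still ACKs once per window:

```c
#include <assert.h>

#define LINK_WIN 16   /* illustrative ACK period; stands in for TIPC_MIN_LINK_WIN */

/* A node ACKs packet 'seqno' when (seqno - own_addr) is a multiple of
 * the window. Two nodes whose addresses differ by less than a full
 * window therefore never ACK the same packet. Unsigned wrap-around is
 * harmless here because 2^32 is itself a multiple of the window. */
static int should_ack(unsigned seqno, unsigned own_addr)
{
    return ((seqno - own_addr) % LINK_WIN) == 0;
}
```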
    }

    int tipc_link_recv_fragment(struct sk_buff **pending, struct sk_buff **fb,
                                struct tipc_msg **m)
    {
            struct sk_buff *pbuf = *pending;
            struct sk_buff *fbuf = *fb;
            struct sk_buff *prev = NULL;
            struct tipc_msg *fragm = buf_msg(fbuf);
            u32 long_msg_orig = msg_orignode(fragm);
            u32 long_msg_seq_no = msg_fragm_msg_no(fragm);
    [....]
            pbuf = buf_acquire(msg_size(imsg));
            if (pbuf != NULL) {
                    pbuf->next = *pending;
                    *pending = pbuf;
                    skb_copy_to_linear_data(pbuf, imsg, msg_data_sz(fragm));

                    /* Prepare buffer for subsequent fragments. */

                    set_long_msg_orig(pbuf, long_msg_orig);
                    set_long_msg_seqno(pbuf, long_msg_seq_no);
                    set_fragm_size(pbuf, fragm_sz);
                    set_expected_frags(pbuf, exp_fragm_cnt - 1);
                    reset_timer_cnt(pbuf);
            } else {
                    warn("Link unable to reassemble fragmented message\n");
                    buf_discard(fbuf);
                    return 2;
                    ----->>> When memory is exhausted, the returned value is 2.
            }
            buf_discard(fbuf);
            return 0;
    [....]
    }

b. In tipc_bcast.c

    void tipc_bclink_recv_pkt(struct sk_buff *buf)
    {
            struct tipc_msg *msg;
            struct tipc_node *node;
            u32 seqno;
            u32 next_in;
            int deferred;
    [...]
            if (likely(msg_isdata(msg))) {
                    spin_unlock_bh(&bc_lock);
                    tipc_node_unlock(node);
                    tipc_port_recv_mcast(buf, NULL);
            } else if (msg_user(msg) == MSG_BUNDLER) {
                    bcl->stats.recv_bundles++;
                    bcl->stats.recv_bundled += msg_msgcnt(msg);
                    spin_unlock_bh(&bc_lock);
                    tipc_node_unlock(node);
                    tipc_link_recv_bundle(buf);
            } else if (msg_user(msg) == MSG_FRAGMENTER) {
                    bcl->stats.recv_fragments++;
                    if (tipc_link_recv_fragment(&node->bclink.defragm,
                                                &buf, &msg) == 1) {
                            bcl->stats.recv_fragmented++;
                            msg_set_destnode_cache(msg, tipc_own_addr);
                    }
                    spin_unlock_bh(&bc_lock);
                    tipc_node_unlock(node);
                    tipc_net_route_msg(buf);
            } else if (msg_user(msg) == NAME_DISTRIBUTOR) {
    [...]
    }