From: Erik H. <eri...@er...> - 2012-10-25 13:48:17
There is a problem with SOCK_RDM messaging in TIPC. If the receiving port of an RDM packet is unable to handle the message (socket rcvbuf full), the message is either:

A. Dropped, if the DEST_DROPPABLE flag is true.
B. Rejected back to the sender, if the DEST_DROPPABLE flag is false.

For B, once one or more messages have been rejected, subsequent messages that are still in transit can still be delivered to the socket if the receiver has managed to drain the socket receive queue enough for them to be buffered. This breaks the guaranteed sequentiality of RDM message delivery, and it applies to both unicast and multicast messaging.

The easy way out is to force applications to use SEQPACKET/STREAM sockets if they need guaranteed message delivery. But we already have (almost) all the infrastructure in place to implement blocking send for RDM when one or more of the recipients cannot handle the packet.

Suppose we add a 'congested' state to the publication struct:
http://lxr.free-electrons.com/source/net/tipc/name_table.h#L68

Every time we send an RDM message, we resolve a publication through either tipc_nametbl_translate or tipc_nametbl_mc_translate. If one or more of the destination ports that have published the given name is congested, this is reported back to the port layer, and the send() call can block or return -EAGAIN. If the node/portid is explicitly specified (sockaddr_tipc of type TIPC_ADDR_ID), we would probably need to add a reverse publication lookup in tipc_send2port to check the 'congested' state.

When a message is received on an RDM socket, we check the buffer fill level. If it exceeds TIPC_RDM_HIGH_WMARK, we send out a TIPC broadcast message with a (new?) TIPC protocol primitive stating that port X on node a.b.c is overloaded. Upon receiving this message, all nodes in the cluster update the corresponding publication and set 'congested'=true.
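To make the sender-side part of this concrete, here is a rough user-space sketch of the two pieces involved: checking the resolved destinations before sending, and applying a received congestion notification. Everything in it is an assumption for illustration only. struct pub_sketch is a simplified stand-in for the real publication struct, and rdm_check_dests and rdm_set_congested are hypothetical names, not existing TIPC functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct publication in
 * net/tipc/name_table.h: only the fields needed here, plus the
 * proposed 'congested' flag. */
struct pub_sketch {
    uint32_t node;    /* node that owns the port */
    uint32_t ref;     /* port reference on that node */
    bool congested;   /* proposed new state */
};

/* Send-side check over the destinations a name lookup resolved to.
 * Returns 0 if none is congested, -EAGAIN if at least one is. */
static int rdm_check_dests(const struct pub_sketch *dests, size_t n)
{
    size_t i;

    assert(dests != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (dests[i].congested)
            return -EAGAIN;
    }
    return 0;
}

/* Apply a received congestion notification for port (node, ref).
 * Returns the number of publications that were updated. */
static size_t rdm_set_congested(struct pub_sketch *pubs, size_t n,
                                uint32_t node, uint32_t ref, bool congested)
{
    size_t i, updated = 0;

    assert(pubs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (pubs[i].node == node && pubs[i].ref == ref) {
            pubs[i].congested = congested;
            updated++;
        }
    }
    return updated;
}
```

A blocking send() would wait and retry instead of returning -EAGAIN directly. The linear scans are only there to keep the sketch short; the real name table would do a proper lookup.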
The WMARK limits should probably be based directly on the socket rcvbuf size, maybe:

#define TIPC_RDM_LOW_WMARK(sk) ((sk)->sk_rcvbuf / 4)
#define TIPC_RDM_HIGH_WMARK(sk) ((sk)->sk_rcvbuf - TIPC_RDM_LOW_WMARK(sk))

When the buffer fill level drops below TIPC_RDM_LOW_WMARK, we send out another TIPC broadcast message, reporting that port X on node a.b.c is no longer congested.

Will this work?
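For reference, the receive-side watermark logic could look roughly like the user-space sketch below. It is only an illustration under assumptions: struct sock_sketch is a minimal stand-in for struct sock, and rdm_update_fill and the rdm_event values are hypothetical names. The return value marks the points where the proposed broadcast would be sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal stand-in for struct sock. */
struct sock_sketch {
    int sk_rcvbuf;    /* receive buffer size in bytes */
    int fill;         /* bytes currently queued */
    bool congested;   /* state last announced to the cluster */
};

#define TIPC_RDM_LOW_WMARK(sk)  ((sk)->sk_rcvbuf / 4)
#define TIPC_RDM_HIGH_WMARK(sk) ((sk)->sk_rcvbuf - TIPC_RDM_LOW_WMARK(sk))

/* What the caller should broadcast, if anything. */
enum rdm_event {
    RDM_NONE,
    RDM_ANNOUNCE_CONGESTED,
    RDM_ANNOUNCE_CLEAR
};

/* Adjust the fill level by delta bytes (positive when a message is
 * queued, negative when one is consumed) and report state changes.
 * An announcement is produced only when a watermark is crossed, so
 * the gap between the two marks acts as hysteresis. */
static enum rdm_event rdm_update_fill(struct sock_sketch *sk, int delta)
{
    assert(sk != NULL);
    sk->fill += delta;

    if (!sk->congested && sk->fill > TIPC_RDM_HIGH_WMARK(sk)) {
        sk->congested = true;
        return RDM_ANNOUNCE_CONGESTED;
    }
    if (sk->congested && sk->fill < TIPC_RDM_LOW_WMARK(sk)) {
        sk->congested = false;
        return RDM_ANNOUNCE_CLEAR;
    }
    return RDM_NONE;
}
```

With sk_rcvbuf = 1000 the marks are 250 and 750, so a socket that announced congestion at 800 bytes stays congested at 500 bytes and only announces recovery once it drops below 250. That keeps the number of broadcasts down when the fill level hovers around a single threshold.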