From: Erik H. <eri...@er...> - 2012-10-25 13:48:17
There is a problem with SOCK_RDM messaging in TIPC. If the receiving port of
an RDM packet is unable to handle the message (socket rcvbuf full), the
message can be:

A. Dropped, if the DEST_DROPPABLE flag is true.
B. Rejected back to the sender, if the DEST_DROPPABLE flag is false.

For B, if one or more messages have been rejected, subsequent messages that
are still in transit can still be delivered to the socket if it has managed
to drain its rcvqueue enough for them to be buffered. This breaks the
guaranteed sequentiality of RDM message delivery and applies to both unicast
and multicast messaging.

The easy way out of this is to force applications to use SEQPACKET/STREAM
sockets if they need guaranteed message delivery. But we already have
(almost) all the infrastructure in place to implement blocking send for RDM
if one or more of the recipients cannot handle the packet.

Suppose we add a 'congested' state to the publication struct:
http://lxr.free-electrons.com/source/net/tipc/name_table.h#L68

Every time we send an RDM message, we resolve a publication through either
tipc_nametbl_translate or tipc_nametbl_mc_translate. If one or more of the
destination ports that have published the given name are congested, this is
reported back to the port layer, and the send() call can block or return
-EAGAIN. If the node/portid is explicitly specified (sockaddr_tipc of type
TIPC_ADDR_ID), we would probably need to add a reverse publication lookup in
tipc_send2port to check the 'congested' state.

When a message is received on an RDM socket, we check the buffer fill level.
If it exceeds TIPC_RDM_HIGH_WMARK, we send out a TIPC broadcast message with
a (new?) TIPC protocol primitive stating that port X on node a.b.c is
overloaded. Upon receiving this message, all nodes in the cluster update the
corresponding publication and set 'congested'=true.

The WMARK limits should probably be based directly on the socket rcvbuf
size, maybe:

#define TIPC_RDM_LOW_WMARK(sk)  (sk->sk_rcvbuf / 4)
#define TIPC_RDM_HIGH_WMARK(sk) (sk->sk_rcvbuf - TIPC_RDM_LOW_WMARK(sk))

When the buffer fill level drops below TIPC_RDM_LOW_WMARK, we send out
another TIPC broadcast message, reporting that port X on node a.b.c is no
longer congested.

Will this work?
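[Editor's note: to make the watermark hysteresis above concrete, here is a
minimal user-space C model of the proposed receive-side behaviour. It is
only a sketch under the assumptions stated in the mail: the rdm_sock struct,
the rdm_rcvbuf_check() helper and the broadcast_port_state() call are
hypothetical stand-ins for the real socket and NAME_DISTR machinery, not
actual TIPC kernel code.]

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for the receiving socket: only the fields the check needs. */
    struct rdm_sock {
            int sk_rcvbuf;          /* configured receive buffer limit */
            int rmem_alloc;         /* bytes currently queued */
            bool congested;         /* advertised state, per the proposal */
    };

    /* Watermarks derived from the rcvbuf size, as suggested above. */
    #define TIPC_RDM_LOW_WMARK(s)   ((s)->sk_rcvbuf / 4)
    #define TIPC_RDM_HIGH_WMARK(s)  ((s)->sk_rcvbuf - TIPC_RDM_LOW_WMARK(s))

    /* Placeholder for the proposed cluster-wide congestion broadcast. */
    static void broadcast_port_state(bool congested)
    {
            printf("broadcast: port is %s\n", congested ? "CONGESTED" : "OK");
    }

    /*
     * Called whenever the queued byte count changes.  Crossing the high
     * watermark advertises congestion once; dropping below the low
     * watermark withdraws it.  The gap between the two marks provides
     * the hysteresis that avoids flapping.
     */
    static void rdm_rcvbuf_check(struct rdm_sock *s)
    {
            if (!s->congested && s->rmem_alloc > TIPC_RDM_HIGH_WMARK(s)) {
                    s->congested = true;
                    broadcast_port_state(true);
            } else if (s->congested && s->rmem_alloc < TIPC_RDM_LOW_WMARK(s)) {
                    s->congested = false;
                    broadcast_port_state(false);
            }
    }

    int main(void)
    {
            struct rdm_sock s = { .sk_rcvbuf = 65536 };
            int fill[] = { 10000, 50000, 60000, 30000, 15000, 8000 };

            for (unsigned i = 0; i < sizeof(fill) / sizeof(fill[0]); i++) {
                    s.rmem_alloc = fill[i];
                    rdm_rcvbuf_check(&s);
            }
            return 0;
    }

In the real kernel path this check would presumably sit wherever the RDM
receive queue is filled and drained, and the "broadcast" would be the new
protocol primitive the mail proposes.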
From: Ying X. <yin...@wi...> - 2012-10-26 08:56:49
Hi Erik,

I understood your idea.

First, I think the solution might be workable, but I believe it would not be
easy to implement a stronger version. As you know, in a distributed system it
is hard to maintain a shared variable (i.e. the "congested" flag you
mention). Keeping that variable synchronized across all nodes of the cluster
in a timely way takes a lot of effort.

Furthermore, as I understand it, you are really adding a simple flow control
algorithm for RDM sockets at the port layer to prevent the receive queue from
overloading. I believe such a flow control mechanism can reduce the rate of
overflow, but it may not eliminate the risk entirely, because:

1. In a distributed system, we cannot easily estimate which watermark value
   for signalling that a receiver is entering the congested state is best.
2. As I stated before, at the port layer we cannot drop a message once we
   have received it, even if we know the receive buffer will be overloaded.
3. Even for the same skb, the skb's truesize may differ between hosts.

Therefore, we cannot 100% prevent the socket receive buffer from being
overloaded with the TM algorithm. As you pointed out, the TM patches I sent
you previously cannot prevent the issue from happening either.

In all, I cannot think of a better method to resolve the issue.

Regards,
Ying
From: Erik H. <eri...@er...> - 2012-11-05 16:18:11
The unconnected nature of RDM means that we will never know whether the
published ports/nodes will actually receive a copy of the message. Maybe the
socket has been close()'d, but the sender's name table is not yet updated?
So we cannot provide any guarantee of delivery for RDM messages to the ports
bound to a specific name. Basically, we don't have enough 'state' in the RDM
protocol to base any ack/retransmit decisions on.

My understanding is that, for unconnected transmission methods, the best we
can do is to introduce a feature that communicates to the peer nodes'
publication tables when a remote port cannot handle more data.

The method I proposed will work for the trivial case with one sender and one
receiver. It becomes more complicated in the second case, with multiple
senders and a single receiver, since all senders can potentially send a big
burst of data and exhaust the receiver's buffer. The third case is when we
have multiple ports with overlapping names that should receive an RDM
message. If any of the recipient ports cannot handle the message, we should
block the send() call.

If we have reached an RDM port congestion state, it will eventually be
'unblocked' (when LOW_WMARK is reached) and a broadcast indication is sent to
the peer nodes. Now all peers may send a huge burst of data to the recipient
port, and we can potentially run into an overload condition again. This
overload will of course also trigger the HIGH_WMARK message to be sent out
again, telling the senders to back off.

If I interpret the tipc_port_recv_mcast/tipc_port_recv_msg code correctly,
messages will be rejected for both multicast and unicast messaging if a port
cannot handle a message.

What we gain with this is that when a server port gets overloaded and rejects
a set of messages, we will no longer receive any 'out of sequence' messages,
and based on the rejected messages the sending ports will always know that
the server received everything up to message X before it got choked.

This may introduce a 'stop and go' behaviour for heavily loaded servers,
which might be less than ideal, but I hope you agree with me that it's better
than receiving out-of-sequence data.

//E

On Fri, Nov 02, 2012 at 11:10:07AM -0400, Andrew Booth wrote:
> I agree with Ying, I don't think this approach handles all cases, though
> it might help with some common cases.
> Maybe enhance the RDM code to allow for acknowledgement and retransmission
> per-port? Not trivial, but I'm under the impression that is the approach
> taken by other reliable transports (tcp, etc).
> Andrew
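[Editor's note: the sender-side rule described above for the
multi-destination case ("if any of the recipient ports cannot handle the
message, we should block the send() call") can be illustrated with a small
user-space model. The publication struct and the rdm_check_destinations()
helper below are hypothetical; in the real proposal the check would live in
the name table lookup (tipc_nametbl_translate / tipc_nametbl_mc_translate)
and in tipc_send2port.]

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Minimal stand-in for a name table publication entry. */
    struct publication {
            unsigned int node;
            unsigned int port;
            bool congested;         /* proposed new state */
    };

    /*
     * Model of the proposed send-side rule for names with several
     * publications: if any matching destination port has advertised
     * congestion, the whole send is refused (or the caller blocks),
     * so that no destination can be pushed out of sequence.
     */
    static int rdm_check_destinations(const struct publication *pubs, int npubs)
    {
            for (int i = 0; i < npubs; i++) {
                    if (pubs[i].congested)
                            return -EAGAIN; /* caller may block and retry */
            }
            return 0;               /* all destinations can accept the message */
    }

    int main(void)
    {
            struct publication pubs[] = {
                    { .node = 0x1001001, .port = 42, .congested = false },
                    { .node = 0x1001002, .port = 43, .congested = true  },
            };

            if (rdm_check_destinations(pubs, 2) == -EAGAIN)
                    printf("send deferred: a destination is congested\n");
            return 0;
    }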
From: Ying X. <yin...@wi...> - 2012-11-06 02:58:50
I agree with Andrew's proposal. It is an interesting idea, because it shows
how we could drop messages when the socket receive buffer overflows (the
sender can simply retransmit them), without affecting the link layer state.
Although the link layer already has a reliable flow control algorithm, and
adding another reliable flow control mechanism at the port layer may seem
redundant, I believe it is necessary if we want to enhance the RDM socket.
With that approach, the issues I listed earlier are easy to address.

Which reliable flow control mechanism is best for us?

The first algorithm I can think of is the one already deployed at the
multicast link layer. If we implement it again at the port layer, I think it
should at least work reliably for multicast communication.

I also noticed Erik's enhanced solution, which now seems workable :-) But I
would point out that sending rejected messages back to their senders is
itself unreliable: in particular, when a link is congested the rejected
message may be dropped silently.

Maybe we can find other algorithms by looking into some papers. In the end
we also need to implement some prototypes to prove which one is best for us.

In addition, I am looking forward to hearing Jon's comments :-)

Regards,
Ying
From: Jon M. <ma...@do...> - 2012-11-07 08:53:36
On 11/05/2012 09:58 PM, Ying Xue wrote:
> [...] Which reliable flow control mechanism is best for us?
>
> The first algorithm I can think of is the one already deployed at the
> multicast link layer. If we implement it again at the port layer, I think
> it should at least work reliably for multicast communication.

The problem with this is that the sender port doesn't know the number of
receivers. And anyway, that number may change from one moment to another. I
fear it would be extremely challenging to make such a protocol work
reliably, given the problems we have seen with the presumably simpler
link-layer broadcast. But I am open to suggestions. Maybe I am wrong.

> I also noticed Erik's enhanced solution, which now seems workable :-) But
> I would point out that sending rejected messages back to their senders is
> itself unreliable: in particular, when a link is congested the rejected
> message may be dropped silently.
>
> [...] In addition, I am looking forward to hearing Jon's comments :-)

First, I want to say that what Andrew and others are asking for is in
reality impossible. SOCK_RDM is by its POSIX definition a datagram protocol
mode, i.e. we may have multiple senders and multiple receivers
simultaneously. Also, by definition it is not required to deliver messages
in sequence, while I am less certain about cardinality. When I see such
requests, I suspect that what they really want is SOCK_SEQPACKET, which
basically has all the properties they ask for.

But who are we if we don't try to achieve the impossible for our users ;-)

I like Erik's idea (it is not new) to attach a load level indicator to the
publication items. The thought I had about this was to reserve two bits and
use them to indicate load level, e.g. 00 = <70%, 01 = 70-85%, 10 = 85-95%,
11 = >95% (and full stop). I assume that the load levels (when >70%) are
broadcast out in a new NAME_DISTR message type, since there is no more space
in the current publication items (yes, I have checked), and the NAME_DISTR
message header cannot be used for this. This would be fully backwards
compatible.

We could then extend the name lookup algorithm to find an instance with an
acceptable load, and if there is none we either block the sender until the
overload abates (as we do with link congestion now), or we return EAGAIN.

This would, as Erik said, not guarantee against buffer overflow, but it
should reduce the risk significantly.

There is a risk with this, however. What happens in a heavily overloaded
system, if a lot of destinations start to broadcast overload messages at a
massive scale? We may end up worsening the situation instead of improving
it. It is possible that we should make this feature an opt-in service, set
via a socket option at the server side. But I would prefer to make it
default and transparent, if possible.

I think only some prototyping and experimenting during high load can give
the answer here.
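[Editor's note: to make the two-bit load-level idea a bit more concrete,
here is a small user-space sketch of how a name lookup might prefer a
lightly loaded instance and refuse the send only when every instance is at
"full stop". The enum, the publication struct and the pick_instance() helper
are hypothetical illustrations; the real lookup would live in
tipc_nametbl_translate and would also have to fit in with TIPC's existing
instance selection rules.]

    #include <stddef.h>
    #include <stdio.h>

    /*
     * Two-bit load level per publication, as suggested in the mail:
     * 00 = <70%, 01 = 70-85%, 10 = 85-95%, 11 = >95% (full stop).
     */
    enum pub_load {
            PUB_LOAD_LOW  = 0,      /* < 70%  */
            PUB_LOAD_MED  = 1,      /* 70-85% */
            PUB_LOAD_HIGH = 2,      /* 85-95% */
            PUB_LOAD_STOP = 3,      /* > 95%, do not use */
    };

    /* Minimal stand-in for a name table publication entry. */
    struct publication {
            unsigned int node;
            unsigned int port;
            enum pub_load load;
    };

    /*
     * Pick the least-loaded instance bound to a name; return NULL if
     * every instance is at full stop, in which case the caller would
     * block or return EAGAIN, as for link congestion today.
     */
    static const struct publication *
    pick_instance(const struct publication *pubs, size_t npubs)
    {
            const struct publication *best = NULL;

            for (size_t i = 0; i < npubs; i++) {
                    if (pubs[i].load == PUB_LOAD_STOP)
                            continue;
                    if (!best || pubs[i].load < best->load)
                            best = &pubs[i];
            }
            return best;
    }

    int main(void)
    {
            struct publication pubs[] = {
                    { .node = 0x1001001, .port = 100, .load = PUB_LOAD_HIGH },
                    { .node = 0x1001002, .port = 101, .load = PUB_LOAD_MED  },
                    { .node = 0x1001003, .port = 102, .load = PUB_LOAD_STOP },
            };
            const struct publication *p = pick_instance(pubs, 3);

            if (p)
                    printf("selected port %u (load level %d)\n", p->port, p->load);
            else
                    printf("all instances congested: block or return EAGAIN\n");
            return 0;
    }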
From: Erik H. <eri...@er...> - 2012-11-07 15:05:02
> But who are we if we don't try to achieve the impossible for our
> users ;-)
>
> I like Erik's idea (it is not new) to attach a load level indicator to
> the publication items. The thought I had about this was to reserve two
> bits and use them to indicate load level, e.g. 00 = <70%, 01 = 70-85%,
> 10 = 85-95%, 11 = >95% (and full stop).

Do we need this level of detail? I thought it would be enough with a single
flag indicating 'full stop'.

> I assume that the load levels (when >70%) are broadcast out in a new
> NAME_DISTR message type, since there is no more space in the current
> publication items (yes, I have checked), and the NAME_DISTR message
> header cannot be used for this. This would be fully backwards compatible.
>
> We could then extend the name lookup algorithm to find an instance with
> an acceptable load, and if there is none we either block the sender until
> the overload abates (as we do with link congestion now), or we return
> EAGAIN.

This raises a question for me: maybe we should make it possible for the
sending side to control the send() behaviour when one or more receiving
ports are congested, via a socket option? TIPC_ALLOW_PARTIAL_MCAST
(boolean 1/0), default 0. If this is set and one or more of the recipient
ports are currently overloaded, we send the message only to the ports that
are not (potentially zero ports, if all of them are overloaded).

> This would, as Erik said, not guarantee against buffer overflow, but it
> should reduce the risk significantly.

Yes, and most importantly, the sending side can know exactly which messages
have been delivered to the receiver (based on the rejected messages).

> There is a risk with this, however. What happens in a heavily overloaded
> system, if a lot of destinations start to broadcast overload messages at
> a massive scale? We may end up worsening the situation instead of
> improving it.

Maybe we could limit the broadcast messages indicating that a port's
overload has ceased to one per port per link timeout period... but that
would require keeping track of all congested ports in a separate list, or
iterating through the name table to find congested ports every timeout
interval. Not very good.

> It is possible that we should make this feature an opt-in service, set
> via a socket option at the server side. But I would prefer to make it
> default and transparent, if possible.

What if we keep it on by default, but make it opt-out with a socket option?

> I think only some prototyping and experimenting during high load can give
> the answer here.
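[Editor's note: as a user-space illustration of the proposed opt-in, a
sending application might enable the behaviour roughly as below. Note that
TIPC_ALLOW_PARTIAL_MCAST is only being proposed in this mail; it does not
exist in the kernel, so the option name, its numeric value and the fallback
defines in the sketch are all hypothetical, and on a real system the
setsockopt() call would simply fail.]

    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef AF_TIPC
    #define AF_TIPC 30              /* TIPC address family */
    #endif
    #ifndef SOL_TIPC
    #define SOL_TIPC 271            /* TIPC socket option level */
    #endif

    /* Hypothetical option number, placeholder value for this sketch only. */
    #ifndef TIPC_ALLOW_PARTIAL_MCAST
    #define TIPC_ALLOW_PARTIAL_MCAST 140
    #endif

    int main(void)
    {
            int on = 1;
            int sd = socket(AF_TIPC, SOCK_RDM, 0);

            if (sd < 0) {
                    perror("socket");
                    return 1;
            }

            /*
             * Opt in to partial multicast delivery: with the proposed
             * option set, send() would skip congested destinations
             * instead of blocking or returning -EAGAIN for the whole
             * destination group.
             */
            if (setsockopt(sd, SOL_TIPC, TIPC_ALLOW_PARTIAL_MCAST,
                           &on, sizeof(on)) < 0)
                    perror("setsockopt(TIPC_ALLOW_PARTIAL_MCAST)");

            return 0;
    }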