From: Ying X. <yin...@wi...> - 2012-10-26 08:56:49
Hi Erik,

I understand your idea. The solution might be workable, but I believe a
stronger version is not easy to implement. As you know, in a distributed
system it is hard to maintain a shared variable (i.e. the "congested" flag
you mentioned); keeping it synchronized across all nodes of the cluster in
a timely manner takes considerable effort.

Furthermore, as I understand it, you are essentially adding a simple flow
control algorithm for RDM sockets at the port layer to prevent the receive
queue from being overloaded. Such a flow control mechanism can reduce the
rate of overflow, but it cannot eliminate the risk entirely, because:

1. In a distributed system, we cannot easily estimate which watermark value
   best indicates that a receiver is about to enter the congested state.
2. As I stated before, at the port layer we cannot drop a message once we
   have received it, even if we know the receive buffer will be overloaded.
3. Even for the same skb, the skb's truesize may differ between hosts.

Therefore, we cannot 100% prevent the socket receive buffer from being
overloaded with this algorithm. As you pointed out, the TM patches I sent
you previously cannot prevent the issue either. In short, I cannot come up
with a better method to resolve this issue.

Regards,
Ying

Erik Hugne wrote:
> There is a problem with SOCK_RDM messaging in TIPC.
> If the receiving port of an RDM packet is unable to handle the message
> (socket rcvbuf full), the message can be:
> A. Dropped, if the DEST_DROPPABLE flag is true.
> B. Rejected back to the sender if the DEST_DROPPABLE flag is false.
> For B, if one or more messages have been rejected, subsequent messages
> that are still in transit can still be delivered to the socket if it
> has managed to drain the socket rcvqueue enough for them to be buffered.
> This breaks the guaranteed sequentiality of RDM message delivery and
> applies to both uni/multicast messaging.
>
> Easy way out of this is to force applications to use SEQPACKET/STREAM
> sockets if they need guaranteed message delivery.
> But we already have (almost) all the infrastructure in place to
> implement blocking send for RDM if one or more of the recipients
> cannot handle the packet.
> If we add a 'congested' state in the publication struct:
> http://lxr.free-electrons.com/source/net/tipc/name_table.h#L68
>
> Every time we send an RDM message, we will resolve a publication
> through either tipc_nametbl_translate or tipc_nametbl_mc_translate.
> If one or more of the destination ports that have published the given
> name is congested, this will be reported back to the port layer, and
> the send() call can be blocked/return -EAGAIN.
> If the node/portid is explicitly specified (sockaddr_tipc of type
> TIPC_ADDR_ID), we would probably need to add a reverse publication
> lookup in tipc_send2port to check the 'congested' state.
>
> When a message is received on an RDM socket, we check the buffer fill
> level. If this exceeds TIPC_RDM_HIGH_WMARK, we send out a TIPC
> broadcast message with a (new?) TIPC protocol primitive stating
> that port X on node a.b.c is overloaded.
> Upon receiving this message, all nodes in the cluster update the
> corresponding publication and set 'congested'=true.
>
> The WMARK limits should probably be based directly on the socket
> rcvbuf size, maybe:
> #define TIPC_RDM_LOW_WMARK(sk) (sk->sk_rcvbuf/4)
> #define TIPC_RDM_HIGH_WMARK(sk) (sk->sk_rcvbuf - TIPC_RDM_LOW_WMARK(sk))
>
> When the buffer fill level drops below TIPC_RDM_LOW_WMARK, we send out
> another TIPC broadcast message, reporting that port X on node a.b.c is
> no longer congested.
>
> Will this work?
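[Editor's note: the high/low watermark scheme quoted above can be modelled
as a small user-space sketch. This is only an illustration of the proposed
hysteresis, not actual TIPC kernel code; `struct rdm_port`,
`rdm_update_congestion`, and the byte-count fields are hypothetical names,
and the macros take a buffer size directly instead of a `struct sock`.]

```c
#include <assert.h>
#include <stdbool.h>

/* Watermarks derived from the receive buffer size, mirroring the
 * macros suggested in the thread (low = 1/4, high = 3/4 of rcvbuf). */
#define TIPC_RDM_LOW_WMARK(rcvbuf)  ((rcvbuf) / 4)
#define TIPC_RDM_HIGH_WMARK(rcvbuf) ((rcvbuf) - TIPC_RDM_LOW_WMARK(rcvbuf))

/* Hypothetical per-port state; 'congested' stands in for the flag that
 * would be replicated into every node's publication entry. */
struct rdm_port {
	int rcvbuf;     /* total receive buffer size, bytes */
	int fill;       /* bytes currently queued on the socket */
	bool congested; /* replicated 'congested' state */
};

/* Called after every enqueue/dequeue. Returns true when the state
 * flips, i.e. when a cluster broadcast ("port overloaded" or "port no
 * longer congested") would have to be sent. The gap between the two
 * watermarks gives hysteresis, so a port hovering near one threshold
 * does not flood the cluster with state-change broadcasts. */
static bool rdm_update_congestion(struct rdm_port *p)
{
	if (!p->congested && p->fill >= TIPC_RDM_HIGH_WMARK(p->rcvbuf)) {
		p->congested = true;   /* would trigger 'overloaded' bcast */
		return true;
	}
	if (p->congested && p->fill <= TIPC_RDM_LOW_WMARK(p->rcvbuf)) {
		p->congested = false;  /* would trigger 'uncongested' bcast */
		return true;
	}
	return false;
}
```

Note that the hysteresis only bounds how often state changes are
broadcast; as Ying points out above, messages already in flight when the
'congested' broadcast propagates can still overflow the receiver, so this
reduces rather than eliminates the risk.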