[tipc-discussion] Link congestion with the topology service


All,
I'm running TIPC 1.5.12 on Linux 2.6.21.

We appear to be seeing link congestion with our topology service
notifications (publications and withdrawals).  When one of our nodes
goes down under heavy load, the other nodes are inundated with
withdrawn events, and some of these events do not reach their intended
destination because of link congestion.  There are hundreds of
processes running on each of 10 nodes, plus one cluster manager.
I believe the topology service opens its socket with critical
importance, and so do our applications.  So there is no problem
enqueuing the messages on the socket receive queue, but the transmit
queue is not allowed to grow indefinitely.  I can increase the window
size from its default of 50 to the maximum of 150, which will let us go
from 96 fragments/messages to 300 fragments/messages for critical
messages.  I'm not sure how much this will help.
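
For reference, raising a socket's importance usually looks something
like the sketch below.  It assumes the SOL_TIPC / TIPC_IMPORTANCE socket
option and the TIPC_CRITICAL_IMPORTANCE level as defined in
<linux/tipc.h>; set_critical_importance() is a hypothetical helper name
used only for illustration.

```c
#include <sys/socket.h>
#include <linux/tipc.h>

/*
 * Hypothetical helper: mark an existing TIPC socket as critical
 * importance.  Returns 0 on success, -1 on failure (errno is set
 * by setsockopt).
 */
int set_critical_importance(int sd)
{
    int imp = TIPC_CRITICAL_IMPORTANCE;

    return setsockopt(sd, SOL_TIPC, TIPC_IMPORTANCE, &imp, sizeof(imp));
}
```

Importance only changes how much congestion a message tolerates before
it is rejected; it does not make the transmit queue unbounded.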

So I've got a few questions:

1) Does the topology service do anything special when it detects link
congestion?  Or do the messages just get dropped, as appears to be
happening in this case?
2) If hundreds of processes go down at almost the same time (e.g. during
a node reset), the topology service on another node will subsequently
flood the link with withdrawn messages, won't it?  I just want to
determine what the expected behavior is.
3) Is there any way for an application to determine that link
congestion is occurring?  We can see from "tipc-config -ls" that link
congestion has occurred.  I assume that the number we're seeing is the
number of occurrences of link congestion.

It is very important that our application knows what processes are
up/down so we're very sensitive to the situation where we lose these
topology service messages.  Any advice or pointers would be very helpful
for this situation.
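
For context, a subscription of the kind discussed here can be sketched
roughly as follows.  This is a minimal sketch, not our production code:
it assumes the struct tipc_subscr / struct tipc_event layouts and the
TIPC_TOP_SRV, TIPC_SUB_PORTS, TIPC_PUBLISHED and TIPC_WITHDRAWN
constants from <linux/tipc.h>, and the function names are hypothetical.
Byte-order handling of the subscription fields may differ between TIPC
versions and is ignored here.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Fill in a subscription covering instances [lower, upper] of `type`. */
void make_subscription(struct tipc_subscr *sub, uint32_t type,
                       uint32_t lower, uint32_t upper)
{
    memset(sub, 0, sizeof(*sub));
    sub->seq.type = type;
    sub->seq.lower = lower;
    sub->seq.upper = upper;
    sub->timeout = TIPC_WAIT_FOREVER;   /* never expire */
    sub->filter = TIPC_SUB_PORTS;       /* one event per publishing port */
}

/* Map an event to +1 (published), -1 (withdrawn) or 0 (anything else). */
int event_delta(const struct tipc_event *evt)
{
    if (evt->event == TIPC_PUBLISHED)
        return 1;
    if (evt->event == TIPC_WITHDRAWN)
        return -1;
    return 0;
}

/*
 * Connect to the topology service and subscribe to `type`.
 * Returns the socket descriptor, or -1 on failure.  Events are then
 * read from the socket with recv() as struct tipc_event records.
 */
int subscribe_to_type(uint32_t type)
{
    struct sockaddr_tipc topsrv;
    struct tipc_subscr sub;
    int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);

    if (sd < 0)
        return -1;

    memset(&topsrv, 0, sizeof(topsrv));
    topsrv.family = AF_TIPC;
    topsrv.addrtype = TIPC_ADDR_NAME;
    topsrv.addr.name.name.type = TIPC_TOP_SRV;
    topsrv.addr.name.name.instance = TIPC_TOP_SRV;

    make_subscription(&sub, type, 0, ~0u);

    if (connect(sd, (struct sockaddr *)&topsrv, sizeof(topsrv)) < 0 ||
        send(sd, &sub, sizeof(sub), 0) != (ssize_t)sizeof(sub)) {
        close(sd);
        return -1;
    }
    return sd;
}
```

An application that sums event_delta() over received events only has an
accurate count of live publications if every withdrawn event actually
arrives, which is why lost events matter so much to us.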

Thanks,
Felix
