From: Jon M. (QB/LMC) <jon...@er...> - 2003-07-10 16:52:28
Hi,

I will try to update my document with this information after my vacation, which starts soon.

About link congestion the story is very simple: I never throw away any messages. If the link queue limit is hit (it is configurable, but by default set to 48 packets for messages of importance "low"), the sending process/thread is transparently put to sleep until the window opens up again, which normally happens after a few microseconds. This gives the impression that a send "never" fails.

All messages are given an importance priority in the range 0-3, set in the sending port/socket. The default value is 0 (low). Messages of priority "low" are subject to the "base queue limit", which is the same as the window limit, as configured, or 48 by default. Messages of priority "medium" (1) are subject to queue limit = (base queue limit*4/3). If the send queue is beyond the window limit, but below the calculated queue limit, the message is queued, but not sent until the window opens up again. The process is not blocked in this case. If the queue size is beyond the calculated queue limit, the process is temporarily put to sleep, just as with importance zero messages. Messages of priority "high" (2) and "non_rejectable" (3) are treated the same way, but are subject to the queue limits (base queue limit*5/3) and (base queue limit*6/3) respectively.

For TIPC internal (e.g. name table update) messages, or routed messages, these limits are far more generous, so that TIPC does not have to compete with its users.

When it comes to processor overload, incoming messages are also handled according to importance priority. Each message is matched against two values before it is put into the read queue of a socket. The basic value is the "global queue limit", which keeps track of the number of incoming messages that are queued but not yet read on the whole processor. The limits here are 1000 for low importance, 2000 for medium importance, and 10000 for high and non-rejectable importance.
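As a rough illustration of the link-level send rules described earlier in this message, the following Python sketch computes the per-importance queue limit and the resulting action for a sender. The function and constant names are hypothetical and not taken from the TIPC source; the exact boundary comparisons and the use of integer division are assumptions as well.

```python
# Illustrative sketch only: names are hypothetical, not TIPC identifiers.
BASE_QUEUE_LIMIT = 48  # default window limit in packets, as stated above

# Importance levels 0-3 as described in the message.
LOW, MEDIUM, HIGH, NON_REJECTABLE = range(4)


def queue_limit(importance, base=BASE_QUEUE_LIMIT):
    """Return base*3/3, base*4/3, base*5/3 or base*6/3 for importance 0..3.

    Integer division is an assumption; the message does not say how
    fractional limits are rounded.
    """
    if importance not in (LOW, MEDIUM, HIGH, NON_REJECTABLE):
        raise ValueError("importance must be in the range 0-3")
    return base * (3 + importance) // 3


def send_action(queue_len, importance, base=BASE_QUEUE_LIMIT):
    """Classify what happens to a message given the current send queue length.

    "send"  : the window is open, the message goes out immediately
    "queue" : beyond the window but below the importance-specific limit;
              the message is queued and the sender is not blocked
    "block" : at or beyond the importance-specific limit; the sender sleeps
              until the window opens again (no message is dropped)

    Whether the limits are inclusive or exclusive is an assumption.
    """
    if queue_len < base:
        return "send"
    if queue_len < queue_limit(importance, base):
        return "queue"
    return "block"


if __name__ == "__main__":
    for level in (LOW, MEDIUM, HIGH, NON_REJECTABLE):
        print(level, queue_limit(level), send_action(60, level))
```

With the default base limit of 48 packets this yields limits of 48, 64, 80 and 96 for importance 0 through 3. Since the limit for "low" equals the window limit, a low-importance sender never reaches the "queue" state in this sketch and goes straight from sending to sleeping.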
If an incoming message is connection oriented, these thresholds are multiplied by four, under the assumption that it has more consequences to tear down a transaction in progress than to reject it in the setup phase, which is where non-connection-oriented messages are typically used.

If the upper limit is hit for a message, the message is not thrown away, but sent back to the sender (i.e. its first 1024 bytes) along with an error code. The importance of the rejected message is raised one step, to reduce the risk that it will hit the limit on its return (which is often on the same processor). If, despite all this, the rejected message hits the global limit for its importance level, the message is thrown away. **This is the only situation where a message is thrown away silently, and as you see it takes a really bad overload situation to end up there.** A further consequence is that a "connection abortion" message, which is connection oriented, contains an error code, and has a raised importance level, is virtually never thrown away.

To protect the processor from locally misbehaving applications there is also a "local queue limit", keeping track of the number of unread messages queued in each socket. Based on empirical experience the values here are set to 1/2 of the global threshold, that is 500, 1000 and 5000 messages respectively. Otherwise the algorithm for rejecting and throwing away messages is the same as described above.

I hope this gives you the information you need.

Regards /Jon

Paul Jardetzky wrote:

>
> Ok. I've read and understood your mail. When I get the time,
> I'll make the appropriate changes. They are not hard and don't
> require modifying anything outside the adaptation layer. That
> said, if you decide to make the change for a future release,
> just let me know.
>
> Mostly, I needed to fix this quickly even though we are not
> going to use multiple zones. It is more about the perception
> of TIPC's stability with the other engineers here. They see
> machines crashing (their own desktops unfortunately) and they
> immediately become concerned about building our product on top
> of this code. Fast bug fixes and some reassurance are needed to
> develop the required confidence. You know the story ... :).
>
> We have a few solid networking types that need to know TIPC's
> behavior under transient congestion ... e.g. when messages are
> dropped and under what circumstances, window sizes, etc... If
> you have information outside what is already in your document
> (or the code), it will help with convincing folk that it has
> what we need for our cluster.
>
> Thanks. I've been working with this code for a while now and
> like its functionality.
>
> -- Paul
>
> ----------------------------------------------------------
> Fabric7 Systems, Inc.             Phone: +1 (650) 210-0117
> 1300 Crittenden Lane, Suite 302   Mobile: +1 (650) 619-9141
> Mountain View, CA 94043           http://www.fabric7.com