[Osdlcluster-tipc] Re: More debugging news

Hi,
I will try to update my document with this information after my vacation,
which starts soon.

About link congestion the story is very simple: I never throw away any
messages. If the link queue limit is hit (it is configurable, but by default
set to 48 packets for messages of importance "low"), the sending process/thread
is transparently put to sleep until the window opens up again, which normally
happens after a few microseconds. This gives the impression that a send
"never" fails.

All messages are given an importance priority in the range 0-3, set in the
sending port/socket. The default value is 0 (low).
Messages of priority "low" are subject to the "base queue limit", which is the
same as the window limit, as configured, or 48 by default.
Messages of priority "medium" (1) are subject to queue limit =
(base queue limit*4/3).
If the send queue is beyond the window limit, but below the calculated queue
limit, the message is queued, but not sent until the window opens up again. The
process is not blocked in this case. If the queue size is beyond the calculated
queue limit, the process is temporarily put to sleep, just as with importance
zero messages.
Messages of priority "high" (2) and "non_rejectable" (3) are treated the same
way, but are subject to the queue limits (base queue limit*5/3) and
(base queue limit*6/3) respectively.
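
The send-side rules above can be sketched roughly as follows. This is only an
illustration of the arithmetic, not TIPC code: the function names and the
"send"/"queue"/"block" labels are made up for the example, and treating a limit
as "hit" when the queue length is equal to it is an assumption.

```python
BASE_QUEUE_LIMIT = 48  # default window limit, as stated in the mail


def queue_limit(importance, base=BASE_QUEUE_LIMIT):
    """Queue limit for importance 0..3: base * (3 + importance) / 3."""
    if importance not in (0, 1, 2, 3):
        raise ValueError("importance must be in the range 0-3")
    return base * (3 + importance) // 3


def send_action(queue_len, importance, base=BASE_QUEUE_LIMIT):
    """Return what happens to a message given the current send queue length.

    Assumption: a limit counts as hit when queue_len >= limit.
    """
    if queue_len >= queue_limit(importance, base):
        return "block"   # sender is put to sleep until the window opens
    if queue_len >= base:
        return "queue"   # queued but not sent yet; sender is not blocked
    return "send"        # window is open, message goes out directly


if __name__ == "__main__":
    for imp in range(4):
        print(imp, queue_limit(imp))
```

With the default base of 48 the limits come out as 48, 64, 80 and 96 packets
for importance 0 to 3.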

For TIPC internal (e.g. name table update) messages, or routed messages, these
limits are far more generous, so that TIPC does not have to compete with its
users.

When it comes to processor overload, incoming messages are also handled
according to importance priority. Each message is matched against two values
before it is put into the read queue of a socket.

The basic value is the "global queue limit", which keeps track of the number of
queued incoming, but not yet read, messages on the whole processor. The limits
here are 1000 for low importance, 2000 for medium importance, and 10000 for
high and non-rejectable importance.
If an incoming message is connection oriented these thresholds are multiplied
by four, under the assumption that it has more consequences to tear down a
transaction in progress than to reject it in the setup phase, which is where
non-connection oriented messages are typically used.
If the upper limit is hit for a message, the message is not thrown away, but
sent back (i.e. its first 1024 bytes) to the sender along with an error code.
The importance of the rejected message is raised one step, to reduce the
risk that the rejected message will hit the limit on the return (which is often
on the same processor). If, despite all this, the rejected message hits the
global limit for its importance level, the message is thrown away.
**This is the only situation where a message is thrown away silently, and as
you see it takes a really bad overload situation to end up there.**
The consequence is also that a "connection abortion" message, which is
connection oriented, contains an error code, and has a raised importance level,
is virtually never thrown away.
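
As a rough sketch of the receive-side decision described above (again an
illustration, not the actual TIPC implementation): the names are hypothetical,
a limit is assumed to be hit when the count is equal to it, and capping the
raised importance at 3 is also an assumption.

```python
# Global queue limits per importance level, as given in the mail.
GLOBAL_LIMITS = {0: 1000, 1: 2000, 2: 10000, 3: 10000}
REJECT_RETURN_BYTES = 1024  # only this much of a rejected message is returned


def overload_limit(importance, connection_oriented=False):
    """Threshold for one message; connection oriented messages get 4x."""
    limit = GLOBAL_LIMITS[importance]
    return limit * 4 if connection_oriented else limit


def handle_incoming(queued, importance, connection_oriented=False,
                    is_rejected=False):
    """Return (action, importance) for an incoming message.

    queued      -- unread messages currently queued on the whole processor
    is_rejected -- True if this message is itself a returned (rejected) one
    """
    if queued < overload_limit(importance, connection_oriented):
        return ("accept", importance)
    if is_rejected:
        # A rejected message that hits the limit again is silently dropped.
        return ("drop", importance)
    # Otherwise send it back with an error code, importance raised one step
    # (capped at 3 here, which is an assumption).
    return ("reject", min(importance + 1, 3))


if __name__ == "__main__":
    print(handle_incoming(1000, 0))
    print(handle_incoming(1000, 0, connection_oriented=True))
```

A low importance message rejected at 1000 queued messages travels back as
medium importance, so it is only dropped if the returning side is itself at
2000 or more.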

To protect the processor from locally misbehaving applications there is also a
"local queue limit", keeping track of the number of unread messages queued in
each socket. Based on empirical experience the values here are set to 1/2 of
the global thresholds, that is 500, 1000 and 5000 messages respectively.
Otherwise the algorithm for rejecting and throwing away messages is the same as
described above.
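
A small sketch of how the two checks might combine, under stated assumptions:
the names are hypothetical, the factor of four for connection oriented messages
is assumed to apply to the local limit too (the text only says the algorithm is
otherwise the same), and a limit is assumed to be hit when the count equals it.

```python
# Global limits from the mail; local limits are half of these.
GLOBAL_LIMITS = {0: 1000, 1: 2000, 2: 10000, 3: 10000}


def local_limit(importance):
    """Per-socket limit: 1/2 of the global threshold."""
    return GLOBAL_LIMITS[importance] // 2


def over_limit(global_queued, socket_queued, importance,
               connection_oriented=False):
    """True if the message hits either the global or the local queue limit."""
    factor = 4 if connection_oriented else 1  # assumed for both limits
    return (global_queued >= GLOBAL_LIMITS[importance] * factor
            or socket_queued >= local_limit(importance) * factor)


if __name__ == "__main__":
    print([local_limit(i) for i in range(4)])
```

So a single socket with 500 unread low importance messages triggers rejection
even if the processor as a whole is far below its global limit.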

I hope this gives you the information you need.

Regards /Jon

Paul Jardetzky wrote:

>
> Ok. I've read and understood your mail. When I get the time,
> I'll make the appropriate changes. They are not hard and don't
> require modifying anything outside the adaptation layer. That
> said, if you decide to make the change for a future release,
> just let me know.
>
> Mostly, I needed to fix this quickly even though we are not
> going to use multiple zones. It is more about the perception
> of TIPC's stability with the other engineers here. They see
> machines crashing (their own desktops unfortunately) and they
> immediately become concerned about building our product on top
> of this code. Fast bug fixes and some reassurance are needed to
> develop the required confidence. You know the story ... :).
>
> We have a few solid networking types that need to know TIPC's
> behavior under transient congestion ... e.g. when messages are
> dropped and under what circumstances, window sizes, etc... If
> you have information outside what is already in your document
> (or the code), it will help with convincing folk that it has
> what we need for our cluster.
>
> Thanks. I've been working with this code for a while now and
> like its functionality.
>
> -- Paul
>
> ----------------------------------------------------------
> Fabric7 Systems, Inc.             Phone: +1 (650) 210-0117
> 1300 Crittenden Lane, Suite 302  Mobile: +1 (650) 619-9141
> Mountain View, CA 94043            http://www.fabric7.com