From: Jon M. <jon...@er...> - 2011-02-25 22:21:43
Hi Peter,

That was an impressive, and surprising, difference you were able to document. I guess this subject should get a much higher priority on our agenda from now on... I suspect this also highlights the need for an overview of the locking policies in TIPC in general. There may be more hidden bottlenecks like this around.

Did you make any attempt to run the benchmark demo before and after this patch? We may be able to see a difference even there.

Thank you for your great input.

///jon

Jon Maloy M.Sc. EE
Researcher
Ericsson Canada
Broadband and Systems Research
8400 Decarie
H4P 2N2, Montreal, Quebec, Canada
Phone + 1 514 345-7900 x42056
Mobile + 1 514 591-5578
jon...@er...
www.ericsson.com

> -----Original Message-----
> From: Peter Litov [mailto:xp...@am...]
> Sent: February-25-11 13:32
> To: Andrew Booth
> Cc: Jon Maloy; tip...@li...
> Subject: Re: [tipc-discussion] Issue with overflow detection logic
>
> Hi Andrew,
>
> I wouldn't, for the following two reasons:
>
> It's just plain stupid to block all TIPC communication in a node because of a rogue process/thread, and it would also be very naive to expect all apps to always work flawlessly. My normal scenario is several hundred threads per node, grouped in a dozen or so processes, all of them communicating and really expecting TIPC to just work, not to perform memory management or process reliability analysis. Also, TIPC is supposedly a cluster thing, so it should be optimized for work on a 'typical' cluster node rather than on an embedded device or something like that. In my particular case the node loses connectivity with the system before it is able to detect the bad process (because communication goes down and it is vital), and as a result, instead of a short application restart (1-2 sec downtime), the whole node is restarted by the system, which takes significantly longer (60-120 sec). So instead of helping the node stay stable, this code actually destabilizes it and prevents the system from healing itself.
>
> The other reason is that I am currently using 8- and 12-core nodes, but I would prefer 24- and 48-core nodes should TIPC become stable enough... and now this starts to look like a real possibility. I tried to add a 16-core (4x4) node to the cluster and that was a disaster... I couldn't even figure out the reason, but I was pretty sure it was a cache contention issue arising from the number of sockets and remote caches.
>
> With regard to the particular performance impact of removing the atomic ops: node load average as well as sys% are exactly half of what they were before, after 36 hours of production use and statistics. So I would call the impact severe, considering it makes a third of my hardware investment a complete waste :) ... And that's nothing compared to lost revenue due to downtime.
>
> Also please note that memory is obscenely cheap lately, and I really can't see why TIPC would ever try to limit the total number of buffers in use, be it 5000 or 10000. I understand a per-socket limit, but not a per-node one. It would only serve to (supposedly) protect the kernel on a node with very limited memory and a single badly written app. It's not TIPC's job to protect the kernel from a badly written app; the kernel should be capable of handling such scenarios, just as the apps themselves should. I would prefer the stack to keep doing its 'communication' job for as long as that is possible at all, and start dropping incoming packets once and only once all other options are exhausted. The current logic gives up on the whole node just because of a minor glitch in a few of potentially thousands of sockets.
>
> Regards,
> Peter.
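For context on why the atomic operations matter so much on SMP: the counter Peter disabled is a single node-global atomic that is touched once for every received message and once for every consumed message. Below is a minimal sketch of that pattern, reconstructed from the descriptions in this thread (names follow the thread; this is not the verbatim net/tipc/socket.c code):

#include <linux/atomic.h>

/* One counter shared by every TIPC socket on the node (sketch, per the thread). */
static atomic_t tipc_queue_size = ATOMIC_INIT(0);

/* Called once per incoming message, on whichever CPU processes the packet. */
static void rx_account(void)
{
	atomic_inc(&tipc_queue_size);	/* requires exclusive access to the cache line */
}

/* Called once per consumed message, typically in recvmsg() context on another CPU. */
static void consume_account(void)
{
	atomic_dec(&tipc_queue_size);
}

Every atomic_inc()/atomic_dec() pulls the cache line holding tipc_queue_size into the executing CPU's cache in exclusive state, so with many cores and high message rates the line ping-pongs between caches on every message. That is consistent with the roughly halved sys% Peter reports after removing the counter.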
> Andrew Booth wrote:
> > Hi Peter,
> >
> > I'm wondering if making OVERLOAD_LIMIT_BASE configurable would be a helpful addition to mainstream TIPC. If that were available, would you use it?
> >
> > During the working group call yesterday there were some discussions on TIPC receive congestion handling. There was some concern that removing OVERLOAD_LIMIT_BASE would make TIPC users vulnerable to running out of skbuffs, which would probably have a more dramatic effect on the system than hitting TIPC receive congestion.
> >
> > If the only motivation for removing the counter is to save on the atomic operations, it would be nice to know whether they have a noticeable effect. Do you have any insight on this?
> >
> > Andrew
> >
> > -----Peter Litov <xp...@am...> wrote: -----
> >
> > To: Andrew Booth <ab...@pt...>
> > From: Peter Litov <xp...@am...>
> > Date: 02/25/2011 10:37AM
> > Cc: jon...@er..., tip...@li...
> > Subject: Re: [tipc-discussion] Issue with overflow detection logic
> >
> > Hi Andrew,
> >
> > I haven't tried to raise the limit, as memory usage is the least of my worries. However, considering the nature of the issue, raising the limit by itself won't fix the problem, since a stuck process should always be considered a possibility. Neither will it give the speedup that comes from removing all operations on the shared atomic counter, which causes unnecessary cache invalidations on SMP nodes.
> >
> > Regards,
> > Peter.
> >
> > Andrew Booth wrote:
> > > Hi Peter,
> > >
> > > Do you know if you get similar behaviour if you use a large value for OVERLOAD_LIMIT_BASE, rather than removing the code completely?
> > >
> > > Andrew
> > >
> > > -----Peter Litov <xp...@am...> wrote: -----
> > >
> > > To: Jon Maloy <jon...@er...>
> > > From: Peter Litov <xp...@am...>
> > > Date: 02/25/2011 05:20AM
> > > Cc: "tip...@li..." <tip...@li...>
> > > Subject: Re: [tipc-discussion] Issue with overflow detection logic
> > >
> > > Hello Jon and everybody,
> > >
> > > After reading your post I immediately recompiled TIPC with the atomic counter completely disabled, and it was like the beginning of a new era - sys CPU time is half what it was!
> > > I should note that I'm using only 8-12 core SMP nodes. It also seems that I was regularly hitting the overload scenario that blocks all TIPC socket communication, as I was experiencing sporadic loss of TIPC communication to random nodes (I had that more or less always), which seems to have been gone for the last 24h.
> > > In short - we should definitely get rid of this counter. I guess and hope that most TIPC users are not on UP systems, so this should affect most of them positively.
> > >
> > > Regards,
> > > Peter.
> > >
> > > Jon Maloy wrote:
> > > > Hi Chaks,
> > > > The problem you describe is indeed real, and we have been discussing at this forum how to solve it.
> > > > Your solution makes sense, but I think a much more radical solution is possible: we remove the global counter altogether.
> > > > The real global limit would then be the number of sk_buffs that can be allocated in the system. This is likely to be much higher than the current global limit, but once it is reached, the result is the same as now: packets are dropped.
> > > > This would basically solve your problem, since the non-drained sockets will be limited by their local limit, and all the others will be unaffected.
> > > >
> > > > I think we agreed (at least I and Allan) to try this solution at some point, but it never got implemented.
> > > > Maybe it is time for it now.
> > > >
> > > > Regards
> > > > ///jon
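To make Jon's proposal concrete: with the global counter gone, the admission test would look only at the destination socket's own receive queue, scaled the same way as today, with sk_buff exhaustion as the node-wide backstop. A rough sketch based on the limits described further down in this thread (the helper name is invented, the base value is a placeholder, and only the two scaling factors mentioned in the thread are shown):

#include <linux/types.h>

#define OVERLOAD_LIMIT_BASE	10000	/* placeholder value for illustration */

/*
 * Sketch: overload test with the global tipc_queue_size check removed,
 * as Jon proposes.  Only the destination socket's receive queue length
 * is compared against its scaled per-socket limit.
 */
static bool sock_rx_overloaded(u32 sk_queue_len, bool connected)
{
	u32 limit = OVERLOAD_LIMIT_BASE / 2;	/* per-socket limit: half the base */

	if (connected)
		limit *= 4;			/* connection-oriented sockets get the 4x factor */

	return sk_queue_len > limit;		/* no node-global counter consulted */
}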
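As for Andrew's question above about making OVERLOAD_LIMIT_BASE configurable: one minimal way to expose it would be a module parameter instead of a compile-time constant. Purely a sketch; the variable name, default value and permissions below are invented for illustration and nothing like this exists in the current code:

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Hypothetical runtime knob replacing the OVERLOAD_LIMIT_BASE constant. */
static unsigned int overload_limit_base = 10000;	/* default is a placeholder */
module_param(overload_limit_base, uint, 0644);
MODULE_PARM_DESC(overload_limit_base,
		 "Node-wide limit on queued TIPC messages before receive overload handling");

It could then be set at load time (e.g. modprobe tipc overload_limit_base=50000) or adjusted at runtime via /sys/module/tipc/parameters/.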
> > > >> -----Original Message-----
> > > >> From: Chigurupati, Chaks [mailto:ch...@wi...]
> > > >> Sent: February-23-11 17:31
> > > >> To: tip...@li...
> > > >> Subject: [tipc-discussion] Issue with overflow detection logic
> > > >>
> > > >> Hi all,
> > > >>
> > > >> If this has been discussed before and resolved, please ignore this and point me to the conclusions.
> > > >>
> > > >> I think that the overflow detection logic in dispatch() and queue_overloaded() is not fair when there is a mix of connection-less and connection-oriented sockets.
> > > >>
> > > >> The overall idea seems to be: apply a per-socket limit as well as a system-wide limit on all sockets. In the current code, the per-socket limit is half of the system-wide limit (OVERLOAD_LIMIT_BASE). However, based on the importance of the message or the type of the socket, a multiplication factor is applied to these limits in the queue_overloaded() function. For example, connection-oriented sockets have a multiplication factor of 4, while low-priority messages on connection-less sockets have a factor of 1.
> > > >>
> > > >> The system-wide count of all messages in all sockets is maintained in one global variable, tipc_queue_size (which is atomically incremented/decremented). The multiplication factor is also applied to this counter when performing the overflow checks.
> > > >>
> > > >> What this means is that a large number of connection-oriented sockets can cause the connection-less sockets to suffer more drops because of the multiplication factors. Low-importance messages arriving at connection-less sockets are dropped if tipc_queue_size exceeds OVERLOAD_LIMIT_BASE. However, messages arriving at connection-oriented sockets are dropped only if tipc_queue_size exceeds OVERLOAD_LIMIT_BASE * 4. So the connection-oriented sockets can make tipc_queue_size grow very large (i.e. larger than OVERLOAD_LIMIT_BASE) and thereby cause the connection-less sockets to suffer sustained and prolonged drops.
> > > >>
> > > >> Basically, one or more abusers can impact other sockets that are being drained at a normal pace. You don't really need a mix of different socket types to hit this scenario. All you need is two sockets that are not being drained at all - those two sockets will each build up to half the system limit, and together they will prevent any packets from being queued into other sockets.
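In code form, the behaviour Chaks describes boils down to roughly the following. This is a reconstruction from his description only, not the verbatim dispatch()/queue_overloaded() code; the base value is a placeholder and only the two factors mentioned above are shown:

#include <linux/types.h>

#define OVERLOAD_LIMIT_BASE	10000	/* placeholder value for illustration */

/*
 * Sketch of the current check: both the node-global tipc_queue_size
 * counter and the destination socket's own queue length are compared
 * against limits scaled by a per-socket-type factor.
 */
static bool rx_queue_overloaded(u32 global_queue_size, u32 sk_queue_len,
				bool connected)
{
	u32 factor = connected ? 4 : 1;	/* 4x for connection-oriented, 1x for
					 * low-priority connection-less traffic */

	if (global_queue_size > OVERLOAD_LIMIT_BASE * factor)
		return true;		/* node-wide limit exceeded (scaled) */
	if (sk_queue_len > (OVERLOAD_LIMIT_BASE / 2) * factor)
		return true;		/* per-socket limit exceeded (scaled) */
	return false;
}

The unfairness sits in the first test: a few undrained connection-oriented sockets can legitimately push global_queue_size well past OVERLOAD_LIMIT_BASE, at which point every low-priority connection-less message fails its 1x global check even though its own socket queue is empty.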
> > > >> If there is agreement that this is indeed a problem, we can probably discuss some solutions? The one I have in mind is to allow the queuing of a packet into a socket's receive buffer if its receive queue length is less than a certain limit (say OVERLOAD_LIMIT_BASE / 100), irrespective of the system-wide limit. That way, any sockets that are being drained properly should not see drops.
> > > >>
> > > >> Thx
> > > >> Chaks
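A sketch of that proposed tweak, again reconstructed from the description rather than taken from an actual patch: a socket whose receive queue is still short always accepts the packet, and only sockets that are already backed up fall through to the existing scaled checks (names and base value are illustrative):

#include <linux/types.h>

#define OVERLOAD_LIMIT_BASE	10000	/* placeholder value for illustration */

/*
 * Sketch of Chaks' proposal: packets destined for a well-drained socket
 * (short receive queue) are accepted regardless of the system-wide
 * counter; otherwise the existing scaled limit checks apply.
 */
static bool rx_queue_overloaded_fair(u32 global_queue_size, u32 sk_queue_len,
				     bool connected)
{
	u32 factor = connected ? 4 : 1;

	if (sk_queue_len < OVERLOAD_LIMIT_BASE / 100)
		return false;		/* socket is being drained: always accept */

	if (global_queue_size > OVERLOAD_LIMIT_BASE * factor)
		return true;
	return sk_queue_len > (OVERLOAD_LIMIT_BASE / 2) * factor;
}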