From: Peter L. <xp...@am...> - 2011-02-25 15:37:38
Hi Andrew,

I haven't tried to raise the limit, as memory usage is the least of my
worries. Considering the nature of the issue, though, raising the limit
by itself won't fix the problem, since a stuck process should always be
considered a possibility. Nor would it bring the speedup that comes from
removing all operations on the shared atomic counter, which causes
unnecessary cache invalidations on SMP nodes.
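To illustrate the kind of change I mean, here is a sketch of the
pattern only - the names follow this thread, the limit value is made
up, and this is not the exact TIPC source:

    /* Model of the hot path, as plain C11 (cc -std=c11). */
    #include <stdatomic.h>

    #define OVERLOAD_LIMIT_BASE 10000   /* illustrative value only */

    /* One counter shared by every socket on every CPU. Each atomic
     * read-modify-write bounces its cache line across all cores. */
    static atomic_uint tipc_queue_size;

    static int dispatch_with_global_counter(unsigned int sk_queue_len,
                                            unsigned int sk_limit)
    {
        if (atomic_fetch_add(&tipc_queue_size, 1) >= OVERLOAD_LIMIT_BASE
            || sk_queue_len >= sk_limit) {
            atomic_fetch_sub(&tipc_queue_size, 1);
            return -1;              /* drop the message */
        }
        return 0;   /* the receive path decrements the counter later */
    }

    /* With the counter gone, only per-socket state is touched, and
     * that state is contended only by CPUs actually using the socket. */
    static int dispatch_without_global_counter(unsigned int sk_queue_len,
                                               unsigned int sk_limit)
    {
        return (sk_queue_len >= sk_limit) ? -1 : 0;
    }

Every enqueue and dequeue on every socket in the system hits that one
cache line, which is exactly the SMP cost that disappears when the
counter is removed.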
Regards,
Peter.

Andrew Booth wrote:
> Hi Peter,
>
> Do you know if you get similar behaviour if you use a large value for
> OVERLOAD_LIMIT_BASE, rather than removing the code completely?
>
> Andrew
>
> -----Peter Litov <xp...@am...> wrote: -----
>
> To: Jon Maloy <jon...@er...>
> From: Peter Litov <xp...@am...>
> Date: 02/25/2011 05:20AM
> Cc: "tip...@li..." <tip...@li...>
> Subject: Re: [tipc-discussion] Issue with overflow detection logic
>
> Hello Jon and everybody,
>
> After reading your post I immediately recompiled TIPC with the atomic
> counter completely disabled, and it was like the beginning of a new
> era - sys CPU time is half of what it was!
> I should note that I'm using only 8-12 core SMP nodes. It also seems
> that I had regularly been hitting the overload scenario that blocks
> all TIPC socket communication: I was experiencing sporadic loss of
> TIPC communication to random nodes (more or less constantly), which
> seems to have been gone for the last 24h.
> In short - we should definitely get rid of this counter. I guess and
> hope that most TIPC users are not on UP systems, so this should
> affect most of them positively.
>
> Regards,
> Peter.
>
> Jon Maloy wrote:
> > Hi Chaks,
> > The problem you describe is indeed real, and we have been
> > discussing on this forum how to solve it.
> > Your solution makes sense, but I think a much more radical solution
> > is possible: we remove the global counter altogether.
> > The real global limit would then be the number of sk_buffs that can
> > be allocated in the system. This is likely to be much higher than
> > the current global limit, but once it is reached, the result is the
> > same as now: packets are dropped.
> > This would basically solve your problem, since the non-drained
> > sockets will be limited by their local limit, and all the others
> > will be unaffected.
> >
> > I think we agreed (at least Allan and I) to try this solution at
> > some point, but it never got implemented.
> > Maybe it is time for it now.
> >
> > Regards
> > ///jon
> >
> > Jon Maloy M.Sc. EE
> > Researcher
> > Ericsson Canada
> > Broadband and Systems Research
> > 8400 Decarie
> > H4P 2N2, Montreal, Quebec, Canada
> > Phone + 1 514 345-7900 x42056
> > Mobile + 1 514 591-5578
> > jon...@er...
> > www.ericsson.com
> >
> >> -----Original Message-----
> >> From: Chigurupati, Chaks [mailto:ch...@wi...]
> >> Sent: February-23-11 17:31
> >> To: tip...@li...
> >> Subject: [tipc-discussion] Issue with overflow detection logic
> >>
> >> Hi all,
> >>
> >> If this has been discussed before and resolved, please ignore
> >> and point me to the conclusions.
> >>
> >> I think that the overflow detection logic in dispatch() and
> >> queue_overloaded() is not fair when there is a mix of
> >> connection-less and connection-oriented sockets.
> >>
> >> The overall idea seems to be: apply a per-socket limit as
> >> well as a system-wide limit on all sockets. In the current
> >> code, the per-socket limit is half of the system-wide limit
> >> (OVERLOAD_LIMIT_BASE). However, based on the importance of
> >> the message or the type of the socket, a multiplication
> >> factor is applied to these limits in the queue_overloaded()
> >> function. For example, connection-oriented sockets have a
> >> multiplication factor of 4, while low-priority messages on
> >> connection-less sockets have a factor of 1.
> >>
> >> The system-wide count of all messages in all sockets is
> >> maintained in one global variable, tipc_queue_size (which is
> >> atomically incremented/decremented). The multiplication
> >> factor is also applied to this counter when performing the
> >> overflow checks.
> >>
> >> What this means is that a large number of connection-oriented
> >> sockets can cause the connection-less sockets to suffer more
> >> drops because of the multiplication factors. Low-importance
> >> messages arriving at connection-less sockets are dropped if
> >> tipc_queue_size exceeds OVERLOAD_LIMIT_BASE, whereas messages
> >> arriving at connection-oriented sockets are dropped only if
> >> tipc_queue_size exceeds OVERLOAD_LIMIT_BASE * 4. So the
> >> connection-oriented sockets can push tipc_queue_size to a
> >> very large value (i.e. larger than OVERLOAD_LIMIT_BASE) and
> >> thereby cause the connection-less sockets to suffer sustained
> >> and prolonged drops.
> >>
> >> Basically, one or more abusers can impact other sockets that
> >> are being drained at a normal pace. You don't really need a
> >> mix of different socket types to hit this scenario. All you
> >> need is two sockets that are not being drained at all - each
> >> of them will build up to half the system limit, and together
> >> they will prevent any packets from being queued into other
> >> sockets.
> >>
> >> If there is agreement that this is indeed a problem, we can
> >> probably discuss some solutions. The one I have in mind is to
> >> allow a packet to be queued into a socket's receive buffer if
> >> its receive queue length is less than a certain limit (say
> >> OVERLOAD_LIMIT_BASE / 100), irrespective of the system-wide
> >> limit. That way, any sockets that are being drained properly
> >> should not see drops.
> >>
> >> Thx
> >> Chaks
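For anyone skimming the thread above: the logic Chaks describes
reconstructs roughly as follows. This is a sketch built from his
description only, not verbatim kernel source; the two factors shown
(4 and 1) are the ones named in the thread, and the intermediate
importance levels are omitted:

    #define OVERLOAD_LIMIT_BASE 10000   /* illustrative value only */

    /* Threshold scaling as described above: connection-oriented
     * sockets get 4x headroom, low-importance connection-less
     * traffic gets 1x. */
    static int queue_overloaded(unsigned int size, unsigned int base,
                                int connection_oriented)
    {
        unsigned int factor = connection_oriented ? 4 : 1;
        return size > base * factor;
    }

    /* dispatch() applies the check twice: once against the socket's
     * own queue with half the base, and once against the global
     * tipc_queue_size counter with the full base. */
    static int dispatch(unsigned int sk_queue_len,
                        unsigned int tipc_queue_size,
                        int connection_oriented)
    {
        if (queue_overloaded(sk_queue_len, OVERLOAD_LIMIT_BASE / 2,
                             connection_oriented) ||
            queue_overloaded(tipc_queue_size, OVERLOAD_LIMIT_BASE,
                             connection_oriented))
            return -1;              /* message is dropped */
        return 0;
    }

The unfairness falls straight out of the second check: connection-
oriented traffic keeps being accepted until the global counter passes
4 * OVERLOAD_LIMIT_BASE, while low-importance connection-less traffic
is already being dropped once it passes OVERLOAD_LIMIT_BASE.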
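And a sketch of the fix Chaks proposes, on top of the same model (the
floor of OVERLOAD_LIMIT_BASE / 100 is his suggested value; the function
name is made up, and queue_overloaded() is reused from the sketch
above):

    /* A socket that is being drained normally stays near empty, so it
     * is always admitted, no matter how far abusers have inflated the
     * global counter. */
    static int dispatch_with_floor(unsigned int sk_queue_len,
                                   unsigned int tipc_queue_size,
                                   int connection_oriented)
    {
        if (sk_queue_len < OVERLOAD_LIMIT_BASE / 100)
            return 0;               /* always accept */

        if (queue_overloaded(sk_queue_len, OVERLOAD_LIMIT_BASE / 2,
                             connection_oriented) ||
            queue_overloaded(tipc_queue_size, OVERLOAD_LIMIT_BASE,
                             connection_oriented))
            return -1;
        return 0;
    }

Jon's alternative above goes further and simply deletes the second
check, along with the global counter behind it.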