Thread: [Sqlrelay-discussion] scaler stucked waiting for connection to rise
From: Renat S. <sr...@st...> - 2011-06-10 14:21:12
Attachments:
scaler_timed_wait.diff
Hi!

Sqlrelay got stuck recently on one of our production servers. The symptoms were quite strange: there were 1 scaler, 1 connection and 17 listeners. I checked the status and got these results:

  Open Server Connections: 1
  Opened Server Connections: 4016

  Open Client Connections: 0
  Opened Client Connections: 510926

  Open Server Cursors: 3
  Opened Server Cursors: 522974

  Times New Cursor Used: 0
  Times Cursor Reused: 11604823

  Total Queries: 6313112
  Total Errors: 4718

  Forked Listeners: 16

  Scaler's view:
    Connections: 1
    Sessions: 18

  Semaphores:
  +---+---+---+---+---+---+---+---+---+---+----+
  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
  +---+---+---+---+---+---+---+---+---+---+----+
  | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 16 |
  +---+---+---+---+---+---+---+---+---+---+----+

Pay attention to the semaphores. Their normal state is:

  +---+---+---+---+---+---+---+---+---+---+----+
  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
  +---+---+---+---+---+---+---+---+---+---+----+
  | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0  |
  +---+---+---+---+---+---+---+---+---+---+----+

There are differences in semaphores 5, 6 and 8.

5==0 and 6==1 mean that a listener got a client request and signalled the scaler to increase the number of connections if possible (sqlrlistener::incrementSessionCount).

A continuous state of 6==1 signals that the scaler is stuck somewhere in the loop inside scaler::openMoreConnections().

8==0 gives a clue that the scaler is waiting for a connection to fire up, i.e. to connect to the DB and increment the connection counter.

By the way, it might be an error that the normal state of sem 8 is 1. It gets this state after the first non-scaled connection is started by sqlr-start, and so this semaphore actually changes state between 1 and 2 instead of 0 and 1 while the scaler is firing up new connections. It might be a good idea to wait for sem8 in sqlr-start, wouldn't it?

So why did it finally end up in state 0? I suppose it was the result of the connection startup process. Some connection processes started, but got some errors, didn't signal sem8 and then exited. As a result the scaler got stuck in an infinite wait on this semaphore.

In the end I came up with a solution to this issue. If the scaler is working in "fork" mode, it is possible to wait a limited time for the connection to fire up, using the timed semset->wait(), and then just terminate the buggy connection process. Please take a look at the attached diff. I would be very grateful for any comments on it.

--
Renat Sabitov    e-mail: sr...@st...
Stack Soft       jid: sr...@ja...
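The timed wait proposed above can be sketched with raw System V semaphores. This is only an illustration under stated assumptions, not the attached patch and not SQL Relay's code: the helper names (signalStartup, waitForStartup, spawnWithDeadline) are hypothetical, it calls the kernel interface directly instead of the semset->wait() wrapper mentioned in the message, and semtimedop() is Linux-specific.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <ctime>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: the child calls this once it has started up.
bool signalStartup(int semId, unsigned short semNum) {
    struct sembuf op;
    op.sem_num = semNum;
    op.sem_op = 1;    // increment: "I am up"
    op.sem_flg = 0;
    return semop(semId, &op, 1) == 0;
}

// Hypothetical helper: wait for the startup signal, but not forever.
// Returns true if the semaphore was signalled, false on timeout or error.
bool waitForStartup(int semId, unsigned short semNum, time_t timeoutSeconds) {
    struct sembuf op;
    op.sem_num = semNum;
    op.sem_op = -1;   // decrement: blocks while the value is 0
    op.sem_flg = 0;
    struct timespec timeout;
    timeout.tv_sec = timeoutSeconds;
    timeout.tv_nsec = 0;
    for (;;) {
        if (semtimedop(semId, &op, 1, &timeout) == 0) return true;
        if (errno != EINTR) return false;  // EAGAIN: the timeout expired
        // Interrupted by a signal: retry (this restarts the full timeout).
    }
}

typedef void (*ChildMain)(int semId, unsigned short semNum);

// Fork a child and give it timeoutSeconds to signal that it is up.
// Returns the child's pid, or -1 if it had to be killed (or fork failed).
pid_t spawnWithDeadline(int semId, unsigned short semNum,
                        time_t timeoutSeconds, ChildMain childMain) {
    assert(childMain != nullptr);
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        childMain(semId, semNum);
        _exit(0);
    }
    if (waitForStartup(semId, semNum, timeoutSeconds)) return pid;
    kill(pid, SIGKILL);         // the child never signalled: terminate it
    waitpid(pid, nullptr, 0);   // and reap it so no zombie is left behind
    return -1;
}
```

The point of the sketch is that the parent never blocks longer than the timeout, so a child that hangs or dies before signalling can no longer wedge it.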
From: David M. <dav...@fi...> - 2011-06-14 20:17:12
Hi Renat,

Man, this is complicated :)

I need to look at the patch carefully, but at first glance, it appears to solve the problem where the sqlr-connection starts up and either hangs (maybe while trying to log into the db) or dies before signalling on sem8.

You're also right that it is an error for sem8 to be greater than 0 when the scaler starts: it is incremented by the connections that were not started by the scaler. This is actually tricky to solve. sqlr-start probably shouldn't wait on them, because people sometimes use their own scripts to start the listener, connections and scaler. The scaler could wait on them when it starts up, but ideally you should be able to manually start up sqlr-connections whenever you want. It shouldn't be a rule that you have to start them before starting the scaler.

The connections could signal sem8 only if they were started by the scaler, but what would be the right way for a connection to know it was started by the scaler? I'm open to suggestions here.

Dave
From: Renat S. <sr...@st...> - 2011-06-15 04:55:31
15.06.2011 00:16, David Muse wrote:
> The connections could only signal sem8 if they were started by the scaler,
> but what would be the right way for a connection to know it was started
> by the scaler? I'm open to suggestions here.

I suppose it could be a new option, say "-scaled". :)
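A minimal sketch of how such an option could gate the signal. The "-scaled" flag is only the proposal from this message, and startedByScaler is a hypothetical helper name; how sqlr-connection really parses its arguments is not shown in this thread.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true if the proposed "-scaled" flag is
// present on the command line, i.e. the scaler started this connection.
bool startedByScaler(int argc, const char *const *argv) {
    assert(argv != nullptr);
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], "-scaled") == 0) {
            return true;
        }
    }
    return false;
}
```

A connection would then signal sem8 only when startedByScaler() returns true, so manually started connections would leave the semaphore untouched.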
From: David M. <dav...@fi...> - 2011-06-20 21:00:12
Ahh, yes. Such an obvious solution :)

Could you resend me your patch against the current CVS code? There are some differences vs. the code you patched against. I think mainly I made it so that it always uses fork().

Dave
From: Renat S. <sr...@st...> - 2011-06-23 07:21:36
23.06.2011 07:32, David Muse wrote:
> Nevermind patching against current CVS, I got it figured out :)

Thanks a lot for that, because we actually do not use the current CVS version.

Do you plan to make a new release of sqlrelay? It has been about 2 years since 0.41, so maybe it is time to step up? :)
From: David M. <dav...@fi...> - 2011-06-23 16:04:55
Yes, I've been working really hard for the past few days to get a release out. There's this issue and a postgresql memory leak that need to be fixed, then it should be ready.

Dave
From: Renat S. <sr...@st...> - 2011-06-30 07:49:18
23.06.2011 07:32, David Muse wrote:
> Nevermind patching against current CVS, I got it figured out :)

Just a little correction to the comment at scaler.C, line 518:

// try 3 times - in the first use SIGTERM and on the next 2 use SIGKILL

It should read as follows:

// try 3 times - on the first check whether it is already dead, then use SIGTERM and at last use SIGKILL
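The three tries described in the corrected comment could look roughly like this. It is a sketch, not the code at scaler.C line 518: pollChild and stopChild are hypothetical names, and the grace period and polling step are arbitrary assumptions.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Poll for the child's exit without blocking indefinitely.
// Returns 1 if it exited and was reaped, 0 if it is still running after
// graceMicros, and -1 on error (for example, pid is not our child).
static int pollChild(pid_t pid, unsigned graceMicros) {
    const unsigned stepMicros = 10000;
    for (unsigned waited = 0; ; waited += stepMicros) {
        pid_t r = waitpid(pid, nullptr, WNOHANG);
        if (r == pid) return 1;
        if (r == -1) return -1;
        if (waited >= graceMicros) return 0;
        usleep(stepMicros);
    }
}

// Try 3 times: first only check whether the child is already dead,
// then send SIGTERM, and at last send SIGKILL.
bool stopChild(pid_t pid, unsigned graceMicros = 200000) {
    assert(pid > 0);
    const int signals[3] = {0, SIGTERM, SIGKILL};
    for (int attempt = 0; attempt < 3; attempt++) {
        if (signals[attempt] != 0) {
            kill(pid, signals[attempt]);
        }
        int r = pollChild(pid, attempt == 0 ? 0 : graceMicros);
        if (r != 0) return r == 1;
    }
    return false;
}
```

Checking first avoids signalling a process that has already exited, and SIGKILL is held back as the last resort for a child that ignores SIGTERM.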
From: David M. <dav...@fi...> - 2011-06-30 16:07:59
Ok, I updated it.

I'm going to start making the release now. Let me know if you see any last minute issues.

Dave
From: David M. <dav...@fi...> - 2011-06-23 03:33:22
Nevermind patching against current CVS, I got it figured out :)
From: David M. <dav...@fi...> - 2011-06-23 03:45:48
Now that I'm testing this patch, it appears to work, but I think it might introduce a race condition.

If the new connection takes just slightly longer than 10 seconds to start up, the scaler's wait could fall through, and the connection could then still signal on sem(8) before the scaler kills it. That would leave sem(8) incremented, with nothing left to wait on it.

What do you think?

Dave
From: Renat S. <sr...@st...> - 2011-06-23 06:58:14
Yes, it could :(. Semaphores are the kind of thing one should stay away from whenever possible. Could we just not use sem(8)? It is used to signal to the scaler that a connection has started and that the current number of connections has been incremented. Now, with fork(), the scaler is aware of the connections it spawns and does not actually have to wait to get the number of connections.

If not, it is possible to wait a short time on sem(8) after the connection process is killed, in order to decrement sem(8).
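The second suggestion, consuming a late signal after the kill, might be sketched as below. This is a variation rather than exactly what is described: it assumes the killed child has already been reaped, so no further signal can arrive and a single non-blocking attempt replaces the short wait. consumeLateSignal is a hypothetical name.

```cpp
#include <cassert>
#include <sys/ipc.h>
#include <sys/sem.h>

// Hypothetical helper: after the killed child has been reaped, take back
// a signal it may have posted just before dying. Never blocks.
// Returns true if a stale signal was consumed, false if there was none.
bool consumeLateSignal(int semId, unsigned short semNum) {
    assert(semId >= 0);
    struct sembuf op;
    op.sem_num = semNum;
    op.sem_op = -1;            // decrement by one
    op.sem_flg = IPC_NOWAIT;   // fail with EAGAIN instead of blocking at 0
    return semop(semId, &op, 1) == 0;
}
```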
From: David M. <dav...@fi...> - 2011-06-23 18:46:30
Yeah, I hate semaphores. They're great in theory, not so great in practice.

I think resetting sem(8) to 0 before starting the connection will solve the problem. I'm not sure if it's the best solution, but it should take care of it.

We might be able to do without sem(8), but that's a fairly invasive change. Maybe in the next release.

Dave
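Resetting a semaphore to 0 before forking a new connection could be done with semctl() and SETVAL, roughly as follows. This is a sketch using the raw System V interface, not the change that went into the release; SemArg, resetSemaphore and semaphoreValue are hypothetical names. The caller must supply the argument union itself, and it is named SemArg here to avoid clashing with platforms whose headers already define union semun.

```cpp
#include <cassert>
#include <sys/ipc.h>
#include <sys/sem.h>

// Same layout as the "union semun" described in semctl(2).
union SemArg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

// Force the semaphore back to 0, discarding any stale signal left over
// from a connection that was killed after the timed wait fell through.
bool resetSemaphore(int semId, int semNum) {
    assert(semNum >= 0);
    union SemArg arg;
    arg.val = 0;
    return semctl(semId, semNum, SETVAL, arg) != -1;
}

// Read the current value, e.g. for a status display; -1 on error.
int semaphoreValue(int semId, int semNum) {
    return semctl(semId, semNum, GETVAL);
}
```

Calling resetSemaphore() immediately before each fork means a late signal from a previous attempt cannot satisfy the wait for the next connection.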