sqlrelay-discussion Mailing List for SQL Relay (Page 13)
Brought to you by: mused
From: Renat S. <sr...@st...> - 2010-07-13 04:37:30
12.07.2010 18:51, Carlos Vergara wrote:
> Thank you Renat, I've been looking for a full list of fixes to the current version, this one is pretty good.
>
> The following files do not have global read access on your site, and I cannot download them:
>
> sqlrelay-0.41/15_connection_handle_signals.patch
> rudiments-0.32/poll.FULL.patch.gz

I'm sorry for the inconvenience. I've changed the permissions, please try again.

I can finally reproduce the bug where sqlr-connection gets SIGSEGV, and I'm going to make one more patch. At the same time, I found that 15_connection_handle_signals.patch has to be modified because of signal blocking during connection init.

Renat
From: Carlos V. <cve...@em...> - 2010-07-12 15:03:56
Thank you Renat, I've been looking for a full list of fixes to the current version, and this one is pretty good.

The following files do not have global read access on your site, and I cannot download them:

sqlrelay-0.41/15_connection_handle_signals.patch
rudiments-0.32/poll.FULL.patch.gz

CJ Vergara
From: Cal H. <ca...@fb...> - 2010-07-12 14:43:49
This is an interesting coincidence, but I just had this same problem earlier this morning. We use Big Brother <http://bb4.com/> for our monitoring/paging, and I have a check set up to run a simple query on all of my instances. At 2:38am we rebooted one of the servers; the check seemed to hang at 2:39 and then recovered at 2:40 after the machine was live again. Then, mysteriously, at 3:35am I got a page:

Mon Jul 12 03:35:16 2010 relay Failed to authenticate.
Mon Jul 12 03:35:16 2010 relay A network error may have ocurred.
Mon Jul 12 03:35:16 2010 relay Rows Returned : 0
Mon Jul 12 03:35:16 2010 relay Fields Returned : 0
Mon Jul 12 03:35:16 2010 relay System time : 0

Now, at 9:30am, I still have 20 forked listeners running and only 8 server connections. I have all of my config settings cranked way up:

maxlisteners="300"
listenertimeout="20"
sessiontimeout="60"
maxsessioncount="170"

I haven't yet applied Renat's patches to my live systems; I only have my dynamic cursors patch installed. I was waiting for the "official" release with our two patches combined, to see if David would catch any merge bugs between everything. (What's the status on that, BTW?)

Thanks,

--Cal
From: Renat S. <sr...@st...> - 2010-07-11 15:20:52
10.07.2010 02:22, sql...@ma... wrote:
> The number of "Forked Listeners" might be the problem considering that number is usually 0.

This number represents the count of client connections to sqlrelay that are waiting for a free connection to the DB. It means that all 15 of your sessions to the DB are busy. If so, you should check what they are doing and maybe increase the maximum connection count in the config.

But be aware that sqlr-status often shows information that is far from reality. I recommend applying my patches for sqlrelay and rudiments (http://www.srr.pp.ru/www/sqlrelay/) and using the -fork option (see the descriptions of the patches on this list).
From: <sql...@ma...> - 2010-07-09 22:49:32
I have an xmlrpc app that uses sql-relay to access a mysql server each time a client connects to the xmlrpc server. In the course of a day, it receives approximately 500,000 requests, and each request makes between 1 and 5 queries. sql-relay often dies (the processes are still running, but queries return nothing even though data should be returned). When this occurs I manually restart it. The most recent time, before restarting it, I ran sqlr-status:

Open Server Connections: 15
Opened Server Connections: 2089
Open Client Connections: 1
Opened Client Connections: 25857
Open Server Cursors: 45
Opened Server Cursors: 6267
Times New Cursor Used: 0
Times Cursor Reused: 64067
Total Queries: 63549
Total Errors: 624
Forked Listeners: 25

My question is, what is a good way of debugging sql-relay? I've put it in debug mode at times (connection and/or listener), and all it does is create thousands of files which don't seem to provide anything of use to me. The number of "Forked Listeners" might be the problem, considering that number is usually 0.

Thanks for any tips on debugging this issue,

Phil
From: Renat S. <sr...@st...> - 2010-07-09 09:37:13
Hello David and others.

When a signal is received (SIGALRM, SIGTERM or any other), sqlr-connection should delete its pid file, disconnect from the DB and update the statistics in shm to tell the scaler and listener that it is about to die. I noticed that sometimes sqlr-connection dies without cleaning up, so I made patches to rudiments and sqlrelay which do several things:

1) Signal handling setup was moved from the main.C of each particular connection to a method of the connection class.
2) The connection now catches all signals except SIGKILL and SIGSEGV. SIGKILL cannot be caught. SIGSEGV is not caught because usually we want to examine the core dump in case of a segmentation violation.
3) The daemon class in rudiments uses a signal handler with an int parameter (the signal number) where possible.
4) The connection exits with code 0 if no errors happened, or with code 1 if an error was detected.
5) The oracle connection has an extended signal handler, which exits with code 0 or 1 for expected or unexpected signals respectively.
6) When the scaler is started with the "-fork" option, it writes to stderr the reason why a connection terminated, for example:

Connection (pid=17564) exited with code 1
Connection (pid=10898) terminated by signal 11

The patch to rudiments is named 02_signal_handling_int.patch.
The patch to sqlrelay is named 15_connection_handle_signals.patch.

After all this, I found that sometimes the connection receives signal 11 (SIGSEGV) with this debug ending:

done getting command
getting a cursor...
found a free cursor: 1
done getting a cursor
new query
handling query...
getting query...
querylength: 106
query: select check_user_profile(1843) fin, check_user_profile(2502) tech, check_user_profile(4122) uni from dual
getting query succeeded
getting input binds...
done getting input binds
getting output binds...
done getting output binds
getting send column info...
send column info
done getting send column info...

The ending is always the same, and this happens occasionally. This log means that sqlr-connection dies somewhere in oracle8cursor::cleanUpData.

Has anyone met this kind of problem?

--
Renat Sabitov e-mail: sr...@st...
Stack Soft jid: sr...@ja...
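Items 2 to 6 follow a common POSIX pattern: an int-parameter handler that only records the signal, cleanup and an exit code chosen outside the handler, and a parent that uses waitpid() to tell an exit code from a fatal signal. The sketch below is a generic illustration of that pattern, not code from 02_signal_handling_int.patch or 15_connection_handle_signals.patch; every function name and the "expected signal" policy are hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Signals handled in this sketch. SIGKILL cannot be caught, and SIGSEGV is
// left at its default action so a core dump can still be examined.
static const int kHandledSignals[] = {SIGTERM, SIGINT, SIGHUP, SIGQUIT, SIGALRM, SIGUSR1};

// Last signal received; 0 means none yet. Only the handler writes it.
static volatile std::sig_atomic_t g_caught_signal = 0;

// Handler taking the signal number as an int parameter.
static void recordSignal(int signum) { g_caught_signal = signum; }

void installConnectionHandlers() {
    struct sigaction sa {};
    sa.sa_handler = recordSignal;
    sigemptyset(&sa.sa_mask);
    for (int s : kHandledSignals) sigaction(s, &sa, nullptr);
}

// Hypothetical policy: SIGTERM (shutdown) and SIGALRM (ttl expiry) are
// expected and exit 0; any other handled signal exits 1.
int exitCodeForSignal(int signum) {
    return (signum == SIGTERM || signum == SIGALRM) ? 0 : 1;
}

// What a "-fork" scaler could print after waitpid() has reaped a child.
std::string describeTermination(pid_t pid, int status) {
    char buf[96];
    if (WIFEXITED(status))
        std::snprintf(buf, sizeof buf, "Connection (pid=%d) exited with code %d",
                      static_cast<int>(pid), WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        std::snprintf(buf, sizeof buf, "Connection (pid=%d) terminated by signal %d",
                      static_cast<int>(pid), WTERMSIG(status));
    else
        std::snprintf(buf, sizeof buf, "Connection (pid=%d) changed state",
                      static_cast<int>(pid));
    return buf;
}

// Fork a stand-in connection that waits for a signal, "cleans up" and exits.
pid_t spawnFakeConnection() {
    // Block the handled signals across fork() so a signal sent before the
    // child has installed its handlers stays pending instead of killing it.
    sigset_t block, old;
    sigemptyset(&block);
    for (int s : kHandledSignals) sigaddset(&block, s);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid_t pid = fork();
    if (pid == 0) {
        installConnectionHandlers();
        sigset_t waitmask = old;
        for (int s : kHandledSignals) sigdelset(&waitmask, s);
        while (g_caught_signal == 0)
            sigsuspend(&waitmask);  // atomically unblock and wait
        // Real code: delete the pid file, log out of the DB, update shm stats.
        _exit(exitCodeForSignal(g_caught_signal));
    }
    sigprocmask(SIG_SETMASK, &old, nullptr);  // parent restores its mask
    return pid;
}
```

Blocking the signals before fork() and waiting with sigsuspend() closes the window in which a signal could arrive before the handlers exist, which is one reason signal blocking around init matters.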
From: Renat S. <sr...@st...> - 2010-05-28 04:52:43
28.05.2010 00:50, Cal Heldenbrand wrote:
> I have my listenertimeout config setting to 5 seconds, and it appears
> that the listener times out each time the client receives an error
> back. (Changing this value changes how fast the client gets an error
> message) Could that have something to do with it?

My guess: the forked listener starts handing off the session to sqlr-connection-db2, but there is no connection to take the session. So the listener times out with SIGALRM and catches it in sqlrlistener::alarmHandler. That might be another good place to decrement the session count, but only if the listener has definitely incremented it first.
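The timeout path described here is the classic alarm()/SIGALRM pattern. A minimal sketch, assuming a pipe stands in for the listener-to-connection handoff and a plain int stands in for the shared-memory session count; readWithAlarm and handOffWithTimeout are hypothetical names, not sqlrlistener methods.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <unistd.h>

static volatile std::sig_atomic_t g_timed_out = 0;
static void onAlarm(int) { g_timed_out = 1; }

// Hypothetical stand-in for the shared-memory session count.
static int g_session_count = 0;

// Block in read() for at most `seconds`. The handler is installed without
// SA_RESTART, so SIGALRM interrupts read() (it fails with EINTR) instead
// of letting the call resume.
ssize_t readWithAlarm(int fd, char *buf, size_t len, unsigned seconds) {
    struct sigaction sa {}, old {};
    sa.sa_handler = onAlarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old);
    g_timed_out = 0;
    alarm(seconds);
    ssize_t n = read(fd, buf, len);
    alarm(0);                            // cancel the alarm if read() won
    sigaction(SIGALRM, &old, nullptr);
    return n;
}

// Simplified handoff: wait for a connection to accept the session (here,
// one byte arriving on `fd`). The decrement happens only on the path that
// did the increment, and only when the wait failed or timed out.
bool handOffWithTimeout(int fd, unsigned seconds, bool dynamicscaling) {
    bool incremented = false;
    if (dynamicscaling) {
        ++g_session_count;
        incremented = true;
    }
    char ack;
    if (readWithAlarm(fd, &ack, 1, seconds) == 1) return true;
    if (incremented) --g_session_count;  // undo our own increment only
    return false;
}
```

Calling alarm() and then read() leaves a small race (the alarm could fire before read() starts blocking); it is unlikely with whole-second timeouts, but a production version would prefer a race-free wait such as poll() with a timeout.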
From: Cal H. <ca...@fb...> - 2010-05-27 20:51:00
> Session counter is incremented in listener when client session is
> authorized. It is decremented when connection finishes processing client
> session's requests. What happens with your client session? Does it get an
> error or what? Does listener complete socket handoff procedure to
> client?
The client seems to handle the errors correctly. After autoCommitOff()
returns false, sendQuery() fails with the error message
"Failed to authenticate.
A network error may have ocurred."
> When handoff exits with error, might that be the best place to
> decrement the counter?
>
> ----
> if (dynamicscaling) {
>     incrementSessionCount();
> }
> passstatus=handOffClient(clientsock);
> // addition:
> if (!passstatus) {
>     decrementSessionCount(); // this function does not exist yet
> }
> ----
>
This is the strange part -- I added a print statement after handOffClient()
to see what the return status is, and it never prints. It does get there,
though: authstatus is 1. I don't really understand how handOffClient()
works, but could it be that it never returns?
I have my listenertimeout config setting to 5 seconds, and it appears that
the listener times out each time the client receives an error back.
(Changing this value changes how fast the client gets an error message)
Could that have something to do with it?
Thanks,
--Cal
From: Renat S. <sr...@st...> - 2010-05-27 20:21:13
27.05.2010 21:28, Cal Heldenbrand wrote:
> Which seemed to work fine, but I'm not sure that's the correct place
> to decrement that counter.
>
> Any ideas on that?
>
The session counter is incremented in the listener when a client session is
authorized. It is decremented when the connection finishes processing the client
session's requests. What happens with your client session? Does it get an
error, or what? Does the listener complete the socket handoff procedure to the
client? When the handoff exits with an error, might that be the best place to
decrement the counter?
----
if (dynamicscaling) {
    incrementSessionCount();
}
passstatus=handOffClient(clientsock);
// addition:
if (!passstatus) {
    decrementSessionCount(); // this function does not exist yet
}
----
--
Renat
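The proposed decrement can also be written as a scope guard, so that every early exit from the handoff path undoes the increment. A sketch under assumptions: g_sessions is a std::atomic<int> standing in for the shared-memory counter, and SessionCountGuard and handOff are hypothetical names, not SQL Relay code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the shared-memory session counter.
std::atomic<int> g_sessions{0};

// Increments the counter when enabled and, unless commit() is called,
// decrements it again when the guard goes out of scope, on every return
// path (or exception) out of the enclosing function.
class SessionCountGuard {
public:
    explicit SessionCountGuard(bool enabled) : active_(enabled) {
        if (active_) ++g_sessions;
    }
    ~SessionCountGuard() {
        if (active_) --g_sessions;
    }
    // Handoff succeeded: the connection now owns the later decrement.
    void commit() { active_ = false; }

    SessionCountGuard(const SessionCountGuard &) = delete;
    SessionCountGuard &operator=(const SessionCountGuard &) = delete;

private:
    bool active_;
};

// Hypothetical wrapper mirroring the snippet: increment if dynamic scaling
// is on, try the handoff, and undo the increment only if it failed.
bool handOff(bool dynamicscaling, bool (*handOffClient)()) {
    SessionCountGuard guard(dynamicscaling);
    bool passstatus = handOffClient();
    if (passstatus) guard.commit();
    return passstatus;
}
```

A guard only helps if control actually returns from handOffClient(); if the call never returns, as Cal observes in his reply, the decrement has to happen somewhere else, such as the alarm handler.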
From: Renat S. <sr...@st...> - 2010-05-27 20:20:54
27.05.2010 21:28, Cal Heldenbrand wrote:
> Which seemed to work fine, but I'm not sure that's the correct place
> to decrement that counter.
>
> Any ideas on that?
>
The session counter is incremented in the listener when a client session is
authorized. It is decremented when the connection finishes processing the client
session's requests. What happens with your client session? Does it get an
error, or what? Does the listener complete the socket handoff procedure to the
connection? When the handoff exits with an error, might that be the best place
to decrement the counter?
----
if (dynamicscaling) {
    incrementSessionCount();
}
passstatus=handOffClient(clientsock);
// addition:
if (!passstatus) {
    decrementSessionCount(); // this function does not exist yet
}
----
--
Renat
From: Renat S. <sr...@st...> - 2010-05-27 19:26:53
27.05.2010 23:17, Renat Sabitov wrote:
> Does listener complete socket handoff procedure to client?

I'm sorry, of course I meant "to connection", not "to client".
From: Cal H. <ca...@fb...> - 2010-05-27 17:28:58
Well I think I like the last suggestion the best, since it's the easiest to
implement. ;-)
void scaler::incConnections()
{
    if ( ! use_fork )
        semset->wait(8);
    if (use_fork) {
        fprintf(stderr, "incConnections(): DO incr\n"); fflush(stderr);
        this->currentconnections++;
    }
}
This seems to work just the same; however, I've noticed that the scaler's
"Sessions" count doesn't decrease. Each failed connection causes the count
to increase by one.
I was playing around with something like:
if (! semset->waitWithUndo(8, 10, 0) )
{
    // decrement session counter, I'm not sure if this should happen here
    shmdata *ptr=(shmdata *)idmemory->getPointer();
    ptr->connectionsinuse--;
    return;
}
Which seemed to work fine, but I'm not sure that's the correct place to
decrement that counter.
Any ideas on that?
Thanks Renat!
--Cal
From: Renat S. <sr...@st...> - 2010-05-27 17:10:47
27.05.2010 20:04, Cal Heldenbrand wrote:
> #3 0x000000000040486c in scaler::incConnections (this=0x5061f0) at
> scaler.C:502
> 502 if (! semset->wait(8) )
>
Ok, now I see. The scaler waits for the new connection to report that it
has started. So if the connection hangs (or exits) somewhere before calling
signal(8), the scaler will never stop waiting.
> void scaler::incConnections()
> {
> /* wait for the connection count to increase. Time out at 10 seconds.
> * Since the login timeout is 5 seconds, this gives a bit of
> buffer time
> */
> if (! semset->waitWithUndo(8, 10, 0) )
> return;
>
I think that the best way is to move the connection counter increment
to somewhere before interacting with the DB, just after the connection
process starts. Maybe in init. Or even get rid of semaphore #8 entirely
in the -fork case, because the scaler already knows whether the
connection started or not (it got the PID).
--
Renat
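The second suggestion (drop semaphore #8 under -fork because the scaler already has the PID) can be sketched in plain POSIX terms: count a connection as soon as fork() succeeds, and stop counting it when waitpid() reports that it exited, so there is no startup semaphore to hang on. ForkedConnections, failedLogin and healthyConnection are hypothetical names; this is not the scaler's actual bookkeeping.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <cstdio>
#include <set>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical -fork bookkeeping: track connections by PID instead of
// waiting for each new child to signal a semaphore.
class ForkedConnections {
public:
    // fork() and run `body` in the child. The parent counts the child as
    // soon as fork() succeeds, before the child has tried to log in.
    pid_t spawn(int (*body)()) {
        pid_t pid = fork();
        if (pid == 0) _exit(body());
        if (pid > 0) children_.insert(pid);
        return pid;
    }

    // Reap tracked children. Pass WNOHANG to poll without blocking, or 0 to
    // block until each tracked child has exited. Returns the live count.
    std::size_t reap(int options) {
        for (auto it = children_.begin(); it != children_.end();) {
            int status = 0;
            pid_t r = waitpid(*it, &status, options);
            if (r == *it || (r < 0 && errno == ECHILD)) {
                if (r == *it && WIFEXITED(status) && WEXITSTATUS(status) != 0)
                    std::fprintf(stderr, "Connection (pid=%d) exited with code %d\n",
                                 static_cast<int>(*it), WEXITSTATUS(status));
                it = children_.erase(it);  // gone: stop counting it
            } else {
                ++it;                      // still running (or EINTR)
            }
        }
        return children_.size();
    }

    std::size_t count() const { return children_.size(); }

private:
    std::set<pid_t> children_;
};

// Stand-in for a connection whose DB login failed: it exits with code 1
// before it would ever have signaled a startup semaphore.
int failedLogin() { return 1; }

// Stand-in for a healthy connection: runs until it is killed.
int healthyConnection() {
    for (;;) pause();
}
```

A failed login then simply shows up as a reaped child with exit code 1, and the scaler's count drops back without any semaphore bookkeeping.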
From: Cal H. <ca...@fb...> - 2010-05-27 16:04:42
Thanks for the info Renat, this helps me a lot. All of these signal
interactions are pretty confusing the first time. Also I should add that
the first operation works correctly, it's the second call that hangs
indefinitely.
-------------------------
sqlrconnection *conn = new sqlrconnection(...);
conn->autoCommitOff(); // hangs for 10 seconds, then returns false
sqlrcursor *cur = new sqlrcursor(conn);
cur->sendQuery(...); // hangs forever here
-------------------------
In addition, I can break out of my client app and fire it up a second time,
and then it will hang forever on the first autoCommitOff() call. So it seems
like the first operation is successful, but everything after that, even new
forked listeners, is waiting on a series of semaphore blocks.
I've traced down the operation so far to this:
---------------
db2connection.C db2connection::logIn()
Setting the SQL_LOGIN_TIMEOUT attribute to 5 seconds, I believe, causes this to
fail correctly at the SQLConnect() call. This function returns false.
---------------
initconnection.C
I have reloginatstart="no" since in my case, if a server is dead I want the
client to go into a failure mode. (I have my own methods on the client of
either picking a different server, or showing a "we'll be back soon"
message)
At around line 102 in initConnection(), attemptLogIn() returns false, which
causes initConnection() to return false.
---------------
connections/db2/main.C
since initConnection() returns false, the db2 connection proc does an
_exit(1);
---------------
Meanwhile, scaler.C openMoreConnections() has called openOneConnection(), which
returned successfully, since it only checks the success of the
fork() call in the parent. It then goes into incConnections(), where it
waits on semaphore 8.
Since initConnection() has returned before calling
incrementConnectionCount(), semaphore 8 is never signaled, which appears to
cause the scaler to wait inside incConnections(). Since it's waiting there,
it will never start up any more connections after that, and we have a
downward spiral of clients and locked-up listener processes.
---------------
Here's the scaler gdb run, with a few of my debugging statements added.
(gdb) r
Starting program: /usr/local/firstworks/bin/sqlr-scaler -id openport -debug
-fork -config /usr/local/firstworks/etc/sqlrelay.conf
openMoreConnections(): connections: 0
openMoreConnections(): sessions: 1
openMoreConnections(): grow loop: i=0
openMoreConnections(): start while loop
scaler::openOneConnection_fork(): doing fork with command:
sqlr-connection-db2 -silent -nodetatch -ttl 60 -id openport -connectionid
dev -config /usr/local/firstworks/etc/sqlrelay.conf -debug
scaler: forked pid 20163
openMoreConnections(): after openOneConnection() success=1
incrConnections() start
db2 main.C call initConnection()
Debugging to: /usr/local/firstworks/var/sqlrelay/debug/sqlr-connection.20163
db2connection::logIn() start connect
db2connection::logIn() error connect, return false
sqlrconnection_svr::initConnection(): attemptLogIn() fail
db2 main.C: connect fail, _exit(1)
Debugging to: /usr/local/firstworks/var/sqlrelay/debug/sqlr-listener.20433
listener: waiting for scaler
(hangs here, I did a ctrl C)
Program received signal SIGINT, Interrupt.
0x00000035058c83c9 in semop () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x00000035058c83c9 in semop () from /lib64/tls/libc.so.6
#1 0x0000002a956eeb9f in rudiments::semaphoreset::semOp ()
from /usr/local/firstworks/lib/librudiments-0.32.so.1
#2 0x0000002a956ee03c in rudiments::semaphoreset::wait ()
from /usr/local/firstworks/lib/librudiments-0.32.so.1
#3 0x000000000040486c in scaler::incConnections (this=0x5061f0) at
scaler.C:502
#4 0x0000000000404650 in scaler::openMoreConnections (this=0x5061f0) at
scaler.C:449
#5 0x00000000004049bf in scaler::loop (this=0x5061f0) at scaler.C:544
#6 0x0000000000404ba8 in main (argc=7, argv=0x7fbffff658) at main.C:26
(gdb) frame 3
#3 0x000000000040486c in scaler::incConnections (this=0x5061f0) at
scaler.C:502
502 if (! semset->wait(8) )
---------------
So it seems that the connection fails out, but the scaler just keeps waiting
for the connection proc to increment.
In looking at the rudiments API for semaphores, what if I did a
semset->waitWithUndo() instead?
It looks like that might have solved it. Here's what my
scaler::incConnections() looks like, cleaned up.
void scaler::incConnections()
{
    /* wait for the connection count to increase. Time out at 10 seconds.
     * Since the login timeout is 5 seconds, this gives a bit of buffer time
     */
    if (! semset->waitWithUndo(8, 10, 0) )
        return;
    if (use_fork) {
        this->currentconnections++;
    }
}
I'm not sure if this change would cause other bugs though. I'm going to do
some more testing to see how this works, and I might email a patch in later
if I make any other changes.
Let me know if this is the wrong way to solve this.
Thanks!
--Cal
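The timed wait above can be expressed with raw System V calls. This is a Linux-specific sketch (semtimedop() is a Linux extension) and not rudiments' implementation; it only assumes that waitWithUndo(8, 10, 0) means "wait on semaphore 8 with SEM_UNDO, giving up after 10 seconds and 0 nanoseconds", which matches the comment in the code. timedWaitWithUndo, postSemaphore and makeSemaphoreSet are hypothetical helpers.

```cpp
#include <cassert>
#include <cerrno>
#include <ctime>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Linux: the caller must define semun for semctl(SETVAL).
union semun { int val; struct semid_ds *buf; unsigned short *array; };

// Timed "wait" (decrement by 1) on semaphore `index`. SEM_UNDO asks the
// kernel to revert this decrement if the calling process exits.
// Returns false on timeout (errno == EAGAIN) or any other error.
bool timedWaitWithUndo(int semid, unsigned short index, time_t sec, long nsec) {
    struct sembuf op;
    op.sem_num = index;
    op.sem_op = -1;
    op.sem_flg = SEM_UNDO;
    timespec ts{};
    ts.tv_sec = sec;
    ts.tv_nsec = nsec;
    for (;;) {
        if (semtimedop(semid, &op, 1, &ts) == 0) return true;
        if (errno != EINTR) return false;  // on EINTR, retry with the full timeout
    }
}

// "Signal" (increment by 1), as a connection would after a successful login.
bool postSemaphore(int semid, unsigned short index) {
    struct sembuf op;
    op.sem_num = index;
    op.sem_op = 1;
    op.sem_flg = 0;
    return semop(semid, &op, 1) == 0;
}

// Create a private set of `n` semaphores initialised to 0.
int makeSemaphoreSet(int n) {
    int id = semget(IPC_PRIVATE, n, IPC_CREAT | 0600);
    if (id < 0) return -1;
    for (int i = 0; i < n; ++i) {
        union semun arg;
        arg.val = 0;
        semctl(id, i, SETVAL, arg);
    }
    return id;
}
```

With the timeout, a child that exits before posting semaphore 8 turns an indefinite hang into a false return the caller can act on; the trade-off is that a connection which is merely slow to log in is treated the same way.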
From: Renat S. <sr...@st...> - 2010-05-27 07:21:13
Hi Cal,
I don't really understand what happens in your case, but have some ideas.
> "waiting for the scaler..." which is from sqlrlistener.C around line
> 1285. It hangs at that point until I manually kill the listener
> process. I've been trying to study what is happening here between the
> listener and scaler, but haven't determined anything so far.
After this message, the listener waits for the scaler to signal semaphore
number 7. You can see this with strace or by looking at a backtrace in gdb.
Try running a command like this against sqlr-listener (here I did it for
sqlr-scaler; you can see that it waits for semaphore #6):
$ sudo -u sqlrelay strace -p 2201
Process 2201 attached - interrupt to quit
semop(294921, {{6, -1, 0}}, 1
The scaler always waits for semaphore #6 to start the procedure of firing up new
connections. Then it counts sessions and connections and signals
semaphore #7 so the listener can keep going.
I believe that the listener could freeze at this point if there is no scaler
at all, or if semaphore #4 is acquired by some other process and the
scaler can't acquire it.
You could examine the semaphore state with the patched sqlr-status: if the
value is 1, it's free for acquiring; if it's 0, it's already acquired.
You could try the "-fork" option to sqlr-start; in this case the scaler doesn't
use the connection counter in shared memory and so doesn't use semaphore #4.
Or you could just remove acquiring and releasing semaphore #4 from
scaler::countConnections(), because there is no need to serialize access
to reading one value: who cares if some process writes another value a
bit earlier or later.
But I don't really think that the problem is in the semaphores. You
should examine the state of the processes with strace and gdb first.
--
Renat Sabitov e-mail: sr...@st...
Stack Soft jid: sr...@ja...
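The listener/scaler handshake on semaphores #6 and #7 can be modelled with two System V semaphores. This is a Linux-only sketch under stated assumptions: the set size of 9 and the idea that the listener is what posts #6 are assumptions, and semOp, scalerIteration, listenerWaitForScaler and makeSemaphoreSet are hypothetical names. The listener side adds a timeout only so that a missing scaler shows up as a failure instead of a freeze.

```cpp
#include <cassert>
#include <ctime>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Linux: the caller must define semun for semctl(SETVAL).
union semun { int val; struct semid_ds *buf; unsigned short *array; };

// Indices from the message; set size and who posts #6 are assumptions.
constexpr unsigned short kScalerWake = 6;  // scaler waits here
constexpr unsigned short kListenerGo = 7;  // listener waits here

// One semaphore operation; with a timeout it uses semtimedop() (Linux).
static int semOp(int semid, unsigned short num, short delta, const timespec *timeout) {
    struct sembuf op;
    op.sem_num = num;
    op.sem_op = delta;
    op.sem_flg = 0;
    return timeout ? semtimedop(semid, &op, 1, timeout) : semop(semid, &op, 1);
}

// Scaler side: block on #6 (the semop(..., {{6, -1, 0}}, 1) seen in strace),
// count sessions/connections, then let the listener continue via #7.
void scalerIteration(int semid) {
    semOp(semid, kScalerWake, -1, nullptr);
    // ... count sessions and connections, fork more connections if needed ...
    semOp(semid, kListenerGo, +1, nullptr);
}

// Listener side ("waiting for the scaler..."): wake the scaler, then wait
// on #7, giving up after `timeoutSec` seconds.
bool listenerWaitForScaler(int semid, time_t timeoutSec) {
    if (semOp(semid, kScalerWake, +1, nullptr) != 0) return false;
    timespec ts{};
    ts.tv_sec = timeoutSec;
    return semOp(semid, kListenerGo, -1, &ts) == 0;
}

// Create a private set of `n` semaphores initialised to 0.
int makeSemaphoreSet(int n) {
    int id = semget(IPC_PRIVATE, n, IPC_CREAT | 0600);
    if (id < 0) return -1;
    for (int i = 0; i < n; ++i) {
        union semun arg;
        arg.val = 0;
        semctl(id, i, SETVAL, arg);
    }
    return id;
}
```

In this model, "no scaler at all" and "scaler stuck elsewhere" look identical from the listener's side: semaphore #7 is never posted, which is why the strace/gdb inspection suggested above is the way to tell them apart.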
From: Cal H. <ca...@fb...> - 2010-05-26 20:04:22
David / Renat, I was wondering if you might know the answer to this.

I have a special failure condition with DB2 that isn't being handled correctly by relay. In some rare cases, if DB2 is overloaded or crashes in a mysterious way, the port is left open but unresponsive after the connection is set up. This is different from the typical connection timeout, because the connection is made quickly; it just hangs on authentication indefinitely.

You can recreate this by setting up a DB2 catalog entry that points at some open service that doesn't respond. I set one up pointing to a running apache instance, and it recreated my scenario perfectly. (Even connecting with the db2 command line utility hangs forever, and you can't break out or even kill the process.) I've also looked around for DB2CLI.INI timeout settings for this, and there isn't any help. Only connection timeout settings exist.

I've also applied all of Renat's patches, and still have the same problem. The last entry I see in the listener debug log is:

"waiting for the scaler..."

which is from sqlrlistener.C around line 1285. It hangs at that point until I manually kill the listener process. I've been trying to study what is happening here between the listener and scaler, but haven't determined anything so far.

Let me know if you have any ideas.

Thanks!

--Cal
From: David M. <dav...@fi...> - 2010-05-25 01:21:56
To Cal, Renat and others,
I've been out of town for a week, but I'm back and going to collect up
the latest patches, incorporate them into CVS and begin testing for a
release. There are still some items on the TODO list that I wanted to
get done, but all of these recent patches definitely warrant a release.
Thanks guys, these are some good fixes and features.
Dave
dav...@fi...
From: Cal H. <ca...@fb...> - 2010-05-24 15:35:23
My first attempt at this email was caught by the spam filter due to its size. Here are some Photobucket links to the graphs.

----------------------------------------

Just an update -- I've been using this patch on my production systems for about a day now. It seems to be functional, and it gives a big memory reduction. I also set my initial connections to 0 with a 60 second TTL and a maxsessioncount of 170, so they shut down around every 5 minutes. Before my changes, each connection process consumed 50-100MB of memory. After this patch and the config change, that dropped to 15-30MB. I believe the maxsessioncount helped a lot as well.

For some reason, our page delivery times would steadily increase over the course of 2 hours. My only guess is that there is some kind of leak? It seems that setting maxsessioncount fixed the problem.

Here are a few Cacti graphs I use to monitor everything.

This is the distribution of average page delivery times (APD). I switched everything over to SQL Relay right at 9:00am, made the config change for maxsessioncount at 11:20, and restarted both of my relay servers. You can see that there is a nice, steady, linear increase in delivery times, but no other statistic that I monitor shows the same relationship.

http://i1001.photobucket.com/albums/af138/cal_heldenbrand/Average_Page_Delivery_Distribution.png

During the same time frame the number of server processes increases, but it doesn't really follow the same relationship as APD. My only conclusion here is that it's not based on the number of concurrent clients or servers. (I slowly switched over to relay, one server at a time, over the course of 40 minutes.)

http://i1001.photobucket.com/albums/af138/cal_heldenbrand/Connections.png

Here is the number of cursors used during that time period. I know it's crazy, but it still doesn't show any correlation with delivery times. Note that this is with my patch. Without the dynamic cursors, I couldn't run this for more than 10 minutes before crushing the servers.

http://i1001.photobucket.com/albums/af138/cal_heldenbrand/Cursors.png

To also show that the amount of traffic isn't responsible, here is the number of queries per second over that timeframe.

http://i1001.photobucket.com/albums/af138/cal_heldenbrand/Queries.png

So my only conclusion is that there is something in the relay server connection daemon that slowly becomes more inefficient under a good amount of load. maxsessioncount is certainly a band-aid for the problem, but it might be something to look into. Has anyone else experienced something like this?

Thanks,

--Cal

On Tue, May 18, 2010 at 6:05 PM, Cal Heldenbrand <ca...@fb...> wrote:
> Hi everyone,
>
> Here's my patch against SQL Relay release 0.41 to implement dynamic
> cursors. This will allow you to start up connections with a small number of
> cursors, then grow them as needed until a defined maximum is reached. This
> is handy in the case of a few pages that might go crazy with the number of
> cursors needed. Sometimes it's difficult to track down where the leak is,
> so it's nice to have the server take care of this for you. (Without needing
> the memory bloat of many cursors across all connections)
>
> I added 2 new config file parameters to the *instance* tag:
>
> *maxcursors*: limit the maximum number of cursors to this number.
> Defaults to 1300. I'm not sure what other databases are like, but DB2 has a
> magic limit of 1326 statement handles.
>
> *cursors_growby*: When we need to allocate more cursors, add on a group
> of this many at a time. (Avoids many realloc() conditions under heavy use)
> Defaults to 5.
>
> The *cursors* parameter still behaves as usual: it starts up X number of
> initial cursors per connection.
>
> I did add one tiny bug fix to this as well -- it seems that the "times new
> cursor used" stat wasn't updated. I modified the behavior to increment the
> counter when the *client* requests a new cursor. (Even if the server has
> already allocated one)
>
> Please let me know if you find any bugs with this patch.
>
> Thanks,
>
> --Cal
|
From: Carlos V. <cve...@em...> - 2010-05-21 15:50:17
|
It seems there have been a number of significant bug fixes and patches rolled out for 0.41 that are critical for a stable production deployment. Is there any chance of getting these all rolled together into a 0.42 release?

CJ Vergara

-----Original Message-----
From: Renat Sabitov [mailto:sr...@st...]
Sent: Wednesday, May 19, 2010 4:58 AM
To: Discussion of topics related to SQL Relay; David Muse
Subject: [SPAM] - Re: [Sqlrelay-discussion] a couple of patches - one more - Email found in subject

14.05.2010 17:34, Renat Sabitov wrote:
> 09_sessioncount_correction.patch - prevents sessioncount from going below zero

Unfortunately this patch does not work properly. I found that testing connectionsinuse and totalconnections for going below zero makes no sense, since they are unsigned and decrementing a zero value produces UINT32_MAX. But in other parts of the program these values are cast to signed int, which goes below zero because of the wraparound. So I made a patch that changes their type to signed int. I hope you'll find it useful.

--
Renat Sabitov e-mail: sr...@st...
Stack Soft jid: sr...@ja...
|
|
From: Renat S. <sr...@st...> - 2010-05-19 09:58:14
|
14.05.2010 17:34, Renat Sabitov wrote:
> 09_sessioncount_correction.patch - prevents sessioncount from going below zero

Unfortunately this patch does not work properly. I found that testing connectionsinuse and totalconnections for going below zero makes no sense, since they are unsigned and decrementing a zero value produces UINT32_MAX. But in other parts of the program these values are cast to signed int, which goes below zero because of the wraparound. So I made a patch that changes their type to signed int. I hope you'll find it useful.

--
Renat Sabitov e-mail: sr...@st...
Stack Soft jid: sr...@ja...
|
|
From: Renat S. <sr...@st...> - 2010-05-19 05:17:35
|
19.05.2010 01:03, Cal Heldenbrand wrote:
> I can't seem to find the error in my console history... it had to do
> with this line:
>
> for (int i=0; i< SEM_COUNT; i++) {
>     sem[i] = conn->semset->getValue(i);
> }
>
> Saying that "semset" is a private data member of that object. I've
> already rolled back my source code, but I can try it later after I'm
> done with the cursor stuff.
Ok, now I see. semset was made public by 03_statistics_mutex.patch, and not
in rudiments but in sqlrelay-0.41/src/connection/sqlrconnection.h.
I needed the semaphores to be public to control access to the statistics shm.
--
Renat
|
|
From: Cal H. <ca...@fb...> - 2010-05-18 23:06:13
|
Hi everyone,

Here's my patch against SQL Relay release 0.41 to implement dynamic cursors. This will allow you to start up connections with a small number of cursors, then grow them as needed until a defined maximum is reached. This is handy in the case of a few pages that might go crazy with the number of cursors needed. Sometimes it's difficult to track down where the leak is, so it's nice to have the server take care of this for you (without needing the memory bloat of many cursors across all connections).

I added 2 new config file parameters to the *instance* tag:

*maxcursors*: limit the maximum number of cursors to this number. Defaults to 1300. I'm not sure what other databases are like, but DB2 has a magic limit of 1326 statement handles.

*cursors_growby*: when we need to allocate more cursors, add on a group of this many at a time (avoids many realloc() calls under heavy use). Defaults to 5.

The *cursors* parameter still behaves as usual: it starts up X initial cursors per connection.

I did add one tiny bug fix to this as well: it seems that the "times new cursor used" stat wasn't updated. I modified the behavior to increment the counter when the *client* requests a new cursor (even if the server has already allocated one).

Please let me know if you find any bugs with this patch.

Thanks,

--Cal
|
|
From: Cal H. <ca...@fb...> - 2010-05-18 21:04:24
|
On Tue, May 18, 2010 at 3:40 PM, Renat Sabitov <sr...@st...> wrote:
> 18.05.2010 23:29, Cal Heldenbrand wrote:
> > Oh ok, I picked just the status_more_info patch and applied that, I
> > had compilation errors with trying to access a private member in
> > rudiments.
>
> Hm. Neither this patch nor others require private members in rudiments
> (I use 0.32 version). I'll try to apply my patches against the clean
> sqlrelay-0.41 tomorrow and test it for compiling.
>
I can't seem to find the error in my console history... it had to do with
this line:
for (int i=0; i< SEM_COUNT; i++) {
    sem[i] = conn->semset->getValue(i);
}
Saying that "semset" is a private data member of that object. I've already
rolled back my source code, but I can try it later after I'm done with the
cursor stuff.
> > Before I go any further though, has anyone already implemented this?
> > Or do you guys foresee any problems with what I'm trying to do?
> No, I suppose nobody has tried to do this. As for me, in our
> application we use no more than 2 cursors at a time.
>
Yeah, we usually have at most 5 statements allocated, but it's difficult to
force programmers to close stuff manually in a garbage collected
environment. As a result, we have some pages that go crazy and allocate
hundreds, sometimes thousands of cursor objects. The number of lines of
high-level web code is so large that it's just easier to handle this
management in lower layers.
I'm almost done with the patch, just a few things to clean up and I'll
attach it to an email to the list.
Thanks,
--Cal
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Sqlrelay-discussion mailing list
> Sql...@li...
> https://lists.sourceforge.net/lists/listinfo/sqlrelay-discussion
>
|
|
From: Renat S. <sr...@st...> - 2010-05-18 20:37:45
|
18.05.2010 23:29, Cal Heldenbrand wrote:
> Oh ok, I picked just the status_more_info patch and applied that, I
> had compilation errors with trying to access a private member in
> rudiments.

Hm. Neither this patch nor the others require private members in rudiments (I use version 0.32). I'll try to apply my patches against a clean sqlrelay-0.41 tomorrow and test that it compiles.

> Before I go any further though, has anyone already implemented this?
> Or do you guys foresee any problems with what I'm trying to do?

No, I suppose nobody has tried to do this. As for me, in our application we use no more than 2 cursors at a time.
|
|
From: Cal H. <ca...@fb...> - 2010-05-18 19:30:24
|
Oh ok, I picked just the status_more_info patch and applied that, but I had compilation errors from trying to access a private member in rudiments.

I'm also working up a patch of my own, to do dynamically allocated cursors. I tried to compile the CVS version of relay, but realized it needed a new function in the CVS version of rudiments, which wouldn't compile. I gave up on that venture.

A few dusty corners of my web application consume a lot of statement handles, so I needed to crank my cursors up to 1320 in the config file. Each server-side connection consumed around 50-100MB of memory. Multiplied by around 200 concurrent connections, that is 10-20GB on a machine with 16GB of memory, which results in bad behavior. By dropping the initial cursors array to 10, each connection only consumes around 14MB of memory.

I have an initial test running now, using malloc() and realloc() to resize the sqlrcursor_svr array, and it seems to be operating correctly. Before I go any further though, has anyone already implemented this? Or do you guys foresee any problems with what I'm trying to do?

Thanks,

--Cal

On Tue, May 18, 2010 at 2:19 PM, Renat Sabitov <sr...@st...> wrote:
> 18.05.2010 19:24, Cal Heldenbrand wrote:
> > Are your patches against the 0.41 release, or the CVS code? I'd like to
> > give a few of these a try.
>
> They are against 0.41, but some of them depend on others, so you
> can't just apply the latest one to the original 0.41 code.
>
> I have posted all patches to this list, but it might be better to get them
> from this location:
>
> http://www.srr.pp.ru/www/sqlrelay/
>
> Be aware that 05_exit_on_broken_socket.patch requires a patch to rudiments
> (poll.FULL.patch.gz <http://www.srr.pp.ru/www/sqlrelay/rudiments-0.32/poll.FULL.patch.gz>)
|