Hi Dave,

Sorry it has taken so long to get back on this. I haven't been able to reproduce the problem reliably until now.

Since last time, I've switched to Oracle 11g database. I noticed if Oracle has quite a low maximum number of processes (150) and if I configure too many max connections in SQL Relay, this same problem happens.

If you want to reproduce it, you could deliberately set the processes in Oracle quite low (alter system set processes = N) and then configure SQL Relay to use max connections > N. My application is a FUSE-based filesystem that I implemented, which keeps the filesystem metadata in Oracle by using SQL Relay. When I run some stress tests (fs_racer from Linux Test Project package does this nicely when I run it for few minutes), I can easily get SQL Relay to reach the max numbers.

What I'm seeing (with sqlrelay-0.50) is that after Oracle has reach the max number of processes,, the number of sqlr-listeners grows and doesn't back down (until I kill my application). You can see when Oracle has encountered the error from the alter log:

# pwd

# tail alert_UNOFSDEV.log
Tue Mar 05 11:16:24 2013
ORA-00020: maximum number of processes (150) exceeded
 ORA-20 errors will not be written to the alert log for
 the next minute. Please look at trace files to see all
 the ORA-20 errors.
Tue Mar 05 11:16:39 2013
Process m000 submission failed with error = 20

I'm not sure what exactly happens in sqlr-listener or sqlr-connection when they get ORA-00020 error (or if they even receive the error), I never see it in my application. From my point of view, my application just hangs when it tries to connect.

The only fix I've found so far is to avoid ORA-00020 like plague by making sure the max SQL Relay connections never reaches the max Oracle processes. You also need to note that Oracle has a set of its own server processes that are included in this total (35 in my case), so if your Oracle has max 150 processes, you actually have 115 to use for the SQL Relay connections. I got my test working by configuring max 25 connections from 4 test machines (4 * 25 + 35 = 135).

I just thought I'd report this in case anyone else has been struggling with similar issues. Here is the SQL Relay configuration I'm using:

    <instance id="unofs" port="9011" socket="/var/sqlrelay/tmp/sockets/unofs.socket" dbase="oracle8" connections="5" maxconnections="25" maxqueuelength="0" growby="1" ttl="60" endofsession="rollback" sessiontimeout="600" runasuser="w3secure" runasgroup="systems" cursors="5" authtier="listener" handoff="pass" debug="none">
            <user user="unofs" password="secret"/>
            <connection connectionid="unofsdev" string="user=USER;password=PWD;oracle_sid=UNOFSDEV;oracle_home=/usr/lib64/oracle/" metric="1"/>

If anyone has any experiences with ORA-00020 and SQL Relay, and how to configure SQL Relay to cope with it, I'd be very grateful to learn more.


On 26 September 2012 13:38, David Muse <david.muse@firstworks.com> wrote:
Hi Ville,

Sorry for not responding earlier.

SQL Relay should automatically re-log in if it loses connection to the
database.  I recently added a parameter to disable that behavior, but it
should do it by default.

It sounds like the sqlr-connection-mysql process is crashing for some
reason but the semaphores are in such a state that the sqlr-scaler
doesn't realize this and doesn't spawn any new processes.

I've actually tried to reproduce this and gone through the code, looking
for some semi-obvious thing that could cause it, but I can't find
anything.  My guess is that there is some particular query that crashes
the process.

The last time I encountered a problem like this, it was with oracle and
the problem was that I misinterpreted characters for bytes and if OCI
said "allocate 100 characters", I'd allocate a buffer of 100 bytes,
which was fine until somebody started using unicode and the query
returned multi-byte characters.  Something like that.  I looked again
for that kind of thing, but I didn't see anything.

Nevertheless, I suspect something like that is the problem.  It would be
helpful if you could find the query or condition that's causing the
connections to crash, though I imagine that would be like looking for a
needle in a haystack.

I'll be out of town for the next few days, but I can look into it again
when I get back.


On 09/25/2012 06:00 AM, Ville Silventoinen wrote:
> Hi,
> Just few more details on this:
> - I run SQL Relay 0.46 with two different instances configured in
> sqlrelay.conf; one for Oracle and one for MySQL.
> - When the problem happens, it's always with the MySQL connections. All
> the sqlr-connection-mysql processes have stopped/crashed and there are
> several sqlr-listener processes for the MySQL instance. They seem to be
> unable to start any sqlr-connection-mysql processes.
> - Just guesswork, but could the problem be caused if for whatever reason
> SQL Relay fails to login to MySQL? Would that stop the
> sqlr-connection-mysql process? I've asked the MySQL admin, but we don't
> log the failed login attempts. I could turn on the connection debug mode
> in SQL Relay, but last time we did that, it filled the /var filesystem
> very fast, so I'd need to find some disk space for that.
> - I'm waiting when the problem happens next time so I can run the
> "sqlr-status -id INSTANCE" command to see what the raw semaphore numbers
> are. I saw an old thread that seems quite similar to this, but I'm not
> sure if it's related.
> Cheers,
> Ville
> On 4 September 2012 12:15, Ville Silventoinen
> <ville.silventoinen@gmail.com <mailto:ville.silventoinen@gmail.com>> wrote:
>     Hi,
>     I'm using SQL Relay 0.45 on RedHat 6. Sometimes SQL Relay goes into
>     a state where it starts several sqlr-listener processes, but there
>     are no sqlr-connection processes. The application that uses SQL
>     Relay hangs. There may be a problem with RedHat and the network card
>     we are using, which can cause temporary loss of network
>     connectivity. The only cure that I've found is to restart SQL Relay.
>     Today I decided to try SQL Relay 0.46. As I was running tests, our
>     DBAs shutdown the test Oracle database (I wasn't aware of this).
>     Again, the same situation happened: Several sqlr-listener processes
>     and no connections.
>     I'm looking for some insight into how I could configure SQL Relay so
>     it would automatically recover from the temporary loss of network
>     connectivity or database availability. Is this possible? Would it
>     help if I set maxlisteners to 0, listenertimeout to 60, or tweak
>     some other configuration variable?
>     Thanks for any advice,
>     Ville
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________________
> Unlimited Disk, Data Transfer, PHP/MySQL Domain Hosting
>                http://www.doteasy.com
> _______________________________________________
> Sqlrelay-discussion mailing list
> Sqlrelay-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sqlrelay-discussion
> _______________________________________________________
> Unlimited Disk, Data Transfer, PHP/MySQL Domain Hosting
>                http://www.doteasy.com

Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
Sqlrelay-discussion mailing list