I didn't realize this when I originally implemented
named pipes in jTDS, but there is one error condition
that should be handled differently, apparently through
some kind of back-off/retry algorithm. (Anyone
remember 10base2 network cabling?) It is documented by
this MSDN library article about implementing a named
pipe client in Windows:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/ipc/base/named_pipe_client.asp
Basically, when the "All pipe instances are busy" error
is received, the jTDS driver should be doing the
equivalent of the WaitNamedPipe() function (whatever
that is; it will have to be determined).
A development team that I work with has been able to
reproduce the "All pipe instances are busy" error if
more than one person tries to access the same server
(not just the same database on the same server) at the
same time (or within a certain period of time). Note
that we've only reproduced it against SQL Server 6.5
thus far.
In theory, this could be easily tested by running the
jTDS test suite from two different computers, both
configured to hit the same database server using named
pipes.
Logged In: YES
user_id=641437
David,
I assume that you are using network pipes rather than local
pipes? You may find the following link interesting, although
the solution described there (increasing the number of pipes
the server listens on) is difficult to implement with jTDS
at present because the pipe name is fixed:
http://support.microsoft.com/default.aspx?scid=KB;EN-US;165189
I don't think jCIFS implements WaitNamedPipe. I guess
trapping the "All pipe instances are busy" error from
SmbNamedPipe, sleeping for a while, and retrying a limited
number of times is the only option.
The real WaitNamedPipe receives a notification from the
server when the pipe becomes available which terminates the
wait state quicker than is possible using the simple
approach suggested above. In practice I doubt this would be
much of an issue with the jTDS application if the sleep time
is short. Of course, to fit with your 10base2 analogy, you
should use a random timeout :-)
Mike.
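The trap/sleep/retry approach described above could be
sketched roughly as follows. This is only an illustration,
not the jTDS patch: the PipeRetry class, the openWithRetry()
method, and matching the error by message text are all
assumptions, and the caller supplies the sleep range and
retry limit.

```java
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of jTDS or jCIFS. */
final class PipeRetry {
    /** Assumed error text; the real exception message may differ. */
    static final String BUSY_MESSAGE = "All pipe instances are busy";

    private PipeRetry() {}

    /**
     * Calls opener until it succeeds, retrying only on the "busy" error.
     * Sleeps a random time in [minSleepMs, maxSleepMs] between attempts
     * and gives up after maxRetries retries.
     */
    static <T> T openWithRetry(Callable<T> opener, int maxRetries,
                               int minSleepMs, int maxSleepMs,
                               Random random) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return opener.call();
            } catch (IOException e) {
                String msg = e.getMessage();
                boolean busy = msg != null && msg.contains(BUSY_MESSAGE);
                if (!busy || attempt >= maxRetries) {
                    throw e; // a different error, or retries exhausted
                }
                // Randomized delay so competing clients do not retry in lockstep.
                int sleepMs = minSleepMs
                        + random.nextInt(maxSleepMs - minSleepMs + 1);
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

The opener would wrap whatever call actually opens the pipe.
Any other I/O error is rethrown immediately, so only the
busy condition is masked by the retries.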
Logged In: YES
user_id=84089
Mike,
Thanks for the pointer to the article! I've been wanting to
implement a mechanism to change the named pipe path;
depending on how we solve this issue, I may do that. (I'll
post an RFE and a patch for review.)
However, from that MS article, it looks like jTDS must
implement a back-off algorithm for retrying. (The MS
article said their code used a delay between 200 ms and 1000
ms, although they didn't say if they randomized it or not.)
I'll probably try to implement this as well since we want
to continue using jTDS at work.
Finally, note that in my testing today, I saw the "All pipe
instances are busy" error when using both the "local Windows
filesystem" and the "jCIFS" methods of communication.
To reproduce the error, though, I had to have two DIFFERENT
computers try to access the SQL Server 6.5 server at the
same time. I tried creating a unit test that created 100
threads and ran them all concurrently. Each thread would
try to connect, then disconnect from the same database as
fast as it could. You'll have to run this on one computer,
then attempt to connect to the same database server from a
different computer to reproduce the failures.
Dave
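For anyone who wants to try this without the attachment, the
general shape of such a test is sketched below. It is not the
attached unit test: the class and method names are made up,
and the connect/disconnect step is passed in as a Callable so
the sketch stays self-contained. In a real run the Callable
would open and close a JDBC connection through jTDS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the attached unit test. */
final class ConcurrentConnectSketch {
    private ConcurrentConnectSketch() {}

    /**
     * Runs connectAndClose once on each of threadCount threads,
     * releasing them all at the same moment, and returns how
     * many of the attempts threw an exception.
     */
    static int run(int threadCount, Callable<Void> connectAndClose)
            throws InterruptedException {
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    startGate.await();      // wait until every thread exists
                    connectAndClose.call(); // connect, then disconnect
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        startGate.countDown();              // release all threads together
        for (Thread t : threads) {
            t.join();
        }
        return failures.get();
    }
}
```

Calling run(100, ...) on one machine and then connecting from
a second machine matches the procedure described above.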
Unit test that creates 100 concurrent threads to connect to a database
Stack trace when using SharedNamedPipe with jTDS-1.1-cvs
Stack trace when using SharedLocalNamedPipe with jTDS-1.1-cvs
Logged In: YES
user_id=84089
Attaching a sample patch to fix the "All pipe instances are
busy" exception by implementing a retry mechanism. Comments
and feedback are welcome. I do NOT expect this patch to be
the final fix. (I left some debugging output in it.)
In testing, the most retries I've seen are 12. Most of the
time, only 1 or 2 retries are needed to establish a connection.
Logged In: YES
user_id=641437
David,
I was able to use your test case to replicate the error and
also to show that your patch works fine. I also tested
against a SQL 7.0 server although with only two clients I
only got the pipe busy error once. I guess the newer servers
have more than one thread listening by default. I tested
both the network pipes and the local pipes options.
I did get up to 20 retries on one PC, but that was running on
a wireless LAN while the other client was connected at 100 Mbps.
Looks good to go to me.
Given that SQL 6.5 officially went out of support 1/1/2002,
it is surprising how many people are still using it. A case
of "if it's not broke, don't fix it", I suppose.
Mike.
Logged In: YES
user_id=84089
Mike,
It's not so much if-it's-not-broke-don't-fix-it as it is
oops-we-didn't-realize-the-product-is-no-longer-supported-
and-now-we-have-to-deal-with-it-until-we-have-the-resources-
to-upgrade.
I've uploaded my "final" patch. However, after reviewing my
code, I'm wondering if the hard-coded "retryTimeout"
variable (which is set to 20 seconds) shouldn't be replaced
with the "loginTimeout" value. This would force users to
set this parameter (since it defaults to 0) if they're using
named pipes and experience the "All pipe instances are busy"
error message.
If I leave the "retryTimeout" in, the driver would "just
work" in most instances, but some users wouldn't understand
why performance is degraded with named pipes, and I would be
covering up the "All pipe instances are busy" error message
for them. If their retries routinely ran to within a second
or so of the 20-second limit, they would see the failures
as random and not understand why they were happening.
Comments?
Dave
Logged In: YES
user_id=641437
David,
I agree that it is generally a good idea to avoid hard-coded
constants, but I wonder if the use of the loginTimeout
property isn't a bit counterintuitive. What I mean is that
a loginTimeout value of zero usually means there is no login
timeout, but in this case any pipe-in-use error would cause
an immediate failure.
Perhaps the best compromise is to say that if loginTimeout
is 0, then the default retry timeout of 20 seconds is used;
otherwise, the loginTimeout is used.
I understand your point about masking the underlying error
but looking at things from a support perspective, it is
better to have the app retry and keep working than have it
fail with an exception that could be regarded as a natural
consequence of the way named pipe listeners are implemented.
Why not output a message to the logger if a retry is invoked?
One day we should enhance the logging options so that we can
have useful diagnostic messages such as this one without the
massive network dump.
Mike.
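The compromise above (fall back to 20 seconds when
loginTimeout is 0) might look something like this. The class
and method names are hypothetical and this is not the code
from the patch. Treating negative values the same as 0 is
also an assumption.

```java
/** Hypothetical sketch of the timeout selection; not the committed code. */
final class RetryTimeoutSketch {
    /** Default retry window, used when no login timeout is set. */
    static final int DEFAULT_RETRY_TIMEOUT_SECONDS = 20;

    private RetryTimeoutSketch() {}

    /** Returns loginTimeout if it is set (> 0), else the 20-second default. */
    static int effectiveRetryTimeoutSeconds(int loginTimeoutSeconds) {
        return loginTimeoutSeconds > 0
                ? loginTimeoutSeconds
                : DEFAULT_RETRY_TIMEOUT_SECONDS;
    }

    /** True while another retry is still allowed at time nowMillis. */
    static boolean mayRetry(long startMillis, long nowMillis,
                            int loginTimeoutSeconds) {
        long deadline = startMillis
                + effectiveRetryTimeoutSeconds(loginTimeoutSeconds) * 1000L;
        return nowMillis < deadline;
    }
}
```

With this shape, a user who never sets loginTimeout still
gets retries, and a user who sets it gets a retry window that
matches the value they chose.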
Patch to fix "All pipe instances are busy" error plus FAQ update and TestAllPipeInstancesAreBusy class
Logged In: YES
user_id=84089
Mike,
Good point about defaulting the retry timeout to 20 seconds
when loginTimeout is 0 (default). I'm much happier with the
change now.
I've already added "Logger" output to the createNamedPipe()
method so that information is logged about retries if
logging is enabled.
I agree that we need a better logging infrastructure. I
know some people hate commons-logging, but I think it's very
useful when implemented correctly. The all-or-nothing of
the Logger class is just way too painful. :)
Dave
Logged In: YES
user_id=84089
Committed patch v3 to CVS. Closing bug.
Dave