Re: [Tcl-tdbc] Connection pooling for TDBC

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Tue, Feb 11, 2020 at 9:32 AM Cyan Ogilvie <cya...@gm...> wrote:
> I've put together a thread-aware connection pooling mechanism for TDBC and would like to know if there is interest in merging the changes into TDBC itself, and if so, what the process for that would be.

This sounds like a terrific idea. At this point, the process is quite
informal - we publish 'Informational' TIP's for major API changes, but
that's about it.

I'm presuming that the 'thread-aware'-ness is at the driver (and
connection attachment) level? At the higher levels, there are quite a
lot of Tcl_Obj's about, and the logic is mostly designed as
'multi-interp-safe and thread-oblivious'.

> The design looks like this:
>
> To retrieve a handle from the pool:
>
>     tdbc::${driver}::connection create db -host localhost -db foo
>
> becomes:
>
>     tdbc::pool $driver create db -host localhost -db foo
>
> When the thread is finished with the connection, it instead of
>
>     db destroy   (or db close)
>
> it would do:
>
>     db release
>
> which destroys db and any statements and resultsets associated with it, rolls back the transaction if one is open, and parks the database handle for the connection in a pool of detached connections.  The next call to tdbc::pool that matches the driver and connection arguments will retrieve the most recently released connection instead of creating a new connection (making it more of a stack than a pool).  This is done to keep the busy set of handles to a minimum and allow the idle ones to time out (2 minutes) and be closed.  This way the pool will grow as needed with a load spike and then shrink back down once the spike has passed and idle detached connections time out.

Nice, clean, simple.  I like it a lot.

I wonder if we could actually override the object destruction in some
way, or else go through a wrapper object, so that the user of the
connection would use 'db destroy' or 'rename db {}' in either case. I
can easily see myself forgetting which pathway to use when finalizing
a connection!

I'm copying Donal and Sean on this so as to get suggestions from the
TclOO guru and the architecture astronaut. I could certainly work up
wrapping logic that would work, but I'm guessing that they will point
out subtleties that I will miss.

> When tdbc::pool issues a connection from the pool (creating a fresh one if needed), and he pool is empty, it initiates a new connection in a background thread and adds it to the pool, priming it for the next request.  The intent of the pooling mechanism is primarily to hide the connection setup latency from short-lived users of the db handles, as is typical of web server usage patterns.
>
> Time to get a handle from the pool on my test machine is between 80 and 300 microseconds, vs 7000 to 18000 microseconds without the pool (postgres or mysql database on localhost).

Sweet!

Now I'm thinking. Having some sort of wrapper object would be really
nice, because another potential performance win would be to allow a
pooled connection to retain its prepared statements. That would be
trickier, since the Statement object isn't thread-safe at present, but
most of the stuff in it could be transferred among threads without too
much baggage. I _think_. No time right now to delve into it, I'm
afraid.

> The detached connections in the pool are periodically checked (every 20 seconds), and those that are no longer connected to the server or which are older than 2 minutes are closed and removed from the pool.  If the pool size goes to 0 as a result, a new connection is created to prime the pool.

Does this create a blip every 20 seconds on an idle application, where
a never-used connection is closed and immediately reopened? Is this
intentional, perhaps to make sure that the server connection is still
live, or am I misunderstanding what's up here?

> In order to support connection pooling, a TDBC driver needs 3 things:
>
> - Support construction with a single parameter "-attach $handle", which should create a new connection instance and attach it to the previously detached database handle indicated by $handle:
>
>     tdbc::${driver}::connection new -attach $handle
>
> - A new method "detach" that destroys the connection and returns a handle that can be used in the future to reattach to this database connection (possibly from another thread):
>
>     $connection detach
>
> - A new method "connected" which verifies that the underlying database connection is still valid, returning a boolean:
>
>     $connection connected

Simple enough!
>
> I have implemented these changes for the postgres and mysql drivers, using a global Tcl_HashTable protected with a mutex to store the ConnectionData struct against the issued handle (with the pidata removed).

Simple enough again.  Even an array with linear searching would be
workable, since we're highly unlikely to be dealing with thousands (or
even hundreds) of connections in a process.

> If a driver lacks these additions, the behaviour falls back to detach -> destroy, and tdbc::pool will just always construct a fresh connection object, so it is backwards and forwards compatible with drivers that don't know about the connection pooling.

Perfect!

ODBC would be easy enough to add. There's already a global mutex in
the driver because the SQLENV (needed for connection construction and
a few other bits) is per-process. With SQLite3 being entirely
in-process, and with the fact that we have all the overhead coming
from the fact that its driver is written in Tcl atop an existing Tcl
API, it's probably a game not worth the candle. The native Oracle
driver is moribund, and the only other driver I'm aware of is Steve
Redler's network-proxy thing.  You've done the hard part.

> I have a basic set of test cases for the changes in the postgres and mysql drivers, and the connection pool machinery itself (currently implemented in the TDBC core package, in Tcl code).  If this feature is something that could be merged, then I'll write the docs and try to write test cases for the more pathological race conditions.

> It currently supports vanilla Tcl using the Thread package, and NaviServer using it's own threading (because thread::create interacts badly NaviServer during certain parts of the server startup).  That means there is some polyfill code to provide thread::send when using NaviServer's threads, which lack that capability (using [chan pipe]).  Since my motivation for this connection pooling mechanism is to replace ns_db in NaviServer for rubylane.com, it's important to me that it works in NaviServer, but the polyfill may be too grubby and specific for TDBC generally.  It could be merged with just Thread support and I'll just maintain the polyfill for NaviServer separately for our use.

Fraser's Law: All the world's problems are solved with one more layer
of indirection.

I'd be fine separating the communication interface from
implementation, and if we choose not to ship the NaviServer plugin,
the API will still be there to support it. (The Notifier in 8.4-8.6
was a similar decision: it was implemented at C level with an abstract
API [a structure of function pointers], and only one implementation
shipped per platform. Browser plugins replaced the implementation as
part of their initialization.

That's kind of what OO is for, isn't it?

-- 
73 de ke9tv/2, Kevin

Re: [Tcl-tdbc] Connection pooling for TDBC

The Tool Command Language implementation

Re: [Tcl-tdbc] Connection pooling for TDBC