Thread: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

[Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Florian G. P. <fg...@ph...> - 2003-05-06 18:55:59

Hi

I did some performance testing to see if async_exec is slower than exec.

I created a database named dbi_test with just one table consisting of two
columns. One is a unique id (called id), the other is a varchar (called
data).

I insert 1000 records into this table in one thread, and in another thread
I increment a counter every 0.01 seconds. Once the first thread has completed,
I compare the time it needed to insert the 1000 records to the current counter
value.

This is run ten times in a row, and both the counter value and the execution
time are averaged. The table is not truncated in between those runs.

Between the 4 runs shown below, the table _was_ truncated.
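
For readers who want to repeat the measurement, here is a minimal sketch of such a harness. It is not the original benchmark script: the method names run_with_counter and average_runs are made up for illustration, and the database work is passed in as a block, so the example below uses sleep as a stand-in for the 1000 inserts.

```ruby
# Hypothetical harness: measures how far a background counter gets
# while a block of "database work" runs in another thread.
def run_with_counter(tick: 0.01)
  counter = 0
  done = false

  # Counter thread: increments once per tick until the worker finishes.
  ticker = Thread.new do
    until done
      sleep tick
      counter += 1
    end
  end

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Thread.new { yield }.join # worker thread doing the inserts
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  done = true
  ticker.join
  [counter, duration]
end

# Averages the counter value and the duration over several runs.
def average_runs(runs: 10, tick: 0.01, &work)
  results = Array.new(runs) { run_with_counter(tick: tick, &work) }
  counters, durations = results.transpose
  [counters.sum.to_f / runs, durations.sum / runs]
end

# Stand-in workload; a real run would issue the inserts in the block.
counter, duration = average_runs(runs: 3) { sleep 0.05 }
printf("Counter: %.1f, Duration: %.4f\n", counter, duration)
```

With a real connection, the block would call either the synchronous or the pseudo-asynchronous method, and the two averaged counter values could then be compared.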

Here are the results:
=====================

synchronous functions, counter thread enabled:
Counter: 25.1, Duration:1.5601856

real    0m15.785s
user    0m3.040s
sys     0m0.270s
----------------------------------------------

pseudo-asynchronous functions, counter thread enabled:
Counter: 131.1, Duration:1.7516624

real    0m17.708s
user    0m2.890s
sys     0m0.220s
------------------------------------------------------

synchronous functions, counter thread disabled:
Counter: 0.0, Duration:1.5609098

real    0m15.826s
user    0m2.670s
sys     0m0.120s
-----------------------------------------------

pseudo-asynchronous functions, counter thread disabled:
Counter: 0.0, Duration:1.5932554

real    0m16.113s
user    0m2.880s
sys     0m0.220s
------------------------------------------------


The first two blocks show a performance penalty for the "asynchronous" method
of about 200ms for 1000 inserts. This could have two reasons:
1) Those 200ms are lost to the counter thread, which now counts to 131 instead
of 25.
2) Those 200ms are lost due to the asynchronous method having more overhead
than the synchronous one.

Since the difference is reduced to 30ms (for 1000 inserts again) once I
disable the counter thread, reason 1) seems to account for most of the
performance loss (which isn't really a loss in this case - it's just the other
threads getting their fair share of cpu time).
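
The differences quoted above can be recomputed from the averaged durations in the results. This is only the arithmetic, written out in Ruby; the variable names are chosen here for illustration.

```ruby
# Average durations in seconds for 1000 inserts, copied from the results.
sync_with_counter     = 1.5601856
async_with_counter    = 1.7516624
sync_without_counter  = 1.5609098
async_without_counter = 1.5932554
inserts = 1000

# Penalty of the pseudo-asynchronous calls, in milliseconds per 1000 inserts.
penalty_with_counter_ms    = (async_with_counter - sync_with_counter) * 1000
penalty_without_counter_ms = (async_without_counter - sync_without_counter) * 1000

# The same penalty per single insert, in microseconds.
per_insert_us = penalty_without_counter_ms * 1000 / inserts

puts format('counter thread enabled:  %.1f ms', penalty_with_counter_ms)    # 191.5 ms
puts format('counter thread disabled: %.1f ms', penalty_without_counter_ms) # 32.3 ms
puts format('per insert:              %.1f us', per_insert_us)              # 32.3 us
```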

greetings, Florian Pflug

Re: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Florian G. P. <fg...@ph...> - 2003-05-06 23:16:23

On Wed, May 07, 2003 at 12:46:11AM +0200, Michael Neumann wrote:
> > Since the difference is reduced to 30ms (for 1000 inserts again) once I
> > disable the counter thread, reason 1) seems to account for most of the
> > performance loss (which isn't really a loss in this case - it's just the other
> > threads getting their fair share of cpu time).
> 
> Do I interpret this correctly when I say that there's no reason to use the
> pseudo-asynchronous methods (as they are in any case slower)?
But you see that in the pseudo-asynchronous case the counter thread manages
to count up to 131. It counts for 1.5 seconds, incrementing every 0.01
seconds. In a perfect world, it should therefore count up to 150 in 1.5
seconds. 

Now, without the "alias" magic, thus using the synchronous api, it counts
only up to 25. This is because every time ruby executes a query, the _whole_
ruby process is blocked until the database backend has completed its
operation.
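
To put the two counter values side by side, here is the same comparison as a small calculation. It uses the approximate 1.5 second run time and the 0.01 second tick mentioned above; share_of_ideal is a helper name made up for this sketch.

```ruby
TICK = 0.01     # seconds between counter increments
RUN_TIME = 1.5  # approximate seconds per run, as used above

ideal_ticks = RUN_TIME / TICK # about 150 in a perfect world

# Percentage of the ideal tick count that the counter thread reached.
def share_of_ideal(observed, ideal)
  (observed / ideal * 100).round(1)
end

puts "pseudo-asynchronous: #{share_of_ideal(131.1, ideal_ticks)}% of ideal"
puts "synchronous:         #{share_of_ideal(25.1, ideal_ticks)}% of ideal"
```

So the counter thread gets roughly 87% of its ideal tick count in the pseudo-asynchronous case, and roughly 17% in the synchronous case.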

To see when this is a problem, just imagine an xmlrpc database wrapper. It
performs queries on behalf of a client, and returns the data using xmlrpc.
Now consider a client making a very expensive query that takes 10 seconds
to complete. In those 10 seconds, no other client can even _connect_ to the
xmlrpc server (because the whole ruby process blocks, and can't accept() the
connection).

> Could you try the same with two (or more) threads inserting rows into a table
> vs one thread? Does this make a difference?
I can, but it won't really show why the current behaviour is a problem. If
both threads access the database, then it doesn't really matter if they do
this concurrently, or one after the other. The throughput will even slightly
decrease, because of more context-switching overhead and the like if two
backends work at the same time.

What is much more interesting is one thread doing some serious selecting,
and the other doing some non-database work. Then you will see that with the
fully synchronous calls, the non-database thread stands still while the
database thread waits for an answer.

In the pseudo-asynchronous case, however, the non-database thread continues
to operate normally.

This is why I put this counter-thread into my benchmark. To show that in the
pseudo-asynchronous case the non-database thread (my counter thread) manages
to do more work, without hurting the performance of the database thread much
(30 ms for 1000 inserts makes 30us/insert)

greetings, Florian Pflug

Re: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Florian G. P. <fg...@ph...> - 2003-05-06 23:26:25

Hi

sorry to reply to my own message - but after thinking a bit, I came to the
conclusion that there might be a noticeable performance penalty for a lot
of very fast selects (or inserts, but I guess selects are much faster than
inserts in general).

This is due to the fact that the pseudo-asynchronous functions always
activate the scheduler at least once (when doing rb_thread_select), and I
guess this means that they have to wait for other threads to do their work
(if there are other non-blocked threads) before they can continue.
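
As an illustration of the pattern described here, the following is a sketch of a pseudo-asynchronous exec written in Ruby rather than in the C extension. It is not the code under discussion: it assumes a connection object that responds to send_query, socket_io, consume_input, is_busy and get_result (the names the current pg gem uses), and it waits with IO.select where the C code uses rb_thread_select.

```ruby
# Sketch of a "pseudo-asynchronous" exec: send the query, then wait on the
# connection's socket so that the thread scheduler can run other threads.
# `conn` is assumed to respond to send_query, socket_io, consume_input,
# is_busy and get_result.
def pseudo_async_exec(conn, sql)
  conn.send_query(sql)

  loop do
    conn.consume_input          # read whatever the server has sent so far
    break unless conn.is_busy   # a complete result is available
    IO.select([conn.socket_io]) # wait for more data; other threads may run
  end

  # Drain all results and keep the last one.
  last = nil
  while (result = conn.get_result)
    last = result
  end
  last
end
```

From the caller's point of view the call is still synchronous: it returns only once the result is complete. The difference is that the wait happens in a place where other threads can be scheduled.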

This might be what a user wants (if the other threads maintain a gui, this
is most certainly what the user wants), but in some cases one might prefer
the "traditional" behaviour. 

So, maybe the most elegant way is to introduce a parameter to the
query-executing functions which lets the user decide which behaviour he
prefers. I would still suggest making the pseudo-asynchronous behaviour the
default, since in most cases it won't have noticeable disadvantages, but
very noticeable advantages.

Greetings, Florian Pflug

Re: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Michael N. <mne...@nt...> - 2003-05-07 09:00:15

On Wed, May 07, 2003 at 01:09:48AM +0200, Florian G. Pflug wrote:
> On Wed, May 07, 2003 at 12:46:11AM +0200, Michael Neumann wrote:
> > > Since the difference is reduced to 30ms (for 1000 inserts again) once I
> > > disable the counter thread, reason 1) seems to account for most of the
> > > performance loss (which isn't really a loss in this case - it's just the other
> > > threads getting their fair share of cpu time).
> > 
> > Do I interpret this correctly when I say that there's no reason to use the
> > pseudo-asynchronous methods (as they are in any case slower)?
> But you see that in the pseudo-asynchronous case the counter thread manages
> to count up to 131. It counts for 1.5 seconds, incrementing every 0.01
> seconds. In a perfect world, it should therefore count up to 150 in 1.5
> seconds. 
> 
> Now, without the "alias" magic, thus using the synchronous api, it counts
> only up to 25. This is because every time ruby executes a query, the _whole_
> ruby process is blocked until the database backend has completed its
> operation.
> 
> To see when this is a problem, just imagine an xmlrpc database wrapper. It
> performs queries on behalf of a client, and returns the data using xmlrpc.
> Now consider a client making a very expensive query that takes 10 seconds
> to complete. In those 10 seconds, no other client can even _connect_ to the
> xmlrpc server (because the whole ruby process blocks, and can't accept() the
> connection).
> 
> > Could you try the same with two (or more) threads inserting rows into a table
> > vs one thread? Does this make a difference?
> I can, but it won't really show why the current behaviour is a problem. If
> both threads access the database, then it doesn't really matter if they do
> this concurrently, or one after the other. The throughput will even slightly
> decrease, because of more context-switching overhead and the like if two
> backends work at the same time.

Thanks, now I got the point :-)

> What is much more interesting is one thread doing some serious selecting,
> and the other doing some non-database work. Then you will see that with the
> fully synchronous calls, the non-database thread stands still while the
> database thread waits for an answer.
> 
> In the pseudo-asynchronous case, however, the non-database thread continues
> to operate normally.
> 
> This is why I put this counter-thread into my benchmark. To show that in the
> pseudo-asynchronous case the non-database thread (my counter thread) manages
> to do more work, without hurting the performance of the database thread much
> (30 ms for 1000 inserts makes 30us/insert)

I think 30us is acceptable (at least, it is for me).

There are now two possible ways to go:

1) Replace query/exec with async_query/async_exec.

2) Introduce an option "async" => true/false that lets you choose
   which variant to use.

I'm not sure if 2) is really worth it.

Regards,

  Michael

Re: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Florian G. P. <fg...@ph...> - 2003-05-07 09:05:03

On Wed, May 07, 2003 at 10:22:16AM +0200, Michael Neumann wrote:
> > This is why I put this counter-thread into my benchmark. To show that in the
> > pseudo-asynchronous case the non-database thread (my counter thread) manages
> > to do more work, without hurting the performance of the database thread much
> > (30 ms for 1000 inserts makes 30us/insert)
> 
> I think 30us is acceptable (at least, it is for me).
> 
> There are now two possible ways to go:
> 
> 1) Replace query/exec with async_query/async_exec.
> 
> 2) Introduce an option "async" => true/false that lets you choose
>    which variant to use.

I believe that other database backends might have the same problem - and for
some the fix might hurt performance a lot. That's why, after thinking about
this a bit, I believe this should be made an option, but the default should
be "pseudo-asynchronous" (because in the average case it's the more sensible
behaviour).

Maybe some generic name like "assume_fastquery" would be best - since it
would depend on the backend whether this information is used, and in which way.

greetings, Florian Pflug

Re: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Michael N. <mne...@nt...> - 2003-05-09 20:19:44

On Wed, May 07, 2003 at 11:03:46AM +0200, Florian G. Pflug wrote:
> On Wed, May 07, 2003 at 10:22:16AM +0200, Michael Neumann wrote:
> > > This is why I put this counter-thread into my benchmark. To show that in the
> > > pseudo-asynchronous case the non-database thread (my counter thread) manages
> > > to do more work, without hurting the performance of the database thread much
> > > (30 ms for 1000 inserts makes 30us/insert)
> > 
> > I think 30us is acceptable (at least, it is for me).
> > 
> > There are now two possible ways to go:
> > 
> > 1) Replace query/exec with async_query/async_exec.
> > 
> > 2) Introduce an option "async" => true/false that lets you choose
> >    which variant to use.
> 
> I believe that other database backends might have the same problem - and for
> some the fix might hurt performance a lot. That's why, after thinking about
> this a bit, I believe this should be made an option, but the default should
> be "pseudo-asynchronous" (because in the average case it's the more sensible
> behaviour)

There's currently no other database module that has such pseudo-async
methods (or I am simply not aware of it).

I've come to the conclusion that the best approach is probably to introduce a
"pg_async_exec" flag and then, depending on this flag, execute either
async_exec or exec in the Statement#execute method. Note that all other
methods still call exec. As those calls usually issue COMMIT, ROLLBACK
or BEGIN statements, I don't think there's a big difference when using
async_exec. But of course I may be wrong. Or should I use the async method
everywhere when the flag is set?

Also note that pg_async_exec will be true by default.

> Maybe some generic name like "assume_fastquery" would be best - since it
> would depend on the backend whether this information is used, and in which way.

As long as there are no other database modules, I am against this.

Regards,

  Michael

Re: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Florian G. P. <fg...@ph...> - 2003-05-10 02:12:33

On Fri, May 09, 2003 at 10:19:30PM +0200, Michael Neumann wrote:
> There's currently no other database module that has such pseudo-async
> methods (or I am simply not aware of it).

I just checked - the mysql module seems to have no support.
The "old" (oracle 7) compatible module doesn't seem to have support either
(although I'm not sure with oracle... the oci interface is more complex than
the mysql or postgres interface, so I might have misread the code).

But there is an oci8 interface for ruby (http://www.jiubao.org/ruby-oci8/).
This also includes a DBI Driver (DBD::OCI8). This driver implements the flag
"NonBlocking" to let the user indicate which behaviour he prefers.

-----------------------------------------------------------------
class Driver < DBI::BaseDriver
...
...
def connect( dbname, user, auth, attr )
    handle = ::OCI8.new(user, auth, dbname, attr['Privilege'])
    handle.non_blocking = true if attr['NonBlocking']
    return Database.new(handle, attr)
  rescue OCIException => err
    raise raise_error(err)
end
...
...
-----------------------------------------------------------------

> I've come to the conclusion that the best probably is to introduce a
> "pg_async_exec" flag and then, depending on this flag, execute either
> async_exec or exec in the Statement#execute method.

See above. Since there is an Oracle DBD, and it has this kind of flag, I
think it would be best to name it the same ("NonBlocking"). Come to think of
it, I also believe that "NonBlocking" better describes what it actually
does, since from a user's point of view exec() and query() calls are still
synchronous - they just don't influence other threads.

> Note that all other
> methods still call exec. As those calls usually issue COMMIT, ROLLBACK
> or BEGIN statements, I don't think there's a big difference when using 
> async_exec.  But of course I may be wrong. Or should I use everywhere 
> the async method when the flag is set?

IMHO you need to use it for every query that may potentially take a long time.
Remember, when a query is issued using the "synchronous" call
(exec instead of async_exec), _all_ _other_ _threads_ are blocked until
the query completes. 

"BEGIN" (and possibly "ROLLBACK") are usually quite fast, I guess. But
"COMMIT" could take quite some time if you did a lot of inserts and updates in
the transaction (or so I think).

greetings, Florian Pflug

Re: [Ruby-DBI-devel] PGconn::exec vs. PGconn::async_exec benchmark

From: Michael N. <mne...@nt...> - 2003-05-10 11:05:56

On Sat, May 10, 2003 at 04:12:24AM +0200, Florian G. Pflug wrote:
> On Fri, May 09, 2003 at 10:19:30PM +0200, Michael Neumann wrote:
> > There's currently no other database module that has such pseudo-async
> > methods (or I am simply not aware of it).
> 
> I just checked - the mysql module seems to have no support.
> The "old" (oracle 7) compatible module doesn't seem to have support either
> (although I'm not sure with oracle... the oci interface is more complex than
> the mysql or postgres interface, so I might have misread the code).
> 
> But there is an oci8 interface for ruby (http://www.jiubao.org/ruby-oci8/).
> This also includes a DBI Driver (DBD::OCI8). This driver implements the flag
> "NonBlocking" to let the user indicate which behaviour he prefers.

Yes, this flag seems to do the same as the async_exec method in Pg. I agree
with you that we should reuse this flag.

OCI8 disables NonBlocking by default. We should do the same for DBD::Pg.

> > I've come to the conclusion that the best approach is probably to introduce a
> > "pg_async_exec" flag and then, depending on this flag, execute either
> > async_exec or exec in the Statement#execute method.
> 
> See above. Since there is an Oracle DBD, and it has this kind of flag, I
> think it would be best to name it the same ("NonBlocking"). Come to think of
> it, I also believe that "NonBlocking" better describes what it actually
> does, since from a user's point of view exec() and query() calls are still
> synchronous - they just don't influence other threads.

I'd like "Blocking" even more as the flag name, because with NonBlocking=false
you have to think a bit harder about whether it's blocking or not :-) (at least
that is the case for my lazy mind).
But luckily this isn't a flag that one would use very often in a program.

> > Note that all other
> > methods still call exec. As those calls usually issue COMMIT, ROLLBACK
> > or BEGIN statements, I don't think there's a big difference when using 
> > async_exec.  But of course I may be wrong. Or should I use everywhere 
> > the async method when the flag is set?
> 
> IMHO you need to use it for every query that may potentially take a long time.
> Remember, when a query is issued using the "synchronous" call
> (exec instead of async_exec), _all_ _other_ _threads_ are blocked until
> the query completes. 

No problem. So I'll use it for every statement.
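
A rough sketch of what "use it for every statement" could look like follows. This is not the DBD::Pg code: the class name SketchDatabase and its methods are hypothetical, and the connection is only assumed to respond to both exec and async_exec. The flag is off by default here, following what DBD::OCI8 does for "NonBlocking".

```ruby
# Hypothetical wrapper: every statement goes through one helper that picks
# the blocking or the non-blocking call based on a "NonBlocking" attribute.
class SketchDatabase
  def initialize(conn, attr = {})
    @conn = conn
    # Off by default, following what DBD::OCI8 does for the same flag.
    @non_blocking = attr.fetch('NonBlocking', false)
  end

  def execute(sql)
    run(sql)
  end

  def begin_transaction
    run('BEGIN')
  end

  def commit
    run('COMMIT') # may be slow after many inserts, so it honours the flag too
  end

  def rollback
    run('ROLLBACK')
  end

  private

  # `@conn` is assumed to respond to both exec and async_exec.
  def run(sql)
    @non_blocking ? @conn.async_exec(sql) : @conn.exec(sql)
  end
end
```

Routing everything through one private helper means that BEGIN, COMMIT and ROLLBACK cannot accidentally keep using the blocking call once the flag is set.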

> "BEGIN" (and possibly "ROLLBACK") are usually quite fast, I guess. But
> "COMMIT" could take quite some time if you did a lot of inserts and updates in
> the transaction (or so I think).

Thanks for your comments.


Regards,

  Michael