From: Florian G. P. <fg...@ph...> - 2003-05-06 18:55:59
Hi,

I did some performance testing to see if async_exec is slower than exec. I created a database named dbi_test with just one table consisting of two columns: a unique id (called id) and a varchar (called data). One thread inserts 1000 records into this table, while another thread increments a counter every 0.01 seconds. When the first thread has finished, I compare the time it needed to insert the 1000 records with the current counter value. This is run ten times in a row, and both the counter value and the execution time are averaged. The table is not truncated between those ten runs, but it _was_ truncated between the 4 test cases shown below.

Here are the results:

=====================
synchronous functions, counter thread enabled:
Counter: 25.1, Duration: 1.5601856 s
real 0m15.785s, user 0m3.040s, sys 0m0.270s
----------------------------------------------
pseudo-asynchronous functions, counter thread enabled:
Counter: 131.1, Duration: 1.7516624 s
real 0m17.708s, user 0m2.890s, sys 0m0.220s
----------------------------------------------
synchronous functions, counter thread disabled:
Counter: 0.0, Duration: 1.5609098 s
real 0m15.826s, user 0m2.670s, sys 0m0.120s
----------------------------------------------
pseudo-asynchronous functions, counter thread disabled:
Counter: 0.0, Duration: 1.5932554 s
real 0m16.113s, user 0m2.880s, sys 0m0.220s
----------------------------------------------

The first two blocks show a performance penalty of about 200ms per 1000 inserts for the "asynchronous" method (1.75 s vs. 1.56 s). This could have two reasons:
1) Those 200ms are lost in the counter thread, which now counts to 131 instead of 25.
2) Those 200ms are lost because the asynchronous method has more overhead than the synchronous one.
Since the difference shrinks to about 30ms (again for 1000 inserts) once I disable the counter thread, reason 1) seems to account for most of the performance loss (which isn't really a loss in this case; it's just the other threads getting their fair share of CPU time).

greetings,
Florian Pflug
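A minimal Ruby sketch of the benchmark setup described above, assuming a block that stands in for the 1000 inserts (with DBD::Pg it would loop over INSERT statements; here the work is simulated). The helper name `run_counter_benchmark` and its keyword parameters are hypothetical; it averages the counter value and the duration over several runs, mirroring the ten-run averaging of the original test.

```ruby
# Hypothetical helper mirroring the benchmark above. The block stands in for
# "insert 1000 records"; a second thread bumps a counter every `tick` seconds.
# Returns [average counter value, average duration in seconds] over `runs`.
def run_counter_benchmark(runs: 10, tick: 0.01, counter_thread: true)
  counters = []
  durations = []
  runs.times do
    counter = 0
    done = false
    ticker = nil
    if counter_thread
      ticker = Thread.new do
        until done
          sleep tick
          counter += 1
        end
      end
    end

    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield # the "database work" being measured
    durations << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    counters << counter # counter value at the moment the work finished
    done = true
    ticker&.join
  end
  [counters.sum.to_f / runs, durations.sum / runs]
end
```

With a work block that lets other threads run, the average counter approaches duration / tick (about 150 for 1.5 s); a call that blocks the whole process keeps it far lower, which is the 25 vs. 131 contrast in the results above.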
From: Florian G. P. <fg...@ph...> - 2003-05-06 23:16:23
On Wed, May 07, 2003 at 12:46:11AM +0200, Michael Neumann wrote:
> > Since the difference is reduced to 30ms (for 1000 inserts again) once I
> > disable the counter thread, reason 1) seems to account for most of the
> > performance loss (which isn't really a loss in this case - it's just the other
> > threads getting their fair share of cpu time).
>
> Do I interpret this correctly when I say that there's no reason to use the
> pseudo-asynchronous methods (as they are in any case slower)?

But you see that in the pseudo-asynchronous case the counter thread manages to count up to 131. It counts for 1.5 seconds, incrementing every 0.01 seconds, so in a perfect world it would count up to 150.

Now, without the "alias" magic, i.e. using the synchronous API, it counts only up to 25. This is because every time Ruby executes a query, the _whole_ Ruby process is blocked until the database backend has completed its operation.

To see when this is a problem, just imagine an XML-RPC database wrapper. It performs queries on behalf of a client and returns the data via XML-RPC. Now consider a client making a very expensive query that takes 10 seconds to complete. During those 10 seconds, no other client can even _connect_ to the XML-RPC server (because the whole Ruby process blocks and can't accept() the connection).

> Could you try the same with two (or more) threads inserting rows into a table
> vs one thread? Does this make a difference?

I can, but it won't really show why the current behaviour is a problem. If both threads access the database, then it doesn't really matter whether they do this concurrently or one after the other. Throughput will even decrease slightly if two backends work at the same time, because of extra context-switching overhead and the like.

What is much more interesting is one thread doing some serious selecting while the other does some non-database work.
Then you will see that with the fully synchronous calls, the non-database thread stands still while the database thread waits for an answer. In the pseudo-asynchronous case, however, the non-database thread continues to operate normally.

This is why I put the counter thread into my benchmark: to show that in the pseudo-asynchronous case the non-database thread (my counter thread) manages to do more work without hurting the performance of the database thread much (30 ms for 1000 inserts makes 30us per insert).

greetings,
Florian Pflug
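To make the XML-RPC scenario above concrete, here is a rough sketch (not DBI code): one thread runs a "slow query", simulated with `sleep`, which on a current Ruby lets other threads run, while another thread keeps accepting TCP connections. The `slow_query` stand-in and the loopback setup are assumptions for illustration; a query call that blocked the whole process would instead delay the accept() until the query finished.

```ruby
require 'socket'

# Hypothetical stand-in for an expensive query that does NOT block the
# whole process (the pseudo-asynchronous behaviour).
def slow_query(seconds)
  sleep seconds
  :query_done
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port = server.addr[1]

query_thread = Thread.new { slow_query(1.0) }
accept_thread = Thread.new do
  client = server.accept # serviced while the "query" is still running
  client.puts 'hello from the server'
  client.close
end

socket = TCPSocket.new('127.0.0.1', port)
reply = socket.gets
socket.close
query_still_running = query_thread.alive? # the reply arrives well before 1 s
accept_thread.join
server.close
```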
From: Florian G. P. <fg...@ph...> - 2003-05-06 23:26:25
Hi,

sorry to reply to my own message, but after thinking a bit I came to the conclusion that there might be a noticeable performance penalty for a lot of very fast selects (or inserts, but I guess selects are generally much faster than inserts). This is because the pseudo-asynchronous functions always activate the scheduler at least once (when calling rb_thread_select), and I guess this means they have to wait for other threads to do their work (if there are other, non-blocked threads) before they can continue.

This might be what a user wants (if the other threads maintain a GUI, it most certainly is), but in some cases one might prefer the "traditional" behaviour. So maybe the most elegant way is to introduce a parameter to the query-executing functions that lets the user decide which behaviour he prefers. I would still suggest making the pseudo-asynchronous behaviour the default, since in most cases it has no noticeable disadvantages but very noticeable advantages.

Greetings,
Florian Pflug
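The scheduling point mentioned above can be pictured in plain Ruby: instead of one call that blocks until the backend answers, the caller waits on the backend's socket with IO.select (roughly the Ruby-level counterpart of rb_thread_select), giving the thread scheduler a chance to run other threads first. This is only an analogy, assuming a pipe in place of the PostgreSQL socket; the method name `wait_for_result` is hypothetical, and this is not the actual async_exec implementation.

```ruby
# Hypothetical analogy of the pseudo-asynchronous wait: `io` plays the role
# of the backend connection's socket.
def wait_for_result(io, timeout = 5)
  # Let other threads run until the "backend" has sent something.
  ready, = IO.select([io], nil, nil, timeout)
  raise 'timed out waiting for the backend' unless ready
  io.read_nonblock(4096) # data is available, so this does not block
end

reader, writer = IO.pipe
backend = Thread.new do # simulated backend answering after 0.2 s
  sleep 0.2
  writer.write('result row')
  writer.close
end
result = wait_for_result(reader)
backend.join
```

It also shows the cost in question: even when the answer is already waiting, going through IO.select is a point where other runnable threads may be scheduled first.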
From: Michael N. <mne...@nt...> - 2003-05-07 09:00:15
On Wed, May 07, 2003 at 01:09:48AM +0200, Florian G. Pflug wrote:
> On Wed, May 07, 2003 at 12:46:11AM +0200, Michael Neumann wrote:
> > > Since the difference is reduced to 30ms (for 1000 inserts again) once I
> > > disable the counter thread, reason 1) seems to account for most of the
> > > performance loss (which isn't really a loss in this case - it's just the other
> > > threads getting their fair share of cpu time).
> >
> > Do I interpret this correctly when I say that there's no reason to use the
> > pseudo-asynchronous methods (as they are in any case slower)?
> But you see that in the pseudo-asynchronous case the counter thread manages
> to count up to 131. It counts for 1.5 seconds, incrementing every 0.01
> seconds. In a perfect world, it should therefore count up to 150 in 1.5
> seconds.
>
> Now, without the "alias" magic, thus using the synchronous api, it counts
> only up to 25. This is because every time ruby executes a query, the _whole_
> ruby process is blocked until the database backend has completed its
> operation.
>
> To see when this is a problem, just imagine an xmlrpc database wrapper. It
> performs queries on behalf of a client, and returns the data using xmlrpc.
> Now consider a client making a very expensive query, that takes 10 seconds
> to complete. In those 10 seconds, no other client can even _connect_ to the
> xmlrpc server (because the whole ruby process blocks, and can't accept() the
> connection).
>
> > Could you try the same with two (or more) threads inserting rows into a table
> > vs one thread? Does this make a difference?
> I can, but it won't really show why the current behaviour is a problem. If
> both threads access the database, then it doesn't really matter if they do
> this concurrently, or one after the other. The throughput will even slightly
> decrease, because of more context-switching overhead and the like if two
> backends work at the same time.
Thanks, now I get the point :-)

> What is much more interesting is one thread doing some serious selecting,
> and the other doing some non-database work. Then you will see that with the
> fully synchronous calls, the non-database thread stands still while the
> database thread waits for an answer.
>
> In the pseudo-asynchronous case, however, the non-database thread continues
> to operate normally.
>
> This is why I put this counter-thread into my benchmark. To show that in the
> pseudo-asynchronous case the non-database thread (my counter thread) manages
> to do more work, without hurting the performance of the database thread much
> (30 ms for 1000 inserts makes 30us/insert)

I think 30us is acceptable (at least, it is for me).

There are now two possible ways to go:

1) Replace query/exec with async_query/async_exec.

2) Introduce an option "async" => true/false that lets you choose which variant to use.

I'm not sure whether 2) is really worth it.

Regards,
Michael
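Option 1) presumably amounts to the "alias" magic mentioned earlier in the thread. A sketch on a stand-in class, since the real Pg connection needs a live server: `FakeConn` and the `sync_exec` alias are hypothetical and only mimic the two method names discussed here.

```ruby
# Hypothetical stand-in for the Pg connection class: both methods just
# report which code path ran.
class FakeConn
  def exec(sql)
    [:sync, sql]
  end

  def async_exec(sql)
    [:async, sql]
  end
end

# Option 1: route every exec call through the pseudo-asynchronous variant.
class FakeConn
  alias_method :sync_exec, :exec  # keep the original path reachable
  alias_method :exec, :async_exec # exec now runs the async_exec body
end

conn = FakeConn.new
```

The catch, and the reason option 2) comes up at all, is that the alias is global: every caller gets the pseudo-asynchronous behaviour, with no per-connection choice.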
From: Florian G. P. <fg...@ph...> - 2003-05-07 09:05:03
On Wed, May 07, 2003 at 10:22:16AM +0200, Michael Neumann wrote:
> > This is why I put this counter-thread into my benchmark. To show that in the
> > pseudo-asynchronous case the non-database thread (my counter thread) manages
> > to do more work, without hurting the performance of the database thread much
> > (30 ms for 1000 inserts makes 30us/insert)
>
> I think 30us is acceptable (at least, it is for me).
>
> There are now two possible ways to go:
>
> 1) Replace query/exec with async_query/async_exec.
>
> 2) Introduce an option "async" => true/false that lets you choose
> which variant to use.

I believe other database backends might have the same problem, and for some of them the fix might hurt performance a lot. That's why, after thinking about this a bit, I believe this should be made an option, but the default should be "pseudo-asynchronous" (because in the average case it's the more sensible behaviour).

Maybe some generic name like "assume_fastquery" would be best, since it would depend on the backend whether this information is used, and in which way.

greetings,
Florian Pflug
From: Michael N. <mne...@nt...> - 2003-05-09 20:19:44
On Wed, May 07, 2003 at 11:03:46AM +0200, Florian G. Pflug wrote:
> On Wed, May 07, 2003 at 10:22:16AM +0200, Michael Neumann wrote:
> > > This is why I put this counter-thread into my benchmark. To show that in the
> > > pseudo-asynchronous case the non-database thread (my counter thread) manages
> > > to do more work, without hurting the performance of the database thread much
> > > (30 ms for 1000 inserts makes 30us/insert)
> >
> > I think 30us is acceptable (at least, it is for me).
> >
> > There are now two possible ways to go:
> >
> > 1) Replace query/exec with async_query/async_exec.
> >
> > 2) Introduce an option "async" => true/false that lets you choose
> > which variant to use.
>
> I believe that other database backends might have the same problem - and for
> some the fix might hurt performance a lot. That's why after thinking about
> this a bit, I believe this should be made an option, but the default should
> be "pseudo-asynchronous" (because in the average case it's the more sensible
> behaviour)

There's currently no other database module that has such pseudo-async methods (or I am simply not aware of it).

I've come to the conclusion that the best option is probably to introduce a "pg_async_exec" flag and then, depending on this flag, execute either async_exec or exec in the Statement#execute method. Note that all other methods still call exec. As those calls usually issue COMMIT, ROLLBACK or BEGIN statements, I don't think using async_exec there makes a big difference. But of course I may be wrong. Or should I use the async method everywhere when the flag is set?

Also note that pg_async_exec will be true by default.

> Maybe some generic name like "assume_fastquery" would be best - since it
> would depend on the backend whether this information is used, and in which way.

As long as there are no other database modules with such methods, I am against this.

Regards,
Michael
From: Florian G. P. <fg...@ph...> - 2003-05-10 02:12:33
On Fri, May 09, 2003 at 10:19:30PM +0200, Michael Neumann wrote:
> There's currently no other database module that has such pseudo-async
> methods (or I am simply not aware of it).

I just checked: the mysql module seems to have no support. The "old" (Oracle 7) compatible module doesn't seem to have support either (although I'm not sure with Oracle... the OCI interface is more complex than the mysql or postgres interface, so I might have misread the code).

But there is an OCI8 interface for Ruby (http://www.jiubao.org/ruby-oci8/). It also includes a DBI driver (DBD::OCI8), which implements the flag "NonBlocking" to let the user indicate which behaviour he prefers.

-----------------------------------------------------------------
class Driver < DBI::BaseDriver
  ...
  ...
  def connect( dbname, user, auth, attr )
    handle = ::OCI8.new(user, auth, dbname, attr['Privilege'])
    handle.non_blocking = true if attr['NonBlocking']
    return Database.new(handle, attr)
  rescue OCIException => err
    raise raise_error(err)
  end
  ...
  ...
-----------------------------------------------------------------

> I've come to the conclusion that the best probably is to introduce a
> "pg_async_exec" flag and then depending on this flag execute either
> async_exec or exec in the Statement#execute method.

See above. Since there is an Oracle DBD and it has this kind of flag, I think it would be best to give ours the same name ("NonBlocking"). Come to think of it, I also believe that "NonBlocking" better describes what it actually does, since from a user's point of view exec() and query() calls are still synchronous; they just don't block other threads.

> Note that all other
> methods still call exec. As those calls usually issue COMMIT, ROLLBACK
> or BEGIN statements, I don't think there's a big difference when using
> async_exec. But of course I may be wrong. Or should I use everywhere
> the async method when the flag is set?
IMHO you need to use it for every query that may potentially take a long time. Remember, when a query is issued using the "synchronous" call (exec instead of async_exec), _all_ _other_ _threads_ are blocked until the query completes.

"BEGIN" (and possibly "ROLLBACK") are usually quite fast, I guess. But "COMMIT" could take quite some time if you did a lot of inserts and updates in the transaction (or so I think).

greetings,
Florian Pflug
From: Michael N. <mne...@nt...> - 2003-05-10 11:05:56
On Sat, May 10, 2003 at 04:12:24AM +0200, Florian G. Pflug wrote:
> On Fri, May 09, 2003 at 10:19:30PM +0200, Michael Neumann wrote:
> > There's currently no other database module that has such pseudo-async
> > methods (or I am simply not aware of it).
>
> I just checked - the mysql module seems to have no support.
> The "old" (oracle 7) compatible module doesn't seem to have support either
> (although I'm not sure with oracle... the oci interface is more complex than
> the mysql or postgres interface, so I might have misread the code)
>
> But there is an oci8 interface for ruby (http://www.jiubao.org/ruby-oci8/).
> This also includes a DBI Driver (DBD::OCI8). This driver implements the flag
> "NonBlocking" to let the user indicate which behaviour he prefers.

Yes, this flag seems to do the same as the async_exec method in Pg. I agree with you that we should reuse this flag. OCI8 disables NonBlocking by default; we should do the same for DBD::Pg.

> > I've come to the conclusion that the best probably is to introduce a
> > "pg_async_exec" flag and then depending on this flag execute either
> > async_exec or exec in the Statement#execute method.
>
> See above. Since there is an oracle DBD, and it has this kind of flag, I
> think it would be best to name it the same ("NonBlocking"). Come to think of
> it I also believe, that "NonBlocking" better describes what it actually
> does, since from a user's point of view exec() and query() calls are still
> synchronous - they just don't influence other threads.

I'd actually prefer "Blocking" as the flag name, because with NonBlocking=false you have to think a bit about whether it's blocking or not :-) (at least my lazy mind does). But luckily one won't use this flag very often in a program.

> > Note that all other
> > methods still call exec. As those calls usually issue COMMIT, ROLLBACK
> > or BEGIN statements, I don't think there's a big difference when using
> > async_exec. But of course I may be wrong. Or should I use everywhere
> > the async method when the flag is set?
>
> IMHO you need to use it for every query that may potentially take a long time.
> Remember, when a query is issued using the "synchronous" call
> (exec instead of async_exec), _all_ _other_ _threads_ are blocked until
> the query completes.

No problem, so I'll use it for every statement.

> "BEGIN" (and possibly "ROLLBACK") are usually quite fast, I guess. But
> "COMMIT" could take quite some time if you did a lot of inserts and updates in
> the transaction (or so I think).

Thanks for your comments.

Regards,
Michael
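The design the thread settles on (a "NonBlocking" attribute, off by default as in DBD::OCI8, applied to every statement including BEGIN/COMMIT/ROLLBACK) can be sketched with a single dispatch point. `SketchDatabase` and `RecordingConn` are hypothetical classes, not the real DBD::Pg code; the recording connection only logs which of exec/async_exec was called.

```ruby
# Hypothetical sketch: a "NonBlocking" attribute (default false) decides
# whether every statement, including COMMIT/ROLLBACK, uses async_exec.
class SketchDatabase
  def initialize(conn, attr = {})
    @conn = conn
    @non_blocking = attr.fetch('NonBlocking', false)
  end

  def do(sql)
    run(sql)
  end

  def commit
    run('COMMIT')
  end

  def rollback
    run('ROLLBACK')
  end

  private

  # Single dispatch point, so no statement can bypass the flag.
  def run(sql)
    @non_blocking ? @conn.async_exec(sql) : @conn.exec(sql)
  end
end

# Records which method was called; stands in for a real connection.
class RecordingConn
  attr_reader :calls

  def initialize
    @calls = []
  end

  def exec(sql)
    @calls << [:exec, sql]
  end

  def async_exec(sql)
    @calls << [:async_exec, sql]
  end
end
```

Routing everything through one private method is what makes "use it for every statement" cheap to guarantee: a slow COMMIT gets the same treatment as a slow SELECT.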