From: Gustaf N. <ne...@wu...> - 2014-02-03 14:41:12
On 03.02.14 02:39, Stephen wrote:

> How about introducing a _db_foreach helper to replace the
> stereotypical while loop outlined above -- something with an interface
> more amenable to talking with both dbi and nsdb. Looking briefly at
> db_foreach and db_multirow, it looks like some kind of helper could
> clean up turning the sets into local variables without touching the
> bulk of the code around it. Then sets could be avoided entirely when
> using dbi.

Converting ns_sets naively into local variables will clobber other
locals; using some strange array names etc. would still require
modifying all functions accessing the ns_sets, or passing the ns_sets
around, etc.

> By legacy I mean before dicts, sets were a useful alternative to the
> weird by-name arrays, but not any more. And it looks like you're
> focused on some kind of xo orm, so the bits under the hood which
> connect db_* with dbi are to keep legacy code working as more stuff is
> moved to xo. With that in mind, it seems a shame to re-surface sets in
> the dbi api, to be used in code no one's going to see, especially when
> there are alternatives.

I am still exploring what's feasible for OpenACS. There are both an ORM
mapping and a db-abstraction layer in the xotcl-core packages, which
work together. Since, code-wise, most parts of OpenACS are not object
oriented, this db-abstraction layer is a good candidate for reuse in
wider parts of OpenACS. This code is handle-free and simpler (it does
not deal with the huge number of seldom-used features of ns_db*).

>>> 2) New generic xo functions
>>>
>>> I'm not following how the implementation as it is now, which returns a
>>> list of lists with duplicated column names, is faster than just
>>> returning a list of dicts. You have to feed the lists to dict create
>>> before you use it, which is extra work and always going to be slower.

>> i am not following you here. The print string of a tcl dict is a tcl
>> list. there is no need to use "dict create".
>> try:
>>
>>    dict get {a 1 b 2} b
>>
>> what i said in the comment was that adding elements to a list is faster
>> than adding entries to the dict in the loop iterating over the columns.
>> Furthermore, the list without the internal representation of the dict
>> (essentially the hash table) requires less memory.

> 'dict get' is implemented by DictGetCommand:

Sure, I know what "dict get" does. I (mis?)read from your argument that
an explicit "dict create" is required. This lazy implementation is quite
useful (see below).

>> you say that
>>
>>    set results [dbi_rows -columns cols -- *SQL*]
>>    set dicts [dbi_dicts $results $cols]
>>
>> is faster than
>>
>>    set dicts [dbi_rows -result dicts -- *SQL*]

> No, this:
>
>    set rows [dbi_rows -columns cols -- $sql]
>    foreach d [dbi_dicts $cols $rows] {
>        puts [dict get $d k]
>    }
>
> ...is faster and uses less memory than this:
>
>    foreach row [dbi_rows -result dicts -- $sql] {
>        puts [dict get $row k]
>    }

No, it is not. The data duplication and the double iteration are still
needed in your suggested approach. Due to the lazy dict conversion there
might be many cases where not all dicts are ever created. But even when
all lists are converted to dicts, your suggestion is slower. Look at the
following tests:

p1: dbi_rows returns a flat list; convert the flat list to a list of
    lists; convert lazily to dicts
p2: dbi_rows returns a flat list; convert the flat list to a list of
    dicts; use the dicts
p3: dbi_rows returns a list of lists; convert lazily to dicts

The times for retrieving 1000 rows with 14 columns via SQL and
processing them as indicated are (microseconds, average of 100 runs):

p1 35692
p2 45767
p3 29407

The experiment indicates that p2 = p3 * 1.56.
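The lazy-conversion point can be illustrated in plain Tcl (a minimal
sketch, no nsdbi involved; the column and value names are made up):

```tcl
# The string representation of a dict is an ordinary Tcl list of
# key/value pairs, so a row built with plain lappend can be used with
# [dict get] directly; the dict's internal hash table is only built
# on first access ("lazy" conversion).
set row {}
foreach col {id name email} val {1 gustaf gn@test} {
    lappend row $col $val       ;# cheap list append, no dict yet
}
puts [dict get $row name]       ;# -> gustaf; shimmers to a dict here
```

Rows that are never accessed via [dict get] stay plain lists and never
pay the hash-table construction cost.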
The built-in solution (p3) is much faster; appending to a list and
converting lazily to a dict (p1) is faster than straight dict creation
(p2).

If we look at ns_sets, we see that ns_sets perform better than some
people might expect:

p4: dbi_rows returns a flat list; convert the flat list to a list of
    ns_sets; access all columns in the loop
p5: dbi_rows returns a list of lists; convert rows to ns_sets; access
    all columns in the loop
p6: dbi_rows returns a list of ns_sets; access all columns in the loop

p4 41607
p5 25508
p6 16095

Interestingly, p6 is the fastest of all tests, and p4 is the slowest of
the ns_set tests (p4 = p6 * 2.59). Also unexpected to me: p5 is faster
than p3, since the list of lists is converted to an ns_set, which is
faster than the dict creation. p6 has the advantage that for 1000 tuples
and 14 columns the result list contains just 1000 (small) entries,
whereas the flat list contains 14000 entries.

Below are some more tests with result-set sizes of 100, 1,000 and
10,000. All results are normalized to the fastest result (always p6).
These results could be made more similar by coding the flat-to-struct
statements in C, but I see no indication that eager dict generation will
beat the list operations used now. My internal drive to drop the fastest
solution (the one with ns_sets) is not very big. ns_sets have the
disadvantage of not being cacheable, but I am not a big fan of caching
raw SQL row sets anyway.

>> what is the problem with a dbi_foreach in the dbi infrastructure of
>> the form:
>>
>>    dbi_foreach ?-db name? ?-autonull? ?-timeout t? ?-bind bindSource?
>>        ?-max nrows? ?--? query ?body?

> Because client-side nested queries are a bad idea in a web server
> environment and it was an explicit design goal of nsdbi to prevent
> them.
>
> ....

None of these arguments looks like a show-stopper to me. I am not a fan
of db_foreach either, and yes, clobbering etc. is bad. In the OpenACS
core + packages I count 1490 calls to db_foreach and 1697 calls to
db_multirow (where the latter concerns me less).
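As a side note, the row-to-ns_set conversion measured in p5 above can be
sketched roughly as follows (assuming a NaviServer environment where the
ns_set command is available; the proc and variable names are made up):

```tcl
# Convert one row (a list of values) plus the column names into an
# ns_set, then access columns by name, as in the p5 test above.
proc row2set {cols row} {
    set s [ns_set create row]
    foreach c $cols v $row {
        ns_set put $s $c $v
    }
    return $s
}
set s [row2set {id name} {1 gustaf}]
puts [ns_set get $s name]    ;# -> gustaf
```

In p6 this per-row conversion disappears entirely, because the driver
already hands back ns_sets, which is presumably why p6 wins.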
The 1490 db_foreach calls have these properties, and it is unclear how
many of them require break etc. It is too easy to say "get your SQL
queries right, then you need no calls to the Tcl API to restrict the
answer set": for every call into the Tcl API in the body of a db_foreach
loop that partitions the results, one needs a SQL implementation of that
same functionality as well. OpenACS in our production environment has
more than 1000 tables and views, many tables have 10 million+ tuples,
and our environment is not special in this regard. Relying there on a
maxrows limit and returning the full sets does not seem practical. ...

> Another downside of foreach is that it clobbers local variables, which
> was the original motivation for adding dicts. (dbi_rows with a
> template doesn't have this problem).

dbi_rows + templating is pretty limited, since it supports just
hard-coded output, not computed output (widgets with a Tcl API, tdom,
etc.). It is not feasible to code this in SQL.

>> PS: using the OpenACS-based xowiki with dbi (and the functions in
>> mercurial head) led to a 20% performance improvement with very little
>> change.

> Neat.
>
> I wonder if you'll notice improvements under load, in real life
> conditions.

That would require getting more of the OpenACS ns_db interface working
with dbi_*.

> The original micro benchmarks were impressive, but there's
> also the effects of handle management.

Sure, these tests also benefit over-proportionally from prepare (running
every query 100 times in sequence), but running every query just once
would be even less realistic. The overall speedup is hard to compare,
especially between OpenACS sites and non-OpenACS sites. If one has a
smaller Tcl layer than OpenACS, the speed improvement will be much
higher than the measured 20%, but that is already quite notable.
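For completeness, the handle-free iteration style that dbi_rows makes
possible (no database handle is held while the loop body runs, unlike
db_foreach) looks roughly like this; the query and log message are
made-up examples:

```tcl
# Fetch the complete (flat) result first, then iterate with plain
# foreach; -columns stores the column names in the variable "cols",
# so foreach assigns one row's worth of values per iteration.
set rows [dbi_rows -columns cols -- {
    select user_id, name from users
}]
foreach $cols $rows {
    # $user_id and $name are set here; no db handle is in use, so
    # arbitrary Tcl API calls (widgets, tdom, ...) are unproblematic
    ns_log notice "user $user_id: $name"
}
```

The trade-off discussed above remains: the full result set must fit in
memory, which is exactly why a maxrows-style limit matters for large
tables.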
> You have a choice between fewer
> back-ends, due to shorter lifetimes, or more with the 1-1 conn-thread
> to db backend mode, if you discover that on average every conn thread
> uses a db backend.

In our current production environment (with ns_db), requests to the
application server require on average about 9 SQL queries (but up to 700
uncached SQL queries for the personalized start page). We are still
running behind an nginx there for static files. These SQL queries are
often served from multiple handles (OpenACS has three pools for primary,
secondary and tertiary requests, which are used for nesting SQL queries,
e.g. in db_foreach). Therefore every active connection thread on the
backend typically uses one or more db backends. The db pools have up to
100 SQL handles.

all the best
-gn