From: Gustaf N. <ne...@wu...> - 2014-02-03 14:41:12
On 03.02.14 02:39, Stephen wrote:

> How about introducing a _db_foreach helper to replace the
> stereotypical while loop outlined above -- something with an interface
> more amenable to talking with both dbi and nsdb. Looking briefly at
> db_foreach and db_multirow, it looks like some kind of helper could
> clean up turning the sets into local variables without touching the
> bulk of the code around it. Then sets could be avoided entirely when
> using dbi.

Converting ns_sets naively into local variables will clobber other
locals; using some strange array names etc. would still require
modifying all functions accessing the ns_sets, or passing the ns_sets
around, etc.

> By legacy I mean before dicts, sets were a useful alternative to the
> weird by-name arrays, but not any more. And it looks like you're
> focused on some kind of xo orm, so the bits under the hood which
> connect db_* with dbi are to keep legacy code working as more stuff is
> moved to xo. With that in mind, it seems a shame to re-surface sets in
> the dbi api, to be used in code no one's going to see, especially when
> there are alternatives.

I am still exploring what's feasible for OpenACS. There are both an ORM
mapping and a db-abstraction layer in the xotcl-core packages, which
work together. Since, code-wise, most parts of OpenACS are not object
oriented, this db-abstraction layer is a good candidate for reuse in
wider parts of OpenACS. This code is handle-free and simpler (it does
not deal with the huge number of seldom-used features of ns_db*).

>>> 2) New generic xo functions
>>>
>>> I'm not following how the implementation as it is now, which returns a
>>> list of lists with duplicated column names, is faster than just
>>> returning a list of dicts. You have to feed the lists to dict create
>>> before you use it, which is extra work and always going to be slower.

>> i am not following you here. The print string of a tcl dict is a tcl
>> list. there is no need to use "dict create".
>> try:
>>
>>    dict get {a 1 b 2} b
>>
>> what i said in the comment was that adding elements to a list is faster
>> than adding entries to the dict in the loop iterating over the columns.
>> Furthermore, the list without the internal representation of the dict
>> (essentially the hash table) requires less memory.

> 'dict get' is implemented by DictGetCommand:

Sure, I know what "dict get" does. I (mis?)read from your argument that
an explicit "dict create" is required. This lazy implementation is quite
useful (see below).

>> you say that
>>
>>    set results [dbi_rows -columns cols -- *SQL*]
>>    set dicts [dbi_dicts $results $cols]
>>
>> is faster than
>>
>>    set dicts [dbi_rows -result dicts -- *SQL*]

> No, this:
>
>    set rows [dbi_rows -columns cols -- $sql]
>    foreach d [dbi_dicts $cols $rows] {
>        puts [dict get $d k]
>    }
>
> ...is faster and uses less memory than this:
>
>    foreach row [dbi_rows -result dicts -- $sql] {
>        puts [dict get $row k]
>    }

No, it is not. The data duplication and the double iteration are still
needed in your suggested approach. Due to the lazy dict conversion there
might be many cases where not all dicts are ever created. But even when
all lists are converted to dicts, your suggestion is slower. Look at the
following tests:

p1: dbi_rows returns a flat list; convert the flat list to a list of
    lists; convert lazily to dicts
p2: dbi_rows returns a flat list; convert the flat list to a list of
    dicts; use the dicts
p3: dbi_rows returns a list of lists; convert lazily to dicts

The times for retrieving 1000 rows with 14 columns via SQL and
processing them as indicated are (microseconds, average of 100 runs):

p1 35692
p2 45767
p3 29407

The experiment indicates that p2 = p3 * 1.56.
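The lazy-conversion point can be illustrated in plain Tcl (a minimal
sketch, no nsdbi involved; the column and value names are made up):

```tcl
# The string representation of a dict is an ordinary Tcl list of
# key/value pairs, so a row built with plain lappend can be used with
# [dict get] directly; the dict's internal hash table is only built
# on first access ("lazy" conversion).
set row {}
foreach col {id name email} val {1 gustaf gn@test} {
    lappend row $col $val       ;# cheap list append, no dict yet
}
puts [dict get $row name]       ;# -> gustaf; shimmers to a dict here
```

Rows that are never accessed via [dict get] stay plain lists and never
pay the hash-table construction cost.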
The built-in solution (p3) is much faster; appending to a list and
converting lazily to a dict (p1) is faster than straight dict creation
(p2).

If we look at ns_sets, we see that ns_sets perform better than some
people might expect:

p4: dbi_rows returns a flat list; convert the flat list to a list of
    ns_sets; access all columns in the loop
p5: dbi_rows returns a list of lists; convert rows to ns_sets; access
    all columns in the loop
p6: dbi_rows returns a list of ns_sets; access all columns in the loop

p4 41607
p5 25508
p6 16095

Interestingly, p6 is the fastest of all tests, and p4 is the slowest of
the ns_set tests (p4 = p6 * 2.59). Also unexpected to me: p5 is faster
than p3, since the list of lists is converted to an ns_set, which is
faster than the dict creation. p6 has the advantage that for 1000 tuples
and 14 columns the result list contains just 1000 (small) entries,
whereas the flat list contains 14000 entries.

Below are some more tests with result-set sizes of 100, 1,000 and
10,000. All results are normalized to the fastest result (always p6).
These results could be made more similar by coding the flat-to-struct
statements in C, but I see no indication that eager dict generation will
beat the list operations used now. My internal drive to drop the fastest
solution (the one with ns_sets) is not very big. ns_sets have the
disadvantage of not being cacheable, but I am not a big fan of caching
raw SQL row sets anyway.

>> what is the problem with a dbi_foreach in the dbi infrastructure of
>> the form:
>>
>>    dbi_foreach ?-db name? ?-autonull? ?-timeout t? ?-bind bindSource?
>>        ?-max nrows? ?--? query ?body?

> Because client-side nested queries are a bad idea in a web server
> environment and it was an explicit design goal of nsdbi to prevent
> them.
>
> ....

None of these arguments looks like a show-stopper to me. I am not a fan
of db_foreach either, and yes, clobbering etc. is bad. In the OpenACS
core + packages I count 1490 calls to db_foreach and 1697 calls to
db_multirow (where the latter concerns me less).
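As a side note, the row-to-ns_set conversion measured in p5 above can be
sketched roughly as follows (assuming a NaviServer environment where the
ns_set command is available; the proc and variable names are made up):

```tcl
# Convert one row (a list of values) plus the column names into an
# ns_set, then access columns by name, as in the p5 test above.
proc row2set {cols row} {
    set s [ns_set create row]
    foreach c $cols v $row {
        ns_set put $s $c $v
    }
    return $s
}
set s [row2set {id name} {1 gustaf}]
puts [ns_set get $s name]    ;# -> gustaf
```

In p6 this per-row conversion disappears entirely, because the driver
already hands back ns_sets, which is presumably why p6 wins.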
The 1490 db_foreach calls have these properties, and it is unclear how
many of them require break etc. It is too easy to say "get your SQL
queries right, then you need no calls to the Tcl API to restrict the
answer set": for every call into the Tcl API in the body of a db_foreach
loop that partitions the results, one needs a SQL implementation of that
same functionality as well. OpenACS in our production environment has
more than 1000 tables and views, many tables have 10 million+ tuples,
and our environment is not special in this regard. Relying there on a
maxrows limit and returning the full sets does not seem practical. ...

> Another downside of foreach is that it clobbers local variables, which
> was the original motivation for adding dicts. (dbi_rows with a
> template doesn't have this problem).

dbi_rows + templating is pretty limited, since it supports just
hard-coded output, not computed output (widgets with a Tcl API, tdom,
etc.). It is not feasible to code this in SQL.

>> PS: using the OpenACS-based xowiki with dbi (and the functions in
>> mercurial head) led to a 20% performance improvement with very little
>> change.

> Neat.
>
> I wonder if you'll notice improvements under load, in real life
> conditions.

That would require getting more of the OpenACS ns_db interface working
with dbi_*.

> The original micro benchmarks were impressive, but there's
> also the effects of handle management.

Sure, these tests also benefit over-proportionally from prepare (running
every query 100 times in sequence), but running every query just once
would be even less realistic. The overall speedup is hard to compare,
especially between OpenACS sites and non-OpenACS sites. If one has a
smaller Tcl layer than OpenACS, the speed improvement will be much
higher than the measured 20%, but that is already quite notable.
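For completeness, the handle-free iteration style that dbi_rows makes
possible (no database handle is held while the loop body runs, unlike
db_foreach) looks roughly like this; the query and log message are
made-up examples:

```tcl
# Fetch the complete (flat) result first, then iterate with plain
# foreach; -columns stores the column names in the variable "cols",
# so foreach assigns one row's worth of values per iteration.
set rows [dbi_rows -columns cols -- {
    select user_id, name from users
}]
foreach $cols $rows {
    # $user_id and $name are set here; no db handle is in use, so
    # arbitrary Tcl API calls (widgets, tdom, ...) are unproblematic
    ns_log notice "user $user_id: $name"
}
```

The trade-off discussed above remains: the full result set must fit in
memory, which is exactly why a maxrows-style limit matters for large
tables.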
> You have a choice between fewer
> back-ends, due to shorter lifetimes, or more with the 1-1 conn-thread
> to db backend mode, if you discover that on average every conn thread
> uses a db backend.

In our current production environment (with ns_db), requests to the
application server require on average about 9 SQL queries (but up to 700
uncached SQL queries for the personalized start page). We are still
running behind an nginx there for static files. These SQL queries are
often served from multiple handles (OpenACS has three pools for primary,
secondary and tertiary requests, which are used for nesting SQL queries,
e.g. in db_foreach). Therefore every active connection thread on the
backend typically uses one or more db backends. The db pools have up to
100 SQL handles.

all the best
-gn