From: Gary L. <gar...@en...> - 2008-07-15 13:03:56
Wolfgang,

This information was very helpful and matches exactly with the results I am
seeing in testing. The real performance eater was count(distinct-values()),
which on larger data sets would run until a Java heap error.

The application has a GUI where grouping can be selected with optional
summary counts. The counts can be useful sometimes, but now the penalty is
understood. Bringing back the grouped detail data, even on large datasets,
is incredibly fast.

Thanks again,
gary

> -----Original Message-----
> From: exi...@li... [mailto:exist-open-bo...@li...] On Behalf Of Wolfgang Meier
> Sent: Monday, July 14, 2008 5:21 PM
> To: Gary Larsen
> Cc: exi...@li...
> Subject: Re: [Exist-open] query tuning
>
> > The total cache allocated does not seem to be fully used. I'm noticing that
> > the dom.dbx data buffers never go above 256:
>
> Unlike the other caches, the one for dom.dbx is indeed fixed and will
> never grow. dom.dbx contains the actual document nodes and access to
> it is mostly sequential. Over time, page access to dom.dbx will be
> more or less random, so keeping too many pages in cache doesn't
> improve the performance.
>
> For the other .dbx files, the access patterns are very different. The
> caching algorithms will thus consider page access frequencies and it
> makes sense to grow/shrink those caches on demand.
>
> > Are the dom.dbx data buffers adjustable?
>
> No, as explained above.
>
> > I'm also seeing this in the log after a query - always with the same
> > numbers:
> >
> >>>>2008-07-14 15:00:05,186 [http-8080-Processor23] DEBUG (LRDCache.java
> >>>> [cleanup]:153) - totalReferences = 640001; maxReferences = 640000
> >
> > Does this mean anything?
>
> No, that's just for debugging and could probably be removed.
>
> > Are there any rules on why an expression cannot be optimized even if an
> > index exists?
>
> eXist can optimize an expression by
>
> 1) using an index to speed up the unmodified expression
>
> 2) modifying the expression in such a way that index-based selections
> are processed ahead of the rest of the expression, thus reducing the
> size of the node sets to be handled as early as possible
>
> Both things are tightly connected. Rewriting the expression doesn't
> make sense unless there's an index to benefit from. Conversely,
> an existing index can be used without rewriting the query, but then
> the performance win is much smaller.
>
> Automatic query rewriting (2) is at least implemented for the
> following common types of comparisons:
>
> //a/b[c = 'd']
>
> //a/b[c = ('d', 'e')]
>
> //a/b[. = 'd']
>
> //a/*[c = 'd']
>
> //a/b[c = 'd'][e = 'f']
>
> let $a := //a/b return $a[c = 'd'] (:trunk only:)
>
> The same patterns apply to expressions involving <, >, <=, >=, !=,
> matches(), contains(), starts-with(), ends-with(), all ngram
> functions, and all full text functions and operators.
>
> >>>>2008-07-14 15:09:55,259 [http-8080-Processor25] TRACE
> >>>> (GeneralComparison.java [quickNodeSetCompare]:534) - found an index of type:
> >>>> xs:string
> >
> >>>>2008-07-14 15:09:55,275 [http-8080-Processor25] TRACE
> >>>> (GeneralComparison.java [quickNodeSetCompare]:578) - Checking if range index
> >>>> can be used for key: Series7 / ccoughlin / My Folders
> >
> >>>>2008-07-14 15:09:55,275 [http-8080-Processor25] TRACE
> >>>> (GeneralComparison.java [quickNodeSetCompare]:583) - Using range index for
> >>>> key: Series7 / ccoughlin / My Folders
> >
> >>>>2008-07-14 15:09:55,275 [http-8080-Processor25] TRACE (Optimize.java
> >>>> [eval]:162) - exist:optimize: Cannot optimize expression.
>
> In this case, the query could not be rewritten and is thus processed
> unmodified. Existing range indexes are used nevertheless, as shown in
> the log messages printed from the GeneralComparison class. To find out
> why the query rewriting optimizer failed to handle a certain part of
> the query, you have to isolate the corresponding predicate expression
> and check whether it matches one of the patterns above, whether it could
> be changed to match one, and whether there is indeed an index defined
> on the qname.
>
> > For example it seems that count() expressions are not optimized.
>
> No. We do not have any indexes to speed up count, so it can't be
> optimized.
>
> Wolfgang
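
The two points discussed in this thread can be put side by side in a short XQuery sketch. Everything concrete in it is an assumption made for illustration: the element names (reports, report, folder, name) and the inline sample data are hypothetical and not taken from Gary's application. An in-memory fragment like this has no range index at all, so the sketch only shows the shape of the expressions. On a stored collection with a range index defined on the compared element, the first expression has the //a/b[c = 'd'] shape listed above as rewritable, while the count(distinct-values()) summary has no index to draw on.

```xquery
xquery version "1.0";

(: Hypothetical sample data; element names are invented for illustration. :)
let $doc :=
    <reports>
        <report><folder>My Folders</folder><name>Q1</name></report>
        <report><folder>My Folders</folder><name>Q2</name></report>
        <report><folder>Shared</folder><name>Q3</name></report>
    </reports>

(: Same shape as //a/b[c = 'd']: a comparison directly inside a step predicate.
   Against stored documents with a range index on folder, this is the kind of
   predicate the thread describes as a candidate for rewriting. :)
let $detail := $doc//report[folder = 'My Folders']

(: Summary count: every folder value is collected and de-duplicated before
   counting. Per the thread, no index exists to speed this up. :)
let $groupCount := count(distinct-values($doc//report/folder))

return
    <result>
        <detail>{ $detail/name }</detail>
        <groups>{ $groupCount }</groups>
    </result>
```

If the summary count is optional in the GUI, as Gary describes, keeping the second expression out of the default query and running it only on request would be one way to avoid paying its cost on large data sets.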