From: Andrzej J. T. <an...@ch...> - 2009-11-29 00:29:31
Jim:

>> I'm trying to optimize some of our query code, and have a function that
>> basically does something like this:
>>
>> exists( collection( ...)[ some predicates ] )
>
> would be useful to have some example predicates .... a lot of times
> exists() can be rewritten to take advantage of 'early exit' ... if
> eXist is doing this rewriting is another question.

OK...here's a test case that I ran in the sandbox:

    declare function local:find-existing-items( $itemList ) as item()*
    {
        collection( "/db/chaeron/data" )//l1/l2[ exists( index-of( $itemList, @extension ) ) and @identifierType = 'XYZ' ]/@extension
    };

    let $items := ( "999123456", "9999912345", "444444444", "444444443", "444444441", "44444444x", "111111111", "222222222", "333333333", "111111119", "" )

    let $logs1 := util:log-app( "info", "com.chaeron.lookup.test", "Started - exists check" )

    let $existing := local:find-existing-items( $items )

    let $logs2 := util:log-app( "info", "com.chaeron.lookup.test", "Done - exists check" )

    return
        <existing> { string-join( $existing, " " ) }</existing>

Here's some more background info:

1) There are about 2200 documents, ranging from 100K to 1.0MB (serialized XML text) in the /db/chaeron/data collection, spread across about 30 subcollections.

2) both the @extension and @identifierType attributes have range indexes on them (I checked!)

3) When I first start up Tomcat, it can take 10-20 seconds for the local:find-existing-items() to complete (that's what the log entries are for...to get timestamps). But after 4-5 executions it gets down to about 1.5 seconds consistently.

4) What is interesting is that changing the number of strings in the $items list has no appreciable effect on the time the function takes to complete. Even with zero entries, it still takes about 1.5 seconds.
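Point 4 suggests the cost is dominated by walking every l2 element rather than by the size of $items. For comparison, here is a variant of the function that may be worth timing. It is an untested sketch: the name local:find-existing-items-gc and the shortened item list are made up for illustration, and whether eXist resolves the general comparison through the range index on @extension is an assumption that only a timing run can confirm. For l2 elements that carry an @extension attribute, "@extension = $itemList" selects the same nodes as exists( index-of( $itemList, @extension ) ).

```xquery
xquery version "1.0";

(: Hypothetical variant of local:find-existing-items(); not run against the
   real data. A general comparison against a sequence is true when the
   attribute value equals ANY item in the sequence. :)
declare function local:find-existing-items-gc( $itemList as xs:string* ) as attribute()*
{
    (: Two separate predicates, each a plain comparison on an indexed attribute,
       instead of a function call wrapped around the attribute. :)
    collection( "/db/chaeron/data" )//l1/l2[ @extension = $itemList ][ @identifierType = 'XYZ' ]/@extension
};

(: Shortened, illustrative item list. :)
let $items := ( "999123456", "444444444", "111111111" )
return
    <existing>{ string-join( local:find-existing-items-gc( $items ), " " ) }</existing>
```

If the timing of this variant does scale with the number of items while the original does not, that would point at the exists( index-of( ... ) ) predicate being evaluated node by node.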
I would have thought that with nothing else running, the active indexes and the like, that this kind of query on only 2200 documents (albeit reasonably complicated ones) would take a lot less than 1.5 seconds.

The problem is when I ramp up to 40K documents and 100 items in the list, my query takes 30 seconds (and this with a 2.5GB JVM, 1GB of cache set in conf.xml and 128M of collection cache) on a pretty decent Core 2 Duo with 7200 rpm drive. It's this 30 second time I'm trying to reduce, since it's done at user login, and thus the user experience is not great.

Any ideas on what might be causing the almost fixed query response time and what I might be able to do about it?

Thanks!

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com