Thread: [Exist-development] query collection containing one million documents


From: Patrick S. <pat...@zi...> - 2010-07-09 13:18:04

  Hi,

I am using eXist to index one million XML documents (about 2.3 GiB in total) 
within one collection.

Unfortunately, the query execution time is rather poor. Even a very 
simple XQuery takes around 4 seconds to execute:

   declare namespace xq="http://www.xml-cml.org/schema";
   for $i in collection("tuscreen")//xq:molecule[@id="298038"]
   return
       <smile>{$i}</smile>

Logging says:

09 Jul 2010 15:11:55,783 [P1-9] DEBUG (XQuery.java [compile]:155) - 
Query diagnostics:
for <2>
     $i in collection("tuscreen")
     (# exist:optimize #)
     {
         descendant-or-self::xq:molecule[attribute::id = "298038"]
     }

return <3>
     element {"smile"} {
         {
             $i
         }
     }
09 Jul 2010 15:11:55,784 [P1-9] DEBUG (XQuery.java [compile]:161) - 
Compilation took 4 ms
09 Jul 2010 15:11:58,119 [P1-9] TRACE (Optimize.java 
[visitGeneralComparison]:219) - exist:optimize: found optimizable: 
org.exist.xquery.GeneralComparison
09 Jul 2010 15:11:58,119 [P1-9] TRACE (Optimize.java [before]:240) - 
exist:optimize: context step: 
descendant-or-self::xq:molecule[attribute::id = "298038"]
09 Jul 2010 15:11:58,119 [P1-9] TRACE (Optimize.java [before]:241) - 
exist:optimize: context var: null
09 Jul 2010 15:11:59,496 [P1-9] TRACE (GeneralComparison.java 
[preSelect]:266) - Using QName index on type xs:string
09 Jul 2010 15:11:59,496 [P1-9] TRACE (GeneralComparison.java 
[preSelect]:298) - Using QName range index for key: 298038
09 Jul 2010 15:11:59,666 [P1-9] TRACE (Optimize.java [eval]:120) - 
exist:optimize: pre-selection: 1
...
9 Jul 2010 15:11:59,837 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 4.050 ms


Is there a way to speed up queries?

Thank you for any help

Patrick

Re: [Exist-development] query collection containing one million documents

From: Dmitriy S. <sha...@gm...> - 2010-07-09 14:24:03

On Fri, 2010-07-09 at 15:17 +0200, Patrick Schäfer wrote:
> 09 Jul 2010 15:11:55,784 [P1-9] DEBUG (XQuery.java [compile]:161) - 
> Compilation took 4 ms
> 09 Jul 2010 15:11:58,119 [P1-9] TRACE (Optimize.java 
> [visitGeneralComparison]:219) - exist:optimize: found optimizable: 
> org.exist.xquery.GeneralComparison
> 09 Jul 2010 15:11:58,119 [P1-9] TRACE (Optimize.java [before]:240) - 
> exist:optimize: context step: 
> descendant-or-self::xq:molecule[attribute::id = "298038"]
> 09 Jul 2010 15:11:58,119 [P1-9] TRACE (Optimize.java [before]:241) - 
> exist:optimize: context var: null
> 09 Jul 2010 15:11:59,496 [P1-9] TRACE (GeneralComparison.java 
> [preSelect]:266) - Using QName index on type xs:string
> 09 Jul 2010 15:11:59,496 [P1-9] TRACE (GeneralComparison.java 
> [preSelect]:298) - Using QName range index for key: 298038
> 09 Jul 2010 15:11:59,666 [P1-9] TRACE (Optimize.java [eval]:120) - 
> exist:optimize: pre-selection: 1
> ...
> 9 Jul 2010 15:11:59,837 [P1-9] DEBUG (XQuery.java [execute]:231) - 
> Execution took 4.050 ms

If I calculate the execution time from the log, I get
15:11:59,666 - 15:11:58,119 = 1547 ms
for collection("tuscreen")//xq:molecule[@id="298038"]. Yes, it's slow,
but the rest should be much faster ... strange ...
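
The subtraction above can be checked mechanically. The following is a minimal Python sketch, not part of eXist: the helper names `parse_ts` and `elapsed_ms` are made up for illustration, and it assumes the log timestamps keep the "dd MMM yyyy HH:mm:ss,SSS" layout seen in the log, with English month names.

```python
from datetime import datetime, timedelta


def parse_ts(stamp: str) -> datetime:
    """Parse a timestamp such as '09 Jul 2010 15:11:58,119'.

    Assumption: the part after the comma is milliseconds, and month
    names are English (strptime's %b depends on the current locale).
    """
    return datetime.strptime(stamp, "%d %b %Y %H:%M:%S,%f")


def elapsed_ms(start: str, end: str) -> int:
    """Whole milliseconds between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)) // timedelta(milliseconds=1)


# First optimizer trace and the "pre-selection: 1" line from the quoted log.
optimizer_start = "09 Jul 2010 15:11:58,119"
preselect_done = "09 Jul 2010 15:11:59,666"

print(elapsed_ms(optimizer_start, preselect_done), "ms")
```

The same helper can be pointed at any other pair of lines in the log, for example the "Compilation took" line and the first optimizer trace.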

What version do you use?

-- 
Cheers,

Dmitriy Shabanov

Re: [Exist-development] query collection containing one million documents

From: Patrick S. <pat...@zi...> - 2010-07-09 14:33:47

  Hi,

Thanks for the response.

> If I calculate the execution time from the log, I get
> 15:11:59,666 - 15:11:58,119 = 1547 ms
> for collection("tuscreen")//xq:molecule[@id="298038"]. Yes, it's slow,
> but the rest should be much faster ... strange ...
>
> What version do you use?
I am using the jar file eXist-setup-1.4.0-rev10440.jar ...

09 Jul 2010 15:22:53,534 [main] INFO  (JettyStart.java [run]:101) - 
[eXist Version : 1.4.0]
09 Jul 2010 15:22:53,534 [main] INFO  (JettyStart.java [run]:103) - 
[eXist Build : 20091111]
09 Jul 2010 15:22:53,535 [main] INFO  (JettyStart.java [run]:105) - 
[eXist Home : /usr/local/exist]
09 Jul 2010 15:22:53,535 [main] INFO  (JettyStart.java [run]:107) - [SVN 
Revision : 10440]
09 Jul 2010 15:22:53,535 [main] INFO  (JettyStart.java [run]:115) - 
[Operating System : Linux 2.6.31.12-0.2-default amd64]


Patrick

Re: [Exist-development] query collection containing one million documents

From: Thomas W. <tho...@gm...> - 2010-07-09 14:32:00

Patrick,

What type of index did you define?
It may help to define a numeric type of range index and cast the value to a
number when you compare it.

Thomas



Re: [Exist-development] query collection containing one million documents

From: Joe W. <jo...@gm...> - 2010-07-09 14:41:44

Hi Patrick,

>   for $i in collection("tuscreen")//xq:molecule[@id="298038"]
...
> Is there a way to speed up queries?

Sounds like there are some theories about getting to the bottom of the
slowness, but another tack might be to adopt this approach from the
Performance Tuning article, "4.9. Use fn:id to lookup xml:id
attributes" - see http://exist-db.org/tuning.html#N10382.  This would
require either using @xml:id instead of @id, or declaring @id as type
ID in a DTD - maybe more trouble than it's worth.  But I have a hunch
that the id() function is faster than a range index; unfortunately, I
don't actually see that spelled out anywhere.

Cheers,
Joe

Re: [Exist-development] query collection containing one million documents

From: Joe W. <jo...@gm...> - 2010-07-09 14:42:53

p.s. I think this question/discussion actually belongs in exist-open,
not exist-development.

Joe

Re: [Exist-development] query collection containing one million documents

From: Patrick S. <pat...@zi...> - 2010-07-09 14:48:51

  Hi Joe,

thank you for the response.
> Sounds like there are some theories about getting to the bottom of the
> slowness, but another tack might be to adopt this approach from the
> Performance Tuning article, "4.9. Use fn:id to lookup xml:id
> attributes" - see http://exist-db.org/tuning.html#N10382.  This would
> require either using @xml:id instead of @id, or declaring @id as type
> ID in a DTD - maybe more trouble than it's worth.  But I have a hunch
> that the id() function is faster than a range index; unfortunately, I
> don't actually see that spelled out anywhere.
That might solve this problem. Unfortunately the given query was just an 
example. The following query gives me similar execution times though I 
defined an index on "@value":

   declare namespace xq="http://www.xml-cml.org/schema";
   for $i in collection("tuscreen")//*[@value="Cc1c2c(ncn(c2=O)C)sc1C(=O)NCc1ccc(cc1)CN1CCOCC1"]
   return
       $i

09 Jul 2010 16:47:17,746 [P1-9] DEBUG (XQuery.java [compile]:161) - 
Compilation took 3 ms
09 Jul 2010 16:47:19,971 [P1-9] TRACE (Optimize.java 
[visitGeneralComparison]:219) - exist:optimize: found optimizable: 
org.exist.xquery.GeneralComparison
09 Jul 2010 16:47:19,972 [P1-9] TRACE (Optimize.java [before]:240) - 
exist:optimize: context step: null
09 Jul 2010 16:47:19,972 [P1-9] TRACE (Optimize.java [before]:241) - 
exist:optimize: context var: null
09 Jul 2010 16:47:21,267 [P1-9] TRACE (GeneralComparison.java 
[preSelect]:266) - Using QName index on type xs:string
09 Jul 2010 16:47:21,267 [P1-9] TRACE (GeneralComparison.java 
[preSelect]:298) - Using QName range index for key: 
Cc1c2c(ncn(c2=O)C)sc1C(=O)NCc1ccc(cc1)CN1CCOCC1
09 Jul 2010 16:47:21,428 [P1-9] TRACE (Optimize.java [eval]:120) - 
exist:optimize: pre-selection: 1
09 Jul 2010 16:47:21,530 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 3.784 ms
09 Jul 2010 16:47:21,530 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 1278439917815
09 Jul 2010 16:47:21,531 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 1278439917815
09 Jul 2010 16:47:21,531 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 3785ms.
09 Jul 2010 16:47:21,531 [P1-9] DEBUG (RpcConnection.java [queryP]:2434) 
- found 1
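
A note on reading these figures: "Execution took 3.784 ms" sits right next to "query took 3785ms", so the dot is most likely a locale-specific thousands separator and the value is 3784 ms, not 3.784 ms. The sketch below only illustrates that reading; `parse_grouped_int` and `execution_ms` are hypothetical helper names, not part of eXist, and the dot-grouping is an assumption inferred from the log rather than something confirmed in this thread.

```python
import re


def parse_grouped_int(text: str) -> int:
    """Read an integer whose thousands are grouped with dots.

    Assumption: '3.784' means 3784 (German-style grouping), and a plain
    run of digits such as '3785' is accepted unchanged.
    """
    if not re.fullmatch(r"\d{1,3}(\.\d{3})+|\d+", text):
        raise ValueError(f"not a dot-grouped integer: {text!r}")
    return int(text.replace(".", ""))


def execution_ms(line: str) -> int:
    """Extract the duration from an 'Execution took N ms' log line."""
    match = re.search(r"Execution took ([\d.]+) ms", line)
    if match is None:
        raise ValueError("no 'Execution took' entry in line")
    return parse_grouped_int(match.group(1))


# Both values come from the log above and should agree to within a millisecond.
print(execution_ms("Execution took 3.784 ms"))  # XQuery.java figure
print(parse_grouped_int("3785"))                # RpcConnection.java figure
```

Read this way, the reported times in this thread are all between roughly 2.8 and 5.5 seconds.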



Index definition is:
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:atom="http://www.xml-cml.org/schema">
        <fulltext default="none" attributes="no"/>
        <!-- Range indexes by qname -->
        <create qname="@id" type="xs:string"/>
        <create qname="@value" type="xs:string"/>
    </index>
</collection>


Patrick

Re: [Exist-development] query collection containing one million documents

From: Patrick S. <pat...@zi...> - 2010-07-09 14:54:34

  Hi Thomas,

> What type of index did you define?
my index definition is:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:atom="http://www.xml-cml.org/schema">
        <fulltext default="none" attributes="no"/>
        ...
        <create qname="@id" type="xs:string"/>
    </index>
</collection>

> It may help to define a numeric type of range index and cast the value 
> to a number when you compare it.
I changed the index to integer:
<create qname="@id" type="xs:integer"/>

but there is still no improvement

09 Jul 2010 16:52:47,078 [P1-9] DEBUG (XQuery.java [compile]:161) - 
Compilation took 5 ms
09 Jul 2010 16:52:49,177 [P1-9] TRACE (Optimize.java 
[visitGeneralComparison]:219) - exist:optimize: found optimizable: 
org.exist.xquery.GeneralComparison
09 Jul 2010 16:52:49,178 [P1-9] TRACE (Optimize.java [before]:240) - 
exist:optimize: context step: 
descendant-or-self::xq:molecule[attribute::id = 298038]
09 Jul 2010 16:52:49,178 [P1-9] TRACE (Optimize.java [before]:241) - 
exist:optimize: context var: null
09 Jul 2010 16:52:50,465 [P1-9] TRACE (GeneralComparison.java 
[preSelect]:266) - Using QName index on type xs:integer
09 Jul 2010 16:52:50,465 [P1-9] TRACE (GeneralComparison.java 
[preSelect]:298) - Using QName range index for key: 298038
09 Jul 2010 16:52:50,625 [P1-9] TRACE (Optimize.java [eval]:120) - 
exist:optimize: pre-selection: 1
09 Jul 2010 16:52:50,696 [P1-9] TRACE (Optimize.java [eval]:140) - 
Ancestor selection took 71
09 Jul 2010 16:52:50,697 [P1-9] TRACE (Optimize.java [eval]:141) - Found: 1
09 Jul 2010 16:52:50,697 [P1-9] TRACE (Optimize.java [eval]:152) - 
exist:optimize: context after optimize: 1
09 Jul 2010 16:52:50,788 [P1-9] TRACE (Optimize.java [eval]:160) - 
exist:optimize: inner expr took 91; found: 1
09 Jul 2010 16:52:50,789 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 3.710 ms
09 Jul 2010 16:52:50,789 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
09 Jul 2010 16:52:50,789 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
09 Jul 2010 16:52:50,790 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 3711ms.
09 Jul 2010 16:52:50,790 [P1-9] DEBUG (RpcConnection.java [queryP]:2434) 
- found 1
09 Jul 2010 16:52:50,814 [P1-9] DEBUG (RpcConnection.java 
[retrieveFirstChunk]:2991) - Writing to temporary file: 
eXistRPCC4083305984497633446.xml
09 Jul 2010 16:53:08,575 [exist_QuartzScheduler_Worker-2] DEBUG 
(NGramIndex.java [sync]:86) - SYNC NGRAM
09 Jul 2010 16:53:08,575 [exist_QuartzScheduler_Worker-2] INFO  
(NativeBroker.java [sync]:3191) - Memory: 3.448.768K total; 7.281.792K 
max; 2.636.440K free


Thank you

Patrick

Re: [Exist-development] query collection containing one million documents

From: Joe W. <jo...@gm...> - 2010-07-09 21:52:41

Hi Patrick,

What are your memory settings?  Seeing this sometimes helps folks on
the list troubleshoot.  -Xmx, -Xms, and the "cacheSize" and
"collectionCache" and other attributes of the <db-connection> element
in your conf.xml file.

This might take you on the wrong track, but you could try adding a
Lucene index on @id / @value.  Even though we usually think of using
Lucene for fulltext search, it might be worth trying out the
whitespace analyzer -- see http://exist-db.org/lucene.html#N102F8.
Given all of the weird characters in your @value example, you should
probably construct your query as XML rather than using the lucene
search query parser, which could get thrown off -- see
http://exist-db.org/lucene.html#N10352.  It'd be interesting to know
how this works compared to the range indexes.

Joe

Re: [Exist-development] query collection containing one million documents

From: Dmitriy S. <sha...@gm...> - 2010-07-10 03:53:11

On Fri, 2010-07-09 at 16:54 +0200, Patrick Schäfer wrote:
> 09 Jul 2010 16:52:50,788 [P1-9] TRACE (Optimize.java [eval]:160) - 
> exist:optimize: inner expr took 91; found: 1
> 09 Jul 2010 16:52:50,789 [P1-9] DEBUG (XQuery.java [execute]:231) - 
> Execution took 3.710 ms

Is this the first request? Do you get the same timing for the next one?

As you can see, the selection itself is quite fast. The problem is somewhere
else, IMHO.

-- 
Cheers,

Dmitriy Shabanov

Re: [Exist-development] query collection containing one million documents

From: Patrick S. <pat...@zi...> - 2010-07-12 09:14:58

  Hi Dmitriy,

I get similar results for every repetition of the query:
12 Jul 2010 11:13:02,503 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 3.780 ms
12 Jul 2010 11:13:02,504 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:13:02,504 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:13:02,504 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 3781ms.
12 Jul 2010 11:13:02,504 [P1-9] DEBUG (RpcConnection.java [queryP]:2434) 
- found 1


12 Jul 2010 11:13:43,418 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 3.779 ms
12 Jul 2010 11:13:43,418 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:13:43,418 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:13:43,418 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 3779ms.

12 Jul 2010 11:13:59,979 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 3.908 ms
12 Jul 2010 11:13:59,979 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:13:59,980 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:13:59,980 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 3909ms.
12 Jul 2010 11:13:59,980 [P1-9] DEBUG (RpcConnection.java [queryP]:2434) 
- found 1


Patrick




Re: [Exist-development] query collection containing one million documents

From: Wolfgang M. <wol...@ex...> - 2010-07-10 08:19:31

> 09 Jul 2010 15:11:55,784 [P1-9] DEBUG (XQuery.java [compile]:161) -
> Compilation took 4 ms
> 09 Jul 2010 15:11:58,119 [P1-9] TRACE (Optimize.java
> [visitGeneralComparison]:219) - exist:optimize: found optimizable:
> org.exist.xquery.GeneralComparison

Most of the time seems to be spent after the query was compiled and
before query execution starts. I would thus suspect that loading the
metadata of your documents is the bottleneck, not the actual query.
How long does it take to execute collection("tuscreen") on its own?
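
To see where the time goes, one can list the gaps between consecutive log entries. This is a rough Python sketch using the timestamps from the first log posted in this thread; the labels are informal summaries of the log lines, and `gaps_ms` is a hypothetical helper, not an eXist API.

```python
from datetime import datetime, timedelta

# Timestamps taken from the first log in this thread; labels are informal.
EVENTS = [
    ("09 Jul 2010 15:11:55,784", "compilation finished"),
    ("09 Jul 2010 15:11:58,119", "first optimizer trace"),
    ("09 Jul 2010 15:11:59,496", "range index selected"),
    ("09 Jul 2010 15:11:59,666", "pre-selection done"),
    ("09 Jul 2010 15:11:59,837", "execution finished"),
]


def gaps_ms(events):
    """Return (milliseconds, from_label, to_label) for each consecutive pair."""
    parsed = [
        (datetime.strptime(stamp, "%d %b %Y %H:%M:%S,%f"), label)
        for stamp, label in events
    ]
    return [
        ((later - earlier) // timedelta(milliseconds=1), from_label, to_label)
        for (earlier, from_label), (later, to_label) in zip(parsed, parsed[1:])
    ]


for ms, from_label, to_label in gaps_ms(EVENTS):
    print(f"{ms:5d} ms  {from_label} -> {to_label}")

# Tuples compare by their first element, so max() picks the longest gap.
longest = max(gaps_ms(EVENTS))
print("longest gap:", longest[1], "->", longest[2])
```

With these timestamps the longest gap is the 2335 ms between the end of compilation and the first optimizer trace, followed by 1377 ms before the range index is selected; the remaining steps take about 170 ms each.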

The document loading should only have a negative effect the first time
the query is executed. Subsequent queries will use the cached set.

You may also check whether increasing the collectionCache setting in
conf.xml helps.

Wolfgang

Re: [Exist-development] query collection containing one million documents

From: Patrick S. <pat...@zi...> - 2010-07-12 09:24:11

  Hi Wolfgang,

I tried the following query:
count(collection("tuscreen"))


and in fact this gives me an execution time of several seconds:

12 Jul 2010 11:16:30,390 [P1-9] DEBUG (XQuery.java [compile]:155) - 
Query diagnostics:
count(collection("tuscreen"))
Query diagnostics:
collection("tuscreen")
12 Jul 2010 11:15:20,729 [P1-9] DEBUG (XQuery.java [compile]:161) - 
Compilation took 1 ms
12 Jul 2010 11:15:22,818 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 2.089 ms
12 Jul 2010 11:15:30,800 [P1-9] DEBUG (LRDCache.java [cleanup]:153) - 
totalReferences = 640001; maxReferences = 640000
12 Jul 2010 11:15:37,814 [P1-9] DEBUG (LRDCache.java [cleanup]:153) - 
totalReferences = 640001; maxReferences = 640000
12 Jul 2010 11:15:44,559 [P1-9] DEBUG (LRDCache.java [cleanup]:153) - 
totalReferences = 640001; maxReferences = 640000
12 Jul 2010 11:15:51,305 [P1-9] DEBUG (LRDCache.java [cleanup]:153) - 
totalReferences = 640001; maxReferences = 640000
12 Jul 2010 11:15:51,828 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 1278447949870
12 Jul 2010 11:15:51,992 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 1278447949870
12 Jul 2010 11:15:51,992 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 31263ms.

and the second time it says:

12 Jul 2010 11:16:30,390 [P1-9] DEBUG (XQuery.java [compile]:161) - 
Compilation took 1 ms
12 Jul 2010 11:16:33,475 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 3.085 ms
12 Jul 2010 11:16:33,475 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:16:33,475 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:16:33,476 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 3086ms.
12 Jul 2010 11:16:33,476 [P1-9] DEBUG (RpcConnection.java [queryP]:2434) 
- found 1

I increased the conf.xml settings to:
<db-connection cacheSize="4000M" collectionCache="1000M" database="native"
         files="webapp/WEB-INF/data" pageSize="4096" nodesBuffer="-1">

but this still results in several seconds:
12 Jul 2010 11:21:29,600 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 2.823 ms
12 Jul 2010 11:21:29,600 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:21:29,600 [P1-9] DEBUG (HTTPUtils.java 
[addLastModifiedHeader]:61) - mostRecentDocumentTime: 0
12 Jul 2010 11:21:29,601 [P1-9] INFO  (RpcConnection.java [doQuery]:303) 
- query took 2824ms.
12 Jul 2010 11:21:29,601 [P1-9] DEBUG (RpcConnection.java [queryP]:2434) 
- found 1

And memory usage is:
12 Jul 2010 11:22:23,519 [exist_QuartzScheduler_Worker-3] INFO  
(NativeBroker.java [sync]:3191) - Memory: 1.595.904K total; 7.281.792K 
max; 644.760K free


Patrick


Re: [Exist-development] query collection containing one million documents

From: Patrick S. <pat...@zi...> - 2010-07-12 09:08:07

  Hi Joe,

I use the following settings:

conf.xml
<db-connection cacheSize="3000M" collectionCache="128M" database="native"
         files="webapp/WEB-INF/data" pageSize="4096" nodesBuffer="-1">


eXist-settings.sh:
     set_client_java_options() {
         if [ -z "${CLIENT_JAVA_OPTIONS}" ]; then
             CLIENT_JAVA_OPTIONS="-Xms128m -Xmx8000m -Dfile.encoding=UTF-8";
         fi
     }
     set_java_options() {
         if [ -z "${JAVA_OPTIONS}" ]; then
             JAVA_OPTIONS="-Xms128m -Xmx8000m -Dfile.encoding=UTF-8";
         fi
     }


> This might take you on the wrong track, but you could try adding a
> Lucene index on @id / @value.  Even though we usually think of using
> Lucene for fulltext search, it might be worth trying out the
> whitespace analyzer -- see http://exist-db.org/lucene.html#N102F8.
> Given all of the weird characters in your @value example, you should
> probably construct your query as XML rather than using the lucene
> search query parser, which could get thrown off -- see
> http://exist-db.org/lucene.html#N10352.  It'd be interesting to know
> how this works compared to the range indexes.
I created a Lucene index on @value:
<lucene>
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
    <text qname="@value" analyzer="ws"/>
</lucene>

and used the query:

let $query :=
    <query>
        <term>Cc1c2c(ncn(c2=O)C)sc1C(=O)NCc1ccc(cc1)CN1CCOCC1</term>
    </query>
for $i in collection("tuscreen")//*[ft:query(@value, $query)]
return
    $i

Unfortunately, it doesn't return a result.

And the query still takes:
12 Jul 2010 11:05:44,920 [P1-9] DEBUG (XQuery.java [execute]:231) - 
Execution took 5.456 ms

Patrick

Re: [Exist-development] query collection containing one million documents

From: Joe W. <jo...@gm...> - 2010-07-12 22:59:41

Hi Patrick,

> I created a Lucene index on @value:
...
> for $i in collection("tuscreen")//*[ft:query(@value, $query)]
>  return
>    $i
>
> Unfortunately it doesn't return a result?

Strange.  Just checking: did you reindex after adding the index definition?

Cheers,
Joe

Re: [Exist-development] query collection containing one million documents

From: Patrick S. <pat...@zi...> - 2010-07-13 08:14:33

  Hi Joe,

>> for $i in collection("tuscreen")//*[ft:query(@value, $query)]
>>   return
>>     $i
>>
>> Unfortunately it doesn't return a result?
> Strange.  Just checking: did you reindex after adding the index definition?
Yes, I did ;)

Patrick

Re: [Exist-development] query collection containing one million documents

From: Adam R. <ad...@ex...> - 2010-07-15 09:21:12

Do you have the correct namespaces, etc.? Is that actually the correct
collection? Collection paths are typically expressed as
"/db/somecollection".

Also, what is your query, and does it match your documents? Perhaps you
could provide an example.




-- 
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb