Re: [Exist-open] Java Heap Space Error

Sharmin,

I can't speak directly to the memory aspect of your question, but I
have some suggestions for ways to optimize your query.

> <create qname="marc:record"/>
> <ngram qname="marc:record"/>

> for $record in /marc:collection/marc:record[fn:matches(., "design", 'i')]

Let me observe here that you've defined range and ngram indexes on
marc:record.  By using fn:matches(), your query is going to make use
of the range index.

1. "Function fn:matches returns true if any substring of its argument
string matches the regular expression. The query engine thus needs to
scan all index entries as the match could be at any position of an
entry.  You can reduce the range of entries to be scanned by anchoring
your pattern at the start of a string (where applicable):"  (See
http://exist-db.org/tuning.html#d1973e783; see also
http://exist-db.org/indexing.html#rangeidx and
http://demo.exist-db.org/functions/fn/matches.)

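So, if it suits your data, try anchoring the pattern.  Keep in mind
that anchoring changes what matches: "^design$" only matches a value
that is exactly "design", while "^design" matches any value that
starts with it:

  for $record in /marc:collection/marc:record[fn:matches(., "^design$", 'i')]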

See also http://demo.exist-db.org/functions/fn/matches for info
regarding these regular expression anchors.
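
As a rough illustration of how the anchors change the result, the
sketch below runs fn:matches() against a few made-up strings (they are
not taken from the MARC data).  It should run in any XQuery processor,
including the eXist sandbox:

```xquery
(: Illustrative strings only; not taken from the MARC dataset. :)
for $s in ("Graphic design basics", "Design patterns", "design")
return
    <test value="{$s}"
          unanchored="{fn:matches($s, 'design', 'i')}"
          start="{fn:matches($s, '^design', 'i')}"
          whole="{fn:matches($s, '^design$', 'i')}"/>
```

Only the unanchored pattern is true for all three strings; "^design"
is false for the first, and "^design$" is true only for the last.  So
anchoring helps when the values you want really do begin with (or
equal) the search term, but it drops matches in the middle of a longer
value.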

2. Range indexes are best for strongly typed data or strings (see
http://exist-db.org/tuning.html#d1973e497).  I'd go further and say
that range indexes perform best when you're using =, <, or >.  If
marc:record doesn't fit this qualification, consider applying a Lucene
full text index on marc:record rather than a range or ngram (see
http://www.exist-db.org/lucene.html).  Your query would look like:

  for $record in /marc:collection/marc:record[ft:query(., "design")]
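
For ft:query() to find anything, the collection needs a Lucene index
definition on marc:record.  A minimal sketch of one way to set that
up, assuming the records live in a collection called /db/marc (a
hypothetical path; substitute your own) and that the query is run as a
DBA user:

```xquery
xquery version "1.0";

import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Assumption: the MARC documents are stored in /db/marc. :)
let $config :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:marc="http://www.loc.gov/MARC21/slim">
            <lucene>
                <text qname="marc:record"/>
            </lucene>
        </index>
    </collection>
(: Collection configurations are kept under /db/system/config, mirroring
   the data path.  If /db/system/config/db does not exist yet on your
   installation, create it first. :)
let $config-col := xmldb:create-collection("/db/system/config/db", "marc")
let $stored := xmldb:store($config-col, "collection.xconf", $config)
(: Documents that are already stored are only indexed after a reindex. :)
return xmldb:reindex("/db/marc")
```

Reindexing a large collection can take some time.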

3. Whether you go with a range or Lucene index, you can optimize
your query by avoiding the top-down approach of selecting
"/marc:collection" before "marc:record" (see
http://exist-db.org/tuning.html#d1973e783).  Consider these
alternatives, which follow this advice:

  for $record in //marc:record[fn:matches(., "^design$", 'i')][parent::marc:collection]

or if the parent::marc:collection isn't significant for your query:

  for $record in //marc:record[fn:matches(., "^design$", 'i')]

or in the Lucene case:

  for $record in //marc:record[ft:query(., "design")][parent::marc:collection]

or:

  for $record in //marc:record[ft:query(., "design")]
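
Putting these suggestions together, a trimmed-down version of the
original query might look like the following.  This is only a sketch:
it assumes a Lucene index has been defined on marc:record, and it
returns just the title and date to keep the example short.

```xquery
declare namespace marc = "http://www.loc.gov/MARC21/slim";

(: Select the records through the index first, then read fields from each hit. :)
for $record in //marc:record[ft:query(., "design")]
let $title := $record/marc:datafield[@tag='245']/marc:subfield/text()
let $pubdate := $record/marc:datafield[@tag='260']/marc:subfield[@code='c']/text()
return
    <data id="MDX Catalogue">
        <title id="Title">{$title}</title>
        <top_left id="Date">{$pubdate}</top_left>
    </data>
```

Note that a Lucene query matches whole terms, so "design" will not
find "designer" the way the unanchored regular expression does; a
wildcard query such as "design*" covers that case.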

Cheers,
Joe

On Thu, Sep 15, 2011 at 9:07 AM, Sharmin Choudhury
<sha...@ya...> wrote:
> Hi,
> I have posted about this before and unfortunately have still been unable to
> solve the problem. So I am trying again,
> eXist setup: Running on a standalone Jetty Server with JVM options -Xms1000m
> -Xmx5000m. The machine itself has 12.0 GB of memory. So plenty of memory.
> Dataset: MARC21XML library database that is 831 MB in size.
> Index for the MARC21XML:
> <?xml version="1.0" encoding="UTF-8"?>
> <collection xmlns="http://exist-db.org/collection-config/1.0">
> <index xmlns:marc="http://www.loc.gov/MARC21/slim">
> <create qname="marc:record"/>
> <create qname="@tag"/>
> <create qname="@code"/>
> <create qname="marc:subfield"/>
> <create qname="marc:leader"/>
> <create qname="marc:datafield"/>
> <ngram qname="@tag"/>
> <ngram qname="@code"/>
> <ngram qname="marc:subfield"/>
> <ngram qname="marc:leader"/>
> <ngram qname="marc:datafield"/>
> <ngram qname="marc:record"/>
>     </index>
> </collection>
> Query I am trying to execute through the Sandbox:
> declare namespace marc = "http://www.loc.gov/MARC21/slim";
> for $record in /marc:collection/marc:record[fn:matches(., "design", 'i')]
> let $title := $record/marc:datafield[@tag='245']/marc:subfield/text()
> let $author := $record/marc:datafield[@tag='100']/marc:subfield[@code='a']/text()
> let $otherauthor := $record/marc:datafield[@tag='700']/marc:subfield[@code='a']/text()
> let $publocation := $record/marc:datafield[@tag='260']/marc:subfield[@code='a']/text()
> let $publisher := $record/marc:datafield[@tag='260']/marc:subfield[@code='b']/text()
> let $pubdate := $record/marc:datafield[@tag='260']/marc:subfield[@code='c']/text()
> let $edition := $record/marc:datafield[@tag='250']/marc:subfield/text()
> let $description := $record/marc:datafield[@tag='653']/marc:subfield/text()
> let $campus := $record/marc:datafield[@tag='949']/marc:subfield[@code='l']/text()
> let $shelf := $record/marc:datafield[@tag='949']/marc:subfield[@code='s']/text()
> let $isbn := $record/marc:datafield[@tag='020']/marc:subfield[@code='a']/text()
> return
>   <data id="MDX Catalogue">
>     <sort><x_sort>Date</x_sort><y_sort>Title</y_sort></sort>
>     <title id="Title">{$title}</title>
>     <top_left id="Date">{$pubdate}</top_left>
>     <subtitle id="Edition">{$edition}</subtitle>
>     <cat_1 id="Author(s)">
>       <keyword_1>{$author[1]}</keyword_1>
>       <keyword_2>{$otherauthor[1]}</keyword_2>
>       <keyword_3>{$otherauthor[2]}</keyword_3>
>     </cat_1>
>     <cat_2>
>       <keyword_1 id="Location">{$publocation}</keyword_1>
>       <keyword_2 id="Publisher">{$publisher}</keyword_2>
>       <keyword_3 id="Library Location">{$campus[1]}, {$shelf[1]}</keyword_3>
>     </cat_2>
>     <blurb id="Description">{$description[1]}</blurb>
>     <drill_1 id="Web" type="Library Website">{$isbn}</drill_1>
>     <drill_2 id="Web" type="Waterstones"></drill_2>
>   </data>
>
> But eXist crashes giving Java Heap Space Error. I have been assured that
> eXist should be able to handle much larger datasets than the one I have but
> I am not sure what I am doing wrong. If anyone has any insights, that would
> be really helpful.
> Thanks!
