From: Sharmin C. <sha...@ya...> - 2011-07-12 15:59:05
Thanks for the tips. I have about 8 GB of memory allocated to the JVM for eXist. I did not know you could index on @tag and @code. I guess I'll start there and then optimise my XQuery.

Thanks again.

________________________________
From: Adam Retter <ad...@ex...>
To: Sharmin Choudhury <sha...@ya...>
Cc: "exi...@li..." <exi...@li...>
Sent: Tuesday, July 12, 2011 4:31 PM
Subject: Re: [Exist-open] eXist XQuery performance issue

On 12 July 2011 17:30, Adam Retter <ad...@ex...> wrote:
>> Novice eXist user here. I have an XML file stored in my eXist database which
>> is about 1.09 GB in size. It's in MARC21XML format
>> (http://www.loc.gov/standards/marcxml/), i.e. it's a library catalogue file,
>> and I am just trying to run some basic search queries on the data. Below is
>> the XQuery I ran, but it threw an out of memory error after about 6
>> minutes.
>
> How much memory have you allocated to the JVM for eXist?
>
>> declare namespace marc = "http://www.loc.gov/MARC21/slim";
>> let $record := /marc:collection/marc:record
>> for $record in doc("catalog.xml")
>> where
>> fn:matches(/marc:collection/marc:record/marc:datafield[@tag='245']/marc:subfield[@code='a']/text(),
>> "drawing")
>> return $record
>
> I think this query could be re-factored. Also try to avoid using
> 'where' and use predicates instead -
>
> for $record in doc("catalog.xml")/marc:collection/marc:record[marc:datafield[@tag eq '245'][marc:subfield/@code eq 'a'][. eq 'drawing']]
> return $record

Also you should ensure that you have indexes defined on @tag, @code and marc:record...

>
>> I am also pretty new to XQuery, so the above query, which is searching for
>> matches in the title for the word "drawing", might also not be the most
>> efficient. So any tips and pointers, either to index eXist differently or to
>> write my XQuery better, would be much appreciated.
>> Thanks in advance!
>> -A n00b :-)
>>
>> _______________________________________________
>> Exist-open mailing list
>> Exi...@li...
>> https://lists.sourceforge.net/lists/listinfo/exist-open
>
> --
> Adam Retter
>
> eXist Developer
> { United Kingdom }
> ad...@ex...
> irc://irc.freenode.net/existdb
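
For reference, the predicate-based rewrite suggested in the thread could be sketched as below. This is only a sketch and not necessarily what was finally run: it assumes the document name "catalog.xml" from the original post, and it makes two changes that are not in the thread. It uses the general comparison `=` rather than `eq`, because `eq` raises a type error when a datafield has more than one subfield, and it tests the subfield with contains() instead of `. eq 'drawing'`, which would only match a value exactly equal to "drawing".

```xquery
xquery version "1.0";

declare namespace marc = "http://www.loc.gov/MARC21/slim";

(: Return every record whose title subfield (datafield 245, subfield a)
   contains the substring "drawing".
   Assumption: "catalog.xml" is the document name used in the original post. :)
for $record in doc("catalog.xml")/marc:collection/marc:record
    [marc:datafield[@tag = '245']/marc:subfield[@code = 'a'][contains(., 'drawing')]]
return $record
```

Note that contains() is case-sensitive; matches(., 'drawing', 'i') would be the case-insensitive variant. Whether either form can use the indexes mentioned above depends on how they are configured, so it is worth checking after defining them.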