From: Dan T. <sli...@gm...> - 2013-01-11 05:53:38
On Tue, Jan 8, 2013 at 1:28 AM, Adam Retter <ad...@ex...> wrote:
>> Wanted to ask for hints on how I could optimize performance of some
>> XQuery application that I'm playing with. The performance really
>> become a development bottleneck so need to address it.
>
> Firstly you need to find out which part of your query is causing you
> performance problems. There are several ways to do this:
>
> 1) add util:log statements throughout your code, logging the time that
> parts that you think might be hot-spots are to narrow down where the
> problem is.
>
> 2) use the profiler in the Web Admin to work out what is slow about your query.
>
> 3) Run eXist using a Java Profiler, and then execute your query. This
> will show the hot-spots in eXist where most of the time is spent
> during processing.
>
> I would start with 1 or 2 first.
>
> Once you have done this, tell us, we can discuss optimisations of
> specific query parts. There is potentially lots to be gained from
> small changes to your xquery without needing to dig into Java at all.

So my first observations, without too much digging into profiling, are:

* as you mentioned, modules are loaded per query execution; I have 3
  modules of size around 2k lines that build around 1000 maps forming a
  graph whose leaves are atomic types; this seems to create a bottleneck
  of a few seconds each time I change the query on the file system; the
  structures are defined as "declare variable $bla:bla := ..." - can I
  measure the execution of a declare statement somehow?

* my second bottleneck is the initialization procedure, which is now
  executed per query execution; I tried to put it into the session, but
  it is still generally a bottleneck; can I somehow have an init XQuery
  script that creates structures shared application-wide?

On the rest I'll follow up.

>
>> So first question would be how much the syntax tree of query involved
>> in request processing is affecting the overall resources?
>> Let's say if I generate code that contains hundreds of lines, how is
>> that affecting performance? Would I be much better off making that
>> code external data (say XML) and import it?
>
> Probably very little effect, we have users that run applications which
> are thousands of lines of XQuery split across many modules of code. At
> runtime XQuery is compiled into Java (effectively), this compiled
> representation is re-used each time your query executes, so the
> compilation is only required the first time and its quite fast anyway.
> Also after multiple executions the Java HotSpot optimizer will
> optimize the code further, so it will get faster too.

Well, I do experience a delay of a few seconds, explicitly because of
the reloading of large modules.

>
>> Second would be what is the cost of maps and functions? Can my
>> performance bottleneck be completely a cost of having few thousands
>> of maps (3 or something) and about hundred of closures hanging as
>> module variables in memory?
>
> I guess this is not a problem so much, how much data are you keeping
> in your maps?

In total I guess 5k maps, maybe less. I'm not sure what the size is in
KB/MB, but the memory eaten by the eXist Java process is quite
significant - on the order of 50MB I think (not an exact measurement) -
and it is constantly leaking somewhere. I guess the leaking part is on my
side, but I do not use any functionality with side effects except the
session (I do read files but do not write them).

>
>> Also I'm not sure what is the module memory consumption strategy - is
>> it singleton entity in memory or are those all copies per request?
>
> A module is loaded for each query execution. Main module can be re-used.

:( What do you mean by main module?

>
>> Sometimes I don't really feel desired expression evaluation caching -
>> rerunning request on non changed files is of the same cost as first
>> run. Do I need to configure anything?
>
> Nothing I can think of, although I dont really understand what you are
> explaining exactly...

I mean that, XQuery being a functional language, its expressions ought to
be heavily cached, but I don't really feel that this happens, or I don't
notice it. A simple example: I sometimes need the index of an element in
a collection (or any other computation you can think of). I could compute
those initially, bind them to the element somehow and pass them around;
instead I pass around the collection and the element, and all the parties
that need the index invoke a function to compute it. I didn't try this
explicitly, but I would hope that this:

    (
      my:index-of-element-in-collection($element, $collection),
      my:index-of-element-in-collection($element, $collection),
      my:index-of-element-in-collection($element, $collection),
      my:index-of-element-in-collection($element, $collection),
      my:index-of-element-in-collection($element, $collection),
      my:index-of-element-in-collection($element, $collection)
    )

has an execution time of slightly more than a single invocation (not in
all cases of course, but in the case of "pure" expressions), and
definitely not close to 6 times the single execution time.

>
>> Looking far I would probably need to move towards Java implementation
>> of those structures that I store in maps and closures (XML is no good
>> because of inability to reference existing entities), and I would be
>> curious as to how much it's possible to integrate external structures
>> into XQuery? Is it possible at all? If it is where would I start
>> digging?
>
> You dont have to store XML in your maps, you can store any XDM type.

Well, XDM still means Sequence + XML + AtomicType, right? I need more of
a Sequence + CustomComplexDataType + AtomicType. Is it really possible to
create such a CustomComplexDataType that XQuery could receive and work
with? By "work with" I mean, for example, being able to use the 'eq'
operator and have map-like component access - $object('name'), etc. I
would suppose it's not easy (not supported by default), but maybe
possible?
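
For the repeated my:index-of-element-in-collection calls above, one way to avoid relying on the engine to cache "pure" calls is to evaluate the call once and bind the result with let. This is only a minimal sketch: the namespace URI, the function body and the sample data are hypothetical stand-ins (the message does not show the real implementation), and atomic values are used in place of the real map-based elements.

```xquery
xquery version "3.0";

(: Hypothetical namespace for the illustration only. :)
declare namespace my = "http://example.com/my";

(: Hypothetical stand-in for the function named in the message.
   It handles atomic values only and returns the first matching position. :)
declare function my:index-of-element-in-collection(
    $element as xs:anyAtomicType,
    $collection as xs:anyAtomicType*
) as xs:integer? {
    fn:index-of($collection, $element)[1]
};

(: Sample data, assumed for the sketch. :)
let $collection := ("a", "b", "c")
let $element := "b"

(: Evaluate the call once and bind the result to a variable ... :)
let $index := my:index-of-element-in-collection($element, $collection)

(: ... then reuse the bound value instead of calling the function again. :)
return ($index, $index, $index)
```

Whether repeated calls with identical arguments would be cached without such a binding is exactly the open question in the message; the sketch simply sidesteps it.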

>
> However if your data storage (in maps) is complex and you store a node
> from the database, this should internally in eXist just be a proxy to
> the node in the database and so should be quite efficient. You could
> also try storing a reference to the node directly in a map, if its a
> document use the uri (fn:document-uri(fn:root($node))), otherwise there
> is also a function for returning a numeric id for a node
> (util:node-id($node)), which can then later be looked up by id
> (util:node-by-id($id)). However if you are constructing XML and
> storing it in the map then this is probably quite memory expensive,
> however I would not expect it to be slow.

I don't use XML data; it has limitations that create too much overhead
for me (mainly the lack of references and strings being the only
attribute type, plus attr vs element is still a constant war in my
head :) ). So my data is purely Sequence + Map + AtomicType, nothing else
(except a few more anonymous functions passed around as access wrappers).

The code is written without too much care about optimal computations and
so on, but as I said previously, I would hope that caching would save me
in that regard. The end result is that my code, which is not small and
simple but also does not do overly heavy computations (mostly simple
ones, actually), takes around 1min 30sec.

I'll dig into profiling, but I also wouldn't want to get distracted from
my current problem and go into performance optimizations. So first I am
looking for quick wins; the option of making a Java implementation is
hypothetical, but it opens up good possibilities for the future.

>
> I very much doubt that you will need to do any implementation in Java,
> more than likely with a bit of investigation we can help you adjust
> your xquery code to be much faster.
>
> However, for the record, yes it is easy to pass Java objects between
> custom XQuery functions in eXist.
> However this approach lives outside
> of the XQuery spec and is really not recommended, we pretty much never
> do this anymore.

No, I'm not that interested in "hacking" things in, rather trying to see
if there is a right way to do this (to add custom data
types/structures).

>
>>
>> Thanx in advance!
>>
>> _______________________________________________
>> Exist-open mailing list
>> Exi...@li...
>> https://lists.sourceforge.net/lists/listinfo/exist-open
>
>
>
> --
> Adam Retter
>
> eXist Developer
> { United Kingdom }
> ad...@ex...
> irc://irc.freenode.net/existdb