From: Adam R. <ad...@ex...> - 2008-08-06 08:20:44
Some questions -

1) Do you have a single XML file with all data in and do you need to be
able to search all data? Could this be split by operation or date or
something? Reducing the data domain of course will help any system.

2) Have you established any indexes on the data for your query?

3) Are you sorting your results? Sorting is slow.

4) What does your actual query for this operation look like?

Small changes to this can make a big difference...

2008/8/6 Dan Retzlaff <dre...@gm...>:
> Dear Smart People, :)
>
> My team is having a hard time creating a query that scales well with
> increasing data set size. We have several events that happen repeatedly
> during operation (call them A, B, and X). We represent the events as XML and
> insert them into a single resource (call it Data). They're all essentially
> random, but A and B happen significantly more often than X. So we end up
> with a document structure like this (children of A, B and X are omitted for
> brevity):
>
> <Data>
>   <A/>
>   <A/>
>   <B/>
>   <A/>
>   <B/>
>   <B/>
>   <X/>
>   <A/>
> </Data>
>
> Our query is simple in principle: We need to return all instances of X,
> including some additional information contained in the most recent A, i.e.
> X/preceding-sibling::A[1]. We're finding that this query executes in O(N**2)
> time, which does not meet our performance requirements when count(Data/*)
> reaches into the thousands. Since the distance between X and the preceding A
> is constant (statistically speaking), I would have expected O(N).
>
> My interpretation / assumption is that despite the "[1]" subscript, eXist
> evaluates the "preceding-sibling" term by building a *complete* node-set
> before filtering it down to the first match. Do you agree?
>
> Maybe someone can suggest an O(N) or O(NlogN) query or design for us? We have
> total flexibility in reorganizing the collection/resource structure if that
> would help.
>
> Thanks,
> Dan

--
Adam Retter

eXist Developer
{ England }
ad...@ex...
irc://irc.freenode.net/existdb
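
For reference, the linear behaviour Dan expected can be sketched outside eXist as a single pass over the children of Data that remembers the most recent A. This is only an illustration in Python using the standard library's xml.etree.ElementTree. It is not eXist's evaluation strategy and not a proposed XQuery. The `id` attributes and the function name `pair_x_with_latest_a` are hypothetical, added so the result is readable.

```python
import xml.etree.ElementTree as ET


def pair_x_with_latest_a(data_xml):
    """Pair each X with the nearest preceding sibling A (or None).

    One pass over the children of the root element, so the work grows
    linearly with the number of children.
    """
    root = ET.fromstring(data_xml)
    latest_a = None  # most recent A seen so far, in document order
    pairs = []
    for child in root:
        if child.tag == "A":
            latest_a = child
        elif child.tag == "X":
            pairs.append((child, latest_a))
        # B elements are irrelevant to this query and are skipped
    return pairs


# Same shape as the example in the quoted message; ids are hypothetical.
SAMPLE = """<Data>
  <A id="a1"/>
  <A id="a2"/>
  <B/>
  <A id="a3"/>
  <B/>
  <B/>
  <X id="x1"/>
  <A id="a4"/>
</Data>"""

result = [
    (x.get("id"), a.get("id") if a is not None else None)
    for x, a in pair_x_with_latest_a(SAMPLE)
]
```

Each X here is matched to the same node that X/preceding-sibling::A[1] would select, but without rescanning the earlier siblings for every X.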