From: Greg B. <bar...@gm...> - 2012-01-28 14:34:41
Hi Wolfgang,

Good question. The answer is exactly 3 child elements. However, if it makes any difference (and it probably does not), this node set is filtered from the general population. The general population would have about 200 different child elements, and there are about 50 variations.

My fix was to hard-code the output of each of the 3 elements instead of using the wildcard loop. This, quite literally, changed the processing time from 250+ seconds to about 1.3 seconds. Maybe it is something else in the loop, though I do not see any candidates, as it is very simple. I did rearrange the code a bit to hard-code the loop.

Below are the before and after code snippets. (At this point in the code there are 25 $searchedQuotes elements, each with 3 simple child elements.)

*Before:*

    return
        <Quotes searchString='{$data/searchString}' total='{$total}' begin='{$begin}' count='{$returnCount}'>
        {
            for $ent in $searchedQuotes
            let $score := if (exists($data/searchString) and $searchString != "") then ft:score($ent) else 0
            let $tDoc := $ent/..
            let $doc :=
                if ($col_Lists) then (
                    $collection[@docURI = $ent/docURI]
                ) else (
                    let $title := $tDoc/@title
                    let $title := replace($title, "'", "&apos;")
                    let $title := replace($title, '"', "&quot;")
                    return
                        <Document title='{$title}' docURI='{$tDoc/@docURI}' URL='{$tDoc/@URL}' addedDate='{$tDoc/@addedDate}' mimeType="{$tDoc/@mimeType}"/>
                )
            return
                <Event score='{$score}' name='Quotation' URI='{$ent/@URI}' type="Quotation">
                {
                    for $node in $ent/*
                    let $nName := local-name($node)
                    return
                        if ($nName = "docURI") then ($doc)
                        else (
                            $node,
                            (: include the entity of the quote, is this needed? :)
                            if ($nName = "Subject") then ($tDoc/Entity[@URI = $node/@URI]) else ()
                        )
                }
                </Event>
        }
        </Quotes>

*After:*

    return
        <Quotes searchString='{$data/searchString}' total='{$total}' begin='{$begin}' count='{$returnCount}'>
        {
            for $ent in $searchedQuotes
            let $score := if (exists($data/searchString) and $searchString != "") then ft:score($ent) else (0)
            return
                <Event score='{$score}' name='Quotation' URI='{$ent/@URI}' type="Quotation">
                {
                    $ent/Object,
                    $ent/../Entity[@URI = $ent/Subject/@URI],
                    $collection[@docURI = $ent/docURI]
                }
                </Event>
        }
        </Quotes>

Everything else is the same. Hope this helps some.

Thank you,
Greg

On Sat, Jan 28, 2012 at 7:34 AM, Wolfgang Meier <wol...@ex...> wrote:

> Hi Greg,
>
> > I managed to boil my problems all down to this one specific line of code
> > that causes the blowup!
> >
> > This is:
> > for $node in $ent/*
>
> As I wrote, eXist 1.5/trunk takes a different path than 1.4.x when
> evaluating $ent/*: in both cases, the evaluation of * is postponed
> until local-name($node) forces eXist to materialize the node set.
> 1.4.x will do a tree traversal from $ent to find matching child
> elements. 1.5/trunk takes a different path and checks the structural
> index for possible matches. It thus needs one index check for each
> node in the context sequence ($ent) and each element name in the
> collection. For larger sets this should be quite a bit faster than a
> tree traversal and better for concurrency.
>
> To test this, I tried to simulate your query on a data set with 50k
> documents. Unfortunately, eXist picked the correct elements matching a
> given $ent/* in 4ms.
>
> There must thus be something specific in your data set which makes the
> performance break down completely. Unfortunately I'm not sure what it
> could be, but I certainly want to fix it. One difference I could think
> of: my data set only used a couple of different elements, maybe a
> dozen altogether. Is it possible that your schema has a lot more
> distinct element/attribute names? If the number goes into the
> hundreds, it could maybe explain the dramatic performance breakdown
> and we could add a rule to deal with this scenario.
>
> Wolfgang
>

--
Greg Bardwell
301-910-2199 (C)
301-299-0254 (W)
bar...@gm...
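
For reference, the check count described in the quoted reply can be put into rough numbers. The Python sketch below is only a toy cost model, not eXist's actual evaluator: the function names `traversal_cost` and `index_lookup_cost` are made up for illustration, and the only inputs are figures mentioned in this thread (25 quotes, 3 children each, "maybe a dozen" versus "about 200" element names). It assumes every index check and every child visit costs one unit, which is certainly a simplification.

```python
# Toy cost model: NOT eXist's real evaluator. It only counts the units of
# work implied by the two strategies described in the quoted reply.


def traversal_cost(context_nodes: int, children_per_node: int) -> int:
    """1.4.x-style (as described): visit each context node's children once."""
    return context_nodes * children_per_node


def index_lookup_cost(context_nodes: int, element_names: int) -> int:
    """1.5/trunk-style (as described): one structural-index check per
    context node per distinct element name in the collection."""
    return context_nodes * element_names


if __name__ == "__main__":
    quotes = 25     # $searchedQuotes elements, from the message
    children = 3    # child elements per quote, from the message
    # "maybe a dozen" names in the test data set vs "about 200" here
    for names in (12, 200):
        print(f"{names:>3} element names: "
              f"{index_lookup_cost(quotes, names)} index checks vs "
              f"{traversal_cost(quotes, children)} child visits")
```

The model only shows how the number of checks grows with the number of distinct element names. It says nothing about what each check costs, so on its own it does not account for the 250+ second runtime.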