From: Hugh A. C. <hca...@em...> - 2007-08-08 13:41:09
On Wed, 2007-08-08 at 07:58 +0100, Michael Beddow wrote:
> Hugh Cayless wrote:
> > > (NativeElementIndex.java [findDescendantsByTagName]:798) - Found
> > > wrong prefix len: 4. Previous: 1.3.4.51
>
> The pertinent part of the stack trace suggests you have uncovered a bug in
> the newer indexing code.
>
> Normally, I'd advise just zipping up one of the problem files as-is and
> sending it to Wolfgang, but he and the other developers have a huge amount
> on their plates right now, and it might be helpful to try to narrow down the
> possibilities a bit more first.
>
> Suggestions for doing that are a bit hard to formulate because of the
> terminology in your posting. I'm sure that what you call "documents" have
> every right to that name from the perspective of your users, but they pretty
> plainly aren't "documents" in an XML sense, nor are the TEI document
> instances. In the terms we need to use here, the documents are what you
> call your "files", and that is what I shall mean by "document" from here on.

No, that's right. It's a bit confusing because these are compilations of
original documents, but each compilation is a single XML document, as you say.

> 1. Try to determine whether the error is in the generation or traversal of
> the indexes. What happens if you completely re-index the collections
> concerned? If the queries then work as expected, then things went wrong with
> the numbering scheme on the earlier indexing run but not when it was
> repeated.

I'd neglected to mention I already tried that. Reindexing makes no difference.

> 2. If you get the same wrong results and error report after a complete
> re-index, try to make eXist follow a partially different evaluation strategy
> for the query. e.g. query on //*[@id='csr06-0052'] or matches(//*/@id,
> 'csr06-0052'). Not that I would recommend either of these for performance
> reasons, but the aim is to see if different paths through the matching code
> hit the same bug.

    for $div in //*[@id='csr06-0052'] return $div

does return the right results, as does

    for $div in //TEI.2/text/(body | front)/div1[@id='csr06-0053'] return $div

which is encouraging.

> You could also modify the queries to be a lot more
> structurally explicit, again to alter eXist's internal evaluation strategy.
> I don't know your markup details, but something like
> //text/group/text/body/div1[@id='csr06-0052']. More drastically still, remove
> your custom index configuration files (I assume you have configured range
> indexes on those div id's otherwise you would have a very long wait even if
> the right results were forthcoming), reindex (vital to do that), and then
> retry your queries.

I actually hadn't done any range indexing yet. This is at the early stages of
development, so I haven't gotten to the "make it run faster" stage. In any
case, performance is really pretty good even without range indexes: queries
run in the tens of milliseconds. eXist is a really impressive piece of
software!

> Again, not a production suggestion, but simply a means
> of trying to pinpoint the portion of the eXist code where things are going
> wrong.
>
> 3. If the problem persists, try to concentrate investigations on structure
> rather than possible side effects of textual quantity. Put one of the large
> documents through an identity transform with the text() node handler doing a
> NOP. That will give you the same structure for eXist to index, but without
> the textual bulk. Load the result into eXist and try the query again. If you
> get the same failure on this structural skeleton, then that would probably
> be a suitable testbed document to send for developer scrutiny. But before
> doing that, if you understand the logic of eXist's internal ID generation
> scheme, you could output the skeleton after turning on addition of all eXist
> IDs in the serialiser and then scrutinise the IDs that have been assigned to
> the divs concerned to see if they make sense.

I'm afraid I haven't been able to make the ID generation work. Setting
add-exist-id to "all" in the conf.xml file and restarting has no effect.
Something else must be overriding it, but I won't have time to fool with it
more until tomorrow at the earliest, unfortunately.

I'll make a bug report and attach my skeleton XML file. At least there seems
to be a workaround for the specific problem I was having.

Thanks!

Hugh

> Michael Beddow
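
For anyone wanting to reproduce the structural skeleton described in point 3, here is a minimal sketch. It is not the XSLT identity transform Michael describes; it swaps in Python's standard xml.etree.ElementTree to get a similar result (same elements and attributes, no text). The function name make_skeleton and the SAMPLE document are hypothetical. The element names are borrowed from the queries earlier in this message, and the p elements and their text are invented for illustration. Note that this parser also drops comments and processing instructions, which an XSLT identity transform would normally keep.

```python
import xml.etree.ElementTree as ET


def make_skeleton(xml_text: str) -> str:
    """Return xml_text with all text content removed.

    Elements and attributes are kept, so the tree has the same shape
    for indexing purposes, minus the textual bulk.
    (Hypothetical helper, not part of eXist.)
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.text = None  # text directly inside the element
        elem.tail = None  # text following the element's end tag
    return ET.tostring(root, encoding="unicode")


# Invented sample; only the element names and ids come from the thread.
SAMPLE = """<TEI.2><text><body>
<div1 id="csr06-0052"><p>Some long text</p></div1>
<div1 id="csr06-0053"><p>More text</p></div1>
</body></text></TEI.2>"""

skeleton = make_skeleton(SAMPLE)
print(skeleton)
```

The result can then be stored in a test collection and queried with the same expressions as the full document. If the failure persists, the skeleton is the smaller file to attach to a bug report.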