From: Michael B. <mbe...@mb...> - 2007-08-08 06:56:17
Hugh Cayless wrote:

> (NativeElementIndex.java [findDescendantsByTagName]:798) - Found wrong prefix len: 4. Previous: 1.3.4.51

The pertinent part of the stack trace suggests you have uncovered a bug in the newer indexing code. Normally, I'd advise just zipping up one of the problem files as-is and sending it to Wolfgang, but he and the other developers have a huge amount on their plates right now, and it might be helpful to try to narrow down the possibilities a bit more first.

Suggestions for doing that are a bit hard to formulate because of the terminology in your posting. I'm sure that what you call "documents" have every right to that name from the perspective of your users, but they pretty plainly aren't "documents" in an XML sense, nor are the TEI document instances. In the terms we need to use here, the documents are what you call your "files", and that is what I shall mean by "document" from here on.

1. Try to determine whether the error is in the generation or the traversal of the indexes. What happens if you completely re-index the collections concerned? If the queries then work as expected, things went wrong with the numbering scheme on the earlier indexing run but not when it was repeated.

2. If you get the same wrong results and error report after a complete re-index, try to make eXist follow a partially different evaluation strategy for the query, e.g. query on //*[@id='csr06-0052'] or //*[matches(@id, 'csr06-0052')]. Not that I would recommend either of these for performance reasons, but the aim is to see if different paths through the matching code hit the same bug. You could also modify the queries to be a lot more structurally explicit, again to alter eXist's internal evaluation strategy. I don't know your markup details, but something like //text/group/text/body/div1[@id='csr06-0052'].

   More drastically still, remove your custom index configuration files (I assume you have configured range indexes on those div ids; otherwise you would have a very long wait even if the right results were forthcoming), reindex (vital to do that), and then retry your queries. Again, not a production suggestion, but simply a means of trying to pinpoint the portion of the eXist code where things are going wrong.

3. If the problem persists, try to concentrate investigations on structure rather than possible side effects of textual quantity. Put one of the large documents through an XSLT identity transform whose text() node handler does nothing (a NOP). That will give you the same structure for eXist to index, but without the textual bulk. Load the result into eXist and try the query again. If you get the same failure on this structural skeleton, then that would probably be a suitable testbed document to send for developer scrutiny.

   But before doing that, if you understand the logic of eXist's internal ID generation scheme, you could output the skeleton after turning on addition of all eXist IDs in the serialiser and then scrutinise the IDs that have been assigned to the divs concerned to see if they make sense.

Michael Beddow
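
The structural skeleton described in step 3 can also be sketched outside XSLT. What follows is a minimal Python stand-in for the identity transform, not the transform itself, and it rests on some assumptions: the function name structural_skeleton and the sample markup are invented for illustration, and ElementTree's default parser drops comments, processing instructions and the DOCTYPE, so the node count may differ slightly from what an XSLT identity transform would produce.

```python
import xml.etree.ElementTree as ET


def structural_skeleton(xml_text: str) -> str:
    """Return the document with all text removed but elements and attributes kept.

    Hypothetical helper for illustration only; it is not part of eXist.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.text = None  # text between the start tag and the first child
        elem.tail = None  # text between this element's end tag and the next node
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Invented sample; the real input would be one of the large documents.
    sample = (
        '<text><body><div1 id="csr06-0052">'
        "<p>Some <hi>long</hi> prose.</p>"
        "</div1></body></text>"
    )
    print(structural_skeleton(sample))
```

The output keeps every element and attribute, including the id values the failing queries rely on, so it can be loaded into eXist and queried exactly as the original document was.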