On Wed, 2007-08-08 at 07:58 +0100, Michael Beddow wrote:
> Hugh Cayless wrote:
> > (NativeElementIndex.java [findDescendantsByTagName]:798) - Found
> > wrong prefix len: 4. Previous: 126.96.36.199
> The pertinent part of the stack trace suggests you have uncovered a bug in
> the newer indexing code.
> Normally, I'd advise just zipping up one of the problem files as-is and
> sending it to Wolfgang, but he and the other developers have a huge amount
> on their plates right now, and it might be helpful to try to narrow down the
> possibilities a bit more first.
> Suggestions for doing that are a bit hard to formulate because of the
> terminology in your posting. I'm sure that what you call "documents" have
> every right to that name from the perspective of your users, but they pretty
> plainly aren't "documents" in an XML sense, nor are the TEI document
> instances. In the terms we need to use here, the documents are what you
> call your "files", and that is what I shall mean by "document" from here on.
No, that's right. It's a bit confusing because these are compilations of
original documents, but each compilation is a single XML document, as
> 1. Try to determine whether the error is in the generation or traversal of
> the indexes. What happens if you completely re-index the collections
> concerned? If the queries then work as expected, then things went wrong with
> the numbering scheme on the earlier indexing run but not when it was
I'd neglected to mention that I already tried that. Reindexing makes no difference.
> 2. If you get the same wrong results and error report after a complete
> re-index, try to make eXist follow a partially different evaluation strategy
> for the query, e.g. query on //*[@id='csr06-0052'] or //*[matches(@id,
> 'csr06-0052')]. Not that I would recommend either of these for performance
> reasons, but the aim is to see if different paths through the matching code
> hit the same bug.
for $div in //*[@id='csr06-0052']
does return the right results, as does
for $div in //TEI.2/text/(body | front)/div1[@id='csr06-0053']
That is encouraging.
> You could also modify the queries to be a lot more
> structurally explicit, again to alter eXist's internal evaluation strategy.
> I don't know your markup details, but something like
> //text/group/text/body/div1[@id='csr06-0052']. More drastically still, remove
> your custom index configuration files (I assume you have configured range
> indexes on those div ids, otherwise you would have a very long wait even if
> the right results were forthcoming), reindex (vital to do that), and then
> retry your queries.
I actually hadn't done any range indexing yet. This is at the early
stages of development, so I haven't reached the "make it run faster"
stage. In any case, performance is really pretty good even without
them, in the tens of milliseconds. eXist is a really impressive piece of
> Again, not a production suggestion, but simply a means
> of trying to pinpoint the portion of the eXist code where things are going
> 3. If the problem persists, try to concentrate investigations on structure
> rather than possible side effects of textual quantity. Put one of the large
> documents through an identity transform with the text() node handler doing a
> NOP. That will give you the same structure for eXist to index, but without
> the textual bulk. Load the result into eXist and try the query again. If you
> get the same failure on this structural skeleton, then that would probably
> be a suitable testbed document to send for developer scrutiny. But before
> doing that, if you understand the logic of eXist's internal ID generation
> scheme, you could output the skeleton after turning on addition of all eXist
> IDs in the serialiser and then scrutinise the IDs that have been assigned to
> the divs concerned to see if they make sense.
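For illustration, here is a minimal sketch of step 3 done outside eXist. It uses Python's ElementTree in place of the XSLT identity transform described (a substitution, not what was proposed): every element and attribute is kept and all character data is dropped. It also includes a hypothetical sanity check for the exported IDs. That check assumes the serialiser writes them as an exist:id attribute in the namespace http://exist.sourceforge.net/NS/exist, and that each element's ID is its parent's dotted ID plus exactly one more level (for example 1.3 under 1). Both points are assumptions to verify against your own eXist output, and the check ignores any other ID syntax. The names skeleton, check_ids and EXIST_ID are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace of the exist:id attribute added by eXist's serializer
# when add-exist-id is enabled. Verify against your own serialized output.
EXIST_NS = "http://exist.sourceforge.net/NS/exist"
EXIST_ID = "{%s}id" % EXIST_NS


def skeleton(root):
    """Strip all character data in place, keeping elements and attributes.

    Same effect as an identity transform whose text() template does nothing.
    ElementTree's default parser already discards comments and PIs.
    """
    for elem in root.iter():
        elem.text = None
        elem.tail = None
    return root


def check_ids(elem, parent_id=None, problems=None):
    """Collect (tag, parent_id, node_id) for each element whose id is not its
    parent's id extended by exactly one dotted level (hypothetical check)."""
    if problems is None:
        problems = []
    node_id = elem.get(EXIST_ID)
    if node_id is not None and parent_id is not None:
        extends_parent = node_id.startswith(parent_id + ".")
        one_level_deeper = node_id.count(".") == parent_id.count(".") + 1
        if not (extends_parent and one_level_deeper):
            problems.append((elem.tag, parent_id, node_id))
    for child in elem:
        # An element without an id breaks the chain: its children are not
        # compared against any ancestor further up.
        check_ids(child, node_id, problems)
    return problems


if __name__ == "__main__":
    # Made-up sample: the div1 id deliberately does not extend its parent's id.
    sample = (
        '<TEI.2 xmlns:exist="http://exist.sourceforge.net/NS/exist" exist:id="1">'
        '<text exist:id="1.1">Some <hi exist:id="1.1.2">prose</hi> here'
        '<div1 id="csr06-0052" exist:id="1.3.1">More text</div1>'
        "</text></TEI.2>"
    )
    root = skeleton(ET.fromstring(sample))
    print(ET.tostring(root, encoding="unicode"))
    for tag, parent, node in check_ids(root):
        print(f"suspicious id on {tag}: parent {parent}, node {node}")
```

The skeleton keeps the div id attributes, so the same queries can be rerun against it once it is stored in eXist.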
I'm afraid I haven't been able to make the ID generation work. Setting
add-exist-id to "all" in the conf.xml file and restarting has no effect.
Something else must be overriding it, but I won't have time to fool with
it more until tomorrow at the earliest, unfortunately. I'll make a bug
report and attach my skeleton XML file. At least there seems to be a
workaround for the specific problem I was having. Thanks!
> Michael Beddow
> Exist-open mailing list