Re: [Exist-open] Odd problem with large files.


Hugh Cayless wrote:

> (NativeElementIndex.java [findDescendantsByTagName]:798) - Found wrong
> prefix len: 4. Previous: 1.3.4.51
>

The pertinent part of the stack trace suggests you have uncovered a bug in
the newer indexing code.

Normally, I'd advise just zipping up one of the problem files as-is and
sending it to Wolfgang, but he and the other developers have a huge amount
on their plates right now, and it might be helpful to try to narrow down the
possibilities a bit more first.

Suggestions for doing that are a bit hard to formulate because of the
terminology in your posting. I'm sure that what you call "documents" have
every right to that name from the perspective of your users, but they pretty
plainly aren't "documents" in an XML sense, nor are the TEI document
instances. In the terms we need to use here, the documents are what you
call your "files", and that is what I shall mean by "document" from here on.

1. Try to determine whether the error is in the generation or traversal of
the indexes. What happens if you completely re-index the collections
concerned? If the queries then work as expected, then things went wrong with
the numbering scheme on the earlier indexing run but not when it was
repeated.

2. If you get the same wrong results and error report after a complete
re-index, try to make eXist follow a partially different evaluation strategy
for the query, e.g. by querying on //*[@id='csr06-0052'] or
//*[matches(@id, 'csr06-0052')]. Not that I would recommend either of these for performance
reasons, but the aim is to see if different paths through the matching code
hit the same bug. You could also modify the queries to be a lot more
structurally explicit, again to alter eXist's internal evaluation strategy.
I don't know your markup details, but something like
//text/group/text/body/div1[@id='csr06-0052']. More drastically still, remove
your custom index configuration files (I assume you have configured range
indexes on those div ids, otherwise you would have a very long wait even if
the right results were forthcoming), reindex (vital to do that), and then
retry your queries. Again, not a production suggestion, but simply a means
of trying to pinpoint the portion of the eXist code where things are going
wrong.
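
To try the query variants from step 2 side by side, something along the
following lines might help. It is only a sketch in Python: it builds request
URLs that pass each query to eXist's REST interface in the _query parameter.
The base URL and collection path are assumptions of mine, not details from
your posting, the matches() variant is written as a predicate, and the
explicit path is a guess at your markup, so adjust all of them to your setup.

```python
from urllib.parse import urlencode, quote

# Assumed values, not taken from the original posting: adjust to your installation.
BASE_URL = "http://localhost:8080/exist/rest"  # hypothetical REST endpoint
COLLECTION = "/db/tei"                         # hypothetical collection path
TARGET_ID = "csr06-0052"


def query_variants(target_id):
    """Return differently shaped queries that should all select the same element."""
    return {
        # Generic attribute comparison on any element.
        "any-element": f"//*[@id='{target_id}']",
        # Regular-expression match on the attribute value.
        "regex-match": f"//*[matches(@id, '{target_id}')]",
        # Structurally explicit path (the element names are a guess at the markup).
        "explicit-path": f"//text/group/text/body/div1[@id='{target_id}']",
    }


def rest_url(xquery, base_url=BASE_URL, collection=COLLECTION):
    """Build a GET URL that hands the query to the server in the _query parameter."""
    return f"{base_url}{quote(collection)}?{urlencode({'_query': xquery})}"


if __name__ == "__main__":
    for name, xquery in query_variants(TARGET_ID).items():
        print(name, rest_url(xquery))
```

Requesting each printed URL in turn (in a browser or with any HTTP client)
and comparing both the results and the server log should show whether all
variants run into the same "wrong prefix len" report.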

3. If the problem persists, try to concentrate investigations on structure
rather than possible side effects of textual quantity. Put one of the large
documents through an identity transform with the text() node handler doing a
NOP. That will give you the same structure for eXist to index, but without
the textual bulk. Load the result into eXist and try the query again. If you
get the same failure on this structural skeleton, then that would probably
be a suitable testbed document to send for developer scrutiny. But before
doing that, if you understand the logic of eXist's internal ID generation
scheme, you could output the skeleton after turning on addition of all eXist
IDs in the serialiser and then scrutinise the IDs that have been assigned to
the divs concerned to see if they make sense.
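
If you have no stylesheet to hand for the skeleton in step 3, a rough
equivalent can be sketched in Python instead of XSLT. This is my own
illustration, not anything eXist provides: it keeps every element and
attribute and drops all character data. It assumes the document parses
without needing an external DTD or undeclared entities, and the default
parser used here also discards comments and processing instructions, so the
result is not a strict identity copy in that respect. The file names are
placeholders.

```python
import xml.etree.ElementTree as ET


def strip_text(elem):
    """Remove all character data at and below elem, keeping elements and attributes."""
    for node in elem.iter():
        node.text = None  # text before the first child
        node.tail = None  # text following the element's end tag
    return elem


def make_skeleton(src_path, dst_path):
    """Write a text-free structural copy of the document at src_path to dst_path."""
    tree = ET.parse(src_path)  # note: comments and PIs are dropped by the default parser
    strip_text(tree.getroot())
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder file names; substitute one of the large problem documents.
    make_skeleton("large-document.xml", "large-document-skeleton.xml")
```

The skeleton file can then be stored in a scratch collection and queried for
the same id, which keeps the test case small enough to pass on.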

Michael Beddow
