Re: [Exist-open] Some questions about qname indexes

Hi Kai,

> Here some examples (docs + index conf attached). The following query yields 11 hits with
> the "old" configuration and 5 hits with the qname configuration:

The differing query results were caused by differences in whitespace
handling and text tokenization between the standard full text index
and the index configured by QName. Fixing this problem wasn't that
easy, especially since the old indexer code was a bit chaotic. I thus
decided to migrate the relevant parts of the code to our new
modularized indexing framework, which has a much cleaner design (the
switch to the new architecture was planned for later this year, but
now we have already done a part of it).

The index configuration is now more consistent:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <fulltext default="none">
            <!-- 1. tokenizer splits text at element boundaries -->
            <include path="/elem"/>
            <create qname="elem"/>

            <!-- 2. ignore element boundaries, index as mixed content -->
            <include path="/elem" content="mixed"/>
            <create qname="elem" content="mixed"/>
        </fulltext>
    </index>
</collection>

Without the content="mixed" attribute, the tokenizer will split the
text at element boundaries, i.e. a word like "unexpected" that has
inline markup between "un" and "expected" will result in 2 tokens in
the index: "un" and "expected". If you add content="mixed",
"unexpected" will be treated as 1 token!
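
To make the difference between the two modes concrete, here is a
rough Python sketch. It is not eXist's tokenizer, only a simulation
of the two behaviours described above. The sample document, the <b>
element and the function names are assumptions made up for the
illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the <b> element is an assumption for illustration,
# it does not come from the original documents.
SAMPLE = "<elem>un<b>expected</b> result</elem>"


def tokens_split_at_boundaries(root):
    """Tokenize every text node on its own.

    Simulates indexing WITHOUT content="mixed": an element boundary
    always ends the current token.
    """
    tokens = []
    for text in root.itertext():
        tokens.extend(re.findall(r"\w+", text))
    return tokens


def tokens_mixed(root):
    """Concatenate all text nodes first, then tokenize.

    Simulates indexing WITH content="mixed": element boundaries are
    ignored, so a word may span several text nodes.
    """
    return re.findall(r"\w+", "".join(root.itertext()))


root = ET.fromstring(SAMPLE)
split_tokens = tokens_split_at_boundaries(root)
mixed_tokens = tokens_mixed(root)

print("split at boundaries:", split_tokens)
print("mixed content:      ", mixed_tokens)
```

In the first list "un" and "expected" show up as separate tokens, in
the second "unexpected" is a single token. A search for "unexpected"
would therefore only match in the second case.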

For your use case - querying the entire <content> element with all
subelements - you should create an index without the content="mixed"
attribute.

Wolfgang
