From: Wolfgang M. <wol...@ex...> - 2005-05-24 20:05:38
> Everybody has his own needs. IMHO, case insensitivity is foreign to the
> XML spirit (despite the everyday usage of ISO-XXXX users... like me).

Yes, but I also share Michael's argument that case-insensitive matching
is important if you want to search mixed content efficiently. I'm
currently trying to figure out a good compromise that serves all users ;-)

> To make everybody happy, see
> http://sourceforge.net/tracker/index.php?func=detail&aid=1069335&group_id=17691&atid=367691
> : the basic idea is to send a "stream" (element content, attribute
> content, mixed content, whatever in fact...) to an analyzer that, in
> turn, generates positioned tokens in the index files (positioning is
> important with ambiguous tokens, phrase queries...).
>
> Tokenization, transformation, and filtering would thus be the
> analyzer's job and, therefore, Lucene's contributors' ;-)

Integrating Lucene's analyzer is on my wish list too. I had a look at
the sources a few days ago. I would really like to work on an
integration, but I currently lack the time.

> Sorry to come back to this issue with not even a little patch to help.
> I'm currently tracing eXist's calls to try to understand its indexing
> policy, but it seems that indexing code is becoming more and more
> present throughout eXist's low-level classes.

The range index compares entire node values, so it does not need
tokenization. The main class to be changed is NativeTextEngine. It uses
the package org.exist.storage.analysis for tokenization, in particular
the Tokenizer interface. I think this is the point where Lucene's
analyzer architecture would have to be plugged in, at least for a start.

Wolfgang
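For illustration, the "positioned tokens" idea discussed above (an analyzer that case-folds tokens and records their positions, so case-insensitive and phrase queries become possible) could look roughly like the following. This is a minimal standalone sketch, not eXist's actual Tokenizer interface or Lucene's Analyzer API; the names PositionedAnalyzer, Token, and analyze are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionedAnalyzer {

    // A token plus its ordinal position in the stream. Positions let a
    // phrase query check that matched tokens are adjacent, and help
    // disambiguate repeated/ambiguous tokens.
    public static final class Token {
        public final String text;
        public final int position;
        public Token(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    // Splits the incoming "stream" (element content, attribute content,
    // mixed content...) on non-letter characters, lowercases each token
    // (case folding, so matching is case-insensitive), and records its
    // position. In a real integration, tokenization and filtering would
    // be delegated to a pluggable analyzer chain.
    public static List<Token> analyze(String stream) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        for (String raw : stream.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) continue;   // skip empty leading split
            tokens.add(new Token(raw.toLowerCase(), pos++));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze("Mixed Content, efficiently Searched")) {
            System.out.println(t.position + ": " + t.text);
        }
    }
}
```

With this shape, swapping in a different analyzer (stemming, stop-word filtering, locale-aware folding) only changes how `analyze` produces tokens; the index format, which stores token text plus position, stays the same.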