From: Pierrick B. <pie...@fr...> - 2005-05-24 17:11:00
Hi all,

Michael Beddow wrote:
> Wolfgang wrote:
>
>> * fn:matches is now case sensitive by default - unless the flag "i" is
>> specified
>> * strings are stored case sensitive in the range index, so fn:matches
>> is index-based for case-sensitive pattern matching. Contrary to that,
>> if the "i" flag is specified, fn:matches does NOT use the index.
>
> This is a tricky one. I agree that previous situation where eXist's default
> behaviour was at odds with the spec was highly undesirable. But the problem
> now is that case insensitive searches are precisely what I need to do with
> range indexes, so I can't afford the performance hit of losing that
> facility; and because of the mixed-content issue I can't switch to the
> fulltext index for those searches.
>
> Would it be possible to make case insensitivity of the range indexing a
> configurable indexer option?

Everybody has his own needs. IMHO, case insensitivity is foreign to the XML spirit (despite the everyday usage of ISO-XXXX users... like me). I admit, however, that Unicode case folding could be considered (see Sjur's message).

To make everybody happy, see
http://sourceforge.net/tracker/index.php?func=detail&aid=1069335&group_id=17691&atid=367691
: the basic idea is to send a "stream" (element content, attribute content, mixed content, whatever in fact...) to an analyzer that, in turn, generates positioned tokens in the index files (positioning is important with ambiguous tokens, phrase queries...).

Tokenization, transformation and filtering would thus be the analyzer's job and, therefore, that of Lucene's contributors ;-)

Sorry to come back to this issue without even a small patch to help. I'm currently tracing eXist's calls to try to understand its indexing policy, but it seems that indexing code is becoming more and more present throughout eXist's low-level classes.
Does anyone have documentation or diagrams that could help me make more concrete proposals? Well, I have to have a look at today's commits (javadocs)... Don't be offended; I'd really like eXist to have a smart indexing/search engine... as we all do, don't we?

Cheers,

p.b.
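
To illustrate the analyzer idea described above, here is a minimal Java sketch of a component that takes a text "stream" and emits positioned tokens, with case handling as a configurable option. Everything in it is hypothetical: the class name PositionedAnalyzerSketch, the Token type and the foldCase parameter are invented for illustration and are neither eXist's nor Lucene's API. Lower-casing with Locale.ROOT is used as a simple stand-in for real Unicode case folding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch only: not eXist's indexer and not Lucene's Analyzer API. */
public class PositionedAnalyzerSketch {

    /** A term together with its ordinal position in the analyzed stream. */
    public static final class Token {
        public final String term;
        public final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /**
     * Splits the stream on runs of characters that are neither letters nor
     * digits, and numbers the resulting tokens from 0.
     * foldCase = true lower-cases each term (an approximation of case folding);
     * foldCase = false keeps terms case sensitive.
     */
    public static List<Token> analyze(String stream, boolean foldCase) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String raw : stream.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator yields one empty string
            }
            String term = foldCase ? raw.toLowerCase(Locale.ROOT) : raw;
            tokens.add(new Token(term, position++));
        }
        return tokens;
    }

    /** Phrase check: do the two terms occur at adjacent positions, in order? */
    public static boolean adjacent(List<Token> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).term.equals(first) && tokens.get(i + 1).term.equals(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> tokens = analyze("Case Folding, in eXist", true);
        System.out.println(tokens);
        System.out.println(adjacent(tokens, "case", "folding"));
    }
}
```

Keeping the position with each token is what makes the phrase check possible, and making foldCase a parameter of the analyzer is one way the "configurable indexer option" requested in the quoted message could be expressed without touching the rest of the indexing code.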