From: Wolfgang M. <sb...@we...> - 2002-02-14 17:22:59
You're right: the indexer is currently not very intelligent and treats punctuation characters as word boundaries. I planned to change this in the last release but finally forgot. I will try to write a better tokenizer class if I have some spare time today or tomorrow. However, it's not always easy to define the correct behaviour: some punctuation characters actually mark word boundaries, while others do not, as in 'Q.4068'. I would welcome any suggestions about a good lexical analyzer we may use here.

Wolfgang

[snip]
>> Now if I run any of the following XPath queries I get an empty result:
>> collection('/test')/person[. &= 'My_city']
>> collection('/test')/person[. &= '5/46']
>> collection('/test')/person[. &= 'Q.4068']
>> collection('/test')/person[. &= '123\456']
>>
>> But if I run one of the following then the document is returned:
>> collection('/test')/person[. &= 'My city']
>> collection('/test')/person[. &= '5 46']
>> collection('/test')/person[. &= 'Q 4068']
>> collection('/test')/person[. &= '123 456']
>>
>
> My data doesn't normally contain such characters, but I've created some
> items that do, and I can reproduce the problem. I guess the indexer is
> treating "punctuation" characters as word boundaries, so My_city is getting
> indexed as two separate words, hence the ability to retrieve it only when
> the underscore is left out. I fear this looks like a bug....
>
> Michael
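For what it's worth, one simple heuristic along these lines: treat a punctuation character as part of a token only when it sits directly between two alphanumeric characters (so 'Q.4068' and 'My_city' stay whole, while a trailing full stop still ends a word). A minimal sketch in Java (hypothetical, not the actual eXist indexer code; class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTokenizer {

    // Split text into tokens. Punctuation between two alphanumeric
    // characters (e.g. the '.' in "Q.4068" or '_' in "My_city") is kept
    // inside the token; all other punctuation acts as a word boundary.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0
                    && i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i + 1))) {
                // inner punctuation: keep it as part of the current token
                current.append(c);
            } else if (current.length() > 0) {
                // boundary punctuation or whitespace: flush the token
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [He, lives, in, My_city, code, Q.4068]
        System.out.println(tokenize("He lives in My_city, code Q.4068."));
    }
}
```

This is only a starting point: it would still merge things one might want split (e.g. hyphenated compounds), so a real analyzer probably needs per-character rules or a configurable character class table.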