Problem with UTF8 text containing "lock" symbol
DocFetcher does not index (and therefore cannot search) Cyrillic text if the file contains the 🔒 symbol and is encoded as UTF-8.
For example, create a .txt or .html file that contains "мама папа hello 🔒 world" and try to index it.
DocFetcher will find "hello" and "world" but will not find "мама" and "папа".
Without this symbol, everything works as expected.
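To make the test file reproducible, here is a minimal Python sketch that writes the sample text as UTF-8 without a byte order mark. The file name lock-test.txt and the helper write_sample are only assumptions for illustration; any name works.

```python
from pathlib import Path

# Sample text from the report: Cyrillic words, ASCII words,
# and the lock symbol (U+1F512).
SAMPLE = "мама папа hello \U0001F512 world"


def write_sample(path):
    """Write the sample text as UTF-8 (no BOM) and return the path."""
    target = Path(path)
    target.write_text(SAMPLE, encoding="utf-8")
    return target


if __name__ == "__main__":
    # "lock-test.txt" is a hypothetical file name chosen for this example.
    print(write_sample("lock-test.txt"))
```

Put the resulting file into an indexed folder, rebuild the index, and search for "мама" and "hello".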
Anonymous
Hi,
in the Preferences, you can try some of the other word segmentation options. This might make the search work with this special character, but will also significantly impact the search results.
Regards
q:-) <= Quang
Hi.
I think the problem is in the codepage detection.
The lock symbol makes the indexer use the wrong encoding.
Please look at my screenshot.
At the bottom you can see that the found content is shown in the wrong codepage, not the one used by the indexed document.
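A small Python sketch of why a wrong codepage would produce exactly these symptoms. Which encoding DocFetcher actually picks is not known from this thread; latin-1 is used here purely as an assumed stand-in for "some single-byte codepage", and misdecode is a hypothetical helper.

```python
SAMPLE = "мама папа hello \U0001F512 world"


def misdecode(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with another codepage."""
    return text.encode("utf-8").decode(wrong_encoding)


# latin-1 is only an assumption; the real misdetected encoding is unknown.
garbled = misdecode(SAMPLE)

for word in ("hello", "world", "мама", "папа"):
    # ASCII bytes mean the same thing in both encodings, so the English
    # words survive. The Cyrillic words become pairs of unrelated characters.
    print(word, word in garbled)
```

The ASCII words are still present in the misdecoded text, while the Cyrillic words are not, which matches what the search returns.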
Yes, it looks like the lock symbol causes DocFetcher to pick the wrong encoding. You can force it to use a particular encoding, but this will then apply to all text files. To force the encoding, open the file "program-conf.txt" and alter the setting "TextEncodingOverride". In this case, the following value works:
TextEncodingOverride = utf-8
Then save the file, restart the program, and rebuild all the relevant indexes.
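Since the override applies to all text files, it may be worth checking beforehand whether any of them are not valid UTF-8 (for example, older files saved as windows-1251), as those would presumably be misread once the override is active. A minimal sketch, assuming the files of interest match *.txt; find_non_utf8 and is_valid_utf8 are hypothetical helpers, not part of DocFetcher:

```python
from pathlib import Path


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8(folder, pattern="*.txt"):
    """Return files under folder whose bytes do not decode as UTF-8."""
    return sorted(
        path
        for path in Path(folder).rglob(pattern)
        if path.is_file() and not is_valid_utf8(path.read_bytes())
    )


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder you index.
    for path in find_non_utf8("."):
        print(path)
```

Any file it lists would need to be converted to UTF-8 first. Note that plain ASCII files always pass this check.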
Alternatively, you can try the commercial DocFetcher Pro. It seems to handle this case just fine, without any text encoding overrides.
Will be fixed in DocFetcher 1.1.27.