Problem with UTF8 text containing "lock" symbol

Desktop search application

Brought to you by: qforce

#2403 Problem with UTF8 text containing "lock" symbol

Milestone: v1.0_(example)

Status: closed

Owner: nobody

Labels: None

Priority: 1

Updated: 2026-01-07

Created: 2024-12-10

Creator: Anonymous

Private: No

DocFetcher does not index (and therefore cannot search) Cyrillic text if the file contains the symbol 🔒 and is encoded as UTF-8.
For example, create a txt or html file that contains "мама папа hello 🔒 world" and try to index it.
DocFetcher will find "hello" and "world" but will not find "мама" and "папа".
Without this symbol, everything works as expected.
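
To make the reproduction exact, the test file can be generated with a short script so that the encoding is guaranteed to be UTF-8. This is only a sketch: the file name `lock-test.txt` and the helper `write_sample` are made up for illustration and are not part of DocFetcher.

```python
import tempfile
from pathlib import Path

# The sample line from the report above.
SAMPLE = "мама папа hello 🔒 world"


def write_sample(directory: Path) -> Path:
    """Write the sample text as UTF-8 (no BOM) and return the file path."""
    path = directory / "lock-test.txt"  # hypothetical file name
    path.write_bytes(SAMPLE.encode("utf-8"))
    return path


with tempfile.TemporaryDirectory() as tmp:
    sample_path = write_sample(Path(tmp))
    raw = sample_path.read_bytes()

# The lock symbol is U+1F512, which takes four bytes in UTF-8,
# while each Cyrillic letter takes two.
lock_bytes = "🔒".encode("utf-8")
print(lock_bytes.hex(" "))  # f0 9f 94 92
```

Pointing DocFetcher at a folder containing such a file should be enough to check whether "мама" and "папа" are found.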

1 Attachment

1.jpg

Discussion

Nam-Quang Tran - 2024-12-11

Hi,

in the Preferences, you can try some of the other word segmentation options. This might make the search work with this special character, but will also significantly impact the search results.

Regards
q:-) <= Quang

  
Anonymous - 2024-12-13

Hi.
I think the problem is in the codepage detection.
This "lock" symbol makes the indexer use the wrong encoding.
Please look at my screenshot.
At the bottom you can see that the found content is shown in the wrong codepage, not the one used by the indexed document.
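
The effect described here can be illustrated outside DocFetcher. In the sketch below, Windows-1251 is only an assumed stand-in for "the wrong codepage"; the thread does not say which encoding DocFetcher actually picked. The point is that the ASCII words survive a wrong decode while the Cyrillic words turn into different characters, so a search for them cannot match.

```python
SAMPLE = "мама папа hello 🔒 world"
raw = SAMPLE.encode("utf-8")

# Correct decode versus a decode with an assumed wrong codepage.
right = raw.decode("utf-8")
wrong = raw.decode("cp1251", errors="replace")  # cp1251 is an assumption

print("мама" in right)   # True
print("мама" in wrong)   # False
print("hello" in wrong)  # True
```

This matches the reported symptom: "hello" and "world" are found, "мама" and "папа" are not.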
  
Nam-Quang Tran - 2024-12-13

Yes, it looks like the lock symbol causes DocFetcher to pick the wrong encoding. You can force it to use a particular encoding, but this will then apply to all text files. To force the encoding, open the file "program-conf.txt" and alter the setting "TextEncodingOverride". In this case, the following value works:

TextEncodingOverride = utf-8

Then save the file, restart the program, and rebuild all the relevant indexes.
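
The caveat that the override applies to all text files can be shown with a small sketch. The helper `decode_with_override` is hypothetical and only mimics the idea of a forced encoding; how DocFetcher itself treats bytes that are invalid in the forced encoding is not stated in this thread, so the use of replacement characters here is an assumption.

```python
def decode_with_override(data: bytes, override: str) -> str:
    """Hypothetical helper: decode with a forced encoding and no detection."""
    # Assumption: undecodable bytes become U+FFFD instead of raising an error.
    return data.decode(override, errors="replace")


# A UTF-8 file (the case from this ticket) and a legacy Windows-1251 file.
utf8_file = "мама папа hello 🔒 world".encode("utf-8")
cp1251_file = "мама папа".encode("cp1251")

forced_utf8 = decode_with_override(utf8_file, "utf-8")
forced_legacy = decode_with_override(cp1251_file, "utf-8")

print("мама" in forced_utf8)    # True
print("мама" in forced_legacy)  # False
```

So forcing utf-8 helps with files like the one in this ticket, but text files stored in other encodings may stop being searchable for non-ASCII words.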

Alternatively, you can try the commercial DocFetcher Pro. It seems to handle this case just fine, without any text encoding overrides.

Nam-Quang Tran - 2026-01-07

Will be fixed in DocFetcher 1.1.27.
