[CLucene-dev] Inquiry about CLucene's UTF-8 support


Dear developers,

I am using CLucene in my project and I would like to inquire about the
UTF-8 encoding support in the Standard Analyzer. Specifically, I would
like to know if the Standard Analyzer handles tokenization and text
processing correctly for non-Latin UTF-8 encoded text.

Could you please confirm whether the Standard Analyzer in CLucene has
built-in support for UTF-8 encoded text? If it does not, are there any
recommended alternative analyzers that handle non-Latin UTF-8 text
better?

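Below are the results of two queries against an index containing a
single CSV file. The Latin term matches, but the Greek term returns
nothing, and the "Searching for:" line is printed empty: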
Max Docs: 1
Num Docs: 1
Current Version: 1688707923968.0
Term count: 66

Enter query string: dignissimos
Searching for: dignissimos

0. /home/nonLatin100Rows.csv - 0.04746387

Search took: 0 ms.
Screen dump took: 0 ms.

Enter query string: διαχειριστής
Searching for:

Search took: 0 ms.
Screen dump took: 0 ms.
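
To help narrow down where the Greek query is lost, here is a minimal
sketch of the decoding step I would expect to happen before analysis.
It assumes the analyzer operates on decoded characters rather than raw
UTF-8 bytes, which is my assumption and not something I have confirmed
in the CLucene source. The names decodeUtf8, isGreek and
greekQueryBytes are hypothetical helpers, not CLucene API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into Unicode code points.
// Throws std::runtime_error on malformed input.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        char32_t minValue = 0;   // smallest code point allowed for this length
        std::size_t extra = 0;   // number of continuation bytes expected

        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; extra = 1; minValue = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; extra = 2; minValue = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; extra = 3; minValue = 0x10000;
        } else {
            throw std::runtime_error("invalid UTF-8 lead byte");
        }

        // The sequence must not run past the end of the input.
        if (bytes.size() - i <= extra) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates and out-of-range values.
        if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::runtime_error("invalid code point in UTF-8 input");
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// True if the code point lies in the Greek and Coptic block (U+0370..U+03FF).
bool isGreek(char32_t cp) {
    return cp >= 0x0370 && cp <= 0x03FF;
}

// The failing query from the log above, written as explicit UTF-8 bytes
// so the example does not depend on the source file's encoding.
std::string greekQueryBytes() {
    return "\xCE\xB4\xCE\xB9\xCE\xB1\xCF\x87\xCE\xB5\xCE\xB9"
           "\xCF\x81\xCE\xB9\xCF\x83\xCF\x84\xCE\xAE\xCF\x82";
}
```

If decodeUtf8(greekQueryBytes()) yields only Greek code points but the
search tool still prints an empty "Searching for:" line, the characters
are presumably being dropped later, either in the byte-to-character
conversion done by the demo program or in the tokenizer itself.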

Thank you for your time.

- Achyuth Pramod