From: Achyuth P. <ach...@gm...> - 2023-07-10 12:32:47
Dear developers,

I am using CLucene in my project and would like to ask about UTF-8 support in the Standard Analyzer. Specifically, does the Standard Analyzer tokenize and process non-Latin UTF-8 encoded text correctly?

Could you please confirm whether the Standard Analyzer in CLucene has built-in support for UTF-8 encoded text? If not, are there any recommended alternatives or additional analyzers that provide better support for non-Latin UTF-8 text?

Below are the search results for a few queries. The Latin query returns a hit, but the Greek query is echoed back empty and returns nothing:

Max Docs: 1
Num Docs: 1
Current Version: 1688707923968.0
Term count: 66

Enter query string: dignissimos
Searching for: dignissimos
0. /home/nonLatin100Rows.csv - 0.04746387
Search took: 0 ms.
Screen dump took: 0 ms.

Enter query string: διαχειριστής
Searching for:
Search took: 0 ms.
Screen dump took: 0 ms.

Thank you for your time.

- Achyuth Pramod