With the LuceneItalianTokenizer selected for an Italian to English translation, stemming does not work for glossary entries. For example, with paese and emettere added to a sample project glossary, the words paesi and emettono in a source segment are not recognized by the glossary function.
Switching to the Hunspell tokenizer resolved the problem for me, but I wanted to flag it for the Lucene tokenizer.
Version: OmegaT-5.7.0_0_8ae1ecfb5
Platform: Mac OS X 10.16
Java: 1.8.0_312 x86_64
Memory: 445MiB total / 324MiB free / 3641MiB max
Confirmed.
It seems related to the tokenizer and those specific terms, as some glossary entries are recognized (traduzione/traduzioni, frutto/frutti, lavoro/lavori, for instance), but others are not.
Switching to the Hunspell tokenizer does lead to different results, but it does not resolve the issue for me: it recognizes even fewer entries (paesi/emettono are still not matched to paese/emettere, and neither are other terms that Lucene did recognize).
Version: OmegaT-5.7.0 (8ae1ecfb)
Platform: Ubuntu 20.04
Java: 11.01.13 (64 bits)
Memory: 142 MB; 47 MB free; 1.410 MB max
Since OmegaT 6.1 beta has upgraded to LanguageTool 6.5 and Lucene 8.11.4, have there been any improvements to the Italian glossary stemming issues reported in ticket #1088? Specifically, does the newer LanguageTool version better handle stemmed forms like paesi/paese and emettono/emettere when using the LuceneItalianTokenizer?
Last edit: Hiroshi Miura 2025-08-18
OmegaT uses ItalianAnalyzer of the Lucene library. ItalianAnalyzer is implemented with ItalianTokenizer and ItalianLightStemFilter. ItalianLightStemFilter implements a specific algorithm described in "Report on CLEF-2001 Experiments" by Jacques Savoy (ItalianLightStemmer), which is designed to be conservative and avoid over-stemming.
REF: https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/it/ItalianLightStemmer.html
REF: https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf
This is why Lucene does not reduce "paesi" and "paese" to a common stem.
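To illustrate the behavior, here is a rough Python sketch of the light stemming rules as I understand them from the Lucene source. It is a simplified re-implementation for illustration only, not the library code: the name light_stem is hypothetical, and the lowercasing, elision and stop-word steps of the analyzer are left out. The assumed rules are that words shorter than 6 characters are left untouched, and that otherwise only the final vowel is dropped (together with a preceding "i" or "h" in some cases).

```python
# Approximate, simplified model of a conservative Italian light stemmer.
# Assumption: mirrors the rules of Lucene's ItalianLightStemmer as read
# from its source; it is NOT the library implementation.

# Map accented vowels to their plain forms (both strings have 20 characters).
ACCENTS = str.maketrans("àáâäòóôöèéêëùúûüìíîï", "aaaaooooeeeeuuuuiiii")


def light_stem(word: str) -> str:
    """Return a conservative stem of an already lowercased Italian word."""
    # Short words are returned unchanged, so "paese" and "paesi" never meet.
    if len(word) < 6:
        return word
    w = word.translate(ACCENTS)
    last, prev = w[-1], w[-2]
    if last in "ei":
        # Drop "-e"/"-i", plus a preceding "i" or "h" (e.g. "-ie", "-chi").
        return w[:-2] if prev in "ih" else w[:-1]
    if last in "ao":
        # Drop "-a"/"-o", plus a preceding "i" (e.g. "-ia", "-io").
        return w[:-2] if prev == "i" else w[:-1]
    return w


if __name__ == "__main__":
    pairs = [
        ("paese", "paesi"),
        ("emettere", "emettono"),
        ("traduzione", "traduzioni"),
        ("frutto", "frutti"),
        ("lavoro", "lavori"),
    ]
    for entry, inflected in pairs:
        a, b = light_stem(entry), light_stem(inflected)
        print(f"{entry} -> {a} | {inflected} -> {b} | match: {a == b}")
```

Under these assumed rules, the noun pairs of six or more letters collapse to the same stem, while paese/paesi (too short) and emettere/emettono (verb endings longer than one vowel) do not, which is consistent with the results reported above.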
Ticket moved from /p/omegat/bugs/1088/
Started implementing:
https://github.com/omegat-org/omegat/pull/1610