#1203 Update Lucene to 5.2.1

4.0
closed-fixed
None
5
2016-09-06
2016-03-24
No

LanguageTool requires Lucene 5.2.1 and we can't have two different versions, so for [#970] we must upgrade our tokenizers to 5.2.1 as well.

Items of note:

  • The Version API is no longer available, so it is no longer possible to specify a "tokenizer behavior" as in the past. The pulldown menus in Project Properties as well as the command line switches have been removed.
  • LuceneGermanTokenizer had an older behavior specified as default; this behavior has been reimplemented as a custom analyzer.
  • A few components within Lucene use features that are restricted when running under Java Web Start, so we have to ship a custom patched version of Lucene.
  • Due to the removal of deprecated APIs, the following tokenizers have been removed:
    • LuceneChineseTokenizer (this was never any better than the language-agnostic behavior)
    • All Snowball*Tokenizers
  • At the same time I have removed these tokenizers:
    • TinySegmenterTokenizer (this was always worse than LuceneJapaneseTokenizer)
    • LuceneKoreanTokenizer (this was always broken and is incompatible with Lucene 5)
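
The "language-agnostic behavior" mentioned for LuceneChineseTokenizer is not spelled out in this ticket. As a rough sketch of what tokenizing without language-specific rules can look like, the following uses only the JDK's `java.text.BreakIterator`. The class name `AgnosticTokenizerSketch` and the method `tokenize` are hypothetical, chosen for illustration; this is not OmegaT's code and makes no claim about how its default tokenizer is implemented.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not taken from OmegaT or Lucene.
class AgnosticTokenizerSketch {

    // Split text at the word boundaries reported by the JDK and keep only
    // the segments that contain at least one letter or digit, so pure
    // whitespace and punctuation segments are dropped.
    static List<String> tokenize(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world"));
    }
}
```

No stemming or language-specific analysis is applied here, which is the main difference from the Lucene-backed tokenizers discussed above.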

Related

Bugs: #813
Feature Requests: #970

Discussion

  • Didier Briel - 2016-09-06
    • status: open-fixed --> closed-fixed

  • Didier Briel - 2016-09-06

    Implemented in the released version 4.0 of OmegaT.

    Didier
