#1245 Perform case folding when looking up dictionary entries

4.0
closed-fixed
None
5
2016-09-06
2016-05-12
No

Currently, OmegaT only finds StarDict index entries when they are in lowercase. However, some dictionaries use uppercase entries. For instance, in the French Academy (1935) dictionary (http://download.huzheng.org/fr/), all entries are entirely in uppercase.

Simply converting all index entries to lowercase would be a limited strategy, as case is sometimes significant (e.g., in German, where nouns are capitalized).

Aaron proposes the following strategy:

  1. I think some kind of normalization is reasonable. For instance at the moment we aren't performing Unicode normalization on dictionary entries either, but really we should be.
  2. Lowercasing seems OK to me (yes, we should use the right locale). Regarding your point about German, even in English it would be better to distinguish between "post" (the physical object) and "POST" (the HTTP verb). It would take more memory (especially if a dictionary is entirely uppercase), but it seems like the smart thing to do would be to retain both the original key and the lowercased key when they differ.
  3. We should be doing the same normalization on the search words when doing lookup: (First, Unicode-normalize, then) look up the word as-is; if there are no hits then look up the lowercased word.
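The three points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not OmegaT's actual code: the class `CaseFoldingIndex` and its methods `add` and `lookup` are hypothetical names, NFC is assumed as the Unicode normalization form, and the locale is passed in by the caller.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of the proposed strategy; not an OmegaT class. */
class CaseFoldingIndex {
    private final Map<String, List<String>> entries = new HashMap<>();
    private final Locale locale;

    CaseFoldingIndex(Locale locale) {
        this.locale = locale;
    }

    /** Point 1: Unicode-normalize (NFC assumed here) so equivalent forms compare equal. */
    private static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Point 2: keep the original key and, when it differs, the lowercased key too. */
    void add(String headword, String article) {
        String key = normalize(headword);
        put(key, article);
        String lower = key.toLowerCase(locale);
        if (!lower.equals(key)) {
            put(lower, article);
        }
    }

    private void put(String key, String article) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(article);
    }

    /** Point 3: look up the word as-is; only if there are no hits, try the lowercased word. */
    List<String> lookup(String word) {
        String key = normalize(word);
        List<String> hits = entries.get(key);
        if (hits == null) {
            hits = entries.get(key.toLowerCase(locale));
        }
        return hits == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(hits);
    }
}
```

With this sketch, an entry added as "MAISON" is found when looking up "maison", "Maison", or "MAISON". If both "post" and "POST" are added, looking up "POST" returns only the uppercase entry, while "post" returns both, which is the memory-for-precision trade-off described in point 2.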

Didier

Discussion

  • Aaron Madlon-Kay

    • assigned_to: Aaron Madlon-Kay
     
  • Aaron Madlon-Kay

    "We should be doing the same normalization on the search words when doing lookup: (First, Unicode-normalize, then) look up the word as-is; if there are no hits then look up the lowercased word."

    We were actually already doing that, in DictionariesManager.findWords().

     
  • Aaron Madlon-Kay

    • summary: Load StarDict dictionary when index entries are in uppercase --> Perform case folding when looking up dictionary entries
    • status: open --> open-fixed
    • Group: future --> 4.0
     
  • Aaron Madlon-Kay

    This is addressed in trunk for both StarDict and LingvoDSL.

     
  • Didier Briel

    Didier Briel - 2016-09-06
    • status: open-fixed --> closed-fixed
     
  • Didier Briel

    Didier Briel - 2016-09-06

    Implemented in the released version 4.0 of OmegaT.

    Didier

     
