OmegaT - multiplatform CAT tool / Feature Requests / #1245 Perform case folding when looking up dictionary entries

The free computer aided translation (CAT) tool for professionals

#1245 Perform case folding when looking up dictionary entries

Milestone: 4.0

Status: closed-fixed

Owner: Aaron Madlon-Kay

Labels: None

Priority: 5

Updated: 2016-09-06

Created: 2016-05-12

Creator: Didier Briel

Private: No

Currently, OmegaT only finds StarDict index entries when they are in lowercase. However, some dictionaries are in uppercase. For instance, in the French Academy (1935) dictionary (http://download.huzheng.org/fr/) all entries are in full uppercase.

Converting all index entries to lowercase would be a limited strategy, as sometimes case is significant (e.g., German).

Aaron proposes the following strategy:

I think some kind of normalization is reasonable. For instance at the moment we aren't performing Unicode normalization on dictionary entries either, but really we should be.

Lowercasing seems OK to me (yes, we should use the right locale). Regarding your point about German, even in English it would be better to distinguish between "post" (the physical object) and "POST" (the HTTP verb). It would take more memory (especially if a dictionary is entirely uppercase), but it seems like the smart thing to do would be to retain both the original key and the lowercased key when they differ.

We should be doing the same normalization on the search words when doing lookup: (First, Unicode-normalize, then) look up the word as-is; if there are no hits then look up the lowercased word.
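The strategy described above could be sketched roughly as follows. This is only an illustration, not OmegaT's actual code: the class `CaseFoldingIndex` and its methods are hypothetical names, and the choice of NFC as the Unicode normalization form is an assumption. It stores each entry under its original key and, when they differ, under the lowercased key as well; lookup tries the word as-is first and falls back to the lowercased word.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the class used in OmegaT.
public class CaseFoldingIndex {
    private final Map<String, List<String>> entries = new HashMap<>();
    private final Locale locale;

    // The locale should be the dictionary's language, so that
    // lowercasing follows the right rules (e.g. Turkish dotless i).
    public CaseFoldingIndex(Locale locale) {
        this.locale = locale;
    }

    // Assumption: NFC is used as the normalization form.
    private static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Store the article under the original key and, if different,
    // under the lowercased key too.
    public void add(String key, String article) {
        String original = normalize(key);
        put(original, article);
        String lower = original.toLowerCase(locale);
        if (!lower.equals(original)) {
            put(lower, article);
        }
    }

    private void put(String key, String article) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(article);
    }

    // Look up the word as-is; if there are no hits, look up the
    // lowercased word.
    public List<String> lookup(String word) {
        String normalized = normalize(word);
        List<String> hits = entries.get(normalized);
        if (hits == null) {
            hits = entries.get(normalized.toLowerCase(locale));
        }
        return hits == null ? Collections.emptyList() : hits;
    }
}
```

With this sketch, an all-uppercase entry such as "MAISON" is found by searching for "maison", "Maison" or "MAISON", while a dictionary that contains both "post" and "POST" still returns only the "POST" article for an exact uppercase search. The cost is the extra memory for the duplicated keys.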

Didier

Discussion

Aaron Madlon-Kay - 2016-05-13

assigned_to: Aaron Madlon-Kay

Aaron Madlon-Kay - 2016-05-13

Prototype incorporating this, [#1124] and [#1242] available here: https://omegat.ci.cloudbees.com/job/omegat-prototype/42/

Related

Feature Requests: ~~#1124~~
Feature Requests: ~~#1242~~

Last edit: Aaron Madlon-Kay 2016-05-13


Aaron Madlon-Kay - 2016-05-13

> We should be doing the same normalization on the search words when doing lookup: (First, Unicode-normalize, then) look up the word as-is; if there are no hits then look up the lowercased word.

We were actually already doing that, in DictionariesManager.findWords().


Aaron Madlon-Kay - 2016-05-18

summary: Load StarDict dictionary when index entries are in uppercase --> Perform case folding when looking up dictionary entries

status: open --> open-fixed

Group: future --> 4.0

Aaron Madlon-Kay - 2016-05-18

This is addressed in trunk for both StarDict and LingvoDSL.


Didier Briel - 2016-09-06

status: open-fixed --> closed-fixed

Didier Briel - 2016-09-06

Implemented in the released version 4.0 of OmegaT.

Didier

