Re: [Dictionarymaker-devel] DM: import dictionary into project
From: Thomas F. <tfo...@us...> - 2006-08-01 13:23:46
>>> avrensbu <avr...@cs...> 08/01/06 11:06 AM >>>
> I want to confirm functionality for importing dictionary into
> project. My proposal follows. Please comment.
> 3. Show list of conflict words+phonemes+status
> 4. single/multi select word+phoneme+status and select Replace or
> ignore

This seems like a lot of work for the user. I think the system should
do most of this work (the whole "sane default system behaviour"
philosophy).

The way I see it, we always want to preserve (in our current
dictionary) any information that was explicitly provided by the user.
This is important because the dictionary file being imported is not
altered, but the file for the current dictionary is. If we overwrite
any information, it is thus lost.

Throughout the rest of this email, OW refers to the Old_Word (i.e. the
word in our current dictionary), and NW refers to the New_Word (i.e.
the word in the dictionary being imported).

These are the cases to consider:

1. If OW == null (i.e. this word is not in our dictionary), we import
   NW verbatim (checking graphemes and phonemes - see below).
2. If OW != null (i.e. exists in current dict) && OW.status == CORRECT,
   then skip NW completely (the user marked this pronunciation as
   CORRECT, so we shouldn't mess with it).
3. OW != null && OW.status == UNVERIFIED: if NW.status == UNVERIFIED,
   skip NW, otherwise import NW verbatim (checking phonemes - see
   below). In this case, the user provided no info for OW, and the only
   info we have for this word is (possibly) a list of phonemes as
   predicted by the system. If NW has more info, we should use that
   info.
4. OW != null && OW.status == INVALID or AMBIGUOUS: the user has
   specified that this word is invalid or ambiguous. We must not lose
   this info, so we skip NW.
5. OW != null && OW.status == UNCERTAIN: the user said they didn't know
   what the correct pronunciation for this word was. If NW.status is
   "stronger" (i.e. CORRECT, INVALID, or AMBIGUOUS) then we import NW
   verbatim, otherwise we skip it.
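
As a rough sketch of the five cases above: Python is used here purely
for illustration, and the names Status, Word, should_import and
import_dictionary are hypothetical, not taken from the DictionaryMaker
code. The grapheme and phoneme checks are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    CORRECT = "correct"
    INVALID = "invalid"
    AMBIGUOUS = "ambiguous"
    UNCERTAIN = "uncertain"
    UNVERIFIED = "unverified"


# Statuses that record an explicit user judgement about a pronunciation.
STRONG = {Status.CORRECT, Status.INVALID, Status.AMBIGUOUS}


@dataclass
class Word:
    text: str
    phonemes: List[str]
    status: Status


def should_import(ow: Optional[Word], nw: Word) -> bool:
    """Return True if NW should be imported (replacing OW when OW exists)."""
    if ow is None:                       # case 1: word not in our dictionary
        return True
    if ow.status in STRONG:              # cases 2 and 4: keep the user's info
        return False
    if ow.status is Status.UNVERIFIED:   # case 3: any non-UNVERIFIED NW wins
        return nw.status is not Status.UNVERIFIED
    # case 5: OW is UNCERTAIN, so only a "stronger" NW replaces it
    return nw.status in STRONG


def import_dictionary(current: Dict[str, Word], incoming: List[Word]) -> List[str]:
    """Merge incoming into current in place; return the words that were imported."""
    imported = []
    for nw in incoming:
        if should_import(current.get(nw.text), nw):
            current[nw.text] = nw
            imported.append(nw.text)
    return imported
```

Keeping the decision in one small function separate from the merge loop
would make it easy to swap in a different policy later, for example
prompting the user when both words carry equally strong statuses.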

The above implies 3 levels for word statuses:

  High:   CORRECT, AMBIGUOUS, INVALID
  Medium: UNCERTAIN
  Lowest: UNVERIFIED

Existing words are only altered when the new word has a higher status
level. When both words have the same status level, there are 3
possible approaches:

* keep the existing word as is (my proposed approach, described above)
* replace the existing word with the new one (I would not recommend
  this)
* prompt user to make a selection (similar to what Andries proposed; I
  prefer the first approach, as it is less burdensome/tedious for the
  user)

Whenever we add a word that was not previously in the dict, we must
check that all graphemes are valid (and import any new ones).

We must also check the phoneme list of any new words to be added, to
ensure that all phonemes are valid. If there are invalid phonemes, we
should either:

* skip that word
* import the phoneme (prompting for sound files, etc.)

I'm not sure which approach we should follow. Thoughts?

The phoneme list also needs to be checked when existing words are
changed.

m2c

--
Thomas Fogwill <tfo...@us...>