Re: [Dictionarymaker-devel] DM: import dictionary into project
From: Thomas F. <tfo...@us...> - 2006-08-02 07:56:14
Hi
On Wed, 2006-08-02 at 08:39 +0200, Marelie Davel wrote:
> I preferred Andries's approach, with the addition that 3 only applies
> to verified words (which I sort of assumed). In general, this process
> will be used to combine two dictionaries that may come from very
> different sources - it won't just be used to add a dictionary to a
> bootstrapped project. Also, before adding a new dictionary, I would
> encourage a user to export the current dictionary and rather create a
> new project, which means both source dictionaries will continue to exist
> separate from the new combined version.
>
> My suggestion (from the user's perspective):
> > 1. File browse. Pick dictionary file (verify that it is in a *.dict
> > format, whatever it is called)
> > 2. Add all words that do not already exist in dictionary (check for
> > missing graphemes)
> > 3. Show list of conflict words+phonemes+status, where conflict words
> are only those that have been verified on both sides, and conflict. If
> both are unverified, then just add the word (the pronunciation will be
> created through prediction based on the rule set, as usual.) If only
> one is verified, import that verdict, whatever it is. Also import the
> pronunciation if verdict==correct. (check for missing phonemes) Only for
> words that have conflicting verdicts, continue to 4.
> > 4. single/multi select word+phoneme+status and select Replace or
> > ignore
ok, cool. So, to check that we agree:
given OW and NW as before,
   1. if OW == null (i.e. this word is not in our first dictionary):
      import NW verbatim
   2. if OW != null (i.e. exists in first dict) && OW.status == CORRECT
      && NW.status == CORRECT && NW.phonemes != OW.phonemes: prompt
   3. if OW != null && OW.status == UNVERIFIED && NW.status !=
      UNVERIFIED: import NW verbatim
   4. if OW != null && (OW.status == INVALID || OW.status == AMBIGUOUS)
      && NW.status != OW.status: prompt
   5. if OW != null && OW.status == UNCERTAIN && NW.status is
      "stronger" (i.e. CORRECT, INVALID, or AMBIGUOUS): import NW
      verbatim
   6. in all other cases, keep OW
> The above implies 3 levels for word statuses:
> High: CORRECT, AMBIGUOUS, INVALID
> Medium: UNCERTAIN
> Lowest: UNVERIFIED
So, in short (using the levels defined above),
   * if NW.status.level > OW.status.level, import NW
   * if NW.status.level < OW.status.level, keep OW
   * if NW.status.level == OW.status.level
        * if NW.status == OW.status && NW.phonemes == OW.phonemes,
          keep OW
        * else prompt
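
To make the short version concrete, here is a rough Python sketch of the
level comparison. It is only an illustration, not DictionaryMaker code: the
names Status, LEVEL, Word and decide are made up for this example, and it
follows the three bullet points literally (so two entries on the same level
that differ in status or phonemes always end up in the prompt list).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    CORRECT = "correct"
    AMBIGUOUS = "ambiguous"
    INVALID = "invalid"
    UNCERTAIN = "uncertain"
    UNVERIFIED = "unverified"


# Levels as proposed in the thread: High (2), Medium (1), Lowest (0).
LEVEL = {
    Status.CORRECT: 2,
    Status.AMBIGUOUS: 2,
    Status.INVALID: 2,
    Status.UNCERTAIN: 1,
    Status.UNVERIFIED: 0,
}


@dataclass(frozen=True)
class Word:
    # Hypothetical record type; the real project classes may differ.
    text: str
    phonemes: Tuple[str, ...]
    status: Status


def decide(ow: Optional[Word], nw: Word) -> str:
    """Return "import", "keep" or "prompt" for one incoming word."""
    if ow is None:
        # Word is not in the existing dictionary: import verbatim.
        return "import"
    old_level, new_level = LEVEL[ow.status], LEVEL[nw.status]
    if new_level > old_level:
        return "import"
    if new_level < old_level:
        return "keep"
    # Same level: only identical status and phonemes can be kept silently.
    if nw.status == ow.status and nw.phonemes == ow.phonemes:
        return "keep"
    return "prompt"
```

All words for which decide() returns "prompt" could be collected into one
list and shown to the user in a single conflict dialog.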
The following still applies:
> Whenever we add a word that was not previously in the dict, we must
> check that all graphemes are valid (and import any new ones).
>
> We must also check the phoneme list of any new words to be added, to
> ensure that all phonemes are valid. If there are invalid phonemes, we
> should either:
> * skip that word
> * import the phoneme (prompting for sound files, etc.)
>
> I'm not sure which approach we should follow. Thoughts?
>
> The phoneme list also needs to be checked when existing words are
> changed.
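
The phoneme check itself could look roughly like the sketch below. Again
this is only an assumption about how it might be done: the function names
and the (word, phonemes) tuple layout are invented for the example, and it
deliberately does not choose between skipping the word and importing the
phoneme, since that question is still open. It only separates the words
that are fine from those that need a decision.

```python
from typing import Dict, Iterable, List, Set, Tuple


def unknown_phonemes(phonemes: Iterable[str], known: Set[str]) -> List[str]:
    """Return phonemes not in the known set, in first-seen order, no repeats."""
    missing: List[str] = []
    for p in phonemes:
        if p not in known and p not in missing:
            missing.append(p)
    return missing


def partition_words(
    words: Iterable[Tuple[str, Tuple[str, ...]]], known: Set[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split incoming (word, phonemes) pairs into valid and problematic ones.

    Returns a list of words whose phonemes are all known, and a dict that
    maps every other word to its list of unknown phonemes.
    """
    valid: List[str] = []
    problems: Dict[str, List[str]] = {}
    for text, phonemes in words:
        missing = unknown_phonemes(phonemes, known)
        if missing:
            problems[text] = missing
        else:
            valid.append(text)
    return valid, problems
```

The same helper could be applied to graphemes by passing the letters of the
word and the set of known graphemes.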
I agree that the prompting (i.e. for which word/pronunciation to keep)
should be done in "batch" mode.
Make sense?
Cheers
--
Thomas Fogwill <tfo...@us...>