Re: [Dictionarymaker-devel] DM: import dictionary into project
Brought to you by:
bmcalister,
tfogwill
From: Marelie D. <md...@cs...> - 2006-08-02 08:10:10
Sounds good!

>>> Thomas Fogwill <tfo...@us...> 08/02/06 9:56 AM >>>
Hi

On Wed, 2006-08-02 at 08:39 +0200, Marelie Davel wrote:
> I preferred Andries's approach, with the addition that 3 only applies
> to verified words (which I sort of assumed). In general, this process
> will be used to combine two dictionaries that may come from very
> different sources - it won't just be used to add a dictionary to a
> bootstrapped project. Also, before adding a new dictionary, I would
> encourage a user to export the current dictionary and rather create a
> new project, which means both source dictionaries will continue to exist
> separately from the new combined version.
>
> My suggestion (from the user's perspective):
>
> 1. File browse. Pick dictionary file (verify that it is in a *.dict
> format, whatever it is called)
>
> 2. Add all words that do not already exist in the dictionary (check for
> missing graphemes)
>
> 3. Show list of conflict words+phonemes+status, where conflict words
> are only those that have been verified on both sides, and conflict. If
> both are unverified, then just add the word (the pronunciation will be
> created through prediction based on the rule set, as usual). If only
> one is verified, import that verdict, whatever it is. Also import the
> pronunciation if verdict==correct. (Check for missing phonemes.) Only
> words that have conflicting verdicts continue to 4.
>
> 4. Single/multi select word+phoneme+status and select Replace or
> Ignore

ok, cool. So, to check that we agree, given OW and NW as before:

1. if OW == null (i.e. this word is not in our first dictionary):
   import NW verbatim
2. if OW != null (i.e. exists in first dict) && OW.status == CORRECT
   && NW.status == CORRECT && NW.phonemes != OW.phonemes: prompt
3. if OW != null && OW.status == UNVERIFIED && NW.status != UNVERIFIED:
   import NW verbatim
4. if OW != null && (OW.status == INVALID || OW.status == AMBIGUOUS)
   && NW.status != OW.status: prompt
5. if OW != null && OW.status == UNCERTAIN && NW.status is "stronger"
   (i.e. CORRECT, INVALID, or AMBIGUOUS): import NW verbatim
6. in all other cases, keep OW

> The above implies 3 levels for word statuses:
> High: CORRECT, AMBIGUOUS, INVALID
> Medium: UNCERTAIN
> Lowest: UNVERIFIED

So, in short (using the levels defined above):
* if NW.status.level > OW.status.level, import NW
* if NW.status.level < OW.status.level, keep OW
* if NW.status.level == OW.status.level:
  * if NW.status == OW.status && NW.phonemes == OW.phonemes, keep OW
  * else prompt

The following still applies:
> Whenever we add a word that was not previously in the dict, we must
> check that all graphemes are valid (and import any new ones).
>
> We must also check the phoneme list of any new words to be added, to
> ensure that all phonemes are valid. If there are invalid phonemes, we
> should either:
> * skip that word
> * import the phoneme (prompting for sound files, etc.)
>
> I'm not sure which approach we should follow. Thoughts?
>
> The phoneme list also needs to be checked when existing words are
> changed.

I agree that the prompting (i.e. for which word/pronunciation to keep)
should be done in "batch" mode.

Make sense?

Cheers
--
Thomas Fogwill <tfo...@us...>
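A minimal Python sketch of the level-based shorthand from the thread, assuming a hypothetical `Entry` record (word, phoneme tuple, status) and hypothetical `decide`/`merge` helpers; none of these names come from DictionaryMaker itself. Conflicts are collected into a list instead of being prompted one at a time, to match the agreement that prompting happen in batch mode.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    UNVERIFIED = "unverified"
    UNCERTAIN = "uncertain"
    CORRECT = "correct"
    AMBIGUOUS = "ambiguous"
    INVALID = "invalid"


# Status levels as listed in the thread: High / Medium / Lowest.
LEVEL = {
    Status.CORRECT: 2,
    Status.AMBIGUOUS: 2,
    Status.INVALID: 2,
    Status.UNCERTAIN: 1,
    Status.UNVERIFIED: 0,
}


@dataclass(frozen=True)
class Entry:  # hypothetical record, not DictionaryMaker's own type
    word: str
    phonemes: tuple[str, ...]
    status: Status


class Action(Enum):
    IMPORT_NEW = "import NW"
    KEEP_OLD = "keep OW"
    PROMPT = "prompt"


def decide(ow: Optional[Entry], nw: Entry) -> Action:
    """Apply the level-based shorthand to one old/new pair."""
    if ow is None:                      # word not in the first dictionary
        return Action.IMPORT_NEW
    new_level, old_level = LEVEL[nw.status], LEVEL[ow.status]
    if new_level > old_level:
        return Action.IMPORT_NEW
    if new_level < old_level:
        return Action.KEEP_OLD
    if nw.status == ow.status and nw.phonemes == ow.phonemes:
        return Action.KEEP_OLD          # identical entry, nothing to do
    return Action.PROMPT


def merge(old: dict[str, Entry], new: list[Entry]) -> list[tuple[Entry, Entry]]:
    """Update `old` in place; return (OW, NW) conflicts for one batch prompt."""
    conflicts = []
    for nw in new:
        action = decide(old.get(nw.word), nw)
        if action is Action.IMPORT_NEW:
            old[nw.word] = nw
        elif action is Action.PROMPT:
            conflicts.append((old[nw.word], nw))
    return conflicts
```

Note that this follows the shorthand, which differs from the six numbered rules for some equal-level pairs: for example, two UNCERTAIN entries with different pronunciations prompt here, but fall under rule 6 (keep OW) in the longer list.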
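The grapheme and phoneme checks discussed in the thread could look roughly like the sketch below. It assumes a hypothetical `Inventory` of known graphemes and phonemes and treats each character of a word as one grapheme; both are simplifications for illustration. Since the thread leaves open whether an unknown phoneme should cause the word to be skipped or the phoneme to be imported, the choice is a parameter, and the sound-file prompting mentioned for the import option is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum


class PhonemePolicy(Enum):
    SKIP_WORD = "skip"             # option 1 in the thread
    IMPORT_PHONEME = "import"      # option 2 (sound-file prompting not modelled)


@dataclass
class Inventory:  # hypothetical container, not DictionaryMaker's own type
    graphemes: set[str] = field(default_factory=set)
    phonemes: set[str] = field(default_factory=set)


def admit_word(word: str, phonemes: list[str], inv: Inventory,
               policy: PhonemePolicy) -> bool:
    """Return True if the word may be added, updating `inv` as needed."""
    unknown_phonemes = {p for p in phonemes if p not in inv.phonemes}
    # Check phonemes first, so a skipped word imports nothing at all.
    if unknown_phonemes and policy is PhonemePolicy.SKIP_WORD:
        return False
    # New graphemes are simply imported, as agreed in the thread
    # (assumption: one character == one grapheme).
    inv.graphemes.update(word)
    inv.phonemes.update(unknown_phonemes)
    return True
```

The same call could be made whenever an import replaces the pronunciation of an existing word, since the thread notes that the phoneme list must be checked in that case too.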