
Sphinx4 FastDictionary Addenda?

Help
2008-02-11
2012-09-22
  • Chris Deering

    Chris Deering - 2008-02-11

    Hi Guys,

    I'm currently using Sphinx4 as part of my final-year project at Uni. Overall, I've found it a lot of fun to work with (once I got my head around a few things, that is), and I've been getting pretty good results with hub4. However, part of what I want to do is deal gracefully with out-of-vocabulary words. For this, I will provide UI methods that let the user enter a correction, and I then want to incorporate these corrections into future recognition tasks.

    To do this, I was thinking (well, my supervisor gave me the idea) of adding the entries to the dictionary file. First, I want to confirm: would this have the desired result? If I add a word to the dictionary (with the correct phoneme mapping), will it be recognised if I run the same file again (at least in theory), or have I totally misunderstood the purpose of the dictionary? If I have misunderstood, how could I go about adding that functionality?

    Secondly, I figure it's much better practice to keep user-added words separate from the main body of the hub4 CMU dictionary, so I was really pleased when I saw the addenda feature in the FastDictionary class. However, I can't for the life of me figure out how to set it up! Whatever I try, I get various errors such as "unregistered component". Is there something I'm missing? I would be really grateful if someone could shed some light on it. Thanks for your time!

    Chris
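Keeping user corrections in their own file, as Chris suggests, could look roughly like the sketch below. It is a minimal, hypothetical helper (`AddendaWriter` is not part of Sphinx4) that appends one entry per line in the usual CMU-dictionary layout: the word in upper case, then its phones separated by spaces. The tab between word and phones and the upper-casing are assumptions modelled on cmudict.06d; check them against your own dictionary file. The phones must come from the same phone set the main dictionary and acoustic model use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Sphinx4): appends user words to a separate addenda dictionary. */
public class AddendaWriter {
    private final Path addendaFile;

    public AddendaWriter(Path addendaFile) {
        this.addendaFile = addendaFile;
    }

    /** Formats "WORD<TAB>PH1 PH2 ...", upper-casing word and phones (assumed cmudict.06d style). */
    static String formatEntry(String word, List<String> phones) {
        if (word.isBlank() || phones.isEmpty()) {
            throw new IllegalArgumentException("word and phones must be non-empty");
        }
        String ph = String.join(" ", phones).toUpperCase(Locale.ROOT);
        return word.trim().toUpperCase(Locale.ROOT) + "\t" + ph;
    }

    /** Appends one entry, creating the file if needed; the main dictionary is never modified. */
    public void add(String word, List<String> phones) throws IOException {
        String line = formatEntry(word, phones) + System.lineSeparator();
        Files.write(addendaFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("userdict", ".dict");
        AddendaWriter writer = new AddendaWriter(tmp);
        writer.add("sphinx", List.of("S", "F", "IH", "NG", "K", "S"));
        System.out.print(Files.readString(tmp));
    }
}
```

Appending rather than rewriting keeps each correction as a single line, so the user file stays easy to inspect or roll back independently of cmudict.06d.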

     
    • Chris Deering

      Chris Deering - 2008-02-15

      Apologies for not replying sooner, I was away for a few days. Thanks, that fixed my problem!

       
    • Holger Brandl

      Holger Brandl - 2008-02-11

      Hi Chris,

      > add the entries into the dictionary file. First of all, I want to
      > confirm, would this have the desired result?
      It depends on the type of grammar you're using. If it is a SimpleWordListGrammar, adding the items to the dictionary should be sufficient. If you're using a rule grammar instead, you need to make sure the added entries appear in one of the grammar rules; otherwise they will simply be ignored. If you're using a statistical grammar (aka an n-gram model), your new items will get a kind of default appearance probability, but it is quite low.

      When changing the dictionary online, you need to ensure in any case that the search space is either assembled online or rebuilt after you've added the new words.

      > addenda feature in the FastDictionary class. However, I cant for
      > the life of me figure out how to set it up!! I get various different
      > errors such as "unregistered component"
      Could you post the relevant snippet of your configuration file?

      Best regards,
      Holger
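Holger's point about the search space can be illustrated with a toy model. The sketch is not Sphinx4 code: `SearchSpace` is a hypothetical stand-in for a linguist that copies the vocabulary when it is built, so words added to the dictionary afterwards stay invisible until the search space is constructed again. In Sphinx4 terms, constructing it again would presumably mean deallocating and reallocating the recognizer, which is what Chris asks about next; the thread does not confirm the exact calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model (not Sphinx4 code): a search space that snapshots the vocabulary when built. */
public class SearchSpaceSnapshotDemo {

    /** Hypothetical stand-in for a precompiled search space. */
    static final class SearchSpace {
        private final Set<String> words;

        SearchSpace(Map<String, String> dictionary) {
            // Copy the word list at build time; later dictionary edits are not seen.
            this.words = Set.copyOf(dictionary.keySet());
        }

        boolean canRecognize(String word) {
            return words.contains(word);
        }
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("HELLO", "HH AH L OW");

        SearchSpace space = new SearchSpace(dictionary);
        dictionary.put("SPHINX", "S F IH NG K S"); // user adds a word after the build

        System.out.println(space.canRecognize("SPHINX")); // false: stale snapshot

        space = new SearchSpace(dictionary);              // rebuild the search space
        System.out.println(space.canRecognize("SPHINX")); // true
    }
}
```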

       
    • Chris Deering

      Chris Deering - 2008-02-11

      Hi Holger,

      Thanks for the quick response! I'm using an n-gram model (LargeTrigramModel) for my grammar, so would there be any way to increase this default appearance probability? I'd be happy to get my hands dirty in the S4 internals if that's required.

      "When changing the dictionary online you need to ensure in any case that the search-space is either assembled online or that it becomes rebuild after you've added the new words."

      So, I assume I could do this with allocate()?

      Here is the extract from my config. The bit that doesn't work is obviously the "addenda" property. I'm probably just going about this the wrong way.

      <component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
          <property name="dictionaryPath" value="file:/C:/workspace/SemiAutoTranscription/dict/cmudict.06d"/>
          <property name="fillerPath" value="file:/C:/workspace/SemiAutoTranscription/dict/fillerdict"/>
          <property name="addSilEndingPronunciation" value="false"/>
          <property name="wordReplacement" value="&lt;sil&gt;"/>
          <property name="allowMissingWords" value="true"/>
          <property name="unitManager" value="unitManager"/>
          <property name="addenda" value="file:/C:/workspace/SemiAutoTranscription/dict/userdict.06d"/>
      </component>
      

      The resulting Exception is: "Property Exception component:'dictionary' property:'addenda' - Attempt to set unregistered property"

      Thanks again!

      Chris
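To see exactly which properties a component sets, the configuration can be parsed with the JDK's built-in DOM parser. This is a diagnostic sketch with a hypothetical class name; it only reports what the file contains and cannot tell which properties the loaded FastDictionary version actually registers, which is presumably what the "unregistered property" check compares against. It also shows why `<sil>` has to be written as `&lt;sil&gt;` inside an attribute: a raw `<` in an attribute value is not well-formed XML, and the parser rejects it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical diagnostic: lists the <property> settings of one named <component>. */
public class ConfigPropertyLister {

    /** Returns name -> value for every <property> nested inside the named <component>. */
    static Map<String, String> propertiesOf(String xml, String componentName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Throws SAXParseException if the XML is not well-formed (e.g. a raw '<' in a value).
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList components = doc.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element component = (Element) components.item(i);
            if (!componentName.equals(component.getAttribute("name"))) {
                continue;
            }
            NodeList properties = component.getElementsByTagName("property");
            for (int j = 0; j < properties.getLength(); j++) {
                Element p = (Element) properties.item(j);
                // Entity references such as &lt;sil&gt; come back decoded as <sil>.
                props.put(p.getAttribute("name"), p.getAttribute("value"));
            }
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<component name=\"dictionary\" type=\"edu.cmu.sphinx.linguist.dictionary.FastDictionary\">"
                + "<property name=\"wordReplacement\" value=\"&lt;sil&gt;\"/>"
                + "<property name=\"addenda\" value=\"file:/C:/workspace/SemiAutoTranscription/dict/userdict.06d\"/>"
                + "</component></config>";
        System.out.println(propertiesOf(xml, "dictionary"));
    }
}
```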

       
      • Holger Brandl

        Holger Brandl - 2008-02-11

        Hi Chris,

        > would there be any way to increase this default appearance probability
        Sure, but I think it's far from trivial. I would recommend trying the default probabilities first.

        > 'addenda' - Attempt to set unregistered property"
        No idea yet. The configuration seems to be OK. Are you working with beta1 or the latest svn-version?

        -Holger

         
    • Chris Deering

      Chris Deering - 2008-02-11

      >> 'addenda' - Attempt to set unregistered property"
      >No idea yet. The configuration seems to be OK. Are you working with beta1 or the latest svn-version?

      Well, I was using a nightly snapshot from about 4 months ago, so I updated to the latest, but I still get the same problem. I will try adding the addenda setting to a property sheet, and then adding that to the dictionary at run-time as a temporary workaround.

       
    • Chris Deering

      Chris Deering - 2008-02-12

      OK, I think I have identified my problem. Could it be that I'm using an older version of hub4? When I first started using Sphinx about 6 months ago, I was using beta1; then I switched to a nightly build about 4 months ago, but (for whatever reason... can't remember why!) I linked the hub4 jar from the beta1 project folder. When I run it with that, I get the aforementioned "Attempt to set unregistered property" exception, though it works fine without the addenda property. However, if I link it to the hub4 jar in my much more recent nightly-build project folder, I get the following...

      Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class edu.cmu.sphinx.util.props.PropertySheet, but interface was expected
      at edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model.newProperties(Model.java:158)
      at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
      at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
      at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.newProperties(LexTreeLinguist.java:241)
      at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
      at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
      at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:173)
      at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
      at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
      at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:42)
      at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:31)
      at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
      at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
      at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:79)
      at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
      at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:144)

      Is it possible the latest version of S4 has some incompatibility with the Model.class file in the hub4 jar?

      I'm getting a bit confused myself now! :D

       
      • Holger Brandl

        Holger Brandl - 2008-02-12

        Hi Chris,

        > Is it possible the latest version of S4 has some incompatibility
        > with the Model.class file in the hub4 jar?
        Yep, we've rewritten the complete configuration backend since beta1. Mixing old and new code won't work in any case. It's confusing, and a new release should put an end to such problems.

        -Holger
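The stack trace reads as a binary mismatch: `Model.newProperties` in the hub4 jar was compiled expecting `PropertySheet` to be an interface, while the Sphinx4 jar on the classpath provides it as a class. A quick way to confirm which jar each type is actually loaded from is the small diagnostic below (class name hypothetical). Run it with the same classpath as the recognizer; by default it checks the Sphinx4 types named in this thread.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports whether a type is a class or an interface and which jar provided it. */
public class ClasspathCheck {

    static String describe(String className) {
        try {
            // Load without running static initialisers.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                    ? "bootstrap/JDK"
                    : src.getLocation().toString();
            return className + ": " + (cls.isInterface() ? "interface" : "class") + " from " + where;
        } catch (ClassNotFoundException e) {
            return className + ": not on classpath";
        } catch (LinkageError e) {
            // e.g. a superclass or interface from another jar version is missing
            return className + ": failed to load (" + e + ")";
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
                "edu.cmu.sphinx.util.props.PropertySheet",
                "edu.cmu.sphinx.linguist.dictionary.FastDictionary",
                "edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If the Model class and PropertySheet report jars from different Sphinx4 builds, that matches the beta1/nightly mix described above.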

         
      • Nickolay V. Shmyrev

        Yes, you must extract the model from the jar and rebuild it:

        http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html

        Search the forum; this topic has been discussed already.
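Extracting the model files can be done with the JDK's `jar` tool or, from Java, with `java.util.jar.JarFile`. The sketch below (hypothetical class name) copies every entry whose path starts with a given prefix into a target directory and refuses entries that would land outside it. Which prefix holds the hub4 model data is not stated in this thread; list the jar with `jar tf` first and pick the model's directory from there. Rebuilding the model afterwards follows the linked documentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: extracts the jar entries under one prefix into a directory. */
public class JarExtractor {

    /** Copies every file entry whose name starts with prefix; returns the number copied. */
    static int extract(Path jar, String prefix, Path targetDir) throws IOException {
        int count = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (JarFile jf = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                if (e.isDirectory() || !e.getName().startsWith(prefix)) {
                    continue;
                }
                Path out = root.resolve(e.getName()).normalize();
                if (!out.startsWith(root)) { // guard against "../" entries
                    throw new IOException("Unsafe entry: " + e.getName());
                }
                Files.createDirectories(out.getParent());
                try (InputStream in = jf.getInputStream(e)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: JarExtractor <model.jar> <entry-prefix> <target-dir>");
            return;
        }
        int n = extract(Path.of(args[0]), args[1], Path.of(args[2]));
        System.out.println("extracted " + n + " entries");
    }
}
```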

         
