How to start - dictionary recognition

Help
Ruben
2009-07-31
2012-09-22
  • Ruben

    Ruben - 2009-07-31

    Hello,

    I have been reading documentation for days. I am a complete newbie to speech recognition, but I now have some knowledge of how Sphinx 4 and speech recognition work.

    I want to create a dictionary recognition program: you say one word, and the system recognizes it.

    The problem is that I am short of time. I have read a lot, but I am not sure where to start. I would really appreciate it if someone could guide me a little in creating that program: which example I should use as a base, and what documentation I should read to customize it for this purpose.

    Thanks in advance,

     
    • Ruben

      Ruben - 2009-08-04

      I am making some progress, but I am not sure this is the best way to do it.

      I am using hellongram.config.xml as the base config, with SimpleNGramModel replaced by large.LargeTrigramModel and models/language/wsj/wsj5kc.Z.DMP as the language model.

      All other values are the defaults from hellongram. Could anyone please tell me how I can improve the recognition? The input will normally be a single word; can I use that fact to make recognition better?

      Thanks in advance,
      Ruben Rubio Rey
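One way to use the "one word per utterance" constraint is a unigram language model over the word list, which is what the voicedict_unigram.lm file in the VoiceDict config appears to be. The following is only a sketch under assumptions: it writes a uniform unigram model in the generic ARPA text format, the class name `UnigramLmSketch` is made up for illustration, and including the `<s>` and `</s>` markers with the same probability is an assumption. Whether a given Sphinx-4 loader accepts the output has not been verified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds a uniform unigram model in ARPA text format.
final class UnigramLmSketch {

    static String toArpa(List<String> words) {
        // Assumption: sentence start/end markers are listed next to the words.
        List<String> entries = new ArrayList<>();
        entries.add("<s>");
        entries.add("</s>");
        entries.addAll(words);

        // Every entry gets the same log10 probability.
        double logProb = Math.log10(1.0 / entries.size());

        StringBuilder sb = new StringBuilder();
        sb.append("\\data\\\n");
        sb.append("ngram 1=").append(entries.size()).append("\n\n");
        sb.append("\\1-grams:\n");
        for (String word : entries) {
            sb.append(String.format(Locale.ROOT, "%.4f\t%s\n", logProb, word));
        }
        sb.append("\n\\end\\\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toArpa(List.of("yes", "no", "maybe")));
    }
}
```

The sketch does not remove duplicate words, and every word in the list would still need a pronunciation in the dictionary file.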

       
    • Nate

      Nate - 2009-07-31

      How comprehensive is this dictionary intended to be?

      Fundamentally, you need an acoustic model, a language model (or grammar for simple applications, hence my complexity question), and a dictionary file that maps each word represented in the language model/grammar to its set of acoustic phonemes.
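To make the dictionary part concrete, here is a rough sketch of reading pronunciation lines in the CMU dictionary style (a word followed by its phonemes, with alternate pronunciations marked as `WORD(2)`). It uses only the Java standard library, not Sphinx itself. The class name `PronunciationDictSketch` and the comment prefixes it skips (`;;;` and `##`) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps each word to its list of pronunciations.
final class PronunciationDictSketch {

    static Map<String, List<List<String>>> parse(List<String> lines) {
        Map<String, List<List<String>>> dict = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Assumption: comment lines start with ";;;" or "##".
            if (line.isEmpty() || line.startsWith(";;;") || line.startsWith("##")) {
                continue;
            }
            String[] tokens = line.split("\\s+");
            if (tokens.length < 2) {
                continue; // a word without phonemes is ignored
            }
            // "HELLO(2)" is an alternate pronunciation of "HELLO".
            String word = tokens[0].replaceAll("\\(\\d+\\)$", "");
            List<String> phones =
                    new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
            dict.computeIfAbsent(word, k -> new ArrayList<>()).add(phones);
        }
        return dict;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "HELLO  HH AH L OW",
                "HELLO(2)  HH EH L OW",
                "WORLD  W ER L D");
        parse(sample).forEach((word, prons) -> System.out.println(word + " -> " + prons));
    }
}
```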

      Probably the best thing to do is to check out the <a href="http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld/README.html">Hello World!</a> and <a href="http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/README.html">Hello NGram</a> demos in the /apps folder of your sphinx download. These demos recognize phrases contained in their grammars and echo them back. For any of the demo programs you play with, pay special attention to their config.xml files, which specify which models are used, the dictionary, etc.

      Also make sure you read the <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html">Programmer's Guide</a> and <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html">FAQ</a>.

      Hopefully this helps. I can't and shouldn't really provide anything more concrete since I also am a sphinx noob. :)

       
    • Nate

      Nate - 2009-07-31

      Apparently those HTML tags were not required. >.<

       
    • Ruben

      Ruben - 2009-08-01

      Thanks for your response!!! Finally I am making some progress.

      I am working with voicedict http://personales.ya.com/javiercl/voicedict/index.html

      I have two problems, both related.

      • In general, the tests do not work with my voice. The microphone seems to be working, but recognition somehow does not. At the beginning I thought it was because I am Spanish, but after trying with a native English speaker I realized that the problem is something else. Could you help me find it?

      • VoiceDict does not work at all. There are three recognizers; two of them work "more or less", and the third one, the one I am most interested in, does not work. It is called "wordListRecognizer".

      The config.xml follows. Thanks in advance!


      <?xml version="1.0" encoding="UTF-8"?>

      <!--
      Sphinx-4 Configuration file
      -->

      <!-- ******** -->
      <!-- an4 configuration file -->
      <!-- ******** -->

      <config>

      <!-- ******************************************************** -->
      <!-- frequently tuned properties                              -->
      <!-- ******************************************************** -->

      <property name="logLevel" value="WARNING"/>

      <property name="absoluteBeamWidth"  value="-1"/>
      <property name="relativeBeamWidth"  value="1E-80"/>
      <property name="wordInsertionProbability" value="1E-36"/>
      <property name="languageWeight"     value="8"/>

      <property name="frontend" value="epFrontEnd"/>
      <property name="recognizer" value="recognizer"/>
      <property name="showCreations" value="false"/>

      <component name="activeList"
               type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
          <property name="logMath" value="logMath"/>
          <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
          <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
      </component>

      <component name="trivialPruner"
                  type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

      <component name="threadedScorer"
                  type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
          <property name="frontend" value="${frontend}"/>
          <property name="isCpuRelative" value="true"/>
          <property name="numThreads" value="0"/>
          <property name="minScoreablesPerThread" value="10"/>
          <property name="scoreablesKeepFeature" value="true"/>
      </component>


      <!-- ******************************************************** -->
      <!-- word recognizer configuration                            -->
      <!-- ******************************************************** -->

      <component name="lettersRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
          <property name="decoder" value="lettersDecoder"/>
          <propertylist name="monitors">
              <!--<item>accuracyTracker </item>-->
              <item>speedTracker </item>
              <item>memoryTracker </item>
          </propertylist>
      </component>

      <component name="wordsRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
          <property name="decoder" value="wordsDecoder"/>
          <propertylist name="monitors">
              <!--<item>accuracyTracker </item>-->
              <item>speedTracker </item>
              <item>memoryTracker </item>
          </propertylist>
      </component>

      <component name="wordListRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
          <property name="decoder" value="wordListDecoder"/>
          <propertylist name="monitors">
              <item>accuracyTracker </item>
              <item>speedTracker </item>
              <item>memoryTracker </item>
          </propertylist>
      </component>


      <!-- ******************************************************** -->
      <!-- The Decoder   configuration                              -->
      <!-- ******************************************************** -->

      <component name="lettersDecoder" type="edu.cmu.sphinx.decoder.Decoder">
          <property name="searchManager" value="lettersSearchManager"/>
      </component>

      <component name="lettersSearchManager"
          type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
          <property name="logMath" value="logMath"/>
          <property name="linguist" value="lettersLinguist"/>
          <property name="pruner" value="trivialPruner"/>
          <property name="scorer" value="threadedScorer"/>
          <property name="activeListFactory" value="activeList"/>
      </component>

      <component name="wordsDecoder" type="edu.cmu.sphinx.decoder.Decoder">
          <property name="searchManager" value="wordsSearchManager"/>
      </component>

      <component name="wordsSearchManager"
          type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
          <property name="logMath" value="logMath"/>
          <property name="linguist" value="wordsLinguist"/>
          <property name="pruner" value="trivialPruner"/>
          <property name="scorer" value="threadedScorer"/>
          <property name="activeListFactory" value="activeList"/>
      </component>

      <component name="wordListDecoder" type="edu.cmu.sphinx.decoder.Decoder">
          <property name="searchManager" value="wordListSearchManager"/>
      </component>

      <component name="wordListSearchManager"
          type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
          <property name="logMath" value="logMath"/>
          <property name="linguist" value="wordListLinguist"/>
          <property name="pruner" value="trivialPruner"/>
          <property name="scorer" value="threadedScorer"/>
          <property name="activeListFactory" value="activeList"/>
      </component>


      <!-- ******************************************************** -->
      <!-- The linguist  configuration                              -->
      <!-- ******************************************************** -->

      <component name="lettersLinguist"
                  type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
          <property name="logMath" value="logMath"/>
          <property name="grammar" value="lettersGrammar"/>
          <property name="acousticModel" value="wsj"/>
          <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
          <property name="languageWeight" value="${languageWeight}"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <component name="wordsLinguist"
                  type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
          <property name="logMath" value="logMath"/>
          <property name="grammar" value="wordsGrammar"/>
          <property name="acousticModel" value="wsj"/>
          <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
          <property name="languageWeight" value="${languageWeight}"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <component name="wordListLinguist"
                  type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
          <property name="logMath" value="logMath"/>
          <property name="languageModel" value="ngramLanguageModel"/>
          <property name="acousticModel" value="wsj"/>
          <property name="dictionary" value="dictionaryWords"/>
          <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
          <property name="languageWeight" value="${languageWeight}"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The Language Model configuration                         -->
      <!-- ******************************************************** -->
      <component name="ngramLanguageModel"
                  type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
          <property name="location"
                  value="resource:/edu.cmu.sphinx.voicedict.VoiceDict!/edu/cmu/sphinx/voicedict/voicedict_unigram.lm"/>
          <property name="logMath" value="logMath"/>
          <property name="dictionary" value="dictionaryWords"/>
          <property name="maxDepth" value="1"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The Grammar  configuration                               -->
      <!-- ******************************************************** -->

      <component name="lettersGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
          <property name="dictionary" value="dictionaryLetters"/>
          <property name="grammarLocation"
               value="resource:/edu.cmu.sphinx.voicedict.VoiceDict!/edu/cmu/sphinx/voicedict/"/>
          <property name="grammarName" value="voicedictLetters"/>
      <property name="logMath" value="logMath"/>
      </component>

      <component name="wordsGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
          <property name="dictionary" value="dictionaryWords"/>
          <property name="grammarLocation"
               value="resource:/edu.cmu.sphinx.voicedict.VoiceDict!/edu/cmu/sphinx/voicedict/"/>
          <property name="grammarName" value="voicedictSentences"/>
      <property name="logMath" value="logMath"/>
      </component>

      <component name="wordListGrammar"
              type="edu.cmu.sphinx.linguist.language.grammar.SimpleWordListGrammar">
          <property name="path"
              value="voicedict.wordlist"/>
          <property name="dictionary" value="dictionaryWords"/>
          <property name="isLooping" value="false"/>
          <property name="logMath" value="logMath"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The Dictionary configuration                            -->
      <!-- ******************************************************** -->

      <component name="dictionaryLetters"
          type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
          <property name="dictionaryPath"
       value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/alpha.dict"/>
          <property name="fillerPath"
       value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
          <property name="addSilEndingPronunciation" value="false"/>
          <property name="allowMissingWords" value="false"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <component name="dictionaryWords"
          type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
          <property name="dictionaryPath"
       value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
          <property name="fillerPath"
       value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
          <property name="addSilEndingPronunciation" value="true"/>
          <property name="allowMissingWords" value="true"/>
          <property name="unitManager" value="unitManager"/>
      </component>


      <!-- ******************************************************** -->
      <!-- The acoustic model configuration                         -->
      <!-- ******************************************************** -->
      <component name="wsj"
        type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
          <property name="loader" value="wsjLoader"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <component name="wsjLoader"
                 type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
          <property name="logMath" value="logMath"/>
          <property name="unitManager" value="unitManager"/>
      </component>


      <!-- ******************************************************** -->
      <!-- The unit manager configuration                           -->
      <!-- ******************************************************** -->

      <component name="unitManager"
          type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

      <!-- ******************************************************** -->
      <!-- The frontend configuration                               -->
      <!-- ******************************************************** -->

      <component name="frontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
          <propertylist name="pipeline">
              <item>microphone </item>
              <item>premphasizer </item>
              <item>windower </item>
              <item>fft </item>
              <item>melFilterBank </item>
              <item>dct </item>
              <item>liveCMN </item>
              <item>featureExtraction </item>
          </propertylist>
      </component>

      <!-- ******************************************************** -->
      <!-- The live frontend configuration                          -->
      <!-- ******************************************************** -->
      <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
          <propertylist name="pipeline">
              <item>microphone </item>
              <item>speechClassifier </item>
              <item>speechMarker </item>
              <item>nonSpeechDataFilter </item>
              <item>premphasizer </item>
              <item>windower </item>
              <item>fft </item>
              <item>melFilterBank </item>
              <item>dct </item>
              <item>liveCMN </item>
              <item>featureExtraction </item>
          </propertylist>
      </component>

      <!-- ******************************************************** -->
      <!-- The frontend pipelines                                   -->
      <!-- ******************************************************** -->

      <component name="speechClassifier"
                 type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
          <property name="threshold" value="13"/>
      </component>

      <component name="nonSpeechDataFilter"
                 type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

      <component name="speechMarker"
                 type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" >
          <property name="speechTrailer" value="50"/>
      </component>


      <component name="premphasizer"
                 type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

      <component name="windower"
                 type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
      </component>

      <component name="fft"
              type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
      </component>

      <component name="melFilterBank"
          type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
      </component>

      <component name="dct"
              type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

      <component name="liveCMN"
                 type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

      <component name="featureExtraction"
                 type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

      <component name="microphone"
                 type="edu.cmu.sphinx.frontend.util.Microphone">
          <property name="closeBetweenUtterances" value="false"/>
      </component>


      <!-- ******************************************************* -->
      <!--  monitors                                               -->
      <!-- ******************************************************* -->

      <!--- Does not work compile
      <component name="accuracyTracker"
                  type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
          <property name="recognizer" value="lettersRecognizer"/>
          <property name="showAlignedResults" value="false"/>
          <property name="showRawResults" value="false"/>
      </component>
      -->

      <component name="accuracyTracker"
                  type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
          <property name="recognizer" value="${recognizer}"/>
          <property name="showAlignedResults" value="false"/>
          <property name="showRawResults" value="false"/>
      </component>

      <component name="memoryTracker"
                  type="edu.cmu.sphinx.instrumentation.MemoryTracker">
          <property name="recognizer" value="lettersRecognizer"/>
      <property name="showSummary" value="false"/>
      <property name="showDetails" value="false"/>
      </component>

      <component name="speedTracker"
                  type="edu.cmu.sphinx.instrumentation.SpeedTracker">
          <property name="recognizer" value="lettersRecognizer"/>
          <property name="frontend" value="${frontend}"/>
      <property name="showSummary" value="true"/>
      <property name="showDetails" value="false"/>
      </component>


      <!-- ******************************************************* -->
      <!--  Miscellaneous components                               -->
      <!-- ******************************************************* -->

      <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
          <property name="logBase" value="1.0001"/>
          <property name="useAddTable" value="true"/>
      </component>


      </config>

       
      • Nickolay V. Shmyrev

        > In general, the tests do not work with my voice. The microphone seems to be working, but recognition somehow does not. At the beginning I thought it was because I am Spanish, but after trying with a native English speaker I realized that the problem is something else. Could you help me find it?
        > VoiceDict does not work at all. There are three recognizers; two of them work "more or less", and the third one, the one I am most interested in, does not work. It is called "wordListRecognizer".

        When you report problems, always try to provide:

        1. Versions of the software you are using.
        2. Ways to reproduce your problem, a test case for example. A description like "it doesn't work for me" never allows anyone to help you.
        3. Describe the expected results
        4. Describe the results you actually get.

        In particular, sphinx4-beta2 doesn't work with a microphone. If you are using a nightly build, it should work better.
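Independently of the Sphinx version, it can help to confirm that Java itself can open the microphone in the format the recognizer is likely to ask for. The following is a minimal sketch using only the standard `javax.sound.sampled` API. The format (16 kHz, 16-bit, mono, signed, little-endian) is an assumption based on the "16k" in the acoustic model name, and the class name `MicCheckSketch` is made up for illustration.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

// Hypothetical helper: asks Java Sound whether a capture line is available.
final class MicCheckSketch {

    // Assumption: 16 kHz, 16-bit, mono, signed PCM, little-endian.
    static AudioFormat expectedFormat() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    // True if some installed mixer offers a capture line in this format.
    static boolean captureSupported(AudioFormat format) {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        return AudioSystem.isLineSupported(info);
    }

    public static void main(String[] args) {
        AudioFormat format = expectedFormat();
        System.out.println("Format: " + format);
        System.out.println("Capture supported: " + captureSupported(format));
    }
}
```

If this reports that capture is not supported, the problem is more likely in the audio setup than in the recognizer configuration.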

         
    • Ruben

      Ruben - 2009-08-02

      > When you report about problems always try to provide:

      Sorry! You are right. I'll try to explain myself better.

      > In general, the tests do not work with my voice. The microphone seems to be working, but recognition somehow does not. At the beginning I thought it was because I am Spanish, but after trying with a native English speaker I realized that the problem is something else. Could you help me find it?

      > 1. Versions of the software you are using:
      sphinx4-1.0beta2
      > 2. Ways to reproduce your problem, test case for example.

      Run any demo, for example "HelloNGram.jar", and speak into the microphone. I am using two microphones: the laptop's built-in one and an external one.

      > 3. Describe the expected results

      When speaking sentences from the http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/hellongram.test file, the system should recognize them in some percentage of cases.

      > 4. Describe the results you actually get.

      The system recognizes the sentences in almost 0% of cases. I tried with three people and two microphones; one of the speakers is a native English speaker. In most cases it matches some of the words in the sentence (there is usually some output), but never the complete phrase.

      > VoiceDict does not work at all. There are three recognizers; two of them work "more or less", and the third one, the one I am most interested in, does not work. It is called "wordListRecognizer".

      I am using voicedict as a code base because it is very similar to what we need to do.

      > 1. Versions of the software you are using.
      sphinx4-1.0beta2 and voicedict that you can download from here: http://personales.ya.com/javiercl/voicedict/index.html

      I had to modify the Java source code a bit because it was not compiling, but the problems were very small. I also modified it to use only the wordListRecognizer recognizer. (As the system does not recognize me very well, I cannot use the original "what's the meaning of" phrase.)

      > 2. Ways to reproduce your problem, a test case for example. A description like "it doesn't work for me" never allows anyone to help you.

      The program has three recognizers. The interesting one is wordListRecognizer, which is the one that can recognize any word in the dictionary.

      > 3. Describe the expected results
      When you speak, the system should recognize your voice. If you say any word, that word must appear.

      > 4. Describe the results you actually get.
      Always an empty string.

      I really don't know how to fix either of the problems. Any kind of help will be appreciated.

       
      • Nickolay V. Shmyrev

        > sphinx4-1.0beta2

        So start with the upgrade to the nightly build/svn trunk

         
    • Ruben

      Ruben - 2009-08-02

      > So start with the upgrade to the nightly build/svn trunk

      So sweet! The demos work now!

      Only the second problem is left.

      The main goal is to have a system that is able to understand any word in a big dictionary. To do so, I found a similar program called VoiceDict.

      > 1. Versions of the software you are using:
      svn code from a few hours ago

      > 2. Ways to reproduce your problem
      Download VoiceDict from http://personales.ya.com/javiercl/voicedict/index.html
      If you execute the jar, you will get this error:

      java -jar VoiceDict.jar
      main(); Couldn't start the recognizers.Property Exception component:'accuracyTracker' property:'null' - Can't instantiate class class edu.cmu.sphinx.instrumentation.AccuracyTracker
      edu.cmu.sphinx.util.props.InternalConfigurationException: java.lang.InstantiationException
      Exception in thread "main" java.lang.NullPointerException
      at demo.sphinx.voicedict.VoiceDict.main(VoiceDict.java:155)

      To get past it, comment out the accuracyTracker item (<!-- <item>accuracyTracker </item> -->) in the monitors list of each recognizer in voicedict.config.xml.

      Run the program and try to get the "wordListRecognizer" recognizer working (you have to say "what's the meaning of" and then, after a beep, the word).

      > 3. Describe the expected results
      The program, using the "wordListRecognizer" recognizer, should be able to understand any word from a big dictionary.

      > 4. Describe the results you actually get.
      What I get is always an empty string.

      Thanks in advance!!!

       
      • Nickolay V. Shmyrev

        To be honest, the way voice-dictionary works makes me think it will never produce reliable results. Is your goal to make it work or to implement something else?

         
    • Ruben

      Ruben - 2009-08-03

      My goal is to make dictionary recognition work. I need to take sound files as input and output the text. Each sound file will contain one word.

       
      • Nickolay V. Shmyrev

        The accuracy will be 60% at most on a 60k-word vocabulary. Is that OK?

         
    • Ruben

      Ruben - 2009-08-03

      For everything else (with the demo examples and JavaDocs) I think I'll have no problem, but I don't yet know how to do the dictionary recognition.

       
    • Ruben

      Ruben - 2009-08-03

      That should be enough (the higher the better, but it is enough). Could you give me some details about how to do that?

      This program will be for a telephone application. Is it maybe possible to "transform" the original sound file before processing it to get better results?

      Thanks in advance!!
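Before transforming anything, it may be worth checking what format the sound files actually have. Telephone audio is commonly sampled at 8 kHz, while the acoustic model named in the config above has "16k" in its name. The sketch below uses only the standard `javax.sound.sampled` API to list the ways a file's format differs from an assumed target of 16 kHz, 16-bit, mono, signed PCM. That target and the class name `WavFormatCheckSketch` are assumptions for illustration, not something the Sphinx documentation is quoted as requiring.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper: compares an audio format with an assumed target.
final class WavFormatCheckSketch {

    // Assumption: the model expects 16 kHz, 16-bit, mono, signed PCM.
    static final float EXPECTED_RATE = 16000f;

    static List<String> mismatches(AudioFormat f) {
        List<String> problems = new ArrayList<>();
        if (Math.abs(f.getSampleRate() - EXPECTED_RATE) > 0.5f) {
            problems.add("sample rate is " + f.getSampleRate() + " Hz");
        }
        if (f.getChannels() != 1) {
            problems.add("channel count is " + f.getChannels());
        }
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
            problems.add("encoding is " + f.getEncoding());
        }
        if (f.getSampleSizeInBits() != 16) {
            problems.add("sample size is " + f.getSampleSizeInBits() + " bits");
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheckSketch <file.wav>");
            return;
        }
        AudioFormat format = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        List<String> problems = mismatches(format);
        System.out.println(problems.isEmpty() ? "Format matches the assumed target." : "Mismatches: " + problems);
    }
}
```

If the files turn out to be 8 kHz, the usual options are to resample them or to use an acoustic model trained on 8 kHz audio; which one gives better accuracy would need to be tested.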

       
      • eliasmajic

        eliasmajic - 2009-08-03

        So you want to write a telephone app that just uses simple grammars. You do not need sphinx. Google for voicexml or twilio. Both do exactly what you want from what I can tell.

        If you still choose to use sphinx, you will need a pbx. A simple asterisk setup w/ sphinx can be found by searching for scribblej sphinx. If you plan on larger scale stuff, you will want some sort of MRCP such as Zanzibar that uses sphinx4.

         
    • Ruben

      Ruben - 2009-08-03

      We may also try the "Medium (1000 words - RM1)" option. We would probably get better results, but we have to study whether it is enough. Anyway, if you could give us any way to try this, we'll switch between both dictionaries and test.

       
    • Ruben

      Ruben - 2009-08-03

      It is not that complex: no PBX, no Asterisk. It is a custom application. It will only process wav files and extract the content. It is going to be used in telephone applications, but the implementation is clear. We only need the dictionary recognition. I'll have a look at VoiceXML.

       
    • Ruben

      Ruben - 2009-08-03

      eliasmajic I really do not think we need voiceXML. The dictionary recognition is perfect for our purposes, feeding the application with a sound file, and returning the text.

      I just need some guidance to achieve this goal.

       
