Menu

No result (WavFile demo with WSJ and Ngram)

Help
Zach Poff
2008-06-27
2012-09-22
  • Zach Poff

    Zach Poff - 2008-06-27

    Hi. I've been trying to get my head around Sphinx4 for days. (recent SVN) Absolutely bewildering for noobs like me to get simple dictation working, but that's 100% my problem, not Sphinx!

    I copied some config.xml options from HelloNGram to WavFile, hoping to increase the vocabulary. I am using the WSJ5K LM and the WSJ acoustic model + dictionary. For now, I'm sticking with the included wav file and stream settings. Classpath, etc have been updated and it doesn't throw any errors. It doesn't produce any results though. The process finishes and it prints "RESULT:"

    I first tried converting to the WSJ acoustic model while sticking with the original grammar file. It worked. I was able to add words to the grammar and test new soundfiles and it was fine. Then I jumped into the NGram stuff and the WSJ5K model and here we are. I've tried adjusting beamwidths as per previous posts, and switching the LM to HUB4, but I think there's something much more elemental that I'm missing.

    Any ideas?

    my config file:

    <?xml version="1.0" encoding="UTF-8"?>

    <!--
    Sphinx-4 Configuration file
    -->

    <!-- ******** -->
    <!-- wavFile configuration file (converted to WSJ + NGRAM) -->
    <!-- ******** -->

    <config>

    &lt;!-- ******************************************************** --&gt;
    &lt;!-- frequently tuned properties                              --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;property name=&quot;absoluteBeamWidth&quot; value=&quot;500&quot;/&gt;
    &lt;property name=&quot;relativeBeamWidth&quot; value=&quot;1E-80&quot;/&gt;
    &lt;property name=&quot;absoluteWordBeamWidth&quot; value=&quot;20&quot;/&gt;
    &lt;property name=&quot;relativeWordBeamWidth&quot; value=&quot;1E-60&quot;/&gt;
    &lt;property name=&quot;wordInsertionProbability&quot; value=&quot;1E-36&quot;/&gt;
    &lt;property name=&quot;languageWeight&quot; value=&quot;7&quot;/&gt;
    &lt;property name=&quot;silenceInsertionProbability&quot; value=&quot;.1&quot;/&gt;
    &lt;property name=&quot;skip&quot; value=&quot;0&quot;/&gt;
    &lt;property name=&quot;logLevel&quot; value=&quot;WARNING&quot;/&gt;
    
    &lt;property name=&quot;recognizer&quot; value=&quot;recognizer&quot;/&gt;
    &lt;property name=&quot;linguist&quot; value=&quot;flatLinguist&quot;/&gt;
    &lt;property name=&quot;frontend&quot; value=&quot;mfcFrontEnd&quot;/&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The Recognizer configuration                             --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;recognizer&quot; 
                          type=&quot;edu.cmu.sphinx.recognizer.Recognizer&quot;&gt;
        &lt;property name=&quot;decoder&quot; value=&quot;decoder&quot;/&gt;
        &lt;propertylist name=&quot;monitors&quot;&gt;
            &lt;item&gt;accuracyTracker &lt;/item&gt;
            &lt;item&gt;speedTracker &lt;/item&gt;
            &lt;item&gt;memoryTracker &lt;/item&gt;
            &lt;item&gt;recognizerMonitor &lt;/item&gt;
        &lt;/propertylist&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The Decoder   configuration                              --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;decoder&quot; type=&quot;edu.cmu.sphinx.decoder.Decoder&quot;&gt;
        &lt;property name=&quot;searchManager&quot; value=&quot;wordPruningSearchManager&quot;/&gt;
        &lt;property name=&quot;featureBlockSize&quot; value=&quot;50&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The Search Manager                                       --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;wordPruningSearchManager&quot; 
    type=&quot;edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager&quot;&gt;
        &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
        &lt;property name=&quot;linguist&quot; value=&quot;lexTreeLinguist&quot;/&gt;
        &lt;property name=&quot;pruner&quot; value=&quot;trivialPruner&quot;/&gt;
        &lt;property name=&quot;scorer&quot; value=&quot;threadedScorer&quot;/&gt;
        &lt;property name=&quot;activeListManager&quot; value=&quot;activeListManager&quot;/&gt;
        &lt;property name=&quot;growSkipInterval&quot; value=&quot;0&quot;/&gt;
        &lt;property name=&quot;checkStateOrder&quot; value=&quot;false&quot;/&gt;
        &lt;property name=&quot;buildWordLattice&quot; value=&quot;false&quot;/&gt;
        &lt;property name=&quot;acousticLookaheadFrames&quot; value=&quot;1.7&quot;/&gt;
        &lt;property name=&quot;relativeBeamWidth&quot; value=&quot;${relativeBeamWidth}&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The Active Lists                                         --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;activeListManager&quot; 
             type=&quot;edu.cmu.sphinx.decoder.search.SimpleActiveListManager&quot;&gt;
        &lt;propertylist name=&quot;activeListFactories&quot;&gt;
        &lt;item&gt;standardActiveListFactory&lt;/item&gt;
        &lt;item&gt;wordActiveListFactory&lt;/item&gt;
        &lt;item&gt;wordActiveListFactory&lt;/item&gt;
        &lt;item&gt;standardActiveListFactory&lt;/item&gt;
        &lt;item&gt;standardActiveListFactory&lt;/item&gt;
        &lt;item&gt;standardActiveListFactory&lt;/item&gt;
    &lt;/propertylist&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;standardActiveListFactory&quot; 
             type=&quot;edu.cmu.sphinx.decoder.search.PartitionActiveListFactory&quot;&gt;
        &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
        &lt;property name=&quot;absoluteBeamWidth&quot; value=&quot;${absoluteBeamWidth}&quot;/&gt;
        &lt;property name=&quot;relativeBeamWidth&quot; value=&quot;${relativeBeamWidth}&quot;/&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;wordActiveListFactory&quot; 
             type=&quot;edu.cmu.sphinx.decoder.search.PartitionActiveListFactory&quot;&gt;
        &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
        &lt;property name=&quot;absoluteBeamWidth&quot; value=&quot;${absoluteWordBeamWidth}&quot;/&gt;
        &lt;property name=&quot;relativeBeamWidth&quot; value=&quot;${relativeWordBeamWidth}&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The Pruner                                               --&gt;
    &lt;!-- ******************************************************** --&gt; 
    &lt;component name=&quot;trivialPruner&quot; 
                type=&quot;edu.cmu.sphinx.decoder.pruner.SimplePruner&quot;/&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- TheScorer                                                --&gt;
    &lt;!-- ******************************************************** --&gt; 
    &lt;component name=&quot;threadedScorer&quot; 
                type=&quot;edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer&quot;&gt;
        &lt;property name=&quot;frontend&quot; value=&quot;${frontend}&quot;/&gt;
        &lt;property name=&quot;isCpuRelative&quot; value=&quot;true&quot;/&gt;
        &lt;property name=&quot;numThreads&quot; value=&quot;0&quot;/&gt;
        &lt;property name=&quot;minScoreablesPerThread&quot; value=&quot;10&quot;/&gt;
        &lt;property name=&quot;scoreablesKeepFeature&quot; value=&quot;true&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The linguist  configuration                              --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;lexTreeLinguist&quot; 
                type=&quot;edu.cmu.sphinx.linguist.lextree.LexTreeLinguist&quot;&gt;
        &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
        &lt;property name=&quot;acousticModel&quot; value=&quot;wsj&quot;/&gt;
        &lt;property name=&quot;languageModel&quot; value=&quot;trigramModel&quot;/&gt;
        &lt;property name=&quot;dictionary&quot; value=&quot;dictionary&quot;/&gt;
        &lt;property name=&quot;addFillerWords&quot; value=&quot;false&quot;/&gt;
        &lt;property name=&quot;fillerInsertionProbability&quot; value=&quot;1E-10&quot;/&gt;
        &lt;property name=&quot;generateUnitStates&quot; value=&quot;false&quot;/&gt;
        &lt;property name=&quot;wantUnigramSmear&quot; value=&quot;true&quot;/&gt;
        &lt;property name=&quot;unigramSmearWeight&quot; value=&quot;1&quot;/&gt;
        &lt;property name=&quot;wordInsertionProbability&quot; 
                value=&quot;${wordInsertionProbability}&quot;/&gt;
        &lt;property name=&quot;silenceInsertionProbability&quot; 
                value=&quot;${silenceInsertionProbability}&quot;/&gt;
        &lt;property name=&quot;languageWeight&quot; value=&quot;${languageWeight}&quot;/&gt;
        &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The Dictionary configuration                            --&gt;
    &lt;!-- ******************************************************** --&gt;
    &lt;component name=&quot;dictionary&quot; 
        type=&quot;edu.cmu.sphinx.linguist.dictionary.FastDictionary&quot;&gt;
        &lt;property name=&quot;dictionaryPath&quot;
                  value=&quot;resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d&quot;/&gt;
        &lt;property name=&quot;fillerPath&quot; 
              value=&quot;resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict&quot;/&gt;
        &lt;property name=&quot;addSilEndingPronunciation&quot; value=&quot;false&quot;/&gt;
        &lt;property name=&quot;wordReplacement&quot; value=&quot;&amp;lt;sil&amp;gt;&quot;/&gt;
        &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The Language Model configuration                         --&gt;
    &lt;!-- ******************************************************** --&gt;
    &lt;component name=&quot;trigramModel&quot; 
        type=&quot;edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel&quot;&gt;
        &lt;property name=&quot;location&quot; value=&quot;/WSJ/wsj5k.DMP&quot;/&gt;
    

    <!-- value="/HUB4/language_model.arpaformat.DMP"/> -->

        &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
        &lt;property name=&quot;dictionary&quot; value=&quot;dictionary&quot;/&gt;
        &lt;property name=&quot;maxDepth&quot; value=&quot;3&quot;/&gt;
        &lt;property name=&quot;unigramWeight&quot; value=&quot;.7&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The acoustic model configuration                         --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;wsj&quot;
               type=&quot;edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model&quot;&gt;
        &lt;property name=&quot;loader&quot; value=&quot;sphinx3Loader&quot;/&gt;
        &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;sphinx3Loader&quot;
               type=&quot;edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader&quot;&gt;
        &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
        &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The unit manager configuration                           --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;unitManager&quot;
               type=&quot;edu.cmu.sphinx.linguist.acoustic.UnitManager&quot;/&gt;
    
    &lt;!-- ******************************************************** --&gt;
    &lt;!-- The frontend configuration                               --&gt;
    &lt;!-- ******************************************************** --&gt;
    
    &lt;component name=&quot;mfcFrontEnd&quot; type=&quot;edu.cmu.sphinx.frontend.FrontEnd&quot;&gt;
        &lt;propertylist name=&quot;pipeline&quot;&gt;
            &lt;item&gt;streamDataSource&lt;/item&gt;
            &lt;item&gt;premphasizer&lt;/item&gt;
            &lt;item&gt;windower&lt;/item&gt;
            &lt;item&gt;fft&lt;/item&gt;
            &lt;item&gt;melFilterBank&lt;/item&gt;
            &lt;item&gt;dct&lt;/item&gt;
            &lt;item&gt;batchCMN&lt;/item&gt;
            &lt;item&gt;featureExtraction&lt;/item&gt;
        &lt;/propertylist&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;streamDataSource&quot;
               type=&quot;edu.cmu.sphinx.frontend.util.StreamDataSource&quot;&gt;
        &lt;property name=&quot;sampleRate&quot; value=&quot;16000&quot;/&gt;
        &lt;property name=&quot;bitsPerSample&quot; value=&quot;16&quot;/&gt;
        &lt;property name=&quot;bigEndianData&quot; value=&quot;false&quot;/&gt;
        &lt;property name=&quot;signedData&quot; value=&quot;true&quot;/&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;premphasizer&quot;
               type=&quot;edu.cmu.sphinx.frontend.filter.Preemphasizer&quot;/&gt;
    
    &lt;component name=&quot;windower&quot;
               type=&quot;edu.cmu.sphinx.frontend.window.RaisedCosineWindower&quot;/&gt;
    
    &lt;component name=&quot;fft&quot;
               type=&quot;edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform&quot;/&gt;
    
    &lt;component name=&quot;melFilterBank&quot;
               type=&quot;edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank&quot;/&gt;
    
    &lt;component name=&quot;dct&quot;
               type=&quot;edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform&quot;/&gt;
    
    &lt;component name=&quot;batchCMN&quot;
               type=&quot;edu.cmu.sphinx.frontend.feature.BatchCMN&quot;/&gt;
    
    &lt;component name=&quot;featureExtraction&quot;
               type=&quot;edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor&quot;/&gt;
    
    &lt;!-- ******************************************************* --&gt;
    &lt;!--  monitors                                               --&gt;
    &lt;!-- ******************************************************* --&gt;
    
    &lt;component name=&quot;accuracyTracker&quot; 
                type=&quot;edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker&quot;&gt;
        &lt;property name=&quot;recognizer&quot; value=&quot;${recognizer}&quot;/&gt;
        &lt;property name=&quot;showRawResults&quot; value=&quot;false&quot;/&gt;
        &lt;property name=&quot;showAlignedResults&quot; value=&quot;false&quot;/&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;memoryTracker&quot; 
                type=&quot;edu.cmu.sphinx.instrumentation.MemoryTracker&quot;&gt;
        &lt;property name=&quot;recognizer&quot; value=&quot;${recognizer}&quot;/&gt;
    &lt;property name=&quot;showDetails&quot; value=&quot;false&quot;/&gt;
    &lt;property name=&quot;showSummary&quot; value=&quot;false&quot;/&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;speedTracker&quot; 
                type=&quot;edu.cmu.sphinx.instrumentation.SpeedTracker&quot;&gt;
        &lt;property name=&quot;recognizer&quot; value=&quot;${recognizer}&quot;/&gt;
        &lt;property name=&quot;frontend&quot; value=&quot;${frontend}&quot;/&gt;
    &lt;property name=&quot;showDetails&quot; value=&quot;false&quot;/&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;recognizerMonitor&quot; 
                type=&quot;edu.cmu.sphinx.instrumentation.RecognizerMonitor&quot;&gt;
        &lt;property name=&quot;recognizer&quot; value=&quot;${recognizer}&quot;/&gt;
        &lt;propertylist name=&quot;allocatedMonitors&quot;&gt;
            &lt;item&gt;configMonitor &lt;/item&gt;
        &lt;/propertylist&gt;
    &lt;/component&gt;
    
    &lt;component name=&quot;configMonitor&quot; 
                type=&quot;edu.cmu.sphinx.instrumentation.ConfigMonitor&quot;&gt;
        &lt;property name=&quot;showConfig&quot; value=&quot;false&quot;/&gt;
    &lt;/component&gt;
    
    &lt;!-- ******************************************************* --&gt;
    &lt;!--  Miscellaneous components                               --&gt;
    &lt;!-- ******************************************************* --&gt;
    
    &lt;component name=&quot;logMath&quot; type=&quot;edu.cmu.sphinx.util.LogMath&quot;&gt;
        &lt;property name=&quot;logBase&quot; value=&quot;1.0001&quot;/&gt;
        &lt;property name=&quot;useAddTable&quot; value=&quot;true&quot;/&gt;
    &lt;/component&gt;
    

    </config>

     
    • Nickolay V. Shmyrev

      Provide recording you are trying to decode please.

       
    • Zach Poff

      Zach Poff - 2008-06-27

      The audio file is the one included with the demo (1234.WAV) I'm not adding any extra arguments at runtime (except increasing RAM) so it refers to the default file hardcoded in the class.
      -ZLP

       
      • Nickolay V. Shmyrev

        With the following header:

        <property name="absoluteBeamWidth" value="5000"/>
        <property name="relativeBeamWidth" value="1E-120"/>
        <property name="absoluteWordBeamWidth" value="100"/>
        <property name="relativeWordBeamWidth" value="1E-60"/>
        <property name="wordInsertionProbability" value="0.7"/>
        <property name="languageWeight" value="8.5"/>
        <property name="silenceInsertionProbability" value=".1"/>
        <property name="skip" value="0"/>
        <property name="logLevel" value="WARNING"/>

        Released version sphinx4-beta works perfectly:

        RESULT: one two three four five

        Trunk is quite broken actually. I promise a beer to one who will fix it :)

         
        • Nickolay V. Shmyrev

          Ok, I found the problem, it's the same StartDataSignal in scorer. To make it work properly with trunk you need to add useStreamSignals=false in scorer properties:

          <component name="threadedScorer"
          type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
          <property name="frontend" value="${frontend}"/>
          <property name="isCpuRelative" value="true"/>
          <property name="numThreads" value="0"/>
          <property name="minScoreablesPerThread" value="10"/>
          <property name="scoreablesKeepFeature" value="true"/>
          <property name="useStreamSignals" value="false"/>
          </component>

          RESULT: one two three four five

          But it looks more like a workaround than a solution.

           
          • Zach Poff

            Zach Poff - 2008-07-01

            Thanks! That solved my problem. I put the beamwidth back up to a sane value (5k) and now the WSJ5K LM is running great. I'm grateful that you gave it a look. - ZLP

             

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.