Wav transcription very poor results

2008-02-22
2012-09-22
  • Einstein Mic

    Einstein Mic - 2008-02-22

    Hi,

    I was finally able to run a modified version of the Transcriber demo that uses the HUB4 model for transcription.
    I now have some English newscast programmes that I'm trying to transcribe, but the results are very poor: large parts of the speech are not transcribed at all, and the parts that are transcribed are not very accurate.
    Do you think there are parameters that can be tuned to get better performance, or is the problem linked to the model?

    Here is my configuration file:

    <config>
    <!-- ******** -->
    <!-- frequently tuned properties -->
    <!-- ******** -->
    <property name="absoluteBeamWidth" value="5000"/>
    <property name="relativeBeamWidth" value="1E-80"/>
    <property name="absoluteWordBeamWidth" value="80"/>
    <property name="relativeWordBeamWidth" value="1E-60"/>
    <property name="wordInsertionProbability" value="1E-16"/>
    <property name="languageWeight" value="7.0"/>
    <property name="silenceInsertionProbability" value=".1"/>
    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>

    <!-- ******************************************************** -->
    <!-- word recognizer configuration                            -->
    <!-- ******************************************************** -->

    <component name="recognizer"
                          type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker </item>
            <item>speedTracker </item>
            <item>memoryTracker </item>
            <item>recognizerMonitor </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The Decoder   configuration                              -->
    <!-- ******************************************************** -->

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="wordPruningSearchManager"/>
        <property name="featureBlockSize" value="50"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Search Manager                                       -->
    <!-- ******************************************************** -->

    <component name="wordPruningSearchManager"
    type="edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="lexTreeLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListManager" value="activeListManager"/>
        <property name="growSkipInterval" value="0"/>
        <property name="checkStateOrder" value="false"/>
        <property name="buildWordLattice" value="false"/>
        <property name="acousticLookaheadFrames" value="1.7"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Active Lists                                         -->
    <!-- ******************************************************** -->

    <component name="activeListManager"
             type="edu.cmu.sphinx.decoder.search.SimpleActiveListManager">
        <propertylist name="activeListFactories">
            <item>standardActiveListFactory</item>
            <item>wordActiveListFactory</item>
            <item>wordActiveListFactory</item>
            <item>standardActiveListFactory</item>
            <item>standardActiveListFactory</item>
            <item>standardActiveListFactory</item>
        </propertylist>
    </component>

    <component name="standardActiveListFactory"
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="wordActiveListFactory"
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteWordBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeWordBeamWidth}"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Pruner                                               -->
    <!-- ******************************************************** -->
    <component name="trivialPruner"
                type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <!-- ******************************************************** -->
    <!-- TheScorer                                                -->
    <!-- ******************************************************** -->
    <component name="threadedScorer"
                type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
        <property name="isCpuRelative" value="true"/>
        <property name="numThreads" value="0"/>
        <property name="minScoreablesPerThread" value="10"/>
        <property name="scoreablesKeepFeature" value="true"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The linguist  configuration                              -->
    <!-- ******************************************************** -->

    <component name="lexTreeLinguist"
                type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
        <property name="logMath" value="logMath"/>
        <property name="acousticModel" value="hub4"/>
        <property name="languageModel" value="trigramModel"/>
        <property name="dictionary" value="dictionary"/>
        <property name="addFillerWords" value="false"/>
        <property name="fillerInsertionProbability" value="1E-10"/>
        <property name="generateUnitStates" value="false"/>
        <property name="wantUnigramSmear" value="true"/>
        <property name="unigramSmearWeight" value="1"/>
        <property name="wordInsertionProbability"
                value="${wordInsertionProbability}"/>
        <property name="silenceInsertionProbability"
                value="${silenceInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Dictionary configuration                            -->
    <!-- ******************************************************** -->
    <component name="dictionary"
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model!/edu/cmu/sphinx/model/acoustic/HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz/dict/cmudict.06d"/>
        <property name="fillerPath"
              value="resource:/edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model!/edu/cmu/sphinx/model/acoustic/HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz/dict/fillerdict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="streamDataSource"
                type="edu.cmu.sphinx.frontend.util.StreamDataSource">
        <property name="sampleRate" value="16000"/>
        <property name="bitsPerSample" value="16"/>
        <property name="bigEndianData" value="false"/>
        <property name="signedData" value="true"/>
        <property name="bytesPerRead" value="320"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Language Model configuration                         -->
    <!-- ******************************************************** -->
    <component name="trigramModel"
          type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
        <property name="unigramWeight" value=".5"/>
        <property name="maxDepth" value="3"/>
        <property name="logMath" value="logMath"/>
        <property name="dictionary" value="dictionary"/>
        <property name="location"
                  value="/home/vigilante/Sphinx/hub4/language/language_model.arpaformat.DMP"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The acoustic model configuration                         -->
    <!-- ******************************************************** -->
    <component name="hub4"
               type="edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model">
        <property name="loader" value="hub4Loader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="hub4Loader" type="edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.ModelLoader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The acoustic model configuration                         -->
    <!-- ******************************************************** -->
    <component name="wsj"
               type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader" type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The unit manager configuration                           -->
    <!-- ******************************************************** -->

    <component name="unitManager"
        type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

    <!-- ******************************************************** -->
    <!-- The frontend configuration                               -->
    <!-- ******************************************************** -->

    <component name="mfcFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone </item>
            <item>premphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The live frontend configuration                          -->
    <!-- ******************************************************** -->

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>streamDataSource </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>premphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <component name="microphone"
                type="edu.cmu.sphinx.frontend.util.Microphone">
        <property name="closeBetweenUtterances" value="false"/>
    </component>

    <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>

    <component name="speechClassifier"
                type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
        <property name="threshold" value="13"/>
    </component>

    <component name="nonSpeechDataFilter"
                type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker"
                type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker">
        <property name="speechTrailer" value="50"/>
    </component>

    <component name="premphasizer"
        type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower"
    type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower"/>

    <component name="fft"
        type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform"/>

    <component name="melFilterBank"
        type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank"/>

    <component name="dct"
            type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="liveCMN"
                type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction"
        type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <!-- ******************************************************* -->
    <!--  monitors                                               -->
    <!-- ******************************************************* -->

    <component name="accuracyTracker"
                type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showRawResults" value="false"/>
        <property name="showAlignedResults" value="false"/>
    </component>

    <component name="memoryTracker"
                type="edu.cmu.sphinx.instrumentation.MemoryTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showDetails" value="false"/>
        <property name="showSummary" value="false"/>
    </component>

    <component name="speedTracker"
                type="edu.cmu.sphinx.instrumentation.SpeedTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="frontend" value="${frontend}"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="recognizerMonitor"
                type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
        <property name="recognizer" value="${recognizer}"/>
        <propertylist name="allocatedMonitors">
            <item>configMonitor </item>
        </propertylist>
    </component>

    <component name="configMonitor"
                type="edu.cmu.sphinx.instrumentation.ConfigMonitor">
        <property name="showConfig" value="false"/>
    </component>

    <!-- ******************************************************* -->
    <!--  Miscellaneous components                               -->
    <!-- ******************************************************* -->

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>


    </config>

     
    • Nickolay V. Shmyrev

      I'm not sure how you are transcribing newscasts, since your config works with a microphone. If you are decoding in batch mode, check the sampling rate first of all. If everything is set up correctly, the first thing you have to do is train a language model for your domain.

      P.S. Please don't start a new topic for every question.

       
    • Nickolay V. Shmyrev

      If you need quick help with such simple questions, you can join the #cmusphinx IRC channel on freenode.

       
    • Einstein Mic

      Einstein Mic - 2008-02-25

      Thanks for your help.
      There were some useless parts in the configuration file (the microphone part was there but was not used); the frontend actually uses a streamDataSource.
      I've changed from LiveCMN to BatchCMN and the results are now a little better. Do you think that changing parameters (absoluteBeamWidth, absoluteWordBeamWidth and so on) can improve performance? I've seen that speech is not correctly recognized when the voice is not completely clear or there is noise in the background. I think this problem is due to the acoustic model used. I mean, the HUB4 acoustic model was probably created from audio of speakers reading sentences. Do you have any idea which materials were used to create the HUB4 acoustic model? I probably have to create my own acoustic model that is closer to the context in which I want to use it (newscast programmes).
      Unfortunately I'm not able to join IRC channels at work (security reasons).
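
      For readers following along, the LiveCMN to BatchCMN change mentioned above might look roughly like the fragment below. This is only a sketch: the component name `batchCMN` is an assumption (any name works as long as the pipeline item matches it), and the rest of the pipeline is copied from the epFrontEnd definition in the original config. The fragment refers to components (streamDataSource, speechClassifier and so on) that are defined elsewhere in the full file.

      ```xml
      <config>
          <!-- Batch frontend: same pipeline as before, with liveCMN swapped
               for a batch cepstral mean normalizer. -->
          <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
              <propertylist name="pipeline">
                  <item>streamDataSource</item>
                  <item>speechClassifier</item>
                  <item>speechMarker</item>
                  <item>nonSpeechDataFilter</item>
                  <item>premphasizer</item>
                  <item>windower</item>
                  <item>fft</item>
                  <item>melFilterBank</item>
                  <item>dct</item>
                  <item>batchCMN</item>
                  <item>featureExtraction</item>
              </propertylist>
          </component>

          <!-- "batchCMN" is a name chosen for this sketch. -->
          <component name="batchCMN"
                     type="edu.cmu.sphinx.frontend.feature.BatchCMN"/>
      </config>
      ```

      BatchCMN computes the cepstral mean over a whole utterance before normalizing, which is usually a better fit for file-based decoding than the running estimate LiveCMN keeps for live input.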

       
      • Nickolay V. Shmyrev

        HUB4 is fine, and so is WSJ.

        I suggest you prepare a small demo of your work and upload it somewhere. We'll take a look and suggest something. Most probably you'll have to use your own language model, and probably adaptation as well. But you could also have made a much sillier mistake.

         
    • Adiilah

      Adiilah - 2008-02-25

      Hi,
      I think recognition is slow when absoluteBeamWidth value="-1". A value of -1 (which means infinite in Sphinx-4 config terms) is meant for a small task like digits, the alphabet or simple commands. For a big task, the value should be around 1000-5000.

      As for the acoustic model, I don't have any idea; I am using HUB4 itself. Can you please tell me how much memory you allocate to your Sphinx-4 application? Since my PC's memory is very low, decoding takes a lot of time, mostly due to the size of HUB4.
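
      As a rough illustration of the beam-width advice above, the fragment below sets the absolute beam once as a global property and lets the active list factory pick it up by reference. It is a sketch built from the property and component names in the first config of this thread; the value 5000 is simply the upper end of the suggested range, not a tuned recommendation, and the `logMath` component is assumed to be defined elsewhere in the full file.

      ```xml
      <config>
          <!-- Global beam settings, tuned in one place. Per the post above,
               -1 means an unlimited beam, and 1000 to 5000 is the suggested
               range for a large-vocabulary task. -->
          <property name="absoluteBeamWidth" value="5000"/>
          <property name="relativeBeamWidth" value="1E-80"/>

          <!-- The active list factory reads both values by reference. -->
          <component name="standardActiveListFactory"
                     type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
              <property name="logMath" value="logMath"/>
              <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
              <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
          </component>
      </config>
      ```

      Note that a hard-coded value inside a component (such as `absoluteBeamWidth` set to `-1` directly on a factory) overrides nothing global; it simply ignores the shared property, so it is worth checking each factory individually.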

       
    • Pingu

      Pingu - 2008-08-06

      I'm experiencing essentially the same problem.

      I'm trying to convert short podcasts with different speakers from 16k mono 256kbps wav to text, but the accuracy is very poor.

      Increasing the beamwidth values doesn't seem to improve the accuracy much.

      Following the steps of another forum post, and since the speakers are random/unknown, I modified the Transcriber demo config file to use the Wall Street Journal language model (wsj5k.DMP).

      If there's a better solution or there's anything obviously wrong with the config file posted below, please let me know.

      <?xml version="1.0" encoding="UTF-8"?>

      <!--

      Sphinx-4 Configuration file

      -->

      <!-- ******** -->

      <!-- an4 configuration file -->

      <!-- ******** -->

      <config>

      <!-- ******** -->

      <!-- frequently tuned properties -->

      <!-- ******** -->

      <property name="logLevel" value="OFF"/>

      <!-- <property name="relativeBeamWidth" value="1E-10" /> -->

      <property name="absoluteWordBeamWidth" value="5000" />

      <property name="relativeWordBeamWidth" value="1E-200" />

      <property name="wordInsertionProbability" value="1E-36" />

      <property name="languageWeight" value="8" />

      <property name="silenceInsertionProbability" value=".3" />

      <property name="acousticLookahead" value="1.7" />

      <property name="absoluteBeamWidth" value="5000" />

      <property name="relativeBeamWidth" value="1E-230" />

      <property name="frontend" value="epFrontEnd"/>

      <property name="recognizer" value="recognizer"/>

      <property name="showCreations" value="false"/>

      <!-- ******** -->

      <!-- word recognizer configuration -->

      <!-- ******** -->

      <component name="recognizer"

      type="edu.cmu.sphinx.recognizer.Recognizer">

      <property name="decoder" value="decoder"/>

      <propertylist name="monitors">

      <item>accuracyTracker </item>

      <item>speedTracker </item>

      <item>memoryTracker </item>

      </propertylist>

      </component>

      <component name="lexTreeLinguist"

      type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">

      <property name="silenceInsertionProbability"

      value="${silenceInsertionProbability}" />

      <property name="wantUnigramSmear" value="true" />

      <property name="fillerInsertionProbability" value=".02" />

      <property name="addFillerWords" value="true" />

      <property name="acousticModel" value="wsj" />

      <property name="languageModel" value="trigramModel" />

      <property name="wordInsertionProbability"

      value="${wordInsertionProbability}" />

      <property name="languageWeight" value="14" />

      <property name="logMath" value="logMath" />

      <property name="dictionary" value="dictionary" />

      <property name="unigramSmearWeight" value="1" />

      <property name="cacheSize" value="0" />

      <property name="generateUnitStates" value="false" />

      <property name="unitManager" value="unitManager" />

      </component>

      <!-- ******** -->

      <!-- The Decoder configuration -->

      <!-- ******** -->

      <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">

      <property name="searchManager" value="wordPruningSearchManager"/>

      </component>

      <!-- ******** -->

      <!-- wordPruningSearchManager -->

      <!-- ******** -->

      <component name="wordPruningSearchManager"

      type="edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager">

      <property name="scorer" value="threadedScorer" />

      <property name="pruner" value="trivialPruner" />

      <property name="acousticLookaheadFrames" value="2.0" />

      <property name="logMath" value="logMath" />

      <property name="activeListManager" value="activeListManager" />

      <property name="buildWordLattice" value="true" />

      <property name="maxLatticeEdges" value ="50" />

      <property name="relativeBeamWidth" value="1E-60" />

      <property name="growSkipInterval" value="8" />

      <property name="linguist" value="lexTreeLinguist" />

      <property name="checkStateOrder" value="false" />

      <property name="keepAllTokens" value="true" />

      </component>

      <component name="activeList"

      type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">

      <property name="logMath" value="logMath"/>

      <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>

      <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>

      </component>

      <component name="trivialPruner"

      type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

      <component name="threadedScorer"

      type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">

      <property name="frontend" value="${frontend}"/>

      <property name="isCpuRelative" value="true"/>

      <property name="numThreads" value="0"/>

      <property name="minScoreablesPerThread" value="10"/>

      <property name="scoreablesKeepFeature" value="true"/>

      </component>

      <!-- ******** -->

      <!-- acoustic model -->

      <!-- ******** -->

      <component name="wsj"

      type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">

      <property name="loader" value="wsjLoader" />

      <property name="unitManager" value="unitManager" />

      </component>

      <!-- ******** -->

      <!-- sphinx3Loader -->

      <!-- ******** -->

      <component name="wsjLoader"

      type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">

      <property name="logMath" value="logMath" />

      <property name="unitManager" value="unitManager" />

      </component>

      <!-- ******** -->

      <!-- trigramModel -->

      <!-- ******** -->

      <component name="trigramModel"

      type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">

      <property name="unigramWeight" value=".5" />

      <property name="maxDepth" value="3" />

      <property name="logMath" value="logMath" />

      <property name="dictionary" value="dictionary" />

      <property name="location" value="/usr/src/wsj5k.DMP" />

      </component>

      <!-- ******** -->

      <!-- The Dictionary configuration -->

      <!-- ******** -->

      <component name="dictionary"

      type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">

      <property name="dictionaryPath"

      value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>

      <property name="fillerPath"

      value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>

      <property name="addSilEndingPronunciation" value="false"/>

      <property name="wordReplacement" value="&lt;sil&gt;"/>

      <property name="allowMissingWords" value="false"/>

      <property name="unitManager" value="unitManager"/>

      </component>

      <!-- ******** -->

      <!-- The unit manager configuration -->

      <!-- ******** -->

      <component name="unitManager"

      type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

      <!-- ******** -->

      <!-- The live frontend configuration -->

      <!-- ******** -->

      <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">

      <propertylist name="pipeline">

      <item>streamDataSource </item>

      <item>speechClassifier </item>

      <item>speechMarker </item>

      <item>nonSpeechDataFilter </item>

      <item>premphasizer </item>

      <item>windower </item>

      <item>fft </item>

      <item>melFilterBank </item>

      <item>dct </item>

      <item>liveCMN </item>

      <item>featureExtraction </item>

      </propertylist>

      </component>

      <!-- ******** -->

      <!-- The frontend pipelines -->

      <!-- ******** -->

      <component name="streamDataSource"

      type="edu.cmu.sphinx.frontend.util.StreamDataSource">

      <property name="sampleRate" value="16000"/>

      <property name="bitsPerSample" value="16"/>

      <property name="bigEndianData" value="false"/>

      <property name="signedData" value="true"/>

      <property name="bytesPerRead" value="320"/>

      </component>

      <component name="speechClassifier"

      type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">

      <property name="threshold" value="13"/>

      </component>

      <component name="nonSpeechDataFilter"

      type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

      <component name="speechMarker"

      type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" >

      <property name="speechTrailer" value="50"/>

      </component>

      <component name="premphasizer"

      type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

      <component name="windower"

      type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">

      </component>

      <component name="fft"

      type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">

      </component>

      <component name="melFilterBank"

      type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">

      </component>

      <component name="dct"

      type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

      <component name="liveCMN"

      type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

      <component name="featureExtraction"

      type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

      <!-- ********* -->

      <!-- monitors -->

      <!-- ********* -->

      <component name="accuracyTracker"

      type="edu.cmu.sphinx.instrumentation.AccuracyTracker">

      <property name="recognizer" value="${recognizer}"/>

      <property name="showAlignedResults" value="false"/>

      <property name="showRawResults" value="false"/>

      </component>

      <component name="memoryTracker"

      type="edu.cmu.sphinx.instrumentation.MemoryTracker">

      <property name="recognizer" value="${recognizer}"/>

      <property name="showSummary" value="false"/>

      <property name="showDetails" value="false"/>

      </component>

      <component name="speedTracker"

      type="edu.cmu.sphinx.instrumentation.SpeedTracker">

      <property name="recognizer" value="${recognizer}"/>

      <property name="frontend" value="${frontend}"/>

      <property name="showSummary" value="true"/>

      <property name="showDetails" value="false"/>

      </component>

      <!-- ******** -->

      <!-- activeListManager -->

      <!-- ******** -->

      <component name="activeListManager"

      type="edu.cmu.sphinx.decoder.search.SimpleActiveListManager">

      <propertylist name="activeListFactories">

      <item>unitExitActiveList</item>

      <item>wordActiveList</item>

      <item>wordActiveList</item>

      <item>activeList</item>

      <item>activeList</item>

      <item>activeList</item>

      </propertylist>

      </component>

      <!-- ******** -->

      <!-- unitExitActiveList -->

      <!-- ******** -->

      <component name="unitExitActiveList"

      type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">

      <property name="absoluteBeamWidth" value="-1" />

      <property name="logMath" value="logMath" />

      <property name="relativeBeamWidth" value="${relativeBeamWidth}" />

      </component>

      <!-- ******** -->

      <!-- wordActiveList -->

      <!-- ******** -->

      <component name="wordActiveList"

      type="edu.cmu.sphinx.decoder.search.WordActiveListFactory">

      <property name="absoluteBeamWidth" value="21" />

      <property name="logMath" value="logMath" />

      <property name="relativeBeamWidth" value="1E-25" />

      </component>

      <!-- ******** -->

      <!-- recognizerMonitor -->

      <!-- ******** -->

      <component name="recognizerMonitor"

      type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">

      <property name="recognizer" value="${recognizer}" />

      <propertylist name="allocatedMonitors">

      <item>configMonitor</item>

      </propertylist>

      </component>

      <!-- ********* -->

      <!-- Miscellaneous components -->

      <!-- ********* -->

      <component name="logMath" type="edu.cmu.sphinx.util.LogMath">

      <property name="logBase" value="1.0001"/>

      <property name="useAddTable" value="true"/>

      </component>

      <component name="confidenceScorer"

      type="edu.cmu.sphinx.result.MAPConfidenceScorer">

      <property name="languageWeight" value="${languageWeight}"/>

      </component>

      </config>

       
      • Nickolay V. Shmyrev

        > I modified the Transcriber demo config file to use the Wall Street Journal language model (wsj5k.DMP).

        This is obviously wrong.

         
    • Pingu

      Pingu - 2008-08-06

      Hi,

      Could you please elaborate?

      I followed the config in the first post of this thread, which you said looks otherwise ok.

      http://sourceforge.net/forum/forum.php?thread_id=1935440&forum_id=5471

       
    • Adiilah

      Adiilah - 2008-02-23

      Hello,
      The recognition is not done live, so I think you should use BatchCMN instead of LiveCMN. I'm not sure about this, though. Can anyone please clarify? What is the difference between the two, and what happens if we use Batch instead of Live?
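
      A minimal sketch of the swap being suggested here, assuming the frontend pipeline (not shown in full in this thread) currently lists `liveCMN` as one of its items. `LiveCMN` keeps updating its estimate of the cepstral mean as audio arrives, whereas `BatchCMN` reads a whole utterance first and subtracts that utterance's own mean, which is usually the more natural fit when the audio comes from a file. The component name `batchCMN` is an arbitrary label chosen for this example.

      ```xml
      <!-- Sketch only: define a batch CMN component alongside the existing liveCMN one.
           The name "batchCMN" is an arbitrary label for this example. -->
      <component name="batchCMN"
                 type="edu.cmu.sphinx.frontend.feature.BatchCMN"/>

      <!-- Then, in the frontend pipeline's propertylist, replace
               <item>liveCMN</item>
           with
               <item>batchCMN</item>
           (assumes the pipeline currently lists liveCMN). -->
      ```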

       
    • Einstein Mic

      Einstein Mic - 2008-02-26

      These are the JVM parameters I'm using for my application: -Xmx1024M -Xms512M. With these parameters I can transcribe a 27-minute WAV file in two and a half hours.
      I have to find a host where I can place the demo.
      When you speak of adaptation, do you mean adaptation of the acoustic model after training? I have read that something like this is done by some commercial speech recognition software.

       
      • Nickolay V. Shmyrev

        > I can transcribe a 27-minute WAV file in two and a half hours.

        The beams could be narrower; decoding will then run faster.
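
        As a rough sketch of what narrower beams could look like in the global properties at the top of the posted config: the values below are illustrative starting points, not tuned or measured settings. A smaller absolute beam keeps fewer hypotheses per frame, and a relative beam closer to 1 prunes more aggressively, so both trade some accuracy for speed.

        ```xml
        <!-- Illustrative values only; tune against your own audio. -->
        <!-- The posted config uses 5000 / 1E-80 / 80 / 1E-60 for these four. -->
        <property name="absoluteBeamWidth" value="1000"/>
        <property name="relativeBeamWidth" value="1E-60"/>
        <property name="absoluteWordBeamWidth" value="30"/>
        <property name="relativeWordBeamWidth" value="1E-40"/>
        ```

        Note that the `wordActiveList` component in the posted config hard-codes `21` and `1E-25` rather than referencing these properties, so only components that use `${...}` references (such as `unitExitActiveList` with `${relativeBeamWidth}`) pick up the change.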

        > I have to find a host where I can place the demo.

        Use mediafire.com, for example. The sample doesn't need to be two hours long; a few minutes of speech is enough.

        > I have read that something like this is done by some commercial speech recognition software.

        Not only commercial software :) Sphinx4 doesn't support MLLR, but Sphinx3 does:

        http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/AcousticModelAdaptation

         
