
Tips for improving performance (Sphinx4)?

Help
d_h_benson
2008-09-03
2012-09-22
  • d_h_benson

    d_h_benson - 2008-09-03

    Hi,

    I'm using Sphinx4 to analyze audio files with a highly restrictive grammar, but I'm not getting very useful results. Could any of you take a quick look at the data and the configuration file and suggest how to improve performance? Any assistance would be greatly appreciated.

    Here's a bit of background:

    The files I'm analyzing are responses from a psychological experiment on reading speed. Subjects were shown a screen full of letters and asked to read them back as quickly as possible, so each audio file contains only a few distinct letters (J, H, F, U and V), repeated several times in random order.

    I'm trying to extract both the sequence of letters that was read and the timing of each utterance.
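    Getting timings out of a frame-based recognizer ultimately comes down to mapping frame indices onto clock time. A minimal sketch follows, assuming the front end's usual 10 ms frame shift and a hypothetical `Segment` record holding a word with its start frame and (exclusive) end frame. It is plain Java and does not call the Sphinx-4 API; the frame values in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: turn per-word frame ranges into the letter sequence and timings.
 * Segment is a hypothetical type, not part of Sphinx-4.
 */
public class FrameTimes {

    /** Assumed front-end frame shift in milliseconds. */
    static final int FRAME_SHIFT_MS = 10;

    /** Hypothetical recognized word: first frame, and end frame (exclusive). */
    record Segment(String word, int startFrame, int endFrame) {}

    /** Time at which the given frame starts, in ms from the start of the file. */
    static int frameToMs(int frame) {
        return frame * FRAME_SHIFT_MS;
    }

    /** The sequence of letters read, separated by spaces. */
    static String sequence(List<Segment> segments) {
        List<String> words = new ArrayList<>();
        for (Segment s : segments) {
            words.add(s.word());
        }
        return String.join(" ", words);
    }

    /** One "WORD start-end ms" line per segment. */
    static List<String> timings(List<Segment> segments) {
        List<String> lines = new ArrayList<>();
        for (Segment s : segments) {
            lines.add(s.word() + " " + frameToMs(s.startFrame())
                    + "-" + frameToMs(s.endFrame()) + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up frame ranges, for illustration only.
        List<Segment> segs = List.of(
                new Segment("J", 12, 40),
                new Segment("H", 55, 81));
        System.out.println(sequence(segs));        // J H
        timings(segs).forEach(System.out::println); // J 120-400 ms, H 550-810 ms
    }
}
```

    Whatever call is used to obtain word boundaries, it is worth checking whether it reports frames or seconds before applying a conversion like this.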

    The program I wrote is based on the HelloWorld demo and the Wav transcription demo. Since the set of possible utterances is so small, though, I wrote a simple new grammar definition as well.

    Below are links to relevant files and a complete copy of the configuration file I'm using. If you need anything else, or would prefer these in a different format, I'll gladly oblige.

    If any of you have any bright ideas, please send them my way!

    Many thanks,

    Dave Benson

    >>

    Here's the grammar definition:
    http://www.mediafire.com/file/vyvm0grn8ny/letters.gram

    Here's the main class of the application:
    http://www.mediafire.com/file/jzwt9dvurbw/WordTimings.java

    Here's a sample audio file:
    http://www.mediafire.com/file/7baafegaaaz/test_file.wav

    Here's an audacity project showing the audio file and the resulting analysis:
    http://www.mediafire.com/file/jq3ndhkcna7/test_file_audacity_project.zip

    Here's the config file:
    http://www.mediafire.com/file/zchocuucgcw/wordtimings_letters.config.xml

    >>
    <?xml version="1.0" encoding="UTF-8"?>

    <!--
    Sphinx-4 Configuration file
    -->

    <!-- ******** -->
    <!-- an4 configuration file -->
    <!-- ******** -->

    <config>

    <!-- ******************************************************** -->
    <!-- frequently tuned properties                              -->
    <!-- ******************************************************** -->

    <property name="logLevel" value="WARNING"/>

    <property name="absoluteBeamWidth"  value="-1"/>
    <property name="relativeBeamWidth"  value="1E-80"/>
    <property name="wordInsertionProbability" value="1E-36"/>


    <!-- <property name="silenceInsertionProbability" value="0.1"/> -->
    <property name="languageWeight" value="8"/>

    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>

    <!-- ******************************************************** -->
    <!-- word recognizer configuration                            -->
    <!-- ******************************************************** -->

    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker </item>
            <item>speedTracker </item>
            <item>memoryTracker </item>
        </propertylist>


    </component>

    <!-- ******************************************************** -->
    <!-- The Decoder   configuration                              -->
    <!-- ******************************************************** -->

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>

    <component name="searchManager"
        type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="flatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>

    <component name="activeList"
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="trivialPruner"
                type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <component name="threadedScorer"
                type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
        <property name="isCpuRelative" value="true"/>
        <property name="numThreads" value="0"/>
        <property name="minScoreablesPerThread" value="10"/>
        <property name="scoreablesKeepFeature" value="true"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The linguist  configuration                              -->
    <!-- ******************************************************** -->

    <component name="flatLinguist"
                type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="jsgfGrammar"/>
        <property name="acousticModel" value="wsj"/>
        <property name="wordInsertionProbability"
                value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Grammar  configuration                               -->
    <!-- ******************************************************** -->

    <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
        <property name="dictionary" value="dictionary"/>
        <!-- <property name="grammarLocation"
             value="resource:/demo.sphinx.helloworld.HelloWorld!/demo/sphinx/helloworld/"/>
        -->
        <property name="grammarLocation"
             value="bin/demo/sphinx/wordtimings/"/>
        <property name="grammarName" value="letters"/>
        <property name="logMath" value="logMath"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Dictionary configuration                            -->
    <!-- ******************************************************** -->

    <component name="dictionary"
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
     value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        <property name="fillerPath"
     value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
        <property name="addSilEndingPronunciation" value="true"/>


    <!-- <property name="wordReplacement" value="&lt;sil&gt;"/>
    <property name="allowMissingWords" value="false"/> -->
    <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The acoustic model configuration                         -->
    <!-- ******************************************************** -->
    <component name="wsj"
      type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader"
               type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The unit manager configuration                           -->
    <!-- ******************************************************** -->

    <component name="unitManager"
        type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

    <!-- ******************************************************** -->
    <!-- The live frontend configuration                          -->
    <!-- ******************************************************** -->
    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <!-- <item>microphone </item> -->
            <item>streamDataSource </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>premphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The frontend without endpointer configuration            -->
    <!-- ******************************************************** -->
    <component name="nonEpFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <!-- <item>microphone </item> -->
            <item>streamDataSource </item>
            <item>premphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The frontend pipelines                                   -->
    <!-- ******************************************************** -->

    <component name="streamDataSource"
                type="edu.cmu.sphinx.frontend.util.StreamDataSource">
        <property name="sampleRate" value="44100"/>
        <property name="bitsPerSample" value="16"/>
        <property name="bigEndianData" value="false"/>
        <property name="signedData" value="true"/>
        <property name="bytesPerRead" value="320"/>
    </component>


    <!-- <component name="microphone"
    type="edu.cmu.sphinx.frontend.util.Microphone">
    <property name="msecPerRead" value="10"/>
    <property name="closeBetweenUtterances" value="false"/>
    </component>
    -->

    <component name="speechClassifier"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
        <property name="threshold" value="13"/>
    </component>

    <component name="nonSpeechDataFilter"
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker">
        <property name="speechTrailer" value="50"/>
    </component>

    <component name="premphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower"
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>

    <component name="fft"
            type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
    </component>

    <component name="melFilterBank"
        type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>

    <component name="dct"
            type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="liveCMN"
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction"
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <!-- ******************************************************* -->
    <!--  monitors                                               -->
    <!-- ******************************************************* -->

    <component name="accuracyTracker"
                type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showAlignedResults" value="false"/>
        <property name="showRawResults" value="false"/>
    </component>

    <component name="memoryTracker"
                type="edu.cmu.sphinx.instrumentation.MemoryTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showSummary" value="false"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="speedTracker"
                type="edu.cmu.sphinx.instrumentation.SpeedTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="frontend" value="${frontend}"/>
        <property name="showSummary" value="true"/>
        <property name="showDetails" value="false"/>
    </component>

    <!-- ******************************************************* -->
    <!--  Miscellaneous components                               -->
    <!-- ******************************************************* -->

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>


    </config>

     
    • d_h_benson

      d_h_benson - 2008-09-03

      As a secondary question: in some segments Sphinx4 fails to recognize any words at all. Do you know which settings I could change to avoid this?

       
    • Nickolay V. Shmyrev

      (00:55:25) nshm: d_h_benson: <property name="sampleRate" value="44100"/>  <- this will not work for sure
      (00:55:33) nshm: samplerate must be 16000
      (00:55:48) nshm: but you can use AudioStreamDataSource that will do conversion for you
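      The acoustic model in this configuration is the 16 kHz WSJ variant (the `16k` in `WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz`), so 44.1 kHz recordings have to be converted before decoding, with `sampleRate` then set to 16000. Besides the conversion Nickolay mentions, the files could also be resampled offline. A minimal sketch in plain Java, assuming the samples are 16-bit mono PCM already read into a `short[]`; it uses linear interpolation with no anti-aliasing filter, so a dedicated resampler will give cleaner audio.

```java
/**
 * Sketch: naive linear-interpolation resampling of 16-bit PCM samples.
 * No low-pass filter is applied, so content above the new Nyquist
 * frequency (8 kHz for 16 kHz output) will alias.
 */
public class Resample {

    static short[] resample(short[] in, int fromRate, int toRate) {
        if (fromRate <= 0 || toRate <= 0) {
            throw new IllegalArgumentException("sample rates must be positive");
        }
        if (in.length == 0) {
            return new short[0];
        }
        int outLen = (int) ((long) in.length * toRate / fromRate);
        short[] out = new short[outLen];
        double step = (double) fromRate / toRate; // input samples per output sample
        for (int i = 0; i < outLen; i++) {
            double pos = i * step;          // fractional position in the input
            int idx = (int) pos;
            double frac = pos - idx;
            int next = Math.min(idx + 1, in.length - 1);
            // Weighted average of the two neighbouring input samples.
            double v = in[idx] * (1.0 - frac) + in[next] * frac;
            out[i] = (short) Math.round(v);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] oneSecondAt44k = new short[44100];
        short[] converted = resample(oneSecondAt44k, 44100, 16000);
        System.out.println(converted.length); // 16000
    }
}
```

      With 16 kHz, 16-bit data, the existing `bytesPerRead` value of 320 corresponds to 160 samples, i.e. 10 ms of audio per read.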

       
