CMU Sphinx / Forums / Help: Transcription Problems

HI-

Let me preface this by saying that I have attempted to read everything on the website and as much of the forums as I can before I posted my problem.

I have an mp3 file that I would like to transcribe.

I convert the mp3 file to wav using mp3.jar (javaworld article code on how to do this) and jl1.0.jar (more open source java mp3 libraries). That all seems to go well, I get a wav file with:

WAVE (.wav) file, byte length: 789322, data format: PCM_SIGNED 16000.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian, frame length: 197313

Good right? I have 16khz, 16bit, little-endian, everything should be well.

So I am then using the following config file, pinched from transcriber and modified as the readme mentions (change grammar, linguist, and language model):

<?xml version="1.0" encoding="UTF-8"?>

&lt;!-- ******************************************************** --&gt;
&lt;!-- frequently tuned properties                              --&gt;
&lt;!-- ******************************************************** --&gt;

&lt;property name=&quot;logLevel&quot; value=&quot;WARNING&quot;/&gt;

&lt;property name=&quot;absoluteBeamWidth&quot;  value=&quot;-1&quot;/&gt;
&lt;property name=&quot;relativeBeamWidth&quot;  value=&quot;1E-80&quot;/&gt;
&lt;property name=&quot;wordInsertionProbability&quot; value=&quot;1E-36&quot;/&gt;
 &lt;property name=&quot;silenceInsertionProbability&quot; value=&quot;.1&quot;/&gt;
&lt;property name=&quot;languageWeight&quot;     value=&quot;8&quot;/&gt;
&lt;property name=&quot;frontend&quot; value=&quot;epFrontEnd&quot;/&gt;
&lt;property name=&quot;recognizer&quot; value=&quot;recognizer&quot;/&gt;
&lt;property name=&quot;showCreations&quot; value=&quot;false&quot;/&gt;


&lt;!-- ******************************************************** --&gt;
&lt;!-- word recognizer configuration                            --&gt;
&lt;!-- ******************************************************** --&gt;

&lt;component name=&quot;recognizer&quot; type=&quot;edu.cmu.sphinx.recognizer.Recognizer&quot;&gt;
    &lt;property name=&quot;decoder&quot; value=&quot;decoder&quot;/&gt;
    &lt;propertylist name=&quot;monitors&quot;&gt;
        &lt;item&gt;accuracyTracker &lt;/item&gt;
        &lt;item&gt;speedTracker &lt;/item&gt;
        &lt;item&gt;memoryTracker &lt;/item&gt;
    &lt;/propertylist&gt;

</component>

&lt;!-- ******************************************************** --&gt;
&lt;!-- The Decoder   configuration                              --&gt;
&lt;!-- ******************************************************** --&gt;

&lt;component name=&quot;decoder&quot; type=&quot;edu.cmu.sphinx.decoder.Decoder&quot;&gt;
    &lt;property name=&quot;searchManager&quot; value=&quot;searchManager&quot;/&gt;
&lt;/component&gt;

 &lt;!-- ******************************************************** --&gt;
&lt;!-- The Active Lists                                         --&gt;
&lt;!-- ******************************************************** --&gt;



 &lt;component name=&quot;activeList&quot; 
         type=&quot;edu.cmu.sphinx.decoder.search.PartitionActiveListFactory&quot;&gt;
    &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
    &lt;property name=&quot;absoluteBeamWidth&quot; value=&quot;${absoluteBeamWidth}&quot;/&gt;
    &lt;property name=&quot;relativeBeamWidth&quot; value=&quot;${relativeBeamWidth}&quot;/&gt;
&lt;/component&gt;




&lt;component name=&quot;trivialPruner&quot; 
            type=&quot;edu.cmu.sphinx.decoder.pruner.SimplePruner&quot;/&gt;

&lt;component name=&quot;threadedScorer&quot; 
            type=&quot;edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer&quot;&gt;
    &lt;property name=&quot;frontend&quot; value=&quot;${frontend}&quot;/&gt;
    &lt;property name=&quot;isCpuRelative&quot; value=&quot;true&quot;/&gt;
    &lt;property name=&quot;numThreads&quot; value=&quot;0&quot;/&gt;
    &lt;property name=&quot;minScoreablesPerThread&quot; value=&quot;10&quot;/&gt;
    &lt;property name=&quot;scoreablesKeepFeature&quot; value=&quot;true&quot;/&gt;
&lt;/component&gt;

&lt;!-- ******************************************************** --&gt;
&lt;!-- The linguist  configuration                              --&gt;
&lt;!-- ******************************************************** --&gt;

&lt;component name=&quot;lexTreeLinguist&quot; 
           type=&quot;edu.cmu.sphinx.linguist.lextree.LexTreeLinguist&quot;&gt;
    &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
    &lt;property name=&quot;acousticModel&quot; value=&quot;hub4&quot;/&gt;
    &lt;property name=&quot;languageModel&quot; value=&quot;trigramModel&quot;/&gt;
    &lt;property name=&quot;dictionary&quot; value=&quot;dictionary&quot;/&gt;
    &lt;property name=&quot;addFillerWords&quot; value=&quot;false&quot;/&gt;
    &lt;property name=&quot;fillerInsertionProbability&quot; value=&quot;1E-10&quot;/&gt;
    &lt;property name=&quot;generateUnitStates&quot; value=&quot;false&quot;/&gt;
    &lt;property name=&quot;wantUnigramSmear&quot; value=&quot;true&quot;/&gt;
    &lt;property name=&quot;unigramSmearWeight&quot; value=&quot;1&quot;/&gt;
    &lt;property name=&quot;wordInsertionProbability&quot; 
            value=&quot;${wordInsertionProbability}&quot;/&gt;
    &lt;property name=&quot;silenceInsertionProbability&quot; 
            value=&quot;${silenceInsertionProbability}&quot;/&gt;
    &lt;property name=&quot;languageWeight&quot; value=&quot;${languageWeight}&quot;/&gt;
    &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
&lt;/component&gt;

&lt;!-- ******************************************************** --&gt;
&lt;!-- The Dictionary configuration                            --&gt;
&lt;!-- ******************************************************** --&gt;
&lt;component name=&quot;dictionary&quot;
      type=&quot;edu.cmu.sphinx.linguist.dictionary.FastDictionary&quot;&gt;
    &lt;property name=&quot;dictionaryPath&quot; value=&quot;resource:/edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model!/edu/cmu/sphinx/model/acoustic/HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz/cmudict.06d&quot;/&gt;
    &lt;property name=&quot;fillerPath&quot; value=&quot;resource:/edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model!/edu/cmu/sphinx/model/acoustic/HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz/fillerdict&quot;/&gt;
    &lt;property name=&quot;addSilEndingPronunciation&quot; value=&quot;false&quot;/&gt;
    &lt;property name=&quot;allowMissingWords&quot; value=&quot;false&quot;/&gt;
    &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
&lt;/component&gt;

 &lt;!-- ************************************************** --&gt;
&lt;!-- trigramModel                                       --&gt;
&lt;!-- ************************************************** --&gt;

&lt;component name=&quot;trigramModel&quot;
      type=&quot;edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel&quot;&gt;
    &lt;property name=&quot;unigramWeight&quot; value=&quot;.5&quot;/&gt;
    &lt;property name=&quot;maxDepth&quot; value=&quot;3&quot;/&gt;
    &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
    &lt;property name=&quot;dictionary&quot; value=&quot;dictionary&quot;/&gt;
    &lt;property name=&quot;location&quot;
              value=&quot;C:/eclipse/workspace/PodCastSearch/lab/speech/sphinx4/data/hub4_model/language_model.arpaformat.DMP&quot;/&gt;
&lt;/component&gt;

&lt;!-- ************************************************** --&gt;
&lt;!-- flatUnigramModel                                   --&gt;
&lt;!-- ************************************************** --&gt;
&lt;component name=&quot;flatUnigramModel&quot; 
            type=&quot;edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel&quot;&gt;
    &lt;property name=&quot;location&quot; 
               value=&quot;hub4.flat_unigram.lm&quot;/&gt;
    &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
    &lt;property name=&quot;dictionary&quot; value=&quot;dictionary&quot;/&gt;
    &lt;property name=&quot;maxDepth&quot; value=&quot;1&quot;/&gt;
    &lt;property name=&quot;unigramWeight&quot; value=&quot;.7&quot;/&gt;
&lt;/component&gt;


&lt;!-- ******************************************************** --&gt;
&lt;!-- The acoustic model configuration                         --&gt;
&lt;!-- ******************************************************** --&gt;

&lt;component name=&quot;hub4&quot;
    type=&quot;edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model&quot;&gt;
    &lt;property name=&quot;loader&quot; value=&quot;sphinx3Loader&quot;/&gt;
    &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
&lt;/component&gt;

&lt;component name=&quot;sphinx3Loader&quot;
           type=&quot;edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.ModelLoader&quot;&gt;
    &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
    &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
&lt;/component&gt;


&lt;!-- ******************************************************** --&gt;
&lt;!-- The acoustic model configuration                         --&gt;
&lt;!-- ******************************************************** --&gt;
&lt;component name=&quot;wsj&quot;
           type=&quot;edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model&quot;&gt;
    &lt;property name=&quot;loader&quot; value=&quot;wsjLoader&quot;/&gt;
    &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
&lt;/component&gt;

&lt;component name=&quot;wsjLoader&quot; type=&quot;edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader&quot;&gt;
    &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;
    &lt;property name=&quot;unitManager&quot; value=&quot;unitManager&quot;/&gt;
&lt;/component&gt;

&lt;!-- ******************************************************** --&gt;
&lt;!-- The unit manager configuration                           --&gt;
&lt;!-- ******************************************************** --&gt;

&lt;component name=&quot;unitManager&quot; 
    type=&quot;edu.cmu.sphinx.linguist.acoustic.UnitManager&quot;/&gt;

&lt;!-- ******************************************************** --&gt;
&lt;!-- The live frontend configuration                          --&gt;
&lt;!-- ******************************************************** --&gt;
&lt;component name=&quot;epFrontEnd&quot; type=&quot;edu.cmu.sphinx.frontend.FrontEnd&quot;&gt;
    &lt;propertylist name=&quot;pipeline&quot;&gt;
        &lt;item&gt;streamDataSource &lt;/item&gt;
        &lt;item&gt;speechClassifier &lt;/item&gt;
        &lt;item&gt;speechMarker &lt;/item&gt;
        &lt;item&gt;nonSpeechDataFilter &lt;/item&gt;
        &lt;item&gt;premphasizer &lt;/item&gt;
        &lt;item&gt;windower &lt;/item&gt;
        &lt;item&gt;fft &lt;/item&gt;
        &lt;item&gt;melFilterBank &lt;/item&gt;
        &lt;item&gt;dct &lt;/item&gt;
        &lt;item&gt;liveCMN &lt;/item&gt;
        &lt;item&gt;featureExtraction &lt;/item&gt;
    &lt;/propertylist&gt;
&lt;/component&gt;

&lt;!-- ******************************************************** --&gt;
&lt;!-- The frontend pipelines                                   --&gt;
&lt;!-- ******************************************************** --&gt;

&lt;component name=&quot;streamDataSource&quot;
            type=&quot;edu.cmu.sphinx.frontend.util.StreamDataSource&quot;&gt;
    &lt;property name=&quot;sampleRate&quot; value=&quot;16000&quot;/&gt;
    &lt;property name=&quot;bitsPerSample&quot; value=&quot;16&quot;/&gt;
    &lt;property name=&quot;bigEndianData&quot; value=&quot;false&quot;/&gt;
    &lt;property name=&quot;signedData&quot; value=&quot;true&quot;/&gt;
    &lt;property name=&quot;bytesPerRead&quot; value=&quot;320&quot;/&gt;
&lt;/component&gt;

&lt;component name=&quot;speechClassifier&quot; 
           type=&quot;edu.cmu.sphinx.frontend.endpoint.SpeechClassifier&quot;&gt;
    &lt;property name=&quot;threshold&quot; value=&quot;13&quot;/&gt;
&lt;/component&gt;

&lt;component name=&quot;nonSpeechDataFilter&quot; 
           type=&quot;edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter&quot;/&gt;

&lt;component name=&quot;speechMarker&quot; 
           type=&quot;edu.cmu.sphinx.frontend.endpoint.SpeechMarker&quot; &gt;
    &lt;property name=&quot;speechTrailer&quot; value=&quot;50&quot;/&gt;
&lt;/component&gt;

&lt;component name=&quot;premphasizer&quot; 
           type=&quot;edu.cmu.sphinx.frontend.filter.Preemphasizer&quot;/&gt;

&lt;component name=&quot;windower&quot; 
           type=&quot;edu.cmu.sphinx.frontend.window.RaisedCosineWindower&quot;&gt;
&lt;/component&gt;

&lt;component name=&quot;fft&quot; 
        type=&quot;edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform&quot;&gt;
&lt;/component&gt;

&lt;component name=&quot;melFilterBank&quot; 
    type=&quot;edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank&quot;&gt;
&lt;/component&gt;

&lt;component name=&quot;dct&quot; 
        type=&quot;edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform&quot;/&gt;

&lt;component name=&quot;liveCMN&quot; 
           type=&quot;edu.cmu.sphinx.frontend.feature.LiveCMN&quot;/&gt;

&lt;component name=&quot;featureExtraction&quot; 
           type=&quot;edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor&quot;/&gt;


&lt;!-- ******************************************************* --&gt;
&lt;!--  monitors                                               --&gt;
&lt;!-- ******************************************************* --&gt;

&lt;component name=&quot;accuracyTracker&quot; 
            type=&quot;edu.cmu.sphinx.instrumentation.AccuracyTracker&quot;&gt;
    &lt;property name=&quot;recognizer&quot; value=&quot;${recognizer}&quot;/&gt;
    &lt;property name=&quot;showAlignedResults&quot; value=&quot;false&quot;/&gt;
    &lt;property name=&quot;showRawResults&quot; value=&quot;false&quot;/&gt;
&lt;/component&gt;

&lt;component name=&quot;memoryTracker&quot; 
            type=&quot;edu.cmu.sphinx.instrumentation.MemoryTracker&quot;&gt;
    &lt;property name=&quot;recognizer&quot; value=&quot;${recognizer}&quot;/&gt;
&lt;property name=&quot;showSummary&quot; value=&quot;false&quot;/&gt;
&lt;property name=&quot;showDetails&quot; value=&quot;false&quot;/&gt;
&lt;/component&gt;

&lt;component name=&quot;speedTracker&quot; 
            type=&quot;edu.cmu.sphinx.instrumentation.SpeedTracker&quot;&gt;
    &lt;property name=&quot;recognizer&quot; value=&quot;${recognizer}&quot;/&gt;
    &lt;property name=&quot;frontend&quot; value=&quot;${frontend}&quot;/&gt;
&lt;property name=&quot;showSummary&quot; value=&quot;true&quot;/&gt;
&lt;property name=&quot;showDetails&quot; value=&quot;false&quot;/&gt;
&lt;/component&gt;


&lt;!-- ******************************************************* --&gt;
&lt;!--  Miscellaneous components                               --&gt;
&lt;!-- ******************************************************* --&gt;

&lt;component name=&quot;logMath&quot; type=&quot;edu.cmu.sphinx.util.LogMath&quot;&gt;
    &lt;property name=&quot;logBase&quot; value=&quot;1.0001&quot;/&gt;
    &lt;property name=&quot;useAddTable&quot; value=&quot;true&quot;/&gt;
&lt;/component&gt;

</config>

So as you can see I am using the hub4 models (and dicts) to attempt my decoding. Also note where I specify the wav file format, I think that is right, though the bytesPerRead I am less sure of (though I read a recent post saying that what I have is correct, I think, and does being in stereo matter?).

So my problem is that on a 10 second wav file, the runtime is extraordinary (over 20 minutes so far on a 3.2GHz dual-core machine with 4 GB of ram) and the transcription is really terrible.

Does anyone have any config files for transcribing text that work okay, that I could see? This seems like a fairly common usage, maybe we could add it to the wiki....

Thanks much in advance, and apologies for my incompetence.

James

Transcription Problems

Speech Recognition Toolkit

Forums

Help

Transcription Problems document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Transcription Problems