James - 2006-04-16

Hi,

Let me preface this by saying that I have tried to read everything on the website and as much of the forums as I could before posting my problem.

I have an mp3 file that I would like to transcribe.

I convert the MP3 file to WAV using mp3.jar (code from a JavaWorld article on how to do this) and jl1.0.jar (another open-source Java MP3 library). That all seems to go well; I get a WAV file with:

WAVE (.wav) file, byte length: 789322, data format: PCM_SIGNED 16000.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian, frame length: 197313

Good, right? I have 16 kHz, 16-bit, little-endian audio, so everything should be fine.
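For reference, a format line like the one above can be reproduced with the standard javax.sound.sampled API. The sketch below is only an illustration: the class name WavFormatCheck and its helper methods are hypothetical and are not part of mp3.jar, jl1.0.jar, or Sphinx-4. It prints the format of a WAV file and checks whether it is 16 kHz, 16-bit, signed, little-endian and single-channel. Whether the recognizer needs single-channel input is an assumption here, not something confirmed in this post.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper, not part of mp3.jar, jl1.0.jar, or Sphinx-4.
class WavFormatCheck {

    /** Seconds of audio implied by a frame count and a frame rate. */
    static double durationSeconds(long frameLength, float frameRate) {
        return frameLength / (double) frameRate;
    }

    /**
     * True for 16 kHz, 16-bit, signed, little-endian PCM with one channel.
     * Treating mono as a requirement is an assumption made for this sketch.
     */
    static boolean isMono16kPcm(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && !f.isBigEndian()
                && f.getChannels() == 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Open the file and report what the Java sound API sees.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat f = in.getFormat();
            System.out.println("format:   " + f);
            System.out.println("channels: " + f.getChannels());
            System.out.println("seconds:  "
                    + durationSeconds(in.getFrameLength(), f.getFrameRate()));
            System.out.println("16 kHz mono PCM: " + isMono16kPcm(f));
        }
    }
}
```

For the file described above, 197313 frames at 16000 frames per second works out to roughly 12.3 seconds of audio, and the stereo format would make the last check print false.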

I am then using the following config file, pinched from Transcriber and modified as the readme describes (changing the grammar, linguist, and language model):

<?xml version="1.0" encoding="UTF-8"?>

<!--
Sphinx-4 Configuration file
-->

<!-- ******** -->
<!-- an4 configuration file -->
<!-- ******** -->

<config>

<!-- ******************************************************** -->
<!-- frequently tuned properties                              -->
<!-- ******************************************************** -->

<property name="logLevel" value="WARNING"/>

<property name="absoluteBeamWidth"  value="-1"/>
<property name="relativeBeamWidth"  value="1E-80"/>
<property name="wordInsertionProbability" value="1E-36"/>
<property name="silenceInsertionProbability" value=".1"/>
<property name="languageWeight"     value="8"/>
<property name="frontend" value="epFrontEnd"/>
<property name="recognizer" value="recognizer"/>
<property name="showCreations" value="false"/>

<!-- ******************************************************** -->
<!-- word recognizer configuration                            -->
<!-- ******************************************************** -->

<component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="decoder"/>
    <propertylist name="monitors">
        <item>accuracyTracker </item>
        <item>speedTracker </item>
        <item>memoryTracker </item>
    </propertylist>
</component>

<!-- ******************************************************** -->
<!-- The Decoder   configuration                              -->
<!-- ******************************************************** -->

<component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
    <property name="searchManager" value="searchManager"/>
</component>

<component name="searchManager"
        type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="lexTreeLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>

<!-- ******************************************************** -->
<!-- The Active Lists                                         -->
<!-- ******************************************************** -->

<component name="activeList"
           type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
    <property name="logMath" value="logMath"/>
    <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
    <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
</component>

<component name="trivialPruner"
           type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

<component name="threadedScorer"
           type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
    <property name="frontend" value="${frontend}"/>
    <property name="isCpuRelative" value="true"/>
    <property name="numThreads" value="0"/>
    <property name="minScoreablesPerThread" value="10"/>
    <property name="scoreablesKeepFeature" value="true"/>
</component>

<!-- ******************************************************** -->
<!-- The linguist  configuration                              -->
<!-- ******************************************************** -->

<component name="lexTreeLinguist"
           type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
    <property name="logMath" value="logMath"/>
    <property name="acousticModel" value="hub4"/>
    <property name="languageModel" value="trigramModel"/>
    <property name="dictionary" value="dictionary"/>
    <property name="addFillerWords" value="false"/>
    <property name="fillerInsertionProbability" value="1E-10"/>
    <property name="generateUnitStates" value="false"/>
    <property name="wantUnigramSmear" value="true"/>
    <property name="unigramSmearWeight" value="1"/>
    <property name="wordInsertionProbability"
              value="${wordInsertionProbability}"/>
    <property name="silenceInsertionProbability"
              value="${silenceInsertionProbability}"/>
    <property name="languageWeight" value="${languageWeight}"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- ******************************************************** -->
<!-- The Dictionary configuration                            -->
<!-- ******************************************************** -->
<component name="dictionary"
           type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath" value="resource:/edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model!/edu/cmu/sphinx/model/acoustic/HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz/cmudict.06d"/>
    <property name="fillerPath" value="resource:/edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model!/edu/cmu/sphinx/model/acoustic/HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz/fillerdict"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- ************************************************** -->
<!-- trigramModel                                       -->
<!-- ************************************************** -->

<component name="trigramModel"
           type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="unigramWeight" value=".5"/>
    <property name="maxDepth" value="3"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <property name="location"
              value="C:/eclipse/workspace/PodCastSearch/lab/speech/sphinx4/data/hub4_model/language_model.arpaformat.DMP"/>
</component>

<!-- ************************************************** -->
<!-- flatUnigramModel                                   -->
<!-- ************************************************** -->
<component name="flatUnigramModel"
           type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
    <property name="location"
              value="hub4.flat_unigram.lm"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <property name="maxDepth" value="1"/>
    <property name="unigramWeight" value=".7"/>
</component>

<!-- ******************************************************** -->
<!-- The acoustic model configuration                         -->
<!-- ******************************************************** -->

<component name="hub4"
           type="edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>

<component name="sphinx3Loader"
           type="edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.ModelLoader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- ******************************************************** -->
<!-- The acoustic model configuration                         -->
<!-- ******************************************************** -->
<component name="wsj"
           type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
    <property name="loader" value="wsjLoader"/>
    <property name="unitManager" value="unitManager"/>
</component>

<component name="wsjLoader" type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- ******************************************************** -->
<!-- The unit manager configuration                           -->
<!-- ******************************************************** -->

<component name="unitManager"
           type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

<!-- ******************************************************** -->
<!-- The live frontend configuration                          -->
<!-- ******************************************************** -->
<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>streamDataSource </item>
        <item>speechClassifier </item>
        <item>speechMarker </item>
        <item>nonSpeechDataFilter </item>
        <item>premphasizer </item>
        <item>windower </item>
        <item>fft </item>
        <item>melFilterBank </item>
        <item>dct </item>
        <item>liveCMN </item>
        <item>featureExtraction </item>
    </propertylist>
</component>

<!-- ******************************************************** -->
<!-- The frontend pipelines                                   -->
<!-- ******************************************************** -->

<component name="streamDataSource"
           type="edu.cmu.sphinx.frontend.util.StreamDataSource">
    <property name="sampleRate" value="16000"/>
    <property name="bitsPerSample" value="16"/>
    <property name="bigEndianData" value="false"/>
    <property name="signedData" value="true"/>
    <property name="bytesPerRead" value="320"/>
</component>

<component name="speechClassifier"
           type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
    <property name="threshold" value="13"/>
</component>

<component name="nonSpeechDataFilter"
           type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

<component name="speechMarker"
           type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" >
    <property name="speechTrailer" value="50"/>
</component>

<component name="premphasizer"
           type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

<component name="windower"
           type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
</component>

<component name="fft"
           type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
</component>

<component name="melFilterBank"
           type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
</component>

<component name="dct"
           type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

<component name="liveCMN"
           type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

<component name="featureExtraction"
           type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

<!-- ******************************************************* -->
<!--  monitors                                               -->
<!-- ******************************************************* -->

<component name="accuracyTracker"
           type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
    <property name="recognizer" value="${recognizer}"/>
    <property name="showAlignedResults" value="false"/>
    <property name="showRawResults" value="false"/>
</component>

<component name="memoryTracker"
           type="edu.cmu.sphinx.instrumentation.MemoryTracker">
    <property name="recognizer" value="${recognizer}"/>
    <property name="showSummary" value="false"/>
    <property name="showDetails" value="false"/>
</component>

<component name="speedTracker"
           type="edu.cmu.sphinx.instrumentation.SpeedTracker">
    <property name="recognizer" value="${recognizer}"/>
    <property name="frontend" value="${frontend}"/>
    <property name="showSummary" value="true"/>
    <property name="showDetails" value="false"/>
</component>

<!-- ******************************************************* -->
<!--  Miscellaneous components                               -->
<!-- ******************************************************* -->

<component name="logMath" type="edu.cmu.sphinx.util.LogMath">
    <property name="logBase" value="1.0001"/>
    <property name="useAddTable" value="true"/>
</component>

</config>

As you can see, I am using the HUB4 models (and dictionaries) for decoding. Also note where I specify the WAV file format, in the streamDataSource component. I think that is right, though I am less sure about bytesPerRead (a recent post seemed to say that the value I have is correct). Does it matter that the file is stereo?
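On the stereo and bytesPerRead questions, some arithmetic follows directly from the numbers in this post. At 16 bits per sample, 320 bytes is 160 single-channel samples, or 10 ms at 16 kHz. With the 4-byte stereo frames reported for the WAV file, the same 320 bytes covers only 80 frames, or 5 ms. The sketch below (a hypothetical class StereoToMono in plain Java, not a Sphinx-4 API) shows that arithmetic and one common way to fold interleaved 16-bit little-endian stereo into mono by averaging the two channels. Whether the stereo input actually causes the slow, poor decoding is not established here. The sketch assumes that StreamDataSource reads the bytes as a single channel, which is worth checking against its documentation.

```java
// Hypothetical helper for illustration; not part of Sphinx-4.
class StereoToMono {

    /**
     * Averages left and right samples of interleaved 16-bit little-endian
     * stereo PCM and returns 16-bit little-endian mono PCM.
     * Trailing bytes that do not form a whole 4-byte frame are ignored.
     */
    static byte[] downmix(byte[] stereo) {
        int frames = stereo.length / 4;
        byte[] mono = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            int o = i * 4;
            // Low byte is unsigned, high byte carries the sign.
            short left = (short) ((stereo[o] & 0xFF) | (stereo[o + 1] << 8));
            short right = (short) ((stereo[o + 2] & 0xFF) | (stereo[o + 3] << 8));
            int avg = (left + right) / 2;
            mono[2 * i] = (byte) avg;
            mono[2 * i + 1] = (byte) (avg >> 8);
        }
        return mono;
    }

    /** Milliseconds of audio covered by one read of bytesPerRead bytes. */
    static double millisPerRead(int bytesPerRead, int bytesPerFrame, float sampleRate) {
        return 1000.0 * (bytesPerRead / bytesPerFrame) / sampleRate;
    }

    public static void main(String[] args) {
        // 320 bytes of mono 16-bit audio at 16 kHz
        System.out.println(millisPerRead(320, 2, 16000f)); // prints 10.0
        // 320 bytes of stereo 16-bit audio at 16 kHz
        System.out.println(millisPerRead(320, 4, 16000f)); // prints 5.0
    }
}
```

The output of downmix could be written back out as a mono WAV file (for example with javax.sound.sampled) before being handed to the recognizer, so that the file matches the 16 kHz, 16-bit values given in the streamDataSource component.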

My problem is that on a 10-second WAV file, the runtime is extraordinary (over 20 minutes so far on a 3.2 GHz dual-core machine with 4 GB of RAM) and the transcription is really terrible.

Does anyone have a config file for transcription that works reasonably well and that I could look at? This seems like a fairly common use case; maybe we could add one to the wiki.

Thanks much in advance, and apologies for my incompetence.

James