Hi all,
I am working on decoding news videos about four minutes long. Currently I am using this one: http://www.youtube.com/watch?v=GrxzWWkZlr0. First I retrieve the video, then extract the sound and convert it to a suitable WAV format.
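For reference, the extract-and-convert step can be sketched as follows (a Python sketch assuming ffmpeg is installed; the filenames are placeholders, and the 16 kHz / mono / 16-bit target is what 16 kHz broadcast-news models such as HUB4 typically expect — check your model's documentation):

```python
# Sketch: build an ffmpeg command line that strips the video stream and
# resamples the audio track to 16 kHz mono 16-bit PCM WAV.
def build_ffmpeg_cmd(src, dst):
    return [
        "ffmpeg", "-i", src,
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]

cmd = build_ffmpeg_cmd("news_video.flv", "news_audio.wav")
print(" ".join(cmd))
# then run it with, e.g., subprocess.run(cmd, check=True)
```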
Then I decode with the HUB4 acoustic model and a Gigaword language model (http://www.inference.phy.cam.ac.uk/kv227/lm_giga/). So far my accuracy is really low, not to say close to 0%. I have tried tuning the general properties in different ways, as well as the frontend, but still without success.
Could it be explained by the duration of the video and its content?
Any tip would be really welcome. Thank you in advance.
I enclose the main parts of my configuration file:
<config>
<!-- ******** -->
<!-- frequently tuned properties -->
<!-- ******** -->
<property name="absoluteBeamWidth" value="20000"/>
<property name="relativeBeamWidth" value="1E-60"/>
<property name="absoluteWordBeamWidth" value="25"/>
<property name="relativeWordBeamWidth" value="1E-30"/>
<property name="wordInsertionProbability" value="0.01"/>
<property name="languageWeight" value="7"/>
<property name="silenceInsertionProbability" value=".01"/>
<property name="frontend" value="wavFrontEnd"/>
<property name="recognizer" value="recognizer"/>
<property name="showCreations" value="false"/>
<property name="logLevel" value="INFO"/>
<!-- ******** -->
<!-- word recognizer configuration -->
<!-- ******** -->
<component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="decoder"/>
<propertylist name="monitors">
<item>accuracyTracker</item>
<item>speedTracker</item>
<item>memoryTracker</item>
<item>recognizerMonitor</item>
</propertylist>
</component>
It's been a while since I've looked at Sphinx, but I seem to remember that your input audio has to match the format and quality of the audio the AM was trained on. Although you are converting to the correct WAV format, the actual spectral features which Sphinx relies on to identify speech components are altered by the lossy compression applied when the original recording was transcoded to FLV. Through this process, vital information is lost and cannot simply be recreated by transcoding or upsampling back to the expected format.
In other words, if I record a WAV containing speech, transcode it to MP3 or FLV or anything else involving lossy compression, and then transcode back to the original WAV format, the result is not the same file: it has been reduced to the quality of the compressed format, and is therefore not going to work well with Sphinx, as you have found.
See if you can get recordings from other sources which have not compressed the audio track. FM radio might be a good bet (record directly to WAV from an FM radio source).
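A toy illustration of the "lost information cannot be recreated" point (a pure-stdlib Python sketch using naive resampling as a crude stand-in for codec loss — real lossy codecs are more complex, but the principle is the same): a tone above the Nyquist limit of a lower intermediate sample rate aliases, and naive upsampling back does not restore it.

```python
import math

SR = 16000      # original sample rate
FREQ = 6000.0   # tone above the 4 kHz Nyquist limit of an 8 kHz intermediate
N = SR          # one second of samples

orig = [math.sin(2 * math.pi * FREQ * n / SR) for n in range(N)]
low = orig[::2]                            # naive decimation to 8 kHz: the 6 kHz tone aliases
rebuilt = [low[n // 2] for n in range(N)]  # naive upsampling back to 16 kHz

def correlation(a, b):
    # normalized cross-correlation of two equal-length signals
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

print(correlation(orig, rebuilt))  # well below 1.0: the original tone is gone
```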
Regards,
Chris
Thanks for the reply. That's a part I wasn't really aware of. I will try to get some new data from the sources you recommend.
So no matter what configuration I manage to put together, I won't be able to get any interesting results with HUB4 on audio extracted from converted YouTube videos.
If anyone has succeeded in getting useful output decoding this way, please let me know.
Regards,
Boris.
For that you would need an AM trained on YouTube (FLV-encoded) samples, as far as I know.
The accuracy should be low, but I'd expect it to be around 50%, certainly not 0%. Are you sure you didn't make a mistake somewhere? For example, what is the sample rate of the samples you are decoding?
Also make sure you are using the latest trunk.
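A quick way to sanity-check the sample rate is Python's stdlib `wave` module (a sketch; the 16000/1/2 target below matches the 16 kHz / mono / 16-bit input that 16 kHz HUB4-style models expect — verify against your model's documentation, and the `probe.wav` demo file is just a placeholder):

```python
import wave

EXPECTED = (16000, 1, 2)  # sample rate (Hz), channels, bytes per sample

def wav_format(path):
    """Return (sample_rate, channels, sample_width_bytes) of a WAV file."""
    with wave.open(path, "rb") as w:
        return (w.getframerate(), w.getnchannels(), w.getsampwidth())

# demo: write a short silent 16 kHz mono 16-bit file and verify it
with wave.open("probe.wav", "wb") as w:
    w.setframerate(16000)
    w.setnchannels(1)
    w.setsampwidth(2)
    w.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

print(wav_format("probe.wav") == EXPECTED)  # True for this probe file
```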