FLV => FFMPEG => SPHINX4 problem

2009-09-09
2012-09-22
  • Ben Norbury

    Ben Norbury - 2009-09-09

    Hello

    I am just getting to grips with Sphinx 4 and it is looking quite promising for my needs.

    This works:
    I record an audio clip from my machine's webcam using Windows Sound Recorder at 22 kHz. I upload it and pass it through FFMPEG to reduce it to 16 kHz. I then run it through Sphinx 4 and it gives me back some words, great!

    Doesn't work:
    I record my sound using the same webcam, but it is now streamed to the server and saved as an FLV file at 22 kHz. I do the same: pass it through FFMPEG and reduce it to 16 kHz. When I now pass it through Sphinx 4, there are no matches.
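
    The FFMPEG step described in both cases amounts to something like the sketch below. The class name, file names, and exact flags are illustrative assumptions, not the command actually used in this thread: `-ar` sets the output sample rate, `-ac` the channel count, and `-acodec pcm_s16le` selects 16-bit little-endian PCM. It also assumes ffmpeg is on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class FlvToWav {
    // Builds an ffmpeg command line that decodes the input and writes
    // 16 kHz, mono, 16-bit little-endian PCM in a WAV container.
    static List<String> buildCommand(String input, String output) {
        return Arrays.asList(
                "ffmpeg",
                "-y",                    // overwrite the output file if it exists
                "-i", input,             // input file (FLV or WAV)
                "-vn",                   // ignore any video stream
                "-ar", "16000",          // resample to 16 kHz
                "-ac", "1",              // downmix to mono
                "-acodec", "pcm_s16le",  // 16-bit signed little-endian PCM
                output);
    }

    // Runs ffmpeg and returns its exit code (assumes ffmpeg is on the PATH).
    static int convert(String input, String output)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(input, output))
                .inheritIO()
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: FlvToWav <input> <output.wav>");
            return;
        }
        System.exit(convert(args[0], args[1]));
    }
}
```

    Resampling changes only the container parameters; it cannot restore detail that the FLV audio codec already discarded.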

    The clips look the same when I view the audio details:

    Works:
    WAVE (.wav) file, byte length: 3260104, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 1630030

    Doesn't Work:
    WAVE (.wav) file, byte length: 723744, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 361850
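
    The header details above only describe the container. Dividing the frame lengths by the 16000 Hz rate gives roughly 101.9 s for the working clip and 22.6 s for the failing one, and says nothing about what the samples contain. A minimal sketch for looking one step further, using the standard javax.sound.sampled API, is shown here; the class and method names are hypothetical, and it assumes 16-bit little-endian PCM as reported above.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

class ClipInfo {
    // Duration in seconds = number of frames / frames per second.
    static double durationSeconds(long frameLength, float frameRate) {
        return frameLength / (double) frameRate;
    }

    // Sum of squared sample values for 16-bit little-endian PCM bytes.
    static long sumSquares16le(byte[] pcm, int length) {
        long sum = 0;
        for (int i = 0; i + 1 < length; i += 2) {
            // low byte is treated as unsigned, high byte carries the sign
            int sample = (pcm[i] & 0xff) | (pcm[i + 1] << 8);
            sum += (long) sample * sample;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        for (String name : args) {
            try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(name))) {
                AudioFormat fmt = in.getFormat();
                byte[] buf = new byte[4096];
                long sum = 0, samples = 0;
                int n;
                while ((n = in.read(buf, 0, buf.length)) > 0) {
                    sum += sumSquares16le(buf, n);
                    samples += n / 2;
                }
                // RMS level scaled to the 0..1 range of 16-bit audio
                double rms = samples == 0 ? 0.0 : Math.sqrt(sum / (double) samples) / 32768.0;
                System.out.printf("%s: %.2f s, RMS level %.4f%n", name,
                        durationSeconds(in.getFrameLength(), fmt.getFrameRate()), rms);
            }
        }
    }
}
```

    A very low RMS level on the failing clip would point at a gain problem; similar levels on both clips would suggest the difference lies in the codec rather than the volume.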

    BTW when I say it doesn't work, the file is accepted but nothing is matched.

    I am running 1.0Beta3 on Linux.

    Any help would be really appreciated.

    Thank you
    Ben


    If it helps, here is part of my configuration; I can post the rest if needed...

    <property name="absoluteBeamWidth" value="5000"/>
    <property name="relativeBeamWidth" value="1E-120"/>
    <property name="absoluteWordBeamWidth" value="200"/>
    <property name="relativeWordBeamWidth" value="1E-80"/>
    <property name="wordInsertionProbability" value="0.2"/>
    <property name="languageWeight" value="10.5"/>
    <property name="silenceInsertionProbability" value=".1"/>
    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>

    <!-- ******************************************************** -->
    <!-- The Dictionary configuration                            -->
    <!-- ******************************************************** -->
    <component name="dictionary" 
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        <property name="fillerPath" 
              value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
        <property name="unitManager" value="unitManager"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The Language Model configuration                         -->
    <!-- ******************************************************** -->
    <component name="trigramModel" 
        type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
        <property name="location" 
            value="resource:/edu.cmu.sphinx.demo.transcriber.Transcriber!/edu/cmu/sphinx/demo/transcriber/transcriber.trigram.lm"/>
        <property name="logMath" value="logMath"/>
        <property name="dictionary" value="dictionary"/>
        <property name="maxDepth" value="3"/>
        <property name="unigramWeight" value=".7"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The acoustic model configuration                         -->
    <!-- ******************************************************** -->
    <component name="wsj"
               type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader" type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
    </component>
    
     
    • Ben Norbury

      Ben Norbury - 2009-09-10

      Thanks for the info Nickolay.

       
    • Nickolay V. Shmyrev

      This is not going to work because of the audio compression used. It is often ADPCM or CELP at 8 kHz. You need to take telephone models and adapt them to audio that has gone through this channel.

      I would avoid using FLV for transmission/storage.
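
      One rough way to see whether a 16 kHz file really carries wideband content is to measure how much of its energy lies above 4 kHz, which is the upper limit of audio sampled at 8 kHz. The sketch below does this for a single frame with a direct DFT and demonstrates it on synthetic tones. The class name, method names, and the 4 kHz cutoff are illustrative assumptions and not part of Sphinx-4; applying it to a real clip would mean feeding in frames of decoded samples instead.

```java
class BandEnergy {
    // Fraction of spectral energy at or above cutoffHz in one frame,
    // computed with a direct O(n^2) DFT over bins 1..n/2.
    static double highBandRatio(double[] frame, double sampleRate, double cutoffHz) {
        int n = frame.length;
        double total = 0.0;
        double high = 0.0;
        for (int k = 1; k <= n / 2; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im -= frame[t] * Math.sin(angle);
            }
            double power = re * re + im * im;
            total += power;
            // bin k corresponds to frequency k * sampleRate / n
            if (k * sampleRate / n >= cutoffHz) {
                high += power;
            }
        }
        return total == 0.0 ? 0.0 : high / total;
    }

    // Generates n samples of a unit-amplitude sine tone.
    static double[] sine(double freqHz, double sampleRate, int n) {
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            out[t] = Math.sin(2.0 * Math.PI * freqHz * t / sampleRate);
        }
        return out;
    }

    public static void main(String[] args) {
        double rate = 16000.0;
        // A 1 kHz tone sits entirely below the cutoff, a 6 kHz tone above it.
        System.out.println(highBandRatio(sine(1000.0, rate, 512), rate, 4000.0));
        System.out.println(highBandRatio(sine(6000.0, rate, 512), rate, 4000.0));
    }
}
```

      A ratio close to zero across the frames of a real recording would suggest the audio was band-limited before it was resampled to 16 kHz.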

       
