pocketsphinx_batch works well on a trained ".cont." model but pocketsphinx_continuous gives wrong output

Help
2015-01-05
2018-06-29
  • Yuval Karon

    Yuval Karon - 2015-01-05

    Hello,

    I have trained a model and use it with pocketsphinx_batch with good results.
    When I tried it with pocketsphinx_continuous the results were not good.

    The log files (batch and continuous), audio, mfc, model and training configuration are in:
    https://www.dropbox.com/s/7i0fdwwx06zr3t5/n4.zip?dl=0

    The mfc file was created by the command:

     sphinx_fe.exe -i noam-bug4-sil.wav -o noam-bug4-sil.mfc -nist no -raw no -mswav yes -samprate 16000 -nfilt 40 -lowerf 133.3334 -upperf 6855.4976
    

    The differences are substantial, yet the sample rate in the continuous run is 16,000
    and the CMN values look similar (I copied the batch CMN values into the log of the
    continuous run to make comparison easy).

    What could be the reason for the different outputs?

    I also tried the sphinx5-prealpha binaries (from "cmusphinx/files", not the trunk)
    and saw the same behavior: batch was good but continuous was not.

    The models were trained using sphinxtrain-0.8 so I used "-remove_noise no" in the
    sphinx5-prealpha runs - as advised in another discussion.

    What could be the reason for the difference?

     Many thanks,
       Yuval
    
     
    • Nickolay V. Shmyrev

      Hello Yuval

      Sorry, there is no dictionary or LM in your archive, so it's hard to reproduce the results, but most likely the difference is due to the CMN values. Try setting -cmninit and it should get better.

      As for the same values being printed: in continuous mode the CMN is computed after the utterance, so those values are used for the next utterance, while in batch mode the CMN is computed before the utterance is decoded. That is the difference. So even though the values are similar, the initial value still has more effect.
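
      The timing difference described above can be illustrated with a minimal sketch. This is not the pocketsphinx implementation (the real live mode keeps a running mean); the function names, the toy two-dimensional frames and the initial value 8.0 are all assumptions made for illustration.

```python
def utterance_mean(frames):
    """Per-dimension mean of a list of cepstral frames."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def batch_cmn(frames):
    """Batch style: the mean of the whole utterance is known before
    normalization, so it is subtracted from this same utterance."""
    mean = utterance_mean(frames)
    normalized = [[f[d] - mean[d] for d in range(len(mean))] for f in frames]
    return normalized, mean


def live_cmn(frames, init_mean):
    """Live style (simplified): the utterance is normalized with a mean
    fixed beforehand; the mean measured on this utterance only becomes
    available for the next one."""
    normalized = [[f[d] - init_mean[d] for d in range(len(init_mean))]
                  for f in frames]
    next_mean = utterance_mean(frames)
    return normalized, next_mean


# Toy utterance whose first coefficient averages 14.0 (hypothetical data).
frames = [[14.0, 1.0], [13.0, -1.0], [15.0, 0.0]]
batch_out, batch_mean = batch_cmn(frames)
live_out, next_mean = live_cmn(frames, init_mean=[8.0, 0.0])
print(batch_mean, next_mean)      # both report the same mean
print(batch_out[0], live_out[0])  # but the first utterance is shifted
```

      Both functions report the same mean, which is why the logged values look alike, yet the live version normalizes the first utterance with the mismatched initial value.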

       
  • Yuval Karon

    Yuval Karon - 2015-01-05

    It worked, thanks!

    The first CMN value was '13.92'. I got the correct recognition with -cmninit values from 12 to 17.

    Under what conditions can I expect ps-continuous to give similar (even if not identical)
    results to ps-batch?
    Does it have to do with properties of the training data?

    Or should I use the CMN values from the log of the sphinxtrain decode phase?

     

    Last edit: Yuval Karon 2015-01-05
    • Nickolay V. Shmyrev

      Proper CMN estimation is important, and it is not trivial to do in a continuous setting. Batch CMN has its own disadvantages, actually, because it also cannot properly estimate CMN from short utterances.

      In sphinx4 we implemented a combined method where we read the first few seconds of audio, process them in batch mode, and then switch to live continuous mode. In pocketsphinx this approach is not implemented yet.

      The initial mean estimate affects only the first utterance. It's OK to use the initial estimate from sphinxtrain; from the second utterance on it no longer matters.
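
      The combined approach mentioned above (a batch estimate over the first few seconds, then live updating) could look roughly like the sketch below. It is not the sphinx4 code; the function name, the warm-up length and the cumulative-mean update rule are assumptions chosen to keep the example short.

```python
def hybrid_cmn(frames, warmup):
    """Normalize `frames` with a mean estimated in batch over the first
    `warmup` frames, then keep updating that mean frame by frame."""
    dim = len(frames[0])
    head = frames[:warmup]

    # Batch phase: the mean of the warm-up block is computed first and
    # then subtracted from that same block.
    mean = [sum(f[d] for f in head) / len(head) for d in range(dim)]
    count = len(head)
    out = [[f[d] - mean[d] for d in range(dim)] for f in head]

    # Live phase: each new frame is normalized with the current mean,
    # which is then updated as a cumulative average (an assumption; a
    # real decoder would more likely use a sliding window).
    for f in frames[warmup:]:
        out.append([f[d] - mean[d] for d in range(dim)])
        count += 1
        mean = [mean[d] + (f[d] - mean[d]) / count for d in range(dim)]
    return out, mean


# One-dimensional toy frames (hypothetical data), warm-up of 2 frames.
normalized, final_mean = hybrid_cmn([[10.0], [14.0], [12.0], [16.0]], warmup=2)
print(normalized, final_mean)
```

      With real audio the warm-up would cover a few seconds of frames rather than two, so the first utterance no longer depends on a hand-picked initial value.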

       
  • Yuval Karon

    Yuval Karon - 2015-01-06

    Thank you!

    Forgive my ignorance, does the computation of CMN values involve the LM and dictionary or are they inherent properties of the audio?

    I would like to use ps-continuous for keyword search. If the CMN are properties of the audio alone, perhaps I could estimate them with a batch run - even if the audio contains
    words missing in the dictionary? and use for kws? does it make sense? (sounds too good...)

    Is the continuous CMN estimation more reliable than batch in short utterances?
    (then, if the input contains several utterances, would it be better to calculate the CMN
    values from concatenation of the utterances?)

    Yuval

     
  • Nickolay V. Shmyrev

    Forgive my ignorance, does the computation of CMN values involve the LM and dictionary or are they inherent properties of the audio?

    CMN is a property of the audio. Essentially it's the volume in different frequency bands.

    I would like to use ps-continuous for keyword search. If the CMN are properties of the audio alone, perhaps I could estimate them with a batch run - even if the audio contains words missing in the dictionary? and use for kws? does it make sense? (sounds too good...)

    Like I wrote above, an intelligent CMN algorithm could be implemented. For example, you might estimate CMN from the first 5 seconds of speech and then proceed with that estimate in live mode. If you have the whole audio, you can also estimate CMN for all of it at once.

    Is the continuous CMN estimation more reliable than batch in short utterances?
    (then, if the input contains several utterances, would it be better to calculate the CMN values from concatenation of the utterances?)

    This is correct.
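
    As a rough illustration of estimating the mean from concatenated utterances instead of from each short utterance separately, here is a minimal sketch. The function names and the toy one-dimensional frames are assumptions; real input would be cepstral frames such as those written by sphinx_fe.

```python
def utterance_mean(frames):
    """Per-dimension mean of one utterance's frames."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def pooled_mean(utterances):
    """Mean over all frames of all utterances, i.e. the mean of the
    concatenated audio. Longer utterances contribute more frames."""
    all_frames = [f for utt in utterances for f in utt]
    return utterance_mean(all_frames)


# Two short utterances (hypothetical data).
utterances = [[[10.0], [12.0]], [[20.0]]]
print([utterance_mean(u) for u in utterances])  # noisy per-utterance estimates
print(pooled_mean(utterances))                  # single estimate over everything
```

    The pooled value could then serve as the initial mean for a continuous run, on the assumption that the recording conditions stay the same across utterances.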

     
  • Yuval Karon

    Yuval Karon - 2015-01-07

    I see, thank you,

    Yuval

     
  • Mainak Biswas

    Mainak Biswas - 2018-06-19

    Nickolay, you wrote above that we can estimate CMN for the whole audio file at once. How can I do this?
    pocketsphinx_batch can give the CMN values for the audio files. Is this what you were talking about?
    If possible, is there an example that explains the exact difference between pocketsphinx_batch and pocketsphinx_continuous?

     
