CMU Sphinx / Forums / Help: Different results on the same utterance

Dino The Dinosaur - 2017-07-26

Hello!

I posted this question previously in another thread, but it will be better to create a separate thread for the question.

The issue is that recognition results differ between runs on the same audio. I used the Python bindings for pocketsphinx and sphinxbase to check this, and discovered the cmn (cepstral mean normalization) parameter, which adapts recognition to the device on which it is performed. I then looked into the cmn values and how they are calculated in the different modes. As far as I understood, in batch mode the cmn values are updated after every recognized utterance. With that in mind, I organized my code so that a new decoder is configured before each utterance, with a custom -cmninit value. This way each utterance should be recognized with the cmn values I set, and before any further data is recognized with changed cmn values, the script creates a new decoder with the same -cmninit. But it didn't work: the results still differ.

What could be the cause for this randomness? The -dither parameter was disabled.

Thanks in advance,
Olya Yakovenko

- Nickolay V. Shmyrev - 2017-07-26
  
  I posted this question previously in another thread, but it will be better to create a separate thread for the question.
  
  It is better to provide more technical details instead.
  
  As far as I understood, in batch mode the cmn values renew after every recognized utterance.
  
  Not really. In batch mode CMN is calculated separately for every utterance, before the utterance is decoded, not after it.
  
  I tried to organize my code so that decoder is configured each time before recognizing an utterance, having, in addition, customized -cmninit value.
  
  That should be very slow to reconfigure decoder every time. -cmninit is only useful in live mode, not in batch mode.
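
  To illustrate why -cmninit matters only in live mode, here is a minimal sketch in plain Python. It is not the sphinxbase implementation: `batch_cmn`, `live_cmn`, and the exponential update with `weight` are hypothetical stand-ins chosen for illustration. Batch CMN subtracts a mean computed from the utterance itself, so it has no carried-over state; a running (live) estimate starts from an initial mean, so its output depends on that starting value.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def batch_cmn(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Subtract the mean of this utterance from each of its cepstral frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    return [[f[i] - mean[i] for i in range(dim)] for f in frames]


def live_cmn(
    frames: Sequence[Sequence[float]],
    init_mean: Sequence[float],
    weight: float = 0.9,
) -> Tuple[List[Frame], Frame]:
    """Simplified running-mean CMN (illustrative only, not sphinxbase's update rule).

    The estimate starts at init_mean (playing the role of -cmninit) and
    drifts toward the data; the final estimate is returned with the frames.
    """
    dim = len(init_mean)
    mean = list(init_mean)
    out: List[Frame] = []
    for f in frames:
        # Normalize with the current estimate, then update the estimate.
        out.append([f[i] - mean[i] for i in range(dim)])
        mean = [weight * mean[i] + (1.0 - weight) * f[i] for i in range(dim)]
    return out, mean
```

  In this sketch, calling `batch_cmn` twice on the same frames always gives the same normalized frames, while `live_cmn` gives different frames for different `init_mean` values.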
  
  But it didn't work: the results still differ.
  
  Differ from what? From other runs, from other utterances? From the same run with the same utterance with different parameters? You are not quite clear here.
  
  There is also noise estimation, which is updated continuously. If you decode a second time, the noise estimate might have changed, and so might the result. You can disable noise estimation with -remove_noise no.
  

Dino The Dinosaur - 2017-07-27

That should be very slow to reconfigure decoder every time.

It is slow :) But that is not a matter of concern, since this is done for educational purposes.

Differ from what? From other runs, from other utterances? From the same run with the same utterance with different parameters? You are not quite clear here.

Other runs on the same utterance.

Not really. In batch mode CMN is calculated separately for every utterance, before the utterance is decoded, not after it. <...> -cmninit is only useful in live mode, not in batch mode. <...> There is also noise estimation, which is updated continuously. If you decode a second time, the noise estimate might have changed, and so might the result. You can disable noise estimation with -remove_noise no.

I see! This is very informative, thank you :)

Strangely, when I tried to configure my decoder with the -remove_noise parameter, the console returned a segmentation fault.

I'm using the latest versions of pocketsphinx, sphinxbase, and the Python wrapper.

config = Decoder.default_config()
config.set_string('-hmm', hmm)
config.set_string('-lm', lm)
config.set_string('-dict', dic)
config.set_float('-samprate', samprate)
config.set_string('-remove_noise', 'no')
config.set_string('-logfn', logfn)
decoder = Decoder(config)

Last edit: Nickolay V. Shmyrev 2017-07-27
- Nickolay V. Shmyrev - 2017-07-27
  
  config.set_string('-remove_noise', 'no')
  
  -remove_noise is a boolean, not a string, so you should use:
  
  config.set_boolean('-remove_noise', False)
  
  Actually, you can also reset the noise statistics with decoder.start_stream(), so there is no need to disable noise cancellation.
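
  The reset suggested above could be sketched as follows. This is a hedged example, not an official recipe: `decode_twice` and `chunk_size` are hypothetical names, and the method calls (`start_stream`, `start_utt`, `process_raw`, `end_utt`, `hyp`) assume the 2017-era pocketsphinx Python wrapper discussed in this thread. The function decodes the same raw audio twice and calls `start_stream()` before each pass, so both passes should start from the same noise statistics.

```python
from typing import List, Optional


def decode_twice(decoder, audio: bytes, chunk_size: int = 2048) -> List[Optional[str]]:
    """Decode the same raw audio twice, resetting stream state before each pass.

    `decoder` is assumed to be a configured pocketsphinx Decoder (old
    SWIG-based Python wrapper); `audio` is raw PCM in the format the
    decoder was configured for.
    """
    results: List[Optional[str]] = []
    for _ in range(2):
        decoder.start_stream()  # reset noise statistics, as suggested in the reply
        decoder.start_utt()
        for offset in range(0, len(audio), chunk_size):
            # Arguments: data, no_search=False, full_utt=False
            decoder.process_raw(audio[offset:offset + chunk_size], False, False)
        decoder.end_utt()
        hyp = decoder.hyp()  # may be None if nothing was recognized
        results.append(hyp.hypstr if hyp is not None else None)
    return results
```

  With a decoder built from a config like the one earlier in the thread, comparing the two returned strings is a quick check of whether the remaining variation comes from state carried across utterances.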
  
  Last edit: Nickolay V. Shmyrev 2017-07-27
  
  - Dino The Dinosaur - 2017-07-27
    
    Oh! Yes, I see.
    Thank you!
    