Energy Levels for Effective Word Recognition

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

Forum: Help

Creator: Serotonergic

Created: 2014-07-03

Updated: 2014-07-04

Serotonergic - 2014-07-03

Is there a recommended minimum energy level that speech must have for Sphinx recognition to work effectively (specifically with pocketsphinx_continuous, sphinx3_align, and sphinx_pitch)?

- Nickolay V. Shmyrev - 2014-07-03
  
  I'm not sure which energy threshold you are asking about; there is no such thing in pocketsphinx.

Serotonergic - 2014-07-03

Thanks for the response. What I mean is: is there a recommended threshold (e.g., an SNR value or a signal amplitude value) for an input speech file below which we know that pocketsphinx or sphinx3_align will struggle to pick up the words? We have noticed that pocketsphinx does not recognize words whose energy in the recording is low. Any recommendation in this regard would be very helpful.
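One way to put a number on the SNR of a recording is sketched below in Python with NumPy. It assumes you can isolate a stretch that contains only background noise (for example, leading silence) and that the noise is roughly stationary and uncorrelated with the speech. `estimate_snr_db` is a hypothetical helper, not part of any Sphinx tool, and this is a simple power-ratio estimate rather than a standard measurement procedure.

```python
import numpy as np


def estimate_snr_db(noisy, noise_only):
    """Hypothetical helper: rough SNR in dB from a noisy recording and a
    noise-only stretch, assuming stationary noise uncorrelated with speech."""
    noisy = np.asarray(noisy, dtype=float)
    noise_only = np.asarray(noise_only, dtype=float)
    p_total = np.mean(noisy ** 2)        # speech + noise power
    p_noise = np.mean(noise_only ** 2)   # noise power alone
    # With uncorrelated noise, speech power is roughly total minus noise;
    # clamp at a tiny positive value so the log stays defined.
    p_speech = max(p_total - p_noise, 1e-12)
    return 10.0 * np.log10(p_speech / p_noise)


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440 * t)       # stand-in for speech, power 0.5
noise_level = 0.1                         # noise power 0.01
noisy = tone + noise_level * rng.standard_normal(t.size)
leading_silence = noise_level * rng.standard_normal(4000)
print(round(estimate_snr_db(noisy, leading_silence), 1))  # roughly 17 dB
```

The synthetic example has a signal-to-noise power ratio of about 0.5 / 0.01 = 50, i.e. about 17 dB; with real speech the estimate is only as good as the assumption that the chosen stretch really is noise-only.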

- Pranav Jawale - 2014-07-04
  
  @Serotonergic
  
  Amplitude is effectively normalized by cepstral mean normalization (CMN),
  so it shouldn't be a problem as long as the SNR is good. Assuming your
  training data is clean speech, the higher the SNR, the better.
  
  Relevant threads:
  
  http://sourceforge.net/p/cmusphinx/mailman/message/31705208/
  
  http://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/d3d10c7e/
  
  Last edit: Pranav Jawale 2014-07-04
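To illustrate why overall amplitude drops out under CMN, here is a toy sketch in Python with NumPy. It is not the Sphinx front end: it skips pre-emphasis and the mel filterbank and uses a plain log power spectrum followed by a DCT-II, and the helper names `cepstra` and `cmn` are hypothetical. Scaling the waveform by a gain g adds the constant 2 log g to every log-power bin; the DCT maps that constant onto the first coefficient only, so subtracting the per-utterance mean removes it.

```python
import numpy as np


def cepstra(signal, frame_len=400, hop=160, n_ceps=13):
    """Toy cepstra (hypothetical helper): Hamming-windowed frames, log power
    spectrum, then a DCT-II. No pre-emphasis or mel filterbank."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_power = np.log(power)  # a gain g shifts every bin by 2*log(g)
    n_bins = log_power.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bins)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins))
    return log_power @ dct_basis.T  # shape (n_frames, n_ceps)


def cmn(ceps):
    """Cepstral mean normalization: subtract the per-utterance mean."""
    return ceps - ceps.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)        # stand-in for a 1 s recording
loud = cepstra(signal)
quiet = cepstra(0.05 * signal)             # same signal at 0.05 amplitude
print(np.allclose(loud, quiet))            # False: c0 shifts with the gain
print(np.allclose(cmn(loud), cmn(quiet)))  # True: CMN cancels the shift
```

In a real front end, dithering or a floor on the log energy may make this cancellation only approximate. CMN also does nothing about additive noise, which is why SNR still matters.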

Serotonergic - 2014-07-04

Thanks, Pranav. Regarding the second thread, I would be interested to know what the recommended "high" and "low" values are. Yes, my data is clean.

- Pranav Jawale - 2014-07-04
  
  There is nothing recommended (AFAIK), apart from avoiding clipping. As you
  can see, even WSJ, which is a standard database, has recordings made at
  amplitudes as low as 0.05 (and there is no noise there).
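A quick way to check a 16-bit recording for the two things mentioned in this thread, peak level and clipping, is sketched below in Python with NumPy. It assumes 16-bit signed PCM samples already loaded as an integer array; `level_report` is a hypothetical helper, and treating any sample at magnitude 32767 or above as clipped is an illustrative choice, not a Sphinx setting.

```python
import numpy as np

INT16_FULL_SCALE = 32768.0  # magnitude of the most negative int16 sample


def level_report(samples, clip_threshold=32767):
    """Hypothetical helper: return (peak amplitude as a fraction of full
    scale, fraction of samples at or beyond clip_threshold)."""
    # Widen to int64 so abs(-32768) does not overflow int16.
    x = np.asarray(samples, dtype=np.int64)
    peak = np.abs(x).max() / INT16_FULL_SCALE
    clipped = np.count_nonzero(np.abs(x) >= clip_threshold) / x.size
    return peak, clipped


t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440 * t)
# Quiet but clean: peaks near 0.05 of full scale, like the low WSJ levels.
quiet = np.round(0.05 * 32767 * tone).astype(np.int16)
# Overdriven: scaled past full scale and clipped to the int16 range.
loud = np.clip(np.round(4 * 32767 * tone), -32768, 32767).astype(np.int16)
print(level_report(quiet))
print(level_report(loud))
```

A low peak with no clipped samples is the situation described above as acceptable; a non-trivial clipped fraction is the thing to avoid.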
