I am a real newbie when it comes to ASR. To give you as much data as possible, here is the long story: I was given a topic for my Bachelor thesis. I have to provide an acoustic method to detect "turmoil-like" situations. Part of the topic is to detect a few keywords like "police", "help", or "fire". The keyword spotting takes place inside and near local public transportation (a subway). Due to the noisy environment, I am not sure whether it is possible to detect keywords without clear gaps between the words, even if I use a method to reduce the background noise. And it has to run on a Raspberry Pi.
I played around a bit with PocketSphinx: I created my dictionary and a simple JSGF grammar, and used the acoustic model from VoxForge. The WER was OK. My biggest problem was that different words, or plain noise, were often detected as "fire", "police", or "help", even with a filler dictionary.
With everything I have read so far, I am not sure whether this is feasible for me, meaning in an adequate amount of time and with a satisfactory result.
Now to my questions:
How do I implement noise reduction?
Would a specific acoustic model reduce the number of cases where noise is detected as "fire"/"police"/"help"?
Would a specific acoustic model reduce my WER?
Does keyword spotting require gaps between the words?
Is there a golden road to my goal?
Kind regards
Last edit: ArcadeBit 2014-05-26
Due to the noisy environment, I am not sure whether it is possible to detect keywords without clear gaps between the words.
It is possible.
Part of the topic is to detect a few keywords like "police", "help", or "fire".
For reliable detection a keyword should have at least three syllables; "fire" is too short to serve as a keyword.
I played around a bit with PocketSphinx: I created my dictionary and a simple JSGF grammar, and used the acoustic model from VoxForge.
For keyword spotting there is a dedicated keyword-spotting search mode, enabled with the "-kws" option. There is also an option, "-kws_threshold", to tune the trade-off between detection rate and false-alarm rate.
The VoxForge model is too inaccurate. Our most accurate model is the generic en-US acoustic model.
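As a sketch of how the kws search mode is driven (the file name, phrases, thresholds, and model path below are illustrative choices, not fixed values): a keyword list file holds one phrase per line with an optional per-keyword threshold between slashes, and is passed to the decoder with "-kws". Multi-word phrases also help with the syllable-count advice above, since "fire" alone is too short.

```shell
# Write an example keyword list (phrases/thresholds are illustrative;
# tune each threshold on a test set to balance misses vs. false alarms)
cat > keywords.list <<'EOF'
help me please /1e-20/
call the police /1e-30/
EOF

# Decoding would then use the kws search instead of a JSGF grammar, e.g.:
#   pocketsphinx_continuous -inmic yes -kws keywords.list \
#       -hmm /path/to/en-us    # point -hmm at the en-US acoustic model
cat keywords.list
```

A lower threshold (e.g. 1e-40) makes a keyword easier to trigger but raises false alarms; sweep it per keyword on held-out data.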
How do I implement noise reduction?
Noise reduction is already implemented in the development version in the Subversion trunk.
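The built-in noise suppression runs automatically in the feature front end, so there is normally nothing to implement yourself. Purely for intuition, here is a minimal spectral-subtraction sketch in NumPy; this is a generic textbook technique, not sphinxbase's actual algorithm, and the frame length, noise-estimation window, and oversubtraction factor are arbitrary choices:

```python
import numpy as np

def spectral_subtract(x, frame_len=256, noise_frames=10, alpha=2.0):
    """Generic spectral subtraction: estimate the noise magnitude spectrum
    from the first `noise_frames` frames (assumed speech-free), subtract it
    from every frame, and keep a small spectral floor to avoid negatives."""
    n = len(x) // frame_len * frame_len
    frames = x[:n].reshape(-1, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)          # noise estimate
    clean_mag = np.maximum(mag - alpha * noise_mag, 0.05 * mag)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.ravel()

# Toy signal: 10 frames of noise only, then a 440 Hz tone plus noise
rng = np.random.default_rng(0)
noisy = 0.3 * rng.standard_normal(16000)
t = np.arange(16000 - 2560) / 16000.0
noisy[2560:] += np.sin(2 * np.pi * 440.0 * t)
denoised = spectral_subtract(noisy)
# Energy in the noise-only region should drop substantially
print(np.sum(denoised[:2560] ** 2) < np.sum(noisy[:2560] ** 2))  # True
```

Real front ends add overlapping windows and smoother noise tracking; this only shows the core subtract-and-floor idea.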
Does keyword spotting require gaps between the words?
No, it works on a continuous stream too.
Is there a golden road to my goal?
Create a test set and evaluate on it to find the best operating point.
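Concretely: with reference keyword occurrences (from your transcriptions) and the spotter's detections as (keyword, time) pairs, a small scorer can count hits, misses, and false alarms at each "-kws_threshold" setting. The helper below and its one-second matching tolerance are my own illustrative choices:

```python
def score_kws(reference, detected, tol=1.0):
    """Score a keyword spotter: a detection counts as a hit if the same
    keyword occurs in the reference within `tol` seconds; each reference
    occurrence can be matched at most once."""
    matched = set()
    hits = 0
    for kw, t in detected:
        for i, (rkw, rt) in enumerate(reference):
            if i not in matched and rkw == kw and abs(rt - t) <= tol:
                matched.add(i)
                hits += 1
                break
    misses = len(reference) - hits
    false_alarms = len(detected) - hits
    return hits, misses, false_alarms

ref = [("police", 3.2), ("help", 10.5), ("police", 42.0)]
det = [("police", 3.4), ("police", 20.0), ("help", 10.9)]
print(score_kws(ref, det))  # (2, 1, 1)
```

Sweeping the threshold and plotting misses against false alarms per hour then lets you pick the operating point that suits the alarm application.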
Hello,
About three years after the original post... In another post about noise robustness, you recommend adding the following to sphinx_train.cfg:
~~~~~~~~~~~~
$CFG_WAVFILE_SRATE = 16000.0;
$CFG_NUM_FILT = 25; # For wideband speech it is 25; for 8 kHz telephone speech a reasonable value is 15
$CFG_LO_FILT = 130; # For 8 kHz telephone speech a reasonable value is 200
$CFG_HI_FILT = 6800; # For 8 kHz telephone speech a reasonable value is 3500
$CFG_TRANSFORM = "dct"; # Previously the legacy transform was used, but dct is more accurate
$CFG_LIFTER = "22"; # Cepstral liftering smooths the cepstrum to improve recognition
$CFG_VECTOR_LENGTH = 13; # 13 is usually enough
~~~~~~~~~~~~~~
Is this a general recommendation for noisy speech?
The default feature set is 1s_c_d_dd. Would you recommend a different feature set for noisy input? Where can I read about the naming of feature sets?
Thanks,
Yuval
Is this a general recommendation for noisy speech?
It is simply the default.
Would you recommend a different feature set for noisy input?
No.
Where can I read about the naming of feature sets?
In the source code.
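For what it is worth, the name itself encodes the layout: "1s" means a single feature stream, and "c", "d", "dd" stand for the cepstral coefficients, their deltas (first differences over time), and their delta-deltas, concatenated into one 39-dimensional vector when $CFG_VECTOR_LENGTH is 13. A sketch of that concatenation follows; the +/-2 frame difference window is a common choice, not necessarily sphinxbase's exact formula:

```python
import numpy as np

def add_deltas(cepstra, w=2):
    """Append delta and delta-delta features to a (frames x 13) cepstrum
    matrix, giving the 39-dim 1s_c_d_dd layout. Deltas are simple +/-w
    frame differences; edges are handled by repeating the boundary frame."""
    padded = np.pad(cepstra, ((w, w), (0, 0)), mode="edge")
    delta = (padded[2 * w:] - padded[:-2 * w]) / (2.0 * w)
    pd = np.pad(delta, ((w, w), (0, 0)), mode="edge")
    delta2 = (pd[2 * w:] - pd[:-2 * w]) / (2.0 * w)
    return np.hstack([cepstra, delta, delta2])

feats = add_deltas(np.random.randn(100, 13))
print(feats.shape)  # (100, 39)
```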