Keyword Spotting is not working with Microphone

Speech Recognition Toolkit

Forum: Help

Creator: Charishma

Created: 2017-03-02

Updated: 2017-03-02

Charishma - 2017-03-02

Hi,

I have implemented keyword spotting with the keyphrase "speech interface". It works perfectly for the synthesized audio file testSpeech.wav:

pocketsphinx_continuous -infile testSpeech.wav -keyphrase "SPEECH INTERFACE" -kws_threshold 1e-30 -time yes -hmm ..\..\..\model\en-us\en-us ..\..\..\model\en-us\cmudict-en-us.dict -logfn test.log

But it gives the error below when I use a microphone. I am using the same acoustic model and dictionary. I can't figure out what I am doing wrong.

ERROR: "dict.c", line 195: Line 3: Phone '(2)' is mising in the acoustic model; word 'INTERFACE' ignored

Please help me.

I am providing a link containing my logcat output, dictionary, and acoustic model. Kindly let me know if anything else is required.

https://drive.google.com/open?id=0Bx7zZtnMgeIlaG1EU1VHUm1yMU0

Awaiting your response.
- Arseniy Gorin - 2017-03-02
  
  You should drop the space before (2) in your 5041.dic.
  
  It should be:
  
  SPEECH S P IY CH
  INTERFACE IH N T ER F EY S
  INTERFACE(2) IH N ER F EY S
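
As a rough illustration of the fix above: each dictionary line is a word followed by whitespace-separated phones, so a detached "(2)" gets read as the first phone, which would explain the "Phone '(2)'" error. The sketch below rejoins such markers. The function names are made up for this example and are not part of pocketsphinx.

```python
import re

# A word at the start of the line, whitespace, then a variant marker
# such as "(2)" that is itself followed by whitespace.
_VARIANT_GAP = re.compile(r"^(\S+)\s+(\(\d+\))(?=\s)")


def fix_variant_spacing(line: str) -> str:
    """Join a detached variant marker like "(2)" back onto its word."""
    return _VARIANT_GAP.sub(r"\1\2", line)


def fix_dictionary(text: str) -> str:
    """Apply the fix to every line of a dictionary file's contents."""
    return "\n".join(fix_variant_spacing(line) for line in text.splitlines())
```

Lines that are already correct, such as "SPEECH S P IY CH", pass through unchanged.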
  

Charishma - 2017-03-03

I have changed it. Thank you. :)

It is still not recognizing the keyphrase. There is no error in logcat, but nothing is recognized. Should I change the threshold value? What would be the correct threshold value for this?

- Arseniy Gorin - 2017-03-03
  
  Reducing the threshold leads to better spotting but more false alarms.
  You should try values from 1e-1 to 1e-100.
  
  It also depends on the microphone, and can be affected by accent, noise, etc.
  
  If you are not a native English speaker, you can also check whether more alternative pronunciations should be added to the dictionary.
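
One way to try the suggested range systematically is to record a few utterances from the microphone once and replay them at several thresholds. The sketch below only builds the command lines. The file name mic_recording.wav and the model path are placeholders I have assumed, and the step size is arbitrary; the flags are the ones used earlier in this thread plus -dict.

```python
import shlex


def candidate_thresholds():
    """Return threshold strings from 1e-1 down to 1e-100 in coarse steps."""
    exponents = [1] + list(range(10, 101, 10))
    return [f"1e-{e}" for e in exponents]


def build_command(threshold, wav="mic_recording.wav",
                  hmm="model/en-us/en-us", dictionary="5041.dic"):
    """Build one pocketsphinx_continuous call (paths are assumed placeholders)."""
    return ["pocketsphinx_continuous", "-infile", wav,
            "-keyphrase", "SPEECH INTERFACE",
            "-kws_threshold", threshold,
            "-hmm", hmm, "-dict", dictionary]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running any of them.
    for t in candidate_thresholds():
        print(shlex.join(build_command(t)))
```

Comparing how often the keyphrase is reported at each threshold, on recordings that do and do not contain it, gives a starting point for picking a value.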
  

Charishma - 2017-03-03

Thank you for the fast reply. It means a lot. :)

I have tried varying the threshold values. It is somewhat better, but it recognizes only some voices.

My locale is Indian English. How can I add more alternative pronunciations? Do you mean I should extend the dictionary as in the tutorial?

If I add the pronunciations, do I also need to train the acoustic model?

Is it possible to add all kinds of English accents to the dictionary and acoustic model?

Please guide me on how to proceed further.

- Arseniy Gorin - 2017-03-03
  
  Indian English is heavily accented. You likely need a special acoustic model (trained on a fairly large Indian English data set). I think such a model is not freely available...
  
  But as you need just one phrase, a hack that might help is to extend the dictionary a little by adding pronunciations that sound closer to your accent. It really is a hack and probably will not work well.
  
  For instance, in your example the word "interface" has 2 pronunciations (with T omitted and with T present). You can add more pronunciations, but you can only use sounds from the acoustic model's phone set.
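
A quick check along these lines can catch pronunciations that use sounds outside the phone set before the decoder rejects them. This is only a sketch: it assumes the model uses the 39 ARPAbet phones of CMUdict (the model's own phone list is the authority), and the INTERFACE(3) entry in the usage note is an invented example, not a recommended pronunciation.

```python
# ARPAbet phones used by CMUdict; assumed here to match the en-us model.
CMU_PHONES = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def unknown_phones(entry, phones=CMU_PHONES):
    """Return the phones in a dictionary line that are not in the phone set."""
    parts = entry.split()
    # parts[0] is the word (possibly with a variant marker); the rest are phones.
    return [p for p in parts[1:] if p not in phones]
```

For example, unknown_phones("INTERFACE(3) IH N T AR F EY S") returns ["AR"], since AR is not in the assumed set, while the two existing INTERFACE entries return an empty list.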
  

Charishma - 2017-03-03

Thank you so much for taking time to guide a noob like me :)

I'll try it out and get back to you if I need any further help.


Charishma - 2017-03-07

Hi,

Can you please explain the significance of the threshold value? I read somewhere that it is used to reduce false detections. Is that right?

Previously you told me that reducing the threshold leads to better spotting but more false alarms. What exactly does "false alarms" mean?

Coming to my project:
The word "speech" is spotted well at threshold 30 and "interface" at 50 for me. But I couldn't figure out a threshold value for the phrase "speech interface". Could you kindly help me out?

Link to my dictionary and acoustic model:
https://drive.google.com/drive/u/0/folders/0Bx7zZtnMgeIlaG1EU1VHUm1yMU0

Thanks in advance.
- Arseniy Gorin - 2017-03-07
  
  You usually need smaller thresholds for phrases.
  
  A false alarm means that if you say something unrelated like "blah" instead of "speech interface", the system will still report that it detected "speech interface". This happens if the threshold is too low.
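
The trade-off can be pictured with a toy model. It is purely illustrative: the scores below are invented, and the rule "detect when the score is at least the threshold" is a simplification, not how pocketsphinx computes its scores internally.

```python
def evaluate(segments, threshold):
    """Count (hits, misses, false_alarms) for (score, is_keyphrase) pairs."""
    hits = misses = false_alarms = 0
    for score, is_keyphrase in segments:
        detected = score >= threshold  # toy detection rule
        if detected and is_keyphrase:
            hits += 1
        elif detected:
            false_alarms += 1  # reported the keyphrase, but it was not said
        elif is_keyphrase:
            misses += 1  # the keyphrase was said, but not reported
    return hits, misses, false_alarms


# Invented scores: three real utterances of the keyphrase, two other sounds.
SEGMENTS = [(1e-5, True), (1e-25, True), (1e-45, True),
            (1e-35, False), (1e-60, False)]
```

With these made-up numbers, a threshold of 1e-10 catches one of the three real utterances with no false alarms, while 1e-50 catches all three but also reports one sound that was not the keyphrase.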
  
