CMU Sphinx / Forums / Help: Keyword listening accuracy degrades overtime?

Speech Recognition Toolkit

Forum: Help

Creator: Selwyn

Created: 2017-04-23

Updated: 2017-04-23

Selwyn - 2017-04-23

Hello guys, a while ago I released this app on the Google Play Store:

Playstore: https://play.google.com/store/apps/details?id=nl.selwyn420.vast&hl=en

User instruction video: http://www.youtube.com/watch?v=ADUs0mGagU4&t=1m18s

It is an enhanced stopwatch, so to speak, which can be controlled using voice commands.

It uses two search modes: keyword spotting mode to wake up and, once woken up, grammar search for the actual command.

Overall it works pretty well, but there is one issue I couldn't resolve: after a while of usage, the wake-up keyword is no longer detected easily. The strange thing is that once it has been detected, I can retry 10 times with a 100% success rate until a new long silence occurs. There is also a difference between complete silence and noise-filled silence (mainly keyboard and mouse clicks). After a few weeks of debugging, my feeling is that it has something to do with noise-filled silences being detected as speech input.

This might sound a little confusing, so let me sketch the current scenarios:

Coming from complete silence -> wake-up word -> near 100% detection

Coming from noise-filled silence (keyboard and mouse clicks) -> wake-up word -> first few times no detection

Coming from a noise-filled silence (after getting a detection) -> wake-up word -> near 100% detection again

Does this even make any sense? I tried playing around with the setKeywordThreshold parameter and the -dither parameter, without success. Unfortunately I don't really know much about the Sphinx library; I only did the Java implementation, so I don't really know where to start looking.

Can anybody give me any pointers on what could be the cause?

Thanks in advance!

Selwyn de Jonckheere
- Nickolay V. Shmyrev - 2017-04-24
  
  Nice application.
  
  It counts the clicks as speech, which shifts the CMN (cepstral mean normalization) level. You can increase the VAD (voice activity detection) threshold to avoid processing small noises as speech. Something like -vad_threshold 3.0.
  
  You can also implement a more intelligent VAD.
  
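The explanation in the reply can be illustrated with a toy simulation. This is only a minimal sketch, not the actual sphinxbase algorithm: it assumes that live CMN behaves like a running mean updated only from frames the VAD accepts as speech, and it reduces each frame to one level value and one cepstral value. The class name, the smoothing factor and all frame values are hypothetical; only the idea of raising the threshold (as with `-vad_threshold 3.0`) comes from the thread.

```java
import java.util.ArrayList;
import java.util.List;

class CmnDriftSketch {

    /** Toy audio frame: a level used by the VAD and one value used by CMN. */
    static final class Frame {
        final double logSnr;    // log-ratio of signal level to noise level (assumed)
        final double cepstrum;  // single stand-in cepstral coefficient (assumed)

        Frame(double logSnr, double cepstrum) {
            this.logSnr = logSnr;
            this.cepstrum = cepstrum;
        }
    }

    // Hypothetical smoothing factor for the running mean.
    static final double ALPHA = 0.05;

    /** Updates a running mean from every frame the VAD lets through as speech. */
    static double runCmn(List<Frame> frames, double vadThreshold, double initialMean) {
        double mean = initialMean;
        for (Frame f : frames) {
            if (f.logSnr > vadThreshold) {            // frame counted as speech
                mean += ALPHA * (f.cepstrum - mean);  // estimate moves toward it
            }
        }
        return mean;
    }

    /** Builds a list that contains the same frame 'count' times. */
    static List<Frame> repeat(Frame frame, int count) {
        List<Frame> frames = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        double speechMean = 40.0;  // mean the estimate held after real speech
        // Clicks: slightly above the noise floor, spectrally unlike speech.
        List<Frame> clicks = repeat(new Frame(2.5, 10.0), 200);
        // The spoken wake-up word: well above the noise floor.
        List<Frame> speech = repeat(new Frame(6.0, 40.0), 100);

        double drifted = runCmn(clicks, 2.0, speechMean);  // clicks pass the VAD
        double gated = runCmn(clicks, 3.0, speechMean);    // clicks are rejected
        double recovered = runCmn(speech, 2.0, drifted);   // speech pulls it back

        System.out.printf("low threshold, after clicks:  %.2f%n", drifted);
        System.out.printf("high threshold, after clicks: %.2f%n", gated);
        System.out.printf("low threshold, after speech:  %.2f%n", recovered);
    }
}
```

Under these assumptions the three cases mirror the scenarios in the question: with the lower threshold the clicks drag the estimate away from the speech mean, with the higher threshold the estimate is left untouched, and a stretch of real speech moves it back, which would explain why detection works again after the first hit. In an app built on pocketsphinx-android, the option would presumably be passed while building the recognizer, for example with `setFloat("-vad_threshold", 3.0)` on `SpeechRecognizerSetup`; the right value for a given microphone and room has to be found by experiment.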
