
Keyword listening accuracy degrades over time?

Selwyn
2017-04-23
  • Selwyn

    Selwyn - 2017-04-23

    Hello guys, a while ago I released this app on the Google Play Store:

    Play Store: https://play.google.com/store/apps/details?id=nl.selwyn420.vast&hl=en

    User instruction video: http://www.youtube.com/watch?v=ADUs0mGagU4&t=1m18s

    It is, so to speak, an enhanced stopwatch that can be controlled with voice commands.

    It uses two search modes: keyword spotting to wake up and, once woken up, a grammar search for the actual command.
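    The two-mode flow can be sketched as a small state machine. This is a hypothetical model only: it calls no Sphinx API, and the wake-up phrase "hello watch" and the command set are made up for illustration. In an app built on pocketsphinx-android, each transition would presumably correspond to restarting the recognizer on a different named search.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the flow described above: keyword spotting until the
// wake-up phrase is heard, then a grammar search for exactly one command.
public class VoiceModeController {
    enum Mode { KEYWORD_SPOTTING, GRAMMAR }

    private final String wakeUpPhrase;
    private final Set<String> commands;
    private Mode mode = Mode.KEYWORD_SPOTTING;
    private String lastCommand = null;

    public VoiceModeController(String wakeUpPhrase, Set<String> commands) {
        this.wakeUpPhrase = wakeUpPhrase;
        this.commands = commands;
    }

    public Mode mode() { return mode; }

    public String lastCommand() { return lastCommand; }

    // Called with each hypothesis produced while in the current mode.
    public void onHypothesis(String text) {
        if (mode == Mode.KEYWORD_SPOTTING) {
            if (wakeUpPhrase.equals(text)) {
                mode = Mode.GRAMMAR;          // woken up: listen for a command
            }
        } else {
            if (commands.contains(text)) {
                lastCommand = text;           // a valid command was recognized
            }
            mode = Mode.KEYWORD_SPOTTING;     // go back to sleep either way
        }
    }

    public static void main(String[] args) {
        VoiceModeController c = new VoiceModeController(
                "hello watch", new HashSet<>(Arrays.asList("start", "stop", "lap")));
        c.onHypothesis("start");        // ignored: still in keyword spotting
        c.onHypothesis("hello watch");  // wake up
        c.onHypothesis("start");        // command accepted, back to sleep
        System.out.println(c.mode() + " " + c.lastCommand());
    }
}
```

    Keeping the grammar mode to a single command before returning to keyword spotting means a misrecognition in grammar mode cannot leave the app stuck listening for commands.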

    Overall it works pretty well, but there is one issue I couldn't resolve: after a while of use, the wake-up keyword is no longer detected easily. The strange thing is that once it has been detected, I can retry 10 times with a 100% success rate until a new long silence occurs. There is also a difference between complete silence and noise-filled silence (mainly keyboard and mouse clicks). After a few weeks of debugging, my gut feeling is that it has something to do with noise-filled silence being detected as speech input.

    This might sound a little confusing, so let me sketch the current scenarios:

    1. Coming from complete silence -> wake-up word -> near 100% detection
    2. Coming from noise-filled silence (keyboard and mouse clicks) -> wake-up word -> no detection the first few times
    3. Coming from noise-filled silence (after a detection) -> wake-up word -> near 100% detection again

    Does this even make sense? I tried playing around with the keyword threshold parameter (setKeywordThreshold) and the -dither parameter without success. Unfortunately I don't know much about the Sphinx library itself; I only did the Java implementation, so I don't really know where to start looking.

    Can anybody give me any pointers on what could be the cause?

    Thanks in advance!

    Selwyn de Jonckheere

    • Nickolay V. Shmyrev

      Nice application.

      It counts the clicks as speech, and that shifts the CMN (cepstral mean normalization) level. You can increase the VAD (voice activity detection) threshold so that small noises are not processed as speech, something like -vad_threshold 3.0.

      You can also implement a more intelligent VAD.
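      Why clicks shift the CMN level can be illustrated with a deliberately simplified model: a running mean of one cepstral coefficient that is updated only on frames the VAD lets through as speech. This is a hypothetical sketch, not PocketSphinx's actual CMN or VAD code; the exponential update, the energy-based VAD rule and all numbers (initial mean, click level, update weight) are assumptions chosen for illustration, with the threshold standing in for -vad_threshold.

```java
// Simplified, hypothetical model of live CMN: a running mean of one cepstral
// coefficient, updated only on frames that the VAD classifies as speech.
public class LiveCmnSketch {
    private double mean;
    private final double alpha; // weight of each new speech frame in the mean

    public LiveCmnSketch(double initialMean, double alpha) {
        this.mean = initialMean;
        this.alpha = alpha;
    }

    public double mean() { return mean; }

    // Returns the mean-normalized value; only speech frames update the mean.
    public double process(double c0, boolean isSpeech) {
        if (isSpeech) {
            mean = (1 - alpha) * mean + alpha * c0;
        }
        return c0 - mean;
    }

    // Hypothetical energy VAD: a frame is speech if it exceeds the noise floor
    // by more than the threshold (a stand-in for -vad_threshold).
    static boolean isSpeech(double c0, double noiseFloor, double threshold) {
        return c0 - noiseFloor > threshold;
    }

    public static void main(String[] args) {
        double noiseFloor = 0.0;
        double click = 2.5; // assumed level of a keyboard/mouse click frame
        for (double threshold : new double[] {2.0, 3.0}) {
            // Start from a mean that matches normal speech (assumed 10.0).
            LiveCmnSketch cmn = new LiveCmnSketch(10.0, 0.05);
            for (int i = 0; i < 200; i++) {
                cmn.process(click, isSpeech(click, noiseFloor, threshold));
            }
            System.out.printf("threshold %.1f -> mean after clicks %.2f%n",
                    threshold, cmn.mean());
        }
    }
}
```

      With threshold 2.0 the click frames pass the VAD and drag the mean down to the click level, so the next real utterance is normalized against the wrong mean (scenario 2). With threshold 3.0 the clicks are rejected and the mean is left untouched. In this model, a correctly detected utterance pulls the mean back toward speech, which would match scenario 3.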
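      One possible "more intelligent" VAD, sketched here as a hypothetical example rather than anything Sphinx provides, adds a minimum-duration rule: energy must stay above the threshold for several consecutive frames before the segment counts as speech, so short transients such as key or mouse clicks are rejected. A hangover keeps the speech state for a few frames after the energy drops. The class name, thresholds and frame counts are all assumptions; at 10 ms per frame, 5 frames is 50 ms, which only helps if the clicks are shorter than that.

```java
// Hypothetical duration-gated VAD: speech starts only after minSpeechFrames
// consecutive frames above the threshold; a hangover delays the end of speech.
public class DurationGatedVad {
    private final double threshold;
    private final int minSpeechFrames;
    private final int hangoverFrames;
    private int aboveCount = 0;
    private int hangover = 0;
    private boolean inSpeech = false;

    public DurationGatedVad(double threshold, int minSpeechFrames, int hangoverFrames) {
        this.threshold = threshold;
        this.minSpeechFrames = minSpeechFrames;
        this.hangoverFrames = hangoverFrames;
    }

    // Feed one frame's energy (relative to the noise floor); returns whether
    // the VAD currently considers the input to be speech.
    public boolean process(double energy) {
        if (energy > threshold) {
            aboveCount++;
            if (inSpeech || aboveCount >= minSpeechFrames) {
                inSpeech = true;
                hangover = hangoverFrames; // refresh while energy stays high
            }
        } else {
            aboveCount = 0;                // a short burst resets the counter
            if (inSpeech) {
                if (hangover > 0) {
                    hangover--;            // still inside the hangover window
                } else {
                    inSpeech = false;      // hangover used up: speech ended
                }
            }
        }
        return inSpeech;
    }

    public static void main(String[] args) {
        DurationGatedVad vad = new DurationGatedVad(3.0, 5, 3);
        boolean clickDetected = false;
        for (double e : new double[] {0, 8, 8, 0, 0, 0}) { // a 2-frame click
            clickDetected |= vad.process(e);
        }
        boolean wordDetected = false;
        for (int i = 0; i < 10; i++) {                     // a 10-frame word
            wordDetected |= vad.process(6.0);
        }
        System.out.println("click=" + clickDetected + " word=" + wordDetected);
    }
}
```

      The trade-off is latency: speech is only confirmed after minSpeechFrames frames, so a real implementation would need to buffer those frames and pass them to the recognizer once speech is confirmed, otherwise the start of the wake-up word would be cut off.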

