Stop detection with intermediary result when using grammar

2020-02-04
2020-02-07
  • christian schuch

    I've built a recognition system for German digits (short words) using pocketsphinx (because I need to run it on an embedded system). So far it recognizes the digits and rejects false positives with an accuracy of about 80%. The setup is an Asterisk 13 with a plugin that connects the speech recognition core to a client, which in turn connects to a small server that talks to pocketsphinx and communicates with the plugin over an internal port. I can see the intermediate results on standard output, as well as the final verdict with the recognition result. The model is set up with a JSGF grammar to ensure false-positive rejection. When using a grammar instead of a .lm, the probabilities are not readable, although some sort of evaluation is obviously happening that eventually reaches a certain threshold, at which point PS hands over a result. My problem is that the recognition core sometimes needs more than one attempt at the utterance to deliver a result, although monitoring the internal results clearly shows that the correct interpretation has already been found. Is it possible to force PS to hand over results below that (maybe imaginary) threshold? How is the internal recognition set up when it is not using p(x)?

     
    • Nickolay V. Shmyrev

      I'm sorry, it is hard to understand the purpose of the system and give you advice. You are simply experiencing accuracy issues.

      Asterisk is a bad choice here since it limits you to 8 kHz audio, which is less accurate than 16 kHz.

      Another option would be a more accurate system based on neural networks. https://github.com/alphacep/vosk-api should work on an RPi if that is your embedded system. A German model is here: https://github.com/alphacep/kaldi-android-demo/releases/download/2020-01/alphacep-model-android-de-zamia-0.3.tar.gz

      You can explain your system in more detail to get better advice.

       
      • christian schuch

        The purpose of the system is an IVR menu.
        Unfortunately, only pocketsphinx is possible for system / contract reasons, and so far it works very well (I've got an accuracy of about 80% with only about one hour of recorded material). Let me describe the setup a little more precisely: on the Asterisk side I use a plugin, which works with the resspeech.... - core. This plugin takes the audio stream and hands it over via an internal port to a server that maintains a connection to an open pocketsphinx thread and passes the audio data to the PS core for interpretation (both the plugin and the server are based on astsphinx). The PS core then prints out several lines of intermediate results (see attached jpg). I just want to add a timeout after which the core returns the intermediate result or, if that is not possible, a fault (empty string). Is that possible?

         

        Last edit: christian schuch 2020-02-05
        • Nickolay V. Shmyrev

          OK then. A timeout is perfectly possible; you just have to implement it yourself in your code. I do not see any problem here: just count the bytes processed and return once you have processed enough bytes.
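The byte-counting timeout suggested above could look roughly like the following. This is only a sketch under stated assumptions, not code from astsphinx: the audio is assumed to be 8 kHz, 16-bit mono PCM, and `decoder` is assumed to behave like the pocketsphinx Python `Decoder` (`start_utt`, `process_raw`, `hyp`, `end_utt`). The function name `recognize_with_timeout` and the constants are made up for illustration.

```python
SAMPLE_RATE_HZ = 8000   # Asterisk telephony audio, as mentioned in the thread
BYTES_PER_SAMPLE = 2    # assumption: 16-bit signed linear PCM, mono


def recognize_with_timeout(decoder, chunks, timeout_s):
    """Feed audio until the byte budget is spent, then return the best guess.

    `decoder` is assumed to expose the pocketsphinx Decoder methods used
    below; `chunks` is any iterable of raw audio byte strings.
    Returns the partial hypothesis text, or "" if there is none yet.
    """
    # The timeout is expressed as a number of audio bytes, not wall-clock time.
    max_bytes = int(timeout_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    processed = 0

    decoder.start_utt()
    for chunk in chunks:
        decoder.process_raw(chunk, False, False)  # no_search, full_utt
        processed += len(chunk)
        if processed >= max_bytes:
            break  # budget used up: stop feeding audio

    # Read the hypothesis before ending the utterance, i.e. the
    # intermediate result for the audio seen so far.
    hyp = decoder.hyp()
    decoder.end_utt()
    return hyp.hypstr if hyp is not None else ""
```

With these assumptions, a 3-second timeout corresponds to 3 * 8000 * 2 = 48000 bytes. The caller decides what an empty string means, for example re-prompting the user in the IVR menu.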

           
  • christian schuch

    Or what else do you want to know about my setup?

     
  • christian schuch

    OK, with what API command or data structure can I read out the intermediate results?
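One way to get at intermediate results, offered as a hedged sketch rather than an answer from the thread: in the pocketsphinx Python bindings, `Decoder.hyp()` can be called while an utterance is still open and returns the current best hypothesis (or `None`); the C counterpart is `ps_get_hyp`. The helper name `partial_hypotheses` is hypothetical, and `decoder` is assumed to behave like a pocketsphinx `Decoder`.

```python
def partial_hypotheses(decoder, chunks):
    """Yield (bytes_processed, text) each time the partial hypothesis changes.

    `decoder` is assumed to expose start_utt/process_raw/hyp/end_utt like
    the pocketsphinx Decoder; `chunks` is an iterable of raw audio bytes.
    """
    processed = 0
    last = ""  # an empty hypothesis is not reported
    decoder.start_utt()
    try:
        for chunk in chunks:
            decoder.process_raw(chunk, False, False)  # no_search, full_utt
            processed += len(chunk)
            hyp = decoder.hyp()  # partial result for the audio so far
            text = hyp.hypstr if hyp is not None else ""
            if text != last:
                last = text
                yield processed, text
    finally:
        # Close the utterance even if the caller stops iterating early.
        decoder.end_utt()
```

Iterating over the generator gives the same kind of trace that shows up on standard output, but as values the server code can act on, for example by keeping the most recent text to return when a timeout fires.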

     
  • christian schuch

    thx very much

     
