Sphinx4 performance and accuracy

Started by rupy on 2014-05-16, last updated 2014-05-21
  • rupy

    rupy - 2014-05-16

    There are probably a million posts about this already, but why is there no JNI port to improve speed and performance for Java integrations? Right now all 4 cores spin up to 50% to figure out that I said "hi"; it takes 10 seconds and only works about 1 time out of 4! Completely unusable... or am I doing something wrong?

    What do Google and Apple use for voice recognition?

    To me it seems grammar is completely useless; the whole point is to be able to figure out all words, so why is Sphinx so complicated to set up and configure for just that: detect all words!

    Also, is there a way to run Sphinx in fewer threads so it doesn't stall my OpenGL rendering?

     

    Last edit: rupy 2014-05-16
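On the question of keeping recognition from stalling a render loop, one general Java approach is to push the blocking recognition call onto a single low-priority background thread and poll for the result. The sketch below uses only `java.util.concurrent`. It is an assumption-laden illustration, not a Sphinx4 feature: it does not change how many threads the recognizer itself may use, the class and method names (`BackgroundRecognition`, `newLowPriorityExecutor`, `recognizeAsync`) are hypothetical, and the lambda in `main` is only a stand-in for a real recognizer call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: runs a blocking call on one low-priority daemon thread. */
final class BackgroundRecognition {

    /** Creates a single-thread executor whose worker runs at minimum priority. */
    static ExecutorService newLowPriorityExecutor() {
        return Executors.newSingleThreadExecutor(task -> {
            Thread worker = new Thread(task, "speech-recognition");
            worker.setPriority(Thread.MIN_PRIORITY); // only a hint to the OS scheduler
            worker.setDaemon(true);                  // do not keep the JVM alive
            return worker;
        });
    }

    /** Submits the blocking call and returns immediately with a Future. */
    static Future<String> recognizeAsync(ExecutorService executor,
                                         Callable<String> blockingRecognizer) {
        return executor.submit(blockingRecognizer);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newLowPriorityExecutor();
        // Stand-in for a blocking call into the recognizer (assumption).
        Future<String> pending = recognizeAsync(executor, () -> "hi");
        // A render loop would check pending.isDone() once per frame
        // instead of blocking on get() as this demo does.
        System.out.println(pending.get());
        executor.shutdown();
    }
}
```

Thread priority is advisory, so this may reduce contention with the rendering thread but cannot guarantee it.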
  • Nickolay V. Shmyrev

    so why is Sphinx so complicated to set up and configure for just that: detect all words!

    It is not as complicated as you might think. If you have accuracy issues with a particular file, share it to get advice.

     
  • rupy

    rupy - 2014-05-16

    I got it working with the default stuff, then I replaced that with VoxForge. The accuracy is OK for very short single words like "one" or "okay" but totally unusable for sentences.

    That, combined with the performance issues, makes the library completely unusable, so the only recourse I have is to mess with Google's speech API, which does not tempt me at all for a million reasons.

    But speech recognition that works obviously exists (just try Google's, it's amazing!!!). Such a shame it's all military-grade-license closed source! :(

     

    Last edit: rupy 2014-05-16
    • Nickolay V. Shmyrev

      then I replaced that with VoxForge. The accuracy is OK for very short single words like "one" or "okay" but totally unusable for sentences.

      Currently the most accurate model is the generic en-us model available in the downloads. The VoxForge model is pretty inaccurate and might have other issues. Again, you can share a sample to get help with accuracy.

      the only recourse I have is to mess with Google's speech API, which does not tempt me at all for a million reasons.

      Well, you are welcome to switch to CMUSphinx.

       
      • rupy

        rupy - 2014-05-17

        There are 3 settings:

        For the LanguageModel I use en-us.lm.dmp, which I had to compile.

        For the AcousticModel I use VoxForge.

        For the Dictionary I currently use cmudict.0.6d, but I also have cmudict.0.7a from VoxForge.

        All these files should be included in the download and the API should not be a simplification that doesn't even use the real classes.

        Good that you got rid of the XML config (on the surface), but it's still an over-engineered mess. I can only imagine what the voice recognition code looks like...

         
  • rupy

    rupy - 2014-05-17

    Well I don't have a sample, because I talk into the microphone. But it doesn't matter how long I speak, Sphinx4 always recognizes the sentence as one, max two, words and most of the time they are wrong.

    So is there a way to make Sphinx perform like this: https://www.google.com/intl/en/chrome/demos/speech.html
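On the point about having no sample because the audio comes from a microphone: the microphone input can be saved to a WAV file and shared. Below is a rough sketch using only the standard `javax.sound.sampled` API. It assumes 16 kHz, 16-bit, mono, little-endian PCM, which is the input format the default CMUSphinx models are generally documented to expect. The class and method names (`MicSample`, `record`, `writeWav`) are hypothetical and are not part of Sphinx4.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

/** Hypothetical helper: captures microphone audio and stores it as a WAV file. */
final class MicSample {

    // Assumed format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    /** Records roughly the given number of seconds from the default microphone. */
    static byte[] record(int seconds) throws LineUnavailableException {
        int wanted = (int) FORMAT.getSampleRate() * FORMAT.getFrameSize() * seconds;
        byte[] pcm = new byte[wanted];
        int filled = 0;
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        line.open(FORMAT);
        line.start();
        try {
            while (filled < wanted) {
                // Chunk sizes stay a multiple of the 2-byte frame size.
                int n = line.read(pcm, filled, Math.min(4096, wanted - filled));
                if (n <= 0) {
                    break; // line was stopped or closed early
                }
                filled += n;
            }
        } finally {
            line.stop();
            line.close();
        }
        return Arrays.copyOf(pcm, filled);
    }

    /** Writes raw PCM bytes in FORMAT to a WAV file. */
    static void writeWav(byte[] pcm, File out) throws IOException {
        long frames = pcm.length / FORMAT.getFrameSize();
        try (AudioInputStream in =
                 new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        }
    }
}
```

A call such as `MicSample.writeWav(MicSample.record(5), new File("sample.wav"))` would capture about five seconds, provided a microphone supporting this format is available.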

     
  • Nickolay V. Shmyrev

    So is there a way to make Sphinx perform like this: https://www.google.com/intl/en/chrome/demos/speech.html

    Sure, please see the description and video here

    http://grasch.net/node/22

     
    • rupy

      rupy - 2014-05-17

      Is there an online demo as good as this based on Sphinx?

       

      Last edit: rupy 2014-05-17
      • Nickolay V. Shmyrev

        Is there an online demo as good as this based on Sphinx?

        Nobody has ever worked on it.

         
  • rupy

    rupy - 2014-05-20

    Can somebody throw anything together?

    I spent maybe half a day trying to get sphinx4 to work properly... that's too long. I need a working proof of concept if I'm going to waste more time on this.

    I need a zip or a site that lets me test cutting-edge Sphinx with the configuration done and all dependencies included: just unzip and double-click. Preferably with Java, but that is the cherry on top... Right now I'm convinced the code is not working at all, because there are a million ways to configure this...

     

    Last edit: rupy 2014-05-20
    • Horia Cucu

      Horia Cucu - 2014-05-20

      If you know Romanian, you can test this online demo:
      http://speed.pub.ro/speech-to-text
      I wouldn't say it's a cutting-edge S4 configuration, but it works fine for news-like speech.
      Horia

       
      • rupy

        rupy - 2014-05-21

        Can you make a page with an English version? I would be very grateful.

        Also, is this open source?

         
