There are probably a million posts about this already, but why is there no JNI port so we can improve speed and performance for Java integrations? Right now all 4 cores spin up to 50% to figure out that I said "hi", it takes 10 seconds, and it only works about 1 time out of 4?! Completely unusable... or am I doing something wrong?
What do Google and Apple use for voice recognition?
To me it seems grammar is completely useless; the whole point is to be able to figure out all words, so why is Sphinx so complicated to set up and configure for just that: detect all words!
Also, is there a way to make Sphinx run in fewer threads so it doesn't stall my OpenGL rendering?
Last edit: rupy 2014-05-16
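On the threading question: whatever threads Sphinx4 starts internally is not covered in this thread, but one generic Java approach is to keep all decoding on a single daemon thread at minimum priority, so the render loop never blocks waiting for a result. A minimal sketch, assuming the recognizer can be driven from any one thread; `decode` is a hypothetical stand-in for the real recognizer call, and thread priority is only a hint that the OS scheduler may ignore.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

class BackgroundRecognizer {
    // Creates the single worker: minimum priority (a scheduler hint only) and daemon,
    // so a forgotten shutdown never keeps the JVM alive.
    static final ThreadFactory LOW_PRIORITY = runnable -> {
        Thread t = new Thread(runnable, "speech-decoder");
        t.setPriority(Thread.MIN_PRIORITY);
        t.setDaemon(true);
        return t;
    };

    // Exactly one thread, so jobs submitted here run one at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(LOW_PRIORITY);

    // Hypothetical stand-in for the real recognizer call.
    static String decode(byte[] audio) {
        return "heard " + audio.length + " bytes on " + Thread.currentThread().getName();
    }

    // Queues work and returns immediately; the caller never blocks here.
    CompletableFuture<String> submit(byte[] audio) {
        return CompletableFuture.supplyAsync(() -> decode(audio), worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        BackgroundRecognizer recognizer = new BackgroundRecognizer();
        CompletableFuture<String> result = recognizer.submit(new byte[3200]);
        // A render loop would test result.isDone() each frame; join() is only for this demo.
        System.out.println(result.join());
        recognizer.shutdown();
    }
}
```

In a render loop, checking `isDone()` on the returned future once per frame and calling `join()` only when it is true means a frame never waits on the decoder.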
It is not as complicated as you might think. If you have accuracy issues with some file, share it to get advice.
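When testing live from the microphone, a shareable sample can be captured with the standard `javax.sound.sampled` API and saved as a WAV file. A minimal sketch, assuming the en-us generic model's usual input format of 16 kHz, 16-bit, mono, signed little-endian PCM; the class and method names are illustrative, and `capture` needs a working input device.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

class SampleRecorder {
    // Assumed model input: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    // Records roughly `millis` milliseconds from the default microphone.
    static byte[] capture(int millis) throws LineUnavailableException {
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        line.open(FORMAT);
        line.start();
        int bytesPerSecond = (int) FORMAT.getFrameRate() * FORMAT.getFrameSize();
        int bytesWanted = bytesPerSecond * millis / 1000;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (out.size() < bytesWanted) {
            // Request lengths stay a multiple of the 2-byte frame size.
            int n = line.read(buf, 0, Math.min(buf.length, bytesWanted - out.size()));
            out.write(buf, 0, n);
        }
        line.stop();
        line.close();
        return out.toByteArray();
    }

    // Wraps raw PCM bytes in a WAV header so the file can be attached to a post.
    static void saveAsWav(byte[] pcm, File target) throws IOException {
        long frames = pcm.length / FORMAT.getFrameSize();
        try (AudioInputStream in = new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, target);
        }
    }

    public static void main(String[] args) throws Exception {
        saveAsWav(capture(5000), new File("sample.wav")); // speak for about 5 seconds
    }
}
```

Running `main` records about five seconds and writes `sample.wav` to the working directory.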
I got it working with the default stuff, then I replaced that with VoxForge; the accuracy is OK for very short single words like "one" or "okay" but totally unusable for sentences.
That, combined with the performance issues, makes the library completely unusable, so the only recourse I have is to mess with Google's speech API, which does not tempt me at all for a million reasons.
But speech recognition that works obviously does exist (just try Google's, it's amazing!!!); such a shame it's all closed source under a military-grade license! :(
Last edit: rupy 2014-05-16
The current most accurate model is the en-us generic model available in downloads. The VoxForge model is pretty inaccurate and might have other issues. Again, you can share a sample to get help on accuracy.
Well, you are welcome to switch to CMUSphinx
There are 3 settings:
I use the LanguageModel en-us.lm.dmp, which I had to compile.
But for the AcousticModel I use VoxForge.
For the Dictionary I currently use cmudict.0.6d, but I also have cmudict.0.7a from VoxForge.
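The language model setting is what free dictation ("detect all words") relies on: en-us.lm.dmp is an n-gram language model in Sphinx's binary DMP format, which assigns a probability to any word sequence over the dictionary's vocabulary, whereas a grammar only accepts the phrases it lists. To illustrate the idea only, here is a toy maximum-likelihood bigram scorer; the class is hypothetical and far simpler than a real model.

```java
import java.util.HashMap;
import java.util.Map;

class ToyBigramModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    // Pads with sentence boundary markers and splits on whitespace.
    private static String[] tokens(String sentence) {
        return ("<s> " + sentence.toLowerCase() + " </s>").split("\\s+");
    }

    // Counts every word and every adjacent word pair in one training sentence.
    void train(String sentence) {
        String[] w = tokens(sentence);
        for (int i = 0; i < w.length; i++) {
            unigrams.merge(w[i], 1, Integer::sum);
            if (i > 0) bigrams.merge(w[i - 1] + " " + w[i], 1, Integer::sum);
        }
    }

    // Maximum-likelihood estimate: P(word | previous) = count(previous, word) / count(previous).
    double probability(String previous, String word) {
        int history = unigrams.getOrDefault(previous, 0);
        if (history == 0) return 0.0;
        return (double) bigrams.getOrDefault(previous + " " + word, 0) / history;
    }

    // Sentence score: product of bigram probabilities, boundaries included.
    double score(String sentence) {
        String[] w = tokens(sentence);
        double p = 1.0;
        for (int i = 1; i < w.length; i++) p *= probability(w[i - 1], w[i]);
        return p;
    }

    public static void main(String[] args) {
        ToyBigramModel lm = new ToyBigramModel();
        lm.train("hi there");
        lm.train("hi okay");
        lm.train("okay one");
        System.out.println(lm.score("hi there")); // about 0.333
        System.out.println(lm.score("there hi")); // 0.0: unseen word order
    }
}
```

A real model such as en-us.lm typically uses longer word histories plus backoff weights, so unseen word pairs still get a small nonzero probability instead of the zero this toy returns for "there hi".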
All these files should be included in the download, and the API should not be a simplification that doesn't even use the real classes.
It's good that you got rid of the XML config (on the surface), but it's still an over-engineered mess. I can only imagine what the voice recognition code looks like...
Well, I don't have a sample, because I talk into the microphone. But no matter how long I speak, Sphinx4 always recognizes the sentence as one, at most two, words, and most of the time they are wrong.
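One cause worth ruling out for the "one or two words, mostly wrong" symptom (an assumption, not something diagnosed in this thread) is an audio format mismatch: the en-us generic model expects 16 kHz, 16-bit mono audio, and input at another rate typically decodes poorly. A minimal check with `javax.sound.sampled`; the class name and the expected values are assumptions to adapt to the model actually in use.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

class FormatCheck {
    // Assumed expectations for the en-us generic acoustic model.
    static final float RATE = 16000f;
    static final int BITS = 16;
    static final int CHANNELS = 1;

    // Returns one message per mismatch; an empty list means the format looks right.
    static List<String> problems(AudioFormat f) {
        List<String> out = new ArrayList<>();
        if (f.getSampleRate() != RATE) out.add("sample rate " + f.getSampleRate() + " Hz, expected " + RATE);
        if (f.getSampleSizeInBits() != BITS) out.add(f.getSampleSizeInBits() + "-bit samples, expected " + BITS);
        if (f.getChannels() != CHANNELS) out.add(f.getChannels() + " channels, expected mono");
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) out.add("encoding " + f.getEncoding() + ", expected PCM_SIGNED");
        if (f.isBigEndian()) out.add("big-endian samples, expected little-endian");
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Reads only the header of the WAV file given on the command line.
        AudioFormat f = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        List<String> p = problems(f);
        System.out.println(p.isEmpty() ? "format OK" : String.join("\n", p));
    }
}
```

Passing the path of a recorded WAV prints either `format OK` or one line per mismatch.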
So is there a way to make Sphinx perform like this: https://www.google.com/intl/en/chrome/demos/speech.html
Sure, please see the description and video here
http://grasch.net/node/22
Is there an online demo based on Sphinx that is as good as this?
Last edit: rupy 2014-05-17
Nobody ever worked on it
Can somebody throw anything together?
I spent maybe half a day trying to get Sphinx4 to work properly... that's too long. I need a working proof of concept if I'm going to waste more time on this.
I need a zip or a site that lets me test cutting-edge Sphinx with the configuration done and all dependencies included: just unzip and double-click, preferably with Java, but that is the cherry on top... right now I'm convinced the code is not working at all because there are a million ways to configure this...
Last edit: rupy 2014-05-20
If you know Romanian, you can test this online demo: http://speed.pub.ro/speech-to-text
I wouldn't say it's a cutting-edge S4 configuration, but it works fine for news-like speech.
Horia
On 20 May 2014 20:52, rupy rupy@users.sf.net wrote:
Can you make a page with an English version? I would be very grateful.
Also, is this open source?