Hello, I'm using PocketSphinx on Android. So far, the English dictionary with the default acoustic model works like a charm, so I tried adding the VoxForge German dictionary with its language model and the PTM acoustic model. The accuracy is poor, so I have several questions to point me in the right direction.
I was told that the VoxForge German model is trained at 16 kHz and that an Android phone's microphone won't record at more than 8 kHz. I decided to ignore that, and the German model works on my Nexus 5X; as I said, not with the best accuracy, but it works. However, I encouraged some friends to test it on their phones, and the results were pretty bad. So the question is: is this statement true? Is this German model trained at 16 kHz, and will most Android phones not record at that rate?
I saw that PocketSphinx generates a bunch of .raw log files which are used to debug the results of a recording. I can see that my app generates these .raw files, but I don't have a clue how to open them. Any help there?
I have seen other speech recognition engines, but after much work I want to stick with PocketSphinx. Is this a good choice? Even though English works pretty well, my goal is good German speech recognition on Android, so I don't want to lose more time if even the best result with PocketSphinx won't compare to the English model.
If the answer to the previous question is yes, how much would I have to train an acoustic model at 8 kHz to get at least 80% accuracy? I'm talking about recognizing keywords.
Any other hints outside of these questions will be really appreciated! I cannot share raw logs right now, but I do have them from my Android application that uses PocketSphinx.
It depends a lot on the application you want to build. How many keywords do you want to recognize?
The keywords are going to be used as commands. I won't use more than 10, and most of my words will be very common, nothing complex on that side.
Short keywords in continuous speech are very hard to recognize. You need to build a false alarm/detection rate curve and tune the threshold for each keyword. Even with the best technology, the false alarm rate for a short word will be high. That's why companies use long activation phrases like "ok google", not simply "ok".
Voice control with short commands does not make much sense overall; that's why no commercial company has built it. You need to consider some other way to use voice in your app.
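To make the idea of a false alarm/detection rate curve concrete, here is a minimal sketch. It assumes you already have one score per test recording, split into recordings that contain the keyword and recordings that do not. The function name and the numbers are hypothetical and made up for illustration; with PocketSphinx you would more likely rerun the decoder once per -kws_threshold value and count hits instead of reading scores.

```python
def sweep_thresholds(scores_with_keyword, scores_without_keyword, thresholds):
    """Return (threshold, detection_rate, false_alarm_rate) for each threshold.

    A recording counts as a hit when its score is >= the threshold.
    """
    rows = []
    for t in thresholds:
        detected = sum(1 for s in scores_with_keyword if s >= t)
        false_alarms = sum(1 for s in scores_without_keyword if s >= t)
        rows.append((
            t,
            detected / len(scores_with_keyword),
            false_alarms / len(scores_without_keyword),
        ))
    return rows


# Hypothetical scores, not real decoder output.
positives = [0.9, 0.8, 0.75, 0.4]   # recordings that contain the keyword
negatives = [0.7, 0.3, 0.2, 0.1]    # recordings that do not

curve = sweep_thresholds(positives, negatives, [0.25, 0.5, 0.85])
for threshold, detection_rate, false_alarm_rate in curve:
    print(threshold, detection_rate, false_alarm_rate)
```

Each row is one point on the curve. A stricter threshold lowers the false alarm rate but also lowers the detection rate, and the threshold you keep is the one with the trade-off your application can tolerate.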
That's an interesting point. So if my keywords were instead phrases of two or three words, the recognition would be better? I can work with that.
Let me ask you one more question, as you seem pretty active here. I have noticed something using pocketsphinx_continuous on Ubuntu, where I set up my German model for quick testing. According to the PocketSphinx tutorial, the -kws option cannot be combined with -jsgf or -lm. If I set the -kws option with a keyword list and -kws_threshold, I cannot get a single hit, not even one (I'm just trying "ich brauche hilfe"), no matter what threshold I try (1e-1, 1e-10, 1e-20, 1e-30, etc.). If I create a grammar file instead, the phrase gets recognized pretty easily, almost without fail, but a grammar is not suitable for me because I cannot generate a grammar file from program code. And if I use -lm with my language model, it does recognize the keyphrase, but with roughly 60% accuracy.
So, any clue why -kws might not be working? This is the command I'm using:
pocketsphinx_continuous -inmic yes -hmm Desktop/GermanModel/cmusphinx-de-ptm-voxforge-5.2/cmusphinx-ptm-voxforge-de-r20171217/model_parameters/de-ger-ptm -dict Desktop/GermanModel/cmusphinx-voxforge-de.dict -keyphrase "ich brauche hilfe" -kws_threshold 1e-30
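Since the post mentions a keyword list for -kws, here is a small sketch of producing one from program code. It assumes the list format shown in the CMUSphinx tutorial, one keyphrase per line followed by its threshold between slashes. The helper name and the second phrase are hypothetical, and every word in a phrase still has to exist in the dictionary passed with -dict.

```python
def format_keyword_list(keyphrases):
    """Build keyword-list text: one 'phrase /threshold/' entry per line.

    keyphrases maps each phrase to its threshold, kept as a string
    so that it is written out exactly as given.
    """
    return "".join(
        f"{phrase} /{threshold}/\n" for phrase, threshold in keyphrases.items()
    )


# "anruf beenden" is a made-up second command for illustration.
keywords = {
    "ich brauche hilfe": "1e-20",
    "anruf beenden": "1e-15",
}
text = format_keyword_list(keywords)
print(text, end="")
```

The resulting text can be written to a file and passed with -kws in place of -keyphrase and -kws_threshold, which allows a separate threshold for each phrase.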
You need to provide an audio file in order to get help on it.
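For anyone wondering how to open the .raw logs in order to share them, here is a minimal sketch. It assumes the files are headerless 16-bit signed little-endian mono PCM at the decoder's sample rate (16 kHz unless it was configured otherwise); if playback sounds too fast or too slow, the sample rate assumption is wrong. The file names are hypothetical, and the demo writes one second of silence so the snippet runs on its own.

```python
import os
import tempfile
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000):
    """Wrap headerless 16-bit mono PCM in a WAV header; return the sample count."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)        # assumption: mono
        w.setsampwidth(2)        # assumption: 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return len(pcm) // 2


# Demo with a stand-in file: one second of silence at 16 kHz.
tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "example.raw")   # hypothetical file name
wav_path = os.path.join(tmp_dir, "example.wav")
with open(raw_path, "wb") as f:
    f.write(b"\x00\x00" * 16000)

n_samples = raw_to_wav(raw_path, wav_path)
print(n_samples / 16000, "seconds written to", wav_path)
```

The resulting WAV file opens in any audio player or editor, which makes it easy to check whether the recording is clipped, too quiet, or captured at the wrong rate before attaching it to a post.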