Hi All,
I have this little dictionary which contains the letters of the alphabet and extra commands (for example escape and backspace) so I can type using my voice.
I notice, however, that the more commands I add, the worse recognition becomes. For instance, 9 times out of 10 the word "escape" gets recognized as the letter "k".
I've created a sentences file which contains lines like
<s> a </s>
<s> escape </s>
<s> k </s>
I've created the dictionary files with entries from cmu_sphinx.dict
and I've run quicklm.pl on these.
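For reference, the entries I took from cmu_sphinx.dict look roughly like this (quoted from memory, so treat the exact phone strings as illustrative):
A          AH
K          K EY
ESCAPE     IH S K EY P
BACKSPACE  B AE K S P EY S
(I notice that "k" is basically the tail end of "escape", which may be part of the confusion.)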
I'm using the parameters of sphinx2-demo (tty-continuous).
How can I increase recognition for this particular use?
Or is sphinx4 better suited for this kind of isolated word recognition?
Please advise.
Regards
W.P. van Paassen
I would say it is nearly impossible to recognize the single letters of the alphabet with a good error rate, because a lot of the letters sound almost the same (the classic example is the E-set: B, C, D, E, G, P, T, V and Z all share the same vowel).
Training the parameters for the decoder makes a big recognition difference for me (using sphinx3), but I don't think that helps a lot in your case.
I agree with what Shiosai said and I am glad that he got good results using SphinxTrain.
In your case though, there might be something you could do without touching training.
For example, have you tried tuning the beam sizes of the sphinx2 decoder? There are several parameters you could adjust to make the recognizer better.
For your particular problem, I would also advise listening to the waveform yourself and checking whether you can recognize it by ear. If you can, it could mean that the phone insertion penalty is too high; a penalty that is too high makes words with more phones harder to recognize.
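A crude picture of the effect (the numbers below are made up and do not come from any real decoder run; they only show how a per-phone penalty shifts the comparison):

# Toy comparison of two search paths, "K" (2 phones) vs "ESCAPE" (5 phones).
# Scores are log-probabilities, so the higher (less negative) one wins.
def path_score(acoustic, lm, n_phones, phone_penalty):
    # every phone entered during the search pays the penalty once
    return acoustic + lm + n_phones * phone_penalty

for penalty in (-1.0, -10.0):                      # mild vs. harsh per-phone penalty
    k   = path_score(-200.0, -3.0, 2, penalty)     # "K"      -> K EY
    esc = path_score(-190.0, -3.0, 5, penalty)     # "ESCAPE" -> IH S K EY P
    print(penalty, "K wins" if k > esc else "ESCAPE wins")

With the mild penalty "escape" wins; with the harsh one the shorter word "k" takes over.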
In the end, fixing the above search errors may not help you that much. So, at the very least, you could train another model.
Arthur
Sorry, this was a misunderstanding; I didn't use SphinxTrain.
It was rather difficult for me to find good parameters (like -beam, -vqeval and so on) for the livedecode engine, because there are too many for me to try out (I think around 35 usable ones).
So I tried to find the ranges of the parameters by trial and error, recorded some training data adapted to my sentence file, and tuned the parameters with a genetic algorithm.
These parameters are very specialized now, but they work much better than the default ones floating around.
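Roughly the kind of loop I mean (a simplified sketch: the parameter names and ranges below are placeholders, and score_parameters() stands in for running the decoder over my recorded utterances and computing the word error rate):

import random

# Placeholder parameter ranges; substitute whichever decoder options you tune.
PARAM_RANGES = {
    "beam":  (1e-80, 1e-40),
    "wbeam": (1e-60, 1e-30),
    "lw":    (6.0, 13.0),
}

def random_individual():
    return {name: random.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def crossover(a, b):
    # pick each parameter value from one of the two parents
    return {name: random.choice((a[name], b[name])) for name in PARAM_RANGES}

def mutate(ind, rate=0.3):
    # re-draw some parameters inside their allowed range
    child = dict(ind)
    for name, (lo, hi) in PARAM_RANGES.items():
        if random.random() < rate:
            child[name] = random.uniform(lo, hi)
    return child

def score_parameters(ind):
    # Placeholder fitness: in reality, write the parameters into the decoder
    # config, run it over the recorded test sentences and return the WER.
    return sum(abs(v) for v in ind.values())   # dummy value so the sketch runs

def evolve(generations=20, pop_size=12, elite=4):
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score_parameters)            # lower score = better
        parents = population[:elite]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - elite)]
        population = parents + children
    return min(population, key=score_parameters)

print(evolve())

Keeping the best few individuals each generation (elitism) just prevents the search from throwing away good settings; the details are not essential to the idea.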
Hi Shiosai,
I see.
This is partially my fault. One thing that Hieroglyphs (a set of documentation for Sphinx) records is a recommended sequence of parameter tuning. Unfortunately I haven't spread the document widely over the last year, mainly because its first draft was only recently finished.
Please keep an eye on this forum, because Hieroglyphs will likely be officially published by the end of this year. I hope that will give you an easier time next time. :-)
Arthur
Hi Shiosai, Arthur,
Thanks for the advice.
I'll continue to experiment with new parameters and try the prerecorded waveform scheme you used to improve recognition. I'll post my results in a while.
Kind Regards,
Peter
You are always welcome; don't hesitate to ask any questions in this forum. -Arthur