Hello,
I have performance issues when running pocketsphinx_continuous. A lot of warnings like this show up:
Input overrun, read calls are too rare
I am using an Nvidia Jetson TK1 (specs here: http://elinux.org/Jetson_TK1 ; basically an ARM quad-core at 2.3 GHz with 2 GB RAM), and I have followed most of the instructions here: http://cmusphinx.sourceforge.net/wiki/pocketsphinxhandhelds
I am using the French acoustic model, a dictionary with 5 words, and either no language model or the French language model (250+ MB). The recognition is TERRIBLE. I have also tried with the English models provided by pocketsphinx, without any better results.
Is my board really not powerful enough? It is true that only one core is used by pocketsphinx, but when I inspect the output of top, it does not look that overloaded.
I compared with the same setup on my laptop (Intel i7 & 4 GB RAM), and the results are as expected: very good.
Do you have any suggestions?
Regards,
qrthur
Last edit: qrthur 2015-08-10
Please provide information about pocketsphinx version you are using.
Please provide exact command line and output.
The French language model is probably too large. If you just want 5 words, you need to create a domain-specific grammar or language model, as described in our tutorial:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
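For a 5-word task, a small JSGF grammar is usually the simplest option; pocketsphinx can load it with the -jsgf flag in place of -lm. A minimal sketch (the five words here are placeholders, since the thread does not say which words are actually used):

```
#JSGF V1.0;

grammar commands;

// Each utterance is exactly one of the five words.
public <command> = oui | non | gauche | droite | stop;
```

Every word in the grammar must also have an entry in the pronunciation dictionary passed via -dict.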
An updated French acoustic model was released yesterday; please try it.
I tried the new acoustic model, and now it works PERFECTLY on my ARM board.
I had been stuck on that issue for a long week; I am very lucky to have asked for help the day they released the new French models.
Thank you very much for the tip.
To make sure it is indeed the acoustic model that solved my issue, I followed your advice anyway and created a custom language model (1.2 KB). I tried with and without it; as long as I am using the new acoustic model, I don't get any Input overrun warnings. I also tried my new language model with the old acoustic model, and it fails as before.
A side question, performance-wise: what hurts most, the size of the language model or the size of the dictionary? Or is that a stupid question because they are related to each other? Also, with a small dictionary, it works very well even without any language model. Is it OK not to use a language model at all?
For the record:
I am using version 5prealpha.
Command line arguments:
-hmm /path/to/fr-fr-5.2
-lm /path/to/mylm [Size = 1.2 KB]
-dict /path/to/myDict [Size = 82 B]
-samprate 16000
-adcdev default
-ds 2
-topn 2
-inmic yes
Last edit: qrthur 2015-08-11
The size of the dictionary does not matter, because the dictionary is just used to map words to phonetic pronunciations. What matters is the vocabulary size of the language model, since only the words from the language model are used in the search. The approach is described in detail in our tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialconcepts
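For reference, the pronunciation dictionary is just a plain-text mapping from words to phone sequences, one word per line. Illustrative entries in CMUdict (ARPAbet) style for an English model are shown below; a French model would use its own phoneset, so these are examples of the format only, not of any particular model's dictionary:

```
yes   Y EH S
no    N OW
left  L EH F T
right R AY T
stop  S T AA P
```

Every phone symbol must exist in the acoustic model's phoneset, and every word used in the grammar or language model needs an entry here.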
Besides that, there are many things affecting decoding speed, and language model size is just one of them. In order to debug a performance issue you need to understand the search algorithm and its components: acoustic scoring, beam search, and so on. There could be some slowdowns on ARM caused by the architecture; you might want to compare ARM execution time with desktop execution time to get an idea of what is going on. Often you just need to compile the software with stricter optimization (-O3). While doing speed optimization it is also important to track decoding accuracy, since it is all related.
The -ds 2 and -topn 2 options reduce result accuracy; it is better to avoid them. You can decode with the defaults.
Thanks for your insights. I will see if I can tune performance a bit, but it is not a priority anymore, as it is working wonders :-)