I've gotten far enough that I can run the sphinx2-simple script, and it runs me through the turtle dictionary. This works fine. However, sphinx2-demo never shows anything beyond [silence] and [audio], which may be related to my problems.
I have had no luck generating my own dictionary. I used the tool at http://www.speech.cs.cmu.edu/tools/lmtool.html, fed it a sentence file (933 lines and 7897 words, many of them frequently repeated), and got back a tarball with the .dic, .lm, and other files. None of the files generated this way seem to work, however. When I run a modified script based on sphinx2-simple that substitutes my directory for the TASK and DICT variables (and changes any reference to 'turtle'), one of two things happens:
Either the code will crash (seg fault) and dump core, or at best, it will listen but never return anything, as:
READY....
Listening...
Stopped listening, please wait...
323:
READY....
Listening...
Stopped listening, please wait...
316:
READY....
In the first runs, where the code crashed, I had left some punctuation in the sentence file and ended up with a sizable .dic file. After I cleaned the punctuation out of the sentence file and tried again, the resulting .dic file was empty. In this case, sphinx does not crash, but it doesn't recognize anything, either.
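A minimal sketch of the punctuation clean-up step, assuming the LM tool wants one sentence per line, uppercase words separated by single spaces, and no punctuation other than apostrophes inside words (as in DON'T). The helper names `clean_corpus` and `vocabulary` are hypothetical and not part of Sphinx or lmtool. Printing the vocabulary size before uploading is a cheap way to confirm the cleaning did not strip the file down to nothing.

```python
import re


def clean_corpus(lines):
    """Normalize raw sentences (hypothetical helper; input rules are assumed)."""
    cleaned = []
    for line in lines:
        # Replace anything other than letters, digits, apostrophes, whitespace.
        text = re.sub(r"[^A-Za-z0-9'\s]", " ", line)
        # Drop apostrophes that are not between letters (i.e. quote marks).
        text = re.sub(r"(?<![A-Za-z])'|'(?![A-Za-z])", " ", text)
        words = text.upper().split()
        if words:  # skip lines that end up empty
            cleaned.append(" ".join(words))
    return cleaned


def vocabulary(sentences):
    """Sorted list of distinct words across all cleaned sentences."""
    return sorted({w for s in sentences for w in s.split()})


if __name__ == "__main__":
    raw = ["Turn left, then go forward!", "", "Don't stop... 'now'."]
    sents = clean_corpus(raw)
    print(sents)
    print(len(vocabulary(sents)), "distinct words")
```

If the word count printed here is reasonable but the returned .dic is still empty, the cleaning step is probably not what emptied the dictionary.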
Mixing and matching .dic files with .lm files from different runs crashes things badly, of course.
So is it normal to have an empty .dic file? There were no other errors on the lmtool process.
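One way to check whether a .dic and .lm pair belong together is to list every word in the language model that has no dictionary entry. This sketch assumes the .lm is in ARPA format (a `\1-grams:` section with lines of `log-prob word [back-off]`) and that the .dic has one entry per line, word first, with alternate pronunciations written `WORD(2)`. The sentence markers `<s>` and `</s>` are assumed to need no entry. The function names are hypothetical. With an empty .dic, every LM word is reported as missing.

```python
import re
import sys


def read_dic_words(lines):
    """Words in a Sphinx-style pronouncing dictionary (format assumed)."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            # Fold alternate pronunciations such as LEFT(2) into LEFT.
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def read_lm_unigrams(lines):
    """Unigram vocabulary of an ARPA-format language model."""
    words = set()
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            # A blank line or the next section header ends the unigrams.
            if not line or line.startswith("\\"):
                break
            fields = line.split()
            if len(fields) >= 2:  # log10 prob, word, optional back-off
                words.add(fields[1])
    return words


def missing_from_dic(dic_lines, lm_lines, markers=("<s>", "</s>")):
    """LM words with no dictionary entry, sentence markers excluded."""
    return sorted(read_lm_unigrams(lm_lines) - read_dic_words(dic_lines) - set(markers))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_vocab.py my.dic my.lm  (script name is hypothetical)
    with open(sys.argv[1]) as dic, open(sys.argv[2]) as lm:
        for word in missing_from_dic(dic, lm):
            print(word)
```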
Further, any guesses as to why one turtle demo (sphinx2-simple) works right out of the box while the other (sphinx2-demo) fails? This is basically the same problem I am having with my own dictionary files. The audio settings are all the same, so it may be something about the .dic and .lm files, but I am quickly running out of ideas. The turtle demo has a large .dic file, and many of its words are pretty common, so I am puzzled.
Incidentally, the host system is RedHat 6.0 running kernel 2.2.X, Athlon 600, 384MB RAM (sphinx shows up at or under 5% of both CPU and RAM). The sound card is a SB Live, and I grabbed the source right out of the CVS tree (as the tarball seems to give SB Live users problems).
Can you mail me the source file you used for the LM generation? lenzo@cs.cmu.edu ... I'll try to diagnose it. -- kevin