Anonymous - 2003-06-04
I noticed that Sphinx2 only has the "turtle" language model and Sphinx3 has "an4". An4 seems to have an entire dictionary of words, whereas turtle just has a few basic commands and numbers.
Is there a complete dictionary for Sphinx2? Or can I copy it from Sphinx3?
From what I understand, if the words are not in the dictionary (in binary format of course), Sphinx2 will not recognize them. If this is correct, why isn't there a "webster" dictionary of words compiled for Sphinx2?
Any help is much appreciated. TIA.
-Trode
The CMU Sphinx website has (almost) what you seek. See http://www.speech.cs.cmu.edu/cgi-bin/cmudict .
Follow the link to "download the CMU Dictionary", but choose cmudict.0.6d rather than cmudict.0.6 (which appears to be older). Note the admonition that this file is not quite in the form needed for Sphinx2; you'll need to use the scripts/stress2sphinx Perl script in the sphinx2-0.4 distribution to convert it.
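To give a rough idea of what such a conversion involves, here is a minimal Python sketch. It is not the stress2sphinx script, and it rests on an assumption: that the main difference is the stress digit appended to vowel phones (for example "AH0"), which is dropped here. The real script may do more, so treat this only as an illustration. The function name strip_stress is made up for the example.

```python
def strip_stress(line):
    """Return one dictionary line with stress digits removed.

    Assumption: stress is a trailing digit on a phone (e.g. "AH0").
    Blank lines and comment lines (starting with '#' or ';') yield None.
    """
    line = line.strip()
    if not line or line[0] in "#;":
        return None
    word, *phones = line.split()
    # Phones are alphabetic, so trailing digits can only be stress marks.
    phones = [p.rstrip("0123456789") for p in phones]
    return word + "\t" + " ".join(phones)


if __name__ == "__main__":
    for raw in ["## a comment line", "HELLO  HH AH0 L OW1"]:
        converted = strip_stress(raw)
        if converted is not None:
            print(converted)
```

The word itself is left untouched, so alternate-pronunciation entries such as WORD(2) keep their marker.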
It is true that if Sphinx2 loads a language model containing words that are missing from the dictionary already loaded, those words will simply be ignored and cannot be recognized. You probably don't want to load the entire 129K-word cmudict.0.6d dictionary when you run Sphinx; rather, use a subset of just the words needed for your language model -- the LM-tool does that for you.
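If you ever want to build that subset by hand, the idea can be sketched in a few lines of Python. This is not how the LM-tool is implemented, just an illustration under some assumptions: one entry per line with the word first, alternate pronunciations written as WORD(2), and matching done in upper case. The function names are hypothetical.

```python
import re

# Assumption: alternate pronunciations carry a "(n)" suffix, e.g. READ(2).
_ALT_SUFFIX = re.compile(r"\(\d+\)$")


def base_word(entry_line):
    """Return the upper-cased headword of a dictionary line, or None if blank."""
    fields = entry_line.split()
    if not fields:
        return None
    return _ALT_SUFFIX.sub("", fields[0]).upper()


def subset_dictionary(dict_lines, vocabulary):
    """Keep only the entries whose headword appears in the vocabulary."""
    wanted = {w.upper() for w in vocabulary}
    return [line.rstrip("\n") for line in dict_lines if base_word(line) in wanted]


def missing_words(dict_lines, vocabulary):
    """List vocabulary words that have no dictionary entry at all."""
    present = {base_word(line) for line in dict_lines}
    return sorted({w.upper() for w in vocabulary} - present)


if __name__ == "__main__":
    entries = ["GO  G OW", "READ  R EH D", "READ(2)  R IY D", "STOP  S T AA P"]
    vocab = ["go", "read", "forward"]
    print(subset_dictionary(entries, vocab))
    print(missing_words(entries, vocab))
```

The second function is the useful check here: any word it reports is in your language model's vocabulary but has no pronunciation, so it would be one of the words that cannot be recognized.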
I understand it better now. Thank you.
Just out of curiosity: can the entire dictionary be used in the LM? How much memory is that kind of load known to need?
Thanks for the quick response. It is much appreciated.
-Trode