Hello. Is the acoustic model inseparably tied to the language model in Sphinx4? That is, could I use a large acoustic model such as HUB4 together with a very small language model (10 to 20 words)? The reduced language model would be built with the CMU language modeling toolkit.
How would this affect Sphinx's behavior, given that we want to use Sphinx as a word-spotting tool (that is, to find important and relevant words inside large raw files)?
Thanks in advance.
The acoustic model and the language model are quite decoupled. You could indeed use a large acoustic model such as HUB4 with a small language model. The link between the two is the dictionary, which maps words to acoustic units (typically, but not necessarily, phonemes). The dictionary should (obviously) contain all of the words in your active vocabulary, and the phones it uses should match the phone set used by the acoustic model.
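The two dictionary requirements above can be checked before running the decoder. The following is only a rough sketch, not part of Sphinx4: it assumes a CMU-style plain-text dictionary (one `WORD PH1 PH2 ...` entry per line, alternate pronunciations marked as `WORD(2)`, comment lines starting with `;;;`), and the function names, sample entries, and phone set are all hypothetical.

```python
def parse_dictionary(lines):
    """Parse CMU-style dictionary lines into {word: [phone lists]}."""
    pronunciations = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with ';;;').
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        # Alternate pronunciations are assumed to look like WORD(2), WORD(3), ...
        base = word.split("(")[0].lower()
        pronunciations.setdefault(base, []).append(phones)
    return pronunciations


def check_vocabulary(vocabulary, pronunciations, model_phones):
    """Return (words missing from the dictionary, {word: phones unknown to the model})."""
    missing_words = [w for w in vocabulary if w.lower() not in pronunciations]
    unknown_phones = {}
    for w in vocabulary:
        for phones in pronunciations.get(w.lower(), []):
            bad = [p for p in phones if p not in model_phones]
            if bad:
                unknown_phones.setdefault(w, set()).update(bad)
    return missing_words, unknown_phones


# Hypothetical toy data, for illustration only.
SAMPLE_DICT = [
    ";;; toy dictionary",
    "HELLO HH AH L OW",
    "HELLO(2) HH EH L OW",
    "WORLD W ER L D",
    "SPHINX S F IH NG K S",
]
# Made-up phone set; NG is left out on purpose to trigger a mismatch.
MODEL_PHONES = {"HH", "AH", "EH", "L", "OW", "W", "ER", "D", "S", "F", "IH", "K"}
VOCAB = ["hello", "world", "sphinx", "decoder"]

pron = parse_dictionary(SAMPLE_DICT)
missing, bad = check_vocabulary(VOCAB, pron, MODEL_PHONES)
print("Words missing from the dictionary:", missing)
print("Phones not in the acoustic model's phone set:", bad)
```

With a real setup, the vocabulary would come from the language model's word list and the phone set from the acoustic model's own definition files; how those are read depends on the model format and is not shown here.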
One difficulty you may run into when using S4 for word spotting is that the decoder will try very hard to match the vocabulary specified by the language model, even if the words spoken are not in that vocabulary at all. We are beginning to look at some techniques to address this type of problem. Stay tuned ...
Paul