I'm trying to set up an online keyword spotting system using PocketSphinx 0.4.1 (recognize keywords but ignore all other spoken speech). I'd like to start with these two approaches:
1) Have a garbage model that is trained on all phones (the generic phone) and include a word that consists of this phone in the noise dictionary.
2) Include a set of one-phone words (one for each phone in the phoneset) in the noise dictionary.
I will eventually do my own training of phone models and a garbage model, but for now I want to use existing acoustic models to try the two approaches above.
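To make the two options concrete, here is a rough sketch of the noise dictionaries I have in mind, written as a small Python generator. It assumes the usual noisedict layout of one `WORD PHONE` entry per line with `SIL` as the silence phone. The function name `build_noisedict`, the `+PHONE+` naming of the one-phone words, and the sample phone list are my own placeholders, and I have not checked that PocketSphinx 0.4.1 accepts ordinary phones in the noise dictionary.

```python
def build_noisedict(phones, silence="SIL", garbage_phone=None):
    """Return the text of a noise dictionary, one 'WORD PHONE' entry per line.

    Hypothetical helper for illustration only; the entry format is an
    assumption based on typical Sphinx noisedict files.
    """
    # Standard silence entries found in typical noisedict files.
    entries = [("<s>", silence), ("</s>", silence), ("<sil>", silence)]
    if garbage_phone is not None:
        # Approach 1: a single filler word made of the generic garbage phone.
        entries.append((garbage_phone, garbage_phone))
    else:
        # Approach 2: one filler word per phone, skipping the silence phone.
        entries.extend((f"+{p}+", p) for p in phones if p != silence)
    return "".join(f"{word} {phone}\n" for word, phone in entries)


if __name__ == "__main__":
    # Illustrative subset of a phoneset, not a real model's phone list.
    sample_phones = ["AA", "AE", "B", "SIL"]
    print(build_noisedict(sample_phones))                      # approach 2
    print(build_noisedict([], garbage_phone="+GARBAGE+"))      # approach 1
```

Approach 1 only works if the acoustic model actually contains a trained `+GARBAGE+` phone, which is why I am asking about existing models below.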
My questions are:
1) Which of the two approaches above (or another method) do you think will work better?
2) Does PocketSphinx use the same model format as Sphinx2? Sphinx3? Is there a converter between model formats?
3) Is there an existing acoustic model I can use that has a trained +GARBAGE+ model? I recall seeing a noisedict file with +GARBAGE+ in one of the models found here: http://www.speech.cs.cmu.edu/sphinx/models/ (I can't remember which), but that model was in a format that PocketSphinx can't read.
Thanks for any help you can give on this.
Alex Stupakov