Alex S - 2007-09-28

Hi,

I'm trying to set up an online keyword-spotting system using PocketSphinx 0.4.1, i.e. one that recognizes a set of keywords and ignores all other speech. I'd like to start with these two approaches:
1) Have a garbage model that is trained on all phones (a generic phone) and include a word that consists of this phone in the noise dictionary.
2) Include a set of one-phone words (one for each phone in the phoneset) in the noise dictionary.
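
To make the two approaches concrete, here is a rough sketch of the noise dictionary entries I have in mind, written as a small Python script. The phone list, the filler word names (++GARBAGE++, ++AA++, etc.) and the +GARBAGE+ phone name are placeholders of mine, not taken from any existing model, and I'm assuming the usual "word phone" line format with the standard silence entries:

```python
# Sketch only: builds noise-dictionary lines for the two approaches.
# The phone list and the filler word naming scheme are illustrative assumptions.

PHONES = ["AA", "AE", "B", "D", "S", "T"]  # hypothetical subset of a phoneset


def garbage_entry(word="++GARBAGE++", phone="+GARBAGE+"):
    """Approach 1: a single filler word mapped to one generic garbage phone."""
    return [f"{word} {phone}"]


def per_phone_entries(phones):
    """Approach 2: one filler word per phone, each pronounced as that phone."""
    return [f"++{p}++ {p}" for p in phones]


def noise_dict_lines(extra):
    """Assumed standard silence entries, followed by the extra filler entries."""
    base = ["<s> SIL", "<sil> SIL", "</s> SIL"]
    return base + list(extra)


if __name__ == "__main__":
    # Approach 1: one garbage word
    print("\n".join(noise_dict_lines(garbage_entry())))
    print()
    # Approach 2: one filler word per phone
    print("\n".join(noise_dict_lines(per_phone_entries(PHONES))))
```

Approach 1 needs an acoustic model that actually contains the garbage phone, while approach 2 only relies on phones the existing model already has.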

I will eventually do my own training of phone models and a garbage model, but for now I want to use existing acoustic models to try the two approaches above.

My questions are:
1) Which of the two approaches above (or another method) do you think will work better?
2) Does PocketSphinx use the same model format as Sphinx2? Sphinx3? Is there a converter between model formats?
3) Is there an existing acoustic model I can use that has a trained +GARBAGE+ model? I recall seeing a noisedict file with +GARBAGE+ in one of the models found here: http://www.speech.cs.cmu.edu/sphinx/models/ (I can't remember which), but that model was in a format that PocketSphinx can't read.

Thanks for any help you can give on this.

Alex Stupakov