Hello all,
I want to develop an acoustic model for my command and control application.
Pocketsphinx will be used in the project with a normal dictionary and a trigram language model.
I have 190 unique words used in different commands.
I'm trying to collect speech data for my acoustic model. Should I put only those unique words in my training sentences and record those, or should I also include other words in my language that share phones with the command words but are not used in any command?
Also, should the command words appear in the same order in the training sentences as they do in the commands, or could I pick random words from the unique word list with a script and create training sentences of 5 or 6 words each?
Best regards,
Berker
Should I put only those unique words in my training sentences and record those, or should I also include other words in my language that share phones with the command words but are not used in any command?
If you are going to expand your system, it's worth including other words to make the database more generic: not only the words in your current command set, but also some common words in your language. You might want to expand the command set in the future, and you should be able to do that without recording additional data. Including additional words will not hurt.
For example, you can record a database in two parts: one with the commands you will definitely use, and one with generic phrases.
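As a rough illustration of choosing words for the generic part, here is a minimal Python sketch that picks extra words whose phones are under-represented in the command set. The toy lexicon, the word lists, and the `min_count` threshold are all made up for the example; in practice the lexicon would come from your own pronunciation dictionary.

```python
from collections import Counter

def phone_counts(words, lexicon):
    """Count how often each phone occurs across the given words."""
    counts = Counter()
    for word in words:
        counts.update(lexicon[word])
    return counts

def pick_generic_words(command_words, candidates, lexicon, min_count=2):
    """Greedily pick candidate words that contain under-represented phones.

    Returns the picked words and the phones that are still seen fewer
    than min_count times after adding them.
    """
    counts = phone_counts(command_words, lexicon)
    all_phones = {p for phones in lexicon.values() for p in phones}
    picked = []
    for word in candidates:
        # A word is useful if at least one of its phones is still rare.
        if any(counts[p] < min_count for p in lexicon[word]):
            picked.append(word)
            counts.update(lexicon[word])
    still_rare = sorted(p for p in all_phones if counts[p] < min_count)
    return picked, still_rare

# Hypothetical toy lexicon with CMUdict-style phones, for illustration only.
LEXICON = {
    "open": ["OW", "P", "AH", "N"],
    "door": ["D", "AO", "R"],
    "stop": ["S", "T", "AA", "P"],
    "pop": ["P", "AA", "P"],
    "zebra": ["Z", "IY", "B", "R", "AH"],
    "ship": ["SH", "IH", "P"],
}

picked, still_rare = pick_generic_words(
    ["open", "door", "stop"], ["pop", "zebra", "ship"], LEXICON, min_count=1
)
print(picked, still_rare)
```

With this toy data, "pop" is skipped because all of its phones already occur in the commands, while "zebra" and "ship" are kept because they bring in new phones.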
Also, should the command words appear in the same order in the training sentences as they do in the commands, or could I pick random words from the unique word list with a script and create training sentences of 5 or 6 words each?
Command words should be in the same order. It's better to record each command as a separate utterance so that the database matches the audio that will be recognized.
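To make this concrete, here is a small Python sketch that builds the file-ID list and the transcription lines for such a two-part database, with one utterance per command and the words left in their original order. It assumes the usual SphinxTrain transcription layout (`<s> text </s> (utterance_id)`); the speaker label, the ID scheme, and the example phrases are invented for illustration.

```python
def build_transcripts(commands, generic_phrases, speaker="spk1"):
    """Return (fileids, transcription_lines) for a two-part database.

    Each command or phrase becomes one utterance; word order is untouched.
    """
    entries = []
    for part, phrases in (("cmd", commands), ("gen", generic_phrases)):
        for i, phrase in enumerate(phrases, start=1):
            utt_id = f"{speaker}_{part}_{i:04d}"  # hypothetical naming scheme
            text = " ".join(phrase.split())       # normalize whitespace only
            entries.append((utt_id, text))
    fileids = [utt_id for utt_id, _ in entries]
    transcription = [f"<s> {text} </s> ({utt_id})" for utt_id, text in entries]
    return fileids, transcription

# Hypothetical example phrases.
fileids, transcription = build_transcripts(
    commands=["open the door", "stop"],
    generic_phrases=["good morning everyone"],
)
for line in transcription:
    print(line)
```

The two lists can then be written one entry per line to the `.fileids` and `.transcription` files that the training setup expects, and the prompts read to the speaker in the same order.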
Hello Nickolay, thanks for your reply.
What do you think about the language model?
Should I use a JSGF grammar instead of a trigram/bigram language model?
In my case there are nearly 150 commands, each between 1 and 5 words long.
I'm not sure what your commands will be, so I can't advise you on that. A JSGF grammar imposes more restrictions on word combinations; a trigram LM is more flexible. It doesn't really matter which LM you use; most of the accuracy comes from a proper acoustic model.
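If you do want to try the grammar route, a fixed command list can be turned into a flat JSGF grammar with a few lines of Python. This is only a sketch: the grammar name, the rule name, and the sample commands are placeholders, and a real grammar would probably factor shared words into sub-rules rather than list every command as a separate alternative.

```python
def commands_to_jsgf(commands, grammar_name="commands", rule_name="command"):
    """Build a flat JSGF grammar with one alternative per command."""
    alternatives = " |\n    ".join(" ".join(c.split()) for c in commands)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> =\n    {alternatives} ;\n"
    )

# Hypothetical commands, for illustration only.
print(commands_to_jsgf(["open the door", "close the door", "stop"]))
```

A grammar like this accepts only the listed word sequences, which is the restriction mentioned above; a trigram LM trained on the same command list would also allow unseen combinations of the same words.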