Hi,
I'm using Pocketsphinx for keyword spotting. I followed the procedure
described in "Adapting a default acoustic model"
(https://cmusphinx.github.io/wiki/tutorialadapt/) to improve accuracy by
building a model from as many audio samples as possible (including both
male and female voices). Male voices are recognized properly, but none of
the female voices are recognized. Should I build the model with more
female voice samples? Is there any other way to recognize female
voices?
Thanks,
Madhukar
Adaptation does not help with keyword spotting. You probably have to train a model from scratch, and it is better to focus on a more advanced algorithm (depending on your goals).
Thanks for the reply.
Will the tutorial at the URL below help me?
https://cmusphinx.github.io/wiki/tutorialam/
What is the minimum amount of data (hours of recording) required to train the model?
I'm trying to use Pocketsphinx only to detect one keyword, something like "hello champ", for speakers of all genders. Do I still need to train a model from scratch and meet the following data requirements (from the documentation)?
AND you have plenty of data to train on:
1 hour of recording for command and control for a single speaker
5 hours of recordings of 200 speakers for command and control for many speakers
10 hours of recordings for single speaker dictation
50 hours of recordings of 200 speakers for many speakers dictation
Yes.
You can also try https://github.com/MycroftAI/mycroft-precise, but you still have to train it.
Thanks for the reply.
Currently I do not have enough data. To start with, is it enough to have 1 hour of recordings from two different speakers (both command and dictation)?
You can start, of course. A practical implementation requires much more data.
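The rough data guidelines quoted in the thread can be turned into a quick check of how far a given recording set is from them. This is a minimal sketch, not part of Pocketsphinx, SphinxTrain or Mycroft Precise: `GUIDELINES`, `shortfall` and `meets_guideline` are hypothetical names, and the thresholds are the documentation's rule-of-thumb figures, not limits enforced by any tool. Treating a keyword that must work for both male and female voices as the "many speakers" command-and-control case is an assumption.

```python
from typing import Tuple

# Rough minimums quoted in the thread from the CMUSphinx training tutorial.
# Key: (task, many_speakers) -> (minimum hours, minimum number of speakers).
# Guidelines only; no training tool enforces them.
GUIDELINES = {
    ("command", False): (1.0, 1),      # command and control, single speaker
    ("command", True): (5.0, 200),     # command and control, many speakers
    ("dictation", False): (10.0, 1),   # single-speaker dictation
    ("dictation", True): (50.0, 200),  # many-speaker dictation
}


def shortfall(task: str, many_speakers: bool,
              hours: float, speakers: int) -> Tuple[float, int]:
    """Hypothetical helper: extra hours and speakers still needed.

    Set many_speakers=True for a speaker-independent model, e.g. a
    keyword that must be detected for both male and female voices.
    """
    if speakers < 1:
        raise ValueError("need at least one speaker")
    min_hours, min_speakers = GUIDELINES[(task, many_speakers)]
    return max(0.0, min_hours - hours), max(0, min_speakers - speakers)


def meets_guideline(task: str, many_speakers: bool,
                    hours: float, speakers: int) -> bool:
    """True if both the hours and speaker-count guidelines are reached."""
    missing_hours, missing_speakers = shortfall(task, many_speakers,
                                                hours, speakers)
    return missing_hours == 0 and missing_speakers == 0


if __name__ == "__main__":
    # Scenario from the thread: one keyphrase such as "hello champ"
    # (command and control) for all genders, with 1 hour from 2 speakers.
    print(shortfall("command", True, 1.0, 2))        # (4.0, 198)
    print(meets_guideline("command", True, 1.0, 2))  # False
```

Under these assumptions, 1 hour from two speakers is enough to start experimenting, but it falls well short of the multi-speaker guideline, mainly in the number of distinct speakers.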