Problem recognizing female voice

Forum: Help

Creator: Madhukar

Created: 2019-04-10

Updated: 2019-04-14

Madhukar - 2019-04-10

Hi,

I'm using Pocketsphinx for keyword spotting. I followed the procedure
described in "Adapting a default acoustic model"
(https://cmusphinx.github.io/wiki/tutorialadapt/) to improve accuracy by
building a model with as many audio samples as possible (including both
male and female voices). Male voices are recognized properly, but none of
the female voices are recognized. Should I build the model with more
female voice samples? Is there any other way to recognize female
voices?

Thanks,
Madhukar

- Nickolay V. Shmyrev - 2019-04-11
  
  Adaptation does not help with keyword spotting. You probably have to train a model from scratch, and it would be better to focus on a more advanced algorithm (depending on your goals).
  

Madhukar - 2019-04-12

Thanks for the reply.

Will the tutorial at the URL below help me?
https://cmusphinx.github.io/wiki/tutorialam/

What is the minimum amount of data (number of hours of recording) required to train the model?
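
Before comparing against any minimum, it helps to know how many hours of audio you already have. The following is a minimal sketch, assuming the recordings are uncompressed PCM WAV files stored somewhere below a single directory. The function names are illustrative and not part of Pocketsphinx or SphinxTrain.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def total_hours(directory):
    """Sum the durations of all .wav files below `directory`, in hours."""
    seconds = sum(
        wav_duration_seconds(p) for p in Path(directory).rglob("*.wav")
    )
    return seconds / 3600.0
```

Calling `total_hours` once per speaker group (for example, on a hypothetical `recordings/female` and `recordings/male` directory) also shows how balanced the data is.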


Madhukar - 2019-04-12

I'm trying to use Pocketsphinx only to detect one keyword, something like "hello champ", for speakers of all genders. Do I still need to train a model from scratch and meet the following data requirements (from the documentation)?

AND you have plenty of data to train on:
1 hour of recording for command and control for a single speaker
5 hours of recordings of 200 speakers for command and control for many speakers
10 hours of recordings for single speaker dictation
50 hours of recordings of 200 speakers for many speakers dictation
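
The figures quoted above can be written down as a small lookup so a data set can be checked against them. This is only a sketch: the numbers come from the list above, while the table layout and the `meets_minimum` helper are illustrative assumptions, not part of any CMUSphinx tool.

```python
# Minimums from the list quoted above.
# Key: (task, many_speakers) -> value: (hours, speakers)
MINIMUMS = {
    ("command", False): (1, 1),
    ("command", True): (5, 200),
    ("dictation", False): (10, 1),
    ("dictation", True): (50, 200),
}


def meets_minimum(task, many_speakers, hours, speakers):
    """Return True if the data set reaches the quoted minimum for this case."""
    need_hours, need_speakers = MINIMUMS[(task, many_speakers)]
    return hours >= need_hours and speakers >= need_speakers
```

For example, `meets_minimum("command", True, 1, 2)` is false: one hour from two speakers falls well short of the many-speaker command-and-control figure, although it does reach the hour count listed for a single speaker.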

- Nickolay V. Shmyrev - 2019-04-13
  
  > Do I still need to train a model from scratch and meet the following data requirements (from the documentation)?
  
  Yes.
  
  > I'm trying to use Pocketsphinx only to detect one keyword, something like "hello champ".
  
  You can also try https://github.com/MycroftAI/mycroft-precise, but you still have to train it.
  

Madhukar - 2019-04-14

Thanks for the reply.

Currently I do not have enough data. To start with, is it enough to have 1 hour of recordings from two different speakers (both command and dictation)?

- Nickolay V. Shmyrev - 2019-04-14
  
  You can start, of course. A practical implementation requires much more data.
  