Should I train an acoustic model, or should I adapt the default acoustic model?

Speech Recognition Toolkit

Forum: Help

Creator: Fredrik Persson

Created: 2017-10-28

Updated: 2017-10-28

Fredrik Persson - 2017-10-28

Hi!

I am completely new to this program.

I have almost an hour of recorded English speech, spoken by a person with a very thick foreign accent. I picked out about 10 phrases ("utterances"?), each 5-30 seconds long, and adapted the default English model using this tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/ and I also looked at this YouTube video: https://www.youtube.com/watch?v=IAHH6-t9jK0

While it performed a little better than the default model, my adapted model was still largely unusable. So far I have only tested it by converting some of the audio to text.

Therefore I seek advice: considering that I want to create a model for one single person and that person's accent in spoken English, should I add more utterances and try to improve my results, or should I go the other route and train an acoustic model as described here: https://cmusphinx.github.io/wiki/tutorialam/?

- Nickolay V. Shmyrev - 2017-10-28
  
  To train a model you need 20+ hours of speech data. If you have less than 20 hours, you have to adapt the existing model.
  
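For anyone wanting to check which side of that threshold their data falls on, here is a rough sketch that measures WAV durations with Python's standard `wave` module and applies the 20-hour rule of thumb from the reply. The names `wav_duration_seconds`, `recommend`, and `TRAIN_THRESHOLD_HOURS` are made up for this illustration and are not part of CMUSphinx; the example numbers are taken from the question (about 10 utterances of at most 30 seconds each, and almost an hour of audio overall).

```python
import wave

# Rule of thumb from the reply: training from scratch needs 20+ hours.
TRAIN_THRESHOLD_HOURS = 20.0


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def recommend(total_seconds, threshold_hours=TRAIN_THRESHOLD_HOURS):
    """Return 'train' if there is enough audio to train a model, else 'adapt'."""
    hours = total_seconds / 3600.0
    return "train" if hours >= threshold_hours else "adapt"


# Upper bound for the adaptation set in the question: 10 utterances x 30 s.
print(recommend(10 * 30))   # adapt

# Even the full recording (almost one hour) is far below the threshold.
print(recommend(60 * 60))   # adapt
```

With real files, the total would be something like `sum(wav_duration_seconds(p) for p in paths)`, where `paths` is your own list of recordings.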
