
Should I train an acoustic model, or should I adapt the default acoustic model?

2017-10-28

    Fredrik Persson - 2017-10-28

    Hi!

    I am completely new to this program.

    I have almost an hour of recorded English speech from a single speaker with a very thick foreign accent. I picked out about 10 phrases ("utterances"?), each 5 to 30 seconds long, and adapted the default English model following this tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/ I also watched this YouTube video: https://www.youtube.com/watch?v=IAHH6-t9jK0

    Although it performs a little better than the default model, my adapted model is largely unusable. I tested it by converting some of the audio to text.

    So I am looking for advice: since I want a model for one single person and that person's accent in spoken English, should I add more utterances and try to improve my results, or should I take the other path and train an acoustic model as described here: https://cmusphinx.github.io/wiki/tutorialam/ ?
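If the adaptation route is kept and more utterances are added, each new recording needs a line in the fileids list and a matching line in the transcription file. The sketch below writes both files, assuming the layout used by the arctic20 example in the adaptation tutorial (fileids: one recording name per line without extension; transcription: `<s> text </s> (name)`). The utterance IDs, texts and the `write_adaptation_lists` helper are hypothetical placeholders, not part of CMUSphinx.

```python
from pathlib import Path

# Hypothetical utterance IDs and transcripts; replace with your own recordings.
# Each ID is assumed to match a WAV file name (without extension).
utterances = {
    "speaker_0001": "hello this is the first recorded phrase",
    "speaker_0002": "and this is the second one",
}

def write_adaptation_lists(utts, name="adapt"):
    """Write <name>.fileids and <name>.transcription, assuming the
    layout of the tutorial's arctic20 example files."""
    ids = sorted(utts)
    # One recording name per line.
    fileids = "".join(f"{uid}\n" for uid in ids)
    # Lower-cased text wrapped in sentence markers, followed by the ID.
    transcription = "".join(
        f"<s> {utts[uid].lower()} </s> ({uid})\n" for uid in ids
    )
    Path(f"{name}.fileids").write_text(fileids)
    Path(f"{name}.transcription").write_text(transcription)
    return fileids, transcription

fileids, transcription = write_adaptation_lists(utterances)
print(transcription)
```

Keeping the two files generated from one dictionary makes it harder for the lists to drift out of sync as utterances are added.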

    • Nickolay V. Shmyrev

      To train a model, you need 20+ hours of speech data. If you have less than 20 hours, you have to adapt the existing model.
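The rule of thumb in the reply above can be checked mechanically by summing the length of the available recordings. This is a minimal sketch using only Python's standard `wave` module; it assumes the audio is stored as WAV files under one folder, and the folder name and function names are hypothetical.

```python
import wave
from pathlib import Path

# Rule of thumb from the reply: about 20 hours of speech to train a new model.
TRAIN_THRESHOLD_HOURS = 20.0

def total_hours(wav_dir):
    """Sum the duration of every .wav file under wav_dir (recursively), in hours."""
    seconds = 0.0
    for path in Path(wav_dir).glob("**/*.wav"):
        with wave.open(str(path), "rb") as w:
            # Duration = number of frames / frames per second.
            seconds += w.getnframes() / w.getframerate()
    return seconds / 3600.0

def recommend(hours):
    """Return 'train' for 20+ hours of data, otherwise 'adapt'."""
    return "train" if hours >= TRAIN_THRESHOLD_HOURS else "adapt"

# Usage (hypothetical folder name):
# print(recommend(total_hours("recordings")))
```

For the roughly one hour of audio described in the question, `recommend(1.0)` returns `"adapt"`.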

