Hi!
I am completely new to this program.
I have almost an hour of recorded English speech, spoken by a person with a very thick foreign accent. I picked out about 10 phrases ("utterances"?), each 5-30 seconds long, and adapted the default English model following this tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/ and I also looked at this YouTube video: https://www.youtube.com/watch?v=IAHH6-t9jK0
Although it performed a little better than the default model, my adapted model was still largely unusable. To test it, I simply tried converting some of the audio to text with the adapted model.
So I am looking for advice: given that I want a model for one single person and that person's accent in spoken English, should I add more utterances and try to improve my results, or should I go the other route and train an acoustic model as described here: https://cmusphinx.github.io/wiki/tutorialam/?
To train a model you need 20+ hours of speech data. If you have fewer than 20 hours, you have to adapt the existing model.
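If it helps to check where a given set of recordings falls relative to that 20-hour figure, here is a minimal sketch that adds up the length of the WAV files in a folder and applies the rule of thumb above. It is not part of CMUSphinx; the function names and the `recordings` folder are made up for illustration, and it assumes the audio is stored as plain PCM `.wav` files that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path

# Threshold taken from the reply above: 20+ hours of speech to train from scratch.
TRAIN_THRESHOLD_HOURS = 20.0


def wav_duration_seconds(path):
    """Return the length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def total_hours(directory):
    """Sum the durations of all *.wav files directly inside `directory`, in hours."""
    seconds = sum(wav_duration_seconds(p) for p in sorted(Path(directory).glob("*.wav")))
    return seconds / 3600.0


def recommend(hours, threshold=TRAIN_THRESHOLD_HOURS):
    """Apply the rule of thumb: train only with at least `threshold` hours, otherwise adapt."""
    return "train" if hours >= threshold else "adapt"
```

For example, `recommend(total_hours("recordings"))` would report which path the rule suggests for a (hypothetical) `recordings` folder; with roughly one hour of speech, as in the question, `recommend(1.0)` gives `"adapt"`.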