Sir, I have created a model for isolated words in the Punjabi language with 1 hour of recordings. Since I am working on isolated words, my speech utterances are very short, and it took a large number of files to add up to 1 hour. After training, when I checked the punjabi.align file, it hardly recognizes any words. Is the recording time the only reason for such a low recognition rate?
But the tutorial says the minimum is 1 hour, so it should recognize at least some words. Or is there some other possible reason for this recognition rate?
You need to follow the audio duration suggestions.
There could be numerous reasons, but you need to fix the most obvious one first.
As far as I remember, you have 0.6 hrs of speech.
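One way to check how much audio you actually have is to sum the durations of the training files yourself. The following is only a rough sketch, assuming the recordings are uncompressed WAV files under a single directory; the function name `total_hours` and the `wav` directory name are made up for illustration and are not part of any Sphinx tool.

```python
import wave
from pathlib import Path


def total_hours(wav_dir):
    """Return the summed duration, in hours, of all .wav files under wav_dir."""
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration of one file = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0


if __name__ == "__main__":
    # "wav" is a hypothetical directory name; point it at your own recordings.
    print(f"Total audio: {total_hours('wav'):.2f} hours")
```

If the number printed is well under 1 hour, that is the first thing to fix before looking for other causes. Note that this counts leading and trailing silence in each file too, so the amount of actual speech will be somewhat lower.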