Hello again. I used to have a very good acoustic model with 43 words and 124 wav files to train on, with just a 12% error rate. I decided to extend the model with 22 more words and 216 more wav files, so now I have 65 words and 340 wav files to train on. But the problem is that the error rate is now very high, about 38%. Why does this happen?
These are the last lines:
```
MODULE: DECODE Decoding using models previously trained
Decoding 340 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
SENTENCE ERROR: 37.4% (127/340) WORD ERROR RATE: 22.1% (205/930)
```
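For reference, those two numbers come from aligning each decoded hypothesis against its reference transcript: 127 of the 340 utterances contained at least one error, and 205 of the 930 reference words were substituted, deleted, or inserted. A minimal sketch of the standard word-error-rate computation (plain Levenshtein alignment over words; illustrative only, not SphinxTrain's actual alignment code):

```python
def word_errors(reference, hypothesis):
    """Levenshtein distance between word sequences (substitutions + deletions + insertions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # match / substitution
                           dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1)        # insertion
    return dp[len(ref)][len(hyp)], len(ref)

# Hypothetical reference/hypothesis pair:
errs, nref = word_errors("one two three", "one three three")
print(f"WER: {100.0 * errs / nref:.1f}% ({errs}/{nref})")  # WER: 33.3% (1/3)
```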
You are seeing instability in the results caused by the small amount of training data.
But isn't accuracy supposed to increase as the amount of data increases?
No, not until you have at least 50 hours of data.
But I only need 65 words. Should I record these words for 50 hours?
Not necessarily; you can use arbitrary annotated recordings.
Those don't exist in my language.
Care to say what your super secret language is?
Albanian
I think this is a little bit weird, because my model used to work very well with just 124 wav files, so I thought adding 216 more would make it work at least a little better, but it doesn't; the results are the same.
Is this normal?
Yes; without a dataset of the size specified in the tutorial it will not work. You could probably record more data and use fewer parameters in the model.
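Concretely, "fewer parameters" means lowering the senone and Gaussian-density counts in etc/sphinx_train.cfg. A sketch, assuming a standard SphinxTrain setup; the variable names are SphinxTrain's, but the values below are illustrative guesses for a few hundred utterances, not tutorial recommendations:

```perl
# etc/sphinx_train.cfg -- illustrative values for a very small dataset
$CFG_HMM_TYPE = '.cont.';        # continuous models
$CFG_N_TIED_STATES = 100;        # fewer senones than the tutorial default
$CFG_FINAL_NUM_DENSITIES = 2;    # fewer Gaussians per state
```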
There is a large body of research on low-resource languages, though I am not sure you'll be able to apply it. For example, you could also add English data to the training as a helper dataset and use a common phoneset.
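To illustrate the common-phoneset idea, here is a hypothetical dictionary fragment in which English helper words (pronunciations taken from CMUdict) and Albanian words share one phone inventory. The Albanian pronunciations are rough guesses for illustration only, and the comments are for readability; a real .dic file would not contain them:

```
HELLO    HH AH L OW     # English helper entry (CMUdict)
WATER    W AO T ER      # English helper entry (CMUdict)
PO       P OW           # Albanian "po" mapped onto the shared phones (guess)
MIRE     M IY R AH      # Albanian "mirë" mapped onto the shared phones (guess)
```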
Okay, thank you so much