When I built a speaker-dependent (one female voice) syllable-based model with 700 audio files in training, I obtained an error rate of 7.8%.
When I built a speaker-independent syllable-based model with 2 female voices, i.e. 1400 recordings (the same 700 transcripts recorded by two different female voices to increase the data), the error rate increased to 8.6%.
Why does the error rate increase instead of decrease when there is more data?
The GMM acoustic model does not handle two distinct classes (here, two speakers) well unless you use enough Gaussian mixtures. If you increase the number of mixtures in your model, the error rate should go down. In general you need either many speakers or a single speaker, not just two.
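The intuition behind the answer above can be illustrated with a toy sketch (this is illustrative numpy code, not CMUSphinx internals; the feature values and speaker means are made up): features pooled from two speakers form a bimodal distribution, and a single Gaussian, like a GMM with too few mixtures, fits it noticeably worse than a 2-component mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D cepstral-like features for two female speakers.
spk_a = rng.normal(-2.0, 1.0, 1000)
spk_b = rng.normal(+2.0, 1.0, 1000)
pooled = np.concatenate([spk_a, spk_b])  # bimodal overall distribution

def gauss_logpdf(x, mu, var):
    """Log-density of a 1-D Gaussian N(mu, var) at points x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# (a) A single Gaussian fit to the pooled data (too few mixtures).
ll_single = gauss_logpdf(pooled, pooled.mean(), pooled.var()).mean()

# (b) A 2-component mixture, one component per speaker, equal weights:
#     log(0.5 * p_a(x) + 0.5 * p_b(x))
ll_mixture = (np.logaddexp(
    gauss_logpdf(pooled, spk_a.mean(), spk_a.var()),
    gauss_logpdf(pooled, spk_b.mean(), spk_b.var()),
) + np.log(0.5)).mean()

print(f"single Gaussian avg log-likelihood: {ll_single:.3f}")
print(f"2-component mixture avg log-likelihood: {ll_mixture:.3f}")
```

The mixture achieves a higher average log-likelihood on the pooled data, which is why a two-speaker model needs more mixtures per state than a single-speaker model to reach the same modeling accuracy.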
When you say many speakers, what is the minimum number of speakers required for a speaker-independent model?
This question is answered in the acoustic model training tutorial.