Hi,
I've been adapting the HUB4 acoustic model using training data I created myself. Decoding a test set of 10 audio files, accuracy went from 50% to 60% as I went from 1 to 3 hours of training data. I then added more hours from another set of audio files, but once the average accuracy reached 71% it stopped improving much (about 0.5% for each additional hour of audio). Accuracy goes up on some test files and down on others, so the average stays almost the same (71%). Is there some kind of limit to the accuracy you can get by adapting an acoustic model? Or is there something wrong with my training data or my adaptation process? I am using SphinxTrain for adaptation, running bw, mllr_solve, mllr_transform, bw again, and then map_adapt, as described in the tutorials.
The language model is probably somewhat different from your test data. Try decoding the training data itself: the accuracy there should be very high unless you made a mistake in the adaptation.
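To make that check concrete, here is a minimal sketch of scoring the training set and the test set in the same way. It assumes you already have reference and hypothesis transcripts as plain strings, and it defines accuracy as 1 minus the word error rate, which may not be exactly the figure your scoring script reports. The names word_errors and word_accuracy and the sample sentences are made up for illustration; they are not part of SphinxTrain.

```python
def word_errors(ref, hyp):
    """Return (edit distance in words, number of reference words)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from an empty reference
    for i, rw in enumerate(r, 1):
        cur = [i]  # distance from r[:i] to an empty hypothesis
        for j, hw in enumerate(h, 1):
            cur.append(min(
                prev[j] + 1,               # deletion (reference word missing)
                cur[j - 1] + 1,            # insertion (extra hypothesis word)
                prev[j - 1] + (rw != hw),  # match or substitution
            ))
        prev = cur
    return prev[-1], len(r)


def word_accuracy(pairs):
    """Accuracy as 1 - WER over (reference, hypothesis) pairs (assumed definition)."""
    errors = words = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        words += n
    if words == 0:
        raise ValueError("no reference words to score")
    return 1.0 - errors / words


# Hypothetical transcripts, only to show the comparison.
train_pairs = [("turn the radio on", "turn the radio on")]
test_pairs = [("turn the radio on", "turn a radio on")]

print("training-set accuracy:", word_accuracy(train_pairs))
print("test-set accuracy:", word_accuracy(test_pairs))
```

If the training-set figure is also stuck near 71%, the problem is more likely in the transcripts, the dictionary, or the adaptation steps. If it is very high while the test set stays at 71%, a mismatch between the language model and the test data is the more likely cause.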