I have 5 hours of training audio.
I used some of the training data as test data, but the WER is 93%. What could be the problem?
I have attached my whole training folder here, and I have the following questions:
Why can't I achieve 0% WER on test data taken from the training audio itself?
Is it possible to achieve 10% WER with 5 hours of data, or is this not enough?
What are other ways to improve the WER of my system?
I have 4000 audio utterances, and sometimes I get 3000 errors in one Baum-Welch iteration. What can I do to fix this? I have read that this is a serious issue.
Do I need to use forced alignment? If yes, where can I download sphinx3?
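For reference, the WER reported by the trainer's alignment step is word-level edit distance (substitutions, insertions, and deletions) against the reference transcript, divided by the number of reference words. A minimal sketch of that computation (the function name is illustrative, not part of any Sphinx tool):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] holds the edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)
```

Note that even a perfect acoustic match rarely gives 0% WER, since decoding also depends on the language model and search beam.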
In your training folder the features are not extracted properly. Some feature files are shorter than they should be: the total duration is reported as 2.1 hours while it is actually 4.8 hours. It seems the features are still being extracted from the 44 kHz files; you need to rerun training from the start, re-extracting the features.
For example, the file size of BT_204.mfc should be 31 kB, but in your archive it is 8 kB.
Once you re-extract the features, you will get a WER of 5%; that is the WER I get when I train with your folder without any other modifications.
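The expected .mfc size above can be sanity-checked by hand, assuming the default sphinx_fe output: a 4-byte count header followed by 13 float32 cepstral coefficients per frame at a 10 ms frame shift (100 frames per second). A sketch under those assumptions:

```python
# Rough expected size of a Sphinx .mfc feature file, assuming default
# sphinx_fe settings: 13 cepstra per frame, 100 frames/sec, float32
# values, and a 4-byte count header. These defaults are an assumption;
# check your feat.params if you changed the extraction configuration.
def expected_mfc_bytes(duration_sec, n_ceps=13, frames_per_sec=100):
    n_frames = int(duration_sec * frames_per_sec)
    return 4 + n_frames * n_ceps * 4

# A ~6 second utterance works out to about 31 kB of features;
# a far smaller file suggests extraction went wrong.
print(expected_mfc_bytes(6.0))  # → 31204
```

A feature file much smaller than this estimate for its audio duration is a sign that extraction used the wrong parameters or input files.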
Once you fix the feature extraction, the errors will go away.
Some of your files are still 44 kHz:
To batch-resample files, use sox.
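Before re-resampling, it can help to confirm which files still have the wrong rate. A minimal sketch using only the Python standard library (the function name and the 16 kHz target are assumptions based on this thread, not part of any Sphinx tool):

```python
import wave
from pathlib import Path

def find_wrong_rate(folder, expected_rate=16000):
    """Return (filename, rate) for wav files not at expected_rate."""
    bad = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            rate = w.getframerate()
            if rate != expected_rate:
                bad.append((path.name, rate))
    return bad
```

Each file this reports can then be converted with sox, e.g. `sox in.wav -r 16000 out.wav`, before re-running feature extraction.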
Thank you very much for patiently answering all my questions, Sir.
Thank you for the fast reply, Sir. It was indeed true: the word error rate has dropped to 6%.
But we still have a problem.
When I configured my acoustic model in my program, it does not transcribe the data files accurately, while an4.align shows really good results.
For example, I used the A_1 audio file as input; the transcribed text from my program is "abaja abajo abajada abajada abajada", but an4.align shows "aba abaaba abaja abajo abajada".
What could be my problem this time? I have already checked everything. Why can't the configured model transcribe my training audio accurately?
Last edit: Leimiaoren 2016-02-03
Sphinx4 is not going to reproduce the training results exactly. First of all, you need an accurate model; for example, your LM does not seem correct, so you had better train it properly.
If Sphinx4 cannot accurately reproduce the training results, is there any study about this? Like a research paper or article, so I can cite it in my paper? I can't find any.
There are no studies; it is common sense.
Then what is it in the training data that does not let the sphinx4 decoder transcribe it accurately? Or what does the sphinx4 decoder lack, such that it cannot transcribe the training data accurately? Please do reply. I REALLY NEED THIS.
Please help.
You need to provide all the information needed to describe your problem and reproduce your trouble. The faster you provide the information, the faster you get advice.
Then what is it in the training data that does not let the sphinx4 decoder transcribe it accurately? Or what does the sphinx4 decoder lack, such that it cannot transcribe the training data accurately? Please do reply. I REALLY NEED THIS.
No idea; I haven't seen your training data, and I also haven't seen why sphinx4 cannot transcribe accurately.
When I configured my acoustic model in my program, it does not transcribe the data files accurately, while an4.align shows really good results.
For example, I used the A_1 audio file as input; the transcribed text from my program is "abaja abajo abajada abajada abajada", but an4.align shows "aba abaaba abaja abajo abajada".
There must be a reason why the sphinx4 decoder cannot accurately reproduce the trained results. Do you by any chance know the reason, Sir?
It seems to be a bug in sphinx4 related to the .lm.bin language model format. If you use an ARPA-format language model, the result will be accurate. I need to investigate why lm.bin does not work in sphinx4; that will take some time. You can report a bug about it in our issue tracker.
Thank you very much for answering. I can use this as a reference.