Hi,
I am working on building an LVCSR system for Tamil (an Indian language). I have
already built an acoustic model with 8 hours of speech data. The training was
successful, with only 41 of the 350 files ignored in the final iteration of
Baum-Welch, and the model gave satisfactory results.

Recently I collected another 5 hours of data. When I added the new data to the
existing 8 hours and trained on the full 13 hours, SphinxTrain failed after
iteration 5 of Phase 3, ignoring all 627 files.

I have thoroughly checked all the audio files, the dictionary, the
transcription, and the fileids again and again; to my knowledge there is no
problem with any of them. The wav files are also in the correct format.
I have uploaded the log files of both models for your reference:

Model 1 (8 hours): http://www.4shared.com/document/QRGWQew6/tamilcontinue-8gau.html
Model 2 (13 hours): http://www.4shared.com/document/PjULcutW/tamilnew.html

I can also upload samples of my data if you need them. I don't understand why
data that previously trained successfully is now ignored entirely after adding
the additional data.

I have been trying to solve this problem for a long time. Any suggestions and
help will be greatly appreciated.
In order to help us solve your problem, you need to provide the database.
Dear Nickolay,
I have uploaded my project folder (etc and wav) containing the new data that
causes these problems, and I have sent you the download link via the messaging
facility available on this forum.

Kindly check whether you have received the data and tell me if I need to
provide anything more. Do you also need samples of the old data that trained
successfully?
Thanks,
Melvin
Hello
Your data consists of long files with quite long silences between utterances.
I think what happens is that the recognizer fails to find a good starting
point for the initial phonetic segmentation. You definitely want to help it by
splitting the audio into utterances. Silence inside each utterance should be
minimal, and each utterance should include about 0.2 seconds of silence at the
beginning and at the end.
The dither option must be enabled during feature extraction to help with the
zero-energy regions you have in your data.
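To see why (a toy illustration, not the actual feature-extraction code): the
log energy of an all-zero frame is undefined, while adding one least
significant bit of random noise to each sample keeps it finite:

```python
import math
import random

def log_energy(frame):
    """Log of the sum of squared samples; -inf for an all-zero frame."""
    e = sum(s * s for s in frame)
    return math.log(e) if e > 0 else float("-inf")

def dither(frame, seed=0):
    """Add +/-1 of noise to each integer sample (a toy version of the
    dithering applied during feature extraction)."""
    rng = random.Random(seed)
    return [s + rng.choice((-1, 1)) for s in frame]

zero_frame = [0] * 400  # a 25 ms frame at 16 kHz, all zeros
# log_energy(zero_frame) is -inf; after dither() it becomes finite,
# so the trainer no longer chokes on those regions.
```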
You might want to use the long audio aligner framework recently developed in
sphinx4 to obtain a segmentation for each of your files. You can use your
initial 8-hour model for that.
Dear Nickolay,
Thank you so much for your reply. I repeated the feature extraction step with
the dither option enabled. This led to a small improvement: 163 of the 263
files were ignored, compared to 238 of 263 previously.

I removed the long silences between the utterances in 30 files and repeated
the training, hoping that some of those 30 files would be accepted. But there
was no change at all; all 30 files were ignored again. What are your thoughts
on this? Why didn't correcting those files help? I need your guidance in this
regard.

I will remove the silences from all the files, train again, and report the
results soon. Also, how do I use the long audio aligner? Is there a tutorial
for it?
Thanks,
Melvin