Hello,
I have a problem with training module 2. The first iteration seems to run perfectly, but the normalization that follows reports a huge negative likelihood (end of logfile: Current Overall Likelihood Per Frame = -33794.0269704748).
So, I inspected the logfile of the first Baum-Welch iteration.
A typical utterance produces something like:
utt> 64 sr429 261 0 148 116 60 119 1.107095e-41 -5.068369e+01 -1.322844e+04 utt 0.182x 1.076e upd 0.182x 1.073e fwd 0.038x 1.045e bwd 0.144x 1.079e gau 4.007x 1.065e rsts 0.048x 1.139e rstf 0.004x 0.539e rstu -0.000x 0.000e
But the logfile reveals several strange utterances, e.g.
utt> 63 sr428 140 0 64 53 31 62 0.000000e+00 -5.280189e+01 -7.392265e+03 utt 0.106x 1.153e upd 0.106x 1.141e fwd 0.024x 1.071e bwd 0.081x 1.161e gau 1.206x 0.944e rsts 0.020x 1.782e rstf 0.001x 0.949e rstu -0.000x 0.000e
I think the 0.000000e+00 is very suspicious (compared to the e-41). Why aren't utterances that failed excluded automatically? What went wrong?
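In case someone wants to pull out all such utterances automatically, here is a rough Python sketch; the position of the likelihood field is only guessed from the two example lines above, so treat it as an assumption rather than the documented log format:

import sys

# Scan a Baum-Welch logfile for utt> lines whose (assumed) per-utterance
# likelihood field is exactly zero, and print the utterance ids.
zero_utts = []
with open(sys.argv[1]) as log:
    for line in log:
        if not line.startswith("utt>"):
            continue
        fields = line.split()
        try:
            likelihood = float(fields[9])   # e.g. 1.107095e-41 or 0.000000e+00
        except (IndexError, ValueError):
            continue
        if likelihood == 0.0:
            zero_utts.append(fields[2])     # fields[2] looks like the utterance id
print("zero-likelihood utterances:", " ".join(zero_utts))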
Thanks
Andreas
First of all, I want to thank all of you for your help!
It was Chris's suggestion that got my corpus running! (It is still running, but module 2 has now been processed successfully.)
Simply switching on dithering of the audio (I added '-dither yes' as a parameter to the wave2feat call) solved all the problems.
I also tested removing the silences in the whole corpus. This reduced the likelihood after the normalization of the first Baum-Welch pass from about -34000 to about -20000. That value was still far too large, so the second iteration failed.
Now I'm analyzing which part of the corpus caused the problems, or whether it is the corpus as a whole on average. Maybe I should look for contiguous nulls in the audio data, or take a look at the generated cepstra files.
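Just as an idea, here is a small Python sketch of how one could check for those contiguous nulls (it assumes 16-bit mono PCM WAV files, which is my assumption, not something specific to SphinxTrain):

import sys
import wave
import numpy as np

# Report the longest run of exactly-zero samples in each WAV file given
# on the command line, to spot long stretches of pure digital silence.
def longest_zero_run(path):
    with wave.open(path, "rb") as w:
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    longest = current = 0
    for s in samples:
        current = current + 1 if s == 0 else 0
        longest = max(longest, current)
    return longest

for path in sys.argv[1:]:
    print(path, longest_zero_run(path))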
Best regards
Andreas
Indeed.
Check the following: are there long silences in the sentences? If there are, think of a way to cut them out. Baum-Welch is easily confused by them.
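For what it's worth, a very crude way to do that is simple frame-energy gating, as in the Python sketch below. This is only an illustration, not necessarily what Arthur means, and the frame length and threshold are guesses that need tuning per corpus:

import numpy as np

# Keep only 10 ms frames whose RMS energy is above a fixed threshold;
# frames below the threshold are treated as silence and dropped.
def trim_silence(samples, rate, frame_ms=10, threshold=100.0):
    frame_len = max(1, int(rate * frame_ms / 1000))
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len].astype(np.float64)
        if np.sqrt(np.mean(frame ** 2)) >= threshold:
            kept.append(samples[start:start + frame_len])
    return np.concatenate(kept) if kept else samples[:0]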
Arthur
Hi andreas_w
I faced the same problem in my first trials.
As Arthur said, I solved the problem by trimming the silences from the audio files.
There is quite a lot of software that can do it; I myself use Audacity, which is open source and an excellent tool.
Hello Andreas,
just an idea, but have you tried using the -dither flag in the feature extraction with wave2feat? This flag adds 1/2 bit of noise to the silent parts and so prevents divisions by zero...
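Roughly what that does, illustrated in Python below. This is only a sketch of the idea, not wave2feat's actual implementation:

import numpy as np

# Add roughly +/- half a bit of random noise to 16-bit samples so that
# regions of pure digital silence no longer yield zero-energy frames
# (which is what leads to the divisions by zero / log(0) downstream).
def dither(samples, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.integers(-1, 2, size=samples.shape)           # -1, 0 or +1
    return np.clip(samples.astype(np.int32) + noise, -32768, 32767).astype(np.int16)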