Hi, it's me again :-)
In the meantime I wrote a lot of scripts and the like for using the corpus that was made available to me.
It all runs fine, but when I run the RunAll.pl script, things get strange...
The ERROR message(s) are:
MODULE: 02 Training Context Independent models (2005-08-04 08:38)
Cleaning up directories: accumulator...logs...models...
Flat initialize
mk_mdef_gen Log File
mk_flat Log File
accum_mean Log File
norm_mean Log File
accum_var Log File
norm_var Log File
cp_mean Log File
cp_var Log File
Baum welch starting for iteration: 1 (1 of 1) Log File
ERROR: 56 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 298: 8KHz_TEST_DR4_FLKD0_SX199 ignored
ERROR: 47 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 298: 8KHz_TRAIN_DR3_FMJF0_SX354 ignored
... and so on
I looked through the net and found explanations that my transcriptions could be broken, or that the errors would disappear after a few iterations... but neither helped...
It ignores all files, so I came to suspect that it could have to do with the conversion I've done:
I used the following sox command to convert the wav files into au files:
sox -V <ifilename>.wav -c 1 -r 8000 -U <ofilename>.au resample -ql
it produced the following output:
sox: resample opts: Kaiser window, cutoff 0.940000, beta 16.000000
sox: Detected file format type: sph
sox: Input file SX199.WAV: using sample rate 16000
size shorts, encoding signed (2's complement), 1 channel
sox: Output file sx199.au: using sample rate 8000
size bytes, encoding u-law, 1 channel
sox: Output file: comment "Processed by SoX"
Does anyone have an idea?
I put everything in a .tar.gz file at http://mitglied.lycos.de/germi/cmu/
I also included some original wav files in ./wav_old/...
Thanks a lot,
Sebastian
Hi NeoGermi,
The following may be one of the reasons for "Final state not reached":
1) Either your transcription file does not match the sound file,
2) or there is an abrupt change in the sound file.
I hope your problem gets solved....
Tushar P.
Hi Sebastian,
It is ok for some utterances not to be trained by Baum-Welch. That usually just means that the BW program itself doesn't think the transcription is correct for the data.
Now, one thing you could do is to create a force-aligned version of the transcription for the Baum-Welch training (that is, the silences are explicitly specified). Without doing so, you will find that there can be a wild difference between the transcription and the waveform, causing a lot of "final state not reached" errors.
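Just to illustrate roughly what I mean (this is only a sketch; the exact silence token and layout depend on your dictionary and on the aligner you use): a plain transcription line such as
<s> SHE HAD YOUR DARK SUIT </s> (SA1)
would come back from forced alignment with the pauses the aligner actually found marked explicitly, for example something like
<s> SHE HAD YOUR <sil> DARK SUIT </s> (SA1)
so that bw no longer has to guess where the silences are.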
Arthur
Hm, that could be an idea, but...
what do you mean by a force-aligned version?
At the moment my transcription looks like:
<s> SPOKEN TEXT </s>
Is that already force-aligned?
Thanks a lot,
Sebastian
Feature mismatch is still the top killer of training.
I should have written it down somewhere. Perhaps I should really start the "10 common pitfalls of training and recognition" .
Again, it is sort of normal to have some utterances that cannot reach the final state in Baum-Welch. In your case, if the number is not too large (say, less than 50), I would consider that fine. If the number is too large, consider widening the alpha beam and beta beam in bw. They are trivial to find.
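For example, in the scripts that invoke bw you should see the two flags together, something like this (the numbers here are only placeholders, not recommended values):
bw ... -abeam 1e-90 -bbeam 1e-40 ...
If I remember correctly, making those thresholds smaller (closer to zero) prunes less and therefore widens the beam, but check the bw help output for the actual defaults before changing anything.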
For waveform information, you can try CoolEdit or Sound Forge. I don't know about Audacity, but I have heard that it is a powerful tool; you may also give it a try.
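If you only need the header information on the command line, something as simple as
file sx199.au
should already tell you the format, encoding and sample rate for most audio files, and the sox -V output you quoted above prints the same kind of information whenever sox reads a file.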
I am glad that you solved the problem; just don't hesitate to ask any questions in this forum.
Regards,
Arthur
sorry, the link above is broken...
I uploaded it again here: https://xantippe.cs.uni-sb.de/~germi/
@TusharP: sorry, but the transcriptions are ok from my point of view... and the sound files seem to be ok too, as far as I can tell about them...
could anyone have a look at the sound files to verify them?
thanks a lot,
Sebastian
additionally:
I placed everything uncompressed under the last given address...
so, nobody has to download a 40 MB file ;-)
greetz,
Sebastian
P.S.: @Arthur: I can understand that some files could be ignored... but in my case, all files are ignored :-(
ok... I solved the problem :-)
It was because bin/wave2feat was executed for files with a 16000 Hz sampling rate, but my files only have 8000 Hz...
So in the first two or three iterations only 3-4 files are ignored...
but after some further iterations the count of ignored files grows instead of shrinks... is that normal?
and, by the way, does anyone know a good tool for getting as much information as possible out of sound files? (like sampling rate, frame rate, etc)...
Thanks a lot,
Sebastian
I could cry :-(
Nothing helps :-(
I found the two values (abeam and bbeam) in steps 2, 4 and 7 and experimented with some values, but it's only getting worse :-(
I'm losing my faith in SphinxTrain :-(
In the (little) clinical example I made before, everything worked perfectly, but now, in a real case, it's letting me down :-(
It can't be that nobody has an idea...
Can't you tell me how you, as an expert in SphinxTrain, would set up SphinxTrain with some wav files like
https://xantippe.cs.uni-sb.de/~germi/cmu/SA1.WAV
processed by a sox call like
sox -V <ifilename>.WAV -c 1 -r 8000 -U <ofilename>.au resample -ql
and its result:
https://xantippe.cs.uni-sb.de/~germi/cmu/SA1.au
Please...
I'm really desperate :-(
Thanks a lot,
Sebastian
Sebastian -- I no longer have SphinxTrain available to me, nor do I have a Linux system, so my advice relies only on my memory.
Let me suggest that there may be a problem with the .au files that you produced with SoX. From your first posting, it appears that you are converting from NIST-format 16 kHz audio files to 8 kHz mu-law (8 bits/sample) Sun-format .au files. As far as I remember, the wave2feat feature-computation program does not handle mu-law input audio files correctly (but possibly wave2feat has been changed since I last looked?); if this is true, then the feature files you have computed would be useless. Furthermore, I don't think that wave2feat handles .au files correctly. I suggest that you reprocess your original audio files to produce raw (-t raw) 16-bit signed linear (-s) output files and then recompute features.
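From memory (so please double-check the flag letters against your SoX manual page before relying on them), the kind of conversion I have in mind would look roughly like
sox -V <ifilename>.WAV -c 1 -s -w -t raw <ofilename>.raw
i.e. signed (-s) 16-bit words (-w) at the original 16 kHz rate, instead of the -U mu-law bytes you used before; if you really do want the 8 kHz version, add -r 8000 and the resample effect as in your original command.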
cheers,
jerry wolf
Sebastian,
I think you are being a little bit dramatic. Probably, like many previous users, you are disappointed in a hammer because you are using it to drive screws. :-)
As Jerry said, wave2feat currently does not support the au format. Is it necessary for you to convert the file to au?
Arthur
P.S. Don't cry, you only have around 1 hour of speech. That is simple stuff. ;-)
IMHO the .au format is only a minor problem (but I think you should output the data in raw format instead), but the mu-law output is the most serious problem with your audio data (and hence with your cepstral features).
I hope this helps. If not, then please show us both the SoX command and the wave2feat command.
cheers,
jerry
hm, thanks a lot!
That really helped to kill most of the ERROR messages...
Some (over 30) persist, but I think, like Arthur said, that's a minor problem?
It's a project at our university and this is its last step, so I have to produce results; that's why I'm almost crying when there are still thousands of ERRORs ;-)
But my question is whether the combination of a .RAW file, produced by the following SoX command:
sox <infile>.au -s -r 16000 -w <outfile>.raw resample -ql
and the following wave2feat command:
train/bin/wave2feat -c train/etc/train_huge.fileids \
  -verbose yes \
  -ei raw \
  -raw yes \
  -eo mfc \
  -nfilt 31 \
  -di wav \
  -do feat \
  -srate 16000
is a good choice, or whether you think I could optimize some values (the <infile>.au is the same as given a few messages above)...
Sorry, but I'm really a newbie in things like sound files and speech processing...
Thanks a lot,
Sebastian
Hi Sebastian,
I am glad that it works for you, and you should thank Jerry in particular because he is a really experienced person in signal processing in general.
Again, please don't hesitate to ask any questions in this forum. We will try our best to help.
Regards,
Arthur
appendix:
I reconverted the 8 kHz .AU files to 16 kHz .RAW files...
I (well, my boss ;-) ) prefer(s) the 8 kHz telephone version...
Is that also possible with SphinxTrain?
Thanks a lot,
Sebastian
wav2feat allows both 8k and 16k, though you need to be careful about the filter bank settings. The bottom line is that as long as you use the same settings in both training and testing, the result should not be too poor. If you are in doubt, using the default settings will probably be best for you.
wav2feat actually supports arbitrary sampling frequencies; it's just that you then need to work out all the other parameters yourself.
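As a very rough illustration (I am quoting these numbers from memory, so treat them only as placeholders and check the wave2feat defaults yourself): for 16 kHz data one would typically use something like
wave2feat ... -srate 16000 -nfilt 40 -lowerf 130 -upperf 6800 ...
and for 8 kHz telephone-bandwidth data something like
wave2feat ... -srate 8000 -nfilt 31 -lowerf 200 -upperf 3500 ...
The important thing, again, is that whatever you pick is used consistently for training and decoding.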
Now Sebastian, I guess we are more or less done with this thread. For your next question, please start another thread. My other suggestion is to try to be a little more careful about how the different tools are supposed to be used. SphinxTrain's tools may be less robust than I would like, but they work pretty well if you know how to use them.
Regards,
Arthur
Sebastian -- you MUST NOT convert 8 kHz-sampled audio files to a 16 kHz sample rate (or if you do, you must realize that the result will NOT be equivalent to the original 16 kHz-sampled files). Audio files sampled at 16 kHz contain signal information in the range 0-8 kHz, but 8 kHz files contain information from only 0-4 kHz (due to Nyquist's theorem).
- when you convert from 16 to 8 kHz-sampled, the processing must filter out the 4-8 kHz information before converting the sample rate.
- if you convert from 8 to 16 kHz-sampled, there is no way to reconstruct that 4-8 kHz information, so you will achieve a 16 kHz sample rate, but with signal information only in 0-4 kHz, not 0-8 kHz, and so the signal bandwidth will be wrong.
Yes, SphinxTrain will correctly handle audio files sampled at 8 or 16 kHz. The number of filters and the upper frequency (in both wave2feat and in the recognizer front end) must be set appropriately for each sample rate. I'm sorry but I no longer have the Sphinx documentation so I can't tell you what those settings are. Look in the wave2feat help information.
cheers,
jerry