When I run the kmeans.pl script the log file (kmeans.log)
reports this:
INFO: main.c(580): -> Aborting k-means, bad initialization
INFO: kmeans.c(159): km iter [0] 1.000000e+00 ...
WARNING: "kmeans.c", line 437: Empty cluster 62
WARNING: "kmeans.c", line 437: Empty cluster 179
WARNING: "kmeans.c", line 437: Empty cluster 200
WARNING: "kmeans.c", line 437: Empty cluster 206
WARNING: "kmeans.c", line 437: Empty cluster 225
INFO: main.c(580): -> Aborting k-means, bad initialization
INFO: main.c(589): best-so-far sqerr = -1.000000e+00
ERROR: "main.c", line 808: Too few observations for kmeans
ERROR: "main.c", line 1313: Unable to do k-means for state 0; skipping...
INFO: s3gau_io.c(218): Wrote /localhome/ghelfi/7452/model_parameters/7452.ci_semi_flatinitial/means [1x4x256 array]
INFO: s3gau_io.c(218): Wrote /localhome/ghelfi/7452/model_parameters/7452.ci_semi_flatinitial/variances [1x4x256 array]
INFO: main.c(1398): No mixing weight file given; none written
INFO: main.c(1555): TOTALS: km 0.208x 1.089e+00 var 0.000x 0.000e+00 em 0.000x 0.000e+00 all 0.208x 1.089e+00
Note that I'm running the trainer with a vocabulary of only 16 words and 39 audio files (.wav): 13 phrases with 3 repetitions each.
What is causing this error?
Thanks.
I'm not sure, but I suspect the important error message here is "no mixing weight file given; none written." Take a look at the log output from earlier scripts in the process and make sure none of them were supposed to write a mixing weight file. Or maybe that's a file you were supposed to have specified yourself at the beginning; did you?
Jessica
Anonymous
-
2003-07-14
Thanks Jessica, but I'm quite sure that isn't the problem.
The earlier scripts showed no errors. That line in the log only means that no output mixing weight file was specified in the script, so that file wasn't written.
Anyway, thanks for your help.
Anonymous
-
2003-07-14
I respectfully disagree with Jessica. I think things first go wrong at "INFO: main.c(580): -> Aborting k-means, bad initialization", and everything printed after that is simply confirmation.
You haven't shown us the initial part of your kmeans.log file or told us how you computed the cepstra that you are using. There's an example of what kmeans.log should look like at http://www-2.cs.cmu.edu/~rsingh/sphinxman/logfiles.html#099, along with some hints about things you may have done wrong.
Also, you'll find a very small set of hints in http://www-2.cs.cmu.edu/~rsingh/sphinxman/FAQ.html#6:
" Q. I am trying to do VQ. It just doesn't go through. What could be wrong?
A. Its hard to say without looking at the log files. If a log file is not being generated, check for machine/path problems. If it is being generated, here are the common causes you can check for:
1. byte-swap of the feature files
2. negative file lengths
3. bad vectors in the feature files, such as those computed from headers
4. the presence of very short files (a vector or two long) "
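The four FAQ checks quoted above can be sketched as a small script. This is only a rough illustration, and it assumes a feature-file layout that is not spelled out in this thread: a 4-byte integer holding the number of 4-byte floats that follow, then the floats themselves, 13 cepstral coefficients per frame. The function name, the 10-frame minimum, and the 1e4 magnitude limit are all made up for the example.

```python
import math
import os
import struct
import sys

def check_feature_file(path, ncep=13, min_frames=10, max_abs=1e4):
    """Return a list of problems found in one cepstral feature file.

    Assumed layout (hypothetical, not confirmed by the thread): a 4-byte
    integer count of floats, followed by that many 4-byte floats.
    """
    problems = []
    if os.path.getsize(path) < 4:
        return ["file is too small to contain a header"]
    with open(path, "rb") as f:
        header = f.read(4)
        body = f.read()
    nfloats = len(body) // 4

    # Checks 1 and 2: the header count must match the file size in one byte order.
    native = "<" if sys.byteorder == "little" else ">"
    other = ">" if native == "<" else "<"
    if struct.unpack(native + "i", header)[0] == nfloats:
        order = native
    elif struct.unpack(other + "i", header)[0] == nfloats:
        order = other
        problems.append("byte-swapped relative to this machine")
    else:
        count = struct.unpack(native + "i", header)[0]
        problems.append(f"header count {count} does not match {nfloats} floats in file")
        return problems

    values = struct.unpack(f"{order}{nfloats}f", body[:nfloats * 4])

    # Check 4: very short files.
    nframes = nfloats // ncep
    if nfloats % ncep:
        problems.append(f"{nfloats} floats is not a multiple of {ncep}")
    if nframes < min_frames:
        problems.append(f"only {nframes} frames")

    # Check 3: bad vectors (NaN, infinity, or implausibly large values).
    bad = sum(1 for v in values if not math.isfinite(v) or abs(v) > max_abs)
    if bad:
        problems.append(f"{bad} non-finite or very large values")
    return problems
```

Running it over every feature file in the training list and printing any non-empty result should narrow down which of the four causes, if any, applies.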
It's possible that the amount of training data that you are using is simply not enough. 39 utterances is very little.
cheers,
jerry wolf
soliloquy learning, inc.
Anonymous
-
2003-07-15
Claudio -- There are many other things that you must check first. For example, you said that you have 39 .wav files; the wave2feat program does not know about .wav files (only raw and NIST format) and you may have computed cepstral features from the headers as well as the signal. You should probably convert them to raw unheadered files first.
What is the sampling rate of your audio files? Are the parameters for wave2feat correct for that rate?
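One way to do the conversion and answer the sampling-rate question at the same time is sketched below, using only Python's standard wave module. It assumes the files are ordinary uncompressed PCM .wav files; the function name wav_to_raw and the file names in the usage line are invented for the example.

```python
import wave

def wav_to_raw(wav_path, raw_path):
    """Copy only the audio samples of a PCM .wav file into a headerless file.

    Returns (sampling rate in Hz, channel count, bytes per sample) so the
    feature-extraction parameters can be checked against the real audio.
    """
    with wave.open(wav_path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        frames = w.readframes(w.getnframes())  # sample data only, no header
    with open(raw_path, "wb") as out:
        out.write(frames)
    return rate, channels, width

# Example use (hypothetical file names):
# print(wav_to_raw("utt01.wav", "utt01.raw"))
```

The samples are written in the byte order the .wav file uses (little-endian), so the feature extractor still has to be told the matching byte order and sampling rate.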
Have you looked at the cepstral data to be sure it's OK?
I don't know much about the kmeans program for vector quantization, so it's only a guess that you need more than 39 utterances of training data. Acoustic model training is normally done with much more -- 4 or more _hours_ of transcribed speech data, for example. I do not know how little is required for adequate training, even for a small vocabulary such as yours.
I'm sorry I can't help more definitely. Perhaps someone else with more experience with SphinxTrain can help you.
cheers,
jerry
Anonymous
-
2003-07-18
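Claudio -- are you modeling words or phones? With your very small vocabulary, words might be better.
I just noticed this in http://www-2.cs.cmu.edu/~rsingh/sphinxman/scriptman1.html, which might be helpful: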
"When you have a very small closed vocabulary (50-60 words)
If you have only about 50-60 words in your vocabulary, and if your entire test data vocabulary is covered by the training data, then you are probably better off training word models rather than phone models. To do this, simply define the phoneset as your set of words themselves and have a dictionary that maps each word to itself and train. Also, use a lesser number of fillers, and if you do need to train phone models make sure that each of your tied states has enough counts (at least 5 or 10 instances of each)."
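As a rough sketch of what the quoted advice amounts to, the snippet below builds a dictionary that maps each word to itself and a "phone" set that is just the word list. The output line format ("WORD WORD") and the function name are assumptions made for illustration; filler entries such as silence are left out and would still have to be added by hand.

```python
def word_model_files(vocabulary):
    """Build dictionary lines and a phone list for whole-word models.

    Each word is treated as its own single "phone", so the dictionary
    maps every word to itself.
    """
    words = sorted({w.strip() for w in vocabulary if w.strip()})
    dictionary = [f"{w} {w}" for w in words]  # word followed by its one-unit "pronunciation"
    phoneset = list(words)                    # the phone set is the vocabulary itself
    return dictionary, phoneset

dictionary, phoneset = word_model_files(["yes", "no", "yes"])
print(dictionary)  # ['no no', 'yes yes']
print(phoneset)    # ['no', 'yes']
```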
Anonymous
-
2003-08-22
I have been experiencing an error similar to this for some time now and can't seem to get past it. I've noticed that it runs successfully if I run make_feats with the -dither flag on, although I'm wondering whether that just hides the problem rather than solving it. The SphinxTrain FAQ mentions two utilities for examining feat files, seecep and cepview, but these don't seem to be included in the distribution. Any idea where I can obtain them?
Anonymous
-
2003-08-22
While I still plead ignorance as to the precise interpretation of the kmeans errors described above, let me comment on Steve's make_feats/-dither question. As my colleague cbquillen pointed out to me recently, one aspect of audio signals that'll _kill_ the training (or decoding) processing in Sphinx<N> is the presence of "digital silence", i.e., intervals of signal where the amplitude is 0 (or even a constant dc-offset value). This is because the feature processing computes a cepstrum (log of the power spectrum), and taking the log of zero (or even a very small positive number) produces a floating point exception (which is ignored by default) and extremely large resulting values.
The -dither option in make_feats adds a random LSB signal to the input signal, thus filling in any digital silences and protecting you from these evil happenings; I'd call it "fixing", not "covering up" the problem. (BTW, there is no such protection in the front end processing of Sphinx2/3, as I found out the hard way with one particular test set.)
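To make the effect concrete, here is a toy illustration in Python. It is not the make_feats implementation: it uses the log of the frame energy as a stand-in for the log power spectrum, and it assumes dither means adding +1 or -1 at random to each sample. The function names and the 400-sample frame length are chosen only for the example.

```python
import math
import random

def log_energy(frame):
    """Log of the summed squared samples; -inf when the frame is all zeros."""
    energy = sum(s * s for s in frame)
    return math.log(energy) if energy > 0 else float("-inf")

def add_dither(samples, seed=0):
    """Add a random +1 or -1 (one LSB) to every sample (assumed dither scheme)."""
    rng = random.Random(seed)
    return [s + rng.choice((-1, 1)) for s in samples]

silence = [0] * 400                      # one frame of digital silence
print(log_energy(silence))               # -inf
print(log_energy(add_dither(silence)))   # log(400), about 5.99
```

Every dithered sample of the silent frame is +1 or -1, so the energy is exactly 400 and its log is an ordinary finite number instead of negative infinity.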
I too would appreciate a "cepview"-like utility. I've looked around the CMU website and googled around the web, without success; it seems to be a CMU-internal tool. It wouldn't take too long to write one, but I've been too busy.