I am using some recorded wav files with SphinxTrain to train Sphinx for a limited-vocabulary engine. When I run the vector quantization script, I get a lot of warnings (1203 in total) in my log, such as these:
INFO: main.c(572): -> Aborting k-means, bad initialization
INFO: kmeans.c(153): km iter [0] 1.000000e+00 ...
WARNING: "kmeans.c", line 431: Empty cluster 109
WARNING: "kmeans.c", line 431: Empty cluster 140
WARNING: "kmeans.c", line 431: Empty cluster 175
WARNING: "kmeans.c", line 431: Empty cluster 194
WARNING: "kmeans.c", line 431: Empty cluster 202
WARNING: "kmeans.c", line 431: Empty cluster 227
....
....
and the following errors:
ERROR: "main.c", line 800: Too few observations for kmeans
ERROR: "main.c", line 1363: Unable to do k-means for state 0; skipping...
From information on the web, I think the problem is with my audio data. I have 539 frames, so the number of frames should not be the problem.
Do I need to remove the headers from my wav files with some software? Any other troubleshooting ideas?
Thanks!
SphinxTrain determines the amount of data from the size of the extracted features, not from the size of the audio. Most probably you extracted the features incorrectly. For 8 kHz audio you need to edit ./scripts_pl/make_feats.pl and change parameters such as the upper frequency and the number of filters.
Btw, a reasonable amount of audio starts at one hour, not one minute. Also, please don't forget to set a smaller number of senones for training, something around 200 instead of 1000.
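To put the numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a feature frame rate of 100 frames per second and a codebook of 256 clusters; both values are assumptions for illustration (the log only shows cluster indices up to 227), so adjust them to match your own configuration.

```python
FRAME_RATE = 100  # assumed feature frames per second (10 ms frame shift)


def frames_to_seconds(n_frames, frame_rate=FRAME_RATE):
    """Convert a count of feature frames into seconds of audio."""
    return n_frames / frame_rate


def frames_per_cluster(n_frames, n_clusters):
    """Average number of observations available per k-means cluster."""
    return n_frames / n_clusters


if __name__ == "__main__":
    n_frames = 539    # frame count reported in the question
    n_clusters = 256  # assumed codebook size, not confirmed by the log
    print("audio seen by the trainer: %.2f s" % frames_to_seconds(n_frames))
    print("frames per cluster: %.2f" % frames_per_cluster(n_frames, n_clusters))
```

Under these assumptions, 539 frames is only a few seconds of speech and leaves roughly two observations per cluster, which would be consistent with the many "Empty cluster" warnings.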
Somehow, SphinxTrain is severely underestimating the amount of audio data.
E.g., if I put a one-minute recording in my wav directory (with the appropriate transcription file in etc) and run the script verify_all.pl, it shows me
Total Hours Training: 0.00170299145299145 (0.1 minute or 6 seconds)
Why is this the case? I recorded my audio files with Audacity at 8 kHz / 16 bit, and they are not defective.
Thanks.
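One way to check the audio independently of SphinxTrain is to read the WAV header directly. The sketch below uses Python's standard `wave` module; the function name `describe_wav` and the example path are hypothetical, and it only handles plain PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return basic header fields and the duration of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        n_samples = w.getnframes()  # sample frames per channel
        return {
            "channels": w.getnchannels(),
            "bits": 8 * w.getsampwidth(),
            "rate": rate,
            "seconds": n_samples / rate,
        }


# Example call (hypothetical path):
# print(describe_wav("wav/recording01.wav"))
```

If this reports about 60 seconds for a one-minute recording while verify_all.pl reports about 6 seconds, the audio itself is probably fine and the mismatch more likely comes from how the features were extracted.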
It says you do not have enough training data for the models you are trying to train.
This error can occur when there are phones for which the training data does not contain even a single word.
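A quick way to look for such uncovered phones is to compare the phone list against the phones that actually occur in the transcribed words. The Python sketch below is only illustrative: the function name `missing_phones` and the toy data are made up, and loading your real phone list, dictionary, and transcription files is left out.

```python
def missing_phones(phone_list, dictionary, transcript_words):
    """Return phones from phone_list that no transcribed word contains.

    dictionary maps each word to its list of phones.
    """
    covered = set()
    for word in transcript_words:
        # Words missing from the dictionary contribute no phones.
        covered.update(dictionary.get(word, []))
    return sorted(set(phone_list) - covered)


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the poster's setup.
    phones = ["AH", "K", "T", "S", "SIL"]
    lexicon = {"CAT": ["K", "AH", "T"], "CATS": ["K", "AH", "T", "S"]}
    print(missing_phones(phones, lexicon, ["CAT", "CAT"]))
```

Any phone this reports has no training examples at all, so either add recordings that contain it or remove it from the phone list.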
Thanks for your help.