Anonymous
-
2003-04-17
I can't help you directly with your kmeans_init problem, but I believe roberteast's response is on target. You can't do this with "several files of data", but rather you need much more data than that.
But let me comment on your audio file misconceptions.
1. The first sox example you showed does not convert the WAV files to raw data but to AU-format data; that's why you had to strip off "tiny headers", because that's what AU files have. Dcarreira's comments are accurate: the file headers are variable in size and depend on the length of the file name. His sox command is what you need.
2. Your alternative plan to convert the .wav files to .sph, which can be used directly by wave2feat, is also good.
3. Your examples show "-c 2" indicating two-channel audio data files. Note that wave2feat assumes single-channel audio data files, so this is a problem for you. See the thread in the Open Discussion forum entitled "Sampling rates and sphinx's interpretation".
4. Your examples also indicate "-b" meaning byte data (8 bits per sample). As mentioned in the above-referenced thread, that's not enough precision for good quality speech, and 16-bit sampling is preferable.
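Point 4 can be put in numbers with the usual rule of thumb for uniform quantization: for a full-scale sine wave, an N-bit quantizer gives a signal-to-quantization-noise ratio of roughly 6.02N + 1.76 dB. Real speech seldom uses the full range, so actual figures are lower, but the gap between 8 and 16 bits stays about 48 dB:

```latex
\mathrm{SQNR} \approx 6.02\,N + 1.76\ \text{dB},
\qquad
N = 8:\ 6.02 \cdot 8 + 1.76 \approx 49.9\ \text{dB},
\qquad
N = 16:\ 6.02 \cdot 16 + 1.76 \approx 98.1\ \text{dB}
```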
I have created several WAV files as training data, which I then convert to RAW for use with wave2feat (I have also tried SPH (-nist)), like so:
for i in `ls $WAVEDIR`
do
sox $WAVEDIR/$i -t au -r 16000 -b -c 2 $NISTDIR/`basename $i .wav`.raw 1>/dev/null
done
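Since the channel count and sample width turn out to matter in this thread, it may help to confirm what the source WAV files actually contain before choosing sox flags. This is a minimal sketch using only od and head; `le_uint` and `wav_info` are hypothetical names, and it assumes the canonical PCM WAV layout in which the "fmt " chunk starts right after the 12-byte RIFF header (files with other chunks first would need a real chunk parser).

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a WAV header before conversion.
# Assumes the canonical layout: "RIFF" at byte 0 and the "fmt " chunk at
# byte 12, so channels sit at byte 22, sample rate at 24, bits at 34.

# le_uint FILE OFFSET NBYTES: read NBYTES at OFFSET as little-endian unsigned
le_uint() {
    value=0
    shift_bits=0
    for byte in $(od -An -tu1 -j"$2" -N"$3" "$1"); do
        value=$(( value | (byte << shift_bits) ))
        shift_bits=$(( shift_bits + 8 ))
    done
    echo "$value"
}

# wav_info FILE: print channel count, sample rate and bits per sample
wav_info() {
    if [ "$(head -c 4 "$1")" != "RIFF" ]; then
        echo "$1: not a RIFF/WAV file" >&2
        return 1
    fi
    echo "channels=$(le_uint "$1" 22 2) rate=$(le_uint "$1" 24 4) bits=$(le_uint "$1" 34 2)"
}
```

For example, `for f in "$WAVEDIR"/*.wav; do printf '%s: ' "$f"; wav_info "$f"; done` lists every file; a line such as `channels=2 ...` would mean the file needs downmixing to mono before wave2feat.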
And then I strip them of their headers (yes, SoX gives them tiny headers, containing the filename and some other unreadable data), like so:
for file in `ls $NISTDIR`
do
for i in `cat $NISTDIR/$file`
do
if [ $next ]
then
echo "$i" >> $NISTDIR/$file.tmp
else
length="29"
let length+="`echo $file | wc -c`"
echo "$i" | cut -b $length-"`echo "$i" | wc -c`" > $NISTDIR/$file.tmp
next="x"
fi
done
mv $NISTDIR/$file.tmp $NISTDIR/$file
next=""
done
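The loop above guesses the header length from the file name, which is fragile because, as reported later in this thread, the header size varies with the name. A sturdier sketch reads the length from the AU header itself: in the AU format the second big-endian 32-bit word holds the byte offset at which the audio data starts. `au_data_offset` and `au_strip_header` are hypothetical names; only od, head and tail are used.

```shell
#!/bin/sh
# Sketch: strip an AU (.snd) header using the header's own data-offset
# field instead of guessing its length from the file name.
# AU header fields are big-endian 32-bit words:
#   bytes 0-3  magic ".snd"
#   bytes 4-7  offset of the first audio byte (i.e. the header length)

# au_data_offset FILE: print the header length in bytes
au_data_offset() {
    # od prints the four offset bytes as unsigned decimals, e.g. "0 0 0 32"
    set -- $(od -An -tu1 -j4 -N4 "$1")
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# au_strip_header SRC DST: copy only the audio data of SRC into DST
au_strip_header() {
    src=$1
    dst=$2
    if [ "$(head -c 4 "$src")" != ".snd" ]; then
        echo "$src: not an AU file" >&2
        return 1
    fi
    offset=$(au_data_offset "$src")
    # tail -c +N starts output at byte N (1-based), so this skips offset bytes
    tail -c +"$((offset + 1))" "$src" > "$dst"
}
```

Keep in mind that AU stores linear PCM samples big-endian, so the stripped data may need byte-swapping before wave2feat reads it on a little-endian machine.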
Then I convert them to CEP like so:
wave2feat -nfft 256 -mach_endian $ENDIANNESS -dither -c $CONTROL_F -raw -di / -ei raw -do / -eo mfc
I've also tried this with SPH files (-nist for wave2feat), using something like:
sox <file.wav> <file.sph>
And
wave2feat -nfft 256 -mach_endian $ENDIANNESS -dither -c $CONTROL_F -nist -di / -ei sph -do / -eo mfc
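Both wave2feat commands read their inputs from `$CONTROL_F`. Assuming wave2feat locates each input by joining `-di`, a control-file entry and the `-ei` extension (which is what `-di /` suggests), the control file would list absolute paths without extensions. A hypothetical helper, `make_ctl`, to generate one:

```shell
#!/bin/sh
# Hypothetical helper: make_ctl DIR EXT CTLFILE
# Writes one line per DIR/*.EXT file: its path with the extension removed.
# DIR should be absolute when the wave2feat command uses "-di /".
make_ctl() {
    dir=$1
    ext=$2
    ctl=$3
    : > "$ctl"                      # start with an empty control file
    for f in "$dir"/*."$ext"; do
        [ -e "$f" ] || continue     # the glob matched nothing
        printf '%s\n' "${f%."$ext"}" >> "$ctl"
    done
}
```

For example, `make_ctl "$NISTDIR" raw "$CONTROL_F"` pairs with `-ei raw`, and `make_ctl "$NISTDIR" sph "$CONTROL_F"` with `-ei sph`.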
The RAW files can be played using play like so:
play <file.raw> -r16000 -v50 -sb -traw -c2
That same command also works when I play 'goforward.16k', which is provided with the Sphinx II turtle demo.
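One possible reason the play command seems to work: signed 8-bit stereo and 16-bit mono both use 2 bytes per sample frame, so `-sb -c2` plays a 16-bit mono file at the correct speed even though each 16-bit sample gets split into two 8-bit "channels". Assuming goforward.16k is 16-bit mono raw audio, that would make the command sound right while describing the data wrongly. The byte arithmetic can be checked with a small sketch; `raw_duration_ms` is a hypothetical name.

```shell
#!/bin/sh
# raw_duration_ms FILE RATE BYTES_PER_SAMPLE CHANNELS (hypothetical helper)
# Prints how long a headerless file lasts if read with the given layout.
raw_duration_ms() {
    bytes=$(wc -c < "$1" | tr -d ' ')
    echo $(( bytes * 1000 / ($2 * $3 * $4) ))
}
```

`raw_duration_ms goforward.16k 16000 2 1` (16-bit mono) and `raw_duration_ms goforward.16k 16000 1 2` (8-bit stereo) print the same number, which is why playback speed alone cannot reveal the mismatch.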
As you can see, I've stripped the RAW files of their headers (I assume this isn't necessary for the SPH files, if they have any headers at all), and I used the -dither flag with wave2feat to prevent digital zeroes.
Now when I run kmeans_init with the following arguments:
kmeans_init \
-gthobj single \
-stride 1 \
-ntrial 5 \
-minratio 0.001 \
-ndensity 256 \
-meanfn $CODEMEANS_F \
-varfn $CODEVARS_F \
-reest no \
-segdmpdirs $DUMPDIR \
-segdmpfn $DUMP_F \
-ceplen 13 \
-feat c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/ \
-agc none \
-cmn current
I still get the following errors:
INFO: main.c(544): Initializing means using random k-means
INFO: main.c(547): Trial 0: 256 means
INFO: kmeans.c(159): km iter [0] 1.000000e+00 ...
WARNING: "kmeans.c", line 437: Empty cluster 92
WARNING: "kmeans.c", line 437: Empty cluster 107
...
INFO: main.c(580): -> Aborting k-means, bad initialization
INFO: kmeans.c(159): km iter [0] 1.000000e+00 ...
WARNING: "kmeans.c", line 437: Empty cluster 30
...
WARNING: "kmeans.c", line 437: Empty cluster 245
INFO: main.c(580): -> Aborting k-means, bad initialization
ERROR: "main.c", line 808: Too few observations for kmeans
ERROR: "main.c", line 1313: Unable to do k-means for state 0; skipping...
INFO: s3gau_io.c(218): Wrote /root/lmtrain/a.cmn [1x4x256 array]
INFO: s3gau_io.c(218): Wrote /root/lmtrain/a.cvr [1x4x256 array]
INFO: main.c(1398): No mixing weight file given; none written
INFO: main.c(1555): TOTALS: km 1.236x 2.204e+00 var .000x 0.000e+00 em 0.000x 0.000e+00 all 1.236x2.204e+00
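For reference, with `-ceplen 13` (so L = 13) the `-feat` specification in the kmeans_init arguments appears to define four feature streams, which would explain the `[1x4x256 array]` shape reported for the means and variances: presumably one codebook, 4 streams and 256 densities. The per-stream dimensions work out as:

```latex
\begin{aligned}
\texttt{c/1..L-1/} &: c_1,\dots,c_{12} && 12 \\
\texttt{d/1..L-1/} &: \Delta c_1,\dots,\Delta c_{12} && 12 \\
\texttt{c/0/d/0/dd/0/} &: c_0,\ \Delta c_0,\ \Delta\Delta c_0 && 3 \\
\texttt{dd/1..L-1/} &: \Delta\Delta c_1,\dots,\Delta\Delta c_{12} && 12 \\
\text{total} &: 12 + 12 + 3 + 12 && 39
\end{aligned}
```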
I hope someone can give me a hand with this problem, as I'm pretty much stuck training my model right now.
Did you check your file sizes after you used sox? I found that the header size varies with the filename.
You can use sox itself to strip the header by writing raw output directly:
sox file.wav -t raw -r 16000 -b -c 2 file.raw
This could save you some work and possibly avoid errors when stripping the headers.
It looks like you are truncating the output to signed bytes, yes?
Hi, I did several trials (byte swapping, dithering, sox conversions...), but I noticed that the only thing that affects the number of k-means aborts is the number of frames. In particular, I was trying with a small vocabulary of ~2000 frames and got nothing but k-means aborts. As I increased the number of frames, the aborts became fewer, and now with ~80000 frames I get only 2 (abort/empty cluster). Now I'm producing raw utterances with the adrec tool (in the sphinx2 distribution) and extracting features with make_feats (using -raw instead of -nist).
Is this the solution?
ciao
andrea
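To put andrea's numbers in perspective, assuming wave2feat's default rate of 100 frames per second (a 10 ms frame shift), N frames last N/100 seconds, and with `-ndensity 256` each density gets on average N/256 frames:

```latex
\begin{aligned}
N = 2000:&\quad T = 2000/100 = 20\ \text{s}, &\quad N/256 &\approx 7.8 \\
N = 80000:&\quad T = 80000/100 = 800\ \text{s} \approx 13.3\ \text{min}, &\quad N/256 &= 312.5
\end{aligned}
```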
This was exactly what I found (when I was working on this a while ago). There is some minimum amount of training data you need. Otherwise, you just get junk results.
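One way to check whether you have reached that minimum is to count the frames in the feature files. This sketch assumes the Sphinx-3 style .mfc layout: a 4-byte integer giving the number of float values, followed by that many 4-byte floats, 13 per frame for `-ceplen 13`. Since the header's byte order depends on the machine that wrote it, both orders are tried. `mfc_frames` and `total_frames` are hypothetical names, and the count only approximates what ends up in the segment dump that kmeans_init reads.

```shell
#!/bin/sh
# Hypothetical helpers. Assumed .mfc layout: a 4-byte integer giving the
# number of float values, then that many 4-byte floats, CEPLEN per frame.

# mfc_frames FILE [CEPLEN]: print the number of frames in one file
mfc_frames() {
    file=$1
    ceplen=${2:-13}
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -lt 4 ]; then
        echo "$file: too short" >&2
        return 1
    fi
    nfloat=$(( (size - 4) / 4 ))
    # The header may be big- or little-endian, so decode it both ways.
    set -- $(od -An -tu1 -N4 "$file")
    big=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    little=$(( ($4 << 24) | ($3 << 16) | ($2 << 8) | $1 ))
    if [ "$big" -ne "$nfloat" ] && [ "$little" -ne "$nfloat" ]; then
        echo "$file: header does not match file size" >&2
        return 1
    fi
    echo $(( nfloat / ceplen ))
}

# total_frames NDENSITY FILE...: total frames and average frames per density
total_frames() {
    ndensity=$1
    shift
    total=0
    for f in "$@"; do
        n=$(mfc_frames "$f") || return 1
        total=$(( total + n ))
    done
    echo "frames=$total per_density=$(( total / ndensity ))"
}
```

For example, `total_frames 256 "$FEATDIR"/*.mfc`, where `FEATDIR` is a hypothetical variable pointing at the wave2feat output.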