Problems with kmeans_init

Help
2003-04-04
2012-09-22
  • Jasper van Veghel

    I have created several WAV files as training data, which I then convert to RAW for use with wave2feat (I have also tried SPH (-nist)), like so:

    for i in `ls $WAVEDIR`
    do
            sox $WAVEDIR/$i -t au -r 16000 -b -c 2 $NISTDIR/`basename $i .wav`.raw 1>/dev/null
    done

    And then strip them of their headers (yes, SoX gives them tiny headers, containing the filename and some other unreadable data) like so:

    for file in `ls $NISTDIR`
    do
            for i in `cat $NISTDIR/$file`
            do
                    if [ $next ]
                    then
                            echo "$i" >> $NISTDIR/$file.tmp
                    else
                            length="29"
                            let length+="`echo $file | wc -c`"
                            echo "$i" | cut -b $length-"`echo "$i" | wc -c`" > $NISTDIR/$file.tmp
                            next="x"
                    fi
            done
            mv $NISTDIR/$file.tmp $NISTDIR/$file
            next=""
    done

    Then I convert them to CEP like so:

    wave2feat -nfft 256 -mach_endian $ENDIANNESS -dither -c $CONTROL_F -raw -di / -ei raw -do / -eo mfc

    And I've also tried this using SPH, or -nist for wave2feat, using something like:

    sox <file.wav> <file.sph>

    And

    wave2feat -nfft 256 -mach_endian $ENDIANNESS -dither -c $CONTROL_F -nist -di / -ei sph -do / -eo mfc

    The RAW files can be played using play like so:

    play <file.raw> -r16000 -v50 -sb -traw -c2

    And that same command also works when I try to play 'goforward.16k' provided with the Sphinx II turtle demo

    As you can see I've stripped the RAW files of their headers (I assume this won't be necessary for the SPH files, if they have any ..), and used the -dither flag with wave2feat to prevent digital zeroes.

    Now when I run kmeans_init with the following arguments:

    "-gthobj single \
     -stride 1 \
     -ntrial 5 \
     -minratio 0.001 \
     -ndensity 256 \
     -meanfn $CODEMEANS_F \
     -varfn $CODEVARS_F \
     -reest no \
     -segdmpdirs $DUMPDIR \
     -segdmpfn $DUMP_F \
     -ceplen 13 \
     -feat c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/ \
     -agc none \
     -cmn current"

    I still get the following errors:

    INFO: main.c(544): Initializing means using random k-means
    INFO: main.c(547): Trial 0: 256 means
    INFO: kmeans.c(159): km iter [0] 1.000000e+00 ...
    WARNING: "kmeans.c", line 437: Empty cluster 92
    WARNING: "kmeans.c", line 437: Empty cluster 107
    ...
    INFO: main.c(580):      -> Aborting k-means, bad initialization
    INFO: kmeans.c(159): km iter [0] 1.000000e+00 ...
    WARNING: "kmeans.c", line 437: Empty cluster 30
    ...
    WARNING: "kmeans.c", line 437: Empty cluster 245
    INFO: main.c(580):      -> Aborting k-means, bad initialization
    ERROR: "main.c", line 808: Too few observations for kmeans
    ERROR: "main.c", line 1313: Unable to do k-means for state 0; skipping...
    INFO: s3gau_io.c(218): Wrote /root/lmtrain/a.cmn [1x4x256 array]
    INFO: s3gau_io.c(218): Wrote /root/lmtrain/a.cvr [1x4x256 array]
    INFO: main.c(1398): No mixing weight file given; none written
    INFO: main.c(1555): TOTALS: km 1.236x 2.204e+00 var .000x 0.000e+00 em 0.000x 0.000e+00 all 1.236x 2.204e+00

    I hope someone can give me a hand with this problem, as I'm pretty much stuck right now training my model.

     
    • Dan Carreira

      Dan Carreira - 2003-04-08

      Did you check your file sizes after you used sox? I found that the header size varies with the filename.

      You can use sox to strip the header:

      sox file.wav -t raw -r 16000 -b -c 2 file.raw

      This could save you some work and possibly avoid errors when stripping the headers.

      It looks like you are truncating the output to signed bytes, yes?
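
If the AU files already exist and re-running sox is not convenient, the header length does not have to be guessed from the filename: an AU header stores the offset of the audio data as a big-endian 32-bit integer in bytes 4-7. The following is only a minimal sketch of reading that field and cutting the header off; `au_data_offset` and `strip_au_header` are hypothetical helper names, not part of SoX or Sphinx, and the sample format of the data itself is left unchanged.

```shell
# Print the audio-data offset stored in an AU (.snd) header.
# Bytes 4-7 of the header hold that offset as a big-endian 32-bit integer.
au_data_offset() {
    au_file=$1
    # od prints the four bytes as unsigned decimals, e.g. "0 0 0 28"
    set -- $(od -An -tu1 -j4 -N4 "$au_file")
    echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}

# Copy everything after the header into a headerless file.
# (hypothetical helper; $1 = input .au file, $2 = output file)
strip_au_header() {
    offset=$(au_data_offset "$1")
    # tail -c +N starts output at byte N (1-based), hence offset + 1
    tail -c "+$((offset + 1))" "$1" > "$2"
}
```

Usage would look like `strip_au_header file.au file.raw`. Converting straight to raw with sox, as suggested above, avoids the step entirely.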

       
    • Andrea Michelotti

      Hi, I did several trials (byte swap, dithering, sox conversions...), but I noticed that the only thing that affects the number of k-means aborts is the number of frames. In particular, I was trying with a small vocabulary (~2000 frames) and got nothing but k-means aborts. As I increased the number of frames the aborts became fewer, and now with ~80000 frames I get only 2 (abort/empty cluster). I am now producing raw utterances with the adrec tool (in the sphinx2 distribution) and make_feats (with -raw instead of -nist).
      Is this the solution?
      ciao
      andrea

       
      • robert b

        robert b - 2003-04-15

        This was exactly what I found (when I was working on this a while ago).  There is some minimum amount of training data you need.  Otherwise, you just get junk results.
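
The figures reported above make the point concrete: with -ndensity 256, about 2000 frames leave roughly 7 frames per density on average, while about 80000 frames leave roughly 312. A rough way to check a training set before running kmeans_init is to estimate the frame count from the size of the headerless audio files. The sketch below assumes 16-bit mono data at 16 kHz and a 10 ms frame shift (160 samples per frame); both are assumptions to adjust to the actual front-end settings, and the function names are hypothetical.

```shell
# Estimate the number of feature frames in headerless audio files.
# Assumptions: 16-bit samples (2 bytes), one channel, 16 kHz,
# one frame per 160 samples (10 ms frame shift).
estimate_frames() {
    total_bytes=0
    for raw in "$@"; do
        size=$(wc -c < "$raw")
        total_bytes=$((total_bytes + size))
    done
    echo $((total_bytes / 2 / 160))
}

# Average number of frames available per density (integer division).
# $1 = frame count, $2 = value passed to -ndensity
frames_per_density() {
    echo $(( $1 / $2 ))
}
```

For example, `frames_per_density "$(estimate_frames $NISTDIR/*.raw)" 256` gives a quick idea of whether empty clusters are to be expected.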

         
    • Anonymous

      Anonymous - 2003-04-17

      I can't help you directly with your kmeans_init problem, but I believe roberteast's response is on target.  You can't do this with "several files of data", but rather you need much more data than that.

      But let me comment on your audio file misconceptions.
        1. The first sox example you showed does not convert the WAV files to raw data, but to AU-format data; that's why you had to strip off "tiny headers", because that's what AU files have.  Dcarreira's comments are accurate, because the file headers are variable size and depend on the length of the file name.  His sox command is what you need.
        2. Your alternative plan to convert the .wav files to .sph, which can be used directly by wave2feat, is also good.
        3. Your examples show "-c 2" indicating two-channel audio data files.  Note that wave2feat assumes single-channel audio data files, so this is a problem for you.  See the thread in the Open Discussion forum entitled "Sampling rates and sphinx's interpretation".
        4. Your examples also indicate "-b" meaning byte data (8 bits per sample).  As mentioned in the above-referenced thread, that's not enough precision for good quality speech, and 16-bit sampling is preferable.

       
