Format of waveform files for SphinxTrain

Help
2009-06-02
2012-09-22
  • Peter Gruenbaum

    Peter Gruenbaum - 2009-06-02

    I am trying to do something that looks like a common task: using Sphinx for non-English languages. This involves SphinxTrain, and there does not appear to be a complete tutorial on it. One of the things my company (www.sdkbridge.com) does is technical writing, and I am willing to put together a tutorial on this if I can figure out how it works.

    The tinydoc.txt file says "Put your waveform files in wav/". But it doesn't explain what those waveform files are. I guessed that it might be a .wav file (given the directory name), but when I ran make_feats.pl according to the directions, it gave the following error:

    ERROR: "c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2f
    eat.c", line 655: Cannot read C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wa
    v/uno.wav.sph

    So it looks like it appended ".sph" onto the end of my file name. Some other documentation (I can't seem to find it right now) suggested that I should not put the extension on the file, but it didn't explain how it would know what extension to use. Can anyone explain to me how this works? What is a sph file? Are there other formats I can specify, and if so, what are they and how do I specify them?

    I am running on Windows XP using ActivePerl.

    Thanks for your help,

    Peter Gruenbaum
    SDK Bridge

     
    • Nickolay V. Shmyrev

      > One of the things my company (www.sdkbridge.com) does is technical writing, and I am willing to put together a tutorial on this if I can figure out how it works.

      That would be great, but just to make sure: are you aware of the an4 tutorial? It would be nice to integrate your work into it somehow. An additional section would probably be suitable.

      http://www.speech.cs.cmu.edu/sphinx/tutorial.html

      > But it doesn't explain what those waveform files are.

      They must be 16 kHz, 16-bit, mono MS WAV (mswav) files with the .wav extension.
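      As a quick way to confirm that a file meets these requirements before running make_feats.pl, here is a minimal sketch using Python's standard `wave` module. The function name `check_wav` and the command-line usage are illustrative assumptions, not part of SphinxTrain.

```python
import sys
import wave


def check_wav(path, rate=16000, sample_width=2, channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    Defaults correspond to 16 kHz, 16-bit (2 bytes per sample), mono.
    An empty list means the header matches.
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected {rate}")
        if w.getsampwidth() != sample_width:
            problems.append(
                f"sample width is {8 * w.getsampwidth()} bits, expected {8 * sample_width}"
            )
        if w.getnchannels() != channels:
            problems.append(f"{w.getnchannels()} channel(s), expected {channels}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_wav.py wav/uno.wav wav/dos.wav
    for wav_path in sys.argv[1:]:
        issues = check_wav(wav_path)
        print(wav_path, "OK" if not issues else "; ".join(issues))
```

      This only inspects the header; it does not resample anything. A file that reports a different sample rate would still need to be converted with an audio tool.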

      > gave the following error:

      You also need to change the configuration in etc/sphinx_train.cfg:

      $CFG_WAVFILE_EXTENSION = 'wav';
      $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw

      > What is a sph file?

      It is a NIST SPHERE file, a counterpart to MS WAV that is often used by speech databases. It has a different header, so to listen to it you need to convert it, for example with sox:

      sox a.wav a.sph

      or back

      sox a.sph a.wav

      But in general sph is neither recommended nor needed; just change the configuration as described above.

       
    • Peter Gruenbaum

      Peter Gruenbaum - 2009-06-02

      Thanks, that is useful information. I tried making your changes, but now when I run make_feats.pl, it throws this error:


      Microsoft Visual C++ Debug Library

      Debug Error!

      Program: ...
      Module: ...phinx4\SphinxTrain.nightly\SphinxTrain\span\bin\wave2feat.exe
      File: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c
      Line: 724

      Run-Time Check Failure #3 - The variable 'hdr_buf' is being used without being initialized.

      Any idea what that could be?

      Thanks,
      Peter

       
    • Peter Gruenbaum

      Peter Gruenbaum - 2009-06-02

      Here is the information sent to the console, if that's helpful. It's occurred to me that perhaps the wav files were not in the right format. I generated them with Audacity, and it claims that they are 16 bit PCM and they are mono, although there is no information about kHz.

      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_inter
      face.c(100): You are using the internal mechanism to generate the seed.
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(752): Current FE Parameters:
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(753): Sampling Rate: 16000.000000
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(754): Frame Size: 410
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(755): Frame Shift: 160
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(756): FFT Size: 512
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(757): Lower Frequency: 133.333
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(758): Upper Frequency: 6855.5
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(759): Number of filters: 40
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(760): Number of Overflow Samps: 0
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(761): Start Utt Status: 0
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(763): Will add dither to audio
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(764): Dither seeded with -1
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigpr
      oc.c(771): Will not use double bandwidth in mel filter
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2fea
      t.c(139): C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/uno.wav
      LENGTH: zu
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2fea
      t.c(786): Reading MS Wav file C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wa
      v/uno.wav:
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2fea
      t.c(787): 16 bit PCM data, 1 channels 24576 samples
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2fea
      t.c(788): Sampled at 44100
      INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2fea
      t.c(139): C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/dos.wav

       
      • Nickolay V. Shmyrev

        Yes, it's a bug. I've just fixed it in trunk by applying the following patch:

        ===================================================================
        --- wave2feat.c (revision 9127)
        +++ wave2feat.c (working copy)
        @@ -718,10 +718,9 @@
         }
         else if (P->input_format == MSWAV){
         /* Read the header */
        - MSWAV_hdr *hdr_buf;
        + MSWAV_hdr *hdr_buf = NULL;
         /* MC: read till just before datatag */
        - const int hdr_len_to_read = ((char *) (&hdr_buf->datatag))
        - - (char *) hdr_buf;
        + const int hdr_len_to_read = offsetof (MSWAV_hdr, datatag);
         if ((hdr_buf =
         (MSWAV_hdr *) calloc(1, sizeof(MSWAV_hdr))) == NULL) {
         E_ERROR("Cannot allocate for input file header\n");
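        To illustrate what the patched line computes: offsetof yields the number of header bytes that precede the datatag field, without touching the pointer at all, which is why the uninitialized-variable check no longer fires. The sketch below reproduces that arithmetic in Python with the `struct` module. The field layout is an assumption based on the canonical 44-byte PCM WAV header, not copied from the MSWAV_hdr definition in wave2feat.c.

```python
import struct

# Assumed layout of the fields before the "data" tag in a canonical PCM WAV
# header ("<" means little-endian with no padding between fields):
#   4s  "RIFF" tag
#   I   total length
#   8s  "WAVEfmt " tag
#   I   remaining length of the fmt chunk
#   H   data format (1 = PCM)
#   H   number of channels
#   I   sampling frequency
#   I   average bytes per second
#   H   block align
#   H   bits per sample
FIELDS_BEFORE_DATATAG = "<4sI8sIHHIIHH"

# Equivalent of offsetof(MSWAV_hdr, datatag) under the assumed layout.
hdr_len_to_read = struct.calcsize(FIELDS_BEFORE_DATATAG)

# Adding the 4-byte "data" tag and the 4-byte data length gives the full header.
full_header_len = struct.calcsize(FIELDS_BEFORE_DATATAG + "4sI")

print(hdr_len_to_read, full_header_len)  # 36 44
```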

         
    • Peter Gruenbaum

      Peter Gruenbaum - 2009-06-04

      Your bug fix worked. Thanks!

      So here's where I get stuck. I have created the various data files, and verify_all.pl comes through okay, but I can't find instructions that explain how to create something that I can then use in a Sphinx4 application. Which scripts do I need to run? (Presumably those numbered 01 through 07, but it would be nice to be sure.) Which files do I need to then have the Sphinx4 configuration file point to and how? Any help appreciated.

      Peter

       
      • Nickolay V. Shmyrev

        > Which scripts do I need to run? (Presumably those numbered 01 through 07, but it would be nice to be sure.)

        No, you need to run make_feats.pl and RunAll.pl. Take a look at the tutorial I linked in my first reply.

        > Which files do I need to then have the Sphinx4 configuration file point to and how? Any help appreciated.

        This is described in the docs:

        http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html

         
        • Peter Gruenbaum

          Peter Gruenbaum - 2009-06-08

          Finally getting a chance to get back to this project. Thanks for your help. It does seem like the information needed is distributed in three places at the moment.

          I tried RunAll.pl, but got the fatal error below. Any idea what could be causing that?

          Thanks,
          Peter

          C:\sphinx4\SphinxTrain\span>perl scripts_pl\RunAll.pl
          MODULE: 00 verify training files
          O.S. is case insensitive ("A" == "a").
          Phones will be treated as case insensitive.
          Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
          phonelist file.
          Found 14 words using 11 phones
          Phase 2: DICT - Checking to make sure there are not duplicate entries in the
          dictionary
          Phase 3: CTL - Check general format; utterance length (must be positive); fi
          les exist
          Phase 4: CTL - Checking number of lines in the transcript should match lines
          in control file
          Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
          reasonable.
          Total Hours Training: 0.0054965811965812
          This is a small amount of data, no comment at this time
          Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in t
          he dictionary
          Words in dictionary: 11
          Words in filler dictionary: 3
          Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
          the phonelist, and all phones in the phonelist appear at least once
          MODULE: 01 Train LDA transformation
          Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
          MODULE: 02 Train MLLT transformation
          Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
          MODULE: 05 Vector Quantization
          Skipped for continuous models
          MODULE: 10 Training Context Independent models for forced alignment and VTLN
          Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
          Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
          MODULE: 11 Force-aligning transcripts
          Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
          MODULE: 12 Force-aligning data for VTLN
          Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
          MODULE: 20 Training Context Independent models
          Phase 1: Cleaning up directories:
          accumulator...logs...qmanager...models...
          Phase 2: Flat initialize
          FATAL_ERROR: "c:\sphinx4\sphinxtrain\src\libs\libio\corpus.c", line 262: input s
          tring too long. Truncated.
          Something failed: (C:/sphinx4/SphinxTrain/span/scripts_pl/20.ci_hmm/slave_convg.
          pl)

           
          • eliasmajic

            eliasmajic - 2009-06-09

            I think I've run into this possible bug before as well. Make sure there's an empty line at the end of your .fileids and .transcription files in etc/.
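            A minimal sketch of that check in Python, assuming the fix amounts to making sure the last line is terminated by a newline. The function name and the file names in the usage comment are hypothetical; substitute the names of your own training files.

```python
def ensure_trailing_newline(path):
    """Append a line ending if the file is non-empty and lacks a final one.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "rb+") as f:
        data = f.read()
        if not data or data.endswith(b"\n"):
            return False
        # Match the line-ending style already used in the file.
        newline = b"\r\n" if b"\r\n" in data else b"\n"
        f.seek(0, 2)  # move to the end of the file before writing
        f.write(newline)
        return True


# Hypothetical usage:
# for name in ("etc/span_train.fileids", "etc/span_train.transcription"):
#     print(name, "fixed" if ensure_trailing_newline(name) else "already fine")
```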

             
            • Peter Gruenbaum

              Peter Gruenbaum - 2009-06-09

              Thanks, that got the process going a little farther. Now it pops up a message that says:

              ---------------------------
              Microsoft Visual C++ Debug Library
              ---------------------------
              Debug Error!
              
              Program: C:\sphinx4\SphinxTrain\span\bin\bw.exe
              
              This application has requested the Runtime to terminate it in an 
              unusual way.
              

              When I click Ignore, it terminates as follows. Any ideas about this one?

              Thanks,
              Peter

              C:\sphinx4\SphinxTrain\span>perl scripts_pl\RunAll.pl
              MODULE: 00 verify training files
              O.S. is case insensitive ("A" == "a").
              Phones will be treated as case insensitive.
              Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
              phonelist file.
              Found 14 words using 11 phones
              Phase 2: DICT - Checking to make sure there are not duplicate entries in the
              dictionary
              Phase 3: CTL - Check general format; utterance length (must be positive); fi
              les exist
              Phase 4: CTL - Checking number of lines in the transcript should match lines
              in control file
              Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
              reasonable.
              Total Hours Training: 0.0054965811965812
              This is a small amount of data, no comment at this time
              Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in t
              he dictionary
              Words in dictionary: 11
              Words in filler dictionary: 3
              Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
              the phonelist, and all phones in the phonelist appear at least once
              MODULE: 01 Train LDA transformation
              Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
              MODULE: 02 Train MLLT transformation
              Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
              MODULE: 05 Vector Quantization
              Skipped for continuous models
              MODULE: 10 Training Context Independent models for forced alignment and VTLN
              Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
              Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
              MODULE: 11 Force-aligning transcripts
              Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
              MODULE: 12 Force-aligning data for VTLN
              Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
              MODULE: 20 Training Context Independent models
              Phase 1: Cleaning up directories:
              accumulator...logs...qmanager...models...
              Phase 2: Flat initialize
              Phase 3: Forward-Backward
              Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
              0%
              Only 0 parts of 1 of Baum Welch were successfully completed
              Parts 1 failed to run!
              Training failed in iteration 1
              Something failed: (C:/sphinx4/SphinxTrain/span/scripts_pl/20.ci_hmm/slave_convg.
              pl)

               
