CMU Sphinx / Forums / Help: Format of waveform files for SphinxTrain

Peter Gruenbaum - 2009-06-02

I am trying to do a task that looks like it is common: using Sphinx for non-English languages. This involves using SphinxTrain, and there does not appear to be a complete tutorial on this. One of the things my company (www.sdkbridge.com) does is technical writing, and I am willing to put together a tutorial on this if I can figure out how it works.

The tinydoc.txt file says "Put your waveform files in wav/". But it doesn't explain what those waveform files are. I guessed that they might be .wav files (given the directory name), but when I tried running make_feats.pl, according to the directions, it gave the following error:

ERROR: "c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c", line 655: Cannot read C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/uno.wav.sph

So it looks like it appended ".sph" onto the end of my file name. Some other documentation (I can't seem to find it right now) suggested that I should not put the extension on the file, but it didn't explain how it would know what extension to use. Can anyone explain to me how this works? What is a sph file? Are there other formats I can specify, and if so, what are they and how do I specify them?

I am running on Windows XP using ActivePerl.

Thanks for your help,

Peter Gruenbaum
SDK Bridge

- Nickolay V. Shmyrev - 2009-06-02
  
  > One of the things my company (www.sdkbridge.com) does is technical writing, and I am willing to put together a tutorial on this if I can figure out how it works.
  
  That would be great, but just to make sure: are you aware of the an4 tutorial? It would be nice to integrate your work into it somehow. An additional section would probably be suitable.
  
  http://www.speech.cs.cmu.edu/sphinx/tutorial.html
  
  > But it doesn't explain what those waveform files are.
  
  They must be 16 kHz, 16-bit, mono MS WAV (mswav) files with a .wav extension.
  
  > gave the following error:
  
  You also need to change the configuration in etc/sphinx_train.cfg:
  
  $CFG_WAVFILE_EXTENSION = 'wav';
  $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
  
  > What is a sph file?
  
  It is a NIST SPHERE file, an analog of the MS WAV format that is often used by speech databases. It has a different header, so to listen to it you need to convert it, for example with sox:
  
  sox a.wav a.sph
  
  or back
  
  sox a.sph a.wav
  
  In general, though, using sph is neither recommended nor needed. Just change the configuration as described above.
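
On how the extension gets attached: judging only from the error message in the first post, the feature extractor appears to build each input path from the wav directory, the entry in the control file, and the configured extension. The Python sketch below illustrates that reading and nothing more. `build_wav_path` is a hypothetical helper, not a SphinxTrain function, and the `sph` default is inferred from the error text.

```python
import posixpath


def build_wav_path(wav_dir, file_id, extension):
    """Join directory, control-file entry, and extension (hypothetical helper)."""
    return posixpath.join(wav_dir, file_id + "." + extension)


# A control-file entry of "uno.wav" with extension "sph" reproduces the
# file name seen in the error message.
print(build_wav_path("span/wav", "uno.wav", "sph"))  # span/wav/uno.wav.sph

# An entry of "uno" with extension "wav" yields the file that actually exists.
print(build_wav_path("span/wav", "uno", "wav"))      # span/wav/uno.wav
```

Under this assumption, the control file lists utterances without an extension and `$CFG_WAVFILE_EXTENSION` supplies it.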
  
- Peter Gruenbaum - 2009-06-02
  
  Thanks, that is useful information. I tried making your changes, but now when I run make_feats.pl, it throws this error:
  
  Microsoft Visual C++ Debug Library
  
  Debug Error!
  
  Program: ...
  Module: ...phinx4\SphinxTrain.nightly\SphinxTrain\span\bin\wave2feat.exe
  File: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c
  Line: 724
  
  Run-Time Check Failure #3 - The variable 'hdr_buf' is being used without being initialized.
  
  Any idea what that could be?
  
  Thanks,
  Peter
  
- Peter Gruenbaum - 2009-06-02
  
  Here is the information sent to the console, if that's helpful. It's occurred to me that perhaps the wav files were not in the right format. I generated them with Audacity, and it claims that they are 16 bit PCM and they are mono, although there is no information about kHz.
  
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_interface.c(100): You are using the internal mechanism to generate the seed.
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(752): Current FE Parameters:
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(753): Sampling Rate: 16000.000000
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(754): Frame Size: 410
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(755): Frame Shift: 160
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(756): FFT Size: 512
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(757): Lower Frequency: 133.333
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(758): Upper Frequency: 6855.5
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(759): Number of filters: 40
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(760): Number of Overflow Samps: 0
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(761): Start Utt Status: 0
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(763): Will add dither to audio
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(764): Dither seeded with -1
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(771): Will not use double bandwidth in mel filter
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(139): C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/uno.wav
  LENGTH: zu
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(786): Reading MS Wav file C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/uno.wav:
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(787): 16 bit PCM data, 1 channels 24576 samples
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(788): Sampled at 44100
  INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(139): C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/dos.wav
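
The log above answers the sampling-rate question: the front end is configured for "Sampling Rate: 16000.000000", while the file is reported as "Sampled at 44100". A rough way to catch such mismatches before running make_feats.pl is to read the WAV header with Python's standard `wave` module. This is only a sketch: `wav_problems` and `write_silence` are hypothetical helper names, and the expected values are the 16 kHz, 16-bit, mono requirement stated earlier in the thread.

```python
import os
import tempfile
import wave


def wav_problems(path, rate=16000, sample_bytes=2, channels=1):
    """Return a list of mismatches between a WAV header and the expected format."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append("sample rate %d Hz, expected %d Hz"
                            % (w.getframerate(), rate))
        if w.getsampwidth() != sample_bytes:
            problems.append("sample width %d bytes, expected %d"
                            % (w.getsampwidth(), sample_bytes))
        if w.getnchannels() != channels:
            problems.append("%d channels, expected %d"
                            % (w.getnchannels(), channels))
    return problems


def write_silence(path, rate, n_frames=160):
    """Write a short 16-bit mono file of silence, used only for the demo."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)


# Demo on throwaway files rather than real recordings.
demo_dir = tempfile.mkdtemp()
good = os.path.join(demo_dir, "good.wav")
bad = os.path.join(demo_dir, "bad.wav")
write_silence(good, 16000)
write_silence(bad, 44100)
print(wav_problems(good))  # []
print(wav_problems(bad))   # ['sample rate 44100 Hz, expected 16000 Hz']
```

A file that fails the rate check would need to be re-exported or resampled to 16 kHz before training.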
  
  - Nickolay V. Shmyrev - 2009-06-02
    
    Yes, it's a bug. I've just fixed it in trunk by applying the following patch:
    
    ===================================================================
    --- wave2feat.c (revision 9127)
    +++ wave2feat.c (working copy)
    @@ -718,10 +718,9 @@
     }
     else if (P->input_format == MSWAV){
     /* Read the header */
    - MSWAV_hdr *hdr_buf;
    + MSWAV_hdr *hdr_buf = NULL;
     /* MC: read till just before datatag */
    - const int hdr_len_to_read = ((char *) (&hdr_buf->datatag))
    - - (char *) hdr_buf;
    + const int hdr_len_to_read = offsetof (MSWAV_hdr, datatag);
     if ((hdr_buf =
     (MSWAV_hdr *) calloc(1, sizeof(MSWAV_hdr))) == NULL) {
     E_ERROR("Cannot allocate for input file header\n");
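
The patch replaces pointer arithmetic on a not-yet-allocated `hdr_buf` with `offsetof`, which computes the same byte count from the type alone. The idea can be illustrated in Python with `ctypes`, where each structure field exposes its offset. This is a sketch under stated assumptions: the field list below is the canonical 44-byte RIFF/WAVE header, and every field name except `datatag` is made up for illustration rather than copied from wave2feat.c.

```python
import ctypes


class MSWAVHeader(ctypes.Structure):
    """Canonical RIFF/WAVE header layout (assumed, not taken from wave2feat.c)."""
    _fields_ = [
        ("rifftag", ctypes.c_char * 4),      # "RIFF"
        ("file_size", ctypes.c_int32),
        ("wavetag", ctypes.c_char * 4),      # "WAVE"
        ("fmttag", ctypes.c_char * 4),       # "fmt "
        ("fmt_size", ctypes.c_int32),
        ("format_tag", ctypes.c_int16),
        ("channels", ctypes.c_int16),
        ("sample_rate", ctypes.c_int32),
        ("bytes_per_sec", ctypes.c_int32),
        ("block_align", ctypes.c_int16),
        ("bits_per_sample", ctypes.c_int16),
        ("datatag", ctypes.c_char * 4),      # "data"
        ("data_length", ctypes.c_int32),
    ]


# Equivalent of offsetof(MSWAV_hdr, datatag): no instance is needed,
# so nothing uninitialized is ever read.
hdr_len_to_read = MSWAVHeader.datatag.offset
print(hdr_len_to_read)              # 36
print(ctypes.sizeof(MSWAVHeader))   # 44
```

The point of the design is that the offset depends only on the structure definition, so it can be computed before any header buffer exists.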
    
- Peter Gruenbaum - 2009-06-04
  
  Your bug fix worked. Thanks!
  
  So here's where I get stuck. I have created the various data files, and verify_all.pl comes through okay, but I can't find instructions that explain how to create something that I can then use in a Sphinx4 application. Which scripts do I need to run? (Presumably those numbered 01 through 07, but it would be nice to be sure.) Which files do I need to then have the Sphinx4 configuration file point to and how? Any help appreciated.
  
  Peter
  
  - Nickolay V. Shmyrev - 2009-06-04
    
    > Which scripts do I need to run? (Presumably those numbered 01 through 07, but it would be nice to be sure.)
    
    No, you need to run make_feats.pl and RunAll.pl. Take a look at the tutorial I pointed you to in my first reply.
    
    > Which files do I need to then have the Sphinx4 configuration file point to and how? Any help appreciated.
    
    This is described in the docs:
    
    http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html
    
    - Peter Gruenbaum - 2009-06-08
      
      Finally getting a chance to get back to this project. Thanks for your help. It does seem like the information needed is distributed in three places at the moment.
      
      I tried RunAll.pl, but got the fatal error below. Any idea what could be causing that?
      
      Thanks,
      Peter
      
      C:\sphinx4\SphinxTrain\span>perl scripts_pl\RunAll.pl
      MODULE: 00 verify training files
      O.S. is case insensitive ("A" == "a").
      Phones will be treated as case insensitive.
      Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
      Found 14 words using 11 phones
      Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
      Phase 3: CTL - Check general format; utterance length (must be positive); files exist
      Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
      Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
      Total Hours Training: 0.0054965811965812
      This is a small amount of data, no comment at this time
      Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
      Words in dictionary: 11
      Words in filler dictionary: 3
      Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
      MODULE: 01 Train LDA transformation
      Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
      MODULE: 02 Train MLLT transformation
      Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
      MODULE: 05 Vector Quantization
      Skipped for continuous models
      MODULE: 10 Training Context Independent models for forced alignment and VTLN
      Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
      Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
      MODULE: 11 Force-aligning transcripts
      Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
      MODULE: 12 Force-aligning data for VTLN
      Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
      MODULE: 20 Training Context Independent models
      Phase 1: Cleaning up directories:
      accumulator...logs...qmanager...models...
      Phase 2: Flat initialize
      FATAL_ERROR: "c:\sphinx4\sphinxtrain\src\libs\libio\corpus.c", line 262: input string too long. Truncated.
      Something failed: (C:/sphinx4/SphinxTrain/span/scripts_pl/20.ci_hmm/slave_convg.pl)
      
      - eliasmajic - 2009-06-09
        
        I think I have run into this possible bug before as well. Make sure there's an empty line at the end of your .fileids and .transcription files in etc/.
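
The suggestion above can be automated. The Python sketch below appends a final newline to a file that lacks one. Reading "an empty line at the end" as "the file ends with a newline character" is an assumption, and `ensure_trailing_newline` and the demo file name are hypothetical.

```python
import os
import tempfile


def ensure_trailing_newline(path):
    """Append a newline if the file is non-empty and does not end with one.

    Returns True when the file was modified.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data or data.endswith(b"\n"):
        return False
    with open(path, "ab") as f:
        f.write(b"\n")
    return True


# Demo on a throwaway control file whose last entry has no line ending.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "span.fileids")
with open(demo_path, "wb") as f:
    f.write(b"uno\ndos")

print(ensure_trailing_newline(demo_path))  # True  (newline appended)
print(ensure_trailing_newline(demo_path))  # False (already terminated)
```

Running it on both the .fileids and the .transcription file before RunAll.pl would rule out this cause of the "input string too long" error.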
        
        
      - Peter Gruenbaum - 2009-06-09
        
        Thanks, that got the process going a little farther. Now it pops up a message that says:
        
        Microsoft Visual C++ Debug Library
        
        Debug Error!
        
        Program: C:\sphinx4\SphinxTrain\span\bin\bw.exe
        
        This application has requested the Runtime to terminate it in an unusual way.
        
        When I click Ignore, it terminates as follows. Any ideas about this one?
        
        Thanks,
        Peter
        
        C:\sphinx4\SphinxTrain\span>perl scripts_pl\RunAll.pl
        MODULE: 00 verify training files
        O.S. is case insensitive ("A" == "a").
        Phones will be treated as case insensitive.
        Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
        Found 14 words using 11 phones
        Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
        Phase 3: CTL - Check general format; utterance length (must be positive); files exist
        Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
        Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
        Total Hours Training: 0.0054965811965812
        This is a small amount of data, no comment at this time
        Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
        Words in dictionary: 11
        Words in filler dictionary: 3
        Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
        MODULE: 01 Train LDA transformation
        Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
        MODULE: 02 Train MLLT transformation
        Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
        MODULE: 05 Vector Quantization
        Skipped for continuous models
        MODULE: 10 Training Context Independent models for forced alignment and VTLN
        Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
        Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
        MODULE: 11 Force-aligning transcripts
        Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
        MODULE: 12 Force-aligning data for VTLN
        Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
        MODULE: 20 Training Context Independent models
        Phase 1: Cleaning up directories:
        accumulator...logs...qmanager...models...
        Phase 2: Flat initialize
        Phase 3: Forward-Backward
        Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
        0%
        Only 0 parts of 1 of Baum Welch were successfully completed
        Parts 1 failed to run!
        Training failed in iteration 1
        Something failed: (C:/sphinx4/SphinxTrain/span/scripts_pl/20.ci_hmm/slave_convg.pl)
        