I am trying to do something that looks like it should be common: use Sphinx for non-English languages. This involves using SphinxTrain, and there does not appear to be a complete tutorial on it. One of the things my company (www.sdkbridge.com) does is technical writing, and I am willing to put together a tutorial on this if I can figure out how it works.
The tinydoc.txt file says "Put your waveform files in wav/", but it doesn't explain what those waveform files are. I guessed that they might be .wav files (given the directory name), but when I ran make_feats.pl according to the directions, it gave the following error:
ERROR: "c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c", line 655: Cannot read C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/uno.wav.sph
So it looks like it appended ".sph" onto the end of my file name. Some other documentation (I can't seem to find it right now) suggested that I should not put the extension on the file, but it didn't explain how it would know what extension to use. Can anyone explain to me how this works? What is a sph file? Are there other formats I can specify, and if so, what are they and how do I specify them?
I am running on Windows XP using ActivePerl.
Thanks for your help,
Peter Gruenbaum
SDK Bridge
> One of the things my company (www.sdkbridge.com) does is technical writing, and I am willing to put together a tutorial on this if I can figure out how it works.
That would be great, but just to make sure: are you aware of the an4 tutorial? It would be nice to integrate your work into it somehow. An additional section would probably be suitable.
http://www.speech.cs.cmu.edu/sphinx/tutorial.html
> But it doesn't explain what those waveform files are.
They must be 16 kHz, 16-bit, mono MS WAV (mswav) files with a .wav extension.
> gave the following error:
You also need to change the configuration in etc/sphinx_train.cfg:
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
> What is a sph file?
It is the NIST SPHERE format, an analog of mswav that is often used by speech databases. It has a different header, so to listen to it you need to convert it, for example with sox:
sox a.wav a.sph
or back
sox a.sph a.wav
But in general it's neither recommended nor necessary to use sph; just change the configuration as described above.
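To check whether a recording meets the format requirement above before running make_feats.pl, something like the following might help. It is only a sketch using Python's standard `wave` module; the function name `check_wav` and its defaults are my own choices for illustration and are not part of SphinxTrain.

```python
import wave


def check_wav(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return a list of format problems; an empty list means the file matches.

    The defaults encode the requirement stated in this thread:
    16 kHz, 16-bit (2 bytes per sample), mono.
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(
                "sample rate is %d Hz, expected %d Hz" % (w.getframerate(), rate)
            )
        if w.getsampwidth() != sample_width_bytes:
            problems.append(
                "sample width is %d bytes, expected %d"
                % (w.getsampwidth(), sample_width_bytes)
            )
        if w.getnchannels() != channels:
            problems.append(
                "file has %d channels, expected %d" % (w.getnchannels(), channels)
            )
    return problems
```

Calling `check_wav("wav/uno.wav")` on each training file and printing any non-empty result would flag recordings that still need to be resampled or converted.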
Thanks, that is useful information. I tried making your changes, but now when I run make_feats.pl, it throws this error:
Microsoft Visual C++ Debug Library
Debug Error!
Program: ...
Module: ...phinx4\SphinxTrain.nightly\SphinxTrain\span\bin\wave2feat.exe
File: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c
Line: 724
Run-Time Check Failure #3 - The variable 'hdr_buf' is being used without being initialized.
Any idea what that could be?
Thanks,
Peter
Here is the information sent to the console, if that's helpful. It's occurred to me that perhaps the wav files were not in the right format. I generated them with Audacity, and it claims that they are 16 bit PCM and they are mono, although there is no information about kHz.
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_interface.c(100): You are using the internal mechanism to generate the seed.
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(752): Current FE Parameters:
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(753): Sampling Rate: 16000.000000
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(754): Frame Size: 410
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(755): Frame Shift: 160
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(756): FFT Size: 512
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(757): Lower Frequency: 133.333
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(758): Upper Frequency: 6855.5
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(759): Number of filters: 40
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(760): Number of Overflow Samps: 0
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(761): Start Utt Status: 0
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(763): Will add dither to audio
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(764): Dither seeded with -1
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(771): Will not use double bandwidth in mel filter
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(139): C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/uno.wav
LENGTH: zu
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(786): Reading MS Wav file C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/uno.wav:
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(787): 16 bit PCM data, 1 channels 24576 samples
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(788): Sampled at 44100
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(139): C:/sphinx4/SphinxTrain.nightly/SphinxTrain/span/wav/dos.wav
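In the console output above, the front end reports a Sampling Rate of 16000 while the file itself is reported as Sampled at 44100. A small script can pull both numbers out of a captured log and flag the difference. This is only a sketch: the function names are made up for illustration, and the regular expressions assume the exact wording of the two log lines shown in this thread.

```python
import re

# Two lines copied from the console output in this thread.
SAMPLE_LOG = r"""
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\fe_sigproc.c(753): Sampling Rate: 16000.000000
INFO: c:\sphinx4\sphinxtrain.nightly\sphinxtrain\src\programs\wave2feat\wave2feat.c(788): Sampled at 44100
"""


def configured_and_actual(log_text):
    """Return (configured_rate, [rates reported for each input file])."""
    configured = re.search(r"Sampling Rate:\s*([0-9.]+)", log_text)
    actual = re.findall(r"Sampled at\s+(\d+)", log_text)
    rate = float(configured.group(1)) if configured else None
    return rate, [int(a) for a in actual]


def mismatched_rates(log_text):
    """Return the file sample rates that differ from the configured rate."""
    configured, actual = configured_and_actual(log_text)
    if configured is None:
        return []
    return [a for a in actual if a != configured]


print(mismatched_rates(SAMPLE_LOG))  # prints [44100]
```

A non-empty result suggests the recordings should be resampled to the configured rate before feature extraction.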
Yes, it's a bug. I've just fixed it in trunk by applying the following patch:
===================================================================
--- wave2feat.c (revision 9127)
+++ wave2feat.c (working copy)
@@ -718,10 +718,9 @@
 }
 else if (P->input_format == MSWAV){
 /* Read the header */
- MSWAV_hdr *hdr_buf;
+ MSWAV_hdr *hdr_buf = NULL;
 /* MC: read till just before datatag */
- const int hdr_len_to_read = ((char *) (&hdr_buf->datatag))
- - (char *) hdr_buf;
+ const int hdr_len_to_read = offsetof (MSWAV_hdr, datatag);
 if ((hdr_buf =
 (MSWAV_hdr *) calloc(1, sizeof(MSWAV_hdr))) == NULL) {
 E_ERROR("Cannot allocate for input file header\n");
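For readers unfamiliar with offsetof: it yields the byte offset of a field from the type alone, so no pointer to an actual (possibly uninitialized) structure is needed. The sketch below shows the same idea with Python's `ctypes`. Only the field name `datatag` comes from the patch; every other field name, and the layout itself, is an assumption based on the canonical 44-byte PCM WAV header rather than the real `MSWAV_hdr` definition.

```python
import ctypes


class MSWavHeader(ctypes.Structure):
    # Hypothetical layout modelled on the canonical PCM WAV header.
    # Only "datatag" is taken from the patch; the rest is assumed.
    _fields_ = [
        ("rifftag", ctypes.c_char * 4),      # "RIFF"
        ("filesize", ctypes.c_int32),
        ("wavefmttag", ctypes.c_char * 8),   # "WAVEfmt "
        ("fmtlength", ctypes.c_int32),
        ("formattag", ctypes.c_int16),
        ("channels", ctypes.c_int16),
        ("samplerate", ctypes.c_int32),
        ("byterate", ctypes.c_int32),
        ("blockalign", ctypes.c_int16),
        ("bitspersample", ctypes.c_int16),
        ("datatag", ctypes.c_char * 4),      # "data"
        ("datalength", ctypes.c_int32),
    ]


# The offset is a property of the type, so no instance is required.
hdr_len_to_read = MSWavHeader.datatag.offset
print(hdr_len_to_read)             # prints 36
print(ctypes.sizeof(MSWavHeader))  # prints 44
```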
Your bug fix worked. Thanks!
So here's where I get stuck. I have created the various data files, and verify_all.pl comes through okay, but I can't find instructions that explain how to create something that I can then use in a Sphinx4 application. Which scripts do I need to run? (Presumably those numbered 01 through 07, but it would be nice to be sure.) Which files do I need to then have the Sphinx4 configuration file point to and how? Any help appreciated.
Peter
> Which scripts do I need to run? (Presumably those numbered 01 through 07, but it would be nice to be sure.)
No, you need to run make_feats.pl and RunAll.pl. Take a look at the tutorial I pointed you to earlier.
> Which files do I need to then have the Sphinx4 configuration file point to and how? Any help appreciated.
This is described in the docs:
http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html
Finally getting a chance to get back to this project. Thanks for your help. It does seem like the information needed is distributed in three places at the moment.
I tried RunAll.pl, but got the fatal error below. Any idea what could be causing that?
Thanks,
Peter
C:\sphinx4\SphinxTrain\span>perl scripts_pl\RunAll.pl
MODULE: 00 verify training files
O.S. is case insensitive ("A" == "a").
Phones will be treated as case insensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
Found 14 words using 11 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Total Hours Training: 0.0054965811965812
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 11
Words in filler dictionary: 3
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
MODULE: 01 Train LDA transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 02 Train MLLT transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 05 Vector Quantization
Skipped for continuous models
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 11 Force-aligning transcripts
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 20 Training Context Independent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Flat initialize
FATAL_ERROR: "c:\sphinx4\sphinxtrain\src\libs\libio\corpus.c", line 262: input string too long. Truncated.
Something failed: (C:/sphinx4/SphinxTrain/span/scripts_pl/20.ci_hmm/slave_convg.pl)
I think I encountered this possible bug before as well. Make sure there's an empty line at the end of your .fileids and .transcription files in etc/.
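If you want to automate that check, a helper along these lines could do it. It is a sketch only: the function name is invented for illustration, and it assumes that "an empty line at the end" means the file must end with a newline character, which is my reading of the advice above rather than something the SphinxTrain docs state.

```python
def ensure_trailing_newline(path):
    """Append a newline to the file if its last byte is not one.

    Returns True if the file was modified, False if it already
    ended with a newline.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.endswith(b"\n"):
        return False
    with open(path, "ab") as f:
        f.write(b"\n")
    return True
```

Running it over the .fileids and .transcription files in etc/ before RunAll.pl would leave correctly terminated files untouched and patch the others.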
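Thanks, that got the process going a little farther. Now it pops up a message that says:
Microsoft Visual C++ Debug Library
Debug Error!
Program: C:\sphinx4\SphinxTrain\span\bin\bw.exe
This application has requested the Runtime to terminate it in an unusual way.
When I click Ignore, it terminates as follows. Any ideas about this one?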
Thanks,
Peter
C:\sphinx4\SphinxTrain\span>perl scripts_pl\RunAll.pl
MODULE: 00 verify training files
O.S. is case insensitive ("A" == "a").
Phones will be treated as case insensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
Found 14 words using 11 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Total Hours Training: 0.0054965811965812
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 11
Words in filler dictionary: 3
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
MODULE: 01 Train LDA transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 02 Train MLLT transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 05 Vector Quantization
Skipped for continuous models
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 11 Force-aligning transcripts
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 20 Training Context Independent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Flat initialize
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0%
Only 0 parts of 1 of Baum Welch were successfully completed
Parts 1 failed to run!
Training failed in iteration 1
Something failed: (C:/sphinx4/SphinxTrain/span/scripts_pl/20.ci_hmm/slave_convg.pl)