Hi,
I am using the latest versions of sphinx3, sphinxbase and SphinxTrain.
My sphinx_train.cfg file is:
# Configuration script for sphinx trainer -*-mode:Perl-*-
$CFG_VERBOSE = 1; # Determines how much goes to the screen.
# These are filled in at configuration time
$CFG_DB_NAME = "an4";
$CFG_BASE_DIR = "/root/Desktop/sphinx/sphinx/an4";
$CFG_SPHINXTRAIN_DIR = "/root/Desktop/sphinx/sphinx/SphinxTrain";
# Directory containing SphinxTrain binaries
$CFG_BIN_DIR = "$CFG_BASE_DIR/bin";
$CFG_GIF_DIR = "$CFG_BASE_DIR/gifs";
$CFG_SCRIPT_DIR = "$CFG_BASE_DIR/scripts_pl";
# Experiment name, will be used to name model files and log files
$CFG_EXPTNAME = "$CFG_DB_NAME";
# Audio waveform and feature file information
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
$CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
$CFG_FEATFILE_EXTENSION = 'mfc';
$CFG_VECTOR_LENGTH = 13;
# Feature extraction parameters
$CFG_WAVFILE_SRATE = 16000.0;
$CFG_NUM_FILT = 40; # For wideband speech it's 40, for telephone 8kHz a reasonable value is 31
$CFG_LO_FILT = 133.3334; # For telephone 8kHz speech value is 200
$CFG_HI_FILT = 6855.4976; # For telephone 8kHz speech value is 3500
$CFG_MIN_ITERATIONS = 1; # BW Iterate at least this many times
$CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, something's likely wrong.
# (none/max) Type of AGC to apply to input files
$CFG_AGC = 'none';
# (current/none) Type of cepstral mean subtraction/normalization
# to apply to input files
$CFG_CMN = 'current';
# (yes/no) Normalize variance of input files to 1.0
$CFG_VARNORM = 'no';
# (yes/no) Use letter-to-sound rules to guess pronunciations of
# unknown words (English, 40-phone specific)
$CFG_LTSOOV = 'no';
# (yes/no) Train full covariance matrices
$CFG_FULLVAR = 'no';
# (yes/no) Use diagonals only of full covariance matrices for
# Forward-Backward evaluation (recommended if CFG_FULLVAR is yes)
$CFG_DIAGFULL = 'no';
# (yes/no) Perform vocal tract length normalization in training. This
# will result in a "normalized" model which requires VTLN to be done
# during decoding as well.
$CFG_VTLN = 'no';
# Starting warp factor for VTLN
$CFG_VTLN_START = 0.80;
# Ending warp factor for VTLN
$CFG_VTLN_END = 1.40;
# Step size of warping factors
$CFG_VTLN_STEP = 0.05;
# Directory to write queue manager logs to
$CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
# Directory to write training logs to
$CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
# Directory for re-estimation counts
$CFG_BWACCUM_DIR = "$CFG_BASE_DIR/bwaccumdir";
# Directory to write model parameter files to
$CFG_MODEL_DIR = "$CFG_BASE_DIR/model_parameters";
# Directory containing transcripts and control files for
# speaker-adaptive training
$CFG_LIST_DIR = "$CFG_BASE_DIR/etc";
# Decoding variables for MMIE training
$CFG_LANGUAGEWEIGHT = "11.5";
$CFG_BEAMWIDTH = "1e-100";
$CFG_WORDBEAM = "1e-80";
$CFG_LANGUAGEMODEL = "$CFG_LIST_DIR/$CFG_DB_NAME.lm.DMP";
$CFG_WORDPENALTY = "0.2";
# Lattice pruning variables
$CFG_ABEAM = "1e-50";
$CFG_NBEAM = "1e-10";
$CFG_PRUNED_DENLAT_DIR = "$CFG_BASE_DIR/pruned_denlat";
# MMIE training related variables
$CFG_MMIE = "no";
$CFG_MMIE_MAX_ITERATIONS = 5;
$CFG_LATTICE_DIR = "$CFG_BASE_DIR/lattice";
$CFG_MMIE_TYPE = "rand"; # Valid values are "rand", "best" or "ci"
$CFG_MMIE_CONSTE = "3.0";
$CFG_NUMLAT_DIR = "$CFG_BASE_DIR/numlat";
$CFG_DENLAT_DIR = "$CFG_BASE_DIR/denlat";
# Variables used in main training of models
$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
$CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
$CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
$CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
$CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
$CFG_FEATPARAMS = "$CFG_LIST_DIR/feat.params";
# Variables used in characterizing models
$CFG_HMM_TYPE = '.cont.'; # Sphinx III
#$CFG_HMM_TYPE = '.semi.'; # PocketSphinx and Sphinx II
#$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets)
if (($CFG_HMM_TYPE ne ".semi.")
    and ($CFG_HMM_TYPE ne ".ptm.")
    and ($CFG_HMM_TYPE ne ".cont.")) {
    die "Please choose one CFG_HMM_TYPE out of '.cont.', '.ptm.', or '.semi.', " .
        "currently $CFG_HMM_TYPE\n";
}
# This configuration is fastest and best for most acoustic models in
# PocketSphinx and Sphinx-III. See below for Sphinx-II.
$CFG_STATESPERHMM = 3;
$CFG_SKIPSTATE = 'no';
if ($CFG_HMM_TYPE eq '.semi.') {
    $CFG_DIRLABEL = 'semi';
    # Four stream features for PocketSphinx
    $CFG_FEATURE = "s2_4x";
    $CFG_NUM_STREAMS = 4;
    $CFG_INITIAL_NUM_DENSITIES = 256;
    $CFG_FINAL_NUM_DENSITIES = 256;
    die "For semi continuous models, the initial and final models have the same density"
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
} elsif ($CFG_HMM_TYPE eq '.ptm.') {
    $CFG_DIRLABEL = 'ptm';
    # Four stream features for PocketSphinx
    $CFG_FEATURE = "s2_4x";
    $CFG_NUM_STREAMS = 4;
    $CFG_INITIAL_NUM_DENSITIES = 64;
    $CFG_FINAL_NUM_DENSITIES = 64;
    die "For phonetically tied models, the initial and final models have the same density"
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
} elsif ($CFG_HMM_TYPE eq '.cont.') {
    $CFG_DIRLABEL = 'cont';
    # Single stream features - Sphinx 3
    $CFG_FEATURE = "1s_c_d_dd";
    $CFG_NUM_STREAMS = 1;
    $CFG_INITIAL_NUM_DENSITIES = 1;
    $CFG_FINAL_NUM_DENSITIES = 8;
    die "The initial has to be less than the final number of densities"
        if ($CFG_INITIAL_NUM_DENSITIES > $CFG_FINAL_NUM_DENSITIES);
}
# Number of top gaussians to score a frame. A little bit less accurate computations
# make training significantly faster. Uncomment to apply this during the training.
# For good accuracy make sure you are using the same setting in the decoder.
# In theory this can be different for various training stages. For example 4 for
# CI stage and 16 for CD stage
#$CFG_CI_TOPN = 4;
#$CFG_CD_TOPN = 16;
# (yes/no) Train multiple-gaussian context-independent models (useful
# for alignment, use 'no' otherwise) in the models created
# specifically for forced alignment
$CFG_FALIGN_CI_MGAU = 'no';
# (yes/no) Train multiple-gaussian context-independent models (useful
# for alignment, use 'no' otherwise)
$CFG_CI_MGAU = 'no';
# Number of tied states (senones) to create in decision-tree clustering
$CFG_N_TIED_STATES = 1000;
# How many parts to run Forward-Backward estimation in
$CFG_NPART = 1;
# (yes/no) Train a single decision tree for all phones (actually one
# per state) (useful for grapheme-based models, use 'no' otherwise)
$CFG_CROSS_PHONE_TREES = 'no';
# Use force-aligned transcripts (if available) as input to training
$CFG_FORCEDALIGN = 'yes';
# Use a specific set of models for force alignment. If not defined,
# context-independent models for the current experiment will be used.
$CFG_FORCE_ALIGN_MDEF =
"$CFG_BASE_DIR/model_architecture/$CFG_EXPTNAME.falign_ci.mdef";
$CFG_FORCE_ALIGN_MODELDIR =
"$CFG_MODEL_DIR/$CFG_EXPTNAME.falign_ci_$CFG_DIRLABEL";
# Use a specific dictionary and filler dictionary for force alignment.
# If these are not defined, a dictionary and filler dictionary will be
# created from $CFG_DICTIONARY and $CFG_FILLERDICT, with noise words
# removed from the filler dictionary and added to the dictionary (this
# is because the force alignment is not very good at inserting them)
$CFG_FORCE_ALIGN_DICTIONARY =
"$ST::CFG_BASE_DIR/falignout$ST::CFG_EXPTNAME.falign.dict";;
$CFG_FORCE_ALIGN_FILLERDICT =
"$ST::CFG_BASE_DIR/falignout/$ST::CFG_EXPTNAME.falign.fdict";;
# Use a particular beam width for force alignment. The wider
# (i.e. smaller numerically) the beam, the fewer sentences will be
# rejected for bad alignment.
$CFG_FORCE_ALIGN_BEAM = 1e-60;
# Calculate an LDA/MLLT transform?
$CFG_LDA_MLLT = 'no';
# Dimensionality of LDA/MLLT output
$CFG_LDA_DIMENSION = 29;
# This is actually just a difference in log space (it doesn't make
# sense otherwise, because different feature parameters have very
# different likelihoods)
$CFG_CONVERGENCE_RATIO = 0.1;
# Queue::POSIX for multiple CPUs on a local machine
# Queue::PBS to use a PBS/TORQUE queue
$CFG_QUEUE_TYPE = "Queue";
# Name of queue to use for PBS/TORQUE
$CFG_QUEUE_NAME = "workq";
# (yes/no) Build questions for decision tree clustering automatically
$CFG_MAKE_QUESTS = "yes";
# If CFG_MAKE_QUESTS is yes, questions are written to this file.
# If CFG_MAKE_QUESTS is no, questions are read from this file.
$CFG_QUESTION_SET = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.tree_questions";
#$CFG_QUESTION_SET = "${CFG_BASE_DIR}/linguistic_questions";
$CFG_CP_OPERATION = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.cpmeanvar";
# This variable has to be defined, otherwise utils.pl will not load.
$CFG_DONE = 1;
return 1;
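Perl runs every uncommented assignment in this file in order. Only the last active "$CFG_HMM_TYPE = ..." line therefore decides the model type, and the if/elsif chain then picks the feature type and number of densities from that value. The shell sketch below checks which value actually takes effect. The function name active_hmm_type is hypothetical. It assumes each assignment sits on a single line in the form "$CFG_HMM_TYPE = '...';", as in the file quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the HMM type a sphinx_train.cfg actually uses.
# Perl executes every uncommented "$CFG_HMM_TYPE = '...';" line in order,
# so the last such line wins. Lines starting with '#' are comments and are
# skipped. Assumes one assignment per line with a single-quoted value.
active_hmm_type() {
    cfg=$1
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    # Keep only active assignments, take the last one, extract the quoted value.
    grep -E '^[[:space:]]*\$CFG_HMM_TYPE[[:space:]]*=' "$cfg" \
        | tail -n 1 \
        | sed -E "s/^[^=]*=[[:space:]]*'([^']*)'.*/\1/"
}
```

For example, "active_hmm_type etc/sphinx_train.cfg" prints .cont. when the .semi. and .ptm. lines are commented out. If all three lines are left uncommented, it prints .ptm.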
When I run "perl scripts_pl/RunAll.pl", I get the following error:
MODULE: 11 Force-aligning transcripts
Phase 1: Cleaning up directories:
logs...output...qmanager...
Phase 3: Creating dictionary for alignment...
Phase 4: Creating transcript for alignment...
Phase 5: Running force alignment in 1 parts
Force alignment starting: (1 of 1)
0%
Failed in part 1
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 20 Training Context Independent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Copy initialize from falign model
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0%
This step had 1 ERROR messages and 1 WARNING messages. Please check the log
file for details.
Training failed in iteration 1
Something failed:
(/root/Desktop/sphinx/sphinx/an4/scripts_pl/20.ci_hmm/slave_convg.pl)
When I look in the logdir, I find this error:
/root/Desktop/sphinx/sphinx/an4/bin/sphinx3_align: line 117:
/root/Desktop/sphinx/sphinx/an4/bin/.libs/lt-sphinx3_align: Permission denied
/root/Desktop/sphinx/sphinx/an4/bin/sphinx3_align: line 117: exec:
/root/Desktop/sphinx/sphinx/an4/bin/.libs/lt-sphinx3_align: cannot execute:
Permission denied
Wed Oct 19 11:43:50 2011
I have tried "chmod 777 sphinx3_align" and "su", but neither had any effect. Please help me solve this issue.
Thanks and Best Regards
Copy the real sphinx3_align binary from the installation folder (usually /usr/local/bin) to the an4/bin folder. Make sure you can run sphinx3_align from the command line before you start the training.
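For context, the an4/bin/sphinx3_align in the log above is a libtool wrapper script. That is why its line 117 tries to exec bin/.libs/lt-sphinx3_align, and why running chmod on the wrapper does not change the permissions of the file it executes. The shell sketch below follows the suggested fix. It assumes the installed binary is in the directory given as the first argument (usually /usr/local/bin, as the reply says) and that the training folder is the second argument (for example the poster's /root/Desktop/sphinx/sphinx/an4). The function name install_align_binary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the libtool wrapper in <training dir>/bin with a
# copy of the installed sphinx3_align binary and check that it is executable.
#   $1 = directory holding the installed binary (usually /usr/local/bin)
#   $2 = training folder (e.g. /root/Desktop/sphinx/sphinx/an4)
install_align_binary() {
    src="$1/sphinx3_align"
    dest="$2/bin/sphinx3_align"

    # Refuse to continue if the installed binary is missing or not executable.
    if [ ! -x "$src" ]; then
        echo "no executable sphinx3_align in $1" >&2
        return 1
    fi

    mkdir -p "$2/bin" || return 1
    # Delete the wrapper script first so the copy is a fresh file.
    rm -f "$dest"
    cp "$src" "$dest" || return 1
    chmod 755 "$dest" || return 1

    # Exit status 0 only if the copied file can be executed.
    [ -x "$dest" ]
}
```

After running, for example, "install_align_binary /usr/local/bin /root/Desktop/sphinx/sphinx/an4", try starting the copied sphinx3_align by hand, as the reply suggests, before rerunning RunAll.pl.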
Hi Nickolay,
Thanks
It is not an older version but another package. You can learn about cmusphinx packages from our wiki page:
http://cmusphinx.sourceforge.net/wiki/versions
We may port this functionality into SphinxTrain, but we haven't done so yet.
Installing sphinx3 does not override anything.
Thanks a lot for your help. It's working now.
Best Regards
Hi,
I have the same problem, but I can't find the sphinx3_align binary. How can I download it?
sphinx3_align is part of the sphinx3 package.
Thank you, I found it.
Hi,
Please share where you found it. I am having difficulty locating it; I am using the latest Sphinx.
Thanks
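As the replies say, sphinx3_align comes with the sphinx3 package, and after installation it usually ends up in /usr/local/bin. The shell sketch below checks whether it is installed. It assumes a standard install that puts the binary either on PATH or in /usr/local/bin. The function name find_sphinx3_align is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the path of an installed sphinx3_align, if any.
# Looks on $PATH first, then in /usr/local/bin (the usual install location
# mentioned in this thread). Returns 1 if it is found in neither place.
find_sphinx3_align() {
    if command -v sphinx3_align >/dev/null 2>&1; then
        command -v sphinx3_align
    elif [ -x /usr/local/bin/sphinx3_align ]; then
        echo /usr/local/bin/sphinx3_align
    else
        echo "sphinx3_align not found; install the sphinx3 package first" >&2
        return 1
    fi
}
```

If it reports that the binary is missing, install the sphinx3 package first. The printed path can then be used as the source for copying into the training folder's bin directory.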
Hi Sir,
When I run force alignment, I get this error:
Force alignment starting (1 of 1)
0%
This step has 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
Failed in part 1
Something failed: (../script_pl/03.force_align/slave_align.pl)
When I looked at the log file, it said:
ERROR: "cmd_ln.c", line 428: Unknown argument: -ceplen
ERROR: "cmd_ln.c", line 429: cmd_ln_parse failed, forced exit
Any guidance would be appreciated :)
Last edit: Me&I 2013-11-12
Please use the latest SphinxTrain; your version seems to be outdated.
Thank you so much for your reply, Sir.
I have successfully installed all of the packages recommended in the latest tutorial "http://cmusphinx.sourceforge.net/wiki/tutorialam", and I want to run decoding on the an4 database. When I run the command "sphinxtrain run" in the an4 database directory, many errors appear and the training process ends with this error:
Cant't open /data_base_directory/result/an4-1-1.match
word_align.pl failed with error code 65280 at /usr/local/lib/sphinxtrain/scripts/decode/slave.pl line 173.
Help please :)
Check the contents of logdir/decode for details. Most likely you didn't rename the language model (LM) file used for decoding.
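The shell sketch below performs that check. It assumes the decode logs sit under logdir/decode inside the training folder, as the reply says. The function name show_log_problems is hypothetical. It lists every ERROR and WARNING line together with the log file and line number it came from.

```shell
#!/bin/sh
# Hypothetical helper: print every ERROR/WARNING line found under a
# SphinxTrain log directory (e.g. logdir/decode), prefixed with the file
# name and line number. Exit status follows grep: 0 if any lines were
# found, 1 if none.
show_log_problems() {
    log_dir=$1
    if [ ! -d "$log_dir" ]; then
        echo "log directory not found: $log_dir" >&2
        return 1
    fi
    # -r: search all files below log_dir, -n: show line numbers,
    # -E: extended regex so "ERROR|WARNING" matches either word.
    grep -rnE 'ERROR|WARNING' "$log_dir"
}
```

For example, run "show_log_problems logdir/decode" from the an4 folder. The first ERROR line it prints is usually the one that explains the failure.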
Thank you so much, Sir. It works:
Decoding 130 segments starting at 0 (part 1 of 1)
0%
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check the log file for details;
Aligning results to find error rate
SENTENCE ERROR: 45.4% (59/130) WORD ERROR RATE: 15.7% (120/773)
I have a question, Sir: how can I improve these results? Would using force alignment mode help?
Use a better, bigger database. An4 is too small.