CMU Sphinx / Forums / Help: training for VTLN gives error

vijayabharadwaj gsr - 2011-05-03

I want to train models with VTLN, but I am getting an error:

FATAL_ERROR: "main_align.c", line 1064: fopen(/home/lahari/Speech/isolated-vtln/vtlnout/0.80/isolated-vtln.alignedtranscripts.1,r) failed

Where have I made a mistake? Can you please let me know?

My sphinx_train.cfg looks like this:

# Configuration script for sphinx trainer -*-mode:Perl-*-

$CFG_VERBOSE = 1; # Determines how much goes to the screen.

# These are filled in at configuration time
$CFG_DB_NAME = "isolated-vtln";
$CFG_BASE_DIR = "/home/lahari/Speech/isolated-vtln";
$CFG_SPHINXTRAIN_DIR = "/home/lahari/Speech/sphinxtrain";

# Directory containing SphinxTrain binaries
$CFG_BIN_DIR = "$CFG_BASE_DIR/bin";
$CFG_GIF_DIR = "$CFG_BASE_DIR/gifs";
$CFG_SCRIPT_DIR = "$CFG_BASE_DIR/scripts_pl";

# Experiment name, will be used to name model files and log files
$CFG_EXPTNAME = "$CFG_DB_NAME";

# Audio waveform and feature file information
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
$CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
$CFG_FEATFILE_EXTENSION = 'mfc';
$CFG_VECTOR_LENGTH = 13;

$CFG_MIN_ITERATIONS = 1; # BW Iterate at least this many times
$CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, something's likely wrong.

# (none/max) Type of AGC to apply to input files
$CFG_AGC = 'none';
# (current/none) Type of cepstral mean subtraction/normalization
# to apply to input files
$CFG_CMN = 'current';
# (yes/no) Normalize variance of input files to 1.0
$CFG_VARNORM = 'no';
# (yes/no) Use letter-to-sound rules to guess pronunciations of
# unknown words (English, 40-phone specific)
$CFG_LTSOOV = 'no';
# (yes/no) Train full covariance matrices
$CFG_FULLVAR = 'no';
# (yes/no) Use diagonals only of full covariance matrices for
# Forward-Backward evaluation (recommended if CFG_FULLVAR is yes)
$CFG_DIAGFULL = 'no';

# (yes/no) Perform vocal tract length normalization in training. This
# will result in a "normalized" model which requires VTLN to be done
# during decoding as well.
$CFG_VTLN = 'yes';
# Starting warp factor for VTLN
$CFG_VTLN_START = 0.80;
# Ending warp factor for VTLN
$CFG_VTLN_END = 1.40;
# Step size of warping factors
$CFG_VTLN_STEP = 0.05;

# Directory to write queue manager logs to
$CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
# Directory to write training logs to
$CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
# Directory for re-estimation counts
$CFG_BWACCUM_DIR = "$CFG_BASE_DIR/bwaccumdir";
# Directory to write model parameter files to
$CFG_MODEL_DIR = "$CFG_BASE_DIR/model_parameters";

# Directory containing transcripts and control files for
# speaker-adaptive training
$CFG_LIST_DIR = "$CFG_BASE_DIR/etc";

# Decoding variables for MMIE training
$CFG_LANGUAGEWEIGHT = "11.5";
$CFG_BEAMWIDTH = "1e-100";
$CFG_WORDBEAM = "1e-80";
$CFG_LANGUAGEMODEL = "$CFG_LIST_DIR/$CFG_DB_NAME.lm.DMP";
$CFG_WORDPENALTY = "0.2";

# Lattice pruning variables
$CFG_ABEAM = "1e-50";
$CFG_NBEAM = "1e-10";
$CFG_PRUNED_DENLAT_DIR = "$CFG_BASE_DIR/pruned_denlat";

# MMIE training related variables
$CFG_MMIE = "no";
$CFG_MMIE_MAX_ITERATIONS = 5;
$CFG_LATTICE_DIR = "$CFG_BASE_DIR/lattice";
$CFG_MMIE_TYPE = "rand"; # Valid values are "rand", "best" or "ci"
$CFG_MMIE_CONSTE = "3.0";
$CFG_NUMLAT_DIR = "$CFG_BASE_DIR/numlat";
$CFG_DENLAT_DIR = "$CFG_BASE_DIR/denlat";

# Variables used in main training of models
$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
$CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
$CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
$CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
$CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
$CFG_FEATPARAMS = "$CFG_LIST_DIR/feat.params";

# Variables used in characterizing models

$CFG_HMM_TYPE = '.cont.'; # Sphinx III
#$CFG_HMM_TYPE = '.semi.'; # PocketSphinx and Sphinx II
#$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets)

if (($CFG_HMM_TYPE ne ".semi.")
    and ($CFG_HMM_TYPE ne ".ptm.")
    and ($CFG_HMM_TYPE ne ".cont.")) {
    die "Please choose one CFG_HMM_TYPE out of '.cont.', '.ptm.', or '.semi.', " .
        "currently $CFG_HMM_TYPE\n";
}

# This configuration is fastest and best for most acoustic models in
# PocketSphinx and Sphinx-III. See below for Sphinx-II.
$CFG_STATESPERHMM = 3;
$CFG_SKIPSTATE = 'no';

if ($CFG_HMM_TYPE eq '.semi.') {
    $CFG_DIRLABEL = 'semi';
    # Four stream features for PocketSphinx
    $CFG_FEATURE = "s2_4x";
    $CFG_NUM_STREAMS = 4;
    $CFG_INITIAL_NUM_DENSITIES = 256;
    $CFG_FINAL_NUM_DENSITIES = 256;
    die "For semi continuous models, the initial and final models have the same density"
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
} elsif ($CFG_HMM_TYPE eq '.ptm.') {
    $CFG_DIRLABEL = 'ptm';
    # Four stream features for PocketSphinx
    $CFG_FEATURE = "s2_4x";
    $CFG_NUM_STREAMS = 4;
    $CFG_INITIAL_NUM_DENSITIES = 64;
    $CFG_FINAL_NUM_DENSITIES = 64;
    die "For phonetically tied models, the initial and final models have the same density"
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
} elsif ($CFG_HMM_TYPE eq '.cont.') {
    $CFG_DIRLABEL = 'cont';
    # Single stream features - Sphinx 3
    $CFG_FEATURE = "1s_c_d_dd";
    $CFG_NUM_STREAMS = 1;
    $CFG_INITIAL_NUM_DENSITIES = 1;
    $CFG_FINAL_NUM_DENSITIES = 8;
    die "The initial has to be less than the final number of densities"
        if ($CFG_INITIAL_NUM_DENSITIES > $CFG_FINAL_NUM_DENSITIES);
}

# (yes/no) Train multiple-gaussian context-independent models (useful
# for alignment, use 'no' otherwise) in the models created
# specifically for forced alignment
$CFG_FALIGN_CI_MGAU = 'no';
# (yes/no) Train multiple-gaussian context-independent models (useful
# for alignment, use 'no' otherwise)
$CFG_CI_MGAU = 'no';
# Number of tied states (senones) to create in decision-tree clustering
$CFG_N_TIED_STATES = 1000;
# How many parts to run Forward-Backward estimation in
$CFG_NPART = 1;

# (yes/no) Train a single decision tree for all phones (actually one
# per state) (useful for grapheme-based models, use 'no' otherwise)
$CFG_CROSS_PHONE_TREES = 'no';

# Use force-aligned transcripts (if available) as input to training
$CFG_FORCEDALIGN = 'no';

# Use a specific set of models for force alignment. If not defined,
# context-independent models for the current experiment will be used.
$CFG_FORCE_ALIGN_MDEF = "$CFG_BASE_DIR/model_architecture/$CFG_EXPTNAME.falign_ci.mdef";
$CFG_FORCE_ALIGN_MODELDIR = "$CFG_MODEL_DIR/$CFG_EXPTNAME.falign_ci_$CFG_DIRLABEL";

# Use a specific dictionary and filler dictionary for force alignment.
# If these are not defined, a dictionary and filler dictionary will be
# created from $CFG_DICTIONARY and $CFG_FILLERDICT, with noise words
# removed from the filler dictionary and added to the dictionary (this
# is because the force alignment is not very good at inserting them)
#$CFG_FORCE_ALIGN_DICTIONARY = "$ST::CFG_BASE_DIR/falignout$ST::CFG_EXPTNAME.falign.dict";;
#$CFG_FORCE_ALIGN_FILLERDICT = "$ST::CFG_BASE_DIR/falignout/$ST::CFG_EXPTNAME.falign.fdict";;

# Use a particular beam width for force alignment. The wider
# (i.e. smaller numerically) the beam, the fewer sentences will be
# rejected for bad alignment.
$CFG_FORCE_ALIGN_BEAM = 1e-60;

# Calculate an LDA/MLLT transform?
$CFG_LDA_MLLT = 'no';
# Dimensionality of LDA/MLLT output
$CFG_LDA_DIMENSION = 29;

# This is actually just a difference in log space (it doesn't make
# sense otherwise, because different feature parameters have very
# different likelihoods)
$CFG_CONVERGENCE_RATIO = 0.1;

# Queue::POSIX for multiple CPUs on a local machine
# Queue::PBS to use a PBS/TORQUE queue
$CFG_QUEUE_TYPE = "Queue";

# Name of queue to use for PBS/TORQUE
$CFG_QUEUE_NAME = "workq";

# (yes/no) Build questions for decision tree clustering automatically
$CFG_MAKE_QUESTS = "yes";
# If CFG_MAKE_QUESTS is yes, questions are written to this file.
# If CFG_MAKE_QUESTS is no, questions are read from this file.
$CFG_QUESTION_SET = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.tree_questions";
#$CFG_QUESTION_SET = "${CFG_BASE_DIR}/linguistic_questions";

$CFG_CP_OPERATION = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.cpmeanvar";

# This variable has to be defined, otherwise utils.pl will not load.
$CFG_DONE = 1;

return 1;
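
For orientation, the three VTLN settings in this configuration (start 0.80, end 1.40, step 0.05) define a grid of warp factors that the VTLN alignment step works through one at a time. A minimal Python sketch that enumerates the grid, assuming the end value is included and that each factor is formatted with two decimals, as in the `.0.80.mfc` extension that appears in the sphinx3_align command line further down the thread. The helper name `warp_factors` is made up for illustration and is not part of SphinxTrain.

```python
def warp_factors(start, end, step):
    """Enumerate warp factors from start to end (end assumed inclusive)."""
    # Count the steps as an integer to avoid floating-point drift in a loop.
    n_steps = int(round((end - start) / step))
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


factors = warp_factors(0.80, 1.40, 0.05)
# Feature-file extensions, following the ".0.80.mfc" pattern seen in the log.
extensions = [".%.2f.mfc" % w for w in factors]
print(len(factors), extensions[0], extensions[-1])
```

Each factor means one feature-extraction pass and one forced-alignment pass over the training set, so a finer step or a wider range increases training time accordingly.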

Nickolay V. Shmyrev - 2011-05-04

Hello

When training fails, check the logs in logdir to find out exactly what went
wrong. This file is missing because it wasn't created in the previous step.
There could be several reasons for that; one of them is that sphinx3_align is
not in the bin folder, or that you can't run it.
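
A quick way to follow this advice is to scan the log directory for lines that the Sphinx tools mark as ERROR or FATAL_ERROR. A minimal Python sketch, assuming the logs are plain-text files somewhere under logdir. The function name and the example path are illustrative and not part of SphinxTrain.

```python
import os


def find_error_lines(logdir):
    """Return (path, line_number, text) for each ERROR/FATAL_ERROR log line."""
    hits = []
    for root, _dirs, files in os.walk(logdir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    # Sphinx tools prefix such lines with "ERROR:" or "FATAL_ERROR:".
                    if line.startswith(("ERROR:", "FATAL_ERROR:")):
                        hits.append((path, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Hypothetical location; use $CFG_LOG_DIR from your own sphinx_train.cfg.
    for path, number, text in find_error_lines("logdir"):
        print("%s:%d: %s" % (path, number, text))
```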

vijayabharadwaj gsr - 2011-05-12

Dear Sir,

I have already copied all the sphinx3 executables into the model's bin
directory. When I run the executable directly, it works fine:

$ ./sphinx3_align
INFO: info.c(65): Host: 'localhost.localdomain'
INFO: info.c(69): Directory: '/home/lahari/Speech/isolatedvtln/bin'
INFO: info.c(73): ./sphinx3_align Compiled on: May 11 2011, AT: 17:16:58

INFO: cmd_ln.c(465): Looking for default argument file: default.arg
INFO: cmd_ln.c(468): Can't find default argument file default.arg.
INFO: cmd_ln.c(512): Parsing command line:
./sphinx3_align

ERROR: "cmd_ln.c", line 632: No arguments given, exiting
Arguments list definition:

-adchdr 0 Number of bytes to skip at the beginning of a waveform file (44 for WAV, 1024 for Sphere)
-adcin no Input is waveform data rather than cepstra (-cepdir and -cepext are still used)
-agc none Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh 2.0 Initial threshold for automatic gain control
-beam 1e-64 Main pruning beam applied to triphones in forward search
-cb2mllr .1cls. Senone to MLLR transformation matrix mapping file (or .1cls.)
-cepdir Input cepstrum files directory (prefixed to filespecs in control file)
-cepext .mfc Input cepstrum files extension (prefixed to filespecs in control file)
-ceplen 13 Number of components in the input feature vector
-ci_pbeam 1e-80 CI phone beam for CI-based GMM Selection.
-cmn current Cepstral mean normalization scheme ('current', 'prior', or 'none')
-cmninit 8.0 Initial values (comma-separated) for cepstral mean when 'prior' is used
-cond_ds no Conditional Down-sampling, override normal down sampling. require specify a gaussian selection map
-ctl Control file listing utterances to be processed
-ctlcount 1000000000 No. of utterances to be processed (after skipping -ctloffset entries)
-ctloffset 0 No. of utterances at the beginning of -ctl file to be skipped
-ctl_mllr Control file that list the corresponding MLLR matrix for an utterance
-dict Main pronunciation dictionary (lexicon) input file
-dist_ds no Distance-based Down-sampling, override normal down sampling.
-ds 1 Ratio of Down-sampling the frame computation.
-fdict Silence and filler (noise) word pronunciation dictionary input file
-feat 1s_c_d_dd Feature stream type, depends on the acoustic model
-featparams File containing feature extraction parameters.
-frate 100 Frame rate (only requred for xlabel style phone labels)
-gs Gaussian Selection Mapping.
-gs4gs yes A flag that specified whether the input GS map will be used for Gaussian Selection. If it is disabled, the map will only provide information to other modules.
-hmm Directory for specifying Sphinx 3's hmm, the following files are assummed to be present, mdef, mean, var, mixw, tmat. If -mdef, -mean, -var, -mixw or -tmat are specified, they will override this command.
-hyp Recognition result file, with only words
-hypseg Recognition result file, with word segmentations and scores
-insent Input transcript file corresponding to control file
-insert_sil 1 Whether to insert optional silences and fillers between words.
-kdmaxbbi -1 Maximum number of Gaussians per leaf node in kd-Trees
-kdmaxdepth 0 Maximum depth of kd-Trees to use
-kdtree kd-Tree file for Gaussian selection (for .s2semi models only)
-lambda Interpolation weights (CD/CI senone) parameters input file
-lda File containing transformation matrix to be applied to features (single-stream features only)
-ldadim 0 Dimensionality of output of feature transformation (0 to use entire matrix)
-log3table yes Determines whether to use the logs3 table or to compute the values at run time.
-logbase 1.0003 Base in which all log-likelihoods calculated
-logfn Log file (default stdout/stderr)
-lts_mismatch no Use CMUDict letter-to-sound rules to generate pronunciations for LM words doesn't appear in the dictionary . Use it with care. It assumes that the phone set in the mdef and dict are the same as the LTS rule.
-maxcdsenpf 100000 Max no. of distinct CD senone will be computed.
-mdef Model definition input file
-mean Mixture gaussian means input file
-mixw Senone mixture weights input file
-mixwfloor 0.0000001 Senone mixture weights floor (applied to data from -mixw file)
-mllr MLLR transfomation matrix to be applied to mixture gaussian means
-outsent Output transcript file with exact pronunciation/transcription
-phlabdir Output directory for xlabel style phone labels; optionally end with ,CTL
-phsegdir Output directory for phone segmentation files; optionally end with ,CTL
-s2cdsen no Output context-dependent senone indices in Sphinx-II state segmentations
-s2stsegdir Output directory for Sphinx-II format state segmentation files; optionally end with ,CTL
-senmgau .cont. Senone to mixture-gaussian mapping file (or .semi. or .cont.)
-stsegdir Output directory for state segmentation files; optionally end with ,CTL
-subvq Sub-vector quantized form of acoustic model
-subvqbeam 3.0e-3 Beam selecting best components within each mixture Gaussian
-svq4svq no A flag that specified whether the input SVQ will be used as approximate scores of the Gaussians
-svspec Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
-tighten_factor 0.5 From 0 to 1, it tightens the beam width when the frame is dropped
-tmat HMM state transition matrix input file
-tmatfloor 0.0001 HMM state transition probability floor (applied to -tmat file)
-topn 4 (S3.0 GMM Computation only) No. of top scoring densities computed in each mixture gaussian codebook (semi-continuous models only)
-var Mixture gaussian variances input file
-varfloor 0.0001 Mixture gaussian variance floor (applied to data from -var file)
-varnorm no Variance normalize each utterance (only if CMN == current)
-vqeval 3 Number of subvectors to use for SubVQ-based frame evaluation (3 for all)
-wdsegdir Output directory for word segmentation files; optionally end with ,CTL

ERROR: "cmd_ln.c", line 650: cmd_ln_parse_r failed
ERROR: "cmd_ln.c", line 699: cmd_ln_parse failed, forced exit

But the training run still gives the error:

INFO: info.c(65): Host: 'localhost.localdomain'
INFO: info.c(69): Directory: '/home/lahari/Speech/isolatedvtln'
INFO: info.c(73): /home/lahari/Speech/isolatedvtln/bin/sphinx3_align Compiled
on: May 11 2011, AT: 17:16:58

INFO: cmd_ln.c(512): Parsing command line:
/home/lahari/Speech/isolatedvtln/bin/sphinx3_align \
-mdef /home/lahari/Speech/isolatedvtln/model_architecture/isolatedvtln.falign_ci.mdef \
-senmgau .cont. \
-mixw /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/mixture_weights \
-mixwfloor 1e-08 \
-tmat /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/transition_matrices \
-mean /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/means \
-var /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/variances \
-varfloor 0.0001 \
-dict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.dict \
-fdict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.fdict \
-ctl /home/lahari/Speech/isolatedvtln/etc/isolatedvtln_train.fileids \
-ctloffset 0 \
-ctlcount 320 \
-cepdir /home/lahari/Speech/isolatedvtln/feat \
-cepext .0.80.mfc \
-insent /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.aligninput \
-outsent /home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1 \
-wdsegdir /home/lahari/Speech/isolatedvtln/vtlnout/0.80,CTL \
-beam 1e-60 \
-agc none \
-cmn current \
-varnorm no \
-feat 1s_c_d_dd \
-ceplen 13

Current configuration:

-adchdr 0 0
-adcin no no
-agc none none
-agcthresh 2.0 2.000000e+00
-beam 1e-64 1.000000e-60
-cb2mllr .1cls. .1cls.
-cepdir /home/lahari/Speech/isolatedvtln/feat
-cepext .mfc .0.80.mfc
-ceplen 13 13
-ci_pbeam 1e-80 1.000000e-80
-cmn current current
-cmninit 8.0 8.0
-cond_ds no no
-ctl /home/lahari/Speech/isolatedvtln/etc/isolatedvtln_train.fileids
-ctlcount 1000000000 320
-ctloffset 0 0
-ctl_mllr
-dict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.dict
-dist_ds no no
-ds 1 1
-fdict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-frate 100 100
-gs
-gs4gs yes yes
-hmm
-hyp
-hypseg
-insent /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.aligninput
-insert_sil 1 1
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-lambda
-lda
-ldadim 0 0
-log3table yes yes
-logbase 1.0003 1.000300e+00
-logfn
-lts_mismatch no no
-maxcdsenpf 100000 100000
-mdef /home/lahari/Speech/isolatedvtln/model_architecture/isolatedvtln.falign_ci.mdef
-mean /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/means
-mixw /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/mixture_weights
-mixwfloor 0.0000001 1.000000e-08
-mllr
-outsent /home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1
-phlabdir
-phsegdir
-s2cdsen no no
-s2stsegdir
-senmgau .cont. .cont.
-stsegdir
-subvq
-subvqbeam 3.0e-3 3.000000e-03
-svq4svq no no
-svspec
-tighten_factor 0.5 5.000000e-01
-tmat /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/transition_matrices
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-var /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/variances
-varfloor 0.0001 1.000000e-04
-varnorm no no
-vqeval 3 3
-wdsegdir /home/lahari/Speech/isolatedvtln/vtlnout/0.80,CTL

FATAL_ERROR: "main_align.c", line 1064: fopen(/home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1,r) failed

Please let me know where I made a mistake.

Note: I have renamed the model from isolated-vtln to isolatedvtln.

vijayabharadwaj gsr - 2011-05-12

Log file

MODULE: 10 Training Context Independent models for forced alignment and VTLN
(2011-05-12 11:33)
Phase 1: Cleaning up directories:
accumulator... logs... qmanager... models... completed
Phase 2: Flat initialize
mk_mdef_gen Log File
completed
mk_flat Log File
completed
init_gau Log File
completed
norm Log File
completed
init_gau Log File
completed
norm Log File
completed
cp_parm Log File
completed
cp_parm Log File
completed
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
bw Log File
completed
Normalization for iteration: 1
norm Log File
completed
Current Overall Likelihood Per Frame = -3.60234575364793
Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1)
bw Log File
This step had 140 ERROR messages and 0 WARNING messages. Please check the log
file for details.
completed
Normalization for iteration: 2
norm Log File
completed
Current Overall Likelihood Per Frame = -1.39648795518676
Convergence Ratio = 2.20585779846117
Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
bw Log File
This step had 18 ERROR messages and 2 WARNING messages. Please check the log
file for details.
completed
Normalization for iteration: 3
norm Log File
completed
Current Overall Likelihood Per Frame = -0.53251214094547
Convergence Ratio = 0.86397581424129
Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
bw Log File
This step had 4 ERROR messages and 0 WARNING messages. Please check the log
file for details.
completed
Normalization for iteration: 4
norm Log File
completed
Current Overall Likelihood Per Frame = 1.1948810297441
Convergence Ratio = 1.72739317068957
Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
bw Log File
This step had 8 ERROR messages and 0 WARNING messages. Please check the log
file for details.
completed
Normalization for iteration: 5
norm Log File
completed
Current Overall Likelihood Per Frame = 2.12185047978418
Convergence Ratio = 0.926969450040081
Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
bw Log File
This step had 10 ERROR messages and 0 WARNING messages. Please check the log
file for details.
completed
Normalization for iteration: 6
norm Log File
completed
Current Overall Likelihood Per Frame = 2.58621930454428
Convergence Ratio = 0.464368824760097
Baum welch starting for 1 Gaussian(s), iteration: 7 (1 of 1)
bw Log File
This step had 12 ERROR messages and 0 WARNING messages. Please check the log
file for details.
completed
Normalization for iteration: 7
norm Log File
completed
Current Overall Likelihood Per Frame = 2.63752307796553
Training completed after 7 iterations
MODULE: 11 Force-aligning transcripts (2011-05-12 11:52)
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN (2011-05-12 11:52)
Phase 1: Cleaning up directories:
logs... qmanager... output...
Phase 2: Creating dictionary for alignment...
Phase 3: Creating transcript for alignment...
Phase 4: Running VTLN alignment for warp factor 0.80
Phase 5: Extracting features with warp factor 0.80
../scripts_pl/make_feats.pl Log File
completed
Phase 6: Running force alignment in 1 parts
Force alignment starting: (1 of 1)
sphinx3_align Log File
This step had 1 ERROR messages and 0 WARNING messages. Please check the log
file for details.

vijayabharadwaj gsr - 2011-05-12

Failed in part 1

Nickolay V. Shmyrev - 2011-05-12

First of all, please download and compile the sphinx3 snapshot. You can find
the instructions here:

http://cmusphinx.sourceforge.net/wiki/download

Then answer the following questions:

Does the following folder exist?

/home/lahari/Speech/isolatedvtln/vtlnout/0.80

If not, can you create it?

If it does, can you run this command?

touch /home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1

What is the output of this command?

Which sphinxtrain version are you using?

What OS are you using?
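
The folder and touch checks above can also be scripted. A minimal Python sketch, assuming the output path reported in the fopen() error; the function name `diagnose_output_path` is illustrative.

```python
import os


def diagnose_output_path(path):
    """Explain why creating a file at `path` might fail."""
    folder = os.path.dirname(path)
    if not os.path.isdir(folder):
        return "missing folder: %s" % folder
    # Creating a file needs write and search permission on the folder.
    if not os.access(folder, os.W_OK | os.X_OK):
        return "folder not writable: %s" % folder
    return "ok"


if __name__ == "__main__":
    # Path taken from the fopen() failure reported in this thread.
    print(diagnose_output_path(
        "/home/lahari/Speech/isolatedvtln/vtlnout/0.80/"
        "isolatedvtln.alignedtranscripts.1"))
```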

vijayabharadwaj gsr - 2011-05-12

Dear Sir,

I am using the sphinx3 snapshot with sphinxbase-0.7 and SphinxTrain 1.0.7, on
Fedora 15 Linux.

The directory

/home/lahari/Speech/isolatedvtln/vtlnout/0.80

was not created in vtlnout.

When the run exits, the vtlnout directory contains the following files:

-rw-rw-r-- 1 lahari lahari 350244 May 12 16:24 isolatedvtln.aligninput
-rw-rw-r-- 1 lahari lahari 20831 May 12 16:24 isolatedvtln.falign.dict
-rw-rw-r-- 1 lahari lahari 27 May 12 16:24 isolatedvtln.falign.fdict
-rw-rw-r-- 1 lahari lahari 0 May 12 16:25 isolatedvtln.vtlnctl.0.80
-rw-rw-r-- 1 lahari lahari 0 May 12 16:25 isolatedvtln.vtlnlsn.0.80

Hello

This is a bug which was just fixed in SphinxTrain trunk. You need to update.
The patch to fix it is simple, so you can also edit the script yourself:

Index: slave_align.pl
===================================================================
--- slave_align.pl      (revision 10961)
+++ slave_align.pl      (working copy)
@@ -166,6 +166,7 @@
     Log("Phase 4: Running VTLN alignment for warp factor $warp");
     # Build state segmentation directories
     my $wdsegdir = catdir($outdir, $warp);
+    mkpath($wdsegdir, 0, 0777);
     open INPUT,"${ST::CFG_LISTOFFILES}" or die "Failed to open $ST::CFG_LISTOFFILES: $!";
     my $have_feats = 1;
     my %dirs;
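
The added line creates the per-warp-factor output folder before anything is written into it. The effect can be reproduced outside SphinxTrain. The following Python sketch is an illustration with made-up temporary paths, not the SphinxTrain code: it shows that opening a file inside a missing folder fails, and succeeds once the folder has been created, which is the job of the Perl `mkpath` call in the patch.

```python
import os
import tempfile


def try_write(path):
    """Return True if `path` can be opened for writing, False otherwise."""
    try:
        with open(path, "w"):
            return True
    except OSError:
        return False


outdir = tempfile.mkdtemp()              # stands in for vtlnout
wdsegdir = os.path.join(outdir, "0.80")  # per-warp-factor folder
target = os.path.join(wdsegdir, "alignedtranscripts.1")

before = try_write(target)            # folder missing, so this fails
os.makedirs(wdsegdir, exist_ok=True)  # Python analogue of mkpath($wdsegdir)
after = try_write(target)             # folder exists, so this succeeds
print(before, after)
```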

vijayabharadwaj gsr - 2011-05-13

Thank you sir. With the above fix, it is working fine.

vijayabharadwaj gsr - 2011-05-18

The HTML shows this information:

Extracting features with warp factor 0.85
../scripts_pl/make_feats.pl Log File
completed
Phase 6: Running force alignment in 1 parts
Force alignment starting: (1 of 1)
sphinx3_align Log File
This step had 990 ERROR messages and 0 WARNING messages. Please check the log
file for details.
completed
Phase 7: Updating master control file and transcript
Phase 4: Running VTLN alignment for warp factor 0.90
Phase 5: Extracting features with warp factor 0.90
../scripts_pl/make_feats.pl Log File
completed
Phase 6: Running force alignment in 1 parts
Force alignment starting: (1 of 1)
sphinx3_align Log File
This step had 991 ERROR messages and 0 WARNING messages. Please check the log
file for details.
completed

The log file itself shows the following:

INFO: feat.c(1189): At directory /home/lahari/Speech/largevocvtln/feat
INFO: feat.c(1006): Reading mfc file: '/home/lahari/Speech/largevocvtln/feat/balarama-mohan/13-30-15.1.35.mfc'
ERROR: "feat.c", line 1096: /home/lahari/Speech/largevocvtln/feat/balarama-mohan/13-30-15.1.35.mfc: Maximum output size(15006 frames) < actual #frames(26438)
ERROR: "main_align.c", line 908: Utt 13-30-15: Input file read (balarama-mohan/13-30-15) with dir (/home/lahari/Speech/largevocvtln/feat) and extension (.1.35.mfc) failed
INFO: corpus.c(662): 13-30-15: 0.0 sec CPU, 0.0 sec Clk; TOT: 117.9 sec CPU, 124.0 sec Clk

I suspect that VTLN did not complete properly because of so many failures. Is
that right?

Do I have to cut the audio files down to 1 or 2 minutes each? Of course, the
wiki says that files about 30 seconds long are better, but it is very difficult
to record that way.

vijayabharadwaj gsr - 2011-05-18

In some cases the final state is not reached, so there is no alignment:

INFO: feat.c(1189): At directory /home/lahari/Speech/largevocvtln/feat
INFO: feat.c(1006): Reading mfc file:
'/home/lahari/Speech/largevocvtln/feat/vkrishna/1-30-4.1.35.mfc'
INFO: cmn.c(175): CMN: 12.76 0.27 -0.10 0.01 -0.25 -0.03 -0.31 -0.16 -0.21
-0.19 -0.16 -0.15 -0.18
INFO: main_align.c(919): 1-30-4: 9373 input frames
ERROR: "main_align.c", line 765: Final state not reached; no alignment for
1-30-4

INFO: corpus.c(662): 1-30-4: 0.5 sec CPU, 0.5 sec Clk; TOT: 2.3 sec CPU, 2.4
sec Clk

What can I do in this case?

Nickolay V. Shmyrev - 2011-05-19

> I suspect that VTLN did not complete properly because of so many failures.
> Is that right?

Yes. You need to make files shorter.
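
The feat.c error quoted earlier in the thread reports a limit of 15006 frames against 26438 actual frames. Assuming the usual 100 feature frames per second (a 10 ms frame shift), that is roughly 150 seconds allowed against about 264 seconds of audio. A minimal Python sketch that lists WAV files too long for that limit. The limit constant is read off the error message rather than taken from documentation, the frame rate is an assumption, and the directory name is a placeholder.

```python
import os
import wave

MAX_FRAMES = 15006       # limit reported by the feat.c error in this thread
FRAMES_PER_SECOND = 100  # assumed feature frame rate (10 ms shift)


def feature_frames(wav_path):
    """Estimate the number of feature frames from a WAV file's duration."""
    with wave.open(wav_path, "rb") as wav:
        seconds = wav.getnframes() / float(wav.getframerate())
    return int(seconds * FRAMES_PER_SECOND)


def too_long(wav_dir):
    """Return (path, estimated_frames) for files exceeding MAX_FRAMES."""
    result = []
    for root, _dirs, files in os.walk(wav_dir):
        for name in sorted(files):
            if name.lower().endswith(".wav"):
                path = os.path.join(root, name)
                frames = feature_frames(path)
                if frames > MAX_FRAMES:
                    result.append((path, frames))
    return result


if __name__ == "__main__":
    for path, frames in too_long("wav"):  # placeholder directory
        print("%s: about %d frames" % (path, frames))
```

Files flagged this way need to be split, and their transcripts split to match, before features are extracted again.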

> In some cases the final state is not reached, so there is no alignment.

Fix the transcription or the dictionary. The audio should match the text.
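
One mismatch that is easy to check automatically is a transcript word that has no dictionary entry. A minimal Python sketch that lists such words, assuming the usual SphinxTrain layouts: one `WORD PHONE PHONE ...` entry per dictionary line, and transcript lines of the form `<s> words </s> (utterance_id)`. Alternate pronunciations such as `WORD(2)` are folded into the base word. The function names and sample data are illustrative.

```python
import re


def load_dictionary_words(lines):
    """Collect head words from dictionary lines, folding WORD(2) into WORD."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def missing_words(transcript_lines, dictionary_words):
    """Return transcript words that have no dictionary entry."""
    missing = set()
    for line in transcript_lines:
        # Drop the trailing "(utterance_id)" before splitting into words.
        text = re.sub(r"\([^()]*\)\s*$", "", line)
        for word in text.split():
            if word not in ("<s>", "</s>") and word not in dictionary_words:
                missing.add(word)
    return sorted(missing)


if __name__ == "__main__":
    dictionary = load_dictionary_words(["ONE W AH N", "TWO T UW", "TWO(2) T UH"])
    print(missing_words(["<s> ONE TWO THREE </s> (utt_001)"], dictionary))
```

If the transcripts contain filler words, pass the filler dictionary lines to `load_dictionary_words` as well. This check only covers vocabulary; audio that says something different from the transcript still has to be found by listening.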

vijayabharadwaj gsr - 2011-05-22

I have gone through the Sphinx wiki, but nowhere could I find how to tell the
sphinx4 decoder to use VTLN-trained models. Can you please give me a pointer
for this, sir?

Nickolay V. Shmyrev - 2011-05-22

Hello

The Sphinx4 decoder doesn't support VTLN right now. You can change the warp
factor in the frontend properties, but there is currently no implemented functionality to
find the best VTLN factor. This part requires some work.
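
The missing piece described here, finding the best factor, is commonly done as a grid search: score the same utterance under each candidate warp factor and keep the one with the highest likelihood. A minimal Python sketch of that idea. `score_utterance` is a hypothetical callback standing in for a real decoding or alignment pass, and nothing here is Sphinx4 API.

```python
def best_warp_factor(candidates, score_utterance):
    """Return (factor, score) with the highest score, or None if no candidates."""
    best = None
    for factor in candidates:
        score = score_utterance(factor)  # e.g. an alignment log-likelihood
        if best is None or score > best[1]:
            best = (factor, score)
    return best


if __name__ == "__main__":
    # Toy scoring function that peaks at 1.05; a real one would re-extract
    # features with the given warp factor and align or decode the utterance.
    def toy_score(factor):
        return -(factor - 1.05) ** 2

    grid = [round(0.80 + 0.05 * i, 2) for i in range(13)]
    print(best_warp_factor(grid, toy_score))
```

The cost is one scoring pass per candidate, so in practice the factor is usually estimated once per speaker rather than once per utterance.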

sarinsukumar - 2011-08-10

Hi,
I am also trying to use VTLN with pocketsphinx. Is it implemented in
pocketsphinx?
I can find MLLR and MAP, but no VTLN.

Will it improve gender normalization for pocketsphinx?

Please advise.
Thanks in advance.

Nickolay V. Shmyrev - 2011-08-16

> I am also trying to use VTLN with pocketsphinx. Is it implemented in
> pocketsphinx?

VTLN estimation is not implemented in pocketsphinx.

> Will it improve gender normalization for pocketsphinx?

Yes

sarinsukumar - 2011-08-16

Hi, many thanks for the help.
I have seen some frequency warping functions and decoding options in the
pocketsphinx and sphinxtrain sources. Can't I use them to implement VTLN?

Is it available in Sphinx 4?

Nickolay V. Shmyrev - 2011-08-16

> Can't I use them to implement VTLN?

Yes.

> Is it available in Sphinx 4?

Not really, but it's very easy to implement such a thing in the frontend.
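
As a rough illustration of what such a frontend change involves, VTLN is commonly implemented by rescaling the frequency axis before the mel filterbank is applied. The Python sketch below uses a simple linear warp with a clamp at the Nyquist frequency. The function names, the direction of the scaling, and the filter edges are assumptions made for illustration; they are not taken from the Sphinx4 or pocketsphinx sources.

```python
def warp_frequency(freq_hz, warp_factor, nyquist_hz):
    """Linearly rescale a frequency by the warp factor, clamped to Nyquist."""
    return min(freq_hz * warp_factor, nyquist_hz)


def warp_filter_edges(edges_hz, warp_factor, sample_rate_hz):
    """Apply the warp to every filterbank edge frequency."""
    nyquist = sample_rate_hz / 2.0
    return [warp_frequency(edge, warp_factor, nyquist) for edge in edges_hz]


if __name__ == "__main__":
    edges = [200.0, 1000.0, 3500.0]  # made-up filter edges
    print(warp_filter_edges(edges, 1.1, 8000))
```

A warp factor of 1.0 leaves the filterbank unchanged, which gives an easy sanity check for any implementation.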

sarinsukumar - 2011-08-18

Can you please give me a pointer to any implementation details?

Thanks in advance.
