
training for VTLN gives error

Help
2011-05-03
2012-09-22
  • vijayabharadwaj gsr

    I want to train the models with VTLN, but I am getting an error:

    FATAL_ERROR: "main_align.c", line 1064: fopen(/home/lahari/Speech/isolated-vtln/vtlnout/0.80/isolated-vtln.alignedtranscripts.1,r) failed

    Where have I made a mistake? Can you please let me know?

    My sphinx_train.cfg looks like this:

    # Configuration script for sphinx trainer -*-mode:Perl-*-

    $CFG_VERBOSE = 1; # Determines how much goes to the screen.

    # These are filled in at configuration time
    $CFG_DB_NAME = "isolated-vtln";
    $CFG_BASE_DIR = "/home/lahari/Speech/isolated-vtln";
    $CFG_SPHINXTRAIN_DIR = "/home/lahari/Speech/sphinxtrain";

    # Directory containing SphinxTrain binaries
    $CFG_BIN_DIR = "$CFG_BASE_DIR/bin";
    $CFG_GIF_DIR = "$CFG_BASE_DIR/gifs";
    $CFG_SCRIPT_DIR = "$CFG_BASE_DIR/scripts_pl";

    # Experiment name, will be used to name model files and log files
    $CFG_EXPTNAME = "$CFG_DB_NAME";

    # Audio waveform and feature file information
    $CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
    $CFG_WAVFILE_EXTENSION = 'wav';
    $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
    $CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
    $CFG_FEATFILE_EXTENSION = 'mfc';
    $CFG_VECTOR_LENGTH = 13;

    $CFG_MIN_ITERATIONS = 1; # BW Iterate at least this many times
    $CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, something's likely wrong.

    # (none/max) Type of AGC to apply to input files
    $CFG_AGC = 'none';
    # (current/none) Type of cepstral mean subtraction/normalization
    # to apply to input files
    $CFG_CMN = 'current';
    # (yes/no) Normalize variance of input files to 1.0
    $CFG_VARNORM = 'no';
    # (yes/no) Use letter-to-sound rules to guess pronunciations of
    # unknown words (English, 40-phone specific)
    $CFG_LTSOOV = 'no';
    # (yes/no) Train full covariance matrices
    $CFG_FULLVAR = 'no';
    # (yes/no) Use diagonals only of full covariance matrices for
    # Forward-Backward evaluation (recommended if CFG_FULLVAR is yes)
    $CFG_DIAGFULL = 'no';

    # (yes/no) Perform vocal tract length normalization in training. This
    # will result in a "normalized" model which requires VTLN to be done
    # during decoding as well.
    $CFG_VTLN = 'yes';
    # Starting warp factor for VTLN
    $CFG_VTLN_START = 0.80;
    # Ending warp factor for VTLN
    $CFG_VTLN_END = 1.40;
    # Step size of warping factors
    $CFG_VTLN_STEP = 0.05;

    # Directory to write queue manager logs to
    $CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
    # Directory to write training logs to
    $CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
    # Directory for re-estimation counts
    $CFG_BWACCUM_DIR = "$CFG_BASE_DIR/bwaccumdir";
    # Directory to write model parameter files to
    $CFG_MODEL_DIR = "$CFG_BASE_DIR/model_parameters";

    # Directory containing transcripts and control files for
    # speaker-adaptive training
    $CFG_LIST_DIR = "$CFG_BASE_DIR/etc";

    # Decoding variables for MMIE training
    $CFG_LANGUAGEWEIGHT = "11.5";
    $CFG_BEAMWIDTH = "1e-100";
    $CFG_WORDBEAM = "1e-80";
    $CFG_LANGUAGEMODEL = "$CFG_LIST_DIR/$CFG_DB_NAME.lm.DMP";
    $CFG_WORDPENALTY = "0.2";

    # Lattice pruning variables
    $CFG_ABEAM = "1e-50";
    $CFG_NBEAM = "1e-10";
    $CFG_PRUNED_DENLAT_DIR = "$CFG_BASE_DIR/pruned_denlat";

    # MMIE training related variables
    $CFG_MMIE = "no";
    $CFG_MMIE_MAX_ITERATIONS = 5;
    $CFG_LATTICE_DIR = "$CFG_BASE_DIR/lattice";
    $CFG_MMIE_TYPE = "rand"; # Valid values are "rand", "best" or "ci"
    $CFG_MMIE_CONSTE = "3.0";
    $CFG_NUMLAT_DIR = "$CFG_BASE_DIR/numlat";
    $CFG_DENLAT_DIR = "$CFG_BASE_DIR/denlat";

    # Variables used in main training of models
    $CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
    $CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
    $CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
    $CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
    $CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
    $CFG_FEATPARAMS = "$CFG_LIST_DIR/feat.params";

    # Variables used in characterizing models

    $CFG_HMM_TYPE = '.cont.'; # Sphinx III
    #$CFG_HMM_TYPE = '.semi.'; # PocketSphinx and Sphinx II
    #$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets)

    if (($CFG_HMM_TYPE ne ".semi.")
        and ($CFG_HMM_TYPE ne ".ptm.")
        and ($CFG_HMM_TYPE ne ".cont.")) {
        die "Please choose one CFG_HMM_TYPE out of '.cont.', '.ptm.', or '.semi.', " .
            "currently $CFG_HMM_TYPE\n";
    }

    # This configuration is fastest and best for most acoustic models in
    # PocketSphinx and Sphinx-III. See below for Sphinx-II.
    $CFG_STATESPERHMM = 3;
    $CFG_SKIPSTATE = 'no';

    if ($CFG_HMM_TYPE eq '.semi.') {
        $CFG_DIRLABEL = 'semi';
        # Four stream features for PocketSphinx
        $CFG_FEATURE = "s2_4x";
        $CFG_NUM_STREAMS = 4;
        $CFG_INITIAL_NUM_DENSITIES = 256;
        $CFG_FINAL_NUM_DENSITIES = 256;
        die "For semi continuous models, the initial and final models have the same density"
            if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
    } elsif ($CFG_HMM_TYPE eq '.ptm.') {
        $CFG_DIRLABEL = 'ptm';
        # Four stream features for PocketSphinx
        $CFG_FEATURE = "s2_4x";
        $CFG_NUM_STREAMS = 4;
        $CFG_INITIAL_NUM_DENSITIES = 64;
        $CFG_FINAL_NUM_DENSITIES = 64;
        die "For phonetically tied models, the initial and final models have the same density"
            if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
    } elsif ($CFG_HMM_TYPE eq '.cont.') {
        $CFG_DIRLABEL = 'cont';
        # Single stream features - Sphinx 3
        $CFG_FEATURE = "1s_c_d_dd";
        $CFG_NUM_STREAMS = 1;
        $CFG_INITIAL_NUM_DENSITIES = 1;
        $CFG_FINAL_NUM_DENSITIES = 8;
        die "The initial has to be less than the final number of densities"
            if ($CFG_INITIAL_NUM_DENSITIES > $CFG_FINAL_NUM_DENSITIES);
    }

    # (yes/no) Train multiple-gaussian context-independent models (useful
    # for alignment, use 'no' otherwise) in the models created
    # specifically for forced alignment
    $CFG_FALIGN_CI_MGAU = 'no';
    # (yes/no) Train multiple-gaussian context-independent models (useful
    # for alignment, use 'no' otherwise)
    $CFG_CI_MGAU = 'no';
    # Number of tied states (senones) to create in decision-tree clustering
    $CFG_N_TIED_STATES = 1000;
    # How many parts to run Forward-Backward estimation in
    $CFG_NPART = 1;

    # (yes/no) Train a single decision tree for all phones (actually one
    # per state) (useful for grapheme-based models, use 'no' otherwise)
    $CFG_CROSS_PHONE_TREES = 'no';

    # Use force-aligned transcripts (if available) as input to training
    $CFG_FORCEDALIGN = 'no';

    # Use a specific set of models for force alignment. If not defined,
    # context-independent models for the current experiment will be used.
    $CFG_FORCE_ALIGN_MDEF = "$CFG_BASE_DIR/model_architecture/$CFG_EXPTNAME.falign_ci.mdef";
    $CFG_FORCE_ALIGN_MODELDIR = "$CFG_MODEL_DIR/$CFG_EXPTNAME.falign_ci_$CFG_DIRLABEL";

    # Use a specific dictionary and filler dictionary for force alignment.
    # If these are not defined, a dictionary and filler dictionary will be
    # created from $CFG_DICTIONARY and $CFG_FILLERDICT, with noise words
    # removed from the filler dictionary and added to the dictionary (this
    # is because the force alignment is not very good at inserting them)

    # $CFG_FORCE_ALIGN_DICTIONARY = "$ST::CFG_BASE_DIR/falignout$ST::CFG_EXPTNAME.falign.dict";;
    # $CFG_FORCE_ALIGN_FILLERDICT = "$ST::CFG_BASE_DIR/falignout/$ST::CFG_EXPTNAME.falign.fdict";;

    # Use a particular beam width for force alignment. The wider
    # (i.e. smaller numerically) the beam, the fewer sentences will be
    # rejected for bad alignment.
    $CFG_FORCE_ALIGN_BEAM = 1e-60;

    # Calculate an LDA/MLLT transform?
    $CFG_LDA_MLLT = 'no';
    # Dimensionality of LDA/MLLT output
    $CFG_LDA_DIMENSION = 29;

    # This is actually just a difference in log space (it doesn't make
    # sense otherwise, because different feature parameters have very
    # different likelihoods)
    $CFG_CONVERGENCE_RATIO = 0.1;

    # Queue::POSIX for multiple CPUs on a local machine
    # Queue::PBS to use a PBS/TORQUE queue
    $CFG_QUEUE_TYPE = "Queue";

    # Name of queue to use for PBS/TORQUE
    $CFG_QUEUE_NAME = "workq";

    # (yes/no) Build questions for decision tree clustering automatically
    $CFG_MAKE_QUESTS = "yes";
    # If CFG_MAKE_QUESTS is yes, questions are written to this file.
    # If CFG_MAKE_QUESTS is no, questions are read from this file.
    $CFG_QUESTION_SET = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.tree_questions";

    #$CFG_QUESTION_SET = "${CFG_BASE_DIR}/linguistic_questions";

    $CFG_CP_OPERATION = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.cpmeanvar";

    # This variable has to be defined, otherwise utils.pl will not load.
    $CFG_DONE = 1;

    return 1;
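The three VTLN settings in this file ($CFG_VTLN_START, $CFG_VTLN_END, $CFG_VTLN_STEP) define the grid of warp factors that the VTLN alignment module sweeps; the logs later in the thread show factors 0.80, 0.85, 0.90 and feature files with extensions such as `.0.80.mfc` and `.1.35.mfc`. A minimal Python sketch of how such a grid can be enumerated, assuming two-decimal formatting of each factor. It is an illustration only, not SphinxTrain's own loop, and `warp_factors` is a hypothetical helper:

```python
def warp_factors(start: float, end: float, step: float) -> list[str]:
    """Return warp factors from start to end (inclusive), formatted to two decimals.

    Counting whole steps instead of repeatedly adding `step` avoids
    floating-point drift that could drop or duplicate the last factor.
    """
    n_steps = round((end - start) / step)
    return [f"{start + i * step:.2f}" for i in range(n_steps + 1)]


# $CFG_VTLN_START, $CFG_VTLN_END and $CFG_VTLN_STEP from the file above.
factors = warp_factors(0.80, 1.40, 0.05)
print(len(factors), factors[0], factors[-1])  # 13 0.80 1.40

# Feature-file extension pattern seen in the logs, e.g. ".0.80.mfc" or ".1.35.mfc".
extensions = [f".{w}.mfc" for w in factors]
print(extensions[11])  # .1.35.mfc
```

With these settings every utterance is feature-extracted and aligned 13 times, which is why the VTLN module takes much longer than an ordinary alignment pass.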

     
  • Nickolay V. Shmyrev

    Hello

    When training fails, check the logs in logdir to find out exactly what went
    wrong. This file is missing because it wasn't created in the previous step.
    There could be several reasons for that; one of them is that sphinx3_align is
    not in the bin folder or cannot be run.
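Checking the logs can be partly automated. A minimal sketch, assuming the logs are plain-text files somewhere under the directory set by `$CFG_LOG_DIR` and that error lines begin with `ERROR:` or `FATAL_ERROR:`, as in the output quoted in this thread; `find_errors` is a hypothetical helper, not part of SphinxTrain:

```python
import os


def find_errors(logdir: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, text) for every ERROR/FATAL_ERROR line under logdir."""
    hits = []
    for root, _dirs, files in os.walk(logdir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going if a log contains stray bytes
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if line.startswith(("ERROR:", "FATAL_ERROR:")):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

For example, `for path, lineno, text in find_errors("/home/lahari/Speech/isolatedvtln/logdir"): print(f"{path}:{lineno}: {text}")` lists every error with its location, so the first failing step can be found without opening each log by hand.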

     
  • vijayabharadwaj gsr

    Dear Sir,

    I have already copied all the sphinx3 executables into the model's bin
    directory. When I run the executable by itself, it works fine:

    $ ./sphinx3_align
    INFO: info.c(65): Host: 'localhost.localdomain'
    INFO: info.c(69): Directory: '/home/lahari/Speech/isolatedvtln/bin'
    INFO: info.c(73): ./sphinx3_align Compiled on: May 11 2011, AT: 17:16:58

    INFO: cmd_ln.c(465): Looking for default argument file: default.arg
    INFO: cmd_ln.c(468): Can't find default argument file default.arg.
    INFO: cmd_ln.c(512): Parsing command line:
    ./sphinx3_align

    ERROR: "cmd_ln.c", line 632: No arguments given, exiting
    Arguments list definition:

    -adchdr 0 Number of bytes to skip at the beginning of a waveform file (44 for WAV, 1024 for Sphere)
    -adcin no Input is waveform data rather than cepstra (-cepdir and -cepext are still used)
    -agc none Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
    -agcthresh 2.0 Initial threshold for automatic gain control
    -beam 1e-64 Main pruning beam applied to triphones in forward search
    -cb2mllr .1cls. Senone to MLLR transformation matrix mapping file (or .1cls.)
    -cepdir Input cepstrum files directory (prefixed to filespecs in control file)
    -cepext .mfc Input cepstrum files extension (prefixed to filespecs in control file)
    -ceplen 13 Number of components in the input feature vector
    -ci_pbeam 1e-80 CI phone beam for CI-based GMM Selection.
    -cmn current Cepstral mean normalization scheme ('current', 'prior', or 'none')
    -cmninit 8.0 Initial values (comma-separated) for cepstral mean when 'prior' is used
    -cond_ds no Conditional Down-sampling, override normal down sampling. require specify a gaussian selection map
    -ctl Control file listing utterances to be processed
    -ctlcount 1000000000 No. of utterances to be processed (after skipping -ctloffset entries)
    -ctloffset 0 No. of utterances at the beginning of -ctl file to be skipped
    -ctl_mllr Control file that list the corresponding MLLR matrix for an utterance
    -dict Main pronunciation dictionary (lexicon) input file
    -dist_ds no Distance-based Down-sampling, override normal down sampling.
    -ds 1 Ratio of Down-sampling the frame computation.
    -fdict Silence and filler (noise) word pronunciation dictionary input file
    -feat 1s_c_d_dd Feature stream type, depends on the acoustic model
    -featparams File containing feature extraction parameters.
    -frate 100 Frame rate (only requred for xlabel style phone labels)
    -gs Gaussian Selection Mapping.
    -gs4gs yes A flag that specified whether the input GS map will be used for Gaussian Selection. If it is disabled, the map will only provide information to other modules.
    -hmm Directory for specifying Sphinx 3's hmm, the following files are assummed to be present, mdef, mean, var, mixw, tmat. If -mdef, -mean, -var, -mixw or -tmat are specified, they will override this command.
    -hyp Recognition result file, with only words
    -hypseg Recognition result file, with word segmentations and scores
    -insent Input transcript file corresponding to control file
    -insert_sil 1 Whether to insert optional silences and fillers between words.
    -kdmaxbbi -1 Maximum number of Gaussians per leaf node in kd-Trees
    -kdmaxdepth 0 Maximum depth of kd-Trees to use
    -kdtree kd-Tree file for Gaussian selection (for .s2semi models only)
    -lambda Interpolation weights (CD/CI senone) parameters input file
    -lda File containing transformation matrix to be applied to features (single-stream features only)
    -ldadim 0 Dimensionality of output of feature transformation (0 to use entire matrix)
    -log3table yes Determines whether to use the logs3 table or to compute the values at run time.
    -logbase 1.0003 Base in which all log-likelihoods calculated
    -logfn Log file (default stdout/stderr)
    -lts_mismatch no Use CMUDict letter-to-sound rules to generate pronunciations for LM words doesn't appear in the dictionary . Use it with care. It assumes that the phone set in the mdef and dict are the same as the LTS rule.
    -maxcdsenpf 100000 Max no. of distinct CD senone will be computed.
    -mdef Model definition input file
    -mean Mixture gaussian means input file
    -mixw Senone mixture weights input file
    -mixwfloor 0.0000001 Senone mixture weights floor (applied to data from -mixw file)
    -mllr MLLR transfomation matrix to be applied to mixture gaussian means
    -outsent Output transcript file with exact pronunciation/transcription
    -phlabdir Output directory for xlabel style phone labels; optionally end with ,CTL
    -phsegdir Output directory for phone segmentation files; optionally end with ,CTL
    -s2cdsen no Output context-dependent senone indices in Sphinx-II state segmentations
    -s2stsegdir Output directory for Sphinx-II format state segmentation files; optionally end with ,CTL
    -senmgau .cont. Senone to mixture-gaussian mapping file (or .semi. or .cont.)
    -stsegdir Output directory for state segmentation files; optionally end with ,CTL
    -subvq Sub-vector quantized form of acoustic model
    -subvqbeam 3.0e-3 Beam selecting best components within each mixture Gaussian
    -svq4svq no A flag that specified whether the input SVQ will be used as approximate scores of the Gaussians
    -svspec Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
    -tighten_factor 0.5 From 0 to 1, it tightens the beam width when the frame is dropped
    -tmat HMM state transition matrix input file
    -tmatfloor 0.0001 HMM state transition probability floor (applied to -tmat file)
    -topn 4 (S3.0 GMM Computation only) No. of top scoring densities computed in each mixture gaussian codebook (semi-continuous models only)
    -var Mixture gaussian variances input file
    -varfloor 0.0001 Mixture gaussian variance floor (applied to data from -var file)
    -varnorm no Variance normalize each utterance (only if CMN == current)
    -vqeval 3 Number of subvectors to use for SubVQ-based frame evaluation (3 for all)
    -wdsegdir Output directory for word segmentation files; optionally end with ,CTL

    ERROR: "cmd_ln.c", line 650: cmd_ln_parse_r failed
    ERROR: "cmd_ln.c", line 699: cmd_ln_parse failed, forced exit

    But the program still gives the error:

    INFO: info.c(65): Host: 'localhost.localdomain'
    INFO: info.c(69): Directory: '/home/lahari/Speech/isolatedvtln'
    INFO: info.c(73): /home/lahari/Speech/isolatedvtln/bin/sphinx3_align Compiled on: May 11 2011, AT: 17:16:58

    INFO: cmd_ln.c(512): Parsing command line:
    /home/lahari/Speech/isolatedvtln/bin/sphinx3_align \
    -mdef /home/lahari/Speech/isolatedvtln/model_architecture/isolatedvtln.falign_ci.mdef \
    -senmgau .cont. \
    -mixw /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/mixture_weights \
    -mixwfloor 1e-08 \
    -tmat /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/transition_matrices \
    -mean /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/means \
    -var /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/variances \
    -varfloor 0.0001 \
    -dict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.dict \
    -fdict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.fdict \
    -ctl /home/lahari/Speech/isolatedvtln/etc/isolatedvtln_train.fileids \
    -ctloffset 0 \
    -ctlcount 320 \
    -cepdir /home/lahari/Speech/isolatedvtln/feat \
    -cepext .0.80.mfc \
    -insent /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.aligninput \
    -outsent /home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1 \
    -wdsegdir /home/lahari/Speech/isolatedvtln/vtlnout/0.80,CTL \
    -beam 1e-60 \
    -agc none \
    -cmn current \
    -varnorm no \
    -feat 1s_c_d_dd \
    -ceplen 13

    Current configuration:

    -adchdr 0 0
    -adcin no no
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -beam 1e-64 1.000000e-60
    -cb2mllr .1cls. .1cls.
    -cepdir /home/lahari/Speech/isolatedvtln/feat
    -cepext .mfc .0.80.mfc
    -ceplen 13 13
    -ci_pbeam 1e-80 1.000000e-80
    -cmn current current
    -cmninit 8.0 8.0
    -cond_ds no no
    -ctl /home/lahari/Speech/isolatedvtln/etc/isolatedvtln_train.fileids
    -ctlcount 1000000000 320
    -ctloffset 0 0
    -ctl_mllr
    -dict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.dict
    -dist_ds no no
    -ds 1 1
    -fdict /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.falign.fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -frate 100 100
    -gs
    -gs4gs yes yes
    -hmm
    -hyp
    -hypseg
    -insent /home/lahari/Speech/isolatedvtln/vtlnout/isolatedvtln.aligninput
    -insert_sil 1 1
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -lambda
    -lda
    -ldadim 0 0
    -log3table yes yes
    -logbase 1.0003 1.000300e+00
    -logfn
    -lts_mismatch no no
    -maxcdsenpf 100000 100000
    -mdef /home/lahari/Speech/isolatedvtln/model_architecture/isolatedvtln.falign_ci.mdef
    -mean /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/means
    -mixw /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/mixture_weights
    -mixwfloor 0.0000001 1.000000e-08
    -mllr
    -outsent /home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1
    -phlabdir
    -phsegdir
    -s2cdsen no no
    -s2stsegdir
    -senmgau .cont. .cont.
    -stsegdir
    -subvq
    -subvqbeam 3.0e-3 3.000000e-03
    -svq4svq no no
    -svspec
    -tighten_factor 0.5 5.000000e-01
    -tmat /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/transition_matrices
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -var /home/lahari/Speech/isolatedvtln/model_parameters/isolatedvtln.falign_ci_cont/variances
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -vqeval 3 3
    -wdsegdir /home/lahari/Speech/isolatedvtln/vtlnout/0.80,CTL

    FATAL_ERROR: "main_align.c", line 1064: fopen(/home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1,r) failed

    Please let me know where I made a mistake.

    Note: I have renamed the model from isolated-vtln to isolatedvtln.

     
  • vijayabharadwaj gsr

    Log file


    MODULE: 10 Training Context Independent models for forced alignment and VTLN (2011-05-12 11:33)
    Phase 1: Cleaning up directories:
    accumulator... logs... qmanager... models... completed
    Phase 2: Flat initialize
    mk_mdef_gen Log File
    completed
    mk_flat Log File
    completed
    init_gau Log File
    completed
    norm Log File
    completed
    init_gau Log File
    completed
    norm Log File
    completed
    cp_parm Log File
    completed
    cp_parm Log File
    completed
    Phase 3: Forward-Backward
    Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
    bw Log File
    completed
    Normalization for iteration: 1
    norm Log File
    completed
    Current Overall Likelihood Per Frame = -3.60234575364793
    Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1)
    bw Log File
    This step had 140 ERROR messages and 0 WARNING messages. Please check the log file for details.
    completed
    Normalization for iteration: 2
    norm Log File
    completed
    Current Overall Likelihood Per Frame = -1.39648795518676
    Convergence Ratio = 2.20585779846117
    Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
    bw Log File
    This step had 18 ERROR messages and 2 WARNING messages. Please check the log file for details.
    completed
    Normalization for iteration: 3
    norm Log File
    completed
    Current Overall Likelihood Per Frame = -0.53251214094547
    Convergence Ratio = 0.86397581424129
    Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
    bw Log File
    This step had 4 ERROR messages and 0 WARNING messages. Please check the log file for details.
    completed
    Normalization for iteration: 4
    norm Log File
    completed
    Current Overall Likelihood Per Frame = 1.1948810297441
    Convergence Ratio = 1.72739317068957
    Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
    bw Log File
    This step had 8 ERROR messages and 0 WARNING messages. Please check the log file for details.
    completed
    Normalization for iteration: 5
    norm Log File
    completed
    Current Overall Likelihood Per Frame = 2.12185047978418
    Convergence Ratio = 0.926969450040081
    Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
    bw Log File
    This step had 10 ERROR messages and 0 WARNING messages. Please check the log file for details.
    completed
    Normalization for iteration: 6
    norm Log File
    completed
    Current Overall Likelihood Per Frame = 2.58621930454428
    Convergence Ratio = 0.464368824760097
    Baum welch starting for 1 Gaussian(s), iteration: 7 (1 of 1)
    bw Log File
    This step had 12 ERROR messages and 0 WARNING messages. Please check the log file for details.
    completed
    Normalization for iteration: 7
    norm Log File
    completed
    Current Overall Likelihood Per Frame = 2.63752307796553
    Training completed after 7 iterations
    MODULE: 11 Force-aligning transcripts (2011-05-12 11:52)
    Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
    MODULE: 12 Force-aligning data for VTLN (2011-05-12 11:52)
    Phase 1: Cleaning up directories:
    logs... qmanager... output...
    Phase 2: Creating dictionary for alignment...
    Phase 3: Creating transcript for alignment...
    Phase 4: Running VTLN alignment for warp factor 0.80
    Phase 5: Extracting features with warp factor 0.80
    ../scripts_pl/make_feats.pl Log File
    completed
    Phase 6: Running force alignment in 1 parts
    Force alignment starting: (1 of 1)
    sphinx3_align Log File
    This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
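The `Convergence Ratio` values in this log match the comment on `$CFG_CONVERGENCE_RATIO` in the configuration above: each one is simply the current per-frame likelihood minus the previous one (for iteration 2, -1.39648795518676 - (-3.60234575364793) = 2.20585779846117), and training stopped at iteration 7, when that difference fell to about 0.051, below the 0.1 threshold. A small Python check of that arithmetic; the stopping rule here is a simplified assumption, not the code SphinxTrain runs:

```python
# Per-frame likelihoods printed after iterations 1-7 in the log above.
likelihoods = [
    -3.60234575364793,
    -1.39648795518676,
    -0.53251214094547,
    1.1948810297441,
    2.12185047978418,
    2.58621930454428,
    2.63752307796553,
]
CONVERGENCE_RATIO = 0.1  # $CFG_CONVERGENCE_RATIO
MAX_ITERATIONS = 10      # $CFG_MAX_ITERATIONS


def run_until_converged(values, threshold, max_iter):
    """Return (iterations run, differences) under a plain-difference stop rule.

    Assumption: training stops as soon as the likelihood gain drops below
    `threshold`; under this simplified rule a decrease would also stop it.
    """
    diffs = []
    for i in range(1, min(len(values), max_iter)):
        diff = values[i] - values[i - 1]
        diffs.append(diff)
        if diff < threshold:
            return i + 1, diffs
    return min(len(values), max_iter), diffs


iterations, diffs = run_until_converged(likelihoods, CONVERGENCE_RATIO, MAX_ITERATIONS)
print(iterations)  # 7
```

The rising likelihood does not rule out problems: the bw steps still reported ERROR messages, which are worth checking in their own log files.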

     
  • vijayabharadwaj gsr

    Failed in part 1

     
  • Nickolay V. Shmyrev

    First of all, please download and compile the sphinx3 snapshot. You can find
    the instructions here:

    http://cmusphinx.sourceforge.net/wiki/download

    Then answer the following questions:

    Does the following folder exist?

    /home/lahari/Speech/isolatedvtln/vtlnout/0.80
    

    If not, can you create it?

    If it does, can you run this command?

    touch /home/lahari/Speech/isolatedvtln/vtlnout/0.80/isolatedvtln.alignedtranscripts.1
    

    What is the output of this command?

    Which sphinxtrain version are you using?

    What OS are you using?

     
  • vijayabharadwaj gsr

    Dear Sir,

    I am using the sphinx3 snapshot with sphinxbase-0.7 and SphinxTrain 1.0.7 on
    Fedora 15 Linux.

    In fact, the directory

    /home/lahari/Speech/isolatedvtln/vtlnout/0.80

    was never created in vtlnout.

    Before the script exits, the vtlnout directory contains the following files:

    -rw-rw-r-- 1 lahari lahari 350244 May 12 16:24 isolatedvtln.aligninput
    -rw-rw-r-- 1 lahari lahari 20831 May 12 16:24 isolatedvtln.falign.dict
    -rw-rw-r-- 1 lahari lahari 27 May 12 16:24 isolatedvtln.falign.fdict
    -rw-rw-r-- 1 lahari lahari 0 May 12 16:25 isolatedvtln.vtlnctl.0.80
    -rw-rw-r-- 1 lahari lahari 0 May 12 16:25 isolatedvtln.vtlnlsn.0.80

     
  • Nickolay V. Shmyrev

    Hello

    This is a bug that was just fixed in SphinxTrain trunk; you need to update.
    The patch is simple, and you can also apply it to the script yourself:

    Index: slave_align.pl
    ===================================================================
    --- slave_align.pl      (revision 10961)
    +++ slave_align.pl      (working copy)
    @@ -166,6 +166,7 @@
         Log("Phase 4: Running VTLN alignment for warp factor $warp");
         # Build state segmentation directories
         my $wdsegdir = catdir($outdir, $warp);
    +    mkpath($wdsegdir, 0, 0777);
         open INPUT,"${ST::CFG_LISTOFFILES}" or die "Failed to open $ST::CFG_LISTOFFILES: $!";
         my $have_feats = 1;
         my %dirs;
    
     
  • vijayabharadwaj gsr

    Thank you sir. With the above fix, it is working fine.
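The patch works because `mkpath` creates the per-warp output directory (for example `vtlnout/0.80`) before `sphinx3_align` is asked to write its `-outsent` and `-wdsegdir` output there; without it, the aligner cannot open the output file and stops with the FATAL_ERROR from the first post. The same idea as a minimal Python sketch, which can also be used to check by hand that the directory can be created and the file written (the `touch` test suggested earlier). `prepare_warp_dir` and `can_write` are hypothetical helpers; the file-name pattern is copied from the command line quoted above:

```python
import os


def prepare_warp_dir(outdir: str, warp: str, exptname: str, part: int) -> str:
    """Create <outdir>/<warp> if needed and return the aligned-transcript path.

    Mirrors the effect of mkpath($wdsegdir, 0, 0777) in the patch: the
    directory must exist before the aligner opens a file inside it.
    """
    wdsegdir = os.path.join(outdir, warp)
    os.makedirs(wdsegdir, mode=0o777, exist_ok=True)
    return os.path.join(wdsegdir, f"{exptname}.alignedtranscripts.{part}")


def can_write(path: str) -> bool:
    """Rough equivalent of `touch`: try to create (or append to) the file."""
    try:
        with open(path, "a"):
            pass
        return True
    except OSError:
        return False
```

For example, `can_write(prepare_warp_dir("/home/lahari/Speech/isolatedvtln/vtlnout", "0.80", "isolatedvtln", 1))` returns `True` once the directory exists and is writable, and `False` if a permission problem remains.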

     
  • vijayabharadwaj gsr

    The HTML log shows the following:

    Extracting features with warp factor 0.85
    ../scripts_pl/make_feats.pl Log File
    completed
    Phase 6: Running force alignment in 1 parts
    Force alignment starting: (1 of 1)
    sphinx3_align Log File
    This step had 990 ERROR messages and 0 WARNING messages. Please check the log file for details.
    completed
    Phase 7: Updating master control file and transcript
    Phase 4: Running VTLN alignment for warp factor 0.90
    Phase 5: Extracting features with warp factor 0.90
    ../scripts_pl/make_feats.pl Log File
    completed
    Phase 6: Running force alignment in 1 parts
    Force alignment starting: (1 of 1)
    sphinx3_align Log File
    This step had 991 ERROR messages and 0 WARNING messages. Please check the log file for details.
    completed

    The log file shows the following:

    INFO: feat.c(1189): At directory /home/lahari/Speech/largevocvtln/feat
    INFO: feat.c(1006): Reading mfc file: '/home/lahari/Speech/largevocvtln/feat/balarama-mohan/13-30-15.1.35.mfc'
    ERROR: "feat.c", line 1096: /home/lahari/Speech/largevocvtln/feat/balarama-mohan/13-30-15.1.35.mfc: Maximum output size(15006 frames) < actual frames(26438)

    ERROR: "main_align.c", line 908: Utt 13-30-15: Input file read (balarama-mohan/13-30-15) with dir (/home/lahari/Speech/largevocvtln/feat) and extension (.1.35.mfc) failed
    INFO: corpus.c(662): 13-30-15: 0.0 sec CPU, 0.0 sec Clk; TOT: 117.9 sec CPU, 124.0 sec Clk

    I doubt the VTLN has not been completed properly because of so many fails. Is
    it true?

    Do I have to cut the audio into files 1 to 2 minutes long? Of course, the wiki
    says about 30 seconds is better, but it is very difficult to record that way.
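The first error can be read off with simple arithmetic. Assuming the usual rate of 100 feature frames per second (10 ms frame shift, also the `-frate 100` default in the argument list above), the 15006-frame limit of this build corresponds to about 150 seconds of audio, while the rejected file has 26438 frames, about 264 seconds. A minimal sketch that converts durations to frame counts and plans cut points for a long recording; the frame rate and limit come from the logs in this thread and may differ in other builds, and `frames_for`/`split_points` are hypothetical helpers:

```python
FRAMES_PER_SECOND = 100  # assumed frame rate (10 ms frame shift)
MAX_FRAMES = 15006       # limit reported by feat.c in the log above; build-specific


def frames_for(seconds: float) -> int:
    """Approximate number of feature frames for an utterance of the given length."""
    return round(seconds * FRAMES_PER_SECOND)


def split_points(total_seconds: float, target_seconds: float = 30.0) -> list[tuple[float, float]]:
    """Cut a recording into consecutive (start, end) segments of at most target_seconds."""
    if frames_for(target_seconds) > MAX_FRAMES:
        raise ValueError("target segment length exceeds the frame limit")
    segments = []
    start = 0.0
    while start < total_seconds:
        end = min(start + target_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


print(26438 / FRAMES_PER_SECOND)       # 264.38 seconds: too long
print(MAX_FRAMES / FRAMES_PER_SECOND)  # 150.06 seconds: the ceiling
print(len(split_points(264.38)))       # 9 segments of at most 30 s
```

Cutting the audio is the easy half: each transcript has to be split at the same points, ideally at pauses, so that every piece still matches its text.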

     
  • vijayabharadwaj gsr

    In some cases the final state is not reached, so there is no alignment:

    INFO: feat.c(1189): At directory /home/lahari/Speech/largevocvtln/feat
    INFO: feat.c(1006): Reading mfc file: '/home/lahari/Speech/largevocvtln/feat/vkrishna/1-30-4.1.35.mfc'
    INFO: cmn.c(175): CMN: 12.76 0.27 -0.10 0.01 -0.25 -0.03 -0.31 -0.16 -0.21 -0.19 -0.16 -0.15 -0.18
    INFO: main_align.c(919): 1-30-4: 9373 input frames
    ERROR: "main_align.c", line 765: Final state not reached; no alignment for 1-30-4

    INFO: corpus.c(662): 1-30-4: 0.5 sec CPU, 0.5 sec Clk; TOT: 2.3 sec CPU, 2.4 sec Clk

    What can I do in this case?

     
  • Nickolay V. Shmyrev

    I doubt the VTLN has not been completed properly because of so many fails.
    Is it true?

    Yes. You need to make files shorter.

    In some cases the final state is not reached, so there is no alignment.

    Fix the transcription or the dictionary. The audio should match the text.
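One part of fixing the transcription or the dictionary can be automated: making sure every transcript word has a dictionary entry. A minimal sketch, assuming the usual SphinxTrain formats (transcript lines like `<s> WORD WORD </s> (uttid)`, dictionary lines like `WORD  PH1 PH2`, alternate pronunciations written `WORD(2)`); `load_dictionary` and `missing_words` are hypothetical helpers, and words from the filler dictionary should be passed in `fillers`. It cannot detect audio that simply does not match its text, which is the other common cause of "Final state not reached":

```python
import re


def load_dictionary(lines):
    """Collect headwords from dictionary lines, folding WORD(2) variants into WORD."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(transcript_lines, dictionary, fillers=("<s>", "</s>", "<sil>")):
    """Map utterance id -> transcript words that have no dictionary entry."""
    known = set(dictionary) | set(fillers)
    problems = {}
    for line in transcript_lines:
        match = re.match(r"^(.*?)\s*\(([^()]+)\)$", line.strip())
        if not match:
            continue  # skip lines without a trailing (uttid)
        text, uttid = match.groups()
        unknown = [w for w in text.split() if w not in known]
        if unknown:
            problems[uttid] = unknown
    return problems
```

The comparison is case-sensitive, so a transcript in lower case checked against an upper-case dictionary will report every word.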

     
  • vijayabharadwaj gsr

    I have gone through the Sphinx wiki but could not find anywhere how to tell
    the Sphinx4 decoder to use VTLN-trained models. Can you please give me a
    pointer?

     
  • Nickolay V. Shmyrev

    Hello

    The Sphinx4 decoder doesn't support VTLN right now. You can change the warp
    factor in the frontend properties, but there is currently no implemented
    functionality to find the best VTLN factor. This part requires some work.
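The missing piece, finding the best warp factor for a speaker, is commonly done by grid search: extract features at each candidate factor, score them against the acoustic model, and keep the factor with the highest likelihood. The training logs above show SphinxTrain's VTLN module running one forced alignment per warp factor and then updating the master control file and transcript, which suggests a choice made in a similar way. A minimal sketch of the selection step only; `score` is a hypothetical stand-in for aligning or decoding the speaker's data with features warped by the given factor and returning a log-likelihood:

```python
from typing import Callable, Sequence


def best_warp(warps: Sequence[float], score: Callable[[float], float]) -> tuple[float, float]:
    """Return (warp factor, score) with the highest score over the candidate grid.

    score(w) is assumed to return a log-likelihood for the speaker's data
    with features extracted at warp factor w; a failed alignment can return
    float("-inf") so that factor is never chosen.
    """
    best_w, best_s = None, float("-inf")
    for w in warps:
        s = score(w)
        if s > best_s:
            best_w, best_s = w, s
    if best_w is None:
        raise ValueError("no warp factor produced a usable score")
    return best_w, best_s


# 0.80 to 1.40 in steps of 0.05, as in the training configuration above.
grid = [round(0.80 + 0.05 * i, 2) for i in range(13)]
# Toy scoring function peaking at 1.10, only to exercise the search.
print(best_warp(grid, lambda w: -(w - 1.10) ** 2)[0])  # 1.1
```

In a real decoder each call to `score` means another pass over the audio, so the search is usually run once per speaker rather than per utterance.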

     
  • sarinsukumar

    sarinsukumar - 2011-08-10

    Hi,
    I am also trying VTLN with pocketsphinx. Is it implemented in pocketsphinx?
    I can find MLLR and MAP, but not VTLN.

    Will it improve gender normalization for pocketsphinx?

    Please advise.
    Thanks in advance.

     
  • Nickolay V. Shmyrev

    I am also trying VTLN with pocketsphinx. Is it implemented in pocketsphinx?

    VTLN estimation is not implemented in pocketsphinx.

    Will it improve gender normalization for pocketsphinx?

    Yes

     
  • sarinsukumar

    sarinsukumar - 2011-08-16

    Hi, thanks very much for the help.
    I have seen some frequency warping functions and decoding options in the
    pocketsphinx and sphinxtrain sources. Can't I use them to implement VTLN?

    Is it available in Sphinx 4?

     
  • Nickolay V. Shmyrev

    Can't I use them to implement VTLN?

    Yes.

    Is it available in Sphinx 4?

    Not really, but it's very easy to implement such a thing in the frontend.
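For reference, one common frequency-warping function for VTLN is piecewise linear: frequencies below a breakpoint are scaled by the warp factor, and the rest of the band is mapped linearly so that the Nyquist frequency stays fixed. In a frontend this is typically applied to the filterbank edge frequencies before the filters are built. A minimal sketch under stated assumptions: the 85% breakpoint, the direction of scaling (multiplying by the factor) and the rule that lowers the breakpoint for factors above 1 are choices made here, not taken from the Sphinx frontend or sphinxbase, and `piecewise_linear_warp` is a hypothetical name:

```python
def piecewise_linear_warp(freq_hz: float, alpha: float, nyquist_hz: float = 8000.0,
                          cutoff_ratio: float = 0.85) -> float:
    """Warp a frequency with a two-segment piecewise-linear map.

    Below the breakpoint f0 the frequency is scaled by alpha; above it the map
    is a straight line ending at (nyquist, nyquist), so the band edge is kept.
    f0 is lowered for alpha > 1 so that alpha * f0 never exceeds
    cutoff_ratio * nyquist, which keeps the map strictly increasing.
    """
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("frequency outside [0, nyquist]")
    f0 = cutoff_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= f0:
        return alpha * freq_hz
    # Straight line from (f0, alpha * f0) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + slope * (freq_hz - f0)
```

With the 0.80 to 1.40 grid used in training, the map stays monotonic over the whole band, which a warp without the adjusted breakpoint would not guarantee at factor 1.40.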

     
  • sarinsukumar

    sarinsukumar - 2011-08-18

    Can you please give me a pointer to any implementation details?

    Thanks in advance.

     
