
kmeans.c Empty cluster

  • rangolit

    rangolit - 2011-11-01

    I need to create a simple model for 10 words. When I run the script
    scripts_pl/RunAll.pl
    I get this output on the screen:

    MODULE: 00 verify training files
    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
        Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
            Found 6 words using 10 phones
        Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
        Phase 3: CTL - Check general format; utterance length (must be positive); files exist
        Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
        Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
            Estimated Total Hours Training: 0.000875
            This is a small amount of data, no comment at this time
        Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
            Words in dictionary: 3
            Words in filler dictionary: 3
        Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    Feature type is s2_4x which is 4 streams
    LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
    Skipping LDA training
    Feature type is s2_4x which is 4 streams
    LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
    Skipping MLLT training
    MODULE: 05 Vector Quantization
    This step had 2 ERROR messages and 8044 WARNING messages.  Please check the log file for details.
    MODULE: 10 Training Context Independent models for forced alignment and VTLN
    Skipped:  $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
    Skipped:  $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
    MODULE: 11 Force-aligning transcripts
    Skipped:  $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
    MODULE: 12 Force-aligning data for VTLN
    Skipped:  $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
    MODULE: 20 Training Context Independent models
        Phase 1: Cleaning up directories:
        accumulator...logs...qmanager...models...
        Phase 2: Flat initialize
        Phase 3: Forward-Backward
            Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
            0% 100% 
    This step had 18 ERROR messages and 27 WARNING messages.  Please check the log file for details.
    Training failed in iteration 1
    Something failed: (/home/zentarim/voice/scripts_pl/20.ci_hmm/slave_convg.pl)
    
    
    
    
    
    zentarim@Sphinx:~/voice$ cd logdir/
    zentarim@Sphinx:~/voice/logdir$ ls
    05.vector_quantize  20.ci_hmm
    zentarim@Sphinx:~/voice/logdir$ cd 05.vector_quantize/
    zentarim@Sphinx:~/voice/logdir/05.vector_quantize$ ls
    voice.kmeans.log  voice.vq.agg_seg.log
    

    The file voice.kmeans.log contains:

    ...
    INFO: feat.c(684): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: main.c(520): No mdef files.  Assuming 1-class init
    INFO: main.c(1352): 1-class dump file
    INFO: main.c(1390): Corpus 0: sz==315 frames
    INFO: main.c(1399): Convergence ratios are abs(cur - prior) / abs(prior)
    INFO: main.c(231): alloc'ing 0Mb obs buf
    INFO: main.c(577): Initializing means using random k-means
    INFO: main.c(580): Trial 0: 256 means
    INFO: kmeans.c(153): km iter [0] 1.000000e+00 ...
    WARNING: "kmeans.c", line 431: Empty cluster 1
    WARNING: "kmeans.c", line 431: Empty cluster 35
    WARNING: "kmeans.c", line 431: Empty cluster 54
    WARNING: "kmeans.c", line 431: Empty cluster 59
    ...
    INFO: main.c(613):  -> Aborting k-means, bad initialization
    INFO: kmeans.c(153): km iter [0] 1.000000e+00 ...
    ...
    
    WARNING: "kmeans.c", line 431: Empty cluster 250
    WARNING: "kmeans.c", line 431: Empty cluster 251
    WARNING: "kmeans.c", line 431: Empty cluster 253
    INFO: main.c(613):  -> Aborting k-means, bad initialization
    INFO: main.c(622):  best-so-far sqerr = -1.000000e+00
    ERROR: "main.c", line 841: Too few observations for kmeans
    ERROR: "main.c", line 1408: Unable to do k-means for state 0; skipping...
    INFO: s3gau_io.c(226): Wrote /home/zentarim/voice/model_parameters/voice.ci_semi_flatinitial/means [1x4x256 array]
    INFO: s3gau_io.c(226): Wrote /home/zentarim/voice/model_parameters/voice.ci_semi_flatinitial/variances [1x4x256 array]
    INFO: main.c(1509): No mixing weight file given; none written
    INFO: main.c(1669): TOTALS: km 0.046x 1.692e+00 var 0.000x 0.000e+00 em 0.000x 0.000e+00 all 0.047x 1.653e+00
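    
    The numbers in this log already point at the cause. The 0.000875 estimated hours of audio is about 3.15 s, which at the default 10 ms frame shift is exactly the 315 frames reported above ("Corpus 0: sz==315 frames"). For a '.semi.' model with s2_4x features the VQ step builds a 256-entry codebook ($CFG_INITIAL_NUM_DENSITIES in the config below) for each of the 4 streams, so random k-means initialization gets barely more than one frame per codeword, which is consistent with the "Empty cluster" warnings and the "Too few observations for kmeans" error. A rough back-of-the-envelope sketch of that arithmetic (plain Perl, not part of SphinxTrain; the figures are taken from the log and config quoted in this post):
    
    # Illustrative only: how much data the k-means step has to work with.
    use strict;
    use warnings;
    
    my $hours     = 0.000875;              # "Estimated Total Hours Training"
    my $frames    = $hours * 3600 / 0.01;  # assumes the default 10 ms frame shift -> 315
    my $codewords = 256;                   # $CFG_INITIAL_NUM_DENSITIES for '.semi.' models
    
    printf "%.0f frames for %d codewords per stream (%.2f frames per codeword)\n",
           $frames, $codewords, $frames / $codewords;
    # -> 315 frames for 256 codewords per stream (1.23 frames per codeword)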
    

    My etc/sphinx_train.cfg contains:

    # Configuration script for sphinx trainer                  -*-mode:Perl-*-
    
    $CFG_VERBOSE = 1;       # Determines how much goes to the screen.
    
    # These are filled in at configuration time
    $CFG_DB_NAME = "voice";
    $CFG_BASE_DIR = "/home/zentarim/voice";
    $CFG_SPHINXTRAIN_DIR = "/home/zentarim/sphinxtrain-1.0.7";
    
    # Directory containing SphinxTrain binaries
    $CFG_BIN_DIR = "$CFG_BASE_DIR/bin";
    $CFG_GIF_DIR = "$CFG_BASE_DIR/gifs";
    $CFG_SCRIPT_DIR = "$CFG_BASE_DIR/scripts_pl";
    
    # Experiment name, will be used to name model files and log files
    $CFG_EXPTNAME = "$CFG_DB_NAME";
    
    # Audio waveform and feature file information
    $CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
    $CFG_WAVFILE_EXTENSION = 'wav';
    $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
    $CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
    $CFG_FEATFILE_EXTENSION = 'mfc';
    $CFG_VECTOR_LENGTH = 13;
    
    $CFG_MIN_ITERATIONS = 1;  # BW Iterate at least this many times
    $CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, something's likely wrong.
    
    # (none/max) Type of AGC to apply to input files
    $CFG_AGC = 'none';
    # (current/none) Type of cepstral mean subtraction/normalization
    # to apply to input files
    $CFG_CMN = 'current';
    # (yes/no) Normalize variance of input files to 1.0
    $CFG_VARNORM = 'no';
    # (yes/no) Use letter-to-sound rules to guess pronunciations of
    # unknown words (English, 40-phone specific)
    $CFG_LTSOOV = 'no';
    # (yes/no) Train full covariance matrices
    $CFG_FULLVAR = 'no';
    # (yes/no) Use diagonals only of full covariance matrices for
    # Forward-Backward evaluation (recommended if CFG_FULLVAR is yes)
    $CFG_DIAGFULL = 'no';
    
    # (yes/no) Perform vocal tract length normalization in training.  This
    # will result in a "normalized" model which requires VTLN to be done
    # during decoding as well.
    $CFG_VTLN = 'no';
    # Starting warp factor for VTLN
    $CFG_VTLN_START = 0.80;
    # Ending warp factor for VTLN
    $CFG_VTLN_END = 1.40;
    # Step size of warping factors
    $CFG_VTLN_STEP = 0.05;
    
    # Directory to write queue manager logs to
    $CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
    # Directory to write training logs to
    $CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
    # Directory for re-estimation counts
    $CFG_BWACCUM_DIR = "$CFG_BASE_DIR/bwaccumdir";
    # Directory to write model parameter files to
    $CFG_MODEL_DIR = "$CFG_BASE_DIR/model_parameters";
    
    # Directory containing transcripts and control files for
    # speaker-adaptive training
    $CFG_LIST_DIR = "$CFG_BASE_DIR/etc";
    
    # Decoding variables for MMIE training
    $CFG_LANGUAGEWEIGHT = "11.5";
    $CFG_BEAMWIDTH      = "1e-100";
    $CFG_WORDBEAM       = "1e-80";
    $CFG_LANGUAGEMODEL  = "$CFG_LIST_DIR/$CFG_DB_NAME.lm.DMP";
    $CFG_WORDPENALTY    = "0.2";
    
    # Lattice pruning variables
    $CFG_ABEAM              = "1e-50";
    $CFG_NBEAM              = "1e-10";
    $CFG_PRUNED_DENLAT_DIR  = "$CFG_BASE_DIR/pruned_denlat";
    
    # MMIE training related variables
    $CFG_MMIE = "no";
    $CFG_MMIE_MAX_ITERATIONS = 5;
    $CFG_LATTICE_DIR = "$CFG_BASE_DIR/lattice";
    $CFG_MMIE_TYPE   = "rand"; # Valid values are "rand", "best" or "ci"
    $CFG_MMIE_CONSTE = "3.0";
    $CFG_NUMLAT_DIR  = "$CFG_BASE_DIR/numlat";
    $CFG_DENLAT_DIR  = "$CFG_BASE_DIR/denlat";
    
    # Variables used in main training of models
    $CFG_DICTIONARY     = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
    $CFG_RAWPHONEFILE   = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
    $CFG_FILLERDICT     = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
    $CFG_LISTOFFILES    = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
    $CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
    $CFG_FEATPARAMS     = "$CFG_LIST_DIR/feat.params";
    
    # Variables used in characterizing models
    
    #$CFG_HMM_TYPE = '.cont.'; # Sphinx III
    $CFG_HMM_TYPE  = '.semi.'; # PocketSphinx and Sphinx II
    #$CFG_HMM_TYPE  = '.ptm.'; # PocketSphinx (larger data sets)
    
    if (($CFG_HMM_TYPE ne ".semi.")
        and ($CFG_HMM_TYPE ne ".ptm.")
        and ($CFG_HMM_TYPE ne ".cont.")) {
      die "Please choose one CFG_HMM_TYPE out of '.cont.', '.ptm.', or '.semi.', " .
        "currently $CFG_HMM_TYPE\n";
    }
    
    # This configuration is fastest and best for most acoustic models in
    # PocketSphinx and Sphinx-III.  See below for Sphinx-II.
    $CFG_STATESPERHMM = 3;
    $CFG_SKIPSTATE = 'no';
    
    if ($CFG_HMM_TYPE eq '.semi.') {
      $CFG_DIRLABEL = 'semi';
    # Four stream features for PocketSphinx
      $CFG_FEATURE = "s2_4x";
      $CFG_NUM_STREAMS = 4;
      $CFG_INITIAL_NUM_DENSITIES = 256;
      $CFG_FINAL_NUM_DENSITIES = 256;
      die "For semi continuous models, the initial and final models have the same density" 
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
    } elsif ($CFG_HMM_TYPE eq '.ptm.') {
      $CFG_DIRLABEL = 'ptm';
    # Four stream features for PocketSphinx
      $CFG_FEATURE = "s2_4x";
      $CFG_NUM_STREAMS = 4;
      $CFG_INITIAL_NUM_DENSITIES = 64;
      $CFG_FINAL_NUM_DENSITIES = 64;
      die "For phonetically tied models, the initial and final models have the same density" 
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
    } elsif ($CFG_HMM_TYPE eq '.cont.') {
      $CFG_DIRLABEL = 'cont';
    # Single stream features - Sphinx 3
      $CFG_FEATURE = "1s_c_d_dd";
      $CFG_NUM_STREAMS = 1;
      $CFG_INITIAL_NUM_DENSITIES = 1;
      $CFG_FINAL_NUM_DENSITIES = 8;
      die "The initial has to be less than the final number of densities" 
        if ($CFG_INITIAL_NUM_DENSITIES > $CFG_FINAL_NUM_DENSITIES);
    }
    
    # Number of top gaussians to score a frame. A little bit less accurate computations
    # make training significantly faster. Uncomment to apply this during the training
    # For good accuracy make sure you are using the same setting in decoder
    # In theory this can be different for various training stages. For example 4 for
    # CI stage and 16 for CD stage
    # $CFG_CI_NTOP = 4;
    # $CFG_CD_NTOP = 16;
    
    # (yes/no) Train multiple-gaussian context-independent models (useful
    # for alignment, use 'no' otherwise) in the models created
    # specifically for forced alignment
    $CFG_FALIGN_CI_MGAU = 'no';
    # (yes/no) Train multiple-gaussian context-independent models (useful
    # for alignment, use 'no' otherwise)
    $CFG_CI_MGAU = 'no';
    # Number of tied states (senones) to create in decision-tree clustering
    $CFG_N_TIED_STATES = 200;
    # How many parts to run Forward-Backward estimation in
    $CFG_NPART = 1;
    
    # (yes/no) Train a single decision tree for all phones (actually one
    # per state) (useful for grapheme-based models, use 'no' otherwise)
    $CFG_CROSS_PHONE_TREES = 'no';
    
    # Use force-aligned transcripts (if available) as input to training
    $CFG_FORCEDALIGN = 'no';
    
    # Use a specific set of models for force alignment.  If not defined,
    # context-independent models for the current experiment will be used.
    $CFG_FORCE_ALIGN_MDEF = "$CFG_BASE_DIR/model_architecture/$CFG_EXPTNAME.falign_ci.mdef";
    $CFG_FORCE_ALIGN_MODELDIR = "$CFG_MODEL_DIR/$CFG_EXPTNAME.falign_ci_$CFG_DIRLABEL";
    
    # Use a specific dictionary and filler dictionary for force alignment.
    # If these are not defined, a dictionary and filler dictionary will be
    # created from $CFG_DICTIONARY and $CFG_FILLERDICT, with noise words
    # removed from the filler dictionary and added to the dictionary (this
    # is because the force alignment is not very good at inserting them)
    
    # $CFG_FORCE_ALIGN_DICTIONARY = "$ST::CFG_BASE_DIR/falignout$ST::CFG_EXPTNAME.falign.dict";;
    # $CFG_FORCE_ALIGN_FILLERDICT = "$ST::CFG_BASE_DIR/falignout/$ST::CFG_EXPTNAME.falign.fdict";;
    
    # Use a particular beam width for force alignment.  The wider
    # (i.e. smaller numerically) the beam, the fewer sentences will be
    # rejected for bad alignment.
    $CFG_FORCE_ALIGN_BEAM = 1e-60;
    
    # Calculate an LDA/MLLT transform?
    $CFG_LDA_MLLT = 'no';
    # Dimensionality of LDA/MLLT output
    $CFG_LDA_DIMENSION = 29;
    
    # This is actually just a difference in log space (it doesn't make
    # sense otherwise, because different feature parameters have very
    # different likelihoods)
    $CFG_CONVERGENCE_RATIO = 0.1;
    
    # Queue::POSIX for multiple CPUs on a local machine
    # Queue::PBS to use a PBS/TORQUE queue
    $CFG_QUEUE_TYPE = "Queue";
    
    # Name of queue to use for PBS/TORQUE
    $CFG_QUEUE_NAME = "workq";
    
    # (yes/no) Build questions for decision tree clustering automatically
    $CFG_MAKE_QUESTS = "yes";
    # If CFG_MAKE_QUESTS is yes, questions are written to this file.
    # If CFG_MAKE_QUESTS is no, questions are read from this file.
    $CFG_QUESTION_SET = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.tree_questions";
    #$CFG_QUESTION_SET = "${CFG_BASE_DIR}/linguistic_questions";
    
    $CFG_CP_OPERATION = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.cpmeanvar";
    
    # This variable has to be defined, otherwise utils.pl will not load.
    $CFG_DONE = 1;
    
    return 1;
    

    My etc/feat.params contains:

    -samprate 8000.0
    -nfilt 31
    -lowerf 200.00
    -upperf 3500.00
    -dither yes
    

    The wav files have these parameters:
    8 kHz, 16-bit, mono
    Each recording is about 0.5 s long (one word).

    Where did I go wrong? I can provide more information.
    Thanks in advance for your answer.

     
  • Nickolay V. Shmyrev

    You need more training data. Please read the tutorial

    http://cmusphinx.sourceforge.net/wiki/tutorialam

    You want to create an acoustic model for a new language/dialect,
    OR you need a specialized model for a small-vocabulary application,
    AND you have plenty of data to train on:
    1 hour of recordings for single-speaker command and control
    5 hours of recordings of 200 speakers for multi-speaker command and control
    10 hours of recordings for single-speaker dictation
    50 hours of recordings of 200 speakers for multi-speaker dictation
    AND you have knowledge of the phonetic structure of the language
    AND you have time to train the model and optimize parameters (about 1 month)
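    
    Put against that guideline, the log above makes the gap concrete: the trainer estimated 0.000875 hours of audio, roughly three seconds, where even the smallest case (single-speaker command and control) calls for about an hour. A quick sketch of the comparison (plain Perl; the 1-hour figure is the tutorial's guideline, not a hard limit):
    
    # Illustrative only: compare available data with the guideline above.
    use strict;
    use warnings;
    
    my $have_hours = 0.000875;  # "Estimated Total Hours Training" from the log
    my $want_hours = 1;         # single-speaker command-and-control guideline
    
    printf "have %.2f s of audio, guideline is %.0f s (about %.0fx more needed)\n",
           $have_hours * 3600, $want_hours * 3600, $want_hours / $have_hours;
    # -> have 3.15 s of audio, guideline is 3600 s (about 1143x more needed)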

     
  • rangolit

    rangolit - 2011-11-01

    You need more training data. Please read the tutorial

    Thanks for the answer, nshmyrev.
    Maybe you can advise me what to do. I need to recognize 10 words, numbers only, in
    Russian. Recording time will be less than one hour.

     
  • Nickolay V. Shmyrev

    I need to recognize 10 words. Numbers only. In Russian.

    You can use an existing model.

     
