ERROR: "main_align.c", line 850: Uttid mismatch: ctlfile

2017-08-22
  • Maria del Mar Martinez Sanchez

    Hi,

    I am training a PTM model for Spanish with VoxForge 8 kHz speech data (41 hours); see the configuration below.
    I am getting this kind of error for all of the data files at stage 11 (force align):

    db2.5.falign.log:ERROR: "main_align.c", line 850: Uttid mismatch: ctlfile = "es-0042"; transcript = "jigdominguez-20100602-rxf/wav/es-0042"
    

    Searching this forum, I found a reply saying "there is a mistake in the feature extraction", but I am not using any custom feature extraction software, only the one included in sphinx-5prealpha.
    Maybe the 1s_c_d_dd configuration is not suitable and I should use the default s2_4x instead?

    Thanks in advance,
    Mar

    +++++++++++++++++++++++++
    sphinx_train.cfg
    +++++++++++++++++++++++++
    
    $CFG_VERBOSE = 1;       # Determines how much goes to the screen.
    $CFG_DB_NAME = "db1";
    $CFG_EXPTNAME = "$CFG_DB_NAME";
    $CFG_BASE_DIR = "/Users/mar/entrenamiento/db1";
    $CFG_SPHINXTRAIN_DIR = "/usr/local/lib/sphinxtrain";
    $CFG_BIN_DIR = "/usr/local/libexec/sphinxtrain";
    $CFG_SCRIPT_DIR = "/usr/local/lib/sphinxtrain/scripts";
    $CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
    $CFG_WAVFILE_EXTENSION = 'wav';
    $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
    $CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
    $CFG_FEATFILE_EXTENSION = 'mfc';
    $CFG_WAVFILE_SRATE = 8000.0;
    $CFG_NUM_FILT = 15; # For wideband speech it's 25, for telephone 8khz reasonable value is 15
    $CFG_LO_FILT = 200; # For telephone 8kHz speech value is 200
    $CFG_HI_FILT = 3500; # For telephone 8kHz speech value is 3500
    $CFG_TRANSFORM = "dct"; # Previously legacy transform is used, but dct is more accurate
    $CFG_LIFTER = "22"; # Cepstrum lifter is smoothing to improve recognition
    $CFG_VECTOR_LENGTH = 13; # 13 is usually enough
    $CFG_MIN_ITERATIONS = 1;  # BW Iterate at least this many times
    $CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, somethings likely wrong.
    $CFG_AGC = 'none';
    $CFG_CMN = 'batch';
    $CFG_VARNORM = 'no';
    $CFG_FULLVAR = 'no';
    $CFG_DIAGFULL = 'no';
    $CFG_VTLN = 'no';
    $CFG_VTLN_START = 0.80;
    $CFG_VTLN_END = 1.40;
    $CFG_VTLN_STEP = 0.05;
    $CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
    $CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
    $CFG_BWACCUM_DIR = "$CFG_BASE_DIR/bwaccumdir";
    $CFG_MODEL_DIR = "$CFG_BASE_DIR/model_parameters";
    $CFG_LIST_DIR = "$CFG_BASE_DIR/etc";
    $CFG_LANGUAGEWEIGHT = "11.5";
    $CFG_BEAMWIDTH      = "1e-100";
    $CFG_WORDBEAM       = "1e-80";
    $CFG_LANGUAGEMODEL  = "$CFG_LIST_DIR/$CFG_DB_NAME.lm.bin";
    $CFG_WORDPENALTY    = "0.2";
    $CFG_ABEAM              = "1e-50";
    $CFG_NBEAM              = "1e-10";
    $CFG_PRUNED_DENLAT_DIR  = "$CFG_BASE_DIR/pruned_denlat";
    $CFG_MMIE = "no";
    $CFG_MMIE_MAX_ITERATIONS = 5;
    $CFG_LATTICE_DIR = "$CFG_BASE_DIR/lattice";
    $CFG_MMIE_TYPE   = "rand"; # Valid values are "rand", "best" or "ci"
    $CFG_MMIE_CONSTE = "3.0";
    $CFG_NUMLAT_DIR  = "$CFG_BASE_DIR/numlat";
    $CFG_DENLAT_DIR  = "$CFG_BASE_DIR/denlat";
    $CFG_DICTIONARY     = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
    $CFG_RAWPHONEFILE   = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
    $CFG_FILLERDICT     = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
    $CFG_LISTOFFILES    = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
    $CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
    $CFG_FEATPARAMS     = "$CFG_LIST_DIR/feat.params";
    $CFG_HMM_TYPE  = '.ptm.'; # PocketSphinx (larger data sets)
    $CFG_DIRLABEL = 'ptm';
    $CFG_FEATURE = "1s_c_d_dd";
    $CFG_NUM_STREAMS = 3;
    $CFG_SVSPEC = "0-12/13-25/26-38";
    $CFG_INITIAL_NUM_DENSITIES = 128;
    $CFG_FINAL_NUM_DENSITIES = 128;
    $CFG_STATESPERHMM = 3;
    $CFG_SKIPSTATE = 'no';
    $CFG_FALIGN_CI_MGAU = 'no';
    $CFG_CI_MGAU = 'no';
    $CFG_CD_TRAIN = 'yes';
    $CFG_N_TIED_STATES = 3000;
    $CFG_NPART = 10;
    $CFG_CROSS_PHONE_TREES = 'no';
    $CFG_FORCEDALIGN = 'yes';
    $CFG_FORCE_ALIGN_MODELDIR = "$CFG_MODEL_DIR/$CFG_EXPTNAME.falign_ci_$CFG_DIRLABEL";
    $CFG_FORCE_ALIGN_BEAM = 1e-60;
    $CFG_LDA_MLLT = 'no';
    $CFG_LDA_DIMENSION = 29;
    $CFG_CONVERGENCE_RATIO = 0.1;
    $CFG_QUEUE_TYPE = "Queue";
    $CFG_QUEUE_NAME = "workq";
    $CFG_MAKE_QUESTS = "yes";
    $CFG_QUESTION_SET = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.tree_questions";
    $CFG_CP_OPERATION = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.cpmeanvar";
    $CFG_G2P_MODEL= 'no';
    $DEC_CFG_VERBOSE = 1;       # Determines how much goes to the screen.
    $DEC_CFG_SCRIPT = 'psdecode.pl';
    $DEC_CFG_EXPTNAME = "$CFG_EXPTNAME";
    $DEC_CFG_JOBNAME  = "$CFG_EXPTNAME"."_job";
    $DEC_CFG_MODEL_NAME = "$CFG_EXPTNAME.cd_${CFG_DIRLABEL}_${CFG_N_TIED_STATES}";
    $DEC_CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
    $DEC_CFG_FEATFILE_EXTENSION = '.mfc';
    $DEC_CFG_AGC = $CFG_AGC;
    $DEC_CFG_CMN = $CFG_CMN;
    $DEC_CFG_VARNORM = $CFG_VARNORM;
    $DEC_CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
    $DEC_CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
    $DEC_CFG_MODEL_DIR = "$CFG_MODEL_DIR";
    $DEC_CFG_DICTIONARY     = "$CFG_BASE_DIR/etc/$CFG_DB_NAME.dic";
    $DEC_CFG_FILLERDICT     = "$CFG_BASE_DIR/etc/$CFG_DB_NAME.filler";
    $DEC_CFG_LISTOFFILES    = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}_test.fileids";
    $DEC_CFG_TRANSCRIPTFILE = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}_test.transcription";
    $DEC_CFG_RESULT_DIR     = "$CFG_BASE_DIR/result";
    $DEC_CFG_PRESULT_DIR     = "$CFG_BASE_DIR/presult";
    $DEC_CFG_LANGUAGEMODEL  = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.lm.bin";
    $DEC_CFG_LANGUAGEWEIGHT = "10";
    $DEC_CFG_BEAMWIDTH = "1e-80";
    $DEC_CFG_WORDBEAM = "1e-40";
    $DEC_CFG_WORDPENALTY = "0.65";
    $DEC_CFG_ALIGN = "builtin";
    $DEC_CFG_NPART = 10;        #  Define how many pieces to split decode in
    $CFG_DONE = 1;
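    The three filterbank settings above (CFG_NUM_FILT = 15, CFG_LO_FILT = 200, CFG_HI_FILT = 3500) describe 15 triangular filters spread over the telephone band. The short sketch below lays them out, assuming the common mel formula mel(f) = 2595 log10(1 + f/700) and the usual construction with NUM_FILT + 2 edge points equally spaced on the mel scale. The function names are hypothetical, and the sphinxtrain front end may differ in details such as frequency warping or filter normalization, so treat the printed numbers as approximate.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Mel scale (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(num_filt: int, lo_hz: float, hi_hz: float) -> List[float]:
    """Return num_filt + 2 frequencies (Hz) equally spaced on the mel scale.

    Filter i (0-based) rises from edges[i], peaks at edges[i + 1] and falls
    back to zero at edges[i + 2], so each filter's peak coincides with the
    end of the previous filter and the start of the next one.
    """
    lo_mel, hi_mel = hz_to_mel(lo_hz), hz_to_mel(hi_hz)
    step = (hi_mel - lo_mel) / (num_filt + 1)
    return [mel_to_hz(lo_mel + k * step) for k in range(num_filt + 2)]


if __name__ == "__main__":
    # Telephone-band values taken from the configuration above.
    num_filt, lo_hz, hi_hz = 15, 200.0, 3500.0
    edges = mel_filter_edges(num_filt, lo_hz, hi_hz)
    for i in range(num_filt):
        print(
            f"filter {i:2d}: {edges[i]:7.1f} Hz .. peak {edges[i + 1]:7.1f} Hz"
            f" .. {edges[i + 2]:7.1f} Hz"
        )
```

    Running it prints one line per filter; the filters get wider toward 3500 Hz because equal steps on the mel scale correspond to growing steps in Hz.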
    
     

    Last edit: Maria del Mar Martinez Sanchez 2017-08-22
    • Nickolay V. Shmyrev

      I am getting this kind of error for all of the data files at stage 11 (force align):
      db2.5.falign.log:ERROR: "main_align.c", line 850: Uttid mismatch: ctlfile = "es-0042"; transcript = "jigdominguez-20100602-rxf/wav/es-0042"

      You can ignore this error; it is not critical. Otherwise, you need to change the data preparation script so that the utterance id in the transcription file is es-0042, not the full path jigdominguez-20100602-rxf/wav/es-0042.
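      A minimal sketch of such a change follows. It assumes the usual SphinxTrain transcription format, where each line ends with the utterance id in parentheses (for example "<s> hola </s> (jigdominguez-20100602-rxf/wav/es-0042)"), and it assumes, as the error message suggests, that the aligner compares that id with the last path component of the matching fileids entry. The script name fix_uttids.py and all of its functions are hypothetical. It writes the corrected transcription to stdout and reports any remaining mismatches on stderr.

```python
"""Hypothetical helper (fix_uttids.py): strip directory prefixes from the
utterance ids in a SphinxTrain-style transcription file and check them
against the fileids list."""
import re
import sys
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# Assumed format: a transcription line ends with "(uttid)".
_UTTID_RE = re.compile(r"^(?P<text>.*?)\s*\((?P<uttid>[^()]*)\)\s*$")


def transcript_uttid(line: str) -> Optional[str]:
    """Return the utterance id at the end of a transcription line, or None."""
    m = _UTTID_RE.match(line.rstrip("\n"))
    return m.group("uttid") if m else None


def normalize_line(line: str) -> str:
    """Reduce a path-like utterance id to its last path component."""
    line = line.rstrip("\n")
    m = _UTTID_RE.match(line)
    if m is None:
        return line  # no trailing id: leave the line unchanged
    uttid = PurePosixPath(m.group("uttid")).name
    return f"{m.group('text')} ({uttid})"


def find_mismatches(fileids: List[str], transcripts: List[str]) -> List[Tuple[int, str, Optional[str]]]:
    """Compare fileids and transcription lines pairwise (blank lines skipped).

    The control-file side uses the basename of each fileids entry, matching
    how the error message reports it. Returns (line_no, ctl_id, transcript_id)
    for every pair that differs.
    """
    fileids = [f.strip() for f in fileids if f.strip()]
    transcripts = [t for t in transcripts if t.strip()]
    if len(fileids) != len(transcripts):
        raise ValueError(f"{len(fileids)} fileids but {len(transcripts)} transcription lines")
    mismatches = []
    for n, (fid, trans) in enumerate(zip(fileids, transcripts), start=1):
        ctl_id = PurePosixPath(fid).name
        trans_id = transcript_uttid(trans)
        if ctl_id != trans_id:
            mismatches.append((n, ctl_id, trans_id))
    return mismatches


def main(fileids_path: str, transcription_path: str) -> None:
    with open(fileids_path, encoding="utf-8") as f:
        fileids = f.readlines()
    with open(transcription_path, encoding="utf-8") as f:
        fixed = [normalize_line(line) for line in f if line.strip()]
    # Any mismatch left after normalization points to a different data problem.
    for n, ctl_id, trans_id in find_mismatches(fileids, fixed):
        print(f"line {n}: ctl={ctl_id!r} transcript={trans_id!r}", file=sys.stderr)
    print("\n".join(fixed))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: fix_uttids.py TRAIN.fileids TRAIN.transcription", file=sys.stderr)
```

      With the configuration above, it could be run as "python fix_uttids.py etc/db1_train.fileids etc/db1_train.transcription > db1_train.transcription.fixed", and the fixed file then replaces the original. Doing the same basename step directly in the data preparation script avoids the extra pass.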

      Searching this forum, I found a reply saying "there is a mistake in the feature extraction", but I am not using any custom feature extraction software, only the one included in sphinx-5prealpha.
      Maybe the 1s_c_d_dd configuration is not suitable and I should use the default s2_4x instead?

      This is irrelevant; you misunderstood.

  • Maria del Mar Martinez Sanchez

    Thanks a lot for the response, Nickolay!
    I changed the data preparation and now it works.

    I have two small questions:

    1. The samples are organised this way:
      speaker1/file_1
      speaker1/file_2
      speaker2/file_3
      ...
      If a "speakerN" directory accidentally contains wav files from different speakers, what is the effect on training?

    2. I want to try VTLN training. I would like to confirm that the pocketsphinx decoder will take it into account, because I have not found any explicit option to tell pocketsphinx about it (I have the same concern about other training options that require special handling at decoding time).

    Thanks again.
    Mar

     

    Last edit: Maria del Mar Martinez Sanchez 2017-09-07
    • Nickolay V. Shmyrev

      If a "speakerN" directory accidentally contains wav files from different speakers, what is the effect on training?

      There is no effect right now.

      I want to try VTLN training. I would like to confirm that the pocketsphinx decoder will take it into account, because I have not found any explicit option to tell pocketsphinx about it (I have the same concern about other training options that require special handling at decoding time).

      VTLN is not supported in the pocketsphinx decoder. If you want VTLN, you'd better try Kaldi.

       
  • Maria del Mar Martinez Sanchez

    Thanks a lot, Nickolay. I will take your advice into consideration.

     
