Force-aligned transcripts

Help
2011-05-17
2012-09-22
  • Rafael Oliveira

    Rafael Oliveira - 2011-05-17

    I'm trying to train an acoustic model with force-aligned transcripts, but I'm
    confused about which steps I should take. sphinx_train.cfg has some
    properties that refer to this task, but the descriptions in that file were
    not enough for me to understand what I should do. For example, I cannot see
    the difference between the properties falign_ci_mgau and ci_mgau. In short,
    could you suggest a tutorial, documentation, or other information to lead me
    out of the darkness?

     
  • Nickolay V. Shmyrev

    falign_ci_mgau enables multiple-Gaussian models in the forced alignment
    stages 10 and 11. ci_mgau enables multiple-Gaussian models in stage 20. That
    is needed if you want to train CI multiple-Gaussian models for a
    small-vocabulary task.

    To understand what a Gaussian is and what multiple Gaussians are, you can
    check any book on HMM-based speech recognition.

     
  • Rafael Oliveira

    Rafael Oliveira - 2011-05-23

    Thanks, I read up on what you suggested, and now I am trying to train a
    model with force-aligned transcripts, but I got an error in the
    03.force_align step.

    This is the last INFO given in the log:

    INFO: main_align.c(919): codDef-art001a: 2360 input frames
    lt-sphinx3_align: s3_align.c:927: align_build_sent_hmm: Assertion `stail.predlist' failed.
    

    but it shows an error before that:

    ERROR: "main_align.c", line 851: Uttid mismatch: ctlfile = "codDef-art001a"; transcript = "CodDefConsumidor16k/codDef-art001a"
    

    codDef-art001a is the first transcription in my train.fileids file, and this
    is not the first time I have used it in training, so the other programs that
    use it have no problem with it. The fact is, I do not know what is
    happening.

    Here is the entire log:

    INFO:   Initialization of the log add table
    INFO:   Log-Add table size = 29356 x 2 >> 0
    INFO:   
    INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO:   Reading HMM in Sphinx 3 Model format
    INFO:   Model Definition File: /home/20102000030/sphinx/acoustic_model/acoustic_model/model_architecture/db.falign_ci.mdef
    INFO:   Mean File: /home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/means
    INFO:   Variance File: /home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/variances
    INFO:   Mixture Weight File: /home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/mixture_weights
    INFO:   Transition Matrices File: /home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/transition_matrices
    INFO: mdef.c(683): Reading model definition: /home/20102000030/sphinx/acoustic_model/acoustic_model/model_architecture/db.falign_ci.mdef
    INFO:   Initialization of mdef_t, report:
    INFO:   38 CI-phone, 0 CD-phone, 3 emitstate/phone, 114 CI-sen, 114 Sen, 38 Sen-Seq
    INFO:   
    INFO: kbcore.c(299): Using optimized GMM computation for Continuous HMM, -topn will be ignored
    INFO: cont_mgau.c(164): Reading mixture gaussian file '/home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/means'
    INFO: cont_mgau.c(423): 114 mixture Gaussians, 1 components, 1 streams, veclen 39
    INFO: cont_mgau.c(164): Reading mixture gaussian file '/home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/variances'
    INFO: cont_mgau.c(423): 114 mixture Gaussians, 1 components, 1 streams, veclen 39
    INFO: cont_mgau.c(524): Reading mixture weights file '/home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/mixture_weights'
    WARNING: "cont_mgau.c", line 667: Weight normalization failed for 3 senones
    INFO: cont_mgau.c(679): Read 114 x 1 mixture weights
    INFO: cont_mgau.c(707): Removing uninitialized Gaussian densities
     0 1 2
    WARNING: "cont_mgau.c", line 781: 3 densities removed (3 mixtures removed entirely)
    INFO: cont_mgau.c(797): Applying variance floor
    INFO: cont_mgau.c(815): 0 variance values floored
    INFO: cont_mgau.c(863): Precomputing Mahalanobis distance invariants
    INFO: tmat.c(119): Reading HMM transition probability matrices: /home/20102000030/sphinx/acoustic_model/acoustic_model/model_parameters/db.falign_ci_cont/transition_matrices
    WARNING: "tmat.c", line 192: Normalization failed for tmat 0 from state 0
    WARNING: "tmat.c", line 192: Normalization failed for tmat 0 from state 1
    WARNING: "tmat.c", line 192: Normalization failed for tmat 0 from state 2
    INFO:   Initialization of tmat_t, report:
    INFO:   Read 38 transition matrices of size 3x4
    INFO:   
    INFO: dict.c(385): Reading main dictionary: /home/20102000030/sphinx/acoustic_model/acoustic_model/falignout/db.falign.dict
    INFO: dict.c(388): 12874 words read
    INFO: dict.c(393): Reading filler dictionary: /home/20102000030/sphinx/acoustic_model/acoustic_model/falignout/db.falign.fdict
    INFO: dict.c(396): 3 words read
    INFO: dict.c(429): Added 0 fillers from mdef file
    INFO: s3_align.c(1357): logs3(beam)= -460586
    ERROR: "main_align.c", line 851: Uttid mismatch: ctlfile = "codDef-art001a"; transcript = "CodDefConsumidor16k/codDef-art001a"
    INFO: feat.c(1176): At directory /home/20102000030/sphinx/acoustic_model/acoustic_model/feat
    INFO: feat.c(993): Reading mfc file: '/home/20102000030/sphinx/acoustic_model/acoustic_model/feat/CodDefConsumidor16k/codDef-art001a.mfc'[0..-1]
    INFO: cmn.c(175): CMN: 11.56 -0.08  0.05  0.25 -0.17 -0.08 -0.11 -0.31 -0.18 -0.29 -0.12 -0.19 -0.18 
    INFO: main_align.c(919): codDef-art001a: 2360 input frames
    lt-sphinx3_align: s3_align.c:927: align_build_sent_hmm: Assertion `stail.predlist' failed.
    

    Thanks

     
  • Nickolay V. Shmyrev

    The aligner fails to build the HMM sequence for your transcription.
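    The "Uttid mismatch" error in the log shows the two IDs the aligner compared:
    "codDef-art001a" from the control file and
    "CodDefConsumidor16k/codDef-art001a" from the transcript. Whether that is
    what triggers the assertion is not established here, but it is cheap to
    check. A minimal sketch, assuming each transcript line ends with the
    utterance ID in parentheses and that the control-file ID is the last path
    component of the first field (as the error message suggests); the function
    names and the sample words "ola mundo" are made up for illustration:

```python
import re
from pathlib import PurePosixPath

# Matches a trailing "(uttid)" at the end of a transcript line.
UTTID_RE = re.compile(r"\(([^()\s]+)\)\s*$")


def transcript_uttid(line):
    """Return the utterance ID in the trailing parentheses, or None."""
    match = UTTID_RE.search(line)
    return match.group(1) if match else None


def find_uttid_mismatches(ctl_lines, transcript_lines):
    """Pair both files line by line and report IDs that differ.

    Assumption: the control-file ID is the last path component of the
    first whitespace-separated field, which is what the
    'ctlfile = "codDef-art001a"' part of the error message shows.
    """
    mismatches = []
    pairs = zip(ctl_lines, transcript_lines)
    for lineno, (ctl, trans) in enumerate(pairs, start=1):
        fields = ctl.split()
        ctl_id = PurePosixPath(fields[0]).name if fields else None
        trans_id = transcript_uttid(trans)
        if ctl_id != trans_id:
            mismatches.append((lineno, ctl_id, trans_id))
    return mismatches


if __name__ == "__main__":
    # Hypothetical one-line files mirroring the IDs from the error message.
    ctl = ["CodDefConsumidor16k/codDef-art001a"]
    trans = ["<s> ola mundo </s> (CodDefConsumidor16k/codDef-art001a)"]
    for lineno, ctl_id, trans_id in find_uttid_mismatches(ctl, trans):
        print(f"line {lineno}: ctlfile={ctl_id!r} transcript={trans_id!r}")
```

    To run it on real data, read the lines of the control file and of the
    transcript file used by the alignment step into the two lists.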

    The aligner might not be that robust to newlines, spaces, and other
    characters, for example UTF-8 BOM symbols and so on. You may want to check
    the transcription file and the dictionary for issues like that.
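    That check can be scripted. A rough sketch, assuming the files are UTF-8
    text and that each transcript line may end with an "(uttid)" token; the
    helper names are made up for illustration, and the list of characters it
    flags is a guess at what could upset the aligner, not something taken from
    its source:

```python
import unicodedata

BOM = "\ufeff"


def suspicious_chars(line):
    """Return (column, description) pairs for characters worth a look."""
    found = []
    for col, ch in enumerate(line.rstrip("\n"), start=1):
        if ch == BOM:
            found.append((col, "UTF-8 BOM"))
        elif ch == "\r":
            found.append((col, "carriage return"))
        elif ch == "\t":
            found.append((col, "tab"))
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            found.append((col, f"control/format character U+{ord(ch):04X}"))
        elif ch.isspace() and ch != " ":
            found.append((col, f"unusual whitespace U+{ord(ch):04X}"))
    return found


def missing_words(transcript_line, dictionary_words):
    """Return transcript words that have no entry in dictionary_words."""
    words = transcript_line.split()
    # Drop the trailing "(uttid)" token if the line has one.
    if words and words[-1].startswith("(") and words[-1].endswith(")"):
        words = words[:-1]
    return [word for word in words if word not in dictionary_words]


def check_file(path):
    """Print every suspicious character in a file as path:line:column."""
    # encoding="utf-8" (not "utf-8-sig") keeps a leading BOM visible, and
    # newline="" keeps "\r" so Windows line endings are reported too.
    with open(path, encoding="utf-8", newline="") as handle:
        for lineno, line in enumerate(handle, start=1):
            for col, what in suspicious_chars(line):
                print(f"{path}:{lineno}:{col}: {what}")


if __name__ == "__main__":
    # Hypothetical transcript line with a BOM and a Windows line ending.
    sample = "\ufeff<s> ola mundo </s> (utt1)\r\n"
    print(suspicious_chars(sample))
    print(missing_words(sample, {"<s>", "</s>", "ola"}))
```

    Pass the words from both the main and the filler dictionary to
    missing_words, otherwise markers such as <s> and </s> are reported as
    missing.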

     
  • Rafael Oliveira

    Rafael Oliveira - 2011-05-25

    Thanks for the reply, I will check my transcriptions.

     
