
Help on Sphinx3 Training and Decoding

  • Senjam Shantirani

    Query 1: Please comment on whether my training is fine, based on points a) and b) below.

    a) Training ended as shown below:


    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    This step had 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
    Normalization for iteration: 7
    Current Overall Likelihood Per Frame = 1.92801139707547
    Split Gaussians, increase by 0
    Training for 8 Gaussian(s) completed after 7 iterations
    MODULE: 90 deleted interpolation
    Skipped for continuous models
    MODULE: 99 Convert to Sphinx2 format models
    Can not create models used by Sphinx-II.
    If you intend to create models to use with Sphinx-II models, please rerun with:
    $ST::CFG_HMM_TYPE = '.semi.' or
    $ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and $ST::CFG_STATESPERHMM = '5'
    root@shanti-Satellite-C650:/home/PhoneModel/workspace#
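
    Since this step reported 2 ERROR messages, the training logs are worth checking before moving on to decoding. A minimal way to pull out those lines, assuming the logs sit under logdir in the working directory:

    cd /home/PhoneModel/workspace
    # print every ERROR line the training scripts wrote, with the log file it came from
    grep -rn "ERROR:" logdir/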


    b) I got 20 folders in my model_parameters directory, each containing the files means, variances, mixture_weights and transition_matrices (a quick check of the final model directory is sketched after the list):

    timit.cd_cont_1000
    timit.cd_cont_1000_1
    timit.cd_cont_1000_2
    timit.cd_cont_1000_4
    timit.cd_cont_1000_8
    timit.cd_cont_initial
    timit.cd_cont_untied
    timit.ci_cont
    timit.ci_cont_1
    timit.ci_cont_2
    timit.ci_cont_4
    timit.ci_cont_8
    timit.ci_cont_flatinitial
    timit.ci_cont_initial
    timit.falign_ci_cont
    timit.falign_ci_cont_1
    timit.falign_ci_cont_2
    timit.falign_ci_cont_4
    timit.falign_ci_cont_8
    timit.falign_ci_cont_initial
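
    A quick way to confirm that the final context-dependent model the decoder will load is complete (the directory name matches the one in the decode log further below; the four file names are the standard SphinxTrain parameter files):

    # all four parameter files should be present and non-empty
    ls -l /home/PhoneModel/workspace/model_parameters/timit.cd_cont_1000
    # expected: means  variances  mixture_weights  transition_matrices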

    Query 2: With the above trained models, I tried to run decoding, but I am not sure what to modify in the decoder configuration (sphinx_decode.cfg). The relevant block is:

    These variables, used by the decoder, have to be user defined, and
    may affect the decoder output:

    $DEC_CFG_LANGUAGEMODEL_DIR = "$DEC_CFG_BASE_DIR/etc";
    $DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/timit.lm.DMP"; (what should I put here?)
    $DEC_CFG_LANGUAGEWEIGHT = "10";
    $DEC_CFG_BEAMWIDTH = "1e-80";
    $DEC_CFG_WORDBEAM = "1e-40";
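
    A quick way to see which language-model files actually exist under etc/ (the path is taken from the config above):

    # the decoder will try to open whatever $DEC_CFG_LANGUAGEMODEL points to
    ls -l /home/PhoneModel/workspace/etc/timit.lm*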

    But when I ran the command "perl scripts_pl/decode/slave.pl", I got the error shown below:

    root@shanti-Satellite-C650:/home/PhoneModel/workspace# perl scripts_pl/decode/slave.pl
    MODULE: DECODE Decoding using models previously trained
    Decoding 4620 segments starting at 0 (part 1 of 1)
    0%
    This step had 2 ERROR messages and 7 WARNING messages. Please check the log file for details.
    Aligning results to find error rate
    Can't open /home/PhoneModel/workspace/result/timit-1-1.match
    word_align.pl failed with error code 65280 at scripts_pl/decode/slave.pl line 172.

    In the decode log (logdir/decode), the errors reported are:

    INFO: cont_mgau.c(510): Reading mixture weights file '/home/PhoneModel/workspace/model_parameters/timit.cd_cont_1000/mixture_weights'
    ERROR: "cont_mgau.c", line 653: Weight normalization failed for 12 senones
    INFO: cont_mgau.c(665): Read 1132 x 15 mixture weights
    INFO: cont_mgau.c(693): Removing uninitialized Gaussian densities
    9 10 11 48 49 50 79 90 91 92 263 264 265 510 511 512 790 844 845 846
    WARNING: "cont_mgau.c", line 767: 464 densities removed (20 mixtures removed entirely)
    INFO: cont_mgau.c(783): Applying variance floor
    INFO: cont_mgau.c(801): 613 variance values floored
    INFO: cont_mgau.c(849): Precomputing Mahalanobis distance invariants
    INFO: tmat.c(169): Reading HMM transition probability matrices: /home/PhoneModel/workspace/model_parameters/timit.cd_cont_1000/transition_matrices
    WARNING: "tmat.c", line 242: Normalization failed for tmat 16 from state 0
    WARNING: "tmat.c", line 242: Normalization failed for tmat 16 from state 1
    WARNING: "tmat.c", line 242: Normalization failed for tmat 16 from state 2
    WARNING: "tmat.c", line 242: Normalization failed for tmat 30 from state 0
    WARNING: "tmat.c", line 242: Normalization failed for tmat 30 from state 1
    WARNING: "tmat.c", line 242: Normalization failed for tmat 30 from state 2
    INFO: Initialization of tmat_t, report:
    INFO: Read 44 transition matrices of size 3x4
    INFO:
    INFO: dict.c(475): Reading main dictionary: /home/PhoneModel/workspace/etc/timit.dic
    INFO: dict.c(478): 133011 words read
    INFO: dict.c(483): Reading filler dictionary: /home/PhoneModel/workspace/etc/timit.filler
    INFO: dict.c(486): 3 words read
    INFO: Initialization of dict_t, report:
    INFO: No of CI phone: 0
    INFO: Max word: 137110
    INFO: No of word: 133014
    INFO:
    INFO: lm.c(606): LM read('/home/PhoneModel/workspace/etc/timit.lm.DMP', lw= 10.00, wip= 0.20, uw= 0.70)
    INFO: lm.c(608): Reading LM file /home/PhoneModel/workspace/etc/timit.lm.DMP (LM name "default")
    SYSTEM_ERROR: "lm_3g_dmp.c", line 1270: fopen(/home/PhoneModel/workspace/etc/timit.lm.DMP,rb) failed
    ; No such file or directory
    Sun Mar 6 21:00:32 2016

    I can't find the file timit-1-1.match in the result folder.

    Thanking you,
    Shanti

     
  • Senjam Shantirani

    After digging further, I learned that the issue was the absence of a language model. I created one using the LMTOOL:

    http://www.speech.cs.cmu.edu/tools/lmtool-new.html

    and modified sphinx_decode.cfg to point to the new LM:
    $DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/timit.lm";
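
    For anyone following along, a rough sketch of the steps, assuming the lmtool tarball has already been downloaded and unpacked (lmtool names its output with a generated number, so the source filename below is illustrative). Converting to .DMP with sphinx_lm_convert from sphinxbase is optional, since the decoder also reads the ARPA .lm directly:

    # place the ARPA language model where sphinx_decode.cfg expects it
    cp 1234.lm /home/PhoneModel/workspace/etc/timit.lm
    # optional: also build a binary DMP version (needs the sphinxbase tools installed)
    sphinx_lm_convert -i /home/PhoneModel/workspace/etc/timit.lm \
                      -o /home/PhoneModel/workspace/etc/timit.lm.DMP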

    Then, running the decode again generated timit-1-1.match in the result folder under the main working directory, and I finally got the WER and SER.

    But I have a query: is the LM I created using the online tool mentioned above okay to use for all cases?

    Regards,
    Shanti

     
    • Nickolay V. Shmyrev

      But I have a query: is the LM I created using the online tool mentioned above okay to use for all cases?

      It is fine.

       
  • Senjam Shantirani

    I guess my training was successful, judging by the terminal output and the files in my model_parameters folder. Please comment.

    And thank you for your assistance.

     

    Last edit: Senjam Shantirani 2016-03-09
