Aligning voxforge corpus for silence detection

2018-01-19
2018-01-25
  • Dino The Dinosaur

    When I started the alignment with sphinx3_align, I ran into some problems, even though I had generated the feature files with the sphinx_fe tool using the feat.params of the acoustic model I planned to align with.
    The terminal output is strange and reports various errors:

    Initialization of the log add table
    Log-Add table size = 29356 x 2 >> 0
    
    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='live', VARNORM='no', AGC='none'
    Reading Feature Space Transform from: /home/dino/sphinx/cmusphinx-ru-5.2/feature_transform
    Reading HMM in Sphinx 3 Model format
    Model Definition File: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
    Mean File: /home/dino/sphinx/cmusphinx-ru-5.2/means
    Variance File: /home/dino/sphinx/cmusphinx-ru-5.2/variances
    Mixture Weight File: /home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights
    Transition Matrices File: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
    INFO: mdef.c(683): Reading model definition: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
    Initialization of mdef_t, report:
    49 CI-phone, 277118 CD-phone, 3 emitstate/phone, 147 CI-sen, 5147 Sen, 18668 Sen-Seq
    
    INFO: kbcore.c(300): Using optimized GMM computation for Continuous HMM, -topn will be ignored
    INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/means'
    INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
    INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/variances'
    INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
    INFO: cont_mgau.c(527): Reading mixture weights file '/home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights'
    INFO: cont_mgau.c(682): Read 5147 x 32 mixture weights
    INFO: cont_mgau.c(710): Removing uninitialized Gaussian densities
    INFO: cont_mgau.c(800): Applying variance floor
    INFO: cont_mgau.c(818): 0 variance values floored
    INFO: cont_mgau.c(866): Precomputing Mahalanobis distance invariants
    INFO: tmat.c(120): Reading HMM transition probability matrices: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
    Initialization of tmat_t, report:
    Read 49 transition matrices of size 3x4
    
    INFO: dict.c(385): Reading main dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/ru.dic
    INFO: dict.c(388): 545315 words read
    INFO: dict.c(393): Reading filler dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/noisedict
    INFO: dict.c(396): 3 words read
    INFO: dict.c(429): Added 0 fillers from mdef file
    INFO: s3_align.c(1357): logs3(beam)= -491291
    
    INFO: cmn_live.c(120): Update from <  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
    INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    INFO: main_align.c(916): ru_0022: 68 input frames
    
    ERROR: "main_align.c", line 762: Final state not reached; no alignment for ru_0022
    
        0.01x U    0.01x G    0.01x S    0.00x AEXECTIME:    68 frames,    0.04 sec CPU,   0.06 xRT;    0.04 sec elapsed,   0.06 xRT
    INFO: corpus.c(665): ru_0022:    0.0 sec CPU,    0.1 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk
    
    INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    ERROR: "main_align.c", line 907: Utt ru_0024: Input file read (1-20121125-pgp/wav/ru_0024) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
    INFO: corpus.c(665): ru_0024:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk
    
    INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    ERROR: "main_align.c", line 907: Utt ru_0025: Input file read (1-20121125-pgp/wav/ru_0025) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
    INFO: corpus.c(665): ru_0025:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk
    
    INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    ERROR: "main_align.c", line 907: Utt ru_0027: Input file read (1-20121125-pgp/wav/ru_0027) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
    INFO: corpus.c(665): ru_0027:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk
    
    INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
    ERROR: "main_align.c", line 907: Utt ru_0030: Input file read (1-20121125-pgp/wav/ru_0030) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
    INFO: corpus.c(665): ru_0030:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk
    

    So the model initialisation seems fine, but it is followed by strange errors. When I try to align the files that trigger the "main_align.c", line 907 errors one at a time, I get only a plain "Final state not reached" error.

    Could you please explain what these errors mean and what could be wrong with the feature files this time?

    I investigated the problem and learned that this error ("Final state not reached") usually occurs when there is a significant mismatch between the audio and the transcript. I also found that it can occur when the parameters of the feature files and the parameters of the acoustic model do not match, which seems to be the case here. So I do not understand why I cannot align with this model, since I used its parameters. Is there something I might have missed?

    As for the error 'ERROR: "main_align.c", line 907: Utt: Input file read with dir and extension failed', I could not find any information describing it.

    I can provide all data, if necessary.

    Thanks in advance,
    Olya

     

    Last edit: Dino The Dinosaur 2018-01-19
    • Nickolay V. Shmyrev

      Files are missing because they were not created properly.

      You cannot align the files for the same reason: 68 frames in ru_0022 means the file is only 0.68 s long, which is too short, so something went wrong. You most likely used the wrong file as input.
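
      The frame arithmetic above can be checked directly on a feature file. The following is only a minimal sketch under stated assumptions: that the .mfc file starts with a 4-byte integer giving the number of 32-bit floats that follow, that each frame holds 13 cepstral coefficients (the ceplen=13 in the log), and that features are extracted at 100 frames per second. The function names are hypothetical and not part of any Sphinx tool.

```python
import os
import struct


def mfc_frame_count(path, ceplen=13):
    """Return the number of frames in a feature file.

    Assumed layout (not verified against a real file here): a 4-byte
    integer header holding the count of 32-bit floats, then the floats.
    """
    size = os.path.getsize(path)
    if size < 4 or (size - 4) % 4:
        raise ValueError("size does not fit the assumed header + float32 layout")
    expected = (size - 4) // 4
    with open(path, "rb") as f:
        header = f.read(4)
    # Try both byte orders and keep the one that agrees with the file size.
    for fmt in ("<i", ">i"):
        (n_floats,) = struct.unpack(fmt, header)
        if n_floats == expected:
            break
    else:
        raise ValueError("header does not match file size")
    if n_floats % ceplen:
        raise ValueError("float count is not a multiple of ceplen")
    return n_floats // ceplen


def duration_seconds(n_frames, frame_rate=100):
    """Convert a frame count to seconds, assuming 100 frames per second."""
    return n_frames / frame_rate
```

      With these assumptions, the 68 frames reported for ru_0022 correspond to duration_seconds(68), that is 0.68 seconds, which is far shorter than a normal read sentence.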

      You should investigate feature extraction with sphinx_fe more closely (command line, log output, input file structure, output file structure).
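
      One way to start that investigation is to list which utterances have no usable feature file before running the aligner. This is a rough sketch, assuming the path is built as directory + control-file entry + extension, as the "Input file read (...) with dir (...) and extension (.mfc) failed" message suggests, and that the first token on each control-file line is the utterance path. The function name is hypothetical.

```python
import os


def missing_feature_files(ctl_lines, feat_dir, ext=".mfc"):
    """Return control-file entries whose feature file is absent or empty."""
    missing = []
    for line in ctl_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        utt = fields[0]  # assumption: first token is the utterance path
        path = os.path.join(feat_dir, utt + ext)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(utt)
    return missing
```

      Passing the lines of the control file and the feature directory returns the entries that the aligner would fail to read, which narrows down where sphinx_fe did not produce output.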

       
      • Dino The Dinosaur

        Okay, thank you, I will look into it!

         
