CMU Sphinx / Forums / Help: Aligning voxforge corpus for silence detection

When I started the alignment with sphinx3_align I ran into some problems, even though I had generated the feature files with the sphinx_fe tool, using the feat.params of the acoustic model I planned to align with.
The terminal output is strange and reports several errors:

Initialization of the log add table
Log-Add table size = 29356 x 2 >> 0

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='live', VARNORM='no', AGC='none'
Reading Feature Space Transform from: /home/dino/sphinx/cmusphinx-ru-5.2/feature_transform
Reading HMM in Sphinx 3 Model format
Model Definition File: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
Mean File: /home/dino/sphinx/cmusphinx-ru-5.2/means
Variance File: /home/dino/sphinx/cmusphinx-ru-5.2/variances
Mixture Weight File: /home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights
Transition Matrices File: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
INFO: mdef.c(683): Reading model definition: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
Initialization of mdef_t, report:
49 CI-phone, 277118 CD-phone, 3 emitstate/phone, 147 CI-sen, 5147 Sen, 18668 Sen-Seq

INFO: kbcore.c(300): Using optimized GMM computation for Continuous HMM, -topn will be ignored
INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/means'
INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/variances'
INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
INFO: cont_mgau.c(527): Reading mixture weights file '/home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights'
INFO: cont_mgau.c(682): Read 5147 x 32 mixture weights
INFO: cont_mgau.c(710): Removing uninitialized Gaussian densities
INFO: cont_mgau.c(800): Applying variance floor
INFO: cont_mgau.c(818): 0 variance values floored
INFO: cont_mgau.c(866): Precomputing Mahalanobis distance invariants
INFO: tmat.c(120): Reading HMM transition probability matrices: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
Initialization of tmat_t, report:
Read 49 transition matrices of size 3x4

INFO: dict.c(385): Reading main dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/ru.dic
INFO: dict.c(388): 545315 words read
INFO: dict.c(393): Reading filler dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/noisedict
INFO: dict.c(396): 3 words read
INFO: dict.c(429): Added 0 fillers from mdef file
INFO: s3_align.c(1357): logs3(beam)= -491291

INFO: cmn_live.c(120): Update from <  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: main_align.c(916): ru_0022: 68 input frames

ERROR: "main_align.c", line 762: Final state not reached; no alignment for ru_0022

    0.01x U    0.01x G    0.01x S    0.00x AEXECTIME:    68 frames,    0.04 sec CPU,   0.06 xRT;    0.04 sec elapsed,   0.06 xRT
INFO: corpus.c(665): ru_0022:    0.0 sec CPU,    0.1 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0024: Input file read (1-20121125-pgp/wav/ru_0024) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0024:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0025: Input file read (1-20121125-pgp/wav/ru_0025) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0025:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0027: Input file read (1-20121125-pgp/wav/ru_0027) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0027:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0030: Input file read (1-20121125-pgp/wav/ru_0030) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0030:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk

So the model initialisation seems fine, but it is followed by strange errors. When I try to align those files separately (the ones that produced the strange "main_align.c" line 907 errors), I get a simple "Final state not reached" error.

Could you please explain what those strange errors mean and what could be wrong with the feature files this time?

I investigated the problem and understood that this error ("Final state not reached") usually occurs when there is a significant mismatch between the audio and the transcript. I also found out that it can occur when the parameters of the feature files and the parameters of the acoustic model do not match, which seems to be the case here. Even so, I do not really understand why I cannot align with this model, since I used its parameters. Is there something I have missed?
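To rule out a parameter mismatch, one option is to compare the model's feat.params with the settings that were actually passed to sphinx_fe. The following is only a minimal sketch: it assumes feat.params holds one "-name value" pair per line, and the parameter values in the usage example are hypothetical, not taken from the cmusphinx-ru-5.2 model.

```python
def parse_feat_params(text):
    """Parse feat.params-style text (one '-name value' pair per line) into a dict."""
    params = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("-"):
            params[parts[0]] = parts[1]
    return params


def _normalize(value):
    """Compare numbers numerically so that '8000' and '8000.0' count as equal."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def diff_params(model_params, used_params, keys):
    """Return {key: (model_value, used_value)} for every key whose values differ."""
    mismatches = {}
    for key in keys:
        model_value = model_params.get(key)
        used_value = used_params.get(key)
        if _normalize(model_value) != _normalize(used_value):
            mismatches[key] = (model_value, used_value)
    return mismatches


# Hypothetical values, for illustration only.
model = parse_feat_params("-lowerf 130\n-upperf 3700\n-nfilt 20\n-samprate 8000\n")
used = {"-lowerf": "130", "-upperf": "6800", "-nfilt": "20", "-samprate": "8000.0"}
print(diff_params(model, used, ["-lowerf", "-upperf", "-nfilt", "-samprate"]))
```

An empty result would suggest that the listed parameters agree; any entry names a setting worth re-checking before re-extracting the features.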

Concerning the "ERROR: "main_align.c", line 907: Utt: Input file read with dir and extension failed" - I could not find any information describing the issue.
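Since the message names a directory, an utterance ID and an extension, a first check could be whether each feature file exists at the combined path and whether its header is consistent with its size. The sketch below is an assumption-laden illustration, not the logic sphinx3_align itself uses: it assumes the path is simply directory + utterance ID + extension, and that a cepstral file starts with a 4-byte integer giving the number of 4-byte floats that follow (in either byte order). The function names are made up for this example.

```python
import os
import struct


def feature_path(cepdir, utt_id, cepext=".mfc"):
    """Join directory, utterance ID and extension (assumed path layout)."""
    return os.path.join(cepdir, utt_id + cepext)


def check_mfc(path, ceplen=13):
    """Return (ok, message) describing whether the file looks like a valid feature file.

    Assumed layout: 4-byte integer header with the float count, then the floats.
    """
    if not os.path.isfile(path):
        return False, "missing file"
    size = os.path.getsize(path)
    if size < 4:
        return False, "file too short for a header"
    if (size - 4) % 4 != 0:
        return False, "payload is not a whole number of 4-byte floats"
    with open(path, "rb") as handle:
        header = handle.read(4)
    expected = (size - 4) // 4
    # Try little-endian first, then big-endian.
    for byte_order in ("<", ">"):
        (count,) = struct.unpack(byte_order + "i", header)
        if count == expected:
            if count % ceplen != 0:
                return False, "float count %d is not a multiple of %d" % (count, ceplen)
            return True, "%d frames" % (count // ceplen)
    return False, "header does not match file size"
```

Calling check_mfc(feature_path(cepdir, utt_id)) for every utterance ID in the control file would show which entries point at missing or truncated files, which is one possible reason for a failed read.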

I can provide all data, if necessary.

Thanks in advance,
Olya

Last edit: Dino The Dinosaur 2018-01-19
