CMU Sphinx / Forums / Help: Aligning voxforge corpus for silence detection

Hello!

I ran into trouble using both ps_alignment and sphinx3_align. For the first, I couldn't work out where the binary is located or how to use it. For the second, I couldn't run it on my data: it reported "Final state not reached; no alignment for audio.wav".
With sphinx3_align, I tried putting the <sil> filler at the beginning and the end of the phrase and running it on a single phrase, but it didn't work :(
The command is the following:

sudo sphinx3_align \
    -hmm /home/dino/sphinx/AcousticModels/model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000 \
    -dict /home/dino/sphinx/AcousticModels/etc/msu_ru_nsh.dic \
    -ctl etc/idi \
    -adcin yes \
    -senmgau .cont. \
    -insent etc/prompt \
    -outsent aligned.transcriptions \
    -logfn dev/null \
    -cepdir /media/dino/DATA/corpus/www.repository.voxforge1.org/downloads/Russian/Trunk/Audio/MFCC/8kHz_16bit/MFCC_0_D/1-20121125-pgp \
    -remove_noise no -remove_silence no -upperf 8000 -lowerf 1 -round_filters no -remove_dc yes \
    -fdict /home/dino/sphinx/AcousticModels/etc/msu_ru_nsh.filler

Maybe I missed something? I cannot find the mistake myself.
Here is also my idi file:

mfc/ru_0022

and the prompt file:

<sil> над этой машиной он ткнул трубкой в сторону лесов работаю давно <sil> (ru_0022)

Last edit: Dino The Dinosaur 2017-12-22

The MFC files you download from VoxForge are incompatible with CMUSphinx; they are for HTK. You need to extract features properly first.

Also, it is better to use more recent models.
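
As a rough way to tell the two formats apart, the sketch below inspects the header of a feature file. It assumes the usual layouts: a Sphinx feature file starts with one 4-byte integer giving the number of 4-byte float values that follow, while an HTK parameter file starts with a 12-byte big-endian header (nSamples, sampPeriod, sampSize, parmKind). The function name guess_feature_format is hypothetical, and the check is only a heuristic, not a validator.

```python
import struct


def guess_feature_format(data: bytes) -> str:
    """Guess whether raw bytes look like a Sphinx or an HTK feature file."""
    size = len(data)
    if size >= 4:
        # Assumed Sphinx layout: int32 count of floats, then that many float32 values.
        for endian in ("<", ">"):
            (n_floats,) = struct.unpack(endian + "i", data[:4])
            if n_floats >= 0 and 4 + 4 * n_floats == size:
                return "sphinx"
    if size >= 12:
        # Assumed HTK layout: nSamples (int32), sampPeriod (int32),
        # sampSize (int16), parmKind (int16), all big-endian.
        n_samples, _period, samp_size, _kind = struct.unpack(">iihh", data[:12])
        if n_samples >= 0 and samp_size > 0 and 12 + n_samples * samp_size == size:
            return "htk"
    return "unknown"
```

To try it on a real file, read the file in binary mode and pass the bytes to the function. A result of "htk" for a downloaded .mfc file would be consistent with the advice above to regenerate the features with sphinx_fe.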

Dino The Dinosaur - 2017-12-30

Understood, thank you!
I moved on to generating feature files from the VoxForge audio, but I get a segmentation fault. What could be the reason behind it? Could you advise me, please?

Dino The Dinosaur - 2017-12-30

Sorry to bother you. I set the arguments as follows (for 8000 Hz audio):
sphinx_fe -i ru_0022.wav -o 1.mfc -upperf 3500 -samprate 8000
and everything worked.


As I continued with the alignment, I ran into similar problems, even though I generated the feature files with the sphinx_fe tool, using the feat.params of the acoustic model I planned to align with. This whole situation really confuses me now.
The terminal output is also strange and mentions various errors:

Initialization of the log add table
Log-Add table size = 29356 x 2 >> 0

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='live', VARNORM='no', AGC='none'
Reading Feature Space Transform from: /home/dino/sphinx/cmusphinx-ru-5.2/feature_transform
Reading HMM in Sphinx 3 Model format
Model Definition File: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
Mean File: /home/dino/sphinx/cmusphinx-ru-5.2/means
Variance File: /home/dino/sphinx/cmusphinx-ru-5.2/variances
Mixture Weight File: /home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights
Transition Matrices File: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
INFO: mdef.c(683): Reading model definition: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
Initialization of mdef_t, report:
49 CI-phone, 277118 CD-phone, 3 emitstate/phone, 147 CI-sen, 5147 Sen, 18668 Sen-Seq

INFO: kbcore.c(300): Using optimized GMM computation for Continuous HMM, -topn will be ignored
INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/means'
INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/variances'
INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
INFO: cont_mgau.c(527): Reading mixture weights file '/home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights'
INFO: cont_mgau.c(682): Read 5147 x 32 mixture weights
INFO: cont_mgau.c(710): Removing uninitialized Gaussian densities
INFO: cont_mgau.c(800): Applying variance floor
INFO: cont_mgau.c(818): 0 variance values floored
INFO: cont_mgau.c(866): Precomputing Mahalanobis distance invariants
INFO: tmat.c(120): Reading HMM transition probability matrices: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
Initialization of tmat_t, report:
Read 49 transition matrices of size 3x4

INFO: dict.c(385): Reading main dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/ru.dic
INFO: dict.c(388): 545315 words read
INFO: dict.c(393): Reading filler dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/noisedict
INFO: dict.c(396): 3 words read
INFO: dict.c(429): Added 0 fillers from mdef file
INFO: s3_align.c(1357): logs3(beam)= -491291

INFO: cmn_live.c(120): Update from <  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: main_align.c(916): ru_0022: 68 input frames

ERROR: "main_align.c", line 762: Final state not reached; no alignment for ru_0022

    0.01x U    0.01x G    0.01x S    0.00x AEXECTIME:    68 frames,    0.04 sec CPU,   0.06 xRT;    0.04 sec elapsed,   0.06 xRT
INFO: corpus.c(665): ru_0022:    0.0 sec CPU,    0.1 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0024: Input file read (1-20121125-pgp/wav/ru_0024) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0024:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0025: Input file read (1-20121125-pgp/wav/ru_0025) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0025:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0027: Input file read (1-20121125-pgp/wav/ru_0027) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0027:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk

INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
ERROR: "main_align.c", line 907: Utt ru_0030: Input file read (1-20121125-pgp/wav/ru_0030) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
INFO: corpus.c(665): ru_0030:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk

FATAL: "bio.c", line 616: Failed to open file '/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit/1-20121125-pgp/wav/ru_0031.mfc' for reading: No such file or directory

So the model initialisation seems fine, but it is followed by strange errors, apart from the last one, which is a simple "file not found" error. When I try to align the files that produced the strange "main_align.c" errors, I just get the "Final state not reached" error.

Could you please explain what those errors mean and what could be wrong with the feature files this time?

Thanks in advance,
Olya

Last edit: Dino The Dinosaur 2018-01-04
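
To see at a glance which utterances fail in which way, the ERROR lines in a log like the one above can be grouped by type. The following is a minimal sketch: the two patterns are copied from the messages shown in this thread, other sphinx3_align versions may word them differently, and the function name summarize_align_log is hypothetical.

```python
import re
from collections import defaultdict

# Patterns taken from the ERROR lines shown in the log above.
NO_ALIGNMENT = re.compile(r"Final state not reached; no alignment for (\S+)")
READ_FAILED = re.compile(r"Utt (\S+): Input file read \(.*\) failed")


def summarize_align_log(log_text: str) -> dict:
    """Group utterance IDs by the kind of error reported for them."""
    summary = defaultdict(list)
    for line in log_text.splitlines():
        match = NO_ALIGNMENT.search(line)
        if match:
            summary["no_alignment"].append(match.group(1))
            continue
        match = READ_FAILED.search(line)
        if match:
            summary["read_failed"].append(match.group(1))
    return dict(summary)
```

Applied to the log in this post, it would separate ru_0022 (feature file read, but no alignment found) from ru_0024 and the later utterances (feature file could not be read at all), which are two different problems.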

Nickolay V. Shmyrev - 2018-01-01

sphinx3_align tool and using the feat.params

The proper tool for feature extraction is sphinx_fe. It is important to be very accurate; otherwise you'll frequently run into problems like this one.

ERROR: "main_align.c", line 907: Utt ru_0027: Input file read (1-20121125-pgp/wav/ru_0027) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed

The files are missing because you haven't created them properly; you need to revisit the previous step.
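
Before rerunning the aligner, it can help to list which control-file entries have no feature file on disk. The sketch below assumes, based on the error message quoted above, that each feature file is looked up as the directory plus the control-file entry plus the extension, and that the entry is the first whitespace-separated field of each control-file line. The function name missing_feature_files and its parameter names are hypothetical.

```python
import os


def missing_feature_files(ctl_path, cepdir, cepext=".mfc"):
    """Return control-file entries that have no feature file under cepdir."""
    missing = []
    with open(ctl_path, encoding="utf-8") as ctl:
        for line in ctl:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            entry = fields[0]  # assumed: path relative to cepdir, without extension
            if not os.path.isfile(os.path.join(cepdir, entry + cepext)):
                missing.append(entry)
    return missing
```

An empty result only shows that the files exist; it does not show that they were extracted with parameters matching the acoustic model.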

- Dino The Dinosaur - 2018-01-02
  
  Sorry, it was a typo; I generated the features with sphinx_fe.
  Happy new year, by the way! :)
  
  - Nickolay V. Shmyrev - 2018-01-02
    
    Happy New Year! I hope you get through this ASAP ;)
    
    - Dino The Dinosaur - 2018-01-04
      
      Thank you! :)
      I investigated the problem and learned that this error typically occurs when there is a significant mismatch between the audio and the transcript. I also found that it can occur when the parameters of the feature files do not match those of the acoustic model, which seems to be the case here. So I don't really understand why I cannot align with this model, since I used its parameters. Is there something I've missed?
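
One quick way to test for an audio/transcript mismatch is to compare the number of feature frames with the length of the prompt. The log earlier in the thread reports 68 input frames for ru_0022; assuming a frame rate of 100 frames per second (an assumption here, the model's feat.params may say otherwise), that is about 0.68 seconds for an 11-word prompt. The sketch below encodes that check. The function names and the min_sec_per_word threshold are hypothetical, and a "too short" result is only a hint to recheck the sphinx_fe options, not a diagnosis.

```python
def frames_to_seconds(n_frames: int, frame_rate: int = 100) -> float:
    """Convert a frame count to seconds (frame_rate is an assumed value)."""
    return n_frames / frame_rate


def transcript_words(prompt_line: str) -> list:
    """Words in a prompt line, without <sil>-style fillers and the trailing (utt_id)."""
    tokens = prompt_line.split()
    if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
        tokens = tokens[:-1]  # drop the utterance ID
    return [t for t in tokens if not (t.startswith("<") and t.endswith(">"))]


def looks_too_short(n_frames, prompt_line, min_sec_per_word=0.15, frame_rate=100):
    """True if the audio seems too short to contain every word of the prompt."""
    words = transcript_words(prompt_line)
    return frames_to_seconds(n_frames, frame_rate) < min_sec_per_word * len(words)
```

With the numbers from this thread, looks_too_short(68, prompt) is true for the ru_0022 prompt, which would be consistent with the feature file covering far less audio than the transcript describes.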
      