
Aligning voxforge corpus for silence detection

Help
2017-12-22
2018-01-04
  • Dino The Dinosaur

    Hello!

    I had trouble using both ps_alignment and sphinx3_align. For the first, I couldn't work out where the binary is or how to use it. For the second, I couldn't run it on my data: it reported "Final state not reached; no alignment for audio.wav".
    With sphinx3_align I tried putting the <sil> filler at the beginning and the end of the phrase, and also running it on a single phrase, but it still didn't work :(
    The command is the following:

    sudo sphinx3_align \
        -hmm /home/dino/sphinx/AcousticModels/model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000 \
        -dict /home/dino/sphinx/AcousticModels/etc/msu_ru_nsh.dic \
        -ctl etc/idi \
        -adcin yes \
        -senmgau .cont. \
        -insent etc/prompt \
        -outsent aligned.transcriptions \
        -logfn dev/null \
        -cepdir /media/dino/DATA/corpus/www.repository.voxforge1.org/downloads/Russian/Trunk/Audio/MFCC/8kHz_16bit/MFCC_0_D/1-20121125-pgp \
        -remove_noise no -remove_silence no -upperf 8000 -lowerf 1 -round_filters no -remove_dc yes \
        -fdict /home/dino/sphinx/AcousticModels/etc/msu_ru_nsh.filler
    

    Maybe I missed something? I cannot find the mistake myself.
    Here is also my idi file

    mfc/ru_0022
    

    and prompt file

    <sil> над этой машиной он ткнул трубкой в сторону лесов работаю давно <sil> (ru_0022)
    
     

    Last edit: Dino The Dinosaur 2017-12-22
    • Nickolay V. Shmyrev

      The MFC files you download from voxforge are incompatible with cmusphinx; they are for htk. You need to extract features properly first.

      Also it is better to use more recent models.

       
      • Dino The Dinosaur

        Understood, thank you!
        I moved on to generating feature files from the voxforge audio, but I get a segmentation fault. What could be the reason behind it? Could you advise me, please?

         
      • Dino The Dinosaur

        Sorry to bother you. I set the arguments as follows (for 8000 Hz audio):
        sphinx_fe -i ru_0022.wav -o 1.mfc -upperf 3500 -samprate 8000
        and everything worked.
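
A minimal sketch of applying the same command to a whole directory of recordings, assuming all files are 8000 Hz audio like ru_0022.wav. The variables SPHINX_FE, SAMPRATE and UPPERF and the function name extract_dir are hypothetical names introduced for this example; only the -i, -o, -upperf and -samprate flags come from the command above. The Nyquist check reflects the general rule that the upper filter edge cannot exceed half the sample rate. Whether that was the cause of the segmentation fault is an assumption, not something the thread confirms.

```shell
# Hypothetical settings; override them in the environment if needed.
SPHINX_FE="${SPHINX_FE:-sphinx_fe}"
SAMPRATE="${SAMPRATE:-8000}"
UPPERF="${UPPERF:-3500}"

# extract_dir WAV_DIR OUT_DIR
# Runs sphinx_fe on every .wav in WAV_DIR and writes OUT_DIR/<name>.mfc.
extract_dir() {
    wav_dir="$1"
    out_dir="$2"
    # The upper filter edge must not exceed the Nyquist frequency (samprate / 2).
    if [ "$UPPERF" -gt $((SAMPRATE / 2)) ]; then
        echo "upperf $UPPERF exceeds Nyquist for $SAMPRATE Hz" >&2
        return 1
    fi
    mkdir -p "$out_dir"
    for wav in "$wav_dir"/*.wav; do
        [ -e "$wav" ] || continue   # skip the literal pattern when nothing matches
        base=$(basename "$wav" .wav)
        "$SPHINX_FE" -i "$wav" -o "$out_dir/$base.mfc" \
            -upperf "$UPPERF" -samprate "$SAMPRATE" || return 1
    done
}
```

Usage would be something like `extract_dir wav mfc`. Any other options should come from the feat.params of the acoustic model used for alignment.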

         
      • Dino The Dinosaur

        As I continued with the alignment I ran into similar problems, even though I generated the feature files with the sphinx_fe tool, using the feat.params of the acoustic model I planned to align with. This whole situation really confuses me now.
        The terminal output is also strange and mentions various errors:

        Initialization of the log add table
        Log-Add table size = 29356 x 2 >> 0
        
        INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='live', VARNORM='no', AGC='none'
        Reading Feature Space Transform from: /home/dino/sphinx/cmusphinx-ru-5.2/feature_transform
        Reading HMM in Sphinx 3 Model format
        Model Definition File: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
        Mean File: /home/dino/sphinx/cmusphinx-ru-5.2/means
        Variance File: /home/dino/sphinx/cmusphinx-ru-5.2/variances
        Mixture Weight File: /home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights
        Transition Matrices File: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
        INFO: mdef.c(683): Reading model definition: /home/dino/sphinx/cmusphinx-ru-5.2/mdef
        Initialization of mdef_t, report:
        49 CI-phone, 277118 CD-phone, 3 emitstate/phone, 147 CI-sen, 5147 Sen, 18668 Sen-Seq
        
        INFO: kbcore.c(300): Using optimized GMM computation for Continuous HMM, -topn will be ignored
        INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/means'
        INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
        INFO: cont_mgau.c(167): Reading mixture gaussian file '/home/dino/sphinx/cmusphinx-ru-5.2/variances'
        INFO: cont_mgau.c(428): 5147 mixture Gaussians, 32 components, 1 streams, veclen 36
        INFO: cont_mgau.c(527): Reading mixture weights file '/home/dino/sphinx/cmusphinx-ru-5.2/mixture_weights'
        INFO: cont_mgau.c(682): Read 5147 x 32 mixture weights
        INFO: cont_mgau.c(710): Removing uninitialized Gaussian densities
        INFO: cont_mgau.c(800): Applying variance floor
        INFO: cont_mgau.c(818): 0 variance values floored
        INFO: cont_mgau.c(866): Precomputing Mahalanobis distance invariants
        INFO: tmat.c(120): Reading HMM transition probability matrices: /home/dino/sphinx/cmusphinx-ru-5.2/transition_matrices
        Initialization of tmat_t, report:
        Read 49 transition matrices of size 3x4
        
        INFO: dict.c(385): Reading main dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/ru.dic
        INFO: dict.c(388): 545315 words read
        INFO: dict.c(393): Reading filler dictionary: /home/dino/sphinx/cmusphinx-ru-5.2/noisedict
        INFO: dict.c(396): 3 words read
        INFO: dict.c(429): Added 0 fillers from mdef file
        INFO: s3_align.c(1357): logs3(beam)= -491291
        
        INFO: cmn_live.c(120): Update from <  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
        INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        INFO: main_align.c(916): ru_0022: 68 input frames
        
        ERROR: "main_align.c", line 762: Final state not reached; no alignment for ru_0022
        
            0.01x U    0.01x G    0.01x S    0.00x AEXECTIME:    68 frames,    0.04 sec CPU,   0.06 xRT;    0.04 sec elapsed,   0.06 xRT
        INFO: corpus.c(665): ru_0022:    0.0 sec CPU,    0.1 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk
        
        INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        ERROR: "main_align.c", line 907: Utt ru_0024: Input file read (1-20121125-pgp/wav/ru_0024) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
        INFO: corpus.c(665): ru_0024:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk
        
        INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        ERROR: "main_align.c", line 907: Utt ru_0025: Input file read (1-20121125-pgp/wav/ru_0025) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
        INFO: corpus.c(665): ru_0025:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.0 sec CPU,      0.1 sec Clk
        
        INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        ERROR: "main_align.c", line 907: Utt ru_0027: Input file read (1-20121125-pgp/wav/ru_0027) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
        INFO: corpus.c(665): ru_0027:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk
        
        INFO: cmn_live.c(120): Update from < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        INFO: cmn_live.c(138): Update to   < 18.42 -1.38 -0.42 -0.45 -0.32 -0.29 -0.20 -0.13 -0.09 -0.19 -0.30 -0.23 -0.16 >
        ERROR: "main_align.c", line 907: Utt ru_0030: Input file read (1-20121125-pgp/wav/ru_0030) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed 
        INFO: corpus.c(665): ru_0030:    0.0 sec CPU,    0.0 sec Clk;  TOT:      0.1 sec CPU,      0.1 sec Clk
        
        FATAL: "bio.c", line 616: Failed to open file '/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit/1-20121125-pgp/wav/ru_0031.mfc' for reading: No such file or directory
        

        So the model initialisation seems fine, but it is followed by strange errors, apart from the last one, which is a simple file-not-found error. When I try to align the files that produce the strange "main_align.c" errors on their own, I just get the "Final state not reached" error.

        Could you please explain what those strange errors mean and what could be wrong with the feature files this time?

        Thanks in advance,
        Olya

         

        Last edit: Dino The Dinosaur 2018-01-04
        • Nickolay V. Shmyrev

          sphinx3_align tool and using the feat.params

          The proper tool for feature extraction is sphinx_fe. It is important to be very accurate, otherwise you'll frequently experience problems like this one.

          ERROR: "main_align.c", line 907: Utt ru_0027: Input file read (1-20121125-pgp/wav/ru_0027) with dir (/media/dino/DATA/corpus/voxforge/repository/downloads/Russian/Trunk/Audio/Main/8kHz_16bit) and extension (.mfc) failed

          Files are missing because you haven't created them properly, you need to revisit the previous step.
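
A quick way to see which feature files are missing before running the aligner, sketched under the assumption (taken from the error messages above) that each feature file is looked up as the -cepdir directory, plus the control-file entry, plus the .mfc extension. The function name missing_features is hypothetical.

```shell
# missing_features CTL CEPDIR [EXT]
# Prints the path of every feature file that the control file refers to
# but that does not exist under CEPDIR.
missing_features() {
    ctl="$1"
    cepdir="$2"
    ext="${3:-.mfc}"
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r utt || [ -n "$utt" ]; do
        [ -n "$utt" ] || continue   # ignore blank lines
        [ -f "$cepdir/$utt$ext" ] || printf '%s\n' "$cepdir/$utt$ext"
    done < "$ctl"
}
```

Running `missing_features etc/idi /path/to/cepdir` should print nothing if every utterance in the control file has a feature file.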

           
          • Dino The Dinosaur

            Sorry, it was a typo, I generated feats with sphinx_fe.
            Happy new year, by the way! :)

             
            • Nickolay V. Shmyrev

              Happy New Year! Wish you get through this asap ;)

               
              • Dino The Dinosaur

                Thank you! :)
                I investigated the problem and learned that this error usually occurs when there is a significant mismatch between the audio and the transcript. I also found that it can occur when the parameters of the feature files and those of the acoustic model do not match, which seems to be the case here. So I don't really understand why I cannot align with this model, since I used its parameters. Is there something I've missed?
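
One check that may help narrow this down: the log above reports only 68 input frames for ru_0022. Assuming the default rate of 100 frames per second, that is under a second of audio, which looks short for an eleven-word prompt and could itself explain a "Final state not reached" error. The sketch below estimates the frame count of a feature file from its size. It assumes the usual Sphinx feature layout of a 4-byte header followed by 32-bit floats, with 13 cepstral coefficients per frame (the ceplen=13 from the log). The function name mfc_frames is hypothetical.

```shell
# mfc_frames FILE [CEPLEN]
# Estimates the number of frames from the file size:
# (size - 4-byte header) / (4 bytes per float * CEPLEN floats per frame).
mfc_frames() {
    file="$1"
    ceplen="${2:-13}"
    size=$(wc -c < "$file")
    echo $(( (size - 4) / (4 * ceplen) ))
}
```

Dividing the result by 100 gives an approximate duration in seconds, which can be compared with the length of the original recording.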

                 
