Hi!
I wonder if it's possible to give a WAV file as input to pocketsphinx_continuous
or sphinx3_continuous to get a recognition result.
If it is possible, what are the command-line arguments?
Thanks
Regards
Marco
In recent versions, pocketsphinx_continuous has an -infile argument to pass a
file to decode.
Hi Nickolay!
Thanks for your answer.
I ran this command:
./pocketsphinx_continuous \
-infile /home/marco/otto/evalita/wav/otto_training/clean1.wav \
-hmm /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000 \
-lm /home/marco/RiconoscimentoVoce/evalita/etc/evalita.lm \
-dict /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic
INFO: cmd_ln.c(512): Parsing command line:
./pocketsphinx_continuous \
-infile /home/marco/otto/evalita/wav/otto_training/clean1.wav \
-hmm /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000 \
-lm /home/marco/RiconoscimentoVoce/evalita/etc/evalita.lm \
-dict /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic
Current configuration:
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000
-infile /home/marco/otto/evalita/wav/otto_training/clean1.wav
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /home/marco/RiconoscimentoVoce/evalita/etc/evalita.lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-time no no
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(512): Parsing command line:
\
-alpha 0.97 \
-dither yes \
-doublebw no \
-nfilt 40 \
-ncep 13 \
-lowerf 133.33334 \
-upperf 6855.4976 \
-nfft 512 \
-wlen 0.0256 \
-transform legacy \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no yes
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.333333e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.560000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/feat.params
INFO: fe_interface.c(289): You are using the internal mechanism to generate the seed.
INFO: feat.c(860): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: mdef.c(520): Reading model definition: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/mdef
INFO: bin_mdef.c(173): Allocating 304 * 8 bytes (2 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/means
INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/variances
INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 0 variance values floored
INFO: acmod.c(119): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/means
INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/variances
INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 0 variance values floored
INFO: ptm_mgau.c(800): Number of codebooks exceeds 256: 492
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/means
INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/variances
INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 0 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 492 senones: 1 features x 8 codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(122): The value of topn: 4
INFO: dict.c(306): Allocating 4109 * 20 bytes (80 KiB) for word entries
INFO: dict.c(321): Reading main dictionary: /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(324): 10 words read
INFO: dict.c(330): Reading filler dictionary: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/noisedict
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(333): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 19^3 * 2 bytes (13 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 4408 bytes (4 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 4408 bytes (4 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=12, 2=120, 3=1168
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 12 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533): 120 = #bigrams created
INFO: ngram_model_arpa.c(534): 116 = #prob2 entries
INFO: ngram_model_arpa.c(542): 29 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555): 1168 = #trigrams created
INFO: ngram_model_arpa.c(556): 747 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 9 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 147
INFO: ngram_search_fwdtree.c(338): after: 9 root, 19 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(373): ./pocketsphinx_continuous COMPILED ON: Dec 11 2010, AT: 15:42:11
FATAL_ERROR: "continuous.c", line 149: Failed to calibrate voice activity detection
I get this error.
What can I do?
Regards
Marco
Hi Nickolay
I noticed that if the file contains only one spoken number I get this problem,
whereas if it contains four spoken numbers there is no fatal error and it
decodes perfectly.
I have checked the properties of my WAV files:
Uncompressed 16-bit PCM audio
Mono
8000 Hz
Do you have any suggestions?
Regards
Marco
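For anyone who wants to check the same file properties programmatically, here is a minimal sketch using Python's standard wave module. The helper names describe_wav and matches_decoder are hypothetical, made up for this illustration; the only figures taken from the thread are the properties listed above and the -samprate value printed in the log.

```python
import wave


def describe_wav(path):
    """Return the basic format properties of a PCM WAV file as a dict."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bits_per_sample": w.getsampwidth() * 8,
            "sample_rate_hz": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }


def matches_decoder(path, decoder_rate_hz):
    """True if the file is mono 16-bit PCM at the rate the decoder is set to."""
    info = describe_wav(path)
    return (
        info["channels"] == 1
        and info["bits_per_sample"] == 16
        and info["sample_rate_hz"] == decoder_rate_hz
    )
```

With an 8000 Hz file and the 16000 that the log prints for -samprate, matches_decoder returns False. Whether that mismatch plays any part in the calibration failure is not established in this thread.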
Yes, right now calibration is not stable enough. It can fail on some short
files. It's better to have a longer period of silence (1 s) at the start of the
file.
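One way to try the suggestion above without re-recording is to prepend silence to an existing file. The following is only a sketch using Python's standard wave module; prepend_silence is a hypothetical helper name, and it assumes 16-bit signed PCM, where zero-valued samples are silence. Real recordings contain background noise, so whether all-zero padding is enough for the calibration to succeed is an assumption that has not been tested here.

```python
import wave


def prepend_silence(src_path, dst_path, seconds=1.0):
    """Write a copy of src_path with `seconds` of digital silence at the start."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(src.getnframes())

    # One WAV frame holds one sample per channel. Zero bytes are silence
    # for 16-bit signed PCM (this would be wrong for 8-bit unsigned audio).
    n_silent_frames = int(params.framerate * seconds)
    silence = b"\x00" * (n_silent_frames * params.sampwidth * params.nchannels)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed up on close
        dst.writeframes(silence + audio)
```

For example, prepend_silence("clean1.wav", "clean1_padded.wav") would write a padded copy that can then be passed to -infile.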
I am also getting this error:
FATAL_ERROR : "continuous.c", line 149: Failed to calibrate voice activity detection
I am using pocketsphinx_continuous with the -infile argument to decode a small
set of Urdu-language data from WAV files.
I have tried downsampling my files to 8000 Hz, but it still gives the error.
These are the files I want to decode:
http://www.mediafire.com/file/wzvgzqn67reut1a/test.zip
Live decoding with pocketsphinx is working fine.
Hi marcu6,
My observation is that, by default, silence filtering requires at least 196
frames to calibrate its thresholds. If the file is shorter than that, there
will be a fatal error. When you utter only one number your file is too short,
but with four numbers it is long enough.