Pocketsphinx_continuous Input file

Help
Marco
2010-12-22
2012-09-22
  • Marco

    Marco - 2010-12-22

    Hi!

    I wonder if it's possible to give a wav file as input to
    pocketsphinx_continuous or sphinx3_continuous to obtain recognition.

    If it's possible, what are the command-line arguments?

    Thanks

    Regards

    Marco

     
  • Nickolay V. Shmyrev

    Recent versions of pocketsphinx_continuous have an -infile argument for
    passing a file to decode.

     
  • Marco

    Marco - 2010-12-22

    Hi Nickolay!

    Thanks for your answer.

    I ran this command:

    ./pocketsphinx_continuous \
        -infile /home/marco/otto/evalita/wav/otto_training/clean1.wav \
        -hmm /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000 \
        -lm /home/marco/RiconoscimentoVoce/evalita/etc/evalita.lm \
        -dict /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic

    INFO: cmd_ln.c(512): Parsing command line:

    ./pocketsphinx_continuous \

    -infile /home/marco/otto/evalita/wav/otto_training/clean1.wav \

    -hmm /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000 \

    -lm /home/marco/RiconoscimentoVoce/evalita/etc/evalita.lm \

    -dict /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic

    Current configuration:

    -adcdev

    -agc none none

    -agcthresh 2.0 2.000000e+00

    -alpha 0.97 9.700000e-01

    -argfile

    -ascale 20.0 2.000000e+01

    -aw 1 1

    -backtrace no no

    -beam 1e-48 1.000000e-48

    -bestpath yes yes

    -bestpathlw 9.5 9.500000e+00

    -bghist no no

    -ceplen 13 13

    -cmn current current

    -cmninit 8.0 8.0

    -compallsen no no

    -debug 0

    -dict /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic

    -dictcase no no

    -dither no no

    -doublebw no no

    -ds 1 1

    -fdict

    -feat 1s_c_d_dd 1s_c_d_dd

    -featparams

    -fillprob 1e-8 1.000000e-08

    -frate 100 100

    -fsg

    -fsgusealtpron yes yes

    -fsgusefiller yes yes

    -fwdflat yes yes

    -fwdflatbeam 1e-64 1.000000e-64

    -fwdflatefwid 4 4

    -fwdflatlw 8.5 8.500000e+00

    -fwdflatsfwin 25 25

    -fwdflatwbeam 7e-29 7.000000e-29

    -fwdtree yes yes

    -hmm /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000

    -infile /home/marco/otto/evalita/wav/otto_training/clean1.wav

    -input_endian little little

    -jsgf

    -kdmaxbbi -1 -1

    -kdmaxdepth 0 0

    -kdtree

    -latsize 5000 5000

    -lda

    -ldadim 0 0

    -lextreedump 0 0

    -lifter 0 0

    -lm /home/marco/RiconoscimentoVoce/evalita/etc/evalita.lm

    -lmctl

    -lmname default default

    -logbase 1.0001 1.000100e+00

    -logfn

    -logspec no no

    -lowerf 133.33334 1.333333e+02

    -lpbeam 1e-40 1.000000e-40

    -lponlybeam 7e-29 7.000000e-29

    -lw 6.5 6.500000e+00

    -maxhmmpf -1 -1

    -maxnewoov 20 20

    -maxwpf -1 -1

    -mdef

    -mean

    -mfclogdir

    -min_endfr 0 0

    -mixw

    -mixwfloor 0.0000001 1.000000e-07

    -mllr

    -mmap yes yes

    -ncep 13 13

    -nfft 512 512

    -nfilt 40 40

    -nwpen 1.0 1.000000e+00

    -pbeam 1e-48 1.000000e-48

    -pip 1.0 1.000000e+00

    -pl_beam 1e-10 1.000000e-10

    -pl_pbeam 1e-5 1.000000e-05

    -pl_window 0 0

    -rawlogdir

    -remove_dc no no

    -round_filters yes yes

    -samprate 16000 1.600000e+04

    -seed -1 -1

    -sendump

    -senlogdir

    -senmgau

    -silprob 0.005 5.000000e-03

    -smoothspec no no

    -svspec

    -time no no

    -tmat

    -tmatfloor 0.0001 1.000000e-04

    -topn 4 4

    -topn_beam 0 0

    -toprule

    -transform legacy legacy

    -unit_area yes yes

    -upperf 6855.4976 6.855498e+03

    -usewdphones no no

    -uw 1.0 1.000000e+00

    -var

    -varfloor 0.0001 1.000000e-04

    -varnorm no no

    -verbose no no

    -warp_params

    -warp_type inverse_linear inverse_linear

    -wbeam 7e-29 7.000000e-29

    -wip 0.65 6.500000e-01

    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(512): Parsing command line:

    \

    -alpha 0.97 \

    -dither yes \

    -doublebw no \

    -nfilt 40 \

    -ncep 13 \

    -lowerf 133.33334 \

    -upperf 6855.4976 \

    -nfft 512 \

    -wlen 0.0256 \

    -transform legacy \

    -feat 1s_c_d_dd \

    -agc none \

    -cmn current \

    -varnorm no

    Current configuration:

    -agc none none

    -agcthresh 2.0 2.000000e+00

    -alpha 0.97 9.700000e-01

    -ceplen 13 13

    -cmn current current

    -cmninit 8.0 8.0

    -dither no yes

    -doublebw no no

    -feat 1s_c_d_dd 1s_c_d_dd

    -frate 100 100

    -input_endian little little

    -lda

    -ldadim 0 0

    -lifter 0 0

    -logspec no no

    -lowerf 133.33334 1.333333e+02

    -ncep 13 13

    -nfft 512 512

    -nfilt 40 40

    -remove_dc no no

    -round_filters yes yes

    -samprate 16000 1.600000e+04

    -seed -1 -1

    -smoothspec no no

    -svspec

    -transform legacy legacy

    -unit_area yes yes

    -upperf 6855.4976 6.855498e+03

    -varnorm no no

    -verbose no no

    -warp_params

    -warp_type inverse_linear inverse_linear

    -wlen 0.025625 2.560000e-02

    INFO: acmod.c(238): Parsed model-specific feature parameters from /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/feat.params

    INFO: fe_interface.c(289): You are using the internal mechanism to generate
    the seed.

    INFO: feat.c(860): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='none'

    INFO: cmn.c(142): mean= 12.00, mean= 0.0

    INFO: mdef.c(520): Reading model definition: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/mdef

    INFO: bin_mdef.c(173): Allocating 304 * 8 bytes (2 KiB) for CD tree

    INFO: tmat.c(205): Reading HMM transition probability matrices: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/transition_matrices

    INFO: acmod.c(117): Attempting to use SCHMM computation module

    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/means

    INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:

    INFO: ms_gauden.c(294): 8x39

    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/variances

    INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:

    INFO: ms_gauden.c(294): 8x39

    INFO: ms_gauden.c(354): 0 variance values floored

    INFO: acmod.c(119): Attempting to use PTHMM computation module

    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/means

    INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:

    INFO: ms_gauden.c(294): 8x39

    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/variances

    INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:

    INFO: ms_gauden.c(294): 8x39

    INFO: ms_gauden.c(354): 0 variance values floored

    INFO: ptm_mgau.c(800): Number of codebooks exceeds 256: 492

    INFO: acmod.c(121): Falling back to general multi-stream GMM computation

    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/means

    INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:

    INFO: ms_gauden.c(294): 8x39

    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/variances

    INFO: ms_gauden.c(292): 492 codebook, 1 feature, size:

    INFO: ms_gauden.c(294): 8x39

    INFO: ms_gauden.c(354): 0 variance values floored

    INFO: ms_senone.c(160): Reading senone mixture weights: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/mixture_weights

    INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits

    INFO: ms_senone.c(218): Not transposing mixture weights in memory

    INFO: ms_senone.c(277): Read mixture weights for 492 senones: 1 features x 8
    codewords

    INFO: ms_senone.c(331): Mapping senones to individual codebooks

    INFO: ms_mgau.c(122): The value of topn: 4

    INFO: dict.c(306): Allocating 4109 * 20 bytes (80 KiB) for word entries

    INFO: dict.c(321): Reading main dictionary:
    /home/marco/RiconoscimentoVoce/evalita/etc/evalita_5.dic

    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones

    INFO: dict.c(324): 10 words read

    INFO: dict.c(330): Reading filler dictionary: /home/marco/RiconoscimentoVoce/evalita/model_parameters/evalita.cd_cont_1000/noisedict

    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones

    INFO: dict.c(333): 3 words read

    INFO: dict2pid.c(396): Building PID tables for dictionary

    INFO: dict2pid.c(404): Allocating 19^3 * 2 bytes (13 KiB) for word-initial
    triphones

    INFO: dict2pid.c(131): Allocated 4408 bytes (4 KiB) for word-final triphones

    INFO: dict2pid.c(195): Allocated 4408 bytes (4 KiB) for single-phone word
    triphones

    INFO: ngram_model_arpa.c(477): ngrams 1=12, 2=120, 3=1168

    INFO: ngram_model_arpa.c(135): Reading unigrams

    INFO: ngram_model_arpa.c(516): 12 = #unigrams created

    INFO: ngram_model_arpa.c(195): Reading bigrams

    INFO: ngram_model_arpa.c(533): 120 = #bigrams created

    INFO: ngram_model_arpa.c(534): 116 = #prob2 entries

    INFO: ngram_model_arpa.c(542): 29 = #bo_wt2 entries

    INFO: ngram_model_arpa.c(292): Reading trigrams

    INFO: ngram_model_arpa.c(555): 1168 = #trigrams created

    INFO: ngram_model_arpa.c(556): 747 = #prob3 entries

    INFO: ngram_search_fwdtree.c(99): 9 unique initial diphones

    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone
    words

    INFO: ngram_search_fwdtree.c(186): Creating search tree

    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4
    single-phone words

    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 147

    INFO: ngram_search_fwdtree.c(338): after: 9 root, 19 non-root channels, 3
    single-phone words

    INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25

    INFO: continuous.c(373): ./pocketsphinx_continuous COMPILED ON: Dec 11 2010,
    AT: 15:42:11

    FATAL_ERROR: "continuous.c", line 149: Failed to calibrate voice activity
    detection

    I get this error.

    What can I do?

    Regards

    Marco

     
  • Marco

    Marco - 2010-12-22

    Hi Nickolay
    I noticed that the problem occurs when the file contains only one spoken
    number. When it contains four spoken numbers, there is no fatal error and
    it decodes perfectly.

    I have checked the properties of my wav files:

    Uncompressed 16-bit PCM audio

    Mono

    8000 Hz

    Do you have any suggestions?

    Regards

    Marco

     
  • Nickolay V. Shmyrev

    Yes, right now calibration is not stable enough. It can fail on some short
    files. It's better to have a longer period of silence (1 s) at the start of
    the file.
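
    One way to try the suggestion above is to prepend the silence offline
    before decoding. The following is only a sketch, not part of pocketsphinx:
    the function name prepend_silence and the file paths are hypothetical, and
    it assumes uncompressed 16-bit PCM input like the files described in this
    thread. It writes digital silence (all-zero samples); the thread does not
    say whether that calibrates as well as recorded background noise.

```python
import wave


def prepend_silence(src_path, dst_path, seconds=1.0):
    """Copy src_path to dst_path with `seconds` of silence added at the start.

    Hypothetical helper; assumes uncompressed 16-bit PCM, where zero bytes
    represent silence. Returns the number of silent sample frames added.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("expected 16-bit PCM audio")
        audio = src.readframes(params.nframes)

    # One sample frame holds one sample per channel.
    n_silent = int(round(seconds * params.framerate))
    silence = b"\x00" * (n_silent * params.nchannels * params.sampwidth)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed up on close
        dst.writeframes(silence + audio)
    return n_silent


if __name__ == "__main__":
    import sys

    # Usage: python prepend_silence.py input.wav output.wav
    if len(sys.argv) == 3:
        prepend_silence(sys.argv[1], sys.argv[2], seconds=1.0)
```

    The padded file can then be passed to -infile in place of the original.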

     
  • shqali

    shqali - 2011-07-11

    I am also getting this error:

    FATAL_ERROR : "continuous.c", line 149: Failed to calibrate voice activity
    detection

    I am using pocketsphinx_continuous with the -infile argument to decode a
    small set of Urdu language data from wav files.
    I have tried downsampling my files to 8000 Hz, but it still gives the error.
    These are the files I want to decode:

    http://www.mediafire.com/file/wzvgzqn67reut1a/test.zip

    Live decoding with pocketsphinx is working fine.

     
  • Pankaj

    Pankaj - 2011-07-11

    Hi marcu6,

    My observation is that, by default, silence filtering requires at least 196
    frames to calibrate its thresholds. If the file is shorter than that, there
    will be a fatal error. When you have uttered only one number, your file is
    too short, but with four numbers it is of sufficient length.
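
    With the -frate value of 100 frames per second shown in the log above, 196
    frames is about 1.96 seconds of audio. A rough pre-check of a file's
    length could look like the sketch below. The helper names are
    hypothetical, the 196 figure is taken from this post rather than verified
    against the source code, and the frame count is only an approximation
    because it ignores the analysis window length.

```python
import wave

FRAME_RATE = 100        # -frate value shown in the log (frames per second)
MIN_CALIB_FRAMES = 196  # figure quoted in this thread; treat it as an assumption


def analysis_frames(wav_path, frame_rate=FRAME_RATE):
    """Approximate number of analysis frames the file would yield."""
    with wave.open(wav_path, "rb") as w:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        return w.getnframes() * frame_rate // w.getframerate()


def long_enough(wav_path, min_frames=MIN_CALIB_FRAMES):
    """True if the file appears long enough for calibration to be attempted."""
    return analysis_frames(wav_path) >= min_frames


if __name__ == "__main__":
    import sys

    # Usage: python check_length.py file1.wav file2.wav ...
    for path in sys.argv[1:]:
        print(path, analysis_frames(path), long_enough(path))
```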

     
