Dear Sir,
I am trying to use the Spanish language model, but I get an error when using the es-20k.lm.gz (2016-07-16) file. I run this command:
pocketsphinx_continuous -infile file.wav -hmm ~/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000 -lm ~/audiolibros/cmusphinx/spanish/es-20k.lm.gz -dict ~/audiolibros/cmusphinx/spanish/es.dict 2> pocketsphinx.log > file2.txt
and I get the following log (see below).
Would you be so kind as to check whether the Spanish model is fine? Thank you.
Best regards
Emilio
INFO: cmd_ln.c(691): Parsing command line:
pocketsphinx_continuous \
-infile file.wav \
-hmm /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000 \
-lm /home/emilio/audiolibros/cmusphinx/spanish/es-20k.lm.gz \
-dict /home/emilio/audiolibros/cmusphinx/spanish/es.dict
Current configuration:
[NAME] [DEFLT] [VALUE]
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/emilio/audiolibros/cmusphinx/spanish/es.dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000
-infile file.wav
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /home/emilio/audiolibros/cmusphinx/spanish/es-20k.lm.gz
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-time no no
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(691): Parsing command line:
\
-lowerf 130 \
-upperf 6800 \
-nfilt 25 \
-transform dct \
-lifter 22 \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-agc none \
-cmn current \
-varnorm no \
-model ptm
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 22
-logspec no no
-lowerf 133.33334 1.300000e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02
INFO: acmod.c(246): Parsed model-specific feature parameters from /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/mdef
INFO: bin_mdef.c(179): Allocating 28277 * 8 bytes (220 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices: /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/means
INFO: ms_gauden.c(292): 26 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/variances
INFO: ms_gauden.c(292): 26 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(354): 105 variance values floored
INFO: acmod.c(123): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/means
INFO: ms_gauden.c(292): 26 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/variances
INFO: ms_gauden.c(292): 26 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(354): 105 variance values floored
INFO: ptm_mgau.c(467): Loading senones from dump file /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/sendump
INFO: ptm_mgau.c(491): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(554): Rows: 128, Columns: 4078
INFO: ptm_mgau.c(586): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(826): Maximum top-N: 4
INFO: dict.c(317): Allocating 27597 * 20 bytes (539 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /home/emilio/audiolibros/cmusphinx/spanish/es.dict
INFO: dict.c(211): Allocated 190 KiB for strings, 366 KiB for phones
INFO: dict.c(335): 23498 words read
INFO: dict.c(341): Reading filler dictionary: /home/emilio/audiolibros/cmusphinx/spanish/cmusphinx-es-5.2/model_parameters/voxforge_es_sphinx.cd_ptm_4000/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 26^3 * 2 bytes (34 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 8216 bytes (8 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 8216 bytes (8 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=23500, 2=1463799, 3=1109989
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 23500 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
......................
INFO: ngram_model_arpa.c(533): 1463799 = #bigrams created
INFO: ngram_model_arpa.c(534): 48439 = #prob2 entries
INFO: ngram_model_arpa.c(542): 11175 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
ERROR: "ngram_model_arpa.c", line 351: Trigrams not in bigram order
ERROR: "pio.c", line 161: Compressed file operation for mode rb is not supported
ERROR: "ngram_model_dmp.c", line 106: Dump file /home/emilio/audiolibros/cmusphinx/spanish/es-20k.lm.gz not found
ERROR: "ngram_search.c", line 208: Failed to read language model file: /home/emilio/audiolibros/cmusphinx/spanish/es-20k.lm.gz
Update sphinxbase and pocketsphinx from GitHub.
Thank you!
I have updated both programs. It works.
Just one question about this version. In order to process file.wav, which was recorded at a 48000 Hz sample rate, I have to convert it to 16000 Hz. Otherwise I get the error: ERROR: "fe_interface.c", line 105: FFT: Number of points must be greater or equal to frame size (1230 samples).
Is this the expected behaviour? Must I record my talks at a sample rate of 16000 instead of 48000, or are newer versions expected to accept a sample rate of 48000?
Last edit: emilio torres 2017-02-07
It is better to convert. You can also process a 48 kHz file without conversion using:
-infile file48.wav -nfft 2048 -samprate 48000
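
For readers wondering where 2048 comes from, here is a rough sketch of the arithmetic behind the error message. It assumes the frame size is the window length (-wlen, 0.025625 s in the log above) multiplied by the sample rate, and that -nfft should be the smallest power of two that is at least that large. The helper names frame_size and min_nfft are hypothetical, and the round-to-nearest step is an assumption, not taken from the sphinxbase source.

```python
def frame_size(samprate: float, wlen: float = 0.025625) -> int:
    """Samples in one analysis window (assumed: wlen * samprate, rounded)."""
    return int(samprate * wlen + 0.5)


def min_nfft(samprate: float, wlen: float = 0.025625) -> int:
    """Smallest power of two that is >= the frame size."""
    n = frame_size(samprate, wlen)
    return 1 << (n - 1).bit_length()


for rate in (16000, 48000):
    print(rate, frame_size(rate), min_nfft(rate))
```

At 16000 Hz the window holds 410 samples, so the default -nfft 512 is enough. At 48000 Hz it holds 1230 samples, the figure quoted in the error, and the next power of two is 2048.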
Dear Nickolay V. Shmyrev,
Thank you for your continued support.
Best regards
Emilio