Hi, I get this error when I try to make pocketsphinx recognize an audio file with an 8000 Hz sample rate:
ERROR: "continuous.c", line 136: Input audio file has sample rate [8000], but decoder expects [16000]
FATAL: "continuous.c", line 165: Failed to process file '/home/andreas/Documents/Taledatabase/wav/soundfile_1.wav' due to format mismatch.
I trained my acoustic model on audio files with an 8000 Hz sample rate, so why do I get this error?
Here is the output from my terminal window:
andreas@andreas-MS-7817:~$ pocketsphinx_continuous -hmm /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200 -lm /home/andreas/Documents/Taledatabase/etc/tesLm.lm.DMP -dict /home/andreas/Documents/Taledatabase/etc/test.dic -infile /home/andreas/Documents/Taledatabase/wav/soundfile_1.wav
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/andreas/Documents/Taledatabase/etc/test.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/noisedict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/feat.params
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm /home/andreas/Documents/Taledatabase/etc/tesLm.lm.DMP
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 2.000000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mdef
-mean /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
-mfclogdir
-min_endfr 0 0
-mixw /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mixture_weights
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 15
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/transition_matrices
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 3.500000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(518): Reading model definition: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mdef
INFO: bin_mdef.c(181): Allocating 146535 * 8 bytes (1144 KiB) for CD tree
INFO: tmat.c(206): Reading HMM transition probability matrices: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
INFO: ms_gauden.c(292): 374 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
INFO: ms_gauden.c(292): 374 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 80 variance values floored
INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 374
INFO: acmod.c(119): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
INFO: ms_gauden.c(292): 374 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
INFO: ms_gauden.c(292): 374 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 80 variance values floored
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/means
INFO: ms_gauden.c(292): 374 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/variances
INFO: ms_gauden.c(292): 374 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 80 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 374 senones: 1 features x 8 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(141): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 60406 * 32 bytes (1887 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/andreas/Documents/Taledatabase/etc/test.dic
INFO: dict.c(213): Allocated 500 KiB for strings, 819 KiB for phones
INFO: dict.c(336): 56302 words read
INFO: dict.c(358): Reading filler dictionary: /home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 8 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 58^3 * 2 bytes (381 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 81200 bytes (79 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 81200 bytes (79 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(456): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(467): Header doesn't match
INFO: ngram_model_trie.c(189): Trying to read LM in arpa format
INFO: ngram_model_trie.c(205): LM of order 3
INFO: ngram_model_trie.c(207): #1-grams: 16002
INFO: ngram_model_trie.c(207): #2-grams: 65457
INFO: ngram_model_trie.c(207): #3-grams: 95363
INFO: lm_trie.c(399): Training quantizer
INFO: lm_trie.c(407): Building LM trie
INFO: ngram_search_fwdtree.c(99): 900 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 58 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 58 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 71380
INFO: ngram_search_fwdtree.c(339): after: 885 root, 71252 non-root channels, 56 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(305): pocketsphinx_continuous COMPILED ON: Nov 8 2015, AT: 19:42:05
ERROR: "continuous.c", line 136: Input audio file has sample rate [8000], but decoder expects [16000]
FATAL: "continuous.c", line 165: Failed to process file '/home/andreas/Documents/Taledatabase/wav/soundfile_1.wav' due to format mismatch.
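The ERROR line is the decoder comparing the sample rate stored in the WAV header with its own -samprate setting, which the configuration dump above shows at its default of 16000. One quick way to see what a file actually contains before decoding is Python's standard wave module. This is an illustrative sketch, not part of pocketsphinx: `check_wav` is a hypothetical helper, and treating mono 16-bit PCM as the expected format is an assumption about what the decoder accepts.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of header mismatches for the WAV file at `path`.

    `expected_rate` should be whatever -samprate the decoder will use
    (16000 if the flag is not given). The mono / 16-bit checks are an
    assumption about the input the decoder expects.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()  # bytes per sample

    problems = []
    if rate != expected_rate:
        problems.append(
            f"sample rate is {rate} Hz but decoder expects {expected_rate} Hz; "
            f"pass -samprate {rate}"
        )
    if channels != 1:
        problems.append(f"{channels} channels; mono assumed")
    if width != 2:
        problems.append(f"{8 * width}-bit samples; 16-bit assumed")
    return problems
```

For the file in the log, `check_wav('/home/andreas/Documents/Taledatabase/wav/soundfile_1.wav')` should include a message suggesting -samprate 8000; an empty list means the header matches the assumed format.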
Thanks! That worked, but why is the accuracy so low, even though I got 100% correct on this audio file in the decoding step of my training?
har du noen gang sett stokkmaur ogsaa kalt hestemaur plageaanden som liker aa spise seg inn i treverket (user-SOUNDFILE_1)
har du noen gang sett stokkmaur ogsaa kalt hestemaur plageaanden som liker aa spise seg inn i treverket (user-SOUNDFILE_1)
Words: 18 Correct: 18 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
Insertions: 0 Deletions: 0 Substitutions: 0
Here is what I get using pocketsphinx on the same audio file:
FOR AA GJOERE OM PAA HOEYRE OG ER FOR DAARLIG REGJERINGEN
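To put a number on the gap between the training-time result and this output, the continuous hypothesis can be scored against the reference transcript with the same kind of statistics the training report prints. This is a minimal sketch using a plain Levenshtein word alignment; it assumes the usual definitions (Percent correct ignores insertions, Accuracy subtracts them) and is not SphinxTrain's own scoring script. Both strings are lower-cased first because the reference is lower case and the decoder output is upper case.

```python
def align_counts(ref_words, hyp_words):
    """Minimum-edit word alignment; returns (substitutions, deletions, insertions)."""
    # d[i][j] = (total_edits, S, D, I) for ref_words[:i] vs hyp_words[:j]
    d = [[None] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    d[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref_words) + 1):
        d[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, len(hyp_words) + 1):
        d[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            if ref_words[i - 1] == hyp_words[j - 1]:
                d[i][j] = d[i - 1][j - 1]
                continue
            e, s, dl, ins = d[i - 1][j - 1]
            sub = (e + 1, s + 1, dl, ins)
            e, s, dl, ins = d[i - 1][j]
            dele = (e + 1, s, dl + 1, ins)
            e, s, dl, ins = d[i][j - 1]
            insn = (e + 1, s, dl, ins + 1)
            d[i][j] = min(sub, dele, insn)  # fewest total edits wins
    return d[-1][-1][1:]


def score(ref, hyp):
    """Word-level statistics in the style of the training decode report."""
    ref_words, hyp_words = ref.lower().split(), hyp.lower().split()
    n = len(ref_words)
    if n == 0:
        raise ValueError("empty reference")
    s, dl, ins = align_counts(ref_words, hyp_words)
    return {
        "words": n,
        "correct": n - s - dl,
        "percent_correct": 100.0 * (n - s - dl) / n,
        "error": 100.0 * (s + dl + ins) / n,
        "accuracy": 100.0 * (n - s - dl - ins) / n,
    }
```

Pass the reference sentence without the trailing (user-SOUNDFILE_1) utterance id, and the hypothesis exactly as printed by pocketsphinx_continuous.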
You need to add
-samprate 8000
to the command line to configure the decoder to process 8 kHz data. The decoder's sample rate is set separately; it has no relation to the model.
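If files with different sample rates are decoded, one option is to read the rate from each file's header and pass it as -samprate, so the flag always matches the audio. This is a sketch with hypothetical helpers `build_command` and `as_shell`; the flags are the ones from the command in the original post plus -samprate, and the code only assembles the argument list rather than running the decoder.

```python
import shlex
import wave


def build_command(hmm, lm, dic, infile):
    """Return an argv list for pocketsphinx_continuous whose -samprate
    matches the sample rate stored in the WAV header of `infile`."""
    with wave.open(infile, "rb") as w:
        rate = w.getframerate()
    return [
        "pocketsphinx_continuous",
        "-hmm", hmm,
        "-lm", lm,
        "-dict", dic,
        "-infile", infile,
        # Without this flag the decoder keeps its default of 16000 Hz.
        "-samprate", str(rate),
    ]


def as_shell(cmd):
    """Quote the argv list so it can be pasted into a terminal."""
    return shlex.join(cmd)
```

The list can be run with `subprocess.run(cmd, check=True)`, or printed with `as_shell(cmd)` and pasted into a terminal.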
Thanks! That worked, but why is the accuracy so low, even though I got 100% correct on this audio file in the decoding step of my training?
Here is what I get using pocketsphinx on the same audio file:
FOR AA GJOERE OM PAA HOEYRE OG ER FOR DAARLIG REGJERINGEN
You need to configure the cmninit value in feat.params in the model. Add a line
-cmninit 40,3,-1
with a text editor.
OK, I tried that and it did not make any difference. I still get
FOR AA GJOERE OM PAA HOEYRE OG ER FOR DAARLIG REGJERINGEN
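The suggested fix is a one-line edit to feat.params in the model directory (model_parameters/test.cd_cont_200/feat.params in this setup, which the log shows the decoder reading via -featparams). A hypothetical helper like the one below makes the edit repeatable: it replaces an existing -cmninit line or appends one. After editing, the VALUE column for -cmninit in the "Current configuration" dump is a quick way to confirm the decoder actually picked up the new value, in the same way the dump shows -lowerf, -upperf and -nfilt taken from the model's feat.params instead of their defaults.

```python
from pathlib import Path


def set_feat_param(text, name, value):
    """Return feat.params text with parameter `name` set to `value`.

    An existing line for the parameter is replaced (any later duplicates
    are dropped); otherwise a new line is appended at the end.
    """
    key = name if name.startswith("-") else "-" + name
    out, found = [], False
    for line in text.splitlines():
        fields = line.split(None, 1)
        if fields and fields[0] == key:
            if not found:
                out.append(f"{key} {value}")
                found = True
            continue
        out.append(line)
    if not found:
        out.append(f"{key} {value}")
    return "\n".join(out) + "\n"


def update_feat_params(path, name, value):
    """Rewrite the feat.params file at `path` in place."""
    p = Path(path)
    p.write_text(set_feat_param(p.read_text(), name, value))
```

For example: `update_feat_params("/home/andreas/Documents/Taledatabase/model_parameters/test.cd_cont_200/feat.params", "-cmninit", "40,3,-1")`.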
OK, you are welcome to provide all the files: the model training folder, the audio file you are trying to decode, and the pocketsphinx log.
OK, here are the files. Thank you so much for your help, Nickolay! May I ask what cmninit defines and how you came up with these numbers?
You'd better share the model training folder too.
When you say model training folder, do you mean the model_parameters folder? Or all the folders generated by the training (feat, trees, qmanager...)?
All the folders
You can download the folders here: http://we.tl/eB4dh89mkF
Are all the folders you need in the link? Or have I forgotten something this time as well? XD
Last edit: Andreas Ravndal 2016-02-08
Hello
The following line should give you good results:
The thing is that your model is pretty small and not very stable in unseen conditions. The second issue is that batch training handles silence differently from continuous decoding. Both try to remove silence, but the effect might differ: training processes your file as a single utterance, while continuous decoding tries to split it into many utterances.
Basically, you need more training data and a bigger model.
I know, but my resources are limited, since there are not many freely available speech databases for Norwegian out there =/ So I just have to make the best of the resources I have. Would you recommend training a PTM model instead of a continuous model?
There is a very big Norwegian database available here:
http://www.nb.no/sprakbanken/show?serial=oai%3Anb.no%3Asbr-13&lang=en
You could work on that, there are also Swedish and Danish corpora.
I know, but that database is incomplete (since NST went bankrupt), and some students tried to work with it last year but did not succeed in training an acoustic model on it. Of course I could give it another shot, and I probably will, since the speech database I use now (from the same resource site you linked above, produced by a company named Lingit) seems to be too small for any practical use. Thank you for your help so far!