Sir:
I got an error when transcribing a .wav file to text.
I used this command on Ubuntu 16.04 (64-bit):
pocketsphinx_continuous -hmm zh_broadcastnews_ptm256_8000/ -lm zh_broadcastnews_64000_utf8.DMP -dict zh_broadcastnews_utf8.dic -infile /src/wav/1.wav -inmic yes > 1.txt
The command produces this output:
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from zh_broadcastnews_ptm256_8000//feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live current
-cmninit 40,3,-1 40,3,-1
-compallsen no no
-debug 0
-dict zh_broadcastnews_utf8.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd s2_4x
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm zh_broadcastnews_ptm256_8000/
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm zh_broadcastnews_64000_utf8.DMP
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.560000e-02
INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: mdef.c(518): Reading model definition: zh_broadcastnews_ptm256_8000//mdef
INFO: bin_mdef.c(181): Allocating 68760 * 8 bytes (537 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: zh_broadcastnews_ptm256_8000//transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: zh_broadcastnews_ptm256_8000//means
INFO: ms_gauden.c(242): 70 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: zh_broadcastnews_ptm256_8000//variances
INFO: ms_gauden.c(242): 70 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(304): 24440 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file zh_broadcastnews_ptm256_8000//sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 256, Columns: 8210
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 101599 * 32 bytes (3174 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: zh_broadcastnews_utf8.dic
INFO: dict.c(213): Dictionary size 97495, allocated 737 KiB for strings, 977 KiB for phones
INFO: dict.c(336): 97495 words read
INFO: dict.c(358): Reading filler dictionary: zh_broadcastnews_ptm256_8000//noisedict
INFO: dict.c(213): Dictionary size 97503, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 8 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 70^3 * 2 bytes (669 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 118160 bytes (115 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 118160 bytes (115 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(70): No \data\ mark in LM file
INFO: ngram_model_trie.c(445): Trying to read LM in dmp format
INFO: ngram_model_trie.c(527): ngrams 1=63944, 2=16600781, 3=20708460
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 476 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 75539
INFO: ngram_search_fwdtree.c(333): Created 461 root, 75411 non-root channels, 27 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Mar 23 2017, AT: 18:01:09
ERROR: "continuous.c", line 123: Input audio file has [0] bits per sample instead of 16
FATAL: "continuous.c", line 165: Failed to process file '/src/wav/1.wav' due to format mismatch.
I don't know what this error means. Could you help me? Thank you very much!
-infile and -inmic are mutually exclusive.
Also, you should make sure your file is in the right format (16 kHz, 16-bit, little-endian, mono).
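One way to check the file before running the decoder is to read its header. The following is a minimal Python sketch using the standard-library wave module. The function name check_wav_format and its default values are illustrative, taken from the format stated above; none of it is part of pocketsphinx. A RIFF .wav file stores its samples little-endian, so the sketch only checks the sample rate, sample width and channel count.

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_bytes=2):
    """Return a list of mismatches between a WAV header and the expected format.

    An empty list means the header matches. The defaults (16 kHz, mono,
    2 bytes = 16 bits per sample) are assumptions taken from the reply above.
    """
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            if wav.getframerate() != rate:
                problems.append(
                    f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
            if wav.getnchannels() != channels:
                problems.append(
                    f"{wav.getnchannels()} channel(s), expected {channels}")
            if wav.getsampwidth() != sample_bytes:
                problems.append(
                    f"{wav.getsampwidth() * 8} bits per sample, "
                    f"expected {sample_bytes * 8}")
    except (wave.Error, EOFError) as exc:
        # The wave module rejects files that are not uncompressed PCM RIFF/WAVE.
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems
```

Calling check_wav_format("/src/wav/1.wav") returns an empty list if the header matches. Any entries in the list name what has to be converted before the file is passed to -infile.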