An error when converting WAV to text

Help
Jack Ma
2017-03-24
2017-03-25
  • Jack Ma

    Jack Ma - 2017-03-24

    Sir:
    I get an error when transcribing a .wav file to a text file.
    I ran this command on Ubuntu 16.04 (64-bit):
    pocketsphinx_continuous -hmm zh_broadcastnews_ptm256_8000/ -lm zh_broadcastnews_64000_utf8.DMP -dict zh_broadcastnews_utf8.dic -infile /src/wav/1.wav -inmic yes > 1.txt

    The command prints the following output:

    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from zh_broadcastnews_ptm256_8000//feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn live current
    -cmninit 40,3,-1 40,3,-1
    -compallsen no no
    -debug 0
    -dict zh_broadcastnews_utf8.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd s2_4x
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm zh_broadcastnews_ptm256_8000/
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm zh_broadcastnews_64000_utf8.DMP
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.560000e-02

    INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
    INFO: mdef.c(518): Reading model definition: zh_broadcastnews_ptm256_8000//mdef
    INFO: bin_mdef.c(181): Allocating 68760 * 8 bytes (537 KiB) for CD tree
    INFO: tmat.c(149): Reading HMM transition probability matrices: zh_broadcastnews_ptm256_8000//transition_matrices
    INFO: acmod.c(113): Attempting to use PTM computation module
    INFO: ms_gauden.c(127): Reading mixture gaussian parameter: zh_broadcastnews_ptm256_8000//means
    INFO: ms_gauden.c(242): 70 codebook, 4 feature, size:
    INFO: ms_gauden.c(244): 256x12
    INFO: ms_gauden.c(244): 256x24
    INFO: ms_gauden.c(244): 256x3
    INFO: ms_gauden.c(244): 256x12
    INFO: ms_gauden.c(127): Reading mixture gaussian parameter: zh_broadcastnews_ptm256_8000//variances
    INFO: ms_gauden.c(242): 70 codebook, 4 feature, size:
    INFO: ms_gauden.c(244): 256x12
    INFO: ms_gauden.c(244): 256x24
    INFO: ms_gauden.c(244): 256x3
    INFO: ms_gauden.c(244): 256x12
    INFO: ms_gauden.c(304): 24440 variance values floored
    INFO: ptm_mgau.c(476): Loading senones from dump file zh_broadcastnews_ptm256_8000//sendump
    INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
    INFO: ptm_mgau.c(563): Rows: 256, Columns: 8210
    INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
    INFO: ptm_mgau.c(838): Maximum top-N: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 101599 * 32 bytes (3174 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: zh_broadcastnews_utf8.dic
    INFO: dict.c(213): Dictionary size 97495, allocated 737 KiB for strings, 977 KiB for phones
    INFO: dict.c(336): 97495 words read
    INFO: dict.c(358): Reading filler dictionary: zh_broadcastnews_ptm256_8000//noisedict
    INFO: dict.c(213): Dictionary size 97503, allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 8 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 70^3 * 2 bytes (669 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 118160 bytes (115 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 118160 bytes (115 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(365): Header doesn't match
    INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(70): No \data\ mark in LM file
    INFO: ngram_model_trie.c(445): Trying to read LM in dmp format
    INFO: ngram_model_trie.c(527): ngrams 1=63944, 2=16600781, 3=20708460
    INFO: lm_trie.c(474): Training quantizer
    INFO: lm_trie.c(482): Building LM trie
    INFO: ngram_search_fwdtree.c(74): Initializing search tree
    INFO: ngram_search_fwdtree.c(101): 476 unique initial diphones
    INFO: ngram_search_fwdtree.c(186): Creating search channels
    INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 75539
    INFO: ngram_search_fwdtree.c(333): Created 461 root, 75411 non-root channels, 27 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Mar 23 2017, AT: 18:01:09

    ERROR: "continuous.c", line 123: Input audio file has [0] bits per sample instead of 16
    FATAL: "continuous.c", line 165: Failed to process file '/src/wav/1.wav' due to format mismatch.

    I don't know what this error means. Could you help me? Thank you very much!

     
  • Arseniy Gorin

    Arseniy Gorin - 2017-03-24

    -infile and -inmic are mutually exclusive; pass only one of them.
    Also, you should make sure your file is in the right format (16 kHz, 16-bit, little-endian, mono).
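
One way to check the format named in the reply before running the decoder is to read the WAV header with Python's standard `wave` module. This is only a minimal sketch, not part of PocketSphinx: the function name `check_wav_format` and the constants are made up for illustration, and it assumes the file is a plain PCM RIFF/WAVE file, which is the only kind the `wave` module opens.

```python
import wave

# Format the reply asks for: 16 kHz, 16-bit, mono.
# (RIFF/WAVE sample data is little-endian by definition.)
EXPECTED_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16 bits
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems for the WAV file at `path`.

    An empty list means the header reports 16 kHz, 16-bit, mono audio.
    Hypothetical helper for illustration; not a PocketSphinx API.
    """
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            sample_width = wav.getsampwidth()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # The wave module rejects files that are not PCM RIFF/WAVE.
        return ["not a readable PCM WAV file: %s" % exc]

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append("expected 1 channel (mono), found %d" % channels)
    if sample_width != EXPECTED_SAMPLE_WIDTH:
        problems.append("expected 16 bits per sample, found %d" % (sample_width * 8))
    if rate != EXPECTED_RATE:
        problems.append("expected 16000 Hz, found %d Hz" % rate)
    return problems
```

Calling `check_wav_format("/src/wav/1.wav")` on the file from the question would return either an empty list or one message per mismatch. If the file cannot be parsed as PCM WAV at all, that would be consistent with the decoder failing to read a valid bits-per-sample value from the header, though the sketch cannot confirm the cause.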

     
