I have successfully used pocketsphinx-5prealpha and sphinxbase-5prealpha to transcribe English WAV files to text,
but I cannot get them to transcribe any Mandarin WAV file.
I downloaded the Mandarin acoustic and language models from "https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/".
Perhaps I placed the Mandarin acoustic and language models in the wrong folder or path.
My test environment and the error message are described below.
Could you please help me solve this problem? Thanks.
Test Environment:
1. Operating System: Windows 10 Home Edition.
2. pocketsphinx-5prealpha.tar.gz with modified date 2016-09-24.
3. sphinxbase-5prealpha.tar.gz with modified date 2016-09-24.
4. Mandarin Acoustic and Language Model with modified date 2016-05-31.
5. The path of the "pocketsphinx_continuous.exe" file:
E:\pocketsphinxnew\pocketsphinx\bin\Debug\Win32
6. The location of the Mandarin acoustic and language model files:
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\feat.params
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\mdef
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\means
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\mixture_weights
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\noisedict
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\sendump
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\transition_matrices
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000\variances
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\LICENSE
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_64000_utf8.DMP
E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_utf8.dic
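To rule out a misplaced file, the layout above can be checked with a short script. This is only a sketch: the helper name `missing_model_files` is made up for illustration, and the list of expected files is simply copied from the listing above, not taken from any official specification of what pocketsphinx requires.

```python
from pathlib import Path

# File names copied from the directory listing above (an assumption, not an
# official list of what pocketsphinx requires).
ACOUSTIC_MODEL_FILES = [
    "feat.params", "mdef", "means", "mixture_weights",
    "noisedict", "sendump", "transition_matrices", "variances",
]
TOP_LEVEL_FILES = [
    "zh_broadcastnews_64000_utf8.DMP",
    "zh_broadcastnews_utf8.dic",
]


def missing_model_files(model_root, hmm_dir_name="zh_broadcastnews_ptm256_8000"):
    """Return the expected model files that do not exist under model_root."""
    root = Path(model_root)
    expected = [root / hmm_dir_name / name for name in ACOUSTIC_MODEL_FILES]
    expected += [root / name for name in TOP_LEVEL_FILES]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_model_files(r"E:\pocketsphinxnew\pocketsphinx\model\Mandarin")
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All listed model files are present.")
```

An empty result only shows that the files are where the command line expects them; it says nothing about whether their contents are valid.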
Error Message:
E:\TestVoice>ffmpeg -i Test.wav
ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 7.2.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-cuda --enable-cuvid --enable-d3d11va --enable-nvenc --enable-dxva2 --enable-avisynth --enable-libmfx
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'Test.wav':
Duration: 00:00:24.48, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
At least one output file must be specified
E:\TestVoice>pocketsphinx_continuous.exe -infile Test.wav -hmm E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000 -lm E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_64000_utf8.DMP -dict E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_utf8.dic
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-001
-ascale 20.0 2.000000e+001
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-048
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-ceplen 13 13
-cmn live current
-cmninit 40,3,-1 40,3,-1
-compallsen no no
-debug 0
-dict E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_utf8.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd s2_4x
-featparams
-fillprob 1e-8 1.000000e-008
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-001
-kws_threshold 1 1.000000e+000
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_64000_utf8.DMP
-lmctl
-lmname
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 6.500000e+000
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-048
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-10 1.000000e-010
-pl_pip 1.0 1.000000e+000
-pl_weight 3.0 3.000000e+000
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+003
-uw 1.0 1.000000e+000
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+000
-var
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-029
-wip 0.65 6.500000e-001
-wlen 0.025625 2.560000e-002
INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: mdef.c(518): Reading model definition: E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000/mdef
INFO: bin_mdef.c(181): Allocating 68760 * 8 bytes (537 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000/means
INFO: ms_gauden.c(242): 70 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000/variances
INFO: ms_gauden.c(242): 70 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(304): 24440 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 256, Columns: 8210
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 101599 * 20 bytes (1984 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_utf8.dic
INFO: dict.c(213): Dictionary size 97495, allocated 737 KiB for strings, 977 KiB for phones
INFO: dict.c(336): 97495 words read
INFO: dict.c(358): Reading filler dictionary: E:\pocketsphinxnew\pocketsphinx\model\Mandarin\zh_broadcastnews_ptm256_8000/noisedict
INFO: dict.c(213): Dictionary size 97503, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 8 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 70^3 * 2 bytes (669 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 59080 bytes (57 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 59080 bytes (57 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(70): No \data\ mark in LM file
INFO: ngram_model_trie.c(445): Trying to read LM in dmp format
INFO: ngram_model_trie.c(527): ngrams 1=63944, 2=16600781, 3=20708460
calloc(3,4) failed from e:\pocketsphinxnew\sphinxbase\src\libsphinxbase\lm\ngrams_raw.c(312)
E:\TestVoice>
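Since the log stops at a failed `calloc(3,4)` right after the n-gram counts are printed, a rough size estimate may help judge whether memory is the issue. The sketch below parses the counts from that log line and multiplies them out. It rests on assumptions that are not confirmed by the log: that one array of word IDs (4 bytes each, as in the failing call) is allocated per bigram and per trigram, and that nothing else is counted. The function names are hypothetical.

```python
import re

# The n-gram count line copied from the log above.
LOG_LINE = "INFO: ngram_model_trie.c(527): ngrams 1=63944, 2=16600781, 3=20708460"


def parse_ngram_counts(line):
    """Extract {order: count} from an 'ngrams 1=..., 2=..., 3=...' log line."""
    tail = line.split("ngrams", 1)[1]
    return {int(order): int(count) for order, count in re.findall(r"(\d+)=(\d+)", tail)}


def estimate_word_id_bytes(counts, bytes_per_word_id=4):
    """Estimate bytes for per-n-gram word-ID arrays of order 2 and above.

    Assumption: each n-gram of order k >= 2 gets its own array of k word IDs,
    mirroring the calloc(3,4) call in the error message. Allocator overhead
    and every other data structure are ignored, so the real footprint would
    be larger.
    """
    return sum(order * bytes_per_word_id * count
               for order, count in counts.items() if order >= 2)


if __name__ == "__main__":
    counts = parse_ngram_counts(LOG_LINE)
    total = estimate_word_id_bytes(counts)
    print("n-gram counts:", counts)
    print("estimated word-ID bytes: %d (%.1f MiB)" % (total, total / 2**20))
```

The executable lives under bin\Debug\Win32, which suggests a 32-bit build with limited address space, so an estimate of this size, plus overhead from tens of millions of small allocations, might be relevant. This is a guess on my part, not a confirmed diagnosis.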