Hi all,
after some work I managed to train a German acoustic model with SphinxTrain on
Windows XP. When testing it with scripts_pl/decode/slave.pl ->
pocketsphinx_batch.exe I get a word accuracy of about 66 %, not great
but a start.
Now I'd like to do some live tests with pocketsphinx_continuous.exe and a
microphone. I tested with two different mics, one built into my laptop and one
external via USB.
But whatever I do, not a single recognized word is correct. When I get the
"READY" prompt from pocketsphinx_continuous and say a few words, it runs through the
recognition routines, so I guess something from the mic gets through.
I see the same problem when testing pocketsphinx_continuous with the
English example models that ship with it.
I tried a few command line arguments, e.g. -adcdev default and -input_endian little, but nothing helped.
I have googled a lot but only found hints for configuring the mic input on Linux. Any ideas how to track down the problem on Windows?
I pass the same -hmm, -lm and -dict parameters as in the successful pocketsphinx_batch tests. Is there something else I could have done wrong there?
Here is my version information:
-SphinxTrain 1.0
-SphinxBase 0.6
-PocketSphinx 0.6.0
-Visual Studio 2008
Here is one output from pocketsphinx_continuous.exe:
C:\temp\tutorial\voxforge\bin>pocketsphinx_continuous pocketsphinx_continuous_args.txt
INFO: cmd_ln.c(512): Parsing command line:
\
-hmm C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000 \
-dict C:/temp/tutorial/voxforge/etc/voxforge.dic \
-lm C:/temp/tutorial/voxforge/etc/voxforge.ug.lm
Current configuration:
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+000
-alpha 0.97 9.700000e-001
-argfile
-ascale 20.0 2.000000e+001
-backtrace no no
-beam 1e-48 1.000000e-048
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict C:/temp/tutorial/voxforge/etc/voxforge.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-008
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm C:/temp/tutorial/voxforge/etc/voxforge.ug.lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 6.500000e+000
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-048
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-5 1.000000e-005
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+003
-usewdphones no no
-uw 1.0 1.000000e+000
-var
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-029
-wip 0.65 6.500000e-001
-wlen 0.025625 2.562500e-002
INFO: cmd_ln.c(512): Parsing command line:
\
-alpha 0.97 \
-dither yes \
-doublebw no \
-nfilt 40 \
-ncep 13 \
-lowerf 133.33334 \
-upperf 6855.4976 \
-nfft 512 \
-wlen 0.0256 \
-transform legacy \
-feat s2_4x \
-agc none \
-cmn current \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+000
-alpha 0.97 9.700000e-001
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no yes
-doublebw no no
-feat 1s_c_d_dd s2_4x
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.333333e+002
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+003
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.560000e-002
INFO: acmod.c(238): Parsed model-specific feature parameters from C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/feat.params
INFO: fe_interface.c(288): You are using the internal mechanism to generate the seed.
INFO: feat.c(848): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: mdef.c(520): Reading model definition: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/mdef
INFO: bin_mdef.c(173): Allocating 19470 * 8 bytes (152 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/means
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
256x12 256x24 256x3 256x12
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/variances
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
256x12 256x24 256x3 256x12
INFO: ms_gauden.c(356): 135 variance values floored
INFO: s2_semi_mgau.c(897): Loading senones from dump file C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/sendump
INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(984): Rows: 256, Columns: 1208
INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0 0
INFO: dict.c(294): Allocating 7115 * 20 bytes (138 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: C:/temp/tutorial/voxforge/etc/voxforge.dic
INFO: dict.c(206): Allocated 26 KiB for strings, 50 KiB for phones
INFO: dict.c(309): 3016 words read
INFO: dict.c(314): Reading filler dictionary: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 3 words read
INFO: dict2pid.c(402): Building PID tables for dictionary
INFO: dict2pid.c(409): Allocating 3019 * 4 bytes (11 KiB) for word-internal arrays
INFO: dict2pid.c(414): Allocating 41^3 * 2 bytes (134 KiB) for word-initial triphones
INFO: dict2pid.c(453): Allocating 19833 entries of 2 bytes (38 KiB) for internal ssids
INFO: dict2pid.c(130): Allocated 20336 bytes (19 KiB) for word-final triphones
INFO: dict2pid.c(193): Allocated 20336 bytes (19 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(476): ngrams 1=945, 2=2146, 3=2699
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(515): 945 = #unigrams created
INFO: ngram_model_arpa.c(194): Reading bigrams
INFO: ngram_model_arpa.c(531): 2146 = #bigrams created
INFO: ngram_model_arpa.c(532): 153 = #prob2 entries
INFO: ngram_model_arpa.c(539): 288 = #bo_wt2 entries
INFO: ngram_model_arpa.c(291): Reading trigrams
INFO: ngram_model_arpa.c(552): 2699 = #trigrams created
INFO: ngram_model_arpa.c(553): 77 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 239 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 4218
INFO: ngram_search_fwdtree.c(333): after: 180 root, 4090 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
Allocating 32 buffers of 2500 samples each
INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: May 5 2010, AT: 18:00:23
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 7.54 -0.43 0.18 0.42 0.10 -0.24 0.00 -0.16 -0.00 -0.07 -0.26 -0.04 -0.12 >
INFO: ngram_search_fwdtree.c(1502): 3042 words recognized (37/fr)
INFO: ngram_search_fwdtree.c(1504): 70230 senones evaluated (846/fr)
INFO: ngram_search_fwdtree.c(1506): 182540 channels searched (2199/fr), 14179 1st, 50550 last
INFO: ngram_search_fwdtree.c(1510): 4879 words for which last channels evaluated (58/fr)
INFO: ngram_search_fwdtree.c(1513): 11173 candidate words for entering last phone (134/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 160 words
INFO: ngram_search_fwdflat.c(912): 2174 words recognized (26/fr)
INFO: ngram_search_fwdflat.c(914): 60620 senones evaluated (730/fr)
INFO: ngram_search_fwdflat.c(916): 148514 channels searched (1789/fr)
INFO: ngram_search_fwdflat.c(918): 11393 words searched (137/fr)
INFO: ngram_search_fwdflat.c(920): 7724 word transitions (93/fr)
WARNING: "ngram_search.c", line 1082: </s> not found in last frame, using <sil> instead
INFO: ngram_search.c(1132): lattice start node <s>.0 end node <sil>.62
INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(<sil>:62:81) = -906817
INFO: ps_lattice.c(1266): Joint P(O,S) = -927457 P(S|O) = -20640
000000000: JUNI (-17379361)
READY....
Any help will be appreciated, thanks!
Kind regards
Michael
There is a known bug in sphinxbase on win32, see this thread for example:
http://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/3698792
You can dump the audio with -rawlogdir and check whether it is corrupted for you too.
To fix it we'll need to get access to a win32 system, and that will take some time. The
root of the problem is in the ad_read function in
sphinxbase/lib/libsphinxad/rec_win32.c
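
To listen to a dump from -rawlogdir, the raw file can be wrapped in a WAV header first. The following is only a minimal sketch: it assumes the dump is headerless 16-bit signed little-endian mono PCM at the 16000 Hz shown as -samprate in the log, and the function name and file names are made up for illustration.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM in a WAV container so an audio player can open it.

    Assumption: the input is 16-bit signed little-endian mono PCM.
    Returns the number of frames written.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()

    # Drop a trailing partial frame, if any, so the header matches the data.
    frame_size = channels * sample_width
    pcm = pcm[: len(pcm) - len(pcm) % frame_size]

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)

    return len(pcm) // frame_size


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python raw_to_wav.py utt.raw utt.wav
    frames = raw_to_wav(sys.argv[1], sys.argv[2])
    print("wrote", frames, "frames")
```

If the resulting WAV plays with stutters or clicks while the same microphone sounds fine in another recorder, that points at the capture code rather than at the models.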
Thanks for the quick reply.
I dumped audio as described and also get corrupted files with repeated samples
and clicks. It sounds as if maybe the buffer switching is buggy.
Is anybody working on this? Can I contribute to track down the problem? I have
some C knowledge but am far from a Windows API expert.
Michael
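
For anyone who wants to check a dump for repeated samples without listening to it, here is a rough sketch. It assumes the dump is headerless 16-bit signed little-endian PCM, and it guesses that a repeated capture buffer shows up as a stretch of samples identical to the samples exactly `lag` positions earlier. The default lag of 2500 is taken from the "Allocating 32 buffers of 2500 samples each" line in the log; whether the bug really repeats whole buffers is an assumption, and all names are made up for illustration.

```python
import array
import sys


def load_pcm16(path):
    """Read headerless 16-bit signed little-endian PCM into an array of ints."""
    with open(path, "rb") as f:
        data = f.read()
    data = data[: len(data) - len(data) % 2]  # drop a trailing odd byte
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()  # file is assumed little-endian
    return samples


def repeated_runs(samples, lag=2500, min_len=400, min_range=50):
    """Find stretches where samples[i] == samples[i - lag].

    Returns a list of (start_index, length) for runs of at least min_len
    samples. Runs whose amplitude range is below min_range are skipped so
    that digital silence is not reported as a repeat.
    """
    runs = []
    start = None
    n = len(samples)
    for i in range(lag, n + 1):
        same = i < n and samples[i] == samples[i - lag]
        if same:
            if start is None:
                start = i
        elif start is not None:
            segment = samples[start:i]
            if len(segment) >= min_len and max(segment) - min(segment) >= min_range:
                runs.append((start, len(segment)))
            start = None
    return runs


if __name__ == "__main__":
    # Hypothetical usage: python find_repeats.py utt.raw
    pcm = load_pcm16(sys.argv[1])
    for begin, length in repeated_runs(pcm):
        print("repeat at sample", begin, "length", length)
```

If nothing is reported at lag 2500, trying other lags (for example other buffer sizes used by the audio driver) may still turn up repeats, since the real buffer size involved is not confirmed here.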
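> Is anybody working on this?
No.
> Can I contribute to track down the problem?
Better to submit a patch that fixes it.
> I have some C knowledge but am far from a Windows API expert.
Same here.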