
Mic problems with pocketsphinx_continuous Win

Help
newhost2
2010-05-11
2012-09-22
  • newhost2

    newhost2 - 2010-05-11

    Hi all,

    After some work I managed to train a German acoustic model with SphinxTrain on
    Windows XP. When testing it with scripts_pl/decode/slave.pl (which runs
    pocketsphinx_batch.exe) I get a word accuracy of about 66%. Not very good,
    but a beginning.

    Now I'd like to do some live tests with pocketsphinx_continuous.exe and a
    microphone. I tested with two different mics: my laptop's internal one and an
    external one connected via USB.

    But whatever I do, not a single recognized word is correct. When I get the
    "READY" prompt from pocketsphinx_continuous and say some words, it runs through
    the recognition routines, so I guess something from the mic gets through.

    I see the same problem when testing pocketsphinx_continuous with the English
    example models that ship with PocketSphinx.

    1. I tried various command-line arguments, e.g. -adcdev default and -input_endian little, but nothing worked. I have googled a lot but only found hints for configuring microphone input on Linux. Any ideas how to track down the problem on Windows?

    2. I pass the same -hmm, -lm and -dict parameters as in the successful pocketsphinx_batch tests. Is there anything else I could have got wrong there?

    Here is my version information:
    -SphinxTrain 1.0
    -SphinxBase 0.6
    -PocketSphinx 0.6.0
    -Visual Studio 2008

    Here is one output from pocketsphinx_continuous.exe:


    C:\temp\tutorial\voxforge\bin>pocketsphinx_continuous pocketsphinx_continuous_args.txt
    INFO: cmd_ln.c(512): Parsing command line:
    \
    -hmm C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000 \
    -dict C:/temp/tutorial/voxforge/etc/voxforge.dic \
    -lm C:/temp/tutorial/voxforge/etc/voxforge.ug.lm

    Current configuration:

    -adcdev
    -agc none none
    -agcthresh 2.0 2.000000e+000
    -alpha 0.97 9.700000e-001
    -argfile
    -ascale 20.0 2.000000e+001
    -backtrace no no
    -beam 1e-48 1.000000e-048
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+000
    -bghist no no
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict C:/temp/tutorial/voxforge/etc/voxforge.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-008
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-064
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+000
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-029
    -fwdtree yes yes
    -hmm C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm C:/temp/tutorial/voxforge/etc/voxforge.ug.lm
    -lmctl
    -lmname default default
    -logbase 1.0001 1.000100e+000
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+002
    -lpbeam 1e-40 1.000000e-040
    -lponlybeam 7e-29 7.000000e-029
    -lw 6.5 6.500000e+000
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -mixw
    -mixwfloor 0.0000001 1.000000e-007
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+000
    -pbeam 1e-48 1.000000e-048
    -pip 1.0 1.000000e+000
    -pl_beam 1e-10 1.000000e-010
    -pl_pbeam 1e-5 1.000000e-005
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+004
    -seed -1 -1
    -sendump
    -senmgau
    -silprob 0.005 5.000000e-003
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-004
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+003
    -usewdphones no no
    -uw 1.0 1.000000e+000
    -var
    -varfloor 0.0001 1.000000e-004
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-029
    -wip 0.65 6.500000e-001
    -wlen 0.025625 2.562500e-002

    INFO: cmd_ln.c(512): Parsing command line:
    \
    -alpha 0.97 \
    -dither yes \
    -doublebw no \
    -nfilt 40 \
    -ncep 13 \
    -lowerf 133.33334 \
    -upperf 6855.4976 \
    -nfft 512 \
    -wlen 0.0256 \
    -transform legacy \
    -feat s2_4x \
    -agc none \
    -cmn current \
    -varnorm no

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+000
    -alpha 0.97 9.700000e-001
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -dither no yes
    -doublebw no no
    -feat 1s_c_d_dd s2_4x
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.333333e+002
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+004
    -seed -1 -1
    -smoothspec no no
    -svspec
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+003
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.560000e-002

    INFO: acmod.c(238): Parsed model-specific feature parameters from C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/feat.params
    INFO: fe_interface.c(288): You are using the internal mechanism to generate the seed.
    INFO: feat.c(848): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: mdef.c(520): Reading model definition: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/mdef
    INFO: bin_mdef.c(173): Allocating 19470 * 8 bytes (152 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/means
    INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
    256x12 256x24 256x3 256x12
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/variances
    INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
    256x12 256x24 256x3 256x12
    INFO: ms_gauden.c(356): 135 variance values floored
    INFO: s2_semi_mgau.c(897): Loading senones from dump file C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/sendump
    INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(984): Rows: 256, Columns: 1208
    INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0 0
    INFO: dict.c(294): Allocating 7115 * 20 bytes (138 KiB) for word entries
    INFO: dict.c(306): Reading main dictionary: C:/temp/tutorial/voxforge/etc/voxforge.dic
    INFO: dict.c(206): Allocated 26 KiB for strings, 50 KiB for phones
    INFO: dict.c(309): 3016 words read
    INFO: dict.c(314): Reading filler dictionary: C:/temp/tutorial/voxforge/model_parameters/voxforge.cd_semi_1000/noisedict
    INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(317): 3 words read
    INFO: dict2pid.c(402): Building PID tables for dictionary
    INFO: dict2pid.c(409): Allocating 3019 * 4 bytes (11 KiB) for word-internal arrays
    INFO: dict2pid.c(414): Allocating 41^3 * 2 bytes (134 KiB) for word-initial triphones
    INFO: dict2pid.c(453): Allocating 19833 entries of 2 bytes (38 KiB) for internal ssids
    INFO: dict2pid.c(130): Allocated 20336 bytes (19 KiB) for word-final triphones
    INFO: dict2pid.c(193): Allocated 20336 bytes (19 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(476): ngrams 1=945, 2=2146, 3=2699
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(515): 945 = #unigrams created
    INFO: ngram_model_arpa.c(194): Reading bigrams
    INFO: ngram_model_arpa.c(531): 2146 = #bigrams created
    INFO: ngram_model_arpa.c(532): 153 = #prob2 entries
    INFO: ngram_model_arpa.c(539): 288 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(291): Reading trigrams
    INFO: ngram_model_arpa.c(552): 2699 = #trigrams created
    INFO: ngram_model_arpa.c(553): 77 = #prob3 entries
    INFO: ngram_search_fwdtree.c(99): 239 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 4218
    INFO: ngram_search_fwdtree.c(333): after: 180 root, 4090 non-root channels, 3 single-phone words
    INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
    Allocating 32 buffers of 2500 samples each
    INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: May 5 2010, AT: 18:00:23

    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 7.54 -0.43 0.18 0.42 0.10 -0.24 0.00 -0.16 -0.00 -0.07 -0.26 -0.04 -0.12 >
    INFO: ngram_search_fwdtree.c(1502): 3042 words recognized (37/fr)
    INFO: ngram_search_fwdtree.c(1504): 70230 senones evaluated (846/fr)
    INFO: ngram_search_fwdtree.c(1506): 182540 channels searched (2199/fr), 14179 1st, 50550 last
    INFO: ngram_search_fwdtree.c(1510): 4879 words for which last channels evaluated (58/fr)
    INFO: ngram_search_fwdtree.c(1513): 11173 candidate words for entering last phone (134/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 160 words
    INFO: ngram_search_fwdflat.c(912): 2174 words recognized (26/fr)
    INFO: ngram_search_fwdflat.c(914): 60620 senones evaluated (730/fr)
    INFO: ngram_search_fwdflat.c(916): 148514 channels searched (1789/fr)
    INFO: ngram_search_fwdflat.c(918): 11393 words searched (137/fr)
    INFO: ngram_search_fwdflat.c(920): 7724 word transitions (93/fr)
    WARNING: "ngram_search.c", line 1082: </s> not found in last frame, using <sil> instead
    INFO: ngram_search.c(1132): lattice start node <s>.0 end node <sil>.62
    INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(<sil>:62:81) = -906817
    INFO: ps_lattice.c(1266): Joint P(O,S) = -927457 P(S|O) = -20640
    000000000: JUNI (-17379361)
    READY....


    Any help will be appreciated, thanks!

    Kind regards
    Michael

     
  • Nickolay V. Shmyrev

    There is a known bug in sphinxbase on Win32; see this thread, for example:

    http://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/3698792

    You can dump the audio with -rawlogdir and check whether it is corrupted for you too.
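    Besides listening to a dump, it can help to summarize it numerically. The C sketch
    below reports the peak level, full-scale (clipped) samples, the average absolute
    amplitude, and abrupt sample-to-sample jumps, which are typically heard as clicks.
    It assumes each dump is headerless 16-bit signed little-endian mono PCM, which is
    what a 16 kHz (-samprate 16000) build on x86 Windows would be expected to write;
    please verify that for your build. The names decode_s16le, load_raw_s16le,
    analyze_samples and raw_stats are hypothetical, not part of sphinxbase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Decode headerless 16-bit signed little-endian PCM (assumed dump format).
   A trailing odd byte, if any, is ignored. */
size_t decode_s16le(const unsigned char *bytes, size_t nbytes, short *out)
{
    size_t n = nbytes / 2;
    for (size_t i = 0; i < n; i++) {
        unsigned int u = bytes[2 * i] | ((unsigned int)bytes[2 * i + 1] << 8);
        out[i] = (short)(u >= 0x8000u ? (int)u - 0x10000 : (int)u);
    }
    return n;
}

/* Read a whole raw dump; returns a malloc'ed sample array, or NULL on error. */
short *load_raw_s16le(const char *path, size_t *nsamples)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long len = ftell(f);
    rewind(f);
    if (len < 2) {
        fclose(f);
        return NULL;
    }
    unsigned char *bytes = malloc((size_t)len);
    short *samples = malloc(((size_t)len / 2) * sizeof *samples);
    if (bytes == NULL || samples == NULL) {
        free(bytes);
        free(samples);
        fclose(f);
        return NULL;
    }
    size_t got = fread(bytes, 1, (size_t)len, f);
    fclose(f);
    *nsamples = decode_s16le(bytes, got, samples);
    free(bytes);
    return samples;
}

typedef struct {
    int peak;        /* largest absolute sample value */
    double mean_abs; /* average absolute amplitude, a rough level measure */
    size_t clipped;  /* samples at full scale */
    size_t jumps;    /* steps between neighbours larger than jump_thresh */
} raw_stats;

raw_stats analyze_samples(const short *s, size_t n, int jump_thresh)
{
    raw_stats st = {0, 0.0, 0, 0};
    double sum_abs = 0.0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int v = s[i];
        int a = v < 0 ? -v : v;
        if (a > st.peak)
            st.peak = a;
        if (v == 32767 || v == -32768)
            st.clipped++;
        sum_abs += a;
        if (i > 0) {
            int d = v - s[i - 1];
            if (d < 0)
                d = -d;
            if (d > jump_thresh)
                st.jumps++;
        }
    }
    st.mean_abs = n > 0 ? sum_abs / (double)n : 0.0;
    return st;
}
```

    A small main would call load_raw_s16le on one file from the -rawlogdir directory,
    pass the samples to analyze_samples and print the fields. The jump threshold is an
    arbitrary starting point (something like 10000); a mean_abs close to zero suggests
    the mic delivers almost no signal, while many large jumps in ordinary speech are a
    hint of glitches in the captured stream rather than proof of them.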

    To fix it we'll need access to a Win32 machine, and that will take some time.
    The root of the problem is in the ad_read function in
    sphinxbase/lib/libsphinxad/rec_win32.c.
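    Without reproducing rec_win32.c itself, the bookkeeping that a reader over a ring
    of capture buffers has to get right can be sketched portably. The sizes below are
    taken from the log line "Allocating 32 buffers of 2500 samples each"; ring_t,
    ring_init, ring_record and ring_read are hypothetical names, not sphinxbase
    functions. In a waveIn-based recorder, the filled[] entries would typically
    correspond to the WHDR_DONE flag and dwBytesRecorded field of each WAVEHDR, and
    releasing a buffer would mean handing it back with waveInAddBuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, matching "32 buffers of 2500 samples each" in the log. */
#define N_BUF 32
#define BUF_SAMPLES 2500

typedef struct {
    short data[N_BUF][BUF_SAMPLES];
    size_t filled[N_BUF]; /* samples recorded in each buffer; 0 = not ready */
    int head;             /* oldest buffer the reader has not finished */
    size_t offset;        /* samples already consumed from data[head] */
    int tail;             /* next buffer the recorder will fill */
} ring_t;

void ring_init(ring_t *r)
{
    memset(r, 0, sizeof *r);
}

/* Simulated recorder: fills the next buffer in order.
   Returns -1 on overrun, i.e. when the reader has not released that buffer yet. */
int ring_record(ring_t *r, const short *samples, size_t n)
{
    assert(n > 0 && n <= BUF_SAMPLES);
    if (r->filled[r->tail] != 0)
        return -1;
    memcpy(r->data[r->tail], samples, n * sizeof *samples);
    r->filled[r->tail] = n;
    r->tail = (r->tail + 1) % N_BUF;
    return 0;
}

/* Reader: copies up to max samples strictly in recording order. A partly
   consumed buffer is remembered through r->offset, and a buffer is released
   only after every sample in it has been copied out. */
size_t ring_read(ring_t *r, short *out, size_t max)
{
    size_t total = 0;
    while (total < max && r->filled[r->head] > 0) {
        size_t avail = r->filled[r->head] - r->offset;
        size_t n = avail < max - total ? avail : max - total;
        memcpy(out + total, r->data[r->head] + r->offset, n * sizeof *out);
        total += n;
        r->offset += n;
        if (r->offset == r->filled[r->head]) {
            r->filled[r->head] = 0; /* give the buffer back to the recorder */
            r->offset = 0;
            r->head = (r->head + 1) % N_BUF;
        }
    }
    return total;
}
```

    Two details in such a reader could plausibly produce repeated samples or clicks:
    restarting at the beginning of a partly read buffer on the next call would deliver
    some samples twice, and releasing a buffer before it is fully copied would let the
    recorder overwrite it mid-read. Whether either actually happens in rec_win32.c is
    not established here.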

     
  • newhost2

    newhost2 - 2010-05-12

    Thanks for the quick reply.

    I dumped the audio as described and also get corrupted files with repeated
    samples and clicks. It sounds as if the buffer switching might be buggy.
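    Repeated samples of this kind can also be located numerically. The sketch below
    assumes the dump has already been loaded into an array of 16-bit samples and looks
    for a run of samples that is immediately followed by an exact copy of itself,
    which is what a reader delivering the same capture buffer twice would produce.
    The block length to try is an assumption: 2500 matches "Allocating 32 buffers of
    2500 samples each" in the log above, but the repeated unit could just as well be
    the size of each read request. count_repeated_blocks is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if all samples in s[0..len) have the same value (e.g. digital silence). */
static int is_constant(const short *s, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if (s[i] != s[0])
            return 0;
    return 1;
}

/* Counts places where a block of `block` samples is immediately followed by an
   exact copy of itself. After a hit the scan jumps past the copy, so one
   duplicated buffer is counted once. *first_at (if non-NULL) receives the sample
   index where the first copy starts, or (size_t)-1 if none was found. */
size_t count_repeated_blocks(const short *s, size_t n, size_t block, size_t *first_at)
{
    size_t hits = 0;
    assert(block > 0);
    if (first_at != NULL)
        *first_at = (size_t)-1;
    for (size_t i = block; i + block <= n;) {
        if (memcmp(s + i - block, s + i, block * sizeof *s) == 0
            && !is_constant(s + i, block)) {
            if (hits == 0 && first_at != NULL)
                *first_at = i;
            hits++;
            i += block;
        } else {
            i++;
        }
    }
    return hits;
}
```

    Constant runs such as digital silence are skipped, but strictly periodic signals
    (for example a test tone whose period divides the block length) can still match,
    so a hit marks a position worth listening to rather than proof of a buffering bug.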

    Is anybody working on this? Can I help track down the problem? I have some C
    knowledge but am far from being a Windows API expert.

    Michael

     
  • Nickolay V. Shmyrev

    Is anybody working on this?

    No.

    Can I help track down the problem?

    Better yet, submit a patch that fixes it.

    I have some C knowledge but am far from being a Windows API expert.

    So are we.

     
