Is it working?

2016-06-11
  • Effy Stonem

    Effy Stonem - 2016-06-11

    Hi everyone,

    I'm new to CMUSphinx. So far I have installed Debian Jessie on a VM and installed and configured all the required libraries, including getting my USB and internal microphones to work with PulseAudio.

    I have tested arecord -f S16_LE -r 16000 /tmp/sample.wav and aplay /tmp/sample.wav and all is well :)

    (/tmp/sample.wav is a 5 second recording with a single utterance of the word "Testing").
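Before handing a recording to the recognizer, it can help to confirm that the file header really says what the arecord command asked for. The sketch below uses only Python's standard wave module. It assumes the model wants 16 kHz, mono, 16-bit audio (the 16000 figure matches the -samprate value in the configuration dump in this post; mono and 16-bit are assumptions here). The helper names describe_wav and format_problems are hypothetical, not part of any Sphinx tool.

```python
import wave


def describe_wav(path):
    """Read the basic format fields from a WAV file header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": wav.getnframes() / rate,
        }


def format_problems(info, expected_rate=16000):
    """Return a list of human-readable mismatches; an empty list means OK.

    Assumption: the acoustic model expects mono, 16-bit samples at
    expected_rate Hz.
    """
    problems = []
    if info["channels"] != 1:
        problems.append("expected 1 channel, found %d" % info["channels"])
    if info["sample_rate"] != expected_rate:
        problems.append(
            "expected %d Hz, found %d Hz" % (expected_rate, info["sample_rate"])
        )
    if info["sample_width_bytes"] != 2:
        problems.append(
            "expected 16-bit samples, found %d-bit"
            % (info["sample_width_bytes"] * 8)
        )
    return problems
```

Calling format_problems(describe_wav("/tmp/sample.wav")) should return an empty list when the header matches those assumptions. This only checks the format, not the audio quality or loudness.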

    I'm running the following command:

    pocketsphinx_continuous -infile /tmp/sample.wav

    and getting the following output:

    emmanuel@raspberrypi:~$ pocketsphinx_continuous -infile /tmp/sample.wav
    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/en-us/en-us/feat.params
    Current configuration:
    [NAME]          [DEFLT]     [VALUE]
    -agc            none        none
    -agcthresh      2.0     2.000000e+00
    -allphone               
    -allphone_ci        no      no
    -alpha          0.97        9.700000e-01
    -ascale         20.0        2.000000e+01
    -aw         1       1
    -backtrace      no      no
    -beam           1e-48       1.000000e-48
    -bestpath       yes     yes
    -bestpathlw     9.5     9.500000e+00
    -ceplen         13      13
    -cmn            current     current
    -cmninit        8.0     40,3,-1
    -compallsen     no      no
    -debug                  0
    -dict                   /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
    -dictcase       no      no
    -dither         no      no
    -doublebw       no      no
    -ds         1       1
    -fdict                  
    -feat           1s_c_d_dd   1s_c_d_dd
    -featparams             
    -fillprob       1e-8        1.000000e-08
    -frate          100     100
    -fsg                    
    -fsgusealtpron      yes     yes
    -fsgusefiller       yes     yes
    -fwdflat        yes     yes
    -fwdflatbeam        1e-64       1.000000e-64
    -fwdflatefwid       4       4
    -fwdflatlw      8.5     8.500000e+00
    -fwdflatsfwin       25      25
    -fwdflatwbeam       7e-29       7.000000e-29
    -fwdtree        yes     yes
    -hmm                    /usr/local/share/pocketsphinx/model/en-us/en-us
    -input_endian       little      little
    -jsgf                   
    -keyphrase              
    -kws                    
    -kws_delay      10      10
    -kws_plp        1e-1        1.000000e-01
    -kws_threshold      1       1.000000e+00
    -latsize        5000        5000
    -lda                    
    -ldadim         0       0
    -lifter         0       22
    -lm                 /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin
    -lmctl                  
    -lmname                 
    -logbase        1.0001      1.000100e+00
    -logfn                  
    -logspec        no      no
    -lowerf         133.33334   1.300000e+02
    -lpbeam         1e-40       1.000000e-40
    -lponlybeam     7e-29       7.000000e-29
    -lw         6.5     6.500000e+00
    -maxhmmpf       30000       30000
    -maxwpf         -1      -1
    -mdef                   
    -mean                   
    -mfclogdir              
    -min_endfr      0       0
    -mixw                   
    -mixwfloor      0.0000001   1.000000e-07
    -mllr                   
    -mmap           yes     yes
    -ncep           13      13
    -nfft           512     512
    -nfilt          40      25
    -nwpen          1.0     1.000000e+00
    -pbeam          1e-48       1.000000e-48
    -pip            1.0     1.000000e+00
    -pl_beam        1e-10       1.000000e-10
    -pl_pbeam       1e-10       1.000000e-10
    -pl_pip         1.0     1.000000e+00
    -pl_weight      3.0     3.000000e+00
    -pl_window      5       5
    -rawlogdir              
    -remove_dc      no      no
    -remove_noise       yes     yes
    -remove_silence     yes     yes
    -round_filters      yes     yes
    -samprate       16000       1.600000e+04
    -seed           -1      -1
    -sendump                
    -senlogdir              
    -senmgau                
    -silprob        0.005       5.000000e-03
    -smoothspec     no      no
    -svspec                 0-12/13-25/26-38
    -tmat                   
    -tmatfloor      0.0001      1.000000e-04
    -topn           4       4
    -topn_beam      0       0
    -toprule                
    -transform      legacy      dct
    -unit_area      yes     yes
    -upperf         6855.4976   6.800000e+03
    -uw         1.0     1.000000e+00
    -vad_postspeech     50      50
    -vad_prespeech      20      20
    -vad_startspeech    10      10
    -vad_threshold      2.0     2.000000e+00
    -var                    
    -varfloor       0.0001      1.000000e-04
    -varnorm        no      no
    -verbose        no      no
    -warp_params                
    -warp_type      inverse_linear  inverse_linear
    -wbeam          7e-29       7.000000e-29
    -wip            0.65        6.500000e-01
    -wlen           0.025625    2.562500e-02
    
    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
    INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
    INFO: ms_gauden.c(292): 42 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
    INFO: ms_gauden.c(292): 42 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(354): 222 variance values floored
    INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
    INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
    INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
    INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
    INFO: ptm_mgau.c(835): Maximum top-N: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 138623 * 20 bytes (2707 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
    INFO: dict.c(213): Allocated 1014 KiB for strings, 1677 KiB for phones
    INFO: dict.c(336): 134522 words read
    INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 5 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
    INFO: ngram_search_fwdtree.c(99): 790 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 57 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 57 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 152144
    INFO: ngram_search_fwdtree.c(339): after: 722 root, 152016 non-root channels, 53 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Jun 11 2016, AT: 16:08:35
    
    INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to   < 43.21 -6.39 -2.18  0.60 -3.39 -1.73 -6.46 -2.40 -3.69  0.47 -1.74  2.08  2.01 >
    INFO: ngram_search_fwdtree.c(1553):      800 words recognized (9/fr)
    INFO: ngram_search_fwdtree.c(1555):   331725 senones evaluated (3727/fr)
    INFO: ngram_search_fwdtree.c(1559):  1505333 channels searched (16913/fr), 61370 1st, 23826 last
    INFO: ngram_search_fwdtree.c(1562):     2306 words for which last channels evaluated (25/fr)
    INFO: ngram_search_fwdtree.c(1564):    70622 candidate words for entering last phone (793/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 1.12 CPU 1.254 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 1.12 wall 1.262 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 12 words
    INFO: ngram_search_fwdflat.c(948):      533 words recognized (6/fr)
    INFO: ngram_search_fwdflat.c(950):    17327 senones evaluated (195/fr)
    INFO: ngram_search_fwdflat.c(952):    10430 channels searched (117/fr)
    INFO: ngram_search_fwdflat.c(954):      946 words searched (10/fr)
    INFO: ngram_search_fwdflat.c(957):      354 word transitions (3/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.03 CPU 0.031 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.03 wall 0.032 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.5
    INFO: ngram_search.c(1279): Eliminated 1 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 113 nodes, 1 links
    INFO: ps_lattice.c(1380): Bestpath score: -1236
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:5:87) = -152743
    INFO: ps_lattice.c(1441): Joint P(O,S) = -152743 P(S|O) = 0
    INFO: ngram_search.c(875): bestpath -0.00 CPU -0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.003 xRT
    
    INFO: ngram_search.c(467): Resized score stack to 200000 entries
    INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
    INFO: cmn_prior.c(131): cmn_prior_update: from < 43.21 -6.39 -2.18  0.60 -3.39 -1.73 -6.46 -2.40 -3.69  0.47 -1.74  2.08  2.01 >
    INFO: cmn_prior.c(149): cmn_prior_update: to   < 54.07 -5.74  1.83  1.31 -3.88 -9.85 -3.91 -2.37 -3.00 -0.65 -2.71  0.12  1.26 >
    INFO: ngram_search_fwdtree.c(1553):     6091 words recognized (48/fr)
    INFO: ngram_search_fwdtree.c(1555):   452706 senones evaluated (3593/fr)
    INFO: ngram_search_fwdtree.c(1559):  2491102 channels searched (19770/fr), 79274 1st, 215731 last
    INFO: ngram_search_fwdtree.c(1562):    11354 words for which last channels evaluated (90/fr)
    INFO: ngram_search_fwdtree.c(1564):   167663 candidate words for entering last phone (1330/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 1.73 CPU 1.371 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 1.73 wall 1.374 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 278 words
    INFO: ngram_search_fwdflat.c(948):     4804 words recognized (38/fr)
    INFO: ngram_search_fwdflat.c(950):   177496 senones evaluated (1409/fr)
    INFO: ngram_search_fwdflat.c(952):   422283 channels searched (3351/fr)
    INFO: ngram_search_fwdflat.c(954):    22832 words searched (181/fr)
    INFO: ngram_search_fwdflat.c(957):    14656 word transitions (116/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.24 CPU 0.187 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.24 wall 0.189 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.98
    INFO: ngram_search.c(1279): Eliminated 2 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 625 nodes, 6042 links
    INFO: ps_lattice.c(1380): Bestpath score: -4690
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:98:124) = -275897
    INFO: ps_lattice.c(1441): Joint P(O,S) = -343150 P(S|O) = -67253
    INFO: ngram_search.c(875): bestpath 0.03 CPU 0.022 xRT
    INFO: ngram_search.c(878): bestpath 0.03 wall 0.023 xRT
    dang
    INFO: cmn_prior.c(131): cmn_prior_update: from < 54.07 -5.74  1.83  1.31 -3.88 -9.85 -3.91 -2.37 -3.00 -0.65 -2.71  0.12  1.26 >
    INFO: cmn_prior.c(149): cmn_prior_update: to   < 54.07 -5.74  1.83  1.31 -3.88 -9.85 -3.91 -2.37 -3.00 -0.65 -2.71  0.12  1.26 >
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 2.85 CPU 1.339 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 2.87 wall 1.346 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.26 CPU 0.124 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.27 wall 0.125 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.03 CPU 0.013 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.03 wall 0.015 xRT
    

    Honestly, I can't even tell where in the output it's reporting the speech recognition result, or whether it worked. Is it the word "dang" that's being reported? (which would be incorrect).

    I have followed the official tutorials and a dozen more on Google but this is as far as I can get.

    Many thanks

     
    • Nickolay V. Shmyrev

      Is it the word "dang" that's being reported? (which would be incorrect).

      Yes

      To get help on accuracy issues you need to provide the file sample.wav
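In the output pasted above, the recognized text is the bare line ("dang") sitting between the INFO: log lines. A minimal sketch for pulling such lines out of captured output follows. It assumes that log lines start with a level prefix such as INFO: and that configuration rows start with a dash, as they do in this thread; the prefix list beyond INFO: and the function name extract_hypotheses are assumptions for illustration. The -logfn option shown in the configuration dump may be another way to keep log messages out of the way, but that is not tested here.

```python
# Level prefixes treated as log noise. Only "INFO:" appears in this thread;
# the others are assumed to follow the same pattern.
LOG_PREFIXES = ("INFO:", "WARNING:", "WARN:", "ERROR:", "FATAL:", "DEBUG:")


def extract_hypotheses(output_text):
    """Return the non-log lines of captured recognizer output, in order."""
    hypotheses = []
    for raw_line in output_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank separators (and empty hypotheses) are skipped
        if line.startswith(LOG_PREFIXES):
            continue  # log message
        if line.startswith("-") or line.startswith("[NAME]"):
            continue  # row or header of the configuration table
        if line == "Current configuration:":
            continue
        hypotheses.append(line)
    return hypotheses
```

Because blank lines are skipped, an utterance that produced an empty hypothesis will not show up in the result. The shell prompt line from a pasted terminal session is not filtered, so the function is meant for output captured directly from the program.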

       
  • Effy Stonem

    Effy Stonem - 2016-06-11

    I apologise, I've attached it. Thank you

     
    • Nickolay V. Shmyrev

      Pocketsphinx is not very good at recognizing short, loud phrases. If you repeat the word several times, the other instances will be recognized correctly. For example, you can concatenate two or three copies of the same file and try again.
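One way to try the concatenation suggestion without extra tools is Python's standard wave module. The sketch below writes several copies of a recording into a new file, with a short stretch of silence between copies. The function name repeat_wav, the output path, and the half-second gap are all hypothetical choices, and the silence is written as zero bytes, which is only correct for signed 16-bit PCM such as the S16_LE format used earlier in this thread.

```python
import wave


def repeat_wav(src_path, dst_path, copies=3, gap_seconds=0.5):
    """Write `copies` back-to-back copies of src_path into dst_path.

    Assumption: the source is signed 16-bit PCM, so zero bytes are silence.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")

    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    # One frame holds one sample for every channel.
    bytes_per_frame = params.sampwidth * params.nchannels
    gap = b"\x00" * (int(gap_seconds * params.framerate) * bytes_per_frame)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The gap is inserted between copies only, not at the ends.
        dst.writeframes(gap.join([frames] * copies))
    return dst_path
```

For example, repeat_wav("/tmp/sample.wav", "/tmp/sample_x3.wav") would produce a file that can be passed to pocketsphinx_continuous with -infile in the same way as the original.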

       
  • Effy Stonem

    Effy Stonem - 2016-06-11

    I understand.

    My actual use case is to listen continuously and record and transcribe speech to text, for conversations lasting anywhere from thirty minutes to two hours.

    Is Pocketsphinx perhaps not the best technology for this case?

    Thank you Nickolay

     
    • Nickolay V. Shmyrev

      Something like

      https://github.com/srvk/eesen-transcriber

      should work better

       
  • Effy Stonem

    Effy Stonem - 2016-06-11

    I'll take a look.

    Many thanks for your help

     
