Weird behaviour using Arch Linux

Help
2016-05-17
  • Telis Papageo

    Telis Papageo - 2016-05-17

    Hi, I am new to CMU Sphinx. I followed the website's instructions exactly to install and configure pocketsphinx with sphinxbase, but it behaves strangely when I use it. If I give it a file, the result is not accurate at all, and speaking into the mic gives the same result. Also, the output arrives in an odd way: it starts and stops listening all the time, like a loop. Here is an example: when I give it a wav file that says something simple like "read my lips", the output is something totally different.

    This wav says: "My biggest job is to prevent the enemy from hitting us again".

    pocketsphinx_continuous -infile converted.wav

    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/en-us/en-us/feat.params
    Current configuration:
    [NAME]          [DEFLT]     [VALUE]
    -agc            none        none
    -agcthresh      2.0     2.000000e+00
    -allphone               
    -allphone_ci        no      no
    -alpha          0.97        9.700000e-01
    -ascale         20.0        2.000000e+01
    -aw         1       1
    -backtrace      no      no
    -beam           1e-48       1.000000e-48
    -bestpath       yes     yes
    -bestpathlw     9.5     9.500000e+00
    -ceplen         13      13
    -cmn            current     current
    -cmninit        8.0     40,3,-1
    -compallsen     no      no
    -debug                  0
    -dict                   /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
    -dictcase       no      no
    -dither         no      no
    -doublebw       no      no
    -ds         1       1
    -fdict                  
    -feat           1s_c_d_dd   1s_c_d_dd
    -featparams             
    -fillprob       1e-8        1.000000e-08
    -frate          100     100
    -fsg                    
    -fsgusealtpron      yes     yes
    -fsgusefiller       yes     yes
    -fwdflat        yes     yes
    -fwdflatbeam        1e-64       1.000000e-64
    -fwdflatefwid       4       4
    -fwdflatlw      8.5     8.500000e+00
    -fwdflatsfwin       25      25
    -fwdflatwbeam       7e-29       7.000000e-29
    -fwdtree        yes     yes
    -hmm                    /usr/local/share/pocketsphinx/model/en-us/en-us
    -input_endian       little      little
    -jsgf                   
    -keyphrase              
    -kws                    
    -kws_delay      10      10
    -kws_plp        1e-1        1.000000e-01
    -kws_threshold      1       1.000000e+00
    -latsize        5000        5000
    -lda                    
    -ldadim         0       0
    -lifter         0       22
    -lm                 /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin
    -lmctl                  
    -lmname                 
    -logbase        1.0001      1.000100e+00
    -logfn                  
    -logspec        no      no
    -lowerf         133.33334   1.300000e+02
    -lpbeam         1e-40       1.000000e-40
    -lponlybeam     7e-29       7.000000e-29
    -lw         6.5     6.500000e+00
    -maxhmmpf       30000       30000
    -maxwpf         -1      -1
    -mdef                   
    -mean                   
    -mfclogdir              
    -min_endfr      0       0
    -mixw                   
    -mixwfloor      0.0000001   1.000000e-07
    -mllr                   
    -mmap           yes     yes
    -ncep           13      13
    -nfft           512     512
    -nfilt          40      20
    -nwpen          1.0     1.000000e+00
    -pbeam          1e-48       1.000000e-48
    -pip            1.0     1.000000e+00
    -pl_beam        1e-10       1.000000e-10
    -pl_pbeam       1e-10       1.000000e-10
    -pl_pip         1.0     1.000000e+00
    -pl_weight      3.0     3.000000e+00
    -pl_window      5       5
    -rawlogdir              
    -remove_dc      no      no
    -remove_noise       yes     yes
    -remove_silence     yes     yes
    -round_filters      yes     yes
    -samprate       16000       1.600000e+04
    -seed           -1      -1
    -sendump                
    -senlogdir              
    -senmgau                
    -silprob        0.005       5.000000e-03
    -smoothspec     no      no
    -svspec                 0-12/13-25/26-38
    -tmat                   
    -tmatfloor      0.0001      1.000000e-04
    -topn           4       4
    -topn_beam      0       0
    -toprule                
    -transform      legacy      dct
    -unit_area      yes     yes
    -upperf         6855.4976   3.700000e+03
    -uw         1.0     1.000000e+00
    -vad_postspeech     50      50
    -vad_prespeech      20      20
    -vad_startspeech    10      10
    -vad_threshold      2.0     2.000000e+00
    -var                    
    -varfloor       0.0001      1.000000e-04
    -varnorm        no      no
    -verbose        no      no
    -warp_params                
    -warp_type      inverse_linear  inverse_linear
    -wbeam          7e-29       7.000000e-29
    -wip            0.65        6.500000e-01
    -wlen           0.025625    2.562500e-02
    
    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
    INFO: bin_mdef.c(181): Allocating 142108 * 8 bytes (1110 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
    INFO: ms_gauden.c(292): 42 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
    INFO: ms_gauden.c(292): 42 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(354): 98 variance values floored
    INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
    INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
    INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
    INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
    INFO: ptm_mgau.c(835): Maximum top-N: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 138623 * 32 bytes (4331 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
    INFO: dict.c(213): Allocated 1014 KiB for strings, 1677 KiB for phones
    INFO: dict.c(336): 134522 words read
    INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 5 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
    INFO: ngram_search_fwdtree.c(99): 790 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 57 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 57 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 151818
    INFO: ngram_search_fwdtree.c(339): after: 722 root, 151690 non-root channels, 53 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: May 16 2016, AT: 09:56:53
    
    INFO: ngram_search.c(467): Resized score stack to 200000 entries
    INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
    INFO: ngram_search.c(467): Resized score stack to 400000 entries
    INFO: ngram_search.c(459): Resized backpointer table to 20000 entries
    INFO: ngram_search.c(467): Resized score stack to 800000 entries
    INFO: ngram_search.c(459): Resized backpointer table to 40000 entries
    INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to   < 72.28 -9.49 -15.81  0.30 -10.39  1.27  1.87 -2.02  0.89  4.83  4.42 -0.95  1.62 >
    INFO: ngram_search_fwdtree.c(1553):    20900 words recognized (30/fr)
    INFO: ngram_search_fwdtree.c(1555):  2582646 senones evaluated (3705/fr)
    INFO: ngram_search_fwdtree.c(1559): 12673442 channels searched (18182/fr), 452124 1st, 768272 last
    INFO: ngram_search_fwdtree.c(1562):    44446 words for which last channels evaluated (63/fr)
    INFO: ngram_search_fwdtree.c(1564):   785263 candidate words for entering last phone (1126/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 6.24 CPU 0.896 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 6.25 wall 0.896 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 682 words
    INFO: ngram_search_fwdflat.c(948):    14076 words recognized (20/fr)
    INFO: ngram_search_fwdflat.c(950):   737931 senones evaluated (1059/fr)
    INFO: ngram_search_fwdflat.c(952):  1646265 channels searched (2361/fr)
    INFO: ngram_search_fwdflat.c(954):    88026 words searched (126/fr)
    INFO: ngram_search_fwdflat.c(957):    50071 word transitions (71/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.77 CPU 0.110 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.77 wall 0.110 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.385
    INFO: ngram_search.c(1279): Eliminated 1 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 2250 nodes, 30894 links
    INFO: ps_lattice.c(1380): Bestpath score: -19782
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:385:695) = -1353858
    INFO: ps_lattice.c(1441): Joint P(O,S) = -1489706 P(S|O) = -135848
    INFO: ngram_search.c(875): bestpath 0.30 CPU 0.043 xRT
    INFO: ngram_search.c(878): bestpath 0.30 wall 0.043 xRT
    not a jar that whit yeah
    INFO: cmn_prior.c(131): cmn_prior_update: from < 72.28 -9.49 -15.81  0.30 -10.39  1.27  1.87 -2.02  0.89  4.83  4.42 -0.95  1.62 >
    INFO: cmn_prior.c(149): cmn_prior_update: to   < 70.99 -10.14 -17.01 -2.08 -9.84  2.20  2.98 -2.16  0.62  4.90  3.10 -0.91  0.22 >
    INFO: ngram_search_fwdtree.c(1553):     2072 words recognized (25/fr)
    INFO: ngram_search_fwdtree.c(1555):   228144 senones evaluated (2782/fr)
    INFO: ngram_search_fwdtree.c(1559):   728505 channels searched (8884/fr), 40708 1st, 71827 last
    INFO: ngram_search_fwdtree.c(1562):     4184 words for which last channels evaluated (51/fr)
    INFO: ngram_search_fwdtree.c(1564):    39652 candidate words for entering last phone (483/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.37 CPU 0.447 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.36 wall 0.445 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 87 words
    INFO: ngram_search_fwdflat.c(948):     1361 words recognized (17/fr)
    INFO: ngram_search_fwdflat.c(950):    72801 senones evaluated (888/fr)
    INFO: ngram_search_fwdflat.c(952):   123569 channels searched (1506/fr)
    INFO: ngram_search_fwdflat.c(954):     6416 words searched (78/fr)
    INFO: ngram_search_fwdflat.c(957):     2763 word transitions (33/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.06 CPU 0.073 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.06 wall 0.072 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.53
    INFO: ngram_search.c(1279): Eliminated 1 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 203 nodes, 686 links
    INFO: ps_lattice.c(1380): Bestpath score: -2264
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:53:80) = -157810
    INFO: ps_lattice.c(1441): Joint P(O,S) = -188950 P(S|O) = -31140
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
    whoa
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 6.61 CPU 0.851 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 6.61 wall 0.851 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.83 CPU 0.107 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.83 wall 0.107 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.30 CPU 0.038 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.30 wall 0.038 xRT
    

    Any help appreciated!

     
    • Nickolay V. Shmyrev

      You could provide the file

       
  • Telis Papageo

    Telis Papageo - 2016-05-17

    It is just an example. I randomly found it on wavsource.

     

    Last edit: Telis Papageo 2016-05-17
    • Nickolay V. Shmyrev

      This audio has very poor sound quality: it contains reverberation and clipping noise, and its bandwidth is reduced to 5 kHz. Such samples are not easy to recognize; you would have to build a specialized system for them, or find a way to obtain higher-quality audio.
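
      A quick way to catch some of these problems before decoding is to inspect the file itself. The sketch below is only an illustration, not part of pocketsphinx: it reads a PCM WAV header with Python's standard `wave` module and estimates how many samples sit at full scale, which is a rough sign of clipping. The function names `inspect_wav` and `matches_decoder_input` are made up for this example, and the "16-bit mono" expectation is an assumption; the 16000 Hz rate is taken from the `-samprate` line in the log above. It cannot detect reverberation or reduced bandwidth.

```python
import array
import sys
import wave


def inspect_wav(path):
    """Return basic format facts and a rough clipping estimate for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    report = {
        "channels": channels,
        "sample_width_bytes": sample_width,
        "sample_rate_hz": sample_rate,
        "clipped_fraction": None,  # only computed for 16-bit audio
    }

    if sample_width == 2:
        samples = array.array("h")  # signed 16-bit integers
        samples.frombytes(frames)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        if samples:
            # Samples at the extreme values suggest the signal was clipped.
            clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
            report["clipped_fraction"] = clipped / len(samples)
    return report


def matches_decoder_input(report, expected_rate=16000):
    """Assumed expectation: mono, 16-bit, at the rate shown as -samprate in the log."""
    return (
        report["channels"] == 1
        and report["sample_width_bytes"] == 2
        and report["sample_rate_hz"] == expected_rate
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    info = inspect_wav(sys.argv[1])
    print(info)
    print("format matches assumed decoder input:", matches_decoder_input(info))
```

      A file that reports a different sample rate or more than one channel would need converting first, and a noticeable clipped fraction suggests looking for a cleaner recording.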

       
  • Telis Papageo

    Telis Papageo - 2016-05-17

    OK, I managed to find something better, but it recognized only 2 words out of 10. I have two questions: is this output normal (it looks kind of messy), and how can I improve the quality of the audio I give it?

     
    • Nickolay V. Shmyrev

      It is hard to give you advice on accuracy without seeing the sample in question. If you want an example of high-quality audio, take an audiobook from LibriVox.
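
      On the question of the messy output: in the log above, the recognized text ("not a jar that whit yeah", "whoa") is interleaved with `INFO:` diagnostics. As far as I know, pocketsphinx_continuous writes the diagnostics to standard error and the hypotheses to standard output, so they can be separated. The sketch below assumes that behaviour; `run_and_collect_hypotheses` is a hypothetical helper name, and the command line is the same one used in the first post.

```python
import subprocess
import sys


def run_and_collect_hypotheses(command):
    """Run a command, discard its stderr, and return the non-empty stdout lines."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # assumed to carry the INFO: log lines
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Same invocation as in the original post, with the file given as an argument.
    cmd = ["pocketsphinx_continuous", "-infile", sys.argv[1]]
    for hypothesis in run_and_collect_hypotheses(cmd):
        print(hypothesis)
```

      Each returned line would then correspond to one decoded stretch of speech, which matches the two separate results seen in the log.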

       
