
PocketSphinx Android configuration has no effect

Luc Gilot
2017-06-27
2017-06-28
  • Luc Gilot

    Luc Gilot - 2017-06-27

    Hi
    First, I apologize for my bad English (I am French), and thank you for this great recognition tool.

    I am building a tool (PocketSphinx on Android) to recognize the Latin names of some animals (microscopic worms, about 200 names/words), aiming for maximal accuracy: in this case it is more important to avoid word confusion (one word being understood while another was pronounced) than to maximize the number of times a word is recognized.
    So I tried to build a tool to measure the influence of changing decoder parameters in that context.

    My problem is that changing some of the parameters that are supposed to control pruning (pbeam, wbeam, lpbeam, lponlybeam, maxwpf, maxhmmpf, lw) has absolutely no effect, while others do have an effect, even if sometimes a small one (threshold, beam, ds, topn, pl_window). For example, giving wbeam the following values (1E-80, 1E-60, 1E-48, 1E-30, 1E-20, 1E-18, 1E-15, 1E-12, 1E-10, 0,00000001, 0,00001, 0,01, 1, 100000, 10000000000) does not change the recognition result at all (either when the other parameters are left at their defaults or when beam and pbeam are also modified).

    I tried to change those parameters in a parameter file (using SpeechRecognizerSetup.setupFromFile()) and without a parameter file (using SpeechRecognizerSetup.defaultSetup().setFloat("-pbeam", <value>)). Both give the same results.
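
    For reference, here is a minimal sketch of the second approach (the acoustic model and dictionary file names are the ones from my project, the assets directory path is a placeholder, and the beam values are those listed below):

        import java.io.File;
        import java.io.IOException;

        import edu.cmu.pocketsphinx.SpeechRecognizer;
        import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

        public class RecognizerSetupSketch {
            // Build a recognizer with explicit pruning beams instead of a parameter file.
            // assetsDir is the app's synced assets directory (placeholder).
            static SpeechRecognizer build(File assetsDir) throws IOException {
                return SpeechRecognizerSetup.defaultSetup()
                        .setAcousticModel(new File(assetsDir, "fr-ptm-5.2"))
                        .setDictionary(new File(assetsDir, "nematodes.dict"))
                        .setFloat("-beam", 1e-70)
                        .setFloat("-pbeam", 1e-80)
                        .setFloat("-wbeam", 7e-20)
                        .getRecognizer();
            }
        }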

    The other parameters are:
    Threshold 1,00E-15
    Beam 1,00E-70
    Pbeam 1,00E-80
    Wbeam 7,00E-20
    DS 1
    TopN 4
    LPBeam 1,00E-40
    LPOnlyBeam 7,00E-29
    MaxWPF -1
    MaxHMMPF 29940
    Pl_Window 21
    lw 0

    Am I missing something?

    Thank you

    Luc

     
    • Nickolay V. Shmyrev

      CMUSphinx expects a dot "." for decimal numbers like "7.0e-29"; a French-style comma makes the system drop the rest of the number.

      It is better to tune recognition on desktop, not on Android.

       
  • Luc Gilot

    Luc Gilot - 2017-06-27

    Thank you for your answer.

    I was wrong about the dots: the values shown in my previous message were converted so that the French version of Excel could read them correctly (I pasted the values from that spreadsheet). The real values sent to the decoder use dots, not commas, and they are correctly injected into the decoder (decoder.getConfig().getFloat("-wbeam") returns the value I injected).
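
    For completeness, here is roughly how that check looks (a minimal sketch; the Decoder object is the one used by the recognizer, and the log tag is arbitrary):

        import android.util.Log;

        import edu.cmu.pocketsphinx.Decoder;

        public class ConfigCheckSketch {
            // Read a parameter back from the live decoder configuration to confirm
            // that the injected value was actually stored.
            static void logWbeam(Decoder decoder) {
                double wbeam = decoder.getConfig().getFloat("-wbeam");
                Log.d("Tuning", "-wbeam in decoder config = " + wbeam);
            }
        }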

    I think pruning should behave the same on desktop as on Android, which works fine in the emulator, and I handle Java better than C.

    Regards

    Luc

     

    Last edit: Luc Gilot 2017-06-27
    • Nickolay V. Shmyrev

      Ok, this is exactly why you need to tune the system on desktop first - it enables others to see and reproduce your problems.

      If you need assistance with accuracy, you need to provide a test database with desktop scripts to reproduce your results, as described in http://cmusphinx.github.io/wiki/tutorialtuning

       

      Last edit: Nickolay V. Shmyrev 2017-06-27
  • Luc Gilot

    Luc Gilot - 2017-06-27

    OK, I am trying to follow http://cmusphinx.github.io/wiki/tutorialtuning, which requires first following https://cmusphinx.github.io/wiki/tutoriallm/.

    In that one, I am stuck at the creation of the statistical language model ("ARPA model training with SRILM"). At that step, executing "./ngram-count -kndiscount -interpolate -text nematodesSelection.txt -lm nematodesSelection.lm" results in the error message: "one of required modified KneserNey count-of-counts is zero / error in discount estimator for order 1".
    This occurs whatever the content of the file is. For example, with a sample I found:
    "<doc id="2" url="http://it.wikipedia.org/wiki/Harmonium">
    L'harmonium è uno strumento musicale azionato con una tastiera, detta manuale.
    Sono stati costruiti anche alcuni harmonium con due manuali.
    </doc>"

    My wish is to make it work with a list of this kind (each word having the same probability, and no links between words):
    "stop annuler achromadora acro acrobeles acrolobus actino aglenchus alaimidae alaimus allodorylaimus amphidelus amplimerlinius ..."

    Regards

    Luc

     
    • Nickolay V. Shmyrev

      results in the error message: "one of required modified KneserNey count-of-counts is zero / error in discount estimator for order 1".

      This is not really an error, just an indication that there is not enough training data for that smoothing method. You can also try

        ./ngram-count -cdiscount 0.1 -text nematodesSelection.txt -lm nematodesSelection.lm
      

      stop annuler achromadora acro acrobeles acrolobus actino aglenchus alaimidae alaimus allodorylaimus amphidelus amplimerlinius

      You probably need to list such words one per line, not together.
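
      For example, nematodesSelection.txt would then look like this (one word per line):

        stop
        annuler
        achromadora
        acro
        acrobeles
        ...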

       
  • Luc Gilot

    Luc Gilot - 2017-06-28

    Thank you again for your answer.
    The command you proposed worked well and the .lm file has been created.
    However, I should have used the site http://www.speech.cs.cmu.edu/tools/lmtool-new.html, which is much simpler.
    Luc

     
  • Luc Gilot

    Luc Gilot - 2017-06-28

    Hello again.
    I am now at the step of running the test, which gives 0% success.
    The test data are attached. The WAV files used are recognized with 35% success in the Android app.

    The test commands are:
    "D:\Luc\Lycée ORT\TSII\2 TS\2016_2017\Projets\Elisol\Projet VC\sphinxtrain\bin\Release\x64\pocketsphinx_batch.exe" ^
    -adcin yes ^
    -cepdir wav ^
    -cepext .wav ^
    -ctl test.fileids ^
    -lm nematodes.lm ^
    -dict nematodes.dict ^
    -hmm fr-ptm-5.2 ^
    -hyp test.hyp

    "D:\Programmes\Perl\bin\Perl.exe" ^
    "D:\Luc\Lycée ORT\TSII\2 TS\2016_2017\Projets\Elisol\Projet VC\sphinxtrain\scripts\decode\word_align.pl" ^
    test.transcription test.hyp

    Thank you for your help

    Luc


    Additional data
    word_align.pl outputs
    STOP ANNULER ACRO ACROBELES ALAIMUS ANAPLECTUS APHEL APHELENCHOIDES BOLEO CEPHALO DIDAE DITYL HELICO MELOIDOGYNE MESODORY MONONCHUS PANAGROLAIMUS PARATYLENCHUS PLECTUS PRATYLENCHUS PSILENCHUS R ACTIF SEINURA T I TYLENCHO XIPHI (CalibrageRapide)
    (CalibrageRapide)
    Words: 28 Correct: 0 Errors: 28 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
    Insertions: 0 Deletions: 28 Substitutions: 0
    STOP (stop)
    (stop)
    Words: 1 Correct: 0 Errors: 1 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
    Insertions: 0 Deletions: 1 Substitutions: 0
    ANNULER (annuler)
    (annuler)
    Words: 1 Correct: 0 Errors: 1 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
    Insertions: 0 Deletions: 1 Substitutions: 0
    ACRO (acro)
    *** (acro)
    Words: 1 Correct: 0 Errors: 1 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
    Insertions: 0 Deletions: 1 Substitutions: 0
    TOTAL Words: 31 Correct: 0 Errors: 31
    TOTAL Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
    TOTAL Insertions: 0 Deletions: 31 Substitutions: 0

    While pocketsphinx_batch.exe outputs
    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from fr-ptm-5.2/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+000
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-001
    -ascale 20.0 2.000000e+001
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-048
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+000
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict nematodes.dict
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-008
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-064
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+000
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-029
    -fwdtree yes yes
    -hmm fr-ptm-5.2
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-001
    -kws_threshold 1 1.000000e+000
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 22
    -lm nematodes.lm
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+000
    -logfn
    -logspec no no
    -lowerf 133.33334 1.320000e+002
    -lpbeam 1e-40 1.000000e-040
    -lponlybeam 7e-29 7.000000e-029
    -lw 6.5 6.500000e+000
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-007
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -nwpen 1.0 1.000000e+000
    -pbeam 1e-48 1.000000e-048
    -pip 1.0 1.000000e+000
    -pl_beam 1e-10 1.000000e-010
    -pl_pbeam 1e-10 1.000000e-010
    -pl_pip 1.0 1.000000e+000
    -pl_weight 3.0 3.000000e+000
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+004
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-003
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -tmat
    -tmatfloor 0.0001 1.000000e-004
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+003
    -uw 1.0 1.000000e+000
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+000
    -var
    -varfloor 0.0001 1.000000e-004
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-029
    -wip 0.65 6.500000e-001
    -wlen 0.025625 2.562500e-002

    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: fr-ptm-5.2/mdef
    INFO: bin_mdef.c(181): Allocating 101051 * 8 bytes (789 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: fr-ptm-5.2/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: fr-ptm-5.2/means
    INFO: ms_gauden.c(292): 36 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 128x13
    INFO: ms_gauden.c(294): 128x13
    INFO: ms_gauden.c(294): 128x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: fr-ptm-5.2/variances
    INFO: ms_gauden.c(292): 36 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 128x13
    INFO: ms_gauden.c(294): 128x13
    INFO: ms_gauden.c(294): 128x13
    INFO: ms_gauden.c(354): 65 variance values floored
    INFO: ptm_mgau.c(476): Loading senones from dump file fr-ptm-5.2/sendump
    INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
    INFO: ptm_mgau.c(563): Rows: 128, Columns: 2108
    INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
    INFO: ptm_mgau.c(835): Maximum top-N: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4290 * 32 bytes (134 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: nematodes.dict
    INFO: dict.c(213): Allocated 2 KiB for strings, 3 KiB for phones
    INFO: dict.c(336): 191 words read
    INFO: dict.c(358): Reading filler dictionary: fr-ptm-5.2/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 36^3 * 2 bytes (91 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 31392 bytes (30 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 31392 bytes (30 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(358): Header doesn't match
    INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(192): LM of order 3
    INFO: ngram_model_trie.c(194): #1-grams: 193
    INFO: ngram_model_trie.c(194): #2-grams: 382
    INFO: ngram_model_trie.c(194): #3-grams: 191
    INFO: lm_trie.c(473): Training quantizer
    INFO: lm_trie.c(481): Building LM trie
    INFO: ngram_search_fwdtree.c(99): 79 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128
    ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary
    INFO: ngram_search_fwdtree.c(339): after: 0 root, 0 non-root channels, 3 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: batch.c(729): Decoding 'CalibrageRapide'
    INFO: cmn.c(183): CMN: 43.70 -1.38 -6.20 10.95 -2.01 2.61 -0.97 4.62 1.84 -5.77 -0.86 -1.69 0.51
    INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
    INFO: ngram_search_fwdtree.c(1553): 7302 words recognized (2/fr)
    INFO: ngram_search_fwdtree.c(1555): 10524 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1559): 7515 channels searched (2/fr), 0 1st, 7515 last
    INFO: ngram_search_fwdtree.c(1562): 7515 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 1.70 CPU 0.049 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 1.79 wall 0.051 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search.c(459): Resized backpointer table to 20000 entries
    INFO: ngram_search_fwdflat.c(948): 10515 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 10527 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(952): 10521 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(954): 10521 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(957): 76 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.14 CPU 0.004 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.13 wall 0.004 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .3409
    INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 45 nodes, 81 links
    INFO: ps_lattice.c(1380): Bestpath score: -7350
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:3409:3508) = -359823
    INFO: ps_lattice.c(1441): Joint P(O,S) = -393600 P(S|O) = -33777
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
    INFO: batch.c(761): CalibrageRapide: 35.09 seconds speech, 1.84 seconds CPU, 1.93 seconds wall
    INFO: batch.c(763): CalibrageRapide: 0.05 xRT (CPU), 0.05 xRT (elapsed)
    (CalibrageRapide -7458)
    CalibrageRapide done --------------------------------------
    INFO: batch.c(729): Decoding 'stop'
    INFO: cmn.c(183): CMN: 43.75 -8.46 -10.26 -7.51 -11.70 -0.86 -0.37 7.49 5.61 -2.53 -12.44 -5.48 7.23
    INFO: ngram_search_fwdtree.c(1553): 249 words recognized (3/fr)
    INFO: ngram_search_fwdtree.c(1555): 261 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1559): 255 channels searched (2/fr), 0 1st, 255 last
    INFO: ngram_search_fwdtree.c(1562): 255 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.03 CPU 0.036 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.06 wall 0.066 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(948): 249 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 261 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(952): 255 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(954): 255 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(957): 61 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.00 wall 0.005 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .9
    INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 5 nodes, 3 links
    INFO: ps_lattice.c(1380): Bestpath score: -330
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:9:86) = -33243
    INFO: ps_lattice.c(1441): Joint P(O,S) = -37103 P(S|O) = -3860
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.01 wall 0.007 xRT
    INFO: batch.c(761): stop: 0.87 seconds speech, 0.03 seconds CPU, 0.10 seconds wall
    INFO: batch.c(763): stop: 0.04 xRT (CPU), 0.11 xRT (elapsed)
    (stop -495)
    stop done --------------------------------------
    INFO: batch.c(729): Decoding 'annuler'
    INFO: cmn.c(183): CMN: 43.79 -2.53 -6.35 14.13 3.54 5.48 -0.25 4.77 9.97 -6.77 -4.51 -3.32 0.25
    INFO: ngram_search_fwdtree.c(1553): 330 words recognized (3/fr)
    INFO: ngram_search_fwdtree.c(1555): 342 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1559): 336 channels searched (2/fr), 0 1st, 336 last
    INFO: ngram_search_fwdtree.c(1562): 336 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.05 CPU 0.041 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.07 wall 0.057 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(948): 330 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 342 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(952): 336 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(954): 336 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(957): 61 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.01 wall 0.005 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .9
    INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 7 nodes, 3 links
    INFO: ps_lattice.c(1380): Bestpath score: -324
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:9:113) = -29165
    INFO: ps_lattice.c(1441): Joint P(O,S) = -32801 P(S|O) = -3636
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.01 wall 0.006 xRT
    INFO: batch.c(761): annuler: 1.14 seconds speech, 0.05 seconds CPU, 0.09 seconds wall
    INFO: batch.c(763): annuler: 0.04 xRT (CPU), 0.08 xRT (elapsed)
    (annuler -411)
    annuler done --------------------------------------
    INFO: batch.c(729): Decoding 'acro'
    INFO: cmn.c(183): CMN: 46.97 4.95 -1.08 -1.16 -6.24 -8.49 -3.10 2.97 1.11 -10.05 -5.33 -8.92 3.29
    INFO: ngram_search_fwdtree.c(1553): 225 words recognized (3/fr)
    INFO: ngram_search_fwdtree.c(1555): 237 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1559): 231 channels searched (2/fr), 0 1st, 231 last
    INFO: ngram_search_fwdtree.c(1562): 231 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.03 CPU 0.039 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.04 wall 0.055 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(948): 225 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 237 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(952): 231 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(954): 231 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(957): 65 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.00 wall 0.005 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .64
    INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 11 nodes, 11 links
    INFO: ps_lattice.c(1380): Bestpath score: -328
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:64:78) = -29463
    INFO: ps_lattice.c(1441): Joint P(O,S) = -34952 P(S|O) = -5489
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.01 wall 0.008 xRT
    INFO: batch.c(761): acro: 0.79 seconds speech, 0.03 seconds CPU, 0.07 seconds wall
    INFO: batch.c(763): acro: 0.04 xRT (CPU), 0.09 xRT (elapsed)
    (acro -453)
    acro done --------------------------------------
    INFO: batch.c(778): TOTAL 37.89 seconds speech, 1.95 seconds CPU, 2.19 seconds wall
    INFO: batch.c(780): AVERAGE 0.05 xRT (CPU), 0.06 xRT (elapsed)
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 1.81 CPU 0.048 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 1.95 wall 0.052 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.14 CPU 0.004 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.14 wall 0.004 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.02 wall 0.001 xRT

     
    • Nickolay V. Shmyrev

      Your LM is uppercase and your dictionary is lowercase. The tutorial clearly says that the online LM tool is only for US English.

       
  • Luc Gilot

    Luc Gilot - 2017-06-28

    OK Thank you
    Luc

     
  • Luc Gilot

    Luc Gilot - 2017-08-08

    Hi, it's me again.

    There has been some progress:
    the .lm file is correctly generated;
    pocketsphinx_batch.exe used in combination with word_align.pl (as described in https://cmusphinx.github.io/wiki/tutorialtuning/) shows good accuracy: about 81%.

    However, accuracy is poor in the Android app: about 30%. So I tried to use in this app the nematodes.lm file that works with pocketsphinx_batch.exe.
    To do so:
    I added the files nematodes.lm and nematodes.lm.bin (maybe not necessary, but it makes no difference in the result) on the device;
    I added the command MySpeechRecognizerSetup.setString("-lm", <PathForTheLMFile>) (a minimal sketch of this setup is at the end of this message);
    I verified that a wrong value for <PathForTheLMFile> leads to a specific error.
    When <PathForTheLMFile> is correct, the (partial) output is:
    [...]
    I/cmusphinx: INFO: dict2pid.c(196): Allocated 15696 bytes (15 KiB) for single-phone word triphones
    I/cmusphinx: INFO: ngram_model_trie.c(399): Trying to read LM in trie binary format
    I/cmusphinx: INFO: ngram_model_trie.c(410): Header doesn't match
    I/cmusphinx: INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
    E/cmusphinx: ERROR: "ngram_model_trie.c", line 103: Bad ngram count
    I/cmusphinx: INFO: ngram_model_trie.c(489): Trying to read LM in DMP format
    E/cmusphinx: ERROR: "ngram_model_trie.c", line 500: Wrong magic header size number a5c6461: /storage/emulated/0/Android/data/net.Limoog.NemaPhone/files/sync/nematodes.lm is not a dump file
    and output stops here

    while using the same files with pocketsphinx_batch.exe, the output is :
    [...]
    INFO: dict2pid.c(196): Allocated 31392 bytes (30 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(358): Header doesn't match
    INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(192): LM of order 3
    INFO: ngram_model_trie.c(194): #1-grams: 193
    INFO: ngram_model_trie.c(194): #2-grams: 382
    INFO: ngram_model_trie.c(194): #3-grams: 0
    INFO: lm_trie.c(473): Training quantizer
    INFO: lm_trie.c(481): Building LM trie
    INFO: ngram_search_fwdtree.c(99): 79 unique initial diphones
    [...] (output continues)

    Do you have any explanation for the difference when using the same files in both contexts, and what should I do to use the .lm file inside the Android app?

    Thank you

    Luc

    PS: attached you will find the outputs, feat.params and nematode.lm.
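
    As mentioned above, here is a minimal sketch of how I plug the LM in (the file names are the ones from my project; the sync directory path is a placeholder):

        import java.io.File;
        import java.io.IOException;

        import edu.cmu.pocketsphinx.SpeechRecognizer;
        import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

        public class LmSetupSketch {
            // Build a recognizer that decodes against the ARPA language model
            // copied to the device's synced assets directory.
            static SpeechRecognizer build(File syncDir) throws IOException {
                return SpeechRecognizerSetup.defaultSetup()
                        .setAcousticModel(new File(syncDir, "fr-ptm-5.2"))
                        .setDictionary(new File(syncDir, "nematodes.dict"))
                        .setString("-lm", new File(syncDir, "nematodes.lm").getPath())
                        .getRecognizer();
            }
        }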

     
    • Nickolay V. Shmyrev

      Hi Luc

      Most likely your desktop version is too old and the Android demo is already updated. Please build from GitHub and try again.

      For accuracy, you need to check the logs to see whether recognition runs faster than real time; otherwise it could simply fail to recognize things properly.

       
  • Luc Gilot

    Luc Gilot - 2017-08-08

    Hi Nickolay

    Thank you for the answer, but I am not sure I understand it completely.

    When you say "desktop version" I understand "pocketsphinx_batch.exe", which is the tool that actually works: why do you say it is "too old" (although the files are dated 2016/01, they work fine with the newly created .lm file)?

    When you say to "build from GitHub", I suppose you mean the C libraries compiled for Android (like "libpocketsphinx_jni.so"). I am afraid this will be difficult, as I read in https://cmusphinx.github.io/wiki/tutorialandroid/#building-pocketsphinx-android: "You shouldn’t build it unless you understand what you are doing. Use prebuilt binaries instead."
    The compiled files I currently use ("libpocketsphinx_jni.so") are from July 2015, so the first thing I will try is to rebuild my app based on the files provided in the latest pocketsphinx-android-demo release ("pocketsphinx-android-5prealpha-release.aar").

    I also wonder where I can find the logs produced by the Android app.

    Finally, I don't think the problem can be recognition failing to keep up with real time, because the accuracy is the same whether I speak in real time or use the files saved during that same real-time session.

    Thank you for your help

    Luc

     
    • Nickolay V. Shmyrev

      OK, if you haven't upgraded to using the aar yet, you need to do that first.

       
