Hi
First, I apologize for my bad English (I am French), and thank you for this great recognition tool.
I am building a tool (PocketSphinx on Android) to recognize the Latin names of some animals (microscopic worms, about 200 names/words), aiming for maximal accuracy: in this case, avoiding word confusion (one word being recognized when another was pronounced) matters more than maximizing the number of times a word is recognized.
So I built a tool to measure the influence of changing decoder parameters in that context.
My problem is that changing some of the parameters that are supposed to control pruning (pbeam, wbeam, lpbeam, lponlybeam, maxwpf, maxhmmpf, lw) has absolutely no effect, while others have some (sometimes small) effect (threshold, beam, ds, topn, pl_window). For example, giving wbeam the following values (1E-80, 1E-60, 1E-48, 1E-30, 1E-20, 1E-18, 1E-15, 1E-12, 1E-10, 0,00000001, 0,00001, 0,01, 1, 100000, 10000000000) does not change the recognition result at all (whether the other parameters are left at their defaults or beam and pbeam are also modified).
I tried to change those parameters in a parameter file (using SpeechRecognizerSetup.setupFromFile()) and without a parameter file (using SpeechRecognizerSetup.defaultSetup().setFloat("-pbeam", <value>)). Both give the same results.
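To be concrete, here is a minimal sketch of the second approach (the model and dictionary paths are placeholders for what my app actually copies to the device; the beam values are the ones listed below):

```java
import java.io.File;
import java.io.IOException;

import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

public class BeamTuningSketch {

    // Builds a recognizer with explicit pruning parameters.
    // Directory and file names here are placeholders, not the app's real paths.
    public static SpeechRecognizer buildRecognizer(File syncDir) throws IOException {
        return SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(syncDir, "fr-ptm-5.2"))   // placeholder
                .setDictionary(new File(syncDir, "nematodes.dict"))  // placeholder
                // Pruning-related parameters; values must use "." as decimal separator.
                .setFloat("-beam", 1e-70)
                .setFloat("-pbeam", 1e-80)
                .setFloat("-wbeam", 7e-20)
                .setFloat("-lpbeam", 1e-40)
                .setFloat("-lponlybeam", 7e-29)
                .setFloat("-lw", 0)
                // maxwpf / maxhmmpf are integer parameters and are set with the
                // corresponding integer setter (omitted here).
                // Alternatively, the same parameters can come from a file via
                // SpeechRecognizerSetup.setupFromFile(...).
                .getRecognizer();
    }
}
```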
Other parameters are :
Threshold 1,00E-15
Beam 1,00E-70
Pbeam 1,00E-80
Wbeam 7,00E-20
DS 1
TopN 4
LPBeam 1,00E-40
LPOnlyBeam 7,00E-29
MaxWPF -1
MaxHMMPF 29940
Pl_Window 21
lw 0
Am I missing something ?
Thank you
Luc
CMUSphinx expects a dot "." in decimal numbers, like "7.0e-29"; a French-style comma makes the system drop the rest of the number.
It is better to tune recognition on desktop, not on Android.
Thank you for your answer.
I was wrong about the dots: the values shown in the message above were translated so that the French version of Excel could read them correctly (I pasted the values from that version). The real values sent to the decoder use dots, not commas, and they are correctly injected into the decoder (decoder.getConfig().getFloat("-wbeam") returns the value I injected).
I think pruning should work the same way on desktop as on Android, my app works fine in the emulator, and I am more comfortable with Java than with C.
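For reference, the kind of check I mean looks roughly like this (a standalone sketch using the low-level Decoder/Config bindings; the paths are placeholders and it needs real model files to actually run):

```java
import edu.cmu.pocketsphinx.Config;
import edu.cmu.pocketsphinx.Decoder;

public class CheckWbeamSketch {

    static {
        // Make sure the native library is loaded (the Android wrapper normally does this).
        System.loadLibrary("pocketsphinx_jni");
    }

    public static void main(String[] args) {
        // Inject -wbeam into the configuration, then read it back from the
        // decoder to confirm the value was accepted.
        Config config = Decoder.defaultConfig();
        config.setString("-hmm", "fr-ptm-5.2");       // placeholder acoustic model path
        config.setString("-dict", "nematodes.dict");  // placeholder dictionary path
        config.setFloat("-wbeam", 7e-20);

        Decoder decoder = new Decoder(config);
        System.out.println("-wbeam as seen by the decoder: "
                + decoder.getConfig().getFloat("-wbeam"));
    }
}
```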
Regards
Luc
Last edit: Luc Gilot 2017-06-27
Ok, this is exactly why you need to tune the system on desktop first - it enables others to see and reproduce your problems.
If you need assistance with accuracy, you need to provide a test database with desktop scripts to reproduce your results, as described in http://cmusphinx.github.io/wiki/tutorialtuning
Last edit: Nickolay V. Shmyrev 2017-06-27
OK, I am trying to follow http://cmusphinx.github.io/wiki/tutorialtuning, which requires following https://cmusphinx.github.io/wiki/tutoriallm/ first.
In that tutorial I am stuck at the creation of the statistical language model ("ARPA model training with SRILM"). At that step, executing "./ngram-count -kndiscount -interpolate -text nematodesSelection.txt -lm nematodesSelection.lm" results in the error message: "one of required modified KneserNey count-of-counts is zero / error in discount estimator for order 1".
This occurs regardless of the file content. For example, with a sample I found:
"<doc id="2" url="http://it.wikipedia.org/wiki/Harmonium">
L'harmonium è uno strumento musicale azionato con una tastiera, detta manuale.
Sono stati costruiti anche alcuni harmonium con due manuali.
</doc>"
What I want is to make it work with a list like this (each word having the same probability, with no links between words):
"stop annuler achromadora acro acrobeles acrolobus actino aglenchus alaimidae alaimus allodorylaimus amphidelus amplimerlinius ..."
Regards
Luc
This is not an error, just information about whether your training data is sufficient. You can also try
You probably need to list such words one per line, not together.
Thank you again for your answer
The command you proposed worked well and the lm file has been created.
However, I could simply have used the site http://www.speech.cs.cmu.edu/tools/lmtool-new.html, which is much simpler.
Luc
Hello again.
I am now at the step of running the test, which gives 0% success.
Test data are attached. The WAV files used are recognized with 35% success in the Android app.
Test commands are :
"D:\Luc\Lycée ORT\TSII\2 TS\2016_2017\Projets\Elisol\Projet VC\sphinxtrain\bin\Release\x64\pocketsphinx_batch.exe" ^
-adcin yes ^
-cepdir wav ^
-cepext .wav ^
-ctl test.fileids ^
-lm nematodes.lm ^
-dict nematodes.dict ^
-hmm fr-ptm-5.2 ^
-hyp test.hyp
"D:\Programmes\Perl\bin\Perl.exe" ^
"D:\Luc\Lycée ORT\TSII\2 TS\2016_2017\Projets\Elisol\Projet VC\sphinxtrain\scripts\decode\word_align.pl" ^
test.transcription test.hyp
Thank you for your help
Luc
Additional data
word_align.pl outputs
STOP ANNULER ACRO ACROBELES ALAIMUS ANAPLECTUS APHEL APHELENCHOIDES BOLEO CEPHALO DIDAE DITYL HELICO MELOIDOGYNE MESODORY MONONCHUS PANAGROLAIMUS PARATYLENCHUS PLECTUS PRATYLENCHUS PSILENCHUS R ACTIF SEINURA T I TYLENCHO XIPHI (CalibrageRapide)
(CalibrageRapide)
Words: 28 Correct: 0 Errors: 28 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 28 Substitutions: 0
STOP (stop)
(stop)
Words: 1 Correct: 0 Errors: 1 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 1 Substitutions: 0
ANNULER (annuler)
(annuler)
Words: 1 Correct: 0 Errors: 1 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 1 Substitutions: 0
ACRO (acro)
*** (acro)
Words: 1 Correct: 0 Errors: 1 Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
Insertions: 0 Deletions: 1 Substitutions: 0
TOTAL Words: 31 Correct: 0 Errors: 31
TOTAL Percent correct = 0.00% Error = 100.00% Accuracy = 0.00%
TOTAL Insertions: 0 Deletions: 31 Substitutions: 0
While pocketsphinx_batch.exe outputs
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from fr-ptm-5.2/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-001
-ascale 20.0 2.000000e+001
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-048
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict nematodes.dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-008
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm fr-ptm-5.2
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-001
-kws_threshold 1 1.000000e+000
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm nematodes.lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.320000e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 6.500000e+000
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-048
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-10 1.000000e-010
-pl_pip 1.0 1.000000e+000
-pl_weight 3.0 3.000000e+000
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec 0-12/13-25/26-38
-tmat
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+003
-uw 1.0 1.000000e+000
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+000
-var
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-029
-wip 0.65 6.500000e-001
-wlen 0.025625 2.562500e-002
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: fr-ptm-5.2/mdef
INFO: bin_mdef.c(181): Allocating 101051 * 8 bytes (789 KiB) for CD tree
INFO: tmat.c(206): Reading HMM transition probability matrices: fr-ptm-5.2/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: fr-ptm-5.2/means
INFO: ms_gauden.c(292): 36 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: fr-ptm-5.2/variances
INFO: ms_gauden.c(292): 36 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(354): 65 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file fr-ptm-5.2/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 2108
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4290 * 32 bytes (134 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: nematodes.dict
INFO: dict.c(213): Allocated 2 KiB for strings, 3 KiB for phones
INFO: dict.c(336): 191 words read
INFO: dict.c(358): Reading filler dictionary: fr-ptm-5.2/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 36^3 * 2 bytes (91 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 31392 bytes (30 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 31392 bytes (30 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(358): Header doesn't match
INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
INFO: ngram_model_trie.c(192): LM of order 3
INFO: ngram_model_trie.c(194): #1-grams: 193
INFO: ngram_model_trie.c(194): #2-grams: 382
INFO: ngram_model_trie.c(194): #3-grams: 191
INFO: lm_trie.c(473): Training quantizer
INFO: lm_trie.c(481): Building LM trie
INFO: ngram_search_fwdtree.c(99): 79 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128
ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary
INFO: ngram_search_fwdtree.c(339): after: 0 root, 0 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: batch.c(729): Decoding 'CalibrageRapide'
INFO: cmn.c(183): CMN: 43.70 -1.38 -6.20 10.95 -2.01 2.61 -0.97 4.62 1.84 -5.77 -0.86 -1.69 0.51
INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
INFO: ngram_search_fwdtree.c(1553): 7302 words recognized (2/fr)
INFO: ngram_search_fwdtree.c(1555): 10524 senones evaluated (3/fr)
INFO: ngram_search_fwdtree.c(1559): 7515 channels searched (2/fr), 0 1st, 7515 last
INFO: ngram_search_fwdtree.c(1562): 7515 words for which last channels evaluated (2/fr)
INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 1.70 CPU 0.049 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 1.79 wall 0.051 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
INFO: ngram_search.c(459): Resized backpointer table to 20000 entries
INFO: ngram_search_fwdflat.c(948): 10515 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(950): 10527 senones evaluated (3/fr)
INFO: ngram_search_fwdflat.c(952): 10521 channels searched (2/fr)
INFO: ngram_search_fwdflat.c(954): 10521 words searched (2/fr)
INFO: ngram_search_fwdflat.c(957): 76 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.14 CPU 0.004 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.13 wall 0.004 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.3409
INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
INFO: ngram_search.c(1384): Lattice has 45 nodes, 81 links
INFO: ps_lattice.c(1380): Bestpath score: -7350
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:3409:3508) = -359823
INFO: ps_lattice.c(1441): Joint P(O,S) = -393600 P(S|O) = -33777
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
INFO: batch.c(761): CalibrageRapide: 35.09 seconds speech, 1.84 seconds CPU, 1.93 seconds wall
INFO: batch.c(763): CalibrageRapide: 0.05 xRT (CPU), 0.05 xRT (elapsed)
(CalibrageRapide -7458)
CalibrageRapide done --------------------------------------
INFO: batch.c(729): Decoding 'stop'
INFO: cmn.c(183): CMN: 43.75 -8.46 -10.26 -7.51 -11.70 -0.86 -0.37 7.49 5.61 -2.53 -12.44 -5.48 7.23
INFO: ngram_search_fwdtree.c(1553): 249 words recognized (3/fr)
INFO: ngram_search_fwdtree.c(1555): 261 senones evaluated (3/fr)
INFO: ngram_search_fwdtree.c(1559): 255 channels searched (2/fr), 0 1st, 255 last
INFO: ngram_search_fwdtree.c(1562): 255 words for which last channels evaluated (2/fr)
INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.03 CPU 0.036 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 0.06 wall 0.066 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
INFO: ngram_search_fwdflat.c(948): 249 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(950): 261 senones evaluated (3/fr)
INFO: ngram_search_fwdflat.c(952): 255 channels searched (2/fr)
INFO: ngram_search_fwdflat.c(954): 255 words searched (2/fr)
INFO: ngram_search_fwdflat.c(957): 61 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.00 wall 0.005 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.9
INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
INFO: ngram_search.c(1384): Lattice has 5 nodes, 3 links
INFO: ps_lattice.c(1380): Bestpath score: -330
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:9:86) = -33243
INFO: ps_lattice.c(1441): Joint P(O,S) = -37103 P(S|O) = -3860
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.01 wall 0.007 xRT
INFO: batch.c(761): stop: 0.87 seconds speech, 0.03 seconds CPU, 0.10 seconds wall
INFO: batch.c(763): stop: 0.04 xRT (CPU), 0.11 xRT (elapsed)
(stop -495)
stop done --------------------------------------
INFO: batch.c(729): Decoding 'annuler'
INFO: cmn.c(183): CMN: 43.79 -2.53 -6.35 14.13 3.54 5.48 -0.25 4.77 9.97 -6.77 -4.51 -3.32 0.25
INFO: ngram_search_fwdtree.c(1553): 330 words recognized (3/fr)
INFO: ngram_search_fwdtree.c(1555): 342 senones evaluated (3/fr)
INFO: ngram_search_fwdtree.c(1559): 336 channels searched (2/fr), 0 1st, 336 last
INFO: ngram_search_fwdtree.c(1562): 336 words for which last channels evaluated (2/fr)
INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.05 CPU 0.041 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 0.07 wall 0.057 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
INFO: ngram_search_fwdflat.c(948): 330 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(950): 342 senones evaluated (3/fr)
INFO: ngram_search_fwdflat.c(952): 336 channels searched (2/fr)
INFO: ngram_search_fwdflat.c(954): 336 words searched (2/fr)
INFO: ngram_search_fwdflat.c(957): 61 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.01 wall 0.005 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.9
INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
INFO: ngram_search.c(1384): Lattice has 7 nodes, 3 links
INFO: ps_lattice.c(1380): Bestpath score: -324
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:9:113) = -29165
INFO: ps_lattice.c(1441): Joint P(O,S) = -32801 P(S|O) = -3636
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.01 wall 0.006 xRT
INFO: batch.c(761): annuler: 1.14 seconds speech, 0.05 seconds CPU, 0.09 seconds wall
INFO: batch.c(763): annuler: 0.04 xRT (CPU), 0.08 xRT (elapsed)
(annuler -411)
annuler done --------------------------------------
INFO: batch.c(729): Decoding 'acro'
INFO: cmn.c(183): CMN: 46.97 4.95 -1.08 -1.16 -6.24 -8.49 -3.10 2.97 1.11 -10.05 -5.33 -8.92 3.29
INFO: ngram_search_fwdtree.c(1553): 225 words recognized (3/fr)
INFO: ngram_search_fwdtree.c(1555): 237 senones evaluated (3/fr)
INFO: ngram_search_fwdtree.c(1559): 231 channels searched (2/fr), 0 1st, 231 last
INFO: ngram_search_fwdtree.c(1562): 231 words for which last channels evaluated (2/fr)
INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.03 CPU 0.039 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 0.04 wall 0.055 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
INFO: ngram_search_fwdflat.c(948): 225 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(950): 237 senones evaluated (3/fr)
INFO: ngram_search_fwdflat.c(952): 231 channels searched (2/fr)
INFO: ngram_search_fwdflat.c(954): 231 words searched (2/fr)
INFO: ngram_search_fwdflat.c(957): 65 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.00 wall 0.005 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.64
INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
INFO: ngram_search.c(1384): Lattice has 11 nodes, 11 links
INFO: ps_lattice.c(1380): Bestpath score: -328
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:64:78) = -29463
INFO: ps_lattice.c(1441): Joint P(O,S) = -34952 P(S|O) = -5489
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.01 wall 0.008 xRT
INFO: batch.c(761): acro: 0.79 seconds speech, 0.03 seconds CPU, 0.07 seconds wall
INFO: batch.c(763): acro: 0.04 xRT (CPU), 0.09 xRT (elapsed)
(acro -453)
acro done --------------------------------------
INFO: batch.c(778): TOTAL 37.89 seconds speech, 1.95 seconds CPU, 2.19 seconds wall
INFO: batch.c(780): AVERAGE 0.05 xRT (CPU), 0.06 xRT (elapsed)
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 1.81 CPU 0.048 xRT
INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 1.95 wall 0.052 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.14 CPU 0.004 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.14 wall 0.004 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.02 wall 0.001 xRT
Your LM is uppercase and your dictionary is lowercase. The tutorial clearly says that the online LM tool is for US English only.
OK Thank you
Luc
Hi, it's me again.
There is some progress:
the .lm file is correctly generated
pocketsphinx_batch.exe used in combination with word_align.pl (as indicated in https://cmusphinx.github.io/wiki/tutorialtuning/) shows good accuracy: about 81%
However, accuracy is poor in the Android app: about 30%. So I tried to use in the app the nematodes.lm file that works with pocketsphinx_batch.exe.
To do that:
I added the files nematodes.lm and nematodes.lm.bin (maybe not necessary, but it makes no difference in the result) to the device
I added the command MySpeechRecognizerSetup.setString("-lm", <PathForTheLMFile>) (see the sketch at the end of this post)
I verified that a wrong value for <PathForTheLMFile> leads to a specific error
When <PathForTheLMFile> is correct, the (partial) output is :
[...]
I/cmusphinx: INFO: dict2pid.c(196): Allocated 15696 bytes (15 KiB) for single-phone word triphones
I/cmusphinx: INFO: ngram_model_trie.c(399): Trying to read LM in trie binary format
I/cmusphinx: INFO: ngram_model_trie.c(410): Header doesn't match
I/cmusphinx: INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
E/cmusphinx: ERROR: "ngram_model_trie.c", line 103: Bad ngram count
I/cmusphinx: INFO: ngram_model_trie.c(489): Trying to read LM in DMP format
E/cmusphinx: ERROR: "ngram_model_trie.c", line 500: Wrong magic header size number a5c6461: /storage/emulated/0/Android/data/net.Limoog.NemaPhone/files/sync/nematodes.lm is not a dump file
and output stops here
while using the same files with pocketsphinx_batch.exe, the output is :
[...]
INFO: dict2pid.c(196): Allocated 31392 bytes (30 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(358): Header doesn't match
INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
INFO: ngram_model_trie.c(192): LM of order 3
INFO: ngram_model_trie.c(194): #1-grams: 193
INFO: ngram_model_trie.c(194): #2-grams: 382
INFO: ngram_model_trie.c(194): #3-grams: 0
INFO: lm_trie.c(473): Training quantizer
INFO: lm_trie.c(481): Building LM trie
INFO: ngram_search_fwdtree.c(99): 79 unique initial diphones
[...] (output continues)
Do you have any explanation for the difference when using the same files in both contexts, and what should I do to use the .lm file inside the Android app ?
Thank you
Luc
PS: attached you will find the outputs, feat.params and nematode.lm
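For completeness, here is a minimal sketch of how the language model is plugged into the Android setup (the file and directory names are placeholders; the real app configures more options):

```java
import java.io.File;
import java.io.IOException;

import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

public class CustomLmSketch {

    // Builds the recognizer against the nematodes language model instead of the
    // default one. The sync directory is a placeholder for wherever the app
    // copies its files on the device (e.g. the .../files/sync path in the log above).
    public static SpeechRecognizer buildRecognizer(File syncDir) throws IOException {
        File lmFile = new File(syncDir, "nematodes.lm");
        return SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(syncDir, "fr-ptm-5.2"))   // placeholder
                .setDictionary(new File(syncDir, "nematodes.dict"))  // placeholder
                .setString("-lm", lmFile.getPath())
                .getRecognizer();
    }
}
```

If I read the pocketsphinx-android demo correctly, an alternative would be to register the model with recognizer.addNgramSearch("nematodes", lmFile) and start that search by name instead of passing -lm directly.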
Hi Luc
Most likely your desktop version is too old and the Android demo is already updated. Please build from GitHub and try again.
For accuracy, you need to check in the logs whether recognition runs faster than real time; otherwise it could simply fail to recognize things properly.
Hi Nickolay
Thank you for the answer, but I am not sure I understand it fully.
When you say "desktop version" I understand "pocketsphinx_batch.exe", which is the tool that actually works: why do you say it is "too old" (even though its date is 2016/01, it works fine with the newly created lm file)?
When you say to "build from github", I suppose you mean the C libraries compiled for Android (like "libpocketsphinx_jni.so"). I am afraid this will be difficult, as I read in https://cmusphinx.github.io/wiki/tutorialandroid/#building-pocketsphinx-android: "You shouldn’t build it unless you understand what you are doing. Use prebuilt binaries instead."
The compiled files I currently use ("libpocketsphinx_jni.so") are from July 2015, so the first thing I will try is to rebuild my app against the files provided in the latest pocketsphinx-android-demo release ("pocketsphinx-android-5prealpha-release.aar").
I also wonder where I can find the logs produced when the Android app runs.
Finally, I don't think recognition running faster than real time can be the problem, because accuracy is the same whether I speak in real time or replay files saved during that real-time speech.
Thank you for your help
Luc
Ok, if you haven't upgraded to using aar yet, you need to upgrade first.