Hi all,
I am new to PocketSphinx, but quite used to cross-compiling for and running on
embedded devices. I am having trouble getting word recognition to work when running on an
ARM platform (it's OK on the i386 platform).
I am trying to run pocketsphinx on a 200 MHz ARM processor.
I have used pocketsphinx-0.6.1.tar.gz and sphinxbase-0.6.1.tar.gz
I compiled pocketsphinx and sphinxbase on Ubuntu 10.1 and ‘make check’ works
OK.
Next I cross-compiled sphinxbase and pocketsphinx using this configure
command line:
./configure --prefix=/home/user/sphinx_arm/install \
--exec-prefix=/home/user/sphinx_arm/install \
--host=i386 --target=arm \
CC=arm-softfloat-linux-gnu-gcc CFLAGS="-march=armv4"
This successfully lets me build statically linked executables for the
ARM processor.
I then created a minimal test case with these steps:
I copied the hmm and lm directories from the install directory (the one given to
configure) and created a test.ctl file that contains just
“woman.ak.276317oa”
I copied the woman.ak.276317oa file.
Then I ran the following command:
pocketsphinx_batch \
-hmm hmm \
-lm lm/tidigits.DMP \
-dict lm/tidigits.dic \
-ctl test.ctl \
-cepdir . \
-hyp test.match
This test case runs successfully on Ubuntu 10.1: test.match shows that it recognized
TWO SEVEN SIX THREE ONE SEVEN OH
Finally I copied the ARM binaries and the test case (the same files that worked
on Ubuntu) to the ARM target.
The executable runs to completion but word recognition fails.
I compared the output of the ARM and i386 runs. The ARM output differs as
follows:
…
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 37.98 -1.19 0.37 0.95 -1.53 -1.38 -0.19 -1.04 -0.13 -1.34 -0.21 -0.42 -0.74
INFO: ngram_search_fwdtree.c(1513): 0 words recognized (0/fr)
INFO: ngram_search_fwdtree.c(1515): 1 senones evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1517): 1 channels searched (0/fr), 0 1st, 1 last
INFO: ngram_search_fwdtree.c(1521): 1 words for which last channels evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 0 words
INFO: ngram_search_fwdflat.c(912): 0 words recognized (0/fr)
INFO: ngram_search_fwdflat.c(914): 2121 senones evaluated (5/fr)
INFO: ngram_search_fwdflat.c(916): 425 channels searched (0/fr)
INFO: ngram_search_fwdflat.c(918): 425 words searched (0/fr)
INFO: ngram_search_fwdflat.c(920): 0 word transitions (0/fr)
ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame
…
(NOTE FULL output for the unsuccessful arm execution is included below.)
Whereas the i386 (Ubuntu 10.1) output correctly shows:
…
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 37.98 -1.19 0.37 0.95 -1.53 -1.38 -0.19 -1.04 -0.13 -1.34 -0.21 -0.42 -0.74
INFO: ngram_search_fwdtree.c(1513): 545 words recognized (0/fr)
INFO: ngram_search_fwdtree.c(1515): 32626 senones evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1517): 7590 channels searched (0/fr), 0 1st, 1 last
INFO: ngram_search_fwdtree.c(1521): 1342 words for which last channels evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1524): 427 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(912): 301 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(914): 14042 senones evaluated (33/fr)
INFO: ngram_search_fwdflat.c(916): 3679 channels searched (8/fr)
INFO: ngram_search_fwdflat.c(918): 1065 words searched (2/fr)
INFO: ngram_search_fwdflat.c(920): 548 word transitions (1/fr)
WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using <sil> instead
…
It seems that the forward tree search fails on ARM but works on i386?
Q1) Can anyone suggest the reason why the same test case runs on i386 but
fails on arm?
I expect it is related to one of these:
• the test case setup
• an inherent problem running on ARM (maybe RAM or some other limitation of my
platform)
• a cross-compile issue (maybe related to endianness, libraries or floating
point?)
There doesn’t seem to be any error message indicating what has gone wrong, so I
am not sure where to start debugging this issue.
I have searched the help files but haven’t found any similar problems.
So I am hoping someone with more experience with PocketSphinx can point me in
the right direction. Any ideas?
(A few extra notes:
- When running on the ARM processor there are some very long pauses in the output (twice, for up to 4-5 seconds). I have assumed this is because the processor is much slower than the i386 machine, but maybe something is going wrong on ARM and the pauses are actually a symptom of my problem?
- I have compiled ARM versions that use fixed-point, soft-float and hard-float
arithmetic. The fixed-point and soft-float versions fail with the same error. The
hard-float version crashes with an illegal memory access near the start of
execution; I think this is likely an embedded Linux environment issue, though.)
Finally a couple of quick ‘newbie’ questions:
Q2) What are my chances of achieving real-time recognition using the Turtle
model/dictionary on a 200 MHz ARM?
Q3) Would fixed point, soft float or hard float be the best choice for
achieving real-time recognition?
Best Regards
Helibot
*Full output of the unsuccessful execution on arm processor*
./run_test.sh
/bin/sh
INFO: cmd_ln.c(512): Parsing command line:
../pocketsphinx_batch \
-hmm hmm \
-lm lm/tidigits.DMP \
-dict dict/tidigits.dic \
-ctl hmtest.ctl \
-cepdir . \
-hyp hmtest.match
Current configuration:
-adchdr 0 0
-adcin no no
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 nan
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-build_outdirs yes yes
-cepdir .
-cepext .mfc .mfc
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-ctl hmtest.ctl
-ctlcount -1 -1
-ctlincr 1 1
-ctloffset 0 0
-ctm
-debug 0
-dict dict/tidigits.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgctl
-fsgdir
-fsgext
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 nan
-fwdtree yes yes
-hmm hmm
-hyp hmtest.match
-hypseg
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm lm/tidigits.DMP
-lmctl
-lmname default default
-lmnamectl
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 nan
-lponlybeam 7e-29 nan
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mllrctl
-mllrdir
-mllrext
-mmap yes yes
-nbest 0 0
-nbestdir
-nbestext .hyp .hyp
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-outlatdir
-pbeam 1e-48 nan
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 nan
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(512): Parsing command line:
\
-dither yes \
-lowerf 1 \
-upperf 4000 \
-nfilt 20 \
-transform dct \
-round_filters no \
-remove_dc yes \
-wlen 0.025 \
-feat s2_4x \
-agc none \
-cmn current \
-cmninit 63,-1,1 \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 63,-1,1
-dither no yes
-doublebw no no
-feat 1s_c_d_dd s2_4x
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from hmm/feat.params
INFO: fe_interface.c(288): You are using the internal mechanism to generate the seed.
INFO: feat.c(848): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: mdef.c(520): Reading model definition: hmm/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(330): Reading binary model definition: hmm/mdef
INFO: bin_mdef.c(508): 34 CI-phone, 396 CD-phone, 5 emitstate/phone, 170 CI-sen, 670 Sen, 222 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: hmm/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: hmm/means
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
256x12 256x24 256x3 256x12
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: hmm/variances
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
256x12 256x24 256x3 256x12
INFO: ms_gauden.c(356): 90 variance values floored
INFO: s2_semi_mgau.c(897): Loading senones from dump file hmm/sendump
INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0 0
INFO: dict.c(294): Allocating 4107 * 20 bytes (80 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: dict/tidigits.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(309): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 34^3 * 2 bytes (76 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 14008 bytes (13 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 14008 bytes (13 KiB) for single-phone word triphones
ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(195): ngrams 1=14, 2=1, 3=0
INFO: ngram_model_dmp.c(241): 14 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(289): 1 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(338): 2 = LM.prob2 entries read
INFO: ngram_model_dmp.c(461): 14 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 10 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 5 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 5 single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 140
INFO: ngram_search_fwdtree.c(333): after: 10 root, 12 non-root channels, 4 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 37.98 -1.19 0.37 0.95 -1.53 -1.38 -0.19 -1.04 -0.13 -1.34 -0.21 -0.42 -0.74
INFO: ngram_search_fwdtree.c(1513): 0 words recognized (0/fr)
INFO: ngram_search_fwdtree.c(1515): 1 senones evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1517): 1 channels searched (0/fr), 0 1st, 1 last
INFO: ngram_search_fwdtree.c(1521): 1 words for which last channels evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 0 words
INFO: ngram_search_fwdflat.c(912): 0 words recognized (0/fr)
INFO: ngram_search_fwdflat.c(914): 2121 senones evaluated (5/fr)
INFO: ngram_search_fwdflat.c(916): 425 channels searched (0/fr)
INFO: ngram_search_fwdflat.c(918): 425 words searched (0/fr)
INFO: ngram_search_fwdflat.c(920): 0 word transitions (0/fr)
ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame
INFO: batch.c(661): woman.ak.276317oa: 4.25 seconds speech, 11.11 seconds CPU, 11.37 seconds wall
INFO: batch.c(663): woman.ak.276317oa: 2.61 xRT (CPU), 2.68 xRT (elapsed)
INFO: batch.c(675): TOTAL 4.25 seconds speech, 11.11 seconds CPU, 11.37 seconds wall
INFO: batch.c(677): AVERAGE 2.61 xRT (CPU), 2.68 xRT (elapsed)
*END of Full output of the unsuccessful execution*
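For reference, the xRT figures at the end of the log are processing time divided by the duration of the speech, so anything above 1.0 is slower than real time. A minimal sketch of that arithmetic, assuming a POSIX shell with awk (the helper name xrt is made up for illustration and is not part of pocketsphinx):

```shell
#!/bin/sh
# xrt PROCESSING_SECONDS SPEECH_SECONDS
# Hypothetical helper: prints the real-time factor rounded to two decimals.
xrt() {
    awk -v t="$1" -v s="$2" 'BEGIN { printf "%.2f\n", t / s }'
}

xrt 11.11 4.25   # CPU seconds vs. seconds of speech, from batch.c(661)
xrt 11.37 4.25   # wall-clock seconds vs. seconds of speech
```

With the numbers from this run it reproduces the 2.61 (CPU) and 2.68 (elapsed) reported by batch.c, i.e. the tidigits test already runs at well under real-time speed on this board.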
Just an update: I didn't manage to trace the problem using version 0.6, but
I went back to sphinxbase 0.3 and pocketsphinx 0.3 and recognition now
runs successfully on my ARM target (it's very slow, but it works!).
I am now working on getting live detection to work, but I am having trouble with
the OSS implementation on my platform. I expected this: the sound
drivers and/or hardware are not fully implemented on my ARM device :-(.
Also, I found that I couldn't use '--with-oss' on the configure command line when
cross-compiling (I always ended up with no supported backend). Can anyone tell
me the correct procedure for selecting the sound hardware when cross-compiling?
Hello
It's better to look at config.log in order to solve cross-compilation issues,
including the issues with ALSA. For example, modern practice is to use the --host
and --build options; --target means something different.
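As a rough sketch of what that could look like for the toolchain in the original post: the --host triplet below is taken from the compiler name arm-softfloat-linux-gnu-gcc, while the --build triplet is only an assumption for a 32-bit x86 Linux build machine (config.guess in the source tree reports the real one). The function name configure_cmd is made up, and the script only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Sketch only: build a configure line that uses --build/--host, not --target.
HOST_TRIPLET=arm-softfloat-linux-gnu    # from the cross-compiler's name prefix
BUILD_TRIPLET=i686-pc-linux-gnu         # ASSUMED; check with ./config.guess
PREFIX=/home/user/sphinx_arm/install    # install prefix from the original post

# Hypothetical helper: prints the configure command as a single line.
configure_cmd() {
    printf '%s\n' "./configure --prefix=$PREFIX --build=$BUILD_TRIPLET --host=$HOST_TRIPLET CC=$HOST_TRIPLET-gcc CFLAGS=\"-march=armv4\""
}

configure_cmd                 # review the printed command first
# eval "$(configure_cmd)"     # then uncomment to run it in the source tree
```

If configure still reports no audio backend, the header and library checks recorded in config.log should show which test failed under the cross compiler.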
As for the initial issue, it's better to investigate it by comparing the computation
on ARM and on i386 and looking for where they differ.
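One way to start on that comparison using only the two logs, as a minimal sketch: the file names arm.log and i386.log and both helper names are assumptions, and it relies on awk, sed, diff and mktemp being available. The first helper lists options whose parsed value in the "Current configuration" dump is nan, which is the case for several beam settings (-beam, -pbeam, -wbeam and others) in the ARM log posted above; the thread does not show whether the i386 dump differs there. The second helper diffs the logs after dropping the batch.c timing lines, which always differ between machines.

```shell
#!/bin/sh
# list_nan_params LOG: print each option whose last column is "nan".
list_nan_params() {
    awk '$1 ~ /^-/ && tolower($NF) == "nan" { print $1 }' "$1"
}

# diff_logs ARM_LOG I386_LOG: diff two logs, ignoring batch.c timing lines.
diff_logs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sed '/batch\.c(/d' "$1" > "$a"
    sed '/batch\.c(/d' "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}

# Example use, assuming the output of each run was saved to these files.
if [ -f arm.log ] && [ -f i386.log ]; then
    list_nan_params arm.log
    diff_logs arm.log i386.log
fi
```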