Failed word recognition when running on ARM

Helibot
2011-03-06
2012-09-22
  • Helibot

    Helibot - 2011-03-06

    Hi all,
    I am new to PocketSphinx, but quite used to cross-compiling and running on
    embedded devices. I am having trouble getting word recognition to work when
    running on an ARM platform (it is OK on the i386 platform).
    I am trying to run pocketsphinx on a 200 MHz ARM processor.
    I have used pocketsphinx-0.6.1.tar.gz and sphinxbase-0.6.1.tar.gz.

    I compiled pocketsphinx and sphinxbase on Ubuntu 10.1 and ‘make check’ works
    OK.
    Next I cross-compiled sphinxbase and pocketsphinx using this configure
    command line:-
    ./configure --prefix=/home/user/sphinx_arm/install \
    --exec-prefix=/home/user/sphinx_arm/install \
    --host=i386 --target=arm \
    CC=arm-softfloat-linux-gnu-gcc CFLAGS="-march=armv4"
    This successfully allows me to build statically linked executables for the
    ARM processor.

    I then created a minimal test case with these steps:-
    I copied the hmm and lm directories from the install directory (specified in
    the ./configure command) and created a test.ctl file that just contained
    “woman.ak.276317oa”.
    I copied the woman.ak.276317oa file.
    Then I ran the following command:-
    pocketsphinx_batch \
    -hmm hmm \
    -lm lm/tidigits.DMP \
    -dict lm/tidigits.dic \
    -ctl test.ctl \
    -cepdir . \
    -hyp test.match

    I can successfully run this test case on Ubuntu 10.1; test.match shows that it
    recognized TWO SEVEN SIX THREE ONE SEVEN OH.
    Finally I copied the ARM binaries and the test case (the same files that worked
    on Ubuntu) to the ARM target.
    The executable runs successfully but word recognition fails.

    I compared the output of the ARM and i386 runs. The difference I can see in
    the ARM output is:-
    ….
    INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 37.98 -1.19 0.37 0.95 -1.53 -1.38 -0.19 -1.04 -0.13
    -1.34 -0.21 -0.42 -0.74
    INFO: ngram_search_fwdtree.c(1513): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1515): 1 senones evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1517): 1 channels searched (0/fr), 0 1st, 1 last
    INFO: ngram_search_fwdtree.c(1521): 1 words for which last channels evaluated
    (0/fr)
    INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(912): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(914): 2121 senones evaluated (5/fr)
    INFO: ngram_search_fwdflat.c(916): 425 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(918): 425 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(920): 0 word transitions (0/fr)
    ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame…
    (NOTE FULL output for the unsuccessful arm execution is included below.)

    Whereas the i386 (Ubuntu 10.1) output correctly shows:-

    INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 37.98 -1.19 0.37 0.95 -1.53 -1.38 -0.19 -1.04 -0.13
    -1.34 -0.21 -0.42 -0.74
    INFO: ngram_search_fwdtree.c(1513): 545 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1515): 32626 senones evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1517): 7590 channels searched (0/fr), 0 1st, 1
    last
    INFO: ngram_search_fwdtree.c(1521): 1342 words for which last channels
    evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1524): 427 candidate words for entering last
    phone (0/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 7 words
    INFO: ngram_search_fwdflat.c(912): 301 words recognized (1/fr)
    INFO: ngram_search_fwdflat.c(914): 14042 senones evaluated (33/fr)
    INFO: ngram_search_fwdflat.c(916): 3679 channels searched (8/fr)
    INFO: ngram_search_fwdflat.c(918): 1065 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(920): 548 word transitions (1/fr)
    WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using
    <sil> instead
    (NOTE FULL output for the unsuccessful arm execution is included below.)

    It seems that the forward tree search fails on ARM but works on i386?

    Q1) Can anyone suggest why the same test case runs on i386 but fails on ARM?
    I expect it is related to one of these:-
    • the test case setup
    • an inherent problem running on the ARM (maybe RAM or another limitation of my
    platform)
    • a cross-compile issue (maybe related to endianness, libraries or floating
    point?)

    There doesn’t seem to be any error message indicating what’s gone wrong, so I
    am not sure where to start debugging this issue.
    I have searched the help files but haven’t found any similar problems.
    So I am hoping someone with more experience with PocketSphinx can point me in
    the right direction. Any ideas?

    A few extra notes:
    - When running on the ARM processor there seem to be some very long pauses in
    the output (twice, for up to 4-5 seconds). I have assumed this is because the
    processor is much slower than the i386 machine, but maybe this is wrong? Maybe
    something is going wrong when running on ARM, so this is actually a symptom of
    my problem?
    - I have compiled ARM versions that use fixed-point, soft-float and hard-float
    arithmetic. Fixed point and soft float fail with the same error. The hard-float
    version crashes with an illegal memory access near the start of execution (I
    think this is likely an embedded Linux environment issue, though).

    Finally a couple of quick ‘newbie’ questions:-
    Q2) What are my chances of achieving realtime recognition using the Turtle
    model/dictionary with a 200 MHz ARM?
    Q3) Would I be best to use fixed point, soft float or hard float to achieve
    realtime recognition?

    Best Regards
    Helibot

    *Full output of the unsuccessful execution on arm processor*

    ./run_test.sh

    /bin/sh
    INFO: cmd_ln.c(512): Parsing command line:
    ../pocketsphinx_batch \
    -hmm hmm \
    -lm lm/tidigits.DMP \
    -dict dict/tidigits.dic \
    -ctl hmtest.ctl \
    -cepdir . \
    -hyp hmtest.match

    Current configuration:

    -adchdr 0 0
    -adcin no no
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -backtrace no no
    -beam 1e-48 nan
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -build_outdirs yes yes
    -cepdir .
    -cepext .mfc .mfc
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -ctl hmtest.ctl
    -ctlcount -1 -1
    -ctlincr 1 1
    -ctloffset 0 0
    -ctm
    -debug 0
    -dict dict/tidigits.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgctl
    -fsgdir
    -fsgext
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 nan
    -fwdtree yes yes
    -hmm hmm
    -hyp hmtest.match
    -hypseg
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm lm/tidigits.DMP
    -lmctl
    -lmname default default
    -lmnamectl
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 nan
    -lponlybeam 7e-29 nan
    -lw 6.5 6.500000e+00
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mllrctl
    -mllrdir
    -mllrext
    -mmap yes yes
    -nbest 0 0
    -nbestdir
    -nbestext .hyp .hyp
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -outlatdir
    -pbeam 1e-48 nan
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 nan
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(512): Parsing command line:

    \
    -dither yes \
    -lowerf 1 \
    -upperf 4000 \
    -nfilt 20 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -wlen 0.025 \
    -feat s2_4x \
    -agc none \
    -cmn current \
    -cmninit 63,-1,1 \
    -varnorm no

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 63,-1,1
    -dither no yes
    -doublebw no no
    -feat 1s_c_d_dd s2_4x
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.000000e+00
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 20
    -remove_dc no yes
    -round_filters yes no
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 4.000000e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.500000e-02

    INFO: acmod.c(238): Parsed model-specific feature parameters from
    hmm/feat.params
    INFO: fe_interface.c(288): You are using the internal mechanism to generate
    the seed.
    INFO: feat.c(848): Initializing feature stream to type: 's2_4x', ceplen=13,
    CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: mdef.c(520): Reading model definition: hmm/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef
    file
    INFO: bin_mdef.c(330): Reading binary model definition: hmm/mdef
    INFO: bin_mdef.c(508): 34 CI-phone, 396 CD-phone, 5 emitstate/phone, 170 CI-
    sen, 670 Sen, 222 Sen-Seq
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    hmm/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: hmm/means
    INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
    256x12 256x24 256x3 256x12
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: hmm/variances
    INFO: ms_gauden.c(292): 1 codebook, 4 feature, size
    256x12 256x24 256x3 256x12
    INFO: ms_gauden.c(356): 90 variance values floored
    INFO: s2_semi_mgau.c(897): Loading senones from dump file hmm/sendump
    INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0 0
    INFO: dict.c(294): Allocating 4107 * 20 bytes (80 KiB) for word entries
    INFO: dict.c(306): Reading main dictionary: dict/tidigits.dic
    INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(309): 11 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(405): Allocating 34^3 * 2 bytes (76 KiB) for word-initial
    triphones
    INFO: dict2pid.c(131): Allocated 14008 bytes (13 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 14008 bytes (13 KiB) for single-phone word
    triphones
    ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
    INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(195): ngrams 1=14, 2=1, 3=0
    INFO: ngram_model_dmp.c(241): 14 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(289): 1 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(338): 2 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(461): 14 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 10 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 5 single-phone
    words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 5
    single-phone words
    INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 140
    INFO: ngram_search_fwdtree.c(333): after: 10 root, 12 non-root channels, 4
    single-phone words
    INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 37.98 -1.19 0.37 0.95 -1.53 -1.38 -0.19 -1.04 -0.13
    -1.34 -0.21 -0.42 -0.74
    INFO: ngram_search_fwdtree.c(1513): 0 words recognized (0/fr)
    INFO: ngram_search_fwdtree.c(1515): 1 senones evaluated (0/fr)
    INFO: ngram_search_fwdtree.c(1517): 1 channels searched (0/fr), 0 1st, 1 last
    INFO: ngram_search_fwdtree.c(1521): 1 words for which last channels evaluated
    (0/fr)
    INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone
    (0/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdflat.c(912): 0 words recognized (0/fr)
    INFO: ngram_search_fwdflat.c(914): 2121 senones evaluated (5/fr)
    INFO: ngram_search_fwdflat.c(916): 425 channels searched (0/fr)
    INFO: ngram_search_fwdflat.c(918): 425 words searched (0/fr)
    INFO: ngram_search_fwdflat.c(920): 0 word transitions (0/fr)
    ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame
    INFO: batch.c(661): woman.ak.276317oa: 4.25 seconds speech, 11.11 seconds CPU,
    11.37 seconds wall
    INFO: batch.c(663): woman.ak.276317oa: 2.61 xRT (CPU), 2.68 xRT (elapsed)
    INFO: batch.c(675): TOTAL 4.25 seconds speech, 11.11 seconds CPU, 11.37
    seconds wall
    INFO: batch.c(677): AVERAGE 2.61 xRT (CPU), 2.68 xRT (elapsed)
    *END of Full output of the unsuccessful execution*

     
  • Helibot

    Helibot - 2011-03-10

    Just an update: I didn't manage to trace the problem using version 0.6. But
    I went back to sphinxbase 0.3 and pocketsphinx 0.3, and recognition now
    runs successfully on my ARM target (it's very slow, but it works!).
    I am now working on getting live detection to work, but I am having trouble
    with the OSS implementation on my platform. I expected this: the sound
    drivers and/or hardware are not fully implemented on my ARM device :-(.
    Also, I found that I couldn't use '--with-oss' on the ./configure command line
    when cross-compiling (I always ended up with no backend supported). Can anyone
    tell me the correct procedure to set the sound hardware when cross-compiling?

     
  • Nickolay V. Shmyrev

    Hello

    It's better to look at config.log in order to solve cross-compilation issues,
    including the issues with ALSA. For example, modern practice is to use the
    --host and --build options; --target means something different.
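In autoconf terms, --build names the machine doing the compiling and --host names the machine the binaries will run on; --target only matters when building a compiler. A minimal sketch of what that might look like for the setup in the first post follows. The host triplet is taken from the CC=arm-softfloat-linux-gnu-gcc setting there; the build triplet i686-pc-linux-gnu is an assumption about the Ubuntu machine, and the function name cross_configure_args is purely illustrative. The script only prints the arguments, it does not run configure.

```shell
#!/bin/sh
# Sketch only: prints configure arguments rather than running anything.

# Toolchain prefix, taken from CC=arm-softfloat-linux-gnu-gcc in the first post.
HOST_TRIPLET=arm-softfloat-linux-gnu
# Assumption: the build machine is 32-bit x86 Linux (check with `gcc -dumpmachine`).
BUILD_TRIPLET=i686-pc-linux-gnu
# Install prefix from the first post.
PREFIX=/home/user/sphinx_arm/install

# Hypothetical helper: one configure argument per line.
# --build = machine doing the compiling, --host = machine the binaries run on.
cross_configure_args() {
    printf '%s\n' \
        "--prefix=$PREFIX" \
        "--exec-prefix=$PREFIX" \
        "--build=$BUILD_TRIPLET" \
        "--host=$HOST_TRIPLET" \
        "CC=$HOST_TRIPLET-gcc" \
        "CFLAGS=-march=armv4"
}

cross_configure_args
```

Since none of the arguments contain spaces, something like `./configure $(cross_configure_args)` in the sphinxbase or pocketsphinx source directory should pass them through unchanged; config.log then records what configure actually detected.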

    As for the initial issue, it's better to investigate it by comparing the
    computation on ARM and on i386 and looking for the difference.
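One difference is already visible in the posted ARM log: the configuration dump prints nan as the parsed value for several beam parameters (-beam, -wbeam, -pbeam, -lpbeam, -lponlybeam, -fwdflatwbeam), while others such as -fwdflatbeam 1e-64 print normally. The i386 configuration dump is not in this thread, so whether it differs there is an assumption, and whether the nan values are related to the failed search is a guess. A minimal sketch for checking this, assuming the two runs have been saved to files (arm.log and i386.log are hypothetical names, as are the helper functions):

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved pocketsphinx_batch logs.

# List configuration-dump lines whose parsed value was printed as "nan",
# e.g. "-beam 1e-48 nan" in the ARM output above.
nan_params() {
    grep -E '^[[:space:]]*-[[:alnum:]_]+[[:space:]]+[^[:space:]]+[[:space:]]+nan[[:space:]]*$' "$1" || true
}

# Print the first N differing lines (default 20) between two logs.
first_differences() {
    { diff "$1" "$2" || true; } | head -n "${3:-20}"
}

# Example run, only if both (hypothetical) log files exist.
if [ -f arm.log ] && [ -f i386.log ]; then
    nan_params arm.log
    first_differences arm.log i386.log
fi
```

If the i386 log shows ordinary values where the ARM log shows nan, floating-point parsing or formatting in the cross-compiled build would be a reasonable place to start looking.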

     
