
Pocketsphinx on Freescale 8308

2016-10-05
2016-11-16
  • Russ Pitman

    Russ Pitman - 2016-10-05

    Hi all,

    I have been playing with pocketsphinx on an Intel-based PC with a Linux Ubuntu installation,
    and I have been getting some good results (90%) with our tailored language model. I am using
    a set of WAV files as test data.

    I have come to trying to get this working on an NXP 8308 based PPC card, also running Linux, and
    this is where I am having issues. Pocketsphinx builds and runs without any reported errors, but
    I seem to have lost the recognition! If I leave the endian at the default (big), all I get is empty
    recognition, but setting '-input_endian little' gives me some recognition, though down at the 5% mark
    on the same input data.

    My questions are: has anyone been using PPC for running this? If so, are there any compiler or
    pocketsphinx settings that I should be looking at? Could this be a timing issue, as the processing
    core on the NXP 8308 is a lot slower than the PC-based system?

    Thanks

    Russ

     
    • Nickolay V. Shmyrev

      My questions are, has anyone been using PPC for running this?

      No, you are the first.

      If so, are there any compiler or pocketsphinx settings that I should be looking at?

      There are no specific settings.

      In order to debug this issue, you need to pinpoint it to a specific file and try to ensure all scores and intermediate values in the decoding process are the same. You can compare acoustic scores and language model scores, for example.

      You can try to reproduce this problem in QEMU; that would help us debug this issue.

      Could this be a timing issue, as the processing core on the NXP 8308 is a lot slower than the PC-based system?

      Unlikely.

       
  • Russ Pitman

    Russ Pitman - 2016-10-05

    Thanks for the reply. These are the two runs that I did; everything up until the batch.c decoding is the same. The audio test file is a WAV that just contains the word 'UNDO'.

    The configuration for both runs was:

    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40,3,-1
    -compallsen no no
    -debug 0
    -dict /en-dvi/dvi-en-us.dict
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /en-dvi/en-us
    -input_endian big little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 22
    -lm /en-dvi/dvi.lm
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    Working on PC based Linux....

    INFO: batch.c(729): Decoding 'dvi_0001'
    INFO: cmn.c(183): CMN: 46.62 11.12 -15.49 30.30 -14.97 -22.92 -1.66 -10.51 -12.51 -1.91 3.24 6.40 -10.64
    INFO: ngram_search_fwdtree.c(1553): 424 words recognized (4/fr)
    INFO: ngram_search_fwdtree.c(1555): 48962 senones evaluated (466/fr)
    INFO: ngram_search_fwdtree.c(1559): 21922 channels searched (208/fr), 8100 1st, 3851 last
    INFO: ngram_search_fwdtree.c(1562): 532 words for which last channels evaluated (5/fr)
    INFO: ngram_search_fwdtree.c(1564): 506 candidate words for entering last phone (4/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.07 CPU 0.065 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.19 wall 0.177 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 7 words
    INFO: ngram_search_fwdflat.c(948): 466 words recognized (4/fr)
    INFO: ngram_search_fwdflat.c(950): 11888 senones evaluated (113/fr)
    INFO: ngram_search_fwdflat.c(952): 7149 channels searched (68/fr)
    INFO: ngram_search_fwdflat.c(954): 847 words searched (8/fr)
    INFO: ngram_search_fwdflat.c(957): 288 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.03 CPU 0.027 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.03 wall 0.024 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .71
    INFO: ngram_search.c(1279): Eliminated 2 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 141 nodes, 106 links
    INFO: ps_lattice.c(1380): Bestpath score: -2515
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:71:103) = -195297
    INFO: ps_lattice.c(1441): Joint P(O,S) = -205515 P(S|O) = -10218
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
    INFO: batch.c(761): dvi_0001: 1.04 seconds speech, 0.10 seconds CPU, 0.21 seconds wall
    INFO: batch.c(763): dvi_0001: 0.09 xRT (CPU), 0.20 xRT (elapsed)
    undo (dvi_0001 -2770)
    dvi_0001 done --------------------------------------
    INFO: batch.c(778): TOTAL 1.04 seconds speech, 0.10 seconds CPU, 0.21 seconds wall
    INFO: batch.c(780): AVERAGE 0.09 xRT (CPU), 0.20 xRT (elapsed)
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.07 CPU 0.065 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 0.19 wall 0.178 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.03 CPU 0.027 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.03 wall 0.025 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT

    Non-working on 8308 based Linux with little endian set and the same WAV file.

    INFO: batch.c(729): Decoding 'dvi_0001'
    INFO: cmn.c(183): CMN: 46.62 11.11 -15.49 30.28 -14.96 -22.94 -1.67 -10.52 -12.52 -1.93 3.22 6.39 -10.65
    INFO: ngram_search_fwdtree.c(1553): 399 words recognized (4/fr)
    INFO: ngram_search_fwdtree.c(1555): 23001 senones evaluated (219/fr)
    INFO: ngram_search_fwdtree.c(1559): 11418 channels searched (108/fr), 2527 1st, 4919 last
    INFO: ngram_search_fwdtree.c(1562): 448 words for which last channels evaluated (4/fr)
    INFO: ngram_search_fwdtree.c(1564): 319 candidate words for entering last phone (3/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 1.97 CPU 1.876 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 1.97 wall 1.876 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 8 words
    INFO: ngram_search_fwdflat.c(948): 592 words recognized (6/fr)
    INFO: ngram_search_fwdflat.c(950): 15743 senones evaluated (150/fr)
    INFO: ngram_search_fwdflat.c(952): 11665 channels searched (111/fr)
    INFO: ngram_search_fwdflat.c(954): 1113 words searched (10/fr)
    INFO: ngram_search_fwdflat.c(957): 326 word transitions (3/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.98 CPU 0.930 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.98 wall 0.930 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .71
    INFO: ngram_search.c(1279): Eliminated 2 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 115 nodes, 129 links
    INFO: ps_lattice.c(1380): Bestpath score: -5280
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:71:103) = -350950
    INFO: ps_lattice.c(1441): Joint P(O,S) = -427149 P(S|O) = -76199
    INFO: ngram_search.c(875): bestpath 0.01 CPU 0.010 xRT
    INFO: ngram_search.c(878): bestpath 0.01 wall 0.009 xRT
    INFO: batch.c(761): dvi_0001: 1.04 seconds speech, 2.96 seconds CPU, 2.96 seconds wall
    INFO: batch.c(763): dvi_0001: 2.84 xRT (CPU), 2.84 xRT (elapsed)
    one down (dvi_0001 -4823)
    dvi_0001 done --------------------------------------
    INFO: batch.c(778): TOTAL 1.04 seconds speech, 2.96 seconds CPU, 2.96 seconds wall
    INFO: batch.c(780): AVERAGE 2.84 xRT (CPU), 2.84 xRT (elapsed)
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 1.97 CPU 1.894 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 1.97 wall 1.894 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.98 CPU 0.938 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.98 wall 0.938 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.01 CPU 0.010 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.01 wall 0.009 xRT

    Thanks

    Russ

     

    Last edit: Russ Pitman 2016-10-05
    • Nickolay V. Shmyrev

      OK, so the scores are different. I suspect something is wrong with the language model scores, since we didn't test this part. Could you test with pocketsphinx-0.8 and sphinxbase-0.8 to see if that works?

      Also, do you compile on the device or cross-compile? Did you run the tests in sphinxbase?

       
      • Nickolay V. Shmyrev

        OK, I was able to reproduce this in QEMU; our new LM code does not work on big-endian systems, and the sphinxbase ngram tests fail. So it's just another issue to fix.

         
  • Russ Pitman

    Russ Pitman - 2016-10-06

    Hi there,

    We have not tried 0.8 yet; would this issue be in that version also?

    Thanks

    Russ

     
    • Nickolay V. Shmyrev

      0.8 should be fine.

      I'll try to fix this issue in the coming days.

       
  • Russ Pitman

    Russ Pitman - 2016-10-06

    Fantastic! If you get it fixed, please let me know and I will give it a go :-)

     

    Last edit: Russ Pitman 2016-10-06
  • Russ Pitman

    Russ Pitman - 2016-10-11

    Hi Nickolay,

    Have you had any chance to look at this issue?

    regards

    Russ

     
    • Nickolay V. Shmyrev

      Sorry, I didn't have time to look yet; it will take a few more days.

       
  • Russ Pitman

    Russ Pitman - 2016-10-13

    Hi there,

    I have made some progress on this today.

    I constructed a grammar file and this seems to work OK on the target hardware, with no language model to worry about for the moment. I am going to start updating my target application and see what sort of timings I get. I am assuming that parts of the pocketsphinx API can be called directly, to save time, so that I don't have to keep calling pocketsphinx_batch?
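    For reference, the decoder can indeed be driven directly. A minimal sketch, assuming the pocketsphinx 5prealpha C API, reusing the model paths quoted earlier in this thread (the grammar path and the audio buffer are hypothetical placeholders):

```c
#include <pocketsphinx.h>

/* Sketch: decode one utterance of raw 16 kHz / 16-bit PCM without
   going through pocketsphinx_batch.  Error handling omitted. */
const char *decode_once(ps_decoder_t *ps, const int16 *pcm, size_t n_samples)
{
    int32 score;

    ps_start_utt(ps);
    ps_process_raw(ps, pcm, n_samples, FALSE, TRUE);  /* TRUE = full utterance */
    ps_end_utt(ps);
    return ps_get_hyp(ps, &score);
}

int main(void)
{
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm",  "/en-dvi/en-us",          /* acoustic model from this thread */
        "-dict", "/en-dvi/dvi-en-us.dict", /* dictionary from this thread    */
        "-jsgf", "/en-dvi/dvi.gram",       /* hypothetical grammar path      */
        NULL);
    ps_decoder_t *ps = ps_init(config);

    /* ... fill pcm / n_samples from your audio source, then:       */
    /* printf("%s\n", decode_once(ps, pcm, n_samples));             */

    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}
```

    The important point for latency is to initialise the decoder once and reuse it across utterances, rather than paying the model-loading cost on every invocation as a fresh pocketsphinx_batch run does.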

    Russ

     

    Last edit: Russ Pitman 2016-10-13
  • Russ Pitman

    Russ Pitman - 2016-10-13

    Thanks, will have a look. What does the semi-continuous model do?

    I agree that more powerful hardware would be better; this board is just for me to get a feel for what is needed.

    I will have a look at keyword spotting.

    Many thanks

    Russ

     

    Last edit: Russ Pitman 2016-10-13
  • Russ Pitman

    Russ Pitman - 2016-11-02

    Just an update: Pocketsphinx is running on the 8308 and the semi-continuous model does improve things, BUT I am still looking at ways to improve latency. I am using the grammar file rather than the LM, as this seems faster. I only have 112 words in the dictionary and a limited grammar, as this is for controlling a multifunction display, but since the user is only allowed to use certain phrases I feel that the grammar file will be sufficient.

    Russ

     
    • Nickolay V. Shmyrev

      Hi Russ

      There are various config options you can use to make it faster while keeping accuracy: beams (-beam, -wbeam, -pbeam), downsampling (-ds), top-N Gaussians (-topn), phoneme loop (-pl_window). You can check

      http://cmusphinx.sourceforge.net/wiki/pocketsphinxhandhelds

      You only need to test accuracy on your test data while changing the parameters.
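      As a concrete starting point (these values are illustrative assumptions, not recommendations; every one must be validated against your own test set), a faster profile might tighten the defaults along these lines:

```
-beam 1e-40       narrower main search beam    (default 1e-48)
-wbeam 7e-25      narrower word-exit beam      (default 7e-29)
-pbeam 1e-40      narrower phone-exit beam     (default 1e-48)
-ds 2             evaluate every 2nd frame     (default 1)
-topn 2           fewer Gaussians per senone   (default 4)
```

      Each of these trades accuracy for speed, so it is best to change one at a time and re-measure recognition rate on the test set after each change.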

       
  • Russ Pitman

    Russ Pitman - 2016-11-16

    Hi Nickolay,

    Thanks for the info, I will go and have a look :-)

    I have another question about word detection speeds and speech gaps.
    In my JSGF file I have an entry such as: page ( up* | down* )+
    I find that if I speak at a normal speed, something like 'page up up down' tends to be recognised as PAGE UP UP UP UP DOWN, but yesterday I decided to try talking a little faster and this DID manage to get PAGE UP UP DOWN!
    Are there any threshold settings to allow for a more normal speed of speech?

    Thanks

    Russ

     

    Last edit: Nickolay V. Shmyrev 2016-11-16
    • Nickolay V. Shmyrev

      In my JSGF file I have an entery as such: page ( up | down)+

      You can simplify that to page (up | down)+
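      Written out as a complete (hypothetical) grammar file, that simplified rule might look like:

```
#JSGF V1.0;

grammar display;

public <command> = page ( up | down )+ ;
```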

      Are there any threshold settings to allow for a more normal speed of speech?

      There are many parameters, for example the word insertion penalty (-wip), but to optimize them you need to prepare a test set as described in http://cmusphinx.sourceforge.net/wiki/tutorialtuning

       
