CMU Sphinx / Forums / Help: PocketSphinx keyword spotting from Java

Stephen McCants - 2016-09-22

Hello All,

I'm trying to use PocketSphinx for keyword spotting inside a Java application. I've used SWIG to successfully build the JNI library and have a simple program that links to the libraries and does regular speech recognition. I'm now trying to switch it to keyword mode and I'm getting the following error:

Exception in thread "main" java.lang.RuntimeException: Decoder_setSearch returned -1
at edu.cmu.pocketsphinx.PocketSphinxJNI.Decoder_setSearch(Native Method)
at edu.cmu.pocketsphinx.Decoder.setSearch(Decoder.java:181)
at pocket.DecoderTest.main(DecoderTest.java:111)

Here is my Java code that leads up to that error:

        Config c = Decoder.defaultConfig();
        // Setup the dictionary
        c.setString("-hmm", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us");
        c.setString("-lm", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us.lm.bin");
        c.setString("-dict", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/orcontrol.dict");
        // Set it up to detect the key phrase
        c.setString("-keyphrase", "or(3) control");
        c.setFloat("-kws_threshold", 1e-1);
        // Build the decoder
        Decoder d = new Decoder(c);
        d.setSearch("KEYPHRASE");

The error is on d.setSearch("KEYPHRASE"). I've tried a variety of values there ("keyword", etc.) but haven't had any luck, and I can't find any documentation, or the place in the code where the value is used, to tell which values are legitimate.

What is the correct way to switch the decoder to keyword mode?

Thanks in advance.
--Stephen

Stephen McCants - 2016-09-22

I'm still not sure I understand.

The only log I've seen is what is dumped to stderr (below). Maybe I'm missing another error message? I tried using d.setSearch("or control") and d.setSearch("or(3) control"), which match in my dictionary, but that didn't seem to make any difference.

Stderr:

INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live live
-cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
-compallsen no no
-debug 0
-dict /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/orcontrol.dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us
-input_endian little little
-jsgf
-keyphrase or(3) control
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e-01
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us.lm.bin
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec 0-12/13-25/26-38
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='live', VARNORM='no', AGC='none'
INFO: acmod.c(166): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us/mdef
INFO: bin_mdef.c(181): Allocating 142108 * 8 bytes (1110 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138825 * 32 bytes (4338 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/orcontrol.dict
INFO: dict.c(213): Dictionary size 134724, allocated 1016 KiB for strings, 1679 KiB for phones
INFO: dict.c(336): 134724 words read
INFO: dict.c(358): Reading filler dictionary: /home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us/noisedict
INFO: dict.c(213): Dictionary size 134729, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
INFO: kws_search.c(406): KWS(beam: -1080, plp: -23, default threshold -23, delay 10)
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152610
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152482 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: kws_search.c(448): TOTAL kws 0.00 CPU -nan xRT
INFO: kws_search.c(451): TOTAL kws 0.00 wall -nan xRT
Exception in thread "main" java.lang.RuntimeException: Decoder_setSearch returned -1
at edu.cmu.pocketsphinx.PocketSphinxJNI.Decoder_setSearch(Native Method)
at edu.cmu.pocketsphinx.Decoder.setSearch(Decoder.java:181)
at pocket.DecoderTest.main(DecoderTest.java:111)
Java Result: 1

- Nickolay V. Shmyrev - 2016-09-22
  
  Sorry, I did not read your post in detail.
  
  If you want to switch between searches, you need to register each one under a name with decoder.setLm(..) or decoder.setKeyphrase(..), rather than setting them up through the configuration.
  

Thanks Nickolay for your help. I was able to get it working and detecting the key word pretty well.

For other readers, here is the solution I ended up with:

        Config c = Decoder.defaultConfig();
        // Setup the dictionary
        c.setString("-hmm", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us");
        c.setString("-lm", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us.lm.bin");
        c.setString("-dict", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/orcontrol.dict");
        // Set it up to detect the key phrase
        c.setString("-keyphrase", "or(3) control");
        c.setFloat("-kws_threshold", 1e-1);
        // Build the decoder
        Decoder d = new Decoder(c);
        System.out.println("Search: "+d.getSearch());
        d.setLmFile("keyword", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us.lm.bin");
//        d.setKeyphrase("keyword", "or(3) control");
        d.setKws("keyword", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/keyword");
        d.setSearch("keyword");

        // Go through sample files and see if we hear the keyword

First attempt got 5 of the 6 files correct! With some further work, I think this could be good.
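
The contents of the keyword file passed to d.setKws(...) are not shown in this post. As an illustration only, a PocketSphinx keyword list is a plain-text file with one phrase per line, each followed by its detection threshold between slashes. The phrase and threshold below are assumptions that reuse the values from the configuration above, not the actual file:

```
or control /1e-1/
```

Every word in a phrase has to be present in the dictionary given by -dict. Lower thresholds generally make detection more permissive at the cost of more false alarms, so the value usually needs tuning per phrase.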

Nickolay V. Shmyrev - 2016-09-23

d.setLmFile("keyword", "/home/smm/hcs/orc.trunk2/nb/Speech/resources/model/en-us.lm.bin");

The LM search should have a different name if you want to use it. In your case the keyword search replaces the LM search, since you gave both the same name.
- Nickolay V. Shmyrev - 2016-09-23
  
  Also, you should not set -lm in the configuration.
  
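
The advice in this thread amounts to: register each search under its own name, then activate one by that name. The sketch below shows that call order. So that it compiles without the native library, it is written against a hypothetical `SearchApi` interface that mirrors the three `Decoder` methods used above (`setLmFile`, `setKws`, `setSearch`), and `ModelDecoder` is a simplified in-memory model of the naming behavior described in the replies, not PocketSphinx's implementation. The search names and file paths are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedSearchSketch {

    /** Hypothetical stand-in for the Decoder methods used in this thread. */
    interface SearchApi {
        void setLmFile(String name, String path);
        void setKws(String name, String path);
        void setSearch(String name);
    }

    /** Call order suggested in the replies: distinct names, then activate one. */
    static void configure(SearchApi d, String lmPath, String kwsPath) {
        d.setLmFile("lm", lmPath);     // own name, so the keyword search does not replace it
        d.setKws("keyword", kwsPath);  // keyword list registered under a second name
        d.setSearch("keyword");        // switch the decoder to keyword spotting
    }

    /** Simplified model of named searches (an assumption, for illustration only). */
    static class ModelDecoder implements SearchApi {
        final Map<String, String> searches = new LinkedHashMap<>();
        String active;

        public void setLmFile(String name, String path) {
            searches.put(name, "lm:" + path);   // same name replaces the earlier entry
        }

        public void setKws(String name, String path) {
            searches.put(name, "kws:" + path);  // same name replaces the earlier entry
        }

        public void setSearch(String name) {
            if (!searches.containsKey(name)) {
                // Stands in for the "Decoder_setSearch returned -1" failure.
                throw new RuntimeException("no search named " + name);
            }
            active = name;
        }
    }

    public static void main(String[] args) {
        ModelDecoder d = new ModelDecoder();
        configure(d, "en-us.lm.bin", "keyword.list");
        System.out.println("active=" + d.active + " searches=" + d.searches);
    }
}
```

With the real bindings, the same three calls would be made directly on the `Decoder` built from a `Config` that sets -hmm and -dict but not -lm. In this model, `setSearch("KEYPHRASE")` fails because no search was ever registered under that name, which matches the explanation given for the original error.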
