
Correctly instantiating decoder via java for kws? Null hyp returned. Working via commandline.

TheEditor
2016-08-09
2016-08-10
  • TheEditor

    TheEditor - 2016-08-09

    I'm trying to get keyword spotting (kws) working from Java. It almost works, except that I get a null hypothesis every time.

    This is how the CMUSphinx FAQ says to implement it:

         ps_set_keyphrase(ps, "keyphrase_search", "oh mighty computer");
         ps_set_search(ps, "keyphrase_search");
         ps_start_utt(ps);
         /* process data */
    

    There are setKeyphrase and setKws methods in Decoder.java. That said, the only example I found sets the search up through Config. Below is my "working" code: it runs without errors, but the hypothesis is always null. (If there is an example of kws in Java, I cannot find it. The only thing I have found is the decoder test in the GitHub repo.)

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.DataLine;
    import javax.sound.sampled.TargetDataLine;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.Arrays;
    
    import edu.cmu.pocketsphinx.Decoder;
    import edu.cmu.pocketsphinx.Config;
    import edu.cmu.pocketsphinx.Hypothesis;
    
    public class Controller {
        static {
            System.loadLibrary("pocketsphinx_jni");
        }
    
        private static ByteArrayOutputStream out;
    
        public static void main(String args[]) {
    
            AudioFormat format = new AudioFormat(44100, 16, 1, true, true);
            TargetDataLine targetLine = null;
            DataLine.Info targetInfo = new DataLine.Info(TargetDataLine.class, format);
            boolean running = true;
    
            try {
    
                targetLine = AudioSystem.getTargetDataLine(format);
                targetLine.open();
                out = new ByteArrayOutputStream();
                int numBytesRead;
                byte[] data = new byte[targetLine.getBufferSize() / 5];
    
                Config c = Decoder.defaultConfig();
                c.setString("-hmm", "/usr/local/share/pocketsphinx/model/en-us/en-us/");
                c.setString("-dict", "/usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict");
                c.setString("-keyphrase", "abomination");
                c.setFloat("-kws_threshold", 1e-20);
    
                Decoder d = new Decoder(c);
                d.setRawdataSize(300000);
    
                targetLine.start();
                System.out.println("Recorder started");
    
                byte[] b = new byte[4096];
    
                d.startUtt();
    
                System.out.println("Decoder started");
    
                while (running) {
                    int nbytes;
                    short[] s = null;
                    nbytes = targetLine.read(b, 0, b.length);
    
                    ByteBuffer bb = ByteBuffer.wrap(b, 0, nbytes);
                    s = new short[nbytes / 2];
    
                    bb.asShortBuffer().get(s);
    
                    d.processRaw(s, nbytes / 2, false, false);
    
                    if (nbytes > 0) {
    
                        Hypothesis hypothesis = d.hyp();
                        if (hypothesis != null) {
                            System.out.println("------------------------------------------------------");
                            System.out.println(hypothesis.getHypstr());
                            System.out.println("------------------------------------------------------");
    
                            d.endUtt();
                            d.startUtt();
                        }
                    }
                }
    
            }
            catch (Exception e) {
                System.err.println(e);
            }
        }
    }
    

    I've also tried adding the line:

    c.setString("-kws", "/path/to/keyword.list");
    

    I know that kws works from the command line with the same keyword list, which contains:

    abomination /1e-20/
    

    When run from the command line with that list, it detects the keyword almost every time.
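
    For reference, each line of a keyword list has the form "phrase /threshold/", with the threshold written as a number such as 1e-20. A small sanity check for a list file could look like the sketch below. The class and method names (KeywordListCheck, parsePhrase, parseThreshold) are made up for illustration and are not part of pocketsphinx, and the check assumes every line carries a threshold, which may be stricter than what the decoder itself accepts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordListCheck {
    // A phrase, whitespace, then a threshold between slashes,
    // e.g. "oh mighty computer /1e-40/".
    private static final Pattern LINE = Pattern.compile("^(.+?)\\s+/([^/]+)/\\s*$");

    private static Matcher match(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected 'phrase /threshold/': " + line);
        }
        return m;
    }

    /** Returns the phrase part of a keyword-list line. */
    public static String parsePhrase(String line) {
        return match(line).group(1).trim();
    }

    /** Returns the threshold; throws NumberFormatException if it is not numeric. */
    public static double parseThreshold(String line) {
        return Double.parseDouble(match(line).group(2).trim());
    }

    public static void main(String[] args) {
        String line = "abomination /1e-20/";
        System.out.println(parsePhrase(line) + " -> " + parseThreshold(line));
    }
}
```

    Running each line of the list through parseThreshold catches a mistyped exponent (a letter in place of a digit, for example) before the decoder ever sees the file.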

    When I run the Java code above, I get log output like this every time I speak:

    INFO: cmn_prior.c(99): cmn_prior_update: from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
    INFO: cmn_prior.c(116): cmn_prior_update: to   < 51.22 14.61 -8.72 -0.31 -3.49  0.18 -7.35  8.43 -0.77  7.64  1.41  0.27 -1.82 >
    INFO: cmn_prior.c(99): cmn_prior_update: from < 51.22 14.61 -8.72 -0.31 -3.49  0.18 -7.35  8.43 -0.77  7.64  1.41  0.27 -1.82 >
    INFO: cmn_prior.c(116): cmn_prior_update: to   < 51.80 15.37 -8.77 -0.62 -2.74  0.09 -6.18 10.24  0.14  7.79  2.59  1.86 -3.22 >
    

    The numbers change slightly each time. There is never any kws output like the one I see when running from the command line.

    Also, if I add the line:

    c.setString("-lm", "/path/to/en-us.lm.bin");
    

    then I get a hypothesis every time. It is completely wrong every time, but I do get one.

    Thanks

     
  • TheEditor

    TheEditor - 2016-08-10

    This is the startup output when the program runs with -keyphrase set and -kws not set:

    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/en-us/en-us//feat.params
    Current configuration:
    [NAME]          [DEFLT]     [VALUE]
    -agc            none        none
    -agcthresh      2.0     2.000000e+00
    -allphone               
    -allphone_ci        no      no
    -alpha          0.97        9.700000e-01
    -ascale         20.0        2.000000e+01
    -aw         1       1
    -backtrace      no      no
    -beam           1e-48       1.000000e-48
    -bestpath       yes     yes
    -bestpathlw     9.5     9.500000e+00
    -ceplen         13      13
    -cmn            current     current
    -cmninit        8.0     40,3,-1
    -compallsen     no      no
    -debug                  0
    -dict                   /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
    -dictcase       no      no
    -dither         no      no
    -doublebw       no      no
    -ds         1       1
    -fdict                  
    -feat           1s_c_d_dd   1s_c_d_dd
    -featparams             
    -fillprob       1e-8        1.000000e-08
    -frate          100     100
    -fsg                    
    -fsgusealtpron      yes     yes
    -fsgusefiller       yes     yes
    -fwdflat        yes     yes
    -fwdflatbeam        1e-64       1.000000e-64
    -fwdflatefwid       4       4
    -fwdflatlw      8.5     8.500000e+00
    -fwdflatsfwin       25      25
    -fwdflatwbeam       7e-29       7.000000e-29
    -fwdtree        yes     yes
    -hmm                    /usr/local/share/pocketsphinx/model/en-us/en-us/
    -input_endian       little      little
    -jsgf                   
    -keyphrase              abomination
    -kws                    
    -kws_delay      10      10
    -kws_plp        1e-1        1.000000e-01
    -kws_threshold      1       1.000000e-20
    -latsize        5000        5000
    -lda                    
    -ldadim         0       0
    -lifter         0       22
    -lm                 
    -lmctl                  
    -lmname                 
    -logbase        1.0001      1.000100e+00
    -logfn                  
    -logspec        no      no
    -lowerf         133.33334   1.300000e+02
    -lpbeam         1e-40       1.000000e-40
    -lponlybeam     7e-29       7.000000e-29
    -lw         6.5     6.500000e+00
    -maxhmmpf       30000       30000
    -maxwpf         -1      -1
    -mdef                   
    -mean                   
    -mfclogdir              
    -min_endfr      0       0
    -mixw                   
    -mixwfloor      0.0000001   1.000000e-07
    -mllr                   
    -mmap           yes     yes
    -ncep           13      13
    -nfft           512     512
    -nfilt          40      25
    -nwpen          1.0     1.000000e+00
    -pbeam          1e-48       1.000000e-48
    -pip            1.0     1.000000e+00
    -pl_beam        1e-10       1.000000e-10
    -pl_pbeam       1e-10       1.000000e-10
    -pl_pip         1.0     1.000000e+00
    -pl_weight      3.0     3.000000e+00
    -pl_window      5       5
    -rawlogdir              
    -remove_dc      no      no
    -remove_noise       yes     yes
    -remove_silence     yes     yes
    -round_filters      yes     yes
    -samprate       16000       1.600000e+04
    -seed           -1      -1
    -sendump                
    -senlogdir              
    -senmgau                
    -silprob        0.005       5.000000e-03
    -smoothspec     no      no
    -svspec                 0-12/13-25/26-38
    -tmat                   
    -tmatfloor      0.0001      1.000000e-04
    -topn           4       4
    -topn_beam      0       0
    -toprule                
    -transform      legacy      dct
    -unit_area      yes     yes
    -upperf         6855.4976   6.800000e+03
    -uw         1.0     1.000000e+00
    -vad_postspeech     50      50
    -vad_prespeech      20      20
    -vad_startspeech    10      10
    -vad_threshold      2.0     2.000000e+00
    -var                    
    -varfloor       0.0001      1.000000e-04
    -varnorm        no      no
    -verbose        no      no
    -warp_params                
    -warp_type      inverse_linear  inverse_linear
    -wbeam          7e-29       7.000000e-29
    -wip            0.65        6.500000e-01
    -wlen           0.025625    2.562500e-02
    
    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us//mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us//mdef
    INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
    INFO: tmat.c(206): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us//transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us//means
    INFO: ms_gauden.c(292): 42 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us//variances
    INFO: ms_gauden.c(292): 42 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(294):  128x13
    INFO: ms_gauden.c(354): 222 variance values floored
    INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us//sendump
    INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
    INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
    INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
    INFO: ptm_mgau.c(835): Maximum top-N: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 138623 * 32 bytes (4331 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
    INFO: dict.c(213): Allocated 1014 KiB for strings, 1677 KiB for phones
    INFO: dict.c(336): 134522 words read
    INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us//noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 5 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
    INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -450, delay 10)
    Recorder started
    Decoder started
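
    One thing that stands out in the dump above: -samprate is 16000, while the capture code opens the line at 44100 Hz. Whether that mismatch is what produces the null hypothesis is only a guess, but it seems worth ruling out. Below is a minimal sketch, assuming the decoder expects 16 kHz, 16-bit, mono, signed PCM as the dump suggests. The names CaptureFormat, decoderFormat and toSamples are hypothetical helpers, not pocketsphinx API. The byte order used for the short conversion is taken from the capture format so the two cannot drift apart (the posted code is consistent here, since both the format and ByteBuffer's default are big-endian).

```java
import javax.sound.sampled.AudioFormat;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class CaptureFormat {
    // Assumption: matches the -samprate value shown in the configuration dump.
    public static final float DECODER_SAMPLE_RATE = 16000f;

    /** 16 kHz, 16-bit, mono, signed, little-endian PCM. */
    public static AudioFormat decoderFormat() {
        return new AudioFormat(DECODER_SAMPLE_RATE, 16, 1, true, false);
    }

    /** Converts the first nbytes of raw PCM into 16-bit samples, honouring the format's byte order. */
    public static short[] toSamples(byte[] raw, int nbytes, AudioFormat format) {
        ByteOrder order = format.isBigEndian() ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        short[] samples = new short[nbytes / 2];
        ByteBuffer.wrap(raw, 0, nbytes).order(order).asShortBuffer().get(samples);
        return samples;
    }

    public static void main(String[] args) {
        AudioFormat format = decoderFormat();
        byte[] raw = {0x01, 0x00, (byte) 0xFF, 0x7F};
        short[] samples = toSamples(raw, raw.length, format);
        System.out.println(format.getSampleRate() + " Hz, samples: " + samples[0] + ", " + samples[1]);
    }
}
```

    In the posted code this would mean passing decoderFormat() to AudioSystem.getTargetDataLine and feeding toSamples(b, nbytes, format) to processRaw. Not every capture device offers a 16 kHz line directly, so opening the line may fail and resampling may be needed instead.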
    

    I found an explanation of why the -lm line is not needed:

    Also if you intend to use kws there is no need to use -lm in arguments. You need to remove:
    
    "-lm", ".../model/hub4wsj_sc_8k_adapt/etc/hub4.5000.DMP",
    

    That answers that.

    If I change the above code and remove:

    c.setString("-keyphrase", "abomination");
    

    and add:

    c.setString("-kws", "/home/pennyworth/keyphrase.list");
    

    Now the configuration dump shows -kws set, and the log contains:

    INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -450, delay 10)
    

    The result is still null, though.

    INFO: cmn_prior.c(99): cmn_prior_update: from < 73.10 11.10 -10.49  1.23  0.67 -1.37 -5.29  5.17 -0.62  3.91 -0.28  2.56 -2.14 >
    INFO: cmn_prior.c(116): cmn_prior_update: to   < 73.59 11.36 -9.61  2.41  2.13  0.15 -5.09  4.30  1.03  4.46 -0.13  3.36 -0.62 >
    INFO: cmn_prior.c(99): cmn_prior_update: from < 73.59 11.36 -9.61  2.41  2.13  0.15 -5.09  4.30  1.03  4.46 -0.13  3.36 -0.62 >
    INFO: cmn_prior.c(116): cmn_prior_update: to   < 74.65 10.58 -9.76  4.47  3.63  1.19 -5.20  3.74  2.33  4.75 -0.11  3.06 -0.32 >
    INFO: cmn_prior.c(99): cmn_prior_update: from < 74.65 10.58 -9.76  4.47  3.63  1.19 -5.20  3.74  2.33  4.75 -0.11  3.06 -0.32 >
    INFO: cmn_prior.c(116): cmn_prior_update: to   < 77.49 10.99 -8.80  5.45  4.37  2.83 -4.14  4.06  3.48  5.07 -0.41  2.82 -0.35 >
    INFO: cmn_prior.c(99): cmn_prior_update: from < 77.49 10.99 -8.80  5.45  4.37  2.83 -4.14  4.06  3.48  5.07 -0.41  2.82 -0.35 >
    INFO: cmn_prior.c(116): cmn_prior_update: to   < 73.54  9.62 -11.34  3.19  3.30  2.24 -6.61  4.52  1.31  5.99 -1.28  2.24 -0.96 >
    

    Am I wrong to assume that a kws search returns its result through hyp()? There is a great Python example, but nothing for Java regarding kws.

    https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py
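
    As far as I can tell, that Python test follows the same pattern as my Java loop: start an utterance, feed audio, poll hyp(), and restart the utterance after each detection. Here is that control flow as a sketch that runs without the native library. SpottingDecoder is a hypothetical stand-in interface for the handful of Decoder calls used in my code (startUtt, processRaw, hyp, endUtt), and hypString() is an invented convenience method, not something pocketsphinx provides.

```java
import java.util.ArrayList;
import java.util.List;

public class KwsLoopSketch {
    /** Hypothetical stand-in for the subset of Decoder calls used in the posted code. */
    public interface SpottingDecoder {
        void startUtt();
        void processRaw(short[] samples, long nsamples, boolean noSearch, boolean fullUtt);
        String hypString(); // null while nothing has been spotted
        void endUtt();
    }

    /** Feeds audio chunks to the decoder and restarts the utterance after every detection. */
    public static List<String> spot(SpottingDecoder decoder, Iterable<short[]> chunks) {
        List<String> hits = new ArrayList<>();
        decoder.startUtt();
        for (short[] chunk : chunks) {
            if (chunk.length == 0) {
                continue; // nothing was read, so there is nothing to process
            }
            decoder.processRaw(chunk, chunk.length, false, false);
            String hyp = decoder.hypString();
            if (hyp != null) {
                hits.add(hyp);
                // Close the utterance so the same keyword can be detected again.
                decoder.endUtt();
                decoder.startUtt();
            }
        }
        decoder.endUtt();
        return hits;
    }
}
```

    An adapter around the real Decoder would implement hypString() by returning null when d.hyp() is null and d.hyp().getHypstr() otherwise. If the loop is right and hyp() still never returns anything, the problem is more likely in the audio being fed than in the loop itself.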
    

    I don't even know what to try next.

     
