pocketsphinx threshold query

Help
2016-02-13
2016-06-01
  • neville9763

    neville9763 - 2016-02-13

    Hi

    I am using Android pocketsphinx to recognize an audio file with a customized dictionary.

    All is working well ... possibly a little too well.

    My recognition of a keyword phrase in an audio file is 100%. Whatever value I put in as the keyword threshold, I get a match.

    The keyphrase I use is "HELLO MIN EAR".

    My config is:

    c.setString("-hmm", new File(assetDir, "en-us-ptm").toString());
    c.setString("-lm", new File(assetDir, "minear.lm").toString());
    c.setString("-dict", new File(assetDir, "minear.dict").toString());
    c.setFloat("-samprate", 16000.0);
    c.setBoolean("-allphone_ci", true);
    c.setFloat("-kws_threshold", kws_threshold);

    where I vary kws_threshold between 0 and 1 for test purposes, and also using the 'recommended' values of 1e-35f etc.

    My questions are:
    1) Should I be getting 100% recognition even though I vary the threshold values?
    2) What is the optimum threshold level for my keyphrase? I understand it should be around 1e-35f for the 3 or 4 syllable keyphrase.

    Thanks for any help!!

     

    Last edit: neville9763 2016-02-13
    • Nickolay V. Shmyrev

      My recognition of a keyword phrase in an audio file is 100%. Whatever value I put in as the keyword threshold, I get a match.

      It is wrong to call this "100%" recognition; what you describe simply means you have too many false alarms.

      1) Should I be getting 100% recognition even though I vary the threshold values?

      No

      2) What is the optimum threshold level for my keyphrase? I understand it should be around 1e-35f for the 3 or 4 syllable keyphrase.

      Yes

      Thanks for any help!!

      To get help with your issue, you should provide the model files you are using, the logs, and the audio file.

       
      • neville9763

        neville9763 - 2016-02-13

        Hi Nickolay,

        Many thanks for your response. It is much appreciated!

        I have attached a zip file with all of the files I am using. The audio file is called minear_keyphrase.wav and the log file is minear.log. The log file contains the results of 10 iterations of processing the keyword audio file. Each iteration is separated by a log entry of the following nature:

        org.minear D/NS_DEBUG﹕ iteration 0 / 1.000000E-1

        org.minear D/NS_DEBUG﹕ iteration 9 / 1.401298E-45

        where the iteration and the threshold value used are shown (along with the tag NS_DEBUG).

        Any help or tips you can provide is very much appreciated. Thank you.

        Best

        Neville

        PS If you need any other information, please ask.

         

        Last edit: neville9763 2016-02-13
        • Nickolay V. Shmyrev

          Your file is too short for the decoder to estimate the amplitude level. If you process the file twice, the second pass will detect the keyphrase reliably. Continuous processing requires some time to adapt to amplitude levels.

           
  • neville9763

    neville9763 - 2016-02-18

    Hi Nickolay,

    Thanks for the previous help. It is much appreciated!

    I have followed your advice and played the audio file a number of times but still have an issue where no matter how much I vary the kws threshold, I can only get an overall positive result.

    What I have done is:
    - run the code loop (see below) for 9 iterations
    - each iteration reads the input file 3 times and processes it.
    - I have used values for kws_threshold varying from 1E-045 to 1E036.

    It appears that no matter what the kws_threshold value is set as I cannot get a false value and the probability and best score remain constant - which appears strange. I would have thought that varying the kws_threshold would lead to a varying result.

    The results are listed below. The first two columns are the iteration counters, the third column is the threshold used, the fourth column is a boolean result, the fifth and sixth columns are the probability and best score, and the last column is the returned recognised string.

    Am I doing something wrong here?

    Many thanks

    Neville

    Log:

    0 0 : 1E-045 false /-79791--7711/ HELLO MIN AHEAD END
    0 1 : 1E-045 true /-36567--4490/ HELLO MIN EAR
    0 2 : 1E-045 true /-36649--4463/ HELLO MIN EAR
    1 0 : 1E-036 false /-79791--7711/ HELLO MIN AHEAD END
    1 1 : 1E-036 true /-36567--4490/ HELLO MIN EAR
    1 2 : 1E-036 true /-36649--4463/ HELLO MIN EAR
    2 0 : 1E-027 false /-79791--7711/ HELLO MIN AHEAD END
    2 1 : 1E-027 true /-36567--4490/ HELLO MIN EAR
    2 2 : 1E-027 true /-36649--4463/ HELLO MIN EAR
    3 0 : 1E-018 false /-79791--7711/ HELLO MIN AHEAD END
    3 1 : 1E-018 true /-36567--4490/ HELLO MIN EAR
    3 2 : 1E-018 true /-36649--4463/ HELLO MIN EAR
    4 0 : 1E-009 false /-79791--7711/ HELLO MIN AHEAD END
    4 1 : 1E-009 true /-36567--4490/ HELLO MIN EAR
    4 2 : 1E-009 true /-36649--4463/ HELLO MIN EAR
    5 0 : 1E000 false /-79791--7711/ HELLO MIN AHEAD END
    5 1 : 1E000 true /-36567--4490/ HELLO MIN EAR
    5 2 : 1E000 true /-36649--4463/ HELLO MIN EAR
    6 0 : 1E009 false /-79791--7711/ HELLO MIN AHEAD END
    6 1 : 1E009 true /-36567--4490/ HELLO MIN EAR
    6 2 : 1E009 true /-36649--4463/ HELLO MIN EAR
    7 0 : 1E018 false /-79791--7711/ HELLO MIN AHEAD END
    7 1 : 1E018 true /-36567--4490/ HELLO MIN EAR
    7 2 : 1E018 true /-36649--4463/ HELLO MIN EAR
    8 0 : 1E027 false /-79791--7711/ HELLO MIN AHEAD END
    8 1 : 1E027 true /-36567--4490/ HELLO MIN EAR
    8 2 : 1E027 true /-36649--4463/ HELLO MIN EAR
    9 0 : 1E036 false /-79791--7711/ HELLO MIN AHEAD END
    9 1 : 1E036 true /-36567--4490/ HELLO MIN EAR
    9 2 : 1E036 true /-36649--4463/ HELLO MIN EAR

    Code used:

    try {
        Config c = Decoder.defaultConfig();

        c.setString("-hmm", new File(assetDir, "en-us-ptm").toString());
        c.setString("-lm", new File(assetDir, "minear.lm").toString());
        c.setString("-dict", new File(assetDir, "minear.dict").toString());
        c.setFloat("-samprate", 16000.0);
        c.setBoolean("-allphone_ci", true);
        c.setString("-keyphrase", "HELLO MIN EAR");
        c.setFloat("-kws_threshold", kws_threshold);

        Decoder d = new Decoder(c);

        // Process the same file three times with the same decoder
        for (int i = 0; i < 3; i++) {

            Thread.sleep(200);

            FileInputStream stream = new FileInputStream(new File(mFileName));

            d.startUtt();

            // Read the file in chunks and feed it to the decoder as 16-bit samples
            byte[] b = new byte[4096];
            try {
                int nbytes;
                while ((nbytes = stream.read(b)) >= 0) {
                    ByteBuffer bb = ByteBuffer.wrap(b, 0, nbytes);

                    // Not needed on desktop but required on android
                    bb.order(ByteOrder.LITTLE_ENDIAN);

                    short[] s = new short[nbytes / 2];
                    bb.asShortBuffer().get(s);
                    d.processRaw(s, nbytes / 2, false, false);
                }
            } catch (IOException e) {
                Log.d("NS_DEBUG", "Error when reading " + mFileName + " : " + e.getMessage());
            } finally {
                stream.close();
            }

            d.endUtt();

            // hyp() can return null, so fetch it once and check before use
            Hypothesis hyp = d.hyp();
            if (hyp != null && hyp.getHypstr().equals(KEYPHRASE)) {
                retBool = true;
                _global.setResultArr[_global.setCntr]++;
            }

            NumberFormat formatter = new DecimalFormat("0E000");
            String numStr = formatter.format(kws_threshold);
            String hypStr = (hyp == null)
                    ? "no hypothesis"
                    : hyp.getProb() + "-" + hyp.getBestScore() + "/ " + hyp.getHypstr();
            Log.d("NS_DEBUG", "" + _global.setCntr + " " + i + " : " + numStr + " " + retBool + " /" + hypStr);

        }

    } catch (Exception e) {
        // Pass the exception itself; getMessage() may be null
        Log.d("NS_DEBUG", "Decoding failed", e);
    }
    
     
    • Nickolay V. Shmyrev

      You need to process a large amount of unrelated speech (say, a 1-hour recording) to get a proper estimate of false alarms.

      It is better to optimize the threshold on desktop, not on Android.

      For better help you need to share audio files.

       
  • neville9763

    neville9763 - 2016-02-20

    Hi Nickolay,

    Have got this working on a desktop with a really good result.

    Thank you for your help and patience.

    I have one other thing I am not really sure of. I am running with the -time option and have listed the output below. I assume the 2nd and 3rd columns are the start and stop times of when the keyphrase was found, but I do not know what the last column is. Also, the second-last row shows no time info - why is this?

    btw, only the last row is correct, the others are false positives and the 1st column is the keyphrase.

    HELLO MIN EAR 2955.310 2955.650 0.912917
    HELLO MIN EAR 2897.520 2898.030 0.918503
    HELLO MIN EAR 2896.160 2896.430 0.889217
    HELLO MIN EAR
    HELLO MIN EAR 3002.710 3003.680 1.004310

    Again, thanks for your help.

    Best

    Neville

     
    • Nickolay V. Shmyrev

      If only the last one is a true match, you can probably raise the threshold a little more. The last column is the confidence; it is controlled by the threshold.
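
A minimal sketch of reading that output back into a program, for anyone who wants to filter detections by time or confidence. It assumes the layout discussed in this thread: the keyphrase, then start time, end time and confidence, separated by whitespace, with the times taken to be seconds. The class name KwsTimeLine is hypothetical and not part of pocketsphinx. Rows that carry no numbers, like the fourth row in the listing, are kept with the timing marked as missing.

```java
import java.util.Arrays;

// Hypothetical helper, not part of pocketsphinx: parses one row of the
// "-time" output shown in this thread (phrase, start, end, confidence).
final class KwsTimeLine {
    final String phrase;
    final double start;       // assumed seconds; NaN if the row has no timing
    final double end;         // assumed seconds; NaN if the row has no timing
    final double confidence;  // NaN if the row has no timing

    KwsTimeLine(String phrase, double start, double end, double confidence) {
        this.phrase = phrase;
        this.start = start;
        this.end = end;
        this.confidence = confidence;
    }

    boolean hasTiming() {
        return !Double.isNaN(start);
    }

    // Treats the last three tokens as numbers when all three look numeric;
    // otherwise the whole row is taken as the phrase.
    static KwsTimeLine parse(String line) {
        String[] tok = line.trim().split("\\s+");
        int n = tok.length;
        if (n >= 4 && isNumber(tok[n - 3]) && isNumber(tok[n - 2]) && isNumber(tok[n - 1])) {
            String phrase = String.join(" ", Arrays.copyOfRange(tok, 0, n - 3));
            return new KwsTimeLine(phrase,
                    Double.parseDouble(tok[n - 3]),
                    Double.parseDouble(tok[n - 2]),
                    Double.parseDouble(tok[n - 1]));
        }
        return new KwsTimeLine(String.join(" ", tok), Double.NaN, Double.NaN, Double.NaN);
    }

    private static boolean isNumber(String s) {
        return s.matches("-?\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        String[] rows = {
            "HELLO MIN EAR 2955.310 2955.650 0.912917",
            "HELLO MIN EAR",
            "HELLO MIN EAR 3002.710 3003.680 1.004310"
        };
        for (String row : rows) {
            KwsTimeLine r = parse(row);
            if (r.hasTiming()) {
                System.out.printf("%s %.3f-%.3f conf=%.6f%n", r.phrase, r.start, r.end, r.confidence);
            } else {
                System.out.println(r.phrase + " (no timing)");
            }
        }
    }
}
```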

       
  • neville9763

    neville9763 - 2016-02-21

    Having the position and confidence (expressed in terms of 1) is extremely useful. Does Android allow the time parameter? Or, alternatively, how can I get a similar outcome in Android?

     
    • Nickolay V. Shmyrev

      Yes, you can access segments and their probs with

      LogMath logmath = recognizer.getDecoder().getLogmath();
      for (Segment seg : recognizer.getDecoder().seg()) {
          // seg.getProb() is a log-domain score; exp() converts it to a linear value
          double prob = logmath.exp(seg.getProb());
      }
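
For readers wondering what logmath.exp(seg.getProb()) does numerically, here is a rough stand-alone sketch. It assumes the decoder's log base is 1.0001, which is believed to be the pocketsphinx default for -logbase but is not stated anywhere in this thread. The class LogProb is hypothetical and only mimics the conversion; on a device, use the decoder's own LogMath object as in the snippet above.

```java
// Hypothetical stand-in for the LogMath conversion, under the assumption
// that scores are integer logarithms in base 1.0001.
final class LogProb {
    // Assumed default log base; check your decoder configuration.
    static final double ASSUMED_LOG_BASE = 1.0001;

    // Converts an integer log-domain score to a linear value: base^logProb.
    static double toLinear(int logProb, double base) {
        return Math.exp(logProb * Math.log(base));
    }

    public static void main(String[] args) {
        // A score of 0 corresponds to a linear value of 1.0;
        // more negative scores map to smaller values.
        System.out.println(toLinear(0, ASSUMED_LOG_BASE));
        System.out.println(toLinear(-36567, ASSUMED_LOG_BASE));
    }
}
```

If the probability values printed in the earlier Android log are in the same units, this conversion puts them on a linear scale, which is easier to compare across runs.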
      
       
  • neville9763

    neville9763 - 2016-04-22

    Hi Nickolay,

    I am progressing with establishing keyword thresholds using pocketsphinx_continuous - after a long delay.

    I have a few issues, one to do with what seems to be erratic results which I am investigating further but had another quick question.

    What I have done is to take a random spoken sample of approx 12 minutes and then programmatically add the recorded keywords/phrase at 2 minute intervals so I know exactly where (in terms of frames and seconds) the required keywords are. I do this in two passes so that the first pass (the broad pass) establishes broadly what the required threshold is and another pass (the narrow pass) identifies more closely what the threshold is based on the result from the broad pass.

    I then tabulate the results as shown below:

    Broad Pass

    0 1.0E31 -- 0.0 / 0.0
    1 1.0E27 -- 0.0 / 0.0
    2 1.0E22 -- 1.0 / 1.0
    3 9.9999998E17 -- 2.0 / 2.0
    4 9.9999998E12 -- 3.0 / 3.0
    5 1.0E9 -- 3.0 / 3.0
    6 10000.0 -- 4.0 / 4.0
    7 1.0 -- 5.0 / 6.0

    ...required threshold : 7 / 1.0

    8 1.0 -- 5.0 / 6.0
    9 1.0E-4 -- 5.0 / 6.0
    10 1.0E-9 -- 5.0 / 6.0
    11 1.0E-13 -- 5.0 / 9.0
    12 1.0E-18 -- 5.0 / 9.0
    13 1.0E-22 -- 5.0 / 11.0
    14 1.0E-27 -- 5.0 / 27.0
    15 1.0E-31 -- 5.0 / 38.0
    16 1.0E-36 -- 5.0 / 81.0
    17 1.0E-40 -- 5.0 / 129.0

    Narrow Pass

    16 1.0E8 -- 3.0 / 3.0
    15 1.0E7 -- 4.0 / 4.0
    14 1000000.0 -- 4.0 / 4.0
    13 100000.0 -- 4.0 / 4.0
    12 10000.0 -- 4.0 / 4.0
    11 1000.0 -- 4.0 / 4.0
    10 100.0 -- 5.0 / 6.0

    ...required threshold : 10 / 100.0

    9 10.0 -- 5.0 / 6.0
    8 1.0 -- 5.0 / 6.0
    7 0.1 -- 5.0 / 6.0
    6 0.01 -- 5.0 / 6.0
    5 0.001 -- 5.0 / 6.0
    4 1.0E-4 -- 5.0 / 6.0
    3 1.0E-5 -- 5.0 / 6.0
    2 1.0E-6 -- 5.0 / 6.0
    1 1.0E-7 -- 5.0 / 6.0
    0 1.0E-8 -- 5.0 / 6.0

    The table consists of 4 columns which are:
    Column 1 - number of iteration
    Column 2 - threshold used in algorithm
    Column 3 - true positives
    Column 4 - true and false positives

    My question is: in the broad pass, a number of iterations, i.e. iterations 7 through 10, show the same number of true and false positives. Do I use the largest or smallest threshold in this instance?

    Thanks

    Neville

     

    Last edit: neville9763 2016-04-22
    • Nickolay V. Shmyrev

      You need to count the number of correct detections out of all occurrences, and the number of false alarms. The first should grow when you change the threshold, the second should fall. You need to choose the best point in the middle.

      You did not collect the proper data in your table, so it does not let you choose the threshold.
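
The counting described in this reply can be sketched as follows. This is only an illustration under stated assumptions: the class KwsScore, the matching rule (a detection counts as correct if it falls within a tolerance of a known keyphrase position that has not been matched yet) and the example times are all hypothetical and do not come from pocketsphinx or from the poster's data. Run it once per threshold and compare the counts.

```java
// Hypothetical scoring helper: compares detection times against the known
// positions of the keyphrase and counts hits, misses and false alarms.
final class KwsScore {

    // Returns {hits, misses, falseAlarms}. Each reference position can be
    // matched at most once; extra detections near it count as false alarms.
    static int[] score(double[] referenceSec, double[] detectedSec, double toleranceSec) {
        boolean[] matched = new boolean[referenceSec.length];
        int hits = 0;
        int falseAlarms = 0;
        for (double det : detectedSec) {
            int best = -1;
            for (int r = 0; r < referenceSec.length; r++) {
                double dist = Math.abs(det - referenceSec[r]);
                if (!matched[r] && dist <= toleranceSec
                        && (best < 0 || dist < Math.abs(det - referenceSec[best]))) {
                    best = r;
                }
            }
            if (best >= 0) {
                matched[best] = true;
                hits++;
            } else {
                falseAlarms++;
            }
        }
        return new int[] {hits, referenceSec.length - hits, falseAlarms};
    }

    // Normalises the false alarm count by the amount of audio processed.
    static double falseAlarmsPerHour(int falseAlarms, double audioSec) {
        return falseAlarms * 3600.0 / audioSec;
    }

    public static void main(String[] args) {
        // Made-up example: keyphrase inserted every 2 minutes in 12 minutes of audio.
        double[] reference = {120.0, 240.0, 360.0, 480.0, 600.0};
        double[] detected = {119.8, 240.3, 300.0, 600.5};
        int[] s = score(reference, detected, 1.0);
        System.out.println("hits=" + s[0] + " misses=" + s[1] + " falseAlarms=" + s[2]
                + " falseAlarmsPerHour=" + falseAlarmsPerHour(s[2], 720.0));
    }
}
```

Tabulating hits and false alarms separately for each threshold gives the two curves the reply refers to, which a combined "true and false positives" column hides.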

       
  • neville9763

    neville9763 - 2016-06-01

    Hi Nickolay,

    I have what is probably a pretty weird (or maybe a pretty silly) question.

    I have tuned pocketsphinx and have a number of threshold values for multiple keyphrases which seem to work well but I want to try and adjust these values manually ie. increment/decrement them by say 10 or 15 percent to see what effect they have on recognition.

    The threshold values I have range from 1e-28 to 1e-4 but because I have no real understanding of how the threshold values work I cannot make a reasonable judgement about increments. (For example, in straight mathematical terms, if I wanted to vary 1 by 15% I would get a lower limit of 0.85 and an upper limit of 1.15, but I am not sure if such a straightforward rule could apply to the threshold values.)

    So, my question is what would the upper and lower threshold limits be for an increment or decrement of, say, 15% be on the following threshold values:

    1) 1e-28
    2) 1e-14
    3) 1e-4
    4) 1

    Thanks

    Neville

     
    • Nickolay V. Shmyrev

      Sorry, I have no idea what you mean by "upper" and "lower" threshold limits. The threshold is a single value; you can vary it from 1.0 to 1e-50. Percentages are not really applicable to it.
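
One way to read this reply, offered as an interpretation rather than something stated in the thread: because usable thresholds span roughly fifty orders of magnitude, it is more natural to step the exponent than to add or subtract a percentage. A 15% change to 1e-28 is still essentially 1e-28 on that scale. The sketch below builds candidate thresholds by stepping the power of ten; the class ThresholdGrid and its method names are hypothetical.

```java
// Hypothetical helper: builds thresholds that are evenly spaced in the
// exponent (powers of ten) rather than evenly spaced in value.
final class ThresholdGrid {

    // Returns 10^fromExp, 10^(fromExp + stepExp), ... up to at most 10^toExp.
    static double[] logSpaced(int fromExp, int toExp, int stepExp) {
        if (stepExp <= 0 || fromExp > toExp) {
            throw new IllegalArgumentException("need stepExp > 0 and fromExp <= toExp");
        }
        int n = (toExp - fromExp) / stepExp + 1;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // Parsing the literal gives the nearest double to the power of ten.
            out[i] = Double.parseDouble("1e" + (fromExp + i * stepExp));
        }
        return out;
    }

    // Moves a threshold up (positive) or down (negative) by whole decades.
    static double shiftDecades(double threshold, int decades) {
        return threshold * Math.pow(10.0, decades);
    }

    public static void main(String[] args) {
        for (double t : logSpaced(-28, -4, 4)) {
            System.out.println(t);
        }
        // Neighbours of 1e-14, one decade either side.
        System.out.println(shiftDecades(1e-14, -1) + " .. " + shiftDecades(1e-14, 1));
    }
}
```

Each candidate can then be evaluated by counting correct detections and false alarms on the same test audio, as advised earlier in the thread.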

       
