CMU Sphinx / Forums / Help: pocketsphinx threshold query

neville9763 - 2016-02-13

Hi

I am using Android pocketsphinx to recognize an audio file with a customized dictionary.

All is working well ... possibly a little too well.

My recognition of a keyword phrase in an audio file is 100%. Whatever value I put in as the keyword threshold, I get a match.

The keyphrase I use is "HELLO MIN EAR".

My config is:

c.setString("-hmm", new File(assetDir, "en-us-ptm").toString());
c.setString("-lm", new File(assetDir, "minear.lm").toString());
c.setString("-dict", new File(assetDir, "minear.dict").toString());
c.setFloat("-samprate", 16000.0);
c.setBoolean("-allphone_ci", true);
c.setFloat("-kws_threshold", kws_threshold);

where I vary kws_threshold between 0 and 1 for test purposes, and have also tried the 'recommended' values such as 1e-35f.

My questions are:
1) Should I be getting 100% recognition even though I vary the threshold values?
2) What is the optimum threshold level for my keyphrase? I understand it should be around 1e-35f for a 3 or 4 syllable keyphrase.

Thanks for any help!!

Last edit: neville9763 2016-02-13

- Nickolay V. Shmyrev - 2016-02-13
  
  > My recognition of a keyword phrase in an audio file is 100%. Whatever value I put in as the keyword threshold, I get a match.
  
  It is wrong to call this "100%" recognition; what you are describing is simply too many false alarms.
  
  > 1) Should I be getting 100% recognition even though I vary the threshold values?
  
  No.
  
  > 2) What is the optimum threshold level for my keyphrase? I understand it should be around 1e-35f for a 3 or 4 syllable keyphrase.
  
  Yes.
  
  > Thanks for any help!!
  
  To get help with your issue, you had better provide the model files you are using, the logs, and the audio file.
  
  - neville9763 - 2016-02-13
    
    Hi Nickolay,
    
    Many thanks for your response. It is much appreciated!
    
    I have attached a zip file with all of the files I am using. The audio file is called minear_keyphrase.wav and the log file is minear.log. The log file contains the results of 10 iterations of processing the keyword audio file. Each iteration is separated by a log entry of the following form:
    
    org.minear D/NS_DEBUG﹕ iteration 0 / 1.000000E-1
    …
    org.minear D/NS_DEBUG﹕ iteration 9 / 1.401298E-45
    
    where the iteration and the threshold value used are shown (along with the tag NS_DEBUG).
    
    Any help or tips you can provide is very much appreciated. Thank you.
    
    Best
    
    Neville
    
    PS If you need any other information, please ask.
    
    Last edit: neville9763 2016-02-13
    
    minear_vr_files.zip
    
    - Nickolay V. Shmyrev - 2016-02-15
      
      Your file is too short for the decoder to estimate the amplitude level. If you process the file twice, the second pass will detect the keyphrase reliably. Continuous processing needs some time to adapt to amplitude levels.
      

neville9763 - 2016-02-18

Hi Nickolay,

Thanks for the previous help. It is much appreciated!

I have followed your advice and played the audio file a number of times but still have an issue where no matter how much I vary the kws threshold, I can only get an overall positive result.

What I have done is:
- run the code loop (see below) for 9 iterations
- each iteration reads the input file 3 times and processes it.
- I have used values for kws_threshold varying from 1E-045 to 1E036.

It appears that no matter what the kws_threshold value is set as I cannot get a false value and the probability and best score remain constant - which appears strange. I would have thought that varying the kws_threshold would lead to a varying result.

The results are listed below. The first two columns are the iteration counters, the third column is the threshold used, the fourth column is a boolean result, the fifth and sixth columns are the probability and best score, and the last column is the returned recognised string.

Am I doing something wrong here?

Many thanks

Neville

Log:

0 0 : 1E-045 false /-79791--7711/ HELLO MIN AHEAD END
0 1 : 1E-045 true /-36567--4490/ HELLO MIN EAR
0 2 : 1E-045 true /-36649--4463/ HELLO MIN EAR
1 0 : 1E-036 false /-79791--7711/ HELLO MIN AHEAD END
1 1 : 1E-036 true /-36567--4490/ HELLO MIN EAR
1 2 : 1E-036 true /-36649--4463/ HELLO MIN EAR
2 0 : 1E-027 false /-79791--7711/ HELLO MIN AHEAD END
2 1 : 1E-027 true /-36567--4490/ HELLO MIN EAR
2 2 : 1E-027 true /-36649--4463/ HELLO MIN EAR
3 0 : 1E-018 false /-79791--7711/ HELLO MIN AHEAD END
3 1 : 1E-018 true /-36567--4490/ HELLO MIN EAR
3 2 : 1E-018 true /-36649--4463/ HELLO MIN EAR
4 0 : 1E-009 false /-79791--7711/ HELLO MIN AHEAD END
4 1 : 1E-009 true /-36567--4490/ HELLO MIN EAR
4 2 : 1E-009 true /-36649--4463/ HELLO MIN EAR
5 0 : 1E000 false /-79791--7711/ HELLO MIN AHEAD END
5 1 : 1E000 true /-36567--4490/ HELLO MIN EAR
5 2 : 1E000 true /-36649--4463/ HELLO MIN EAR
6 0 : 1E009 false /-79791--7711/ HELLO MIN AHEAD END
6 1 : 1E009 true /-36567--4490/ HELLO MIN EAR
6 2 : 1E009 true /-36649--4463/ HELLO MIN EAR
7 0 : 1E018 false /-79791--7711/ HELLO MIN AHEAD END
7 1 : 1E018 true /-36567--4490/ HELLO MIN EAR
7 2 : 1E018 true /-36649--4463/ HELLO MIN EAR
8 0 : 1E027 false /-79791--7711/ HELLO MIN AHEAD END
8 1 : 1E027 true /-36567--4490/ HELLO MIN EAR
8 2 : 1E027 true /-36649--4463/ HELLO MIN EAR
9 0 : 1E036 false /-79791--7711/ HELLO MIN AHEAD END
9 1 : 1E036 true /-36567--4490/ HELLO MIN EAR
9 2 : 1E036 true /-36649--4463/ HELLO MIN EAR

Code used:

try {
    Config c = Decoder.defaultConfig();
    c.setString("-hmm", new File(assetDir, "en-us-ptm").toString());
    c.setString("-lm", new File(assetDir, "minear.lm").toString());
    c.setString("-dict", new File(assetDir, "minear.dict").toString());
    c.setFloat("-samprate", 16000.0);
    c.setBoolean("-allphone_ci", true);
    c.setString("-keyphrase", "HELLO MIN EAR");
    c.setFloat("-kws_threshold", kws_threshold);
    Decoder d = new Decoder(c);

    // Feed the same file to the decoder three times.
    for (int i = 0; i < 3; i++) {
        Thread.sleep(200);
        FileInputStream stream = new FileInputStream(new File(mFileName));
        d.startUtt();
        byte[] b = new byte[4096];
        try {
            int nbytes;
            int totbytes = 0;
            while ((nbytes = stream.read(b)) >= 0) {
                ByteBuffer bb = ByteBuffer.wrap(b, 0, nbytes);
                // Not needed on desktop but required on android
                bb.order(ByteOrder.LITTLE_ENDIAN);
                short[] s = new short[nbytes / 2];
                bb.asShortBuffer().get(s);
                d.processRaw(s, nbytes / 2, false, false);
                totbytes = totbytes + nbytes;
            }
        } catch (IOException e) {
            Log.d("NS_DEBUG", "Error when reading " + mFileName + " : " + e.getMessage());
        }
        d.endUtt();

        // Count a match only when the hypothesis is exactly the keyphrase.
        if (d.hyp() != null) {
            if (d.hyp().getHypstr().equals(KEYPHRASE)) {
                retBool = true;
                _global.setResultArr[_global.setCntr]++;
            }
        }

        NumberFormat formatter = new DecimalFormat("0E000");
        String numStr = formatter.format(kws_threshold);
        Log.d("NS_DEBUG", "" + _global.setCntr + " " + i + " : " + numStr + " " + retBool
                + " /" + d.hyp().getProb() + "-" + d.hyp().getBestScore() + "/ " + d.hyp().getHypstr());
    }
} catch (Exception e) {
    Log.d("NS_DEBUG", e.getMessage());
}
- Nickolay V. Shmyrev - 2016-02-19
  
  You need to process a large amount of unrelated speech (say, a 1-hour recording) to get a proper estimate of false alarms.
  
  It is better to optimize the threshold on the desktop, not on Android.
  
  For better help you need to share the audio files.
  

neville9763 - 2016-02-20

Hi Nickolay,

Have got this working on a desktop with a really good result.

Thank you for your help and patience.

I have one other thing I am not really sure of. I am running with the -time option and have listed the output below. I assume the 2nd and 3rd columns are the start and stop times of when the keyphrase was found, but I do not know what the last column is. Also, the second-last row shows no time info. Why is this?

btw, only the last row is correct, the others are false positives and the 1st column is the keyphrase.

HELLO MIN EAR 2955.310 2955.650 0.912917
HELLO MIN EAR 2897.520 2898.030 0.918503
HELLO MIN EAR 2896.160 2896.430 0.889217
HELLO MIN EAR
HELLO MIN EAR 3002.710 3003.680 1.004310

Again, thanks for your help.

Best

Neville

- Nickolay V. Shmyrev - 2016-02-20
  
  If only the last one is a true match, you can probably raise the threshold a little more. The last column is the confidence; it is controlled by the threshold.
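
For post-processing output like the five lines quoted above, a minimal parsing sketch follows. It assumes each line is the keyphrase followed by start time, end time, and confidence, which is how the columns are read in this thread; the class name `KwsDetection`, its methods, and the 1.0 cut-off are hypothetical and chosen only because that value happens to separate these five lines. Lines without timing fields, such as the fourth one, are skipped rather than guessed at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** One keyphrase detection parsed from a line of timed output (hypothetical helper). */
final class KwsDetection {
    final String phrase;
    final double startSec;
    final double endSec;
    final double confidence;

    KwsDetection(String phrase, double startSec, double endSec, double confidence) {
        this.phrase = phrase;
        this.startSec = startSec;
        this.endSec = endSec;
        this.confidence = confidence;
    }

    /** Parses "WORDS... start end confidence"; returns null if the last three fields are not numbers. */
    static KwsDetection parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        int n = tokens.length;
        if (n < 4) {
            return null; // no room for a phrase plus three numeric fields
        }
        try {
            double start = Double.parseDouble(tokens[n - 3]);
            double end = Double.parseDouble(tokens[n - 2]);
            double conf = Double.parseDouble(tokens[n - 1]);
            String phrase = String.join(" ", Arrays.copyOfRange(tokens, 0, n - 3));
            return new KwsDetection(phrase, start, end, conf);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Keeps only parsable detections whose confidence is at least minConfidence. */
    static List<KwsDetection> keepConfident(List<String> lines, double minConfidence) {
        List<KwsDetection> kept = new ArrayList<>();
        for (String line : lines) {
            KwsDetection d = parse(line);
            if (d != null && d.confidence >= minConfidence) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "HELLO MIN EAR 2955.310 2955.650 0.912917",
                "HELLO MIN EAR 2897.520 2898.030 0.918503",
                "HELLO MIN EAR 2896.160 2896.430 0.889217",
                "HELLO MIN EAR",
                "HELLO MIN EAR 3002.710 3003.680 1.004310");
        for (KwsDetection d : keepConfident(lines, 1.0)) {
            System.out.println(d.phrase + " @ " + d.startSec + "s, confidence " + d.confidence);
        }
    }
}
```

Filtering on the printed confidence after the fact is only a convenience for inspecting a log; the advice in the reply is to raise the threshold itself.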
  

neville9763 - 2016-02-21

Having the position and confidence (expressed in terms of 1) is extremely useful. Does Android allow the time parameter? Or, alternatively, how can I get a similar outcome in Android?

- Nickolay V. Shmyrev - 2016-02-21
  
  Yes, you can access the segments and their probabilities with:
  
  LogMath lmath = recognizer.getDecoder().getLogmath();
  for (Segment seg : recognizer.getDecoder().seg()) {
      // Convert the log-domain segment score to a linear probability.
      double prob = lmath.exp(seg.getProb());
  }
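
To see what such a log-to-linear conversion does to numbers like the ones logged earlier in the thread, here is a standalone sketch that needs no decoder. It assumes the scores are integer logarithms in base 1.0001, which is the default of the `-logbase` option; if a different base is configured, the constant must change. The class and method names are hypothetical, and in a real application the decoder's own LogMath object should be used instead.

```java
/** Standalone illustration of log-domain to linear conversion (hypothetical helper). */
final class LogProbDemo {
    /** Assumption: the decoder runs with the default -logbase of 1.0001. */
    static final double ASSUMED_LOG_BASE = 1.0001;

    /** Returns ASSUMED_LOG_BASE raised to logScore, computed via exp for numerical stability. */
    static double toLinear(int logScore) {
        return Math.exp(logScore * Math.log(ASSUMED_LOG_BASE));
    }

    public static void main(String[] args) {
        // Scores taken from the log earlier in this thread, plus 0 as a reference point.
        int[] scores = {0, -4490, -36567, -79791};
        for (int s : scores) {
            System.out.printf("%d -> %.6f%n", s, toLinear(s));
        }
    }
}
```

A score of 0 maps to a probability of 1, and more negative scores map to smaller probabilities.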
  

neville9763 - 2016-04-22

Hi Nickolay,

I am progressing with establishing keyword thresholds using pocketsphinx_continuous - after a long delay.

I have a few issues, one to do with what seem to be erratic results, which I am investigating further, but I have another quick question.

What I have done is to take a random spoken sample of approx 12 minutes and then programmatically add the recorded keywords/phrase at 2 minute intervals, so I know exactly where (in terms of frames and seconds) the required keywords are. I do this in two passes: the first pass (the broad pass) establishes broadly what the required threshold is, and the second pass (the narrow pass) identifies the threshold more closely, based on the result from the broad pass.
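
Because the keyphrase positions are known, each reported detection can be classified automatically. The sketch below is one way to do that, not the code used in this post: it assumes detection start times in seconds are available (for example from timed output), and the class `DetectionScorer`, the tolerance parameter, and the rule that a second detection on an already matched position counts as a false alarm are all illustrative choices.

```java
/** Scores detections against known keyphrase positions (hypothetical helper). */
final class DetectionScorer {
    /** Counts for one decoder run at one threshold. */
    static final class Counts {
        final int hits;
        final int falseAlarms;
        final int misses;

        Counts(int hits, int falseAlarms, int misses) {
            this.hits = hits;
            this.falseAlarms = falseAlarms;
            this.misses = misses;
        }
    }

    /**
     * Matches each detection to the nearest unclaimed true position within toleranceSec.
     * Unmatched detections are false alarms; unclaimed true positions are misses.
     */
    static Counts score(double[] truthSec, double[] detectedSec, double toleranceSec) {
        boolean[] claimed = new boolean[truthSec.length];
        int hits = 0;
        int falseAlarms = 0;
        for (double det : detectedSec) {
            int best = -1;
            double bestDist = toleranceSec;
            for (int i = 0; i < truthSec.length; i++) {
                double dist = Math.abs(det - truthSec[i]);
                if (!claimed[i] && dist <= bestDist) {
                    best = i;
                    bestDist = dist;
                }
            }
            if (best >= 0) {
                claimed[best] = true;
                hits++;
            } else {
                falseAlarms++;
            }
        }
        return new Counts(hits, falseAlarms, truthSec.length - hits);
    }

    public static void main(String[] args) {
        // Made-up positions for illustration: keyphrases inserted every 2 minutes.
        double[] truth = {120.0, 240.0, 360.0};
        double[] detected = {120.3, 200.0, 359.8};
        Counts c = score(truth, detected, 1.0);
        System.out.println("hits=" + c.hits + " falseAlarms=" + c.falseAlarms + " misses=" + c.misses);
    }
}
```

Recording misses alongside hits and false alarms gives the three numbers needed to compare thresholds.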

I then tabulate the results as shown below:

Broad Pass

0 1.0E31 -- 0.0 / 0.0
1 1.0E27 -- 0.0 / 0.0
2 1.0E22 -- 1.0 / 1.0
3 9.9999998E17 -- 2.0 / 2.0
4 9.9999998E12 -- 3.0 / 3.0
5 1.0E9 -- 3.0 / 3.0
6 10000.0 -- 4.0 / 4.0
7 1.0 -- 5.0 / 6.0

...required threshold : 7 / 1.0

8 1.0 -- 5.0 / 6.0
9 1.0E-4 -- 5.0 / 6.0
10 1.0E-9 -- 5.0 / 6.0
11 1.0E-13 -- 5.0 / 9.0
12 1.0E-18 -- 5.0 / 9.0
13 1.0E-22 -- 5.0 / 11.0
14 1.0E-27 -- 5.0 / 27.0
15 1.0E-31 -- 5.0 / 38.0
16 1.0E-36 -- 5.0 / 81.0
17 1.0E-40 -- 5.0 / 129.0

Narrow Pass

16 1.0E8 -- 3.0 / 3.0
15 1.0E7 -- 4.0 / 4.0
14 1000000.0 -- 4.0 / 4.0
13 100000.0 -- 4.0 / 4.0
12 10000.0 -- 4.0 / 4.0
11 1000.0 -- 4.0 / 4.0
10 100.0 -- 5.0 / 6.0

...required threshold : 10 / 100.0

9 10.0 -- 5.0 / 6.0
8 1.0 -- 5.0 / 6.0
7 0.1 -- 5.0 / 6.0
6 0.01 -- 5.0 / 6.0
5 0.001 -- 5.0 / 6.0
4 1.0E-4 -- 5.0 / 6.0
3 1.0E-5 -- 5.0 / 6.0
2 1.0E-6 -- 5.0 / 6.0
1 1.0E-7 -- 5.0 / 6.0
0 1.0E-8 -- 5.0 / 6.0

The table consists of 4 columns which are:
Column 1 - number of iteration
Column 2 - threshold used in algorithm
Column 3 - true positives
Column 4 - true and false positives

My question is: in the broad pass, a number of iterations, i.e. iterations 7 through 10, show the same number of true and false positives. Do I use the largest or the smallest threshold in this instance?

Thanks

Neville

Last edit: neville9763 2016-04-22

- Nickolay V. Shmyrev - 2016-04-22
  
  You need to count the number of correct detections out of all occurrences, and the number of false alarms. The first should grow as you change the threshold; the second should fall. You need to choose the best point in the middle.
  
  Your table does not contain the proper data, so it cannot be used to choose the threshold.
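
One way to make "the best point in the middle" concrete is to put a cost on each miss and each false alarm and take the threshold with the lowest total. The sketch below does this under stated assumptions: the cost weights are arbitrary, the class `ThresholdPicker` is hypothetical, the false-alarm counts are derived from a few broad-pass rows as column 4 minus column 3, and the total of 5 inserted keyphrases is a guess, since the post does not state how many were inserted.

```java
/** Picks a threshold by weighing misses against false alarms (hypothetical helper). */
final class ThresholdPicker {
    /** Counts observed at one threshold setting. */
    static final class Row {
        final double threshold;
        final int hits;
        final int falseAlarms;

        Row(double threshold, int hits, int falseAlarms) {
            this.threshold = threshold;
            this.hits = hits;
            this.falseAlarms = falseAlarms;
        }
    }

    /**
     * Returns the index of the row with the lowest weighted cost, or -1 for an empty table.
     * On ties the earliest row wins.
     */
    static int pick(Row[] rows, int totalOccurrences, double missCost, double falseAlarmCost) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < rows.length; i++) {
            int misses = totalOccurrences - rows[i].hits;
            double cost = missCost * misses + falseAlarmCost * rows[i].falseAlarms;
            if (cost < bestCost) {
                bestCost = cost;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Selected broad-pass rows; falseAlarms = column 4 - column 3.
        Row[] rows = {
            new Row(1e22, 1, 0), new Row(1e18, 2, 0), new Row(1e13, 3, 0),
            new Row(1e9, 3, 0), new Row(1e4, 4, 0), new Row(1.0, 5, 1),
            new Row(1e-13, 5, 4), new Row(1e-27, 5, 22), new Row(1e-40, 5, 124)
        };
        int totalOccurrences = 5; // assumption: not stated in the post
        int i = pick(rows, totalOccurrences, 2.0, 1.0); // a miss costs twice a false alarm
        System.out.println("chosen threshold: " + rows[i].threshold);
    }
}
```

The weights encode how much a missed keyphrase matters relative to a spurious trigger, so they depend on the application rather than on the decoder.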
  

neville9763 - 2016-06-01

Hi Nickolay,

I have what is probably a pretty weird (or maybe pretty silly) question.

I have tuned pocketsphinx and have a number of threshold values for multiple keyphrases which seem to work well but I want to try and adjust these values manually ie. increment/decrement them by say 10 or 15 percent to see what effect they have on recognition.

The threshold values I have range from 1e-28 to 1e-4, but because I have no real understanding of how the threshold values work I cannot make a reasonable judgement about increments. (For example, in straight mathematical terms, if I wanted to vary 1 by 15% I would get a lower limit of 0.85 and an upper limit of 1.15, but I am not sure if such a straightforward rule could apply to the threshold values.)

So, my question is: what would the upper and lower threshold limits be for an increment or decrement of, say, 15% on the following threshold values:

1) 1e-28
2) 1e-14
3) 1e-4
4) 1

Thanks

Neville

- Nickolay V. Shmyrev - 2016-06-01
  
  Sorry, I have no idea what you mean by "upper" and "lower" threshold limits. The threshold is a single value; you can vary it from 1.0 to 1e-50. A percentage is not really applicable to it.
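
One reading of this reply is that a range from 1.0 to 1e-50 spans fifty orders of magnitude, so it is more natural to move the threshold by a multiplicative factor than by a percentage. The sketch below generates thresholds one fixed number of decades apart and shows how small a 15% change is on that scale. The class `ThresholdSweep` and both method names are hypothetical, and the step size is an illustrative choice, not a recommendation from the thread.

```java
/** Generates log-spaced threshold candidates (hypothetical helper). */
final class ThresholdSweep {
    /**
     * Returns 10^startExp, 10^(startExp - step), ... down to at most 10^endExp.
     * Example: decades(0, -50, 10) covers 1.0 down to 1e-50 in six steps.
     */
    static double[] decades(int startExp, int endExp, int stepDecades) {
        if (stepDecades <= 0 || startExp < endExp) {
            throw new IllegalArgumentException("need stepDecades > 0 and startExp >= endExp");
        }
        int n = (startExp - endExp) / stepDecades + 1;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = Math.pow(10.0, startExp - i * stepDecades);
        }
        return out;
    }

    /** Expresses a multiplicative change (for example 1.15 for +15%) in decades. */
    static double decadesFor(double factor) {
        return Math.log10(factor);
    }

    public static void main(String[] args) {
        for (double t : decades(0, -50, 10)) {
            System.out.println(t);
        }
        // A 15% change moves the threshold by well under a tenth of a decade.
        System.out.println("+15% in decades: " + decadesFor(1.15));
        System.out.println("-15% in decades: " + decadesFor(0.85));
    }
}
```

If the threshold is passed to the Android API as a Java float, as in the earlier configuration code, very small values are limited by float precision; the log line showing 1.401298E-45 is the smallest positive float.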
  
