CMU Sphinx / Forums / Help: pocketsphinx on Android, getBestScore()

Sebbi - 2013-02-04

Hi,
First, I should say that I am an engineer, not a linguist, so I do not understand much about speech recognition beyond some very basic knowledge.

I am trying to implement a small command-and-control structure for an app. The system should work in German, so I took the German VoxForge model. My development is based on the Android demo provided by the Sphinx project.

I stripped down the .dic file to this:

JA j@
NEIN n ai n
WEITER v ai t ei
ZURÜCK qq t s u: qq r y: s e k aa:
NONE SIL

My jsgf file only contains one command:

public <test> = (JA | NEIN | WEITER | ZURÜCK | NONE);

This works so far, but how can I get a confidence value for what has been recognized?

I found this project: https://github.com/Kaljurand/Inimesed
Its author took the Hypothesis_best_score_get(...) value and divided it by the length of the recorded audio, but that value looks like nonsense to me. It seems to me that I cannot use it to tell whether the recognition was accurate or not.

Is there a way to tell the system that, if the accuracy is not high enough, I want to receive a "GARBAGE" string or similar? At the moment it always forces the result onto one of the words in the dictionary. Or is there another value I can use to do this myself?

Any help is appreciated. Thank you guys

Nickolay V. Shmyrev - 2013-02-05

Is there a way to tell the system that, if the accuracy is not high enough, I want to receive a "GARBAGE" string or similar? At the moment it always forces the result onto one of the words in the dictionary. Or is there another value I can use to do this myself?

Hello

Filtering out-of-vocabulary words is not supported out of the box yet. You are welcome to help us implement it. See the FAQ on the subject:

http://cmusphinx.sourceforge.net/wiki/faq#qcan_pocketsphinx_reject_out-of-grammar_words_and_noises

Nickolay V. Shmyrev - 2013-02-05

He took the Hypothesis_best_score_get(...) value and divided it by the length of the recorded audio. But this value just produces nonsense to me. It seems to me that I cannot use it to get any information whether the recognition was accurate or not.

True, it makes no sense to calculate the score that way.

Sebbi - 2013-02-05

I tried to access the ps_get_prob() function from my Java code, as suggested in your link. I have never worked with SWIG before and have no idea how I have to edit the SWIG interface file.

Can somebody give me a hint?

Vocabulary can be expanded to >100 words to make it accurate.

- Вадим - 2013-02-10
  
  Here is a hint:
  Open your jni/pocketsphinx.i file.
  In the Decoder section you can see all the functions that are already used by RecognizerTask.java (such as startUtt, endUtt, getHyp). The native pocketsphinx functions they wrap are written in C in pocketsphinx.c (pocketsphinx/src/libpocketsphinx).
  
  For example,
  
  int startUtt() {
      return ps_start_utt($self, NULL);
  }
  
  startUtt is used in RecognizerTask.java as
  
  this.ps.startUtt()
  
  to mark that an utterance has started (this mark tells pocketsphinx to start recording raw audio, if I am correct).
  
  Now, open pocketsphinx/src/continuous.c, and find function print_word_times:
  
  static void print_word_times(int32 start)
  {
      ps_seg_t *iter = ps_seg_iter(ps, NULL);
      while (iter != NULL) {
          int32 sf, ef, pprob;
          float conf;
          ps_seg_frames(iter, &sf, &ef);
          pprob = ps_seg_prob(iter, NULL, NULL, NULL);
          conf = logmath_exp(ps_get_logmath(ps), pprob);
          printf("%s %f %f %f\n", ps_seg_word(iter),
                 (sf + start) / 100.0, (ef + start) / 100.0, conf);
          iter = ps_seg_next(iter);
      }
  }
  
  This function prints the recognition results to the console, including the result of ps_seg_prob. ps_seg_prob is almost like ps_get_prob, but it applies to a single segment, i.e. one word of the result. If the recognized phrase is "i trained the acoustic model", print_word_times prints a line, with its probability, for every word in the phrase: one for "i", one for "trained", and so on. (You can try pocketsphinx_continuous.exe on your PC and see these results being printed.)
  
  What you can try is to:
  1) modify this print_word_times(...) function
  2) add it to pocketsphinx.c
  3) declare it in pocketsphinx.h
  4) add a method for Java in pocketsphinx.i that calls this new function
  5) cd to your PocketSphinxAndroidDemo/jni, then run ndk/ndk-build (or ndk/ndk-build.cmd on Windows)
  6) log your float result somewhere, and it should work
  
  So, (1 and 2):
  
  I did this a couple of days ago and it worked, but I no longer have the code, so I will type the variant that I remember.
  Add this function to pocketsphinx.c:
  
  float returnSegProb(ps_decoder_t *ps)
  {
      /* Confidence of the first segment only; 0 if there is no segment. */
      ps_seg_t *iter = ps_seg_iter(ps, NULL);
      int32 pprob;
      float conf;
      if (iter == NULL)
          return 0.0f;
      pprob = ps_seg_prob(iter, NULL, NULL, NULL);
      conf = (float) logmath_exp(ps_get_logmath(ps), pprob);
      ps_seg_free(iter); /* we stop iterating early, so release the iterator */
      return conf;
  }
  
  (3)
  
  add
  
  float returnSegProb(ps_decoder_t *ps);
  
  to pocketsphinx.h
  
  (4)
  
  add
  
  float getSegmentProb() { return returnSegProb($self); }
  
  to your pocketsphinx.i in the Decoder section, next to startUtt, endUtt and the others.
  
  (5 and 6)
  
  Build with ndk-build (cd to the jni folder and run ndk-build, or ndk-build.cmd on Windows), then add a log statement showing this.ps.getSegmentProb() just after this.ps.endUtt() in RecognizerTask.java (as far as I remember). Then build and run the app, for example in Eclipse.
  
  I hope it works. Please tell me about your results :)
  
  Last edit: Вадим 2013-02-11
  
Nickolay V. Shmyrev - 2013-02-05

Can somebody give me a hint?

SWIG tutorial

http://www.swig.org/tutorial.html

François Bourqui - 2013-07-11

Hi,
I would like to share my work, because I searched the web for this many times without any result. I think it is what you need to get a confidence measure for each word.

I'm using PocketSphinx on Android, and I've made my own dictionary, grammar (JSGF) and acoustic model for a command app, with significant results in push-to-talk mode (about 95-98%).

Like you, I wanted to detect inaccurate recognitions and reject them. So I took inspiration from:
static void print_word_times(int32 start)
in the file continuous.c

First of all, I exposed some pocketsphinx.h functions to SWIG in the file jni/pocketsphinx.i:

1: add these functions to the SegmentIterator class:

SegmentIterator *next() {
    return ps_seg_next($self);
}
char const *getWord() {
    return ps_seg_word($self);
}
float getProb(Decoder *ps) {
    int32 pprob = ps_seg_prob($self, NULL, NULL, NULL);
    return logmath_exp(ps_get_logmath(ps), pprob);
}

2: add this function to the Decoder class

SegmentIterator *getSegmentIterator() {
return ps_seg_iter($self, NULL);
}

Then you can run SWIG to generate pocketsphinx_wrap.c and build with ndk-build.

In your Android project, add this code to RecognizerTask.java where you get the final hypothesis:

SegmentIterator i = ps.getSegmentIterator();
while (i != null) {
    Log.d(getClass().getName(), "word: " + i.getWord() + " , prob: " + i.getProb(ps));
    i = i.next();
}
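
The prob value logged by getProb is the linear form of an integer log-domain score returned by ps_seg_prob. As a minimal sketch of that conversion, assuming the decoder runs with PocketSphinx's default log base of 1.0001 and no shift, the plain-Java class below mirrors what logmath_exp computes. The class name LogProb, the method toLinear and the constant are made up for illustration and are not part of the PocketSphinx bindings.

```java
// Sketch only: mirrors logmath_exp() under an assumed log base.
final class LogProb {
    // Assumption: the decoder uses the default -logbase value of 1.0001.
    static final double ASSUMED_LOG_BASE = 1.0001;

    /** Converts an integer log-domain score to a linear value: base^logProb. */
    static double toLinear(int logProb) {
        return Math.pow(ASSUMED_LOG_BASE, logProb);
    }

    public static void main(String[] args) {
        // A log score of 0 corresponds to a linear probability of exactly 1.0.
        System.out.println(toLinear(0));
        // The more negative the log score, the smaller the linear value.
        System.out.println(toLinear(-100000));
    }
}
```

Because the score is a logarithm, values at or near 0 map to a confidence at or near 1.0, and large negative values map to confidences close to 0.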

Results with my own models (about 50 words trained by me):

Here, I said "téléphone appeler je dis n'importe quoi".
The part "je dis n'importe quoi" consists only of unknown words.

Here, I said "téléphone appeler un deux trois"; all words are in the dictionary and grammar.

Results with the English model hub4wsj_sc_8k and dictionary hub4.5000 (no adaptation):

Here, I said "I don't know"

We can see that everything is fine, with good confidence measures.

Here I said "this is a test"

We can see that we can eliminate the bad result and maybe ask the user to speak again.
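
The rejection step described here can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the ConfidenceFilter and Segment classes, the accept method and the 0.5 threshold are hypothetical, and the word/probability pairs stand in for values collected from getWord() and getProb(ps) in the loop shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: class names, method names and threshold are hypothetical.
final class ConfidenceFilter {
    /** One recognized word with its linear confidence in the range 0..1. */
    static final class Segment {
        final String word;
        final double prob;

        Segment(String word, double prob) {
            this.word = word;
            this.prob = prob;
        }
    }

    /** Accepts the result only if every word reaches minProb; empty results are rejected. */
    static boolean accept(List<Segment> segments, double minProb) {
        if (segments.isEmpty()) {
            return false;
        }
        for (Segment s : segments) {
            if (s.prob < minProb) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up values for illustration; they are not measurements.
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("JA", 0.97));
        segments.add(new Segment("WEITER", 0.12));
        System.out.println(accept(segments, 0.5) ? "accepted" : "GARBAGE");
    }
}
```

A usable threshold would have to be tuned on real recordings, and silence or filler segments probably need to be skipped before the check.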

I know there are some bad confidence measures with my own models, but I think that can be explained by the small number of words and my small grammar.
FAQ - reject out-of-grammar words and noises

I hope this helps some people; I wish I could have found it myself a couple of weeks ago.
- vincent - 2014-01-07
  
  hi,
  
  Why does the prob always come back as 1.0?
  
  - Olivier Rousseau - 2015-03-16
    
    I had this problem. Make sure you end the utterance before iterating through the segments. Also make sure the bestpath option is enabled in the config.
    
Nickolay V. Shmyrev - 2013-07-12

Thanks for the update, François.

Still, a proper implementation of the algorithm is pending. I hope we will get there soon.

rayava - 2014-04-08

Hi Francois,

Can you please provide the complete pocketsphinx.i? I'm working on a similar project but got stuck with the code.

Teknia - 2014-07-05

Hi Francois or anyone else,
Is any of this code available in the later releases of Pocketsphinx on Android or could anybody please point me in the right direction?

Thanks!

- doesnt_matter - 2014-10-22
  
  I think it's not (at least I haven't found it). But it would be great if someone who has already done it for the latest releases would share their code or compiled library.
  It seems to be a highly requested feature for Android.
  
  (Even if it may be possible to achieve with the instructions already posted in this thread.)
  
  Last edit: doesnt_matter 2014-10-22
  
qiqi - 2014-10-29

Hi everyone,
Can anyone explain what Вадим meant on 2013-02-10 by:
5) cd to your PocketSphinxAndroidDemo/jni, then ndk/ndk-build (or ndk/ndk-build.cmd, if on Windows)
6) then add somewhere to log your float result, and it should work

In other words, how do I use SWIG to generate pocketsphinx_wrap.c and then build with the NDK?

Thanks.

- Nickolay V. Shmyrev - 2014-11-01
  
  Updated instructions for building pocketsphinx-android are available in the tutorial:
  
  http://cmusphinx.sourceforge.net/wiki/tutorialandroid
  
Saiful Irham Wicaksana - 2017-12-15

Hi everyone,
I know this is an old discussion, but maybe someone needs this. I just created an Android library based on pocketsphinx-android-demo by Nickolay V. Shmyrev. Just add my library to your Gradle dependencies; no assets configuration is needed. https://github.com/icaksama/RapidSphinx
