pocketsphinx on Android, getBestScore()

Help
Sebbi
2013-02-04
2017-12-15
  • Sebbi

    Sebbi - 2013-02-04

    Hi,
    first I have to say that I am an engineer, not a linguist, so I do not understand much of this voice recognition stuff. I only have some very basic knowledge of it.

    I am trying to implement a small command-and-control structure for an app. The system should work in German, so I took the German VoxForge model. My development is based on the Android demo provided by the Sphinx project.

    I stripped down the .dic file to this:

    JA j@
    NEIN n ai n
    WEITER v ai t ei
    ZURÜCK qq t s u: qq r y: s e k aa:
    NONE SIL
    

    My jsgf file only contains one command:

    public <test> = (JA | NEIN | WEITER | ZURÜCK | NONE);
    

    This works so far, but how can I get a confidence value for what has been recognized?

    I found this guy here: https://github.com/Kaljurand/Inimesed
    He took the Hypothesis_best_score_get(...) value and divided it by the length of the recorded audio. But this value just looks like nonsense to me. It seems to me that I cannot use it to tell whether the recognition was accurate or not.

    Is there a way to tell the system that, if the accuracy was not high enough, I want to receive a "GARBAGE" string or similar? Right now it always forces the result onto one of the words in the dictionary. Or is there another value I can use to do this myself?

    Any help is appreciated. Thank you guys

     
  • Nickolay V. Shmyrev

    Is there a way to tell the system that, if the accuracy was not high enough, I want to receive a "GARBAGE" string or similar? Right now it always forces the result onto one of the words in the dictionary. Or is there another value I can use to do this myself?

    Hello

    Filtering of out-of-vocabulary words is not supported out of the box yet. You are welcome to help us implement it. See the FAQ on the subject:

    http://cmusphinx.sourceforge.net/wiki/faq#qcan_pocketsphinx_reject_out-of-grammar_words_and_noises

     
  • Nickolay V. Shmyrev

    He took the Hypothesis_best_score_get(...) value and divided it by the length of the recorded audio. But this value just looks like nonsense to me. It seems to me that I cannot use it to tell whether the recognition was accurate or not.

    True, it makes no sense to calculate a score that way.

     
  • Sebbi

    Sebbi - 2013-02-05

    I tried to access the ps_get_prob() function from my Java code as suggested in your link. I have never worked with SWIG before and have no idea how I have to edit the SWIG interface file.

    Can somebody give me a hint?

    Vocabulary can be expanded to >100 words to make it accurate.

     
    • Вадим

      Вадим - 2013-02-10

      Here is a hint:
      Open your jni/pocketsphinx.i file.
      In the Decoder section you can see all the functions that are already used by RecognizerTask.java (such as startUtt, endUtt, getHyp). The native pocketsphinx functions they wrap are written in C and defined in pocketsphinx.c (pocketsphinx/src/libpocketsphinx).

      for example,

      int startUtt() {
          return ps_start_utt($self, NULL);
      }

      startUtt is used in RecognizerTask.java in

      this.ps.startUtt()

      to mark that an utterance has started (this tells pocketsphinx to start recording raw audio, if I am correct).

      Now open pocketsphinx/src/continuous.c and find the function print_word_times:

      static void print_word_times(int32 start)
      {
          ps_seg_t *iter = ps_seg_iter(ps, NULL);
          while (iter != NULL) {
              int32 sf, ef, pprob;
              float conf;
              ps_seg_frames (iter, &sf, &ef);
              pprob = ps_seg_prob (iter, NULL, NULL, NULL);
              conf = logmath_exp(ps_get_logmath(ps), pprob);
              printf ("%s %f %f %f\n", ps_seg_word (iter), (sf + start) / 100.0, (ef + start) / 100.0, conf);
              iter = ps_seg_next (iter);
          }
      }
      

      This function prints the recognition results to the console, along with the result of ps_seg_prob. ps_seg_prob is much like ps_get_prob, but it applies to a single segment, that is, one word of the result. For example, if the recognized phrase is "i trained the acoustic model", print_word_times prints one line for each word ("i", "trained", and so on) together with its probability. You can run pocketsphinx_continuous.exe on your PC and see these results being printed.
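The conversion step in print_word_times can be illustrated with a minimal sketch, written in Java to match the Android side of this thread. It assumes the decoder's log base is 1.0001 (believed to be the usual PocketSphinx default, so check your own configuration) and ignores any internal shift that logmath may apply. LogProbDemo and toLinear are hypothetical names, not part of PocketSphinx.

```java
// Sketch only: mimics what logmath_exp does for an assumed log base.
final class LogProbDemo {
    // Assumption: 1.0001 is the log base configured in the decoder.
    static final double LOG_BASE = 1.0001;

    // Computes base^logProb, written with exp() and log().
    static double toLinear(int logProb, double base) {
        return Math.exp(logProb * Math.log(base));
    }

    public static void main(String[] args) {
        // A log score of 0 corresponds to a probability of 1.0.
        System.out.println(toLinear(0, LOG_BASE));
        // More negative log scores map to smaller probabilities.
        System.out.println(toLinear(-10000, LOG_BASE));
        System.out.println(toLinear(-100000, LOG_BASE));
    }
}
```

The point is only that the integer returned by ps_seg_prob is a log-domain value, so it has to go through this kind of conversion before it can be compared against a threshold between 0 and 1.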

      what you can try is to:
      1) modify this print_word_times(...) function
      2) add it to pocketsphinx.c
      3) add it to pocketsphinx.h
      4) add a function for java in pocketsphinx.i to call this new function
      5) cd to your PocketSphinxAndroidDemo/jni, then ndk/ndk-build (or ndk/ndk-build.cmd, if on Windows)
      6) then log the float result somewhere, and it should work

      so, (1 and 2):


      I did this a couple of days ago and it worked, but I no longer have the code, so I will type the variant that I remember here.
      Add this function to pocketsphinx.c:

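      /* Return the posterior probability of the first segment of the
         current hypothesis, or 0.0 if there is no segmentation. */
      float returnSegProb(ps_decoder_t *ps)
      {
          ps_seg_t *iter = ps_seg_iter(ps, NULL);
          int32 pprob;
          float conf;

          if (iter == NULL)
              return 0.0f;

          /* log posterior probability, in the decoder's log base */
          pprob = ps_seg_prob(iter, NULL, NULL, NULL);
          /* convert from the log domain to a linear probability */
          conf = logmath_exp(ps_get_logmath(ps), pprob);
          ps_seg_free(iter); /* the iterator was not run to the end */

          return conf;
      }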
      

      (3)


      add

      float returnSegProb(ps_decoder_t *ps);
      

      to pocketsphinx.h

      (4)


      add

      float getSegmentProb() {
          return returnSegProb($self);
      }
      

      to your pocketsphinx.i in the Decoder section, next to startUtt, endUtt and the others

      (5 and 6)


      Build with ndk-build (cd to the jni folder and run ndk-build, or ndk-build.cmd on Windows), then add a log statement showing this.ps.getSegmentProb() just after this.ps.endUtt() in RecognizerTask.java (as far as I remember). Then build and run the project, in Eclipse for example.

      I hope it works; please tell me about your results :)

       

      Last edit: Вадим 2013-02-11
  • Nickolay V. Shmyrev

    Can somebody give me a hint?

    SWIG tutorial

    http://www.swig.org/tutorial.html

     
  • François Bourqui

    Hi,
    I would like to share my work, because I tried many times to find something like it on the Web without any result. I think it is what you need to get a confidence measure for each word.

    I'm using PocketSphinx on Android and I've made my own dictionary, grammar (JSGF) and acoustic model for a command app, with good results in push-to-talk mode (about 95-98%).

    Like you, I wanted to detect inaccurate recognition so that I could reject it. I took my inspiration from
    static void print_word_times(int32 start)
    in the file continuous.c.

    First of all, I exposed some pocketsphinx.h functions through SWIG in the file jni/pocketsphinx.i:

    1: add these functions to the SegmentIterator class:

    SegmentIterator *next(){
        return ps_seg_next ($self);
    }
    char const * getWord(){
        return ps_seg_word($self);
    }
    float getProb(Decoder *ps){
        int32 pprob = ps_seg_prob ($self, NULL, NULL, NULL);
        return logmath_exp(ps_get_logmath(ps), pprob);
    }
    

    2: add this function to the Decoder class

    SegmentIterator *getSegmentIterator() {
        return ps_seg_iter($self, NULL);
    }

    Then you can run SWIG to generate pocketsphinx_wrap.c and build with ndk-build.

    In your Android project, add this code to RecognizerTask.java at the point where you get the final hypothesis:

    SegmentIterator i = ps.getSegmentIterator();
    while (i != null) {
        Log.d(getClass().getName(), "word: " + i.getWord() + " , prob: " + i.getProb(ps));
        i = i.next();
    }
    

    Result with my own models (about 50 words trained by me):

    Here, I said "téléphone appeler je dis n'importe quoi".
    The part "je dis n'importe quoi" consists only of unknown words.

    Here, I said "téléphone appeler un deux trois"; all words are in the dictionary and grammar.

    Result with English model hub4wsj_sc_8k and dict hub4.5000 (no adaptation):

    Here, I said "I don't know"

    We can see that everything is fine, with good confidence measures.

    Here I said "this is a test"

    We can see that we can eliminate the bad result and maybe ask the user to speak again.
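As a rough illustration of that rejection step, here is a minimal Java sketch. It assumes you have already copied each word and its probability out of the segment iterator into a list. The Word class, the acceptOrGarbage method, the 0.5 threshold and the sample probabilities are all hypothetical and only for illustration; a usable threshold has to be tuned against your own models.

```java
import java.util.ArrayList;
import java.util.List;

final class ConfidenceFilter {
    // Hypothetical holder for one recognized word and its confidence in [0, 1].
    static final class Word {
        final String text;
        final double prob;

        Word(String text, double prob) {
            this.text = text;
            this.prob = prob;
        }
    }

    // Returns the words joined by spaces, or "GARBAGE" if the result is
    // empty or any single word falls below minProb.
    static String acceptOrGarbage(List<Word> words, double minProb) {
        if (words.isEmpty()) {
            return "GARBAGE";
        }
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            if (w.prob < minProb) {
                return "GARBAGE";
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up probabilities, not measured results.
        List<Word> words = new ArrayList<>();
        words.add(new Word("WEITER", 0.97));
        words.add(new Word("JA", 0.12));
        System.out.println(acceptOrGarbage(words, 0.5)); // prints "GARBAGE"
    }
}
```

Rejecting the whole utterance when one word is weak is just one possible policy; averaging the probabilities or only checking the command word would be equally easy variations.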

    I know there are some poor confidence measures with my own models, but I think that can be explained by the small number of words and my small grammar.
    FAQ - reject out-of-grammar words and noises

    I hope this helps some people, as I wish I could have found it myself a couple of weeks ago.

     
    • vincent

      vincent - 2014-01-07

      Hi,

      why does the prob always return 1.0 for me?

       
      • Olivier Rousseau

        I had this problem. Make sure you end the utterance before iterating through the segments. Also make sure the bestpath option is enabled in the config.
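The call order can be sketched like this in Java. The Decoder and Segment interfaces below are hypothetical stand-ins that mirror the wrapper methods posted earlier in this thread (endUtt, getSegmentIterator, getWord, getProb, next); they are not the real SWIG-generated classes, and FakeDecoder exists only so the sketch runs on its own. The bestpath setting is a decoder configuration matter and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class SegmentOrderDemo {
    // Hypothetical stand-in for the wrapped segment iterator.
    interface Segment {
        String getWord();
        float getProb(Decoder ps);
        Segment next(); // null after the last segment
    }

    // Hypothetical stand-in for the wrapped decoder.
    interface Decoder {
        void endUtt();
        Segment getSegmentIterator(); // null if nothing is available
    }

    // End the utterance first, and only then walk the segments.
    static List<String> readSegments(Decoder ps) {
        ps.endUtt();
        List<String> out = new ArrayList<>();
        for (Segment s = ps.getSegmentIterator(); s != null; s = s.next()) {
            out.add(s.getWord() + "=" + s.getProb(ps));
        }
        return out;
    }

    // Test double: yields one made-up segment, but only after endUtt().
    static final class FakeDecoder implements Decoder {
        private boolean ended = false;

        public void endUtt() {
            ended = true;
        }

        public Segment getSegmentIterator() {
            if (!ended) {
                return null;
            }
            return new Segment() {
                public String getWord() { return "JA"; }
                public float getProb(Decoder ps) { return 0.5f; }
                public Segment next() { return null; }
            };
        }
    }

    public static void main(String[] args) {
        System.out.println(readSegments(new FakeDecoder()));
    }
}
```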

         
  • Nickolay V. Shmyrev

    Thanks for the update, François.

    A proper implementation of the algorithm is still pending, though. I hope we will get there soon.

     
  • rayava

    rayava - 2014-04-08

    Hi Francois,

    can you please provide me with the complete pocketsphinx.i? I'm working on a similar project but got stuck with the code.

     
  • Teknia

    Teknia - 2014-07-05

    Hi Francois or anyone else,
    Is any of this code available in the later releases of Pocketsphinx on Android or could anybody please point me in the right direction?

    Thanks!

     
    • doesnt_matter

      doesnt_matter - 2014-10-22

      I think it's not (at least I haven't found it). But it would be great if someone who has already done it for the latest releases would share their code or compiled library.
      It also seems to be a highly requested feature for Android.

      (Even though it may be possible to achieve with the instructions already posted in this thread.)

       

      Last edit: doesnt_matter 2014-10-22
  • qiqi

    qiqi - 2014-10-29

    Hi everyone,
    Can anyone tell me what Вадим meant on 2013-02-10 by:
    5) cd to your PocketSphinxAndroidDemo/jni, then ndk/ndk-build (or ndk/ndk-build.cmd, if on Windows)
    6) then add somewhere to log your float result, and it should work

    In other words, how do I use SWIG to generate pocketsphinx_wrap.c and then build with the NDK?

    Thanks.

     
    • Nickolay V. Shmyrev

      Updated instructions for building pocketsphinx-android are available in the tutorial:

      http://cmusphinx.sourceforge.net/wiki/tutorialandroid

       
  • Saiful Irham Wicaksana

    Hi everyone,
    I know this is an old discussion, but maybe someone needs this. I just created an Android library based on pocketsphinx-android-demo by Nickolay V. Shmyrev. Just add my library to your Gradle dependencies; no assets configuration is needed. https://github.com/icaksama/RapidSphinx

     
