Hi all,

I'm trying to obtain a confidence score that tells me how "good" the final hypothesis is with respect to the words that were actually spoken (the &score argument I pass to ps_get_hyp() for the partial hypothesis doesn't give me this). I want a rule like: if the confidence score of a final hypothesis is below 0.5 (i.e. 50%), discard it. I use a C++/CLI wrapper of pocketsphinx.
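Roughly, this is the rule I have in mind (just a sketch of the intent; I'm assuming ps_get_prob() returns a log-domain posterior that logmath_exp() converts back to a value between 0 and 1):

#include <pocketsphinx.h>

/* Sketch of the rule I want: keep the final hypothesis only when its
   posterior confidence is at least 50%. Assumes ps_get_prob() returns
   a log-domain posterior and logmath_exp() maps it back to 0..1. */
static int
hypothesis_is_confident(ps_decoder_t *ps)
{
    int32 log_conf = ps_get_prob(ps);
    double conf = logmath_exp(ps_get_logmath(ps), log_conf);
    return conf >= 0.5;   /* below 0.5 (50%) -> discard */
}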
I read the FAQ of the official wiki which explains as follows:
> Garbage Models - requires you to train a special model. There is no public model with garbage phones which can reject OOV words now. There are models with fillers, but they reject only specific sounds (breath, laughter, um). They can't reject OOV words.

I implemented it and it works remarkably well.

> Generic Word Model - same as above, requires you to train a special model. There are no public models yet.

I'm working with a grammar, so I skip this step.

> Confidence Scores - a confidence score (ps_get_prob) can be reliably calculated only for a large vocabulary (> 100 words). It doesn't work with a small grammar. There are approaches with phone-based confidence, and one of them was implemented in sphinx2, but pocketsphinx doesn't support them. Confidence scoring also requires three-pass recognition (enable both fwdflat and bestpath).

> So for now the recommendation for rejection with a small grammar is: train your own model (and make it public). For a large language model (> 100 words), use the confidence score.
My grammar contains exactly 100 rules, with one word per rule.

I tried to implement this, but I always get 0 when I call ps_get_prob(ps), both with correct words (inside the grammar) and with wrong words (outside the grammar). I read pocketsphinx.h, which says:
> Note: Unless the -bestpath option is enabled, this function will always return zero (corresponding to a posterior probability of 1.0).
config = cmd_ln_init(NULL, ps_args(), TRUE,
    "-hmm", hmmPath,
    "-dict", dictPath,
    "-mmap", "no",
    "-logfn", logPath,
    "-kws_threshold", "1e-40",
    "-fwdflat", "yes",
    "-bestpath", "yes",
    NULL);
-fwdflat and -bestpath appear as enabled in the log. Are they activated correctly?
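For what it's worth, this is how I'm assuming the live configuration can be read back from the decoder (a sketch using ps_get_config() and cmd_ln_boolean_r()):

#include <stdio.h>
#include <pocketsphinx.h>

/* Sketch: read the configuration back from the running decoder.
   Both values should print as 1 if the options took effect. */
static void
print_search_flags(ps_decoder_t *ps)
{
    cmd_ln_t *cfg = ps_get_config(ps);
    printf("-fwdflat: %d, -bestpath: %d\n",
           (int)cmd_ln_boolean_r(cfg, "-fwdflat"),
           (int)cmd_ln_boolean_r(cfg, "-bestpath"));
}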
I'm calling ps_get_prob() after the speech has ended, so I gather that I'm not calling it on a partial hypothesis:
if (!isCurrentInSpeech == 1 && isPreviousInSpeech == 1)
{
    OnResultFinalizedBySilence(previousHyp);
    int32 getprob = ps_get_prob(ps);
    System::Console::WriteLine("ps_get_prob: " + getprob);
    System::String^ restart = RestartProcessing();
    System::Console::WriteLine(restart);
}
Where is my error? I have spent many hours trying to find it without success. :(
P.S.: In LM mode I always get 0 too (the model contains 2491 1-grams, 7448 2-grams and 10140 3-grams).
Thank you all so much.
Here is the C++/CLI wrapper:
Thanks.
And here is the log.
Here is my grammar (it is a Spanish grammar), which contains 100 rules with one word per rule. The dictionary contains more than 2000 words.
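The structure looks roughly like this (an abbreviated sketch with placeholder words, not the attached file; only three of the 100 rules are shown):

#JSGF V1.0;
grammar palabras;

public <palabra> = <r1> | <r2> | <r3>;

<r1> = casa;
<r2> = perro;
<r3> = gato;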
Confidence estimation is not implemented in FSG mode. In language model mode it should work.
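For example, initializing the decoder with an n-gram language model instead of the JSGF grammar would look roughly like this (a sketch; the file paths are placeholders):

#include <stddef.h>
#include <pocketsphinx.h>

/* Sketch: set up the decoder in language-model mode. All file paths
   below are placeholders, not real files. */
static ps_decoder_t *
init_lm_decoder(void)
{
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm",  "model/es-es",      /* acoustic model */
        "-dict", "model/es.dict",    /* pronunciation dictionary */
        "-lm",   "model/es.lm.bin",  /* n-gram LM instead of -jsgf */
        "-fwdflat", "yes",
        "-bestpath", "yes",          /* needed for ps_get_prob() */
        NULL);
    return config ? ps_init(config) : NULL;
}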
Is it implemented in JSGF mode? Thanks.
JSGF and FSG modes are the same.
And in older versions of pocketsphinx, e.g. 0.8?
Thanks Nikolay.
No, it is not implemented; it is not a trivial algorithm. If you need keyword detection, please use keyword spotting mode; it should fit your task.
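For reference, keyword spotting mode can be selected roughly like this (a sketch; the keyword list file name and its contents are placeholders):

#include <pocketsphinx.h>

/* Sketch: register and activate a keyword-spotting search. The file
   "keywords.list" is a placeholder; each line holds a keyphrase and a
   detection threshold, e.g. "hola /1e-20/". */
static int
enable_keyword_spotting(ps_decoder_t *ps)
{
    if (ps_set_kws(ps, "kws", "keywords.list") < 0)
        return -1;
    return ps_set_search(ps, "kws");   /* make "kws" the active search */
}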
In language model mode I always get 0 too. Where could my error be?
No idea. You could try to reproduce the issue with plain C code and share the code and data needed to reproduce it.