Partial Hypothesis in PocketSphinx

Help
Anuj Kumar
2011-06-02
2016-10-29
  • Anuj Kumar

    Anuj Kumar - 2011-06-02

    Hi,

    How do you get a partial hypothesis from the decoder in PocketSphinx, i.e.
    get the hypothesis while the user is still speaking? Is there an example or
    a link that covers this for PocketSphinx? I know that PocketSphinx supports
    this, but I don't know how to do it.

    Thanks for any help in advance,

    - Anuj
     
  • Nickolay V. Shmyrev

    You can get the partial hypothesis during recognition, before the utterance
    has ended, with the same function: ps_get_hyp.

     
  • shamsa abid

    shamsa abid - 2011-09-26

    http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html#partial_results

    From the above link I found:

    It is possible to configure Sphinx-4 to generate partial results, that is,
    to inform you periodically as to what it thinks is the best possible
    hypothesis so far, even before the user has stopped speaking.

    To get this information, add a result listener to the recognizer. Your
    listener will receive a result (which may or may not be a final result). The
    hypothesis text can be extracted from the result.

    There is a good example of this in sphinx4/tests/live/Live.java

    You can control how often the result listener is fired by setting the
    configuration variable 'featureBlockSize' in the decoder. The default setting
    of 50 indicates that the listener will be called after every 50 frames. Since
    each frame represents 10 ms of speech, the listener is called every 500 ms.

    Do I have featureBlockSize available in pocketsphinx? I tried to find it but
    couldn't.
    I am currently getting partial results, but only after a 5 to 6 second delay,
    whereas ideally I would like to get them instantaneously.

    Any help?
    Regards,
    Shamsa.

     
  • Nickolay V. Shmyrev

    Do I have featureBlockSize available in pocketsphinx?

    Pocketsphinx uses "push" processing, so you decide yourself how many frames
    to push into the decoder. Sphinx4 uses "pull" processing, which is why it has
    a feature block size parameter. In pocketsphinx you just call ps_process_raw
    with the required number of frames and then call ps_get_hyp to get a partial
    result as soon as those frames are processed. You also decide yourself when
    the utterance ends, with the flag to ps_process_raw.
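
    To make the push model concrete, here is a minimal sketch in Java of pushing
    audio in fixed-size blocks and polling for a partial result after each block.
    The processAndGetHyp callback is hypothetical: it only stands in for "call
    ps_process_raw on this block, then ps_get_hyp" and is not the pocketsphinx
    API. The 16 kHz sample rate used in the note below is also an assumption;
    the thread only gives the 10 ms frame length. The point is that the block
    size you choose sets the delay between partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class PushLoopSketch {
    /** Samples in one block of framesPerBlock frames, each frameMs long. */
    static int samplesPerBlock(int sampleRateHz, int frameMs, int framesPerBlock) {
        return (int) ((long) sampleRateHz * frameMs * framesPerBlock / 1000);
    }

    /**
     * Pushes audio block by block. processAndGetHyp is a hypothetical stand-in
     * for "ps_process_raw on this block, then ps_get_hyp"; it may return null
     * when there is no hypothesis yet.
     */
    static List<String> pushInBlocks(Function<short[], String> processAndGetHyp,
                                     short[] audio, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<String> partials = new ArrayList<>();
        for (int start = 0; start < audio.length; start += blockSize) {
            int end = Math.min(start + blockSize, audio.length);
            String hyp = processAndGetHyp.apply(Arrays.copyOfRange(audio, start, end));
            if (hyp != null) {
                partials.add(hyp); // a null partial result is normal; skip it
            }
        }
        return partials;
    }
}
```

    Under these assumptions, samplesPerBlock(16000, 10, 50) is 8000 samples,
    i.e. one partial result per 500 ms of audio. By the same arithmetic, pushing
    buffers that are several seconds long would delay each partial result by
    several seconds.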

     
  • shamsa abid

    shamsa abid - 2011-09-27

    I am getting a null partial hypothesis more often than a proper word during
    voice input.
    This would mean that, in case of null, no matches have been found for the
    current block.
    What if I want to get a partial hypothesis based on a character I supply?
    Would it be possible for the process_raw function to return one or more
    matches from the partial hypothesis result starting with the character 'a'?

    I sort of want a filtered partial hypothesis based on a character parameter.
    Am I headed in the right direction?

     
  • Nickolay V. Shmyrev

    I am getting a null partial hypothesis more often than a proper word during
    voice input.

    Maybe you are not enabling the search in ps_process_raw. In the C API the
    flag is named no_search, so it must be FALSE for the search to run.

    This would mean that, in case of null, no matches have been found for the
    current block.

    That is a normal situation.

    What if I want to get a partial hypothesis based on a character I supply?
    Would it be possible for the process_raw function to return one or more
    matches from the partial hypothesis result starting with the character 'a'?
    I sort of want a filtered partial hypothesis based on a character parameter.
    Am I headed in the right direction?

    If you want to have some restrictions on the results you need to set them
    through the grammar, not through the code.

     
  • shamsa abid

    shamsa abid - 2011-09-29

    http://www.youtube.com/watch?v=OEUeJb6Pwt4

    The link above is a video by David Huggins showing the thing I'm trying to
    do on Android.

    In that video you can see partial results as the user proceeds with his
    speech.
    What amazes me is that:
    1) It's very fast (with a vocabulary of 4000 words).

    2) It gets a result every time, whereas in some cases even null can be
    returned.

    3) During speech the user can pick suggestions from a popup as corrections.
    Where do these come from? A lattice? I tried to get a lattice while in
    listening mode, but it's probably not available until the utterance is
    completed.

    Please guide me on how to get results from the decoder during speech, and
    the other suggestions as well.
    I need to figure out the logic behind this, that is, what's happening here?

     
  • Nickolay V. Shmyrev

    1) It's very fast (with a vocabulary of 4000 words).

    Yes, pocketsphinx can run very fast.

    2) It gets a result every time, whereas in some cases even null can be
    returned.

    Usually, once you have passed some speech into the recognizer, there is a
    partial result.

    3) During speech the user can pick suggestions from a popup as corrections.
    Where do these come from? A lattice?

    Yes.

    I tried to get a lattice while in listening mode, but it's probably not
    available until the utterance is completed.

    In the video, the variants are accessed when the utterance is already
    completed (after a pause).

     
  • shamsa abid

    shamsa abid - 2011-10-13

    You said:

    In the video, the variants are accessed when the utterance is already
    completed (after a pause).

    How do you make/identify a PAUSE? I used ps.endutterance(); during my
    listening mode. It didn't give me any results.

    Right now I'm trying to get useful results from my partial hypothesis.

    I start recognition and I say
    "can you send me"

    With the following code, what I get as the partial hypothesis is

    "can "
    and then
    "can you"
    and then
    "can you an"

    However, I would actually like to have it like
    "can"
    and then
    "you"
    and then
    "send"

    How can I break up my partial hypothesis?

    I tried stopping the utterance and restarting it at the end, but it didn't
    work.
    I tried stopping the audio (before processRaw) and starting a new one, and
    that didn't help.
    How does this code need to be changed?
    So basically, during the listening state I want my hypstring to contain only
    the latest word that I have spoken.
    I also tried accessing the n-best list from the decoder at this stage but
    couldn't.

    if (state == State.LISTENING) {
        assert this.audio != null;
        try {
            // take() blocks until a buffer is available in the queue
            short[] buf = this.audioq.take();
            Log.d(getClass().getName(), "Reading " + buf.length + " samples from queue");
            // third argument false: search enabled (no_search in the C API)
            this.ps.processRaw(buf, buf.length, false, false);
            Hypothesis hyp = this.ps.getHyp();
            if (hyp != null) {
                String hypstr = hyp.getHypstr();
                // Only notify when the hypothesis text has changed. Compare with
                // equals(): != on Strings compares references, not contents.
                if (hypstr != null && !hypstr.equals(partial_hyp)) {
                    if (this.rl != null) {
                        Bundle b = new Bundle();
                        b.putString("hyp", hypstr);
                        this.rl.onPartialResults(b);
                    }
                }
                partial_hyp = hypstr;
            }
        } catch (InterruptedException e) {
            Log.d(getClass().getName(), "Interrupted in audioq.take");
        }
    }
    }  // closes an enclosing block not shown in this excerpt
    }  // closes an enclosing block not shown in this excerpt

     
  • Nickolay V. Shmyrev

    How do you make/identify a PAUSE? I used ps.endutterance(); during my
    listening mode. It didn't give me any results.

    A pause is identified by the endpointer from the energy of the signal (when
    cont_ad_read returns 0 for some amount of time). After the pause is
    identified, the utterance is ended with ps_end_utt.
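
    The idea of energy-based pause detection can be sketched as follows. This is
    a simplified illustration in Java, not the cont_ad implementation: the class
    name, the fixed RMS threshold, and the number of silent frames are all
    assumptions chosen for the example.

```java
final class EnergyEndpointer {
    private final double rmsThreshold;      // frames below this count as silence
    private final int silentFramesForPause; // consecutive silent frames that make a pause
    private int silentRun = 0;
    private boolean heardSpeech = false;

    EnergyEndpointer(double rmsThreshold, int silentFramesForPause) {
        this.rmsThreshold = rmsThreshold;
        this.silentFramesForPause = silentFramesForPause;
    }

    /** Root-mean-square amplitude of one frame of 16-bit samples. */
    static double rms(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (short s : frame) {
            sumSquares += (double) s * s;
        }
        return Math.sqrt(sumSquares / frame.length);
    }

    /**
     * Feeds one frame. Returns true once speech has been heard and has been
     * followed by enough consecutive silent frames, i.e. the point at which
     * the caller would end the utterance.
     */
    boolean isPauseAfter(short[] frame) {
        if (rms(frame) >= rmsThreshold) {
            heardSpeech = true;
            silentRun = 0;
        } else if (heardSpeech) {
            silentRun++;
        }
        return heardSpeech && silentRun >= silentFramesForPause;
    }
}
```

    Silence before any speech never counts as a pause here, so the utterance is
    only ended after the user has actually said something.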

    How can I break up my partial hypothesis? So basically, during the listening
    state I want my hypstring to contain only the latest word that I have spoken.

    This requires modification of the pocketsphinx source. It's also a loosely
    stated problem, because you don't quite understand how the search is
    performed to create a hypothesis. The last word spoken can only be identified
    after some delay, with partial traceback. You can read more about search in
    this book:

    http://www.amazon.com/Spoken-Language-Processing-Algorithm-Development/dp/0130226165

    Specifically you could read about:

    1. Viterbi search
    2. Partial traceback
     
  • Kristoffer

    Kristoffer - 2016-10-28

    In the sample above the partial result "can you an" is retrieved. Is it possible that any of that part changes in the next partial result? E.g. "canyon blah blah"

     
    • Nickolay V. Shmyrev

      Yes, sure. Partial traceback is meant to return exactly the "certain" part; everything else may change.

       
  • Kristoffer

    Kristoffer - 2016-10-29

    Ok. So the result isn't simply appended to the previous result?

    E.g. NOT this?
    "Hi"
    "Hi there"
    "Hi there little"
    and so on.

    But this?
    "Hi"
    "High the"
    "Hi there"
    "Hi there lit"

     
    • Nickolay V. Shmyrev

      Decoding considers many thousands of decoding sequences at once, not just one best sequence. You can check https://en.wikipedia.org/wiki/Viterbi_algorithm for details.
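
      A toy illustration of why the best sequence can be revised rather than appended to: the sketch below runs Viterbi over two abstract states with made-up probabilities. All numbers and names are assumptions for the example, not pocketsphinx internals. After one observation the best path is [0]; after a second observation the best path is [1, 1], so the earlier choice has been rewritten.

```java
import java.util.Arrays;

final class ViterbiSketch {
    /** Best state sequence for obs, computed in log space with backpointers. */
    static int[] bestPath(double[] init, double[][] trans, double[][] emit, int[] obs) {
        int n = init.length;
        int len = obs.length;
        if (len == 0) {
            return new int[0];
        }
        double[][] score = new double[len][n];
        int[][] back = new int[len][n];
        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(init[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < n; s++) {
                int bestPrev = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    double cand = score[t - 1][p] + Math.log(trans[p][s]);
                    if (cand > best) {
                        best = cand;
                        bestPrev = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = bestPrev;
            }
        }
        // Pick the best final state, then follow backpointers to the start.
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[len - 1][s] > score[len - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[len];
        path[len - 1] = last;
        for (int t = len - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    // Made-up model: two states, two observation symbols.
    static final double[] INIT = {0.6, 0.4};
    static final double[][] TRANS = {{0.5, 0.5}, {0.05, 0.95}};
    static final double[][] EMIT = {{0.6, 0.4}, {0.5, 0.5}};

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bestPath(INIT, TRANS, EMIT, new int[] {0})));    // [0]
        System.out.println(Arrays.toString(bestPath(INIT, TRANS, EMIT, new int[] {0, 1}))); // [1, 1]
    }
}
```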

       
  • Kristoffer

    Kristoffer - 2016-10-29

    Thanks. I'm just wondering how stable this "partial hypothesis" is. Consider using some logic around the received partial hypothesis... If it can change completely, that means it's not stable.

     
    • Nickolay V. Shmyrev

      If you want a stable partial hypothesis you need to implement partial traceback (not implemented in pocketsphinx).
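
      Short of implementing partial traceback, one rough heuristic is to treat only the words on which two consecutive partial hypotheses agree as provisionally stable. The sketch below (class and method names are hypothetical) computes that common word prefix. It is not partial traceback and gives no guarantee: a word that two partial results agree on can still change later.

```java
import java.util.ArrayList;
import java.util.List;

final class StablePrefixSketch {
    /** Leading words shared by two partial hypotheses; null is treated as empty. */
    static List<String> commonWordPrefix(String previous, String current) {
        String[] a = words(previous);
        String[] b = words(current);
        List<String> shared = new ArrayList<>();
        int limit = Math.min(a.length, b.length);
        for (int i = 0; i < limit && a[i].equals(b[i]); i++) {
            shared.add(a[i]);
        }
        return shared;
    }

    /** Splits on whitespace, returning no words for null or blank input. */
    private static String[] words(String text) {
        String trimmed = text == null ? "" : text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

      With the examples from this thread, "can you" followed by "can you an" yields the prefix [can, you], while "High the" followed by "Hi there" yields an empty prefix.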

       
  • Kristoffer

    Kristoffer - 2016-10-29

    Ok. Thanks for your help.

     
