Detecting end of speech

Anuj Kumar
2011-06-01
2012-09-22
  • Anuj Kumar

    Anuj Kumar - 2011-06-01

    Hi,

    Would anyone know what parameter is there to control the length of pause
    before end-of-speech is detected and the audio file is sent to the decoder?
    Which file has that parameter in pocketSphinx?

    - Anuj
     
  • Pankaj

    Pankaj - 2011-06-02

    Hi,

    In continuous.c, in the utterance_loop() function, the following code
    decides the end of an utterance:

        /*
         * No speech data available; check current timestamp with most recent
         * speech to see if more than 1 sec elapsed. If so, end of utterance.
         */
        if ((cont->read_ts - ts) > DEFAULT_SAMPLES_PER_SEC)
            break;

    By changing the value compared in the if condition, you can control the
    length of pause required before end of speech is detected.
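
The check above can be sketched with the pause length expressed in milliseconds. This is only an illustration, not code from continuous.c: the function names and the SAMPLES_PER_SEC macro are hypothetical, and the 16000 value is the DEFAULT_SAMPLES_PER_SEC figure quoted later in this thread.

```c
#include <stdint.h>

/* Assumed sample rate; stands in for DEFAULT_SAMPLES_PER_SEC (16000 per this thread). */
#define SAMPLES_PER_SEC 16000

/* Hypothetical helper: convert a pause length in milliseconds to a sample count. */
int32_t pause_ms_to_samples(int32_t pause_ms)
{
    return (int32_t)(((int64_t)SAMPLES_PER_SEC * pause_ms) / 1000);
}

/*
 * Hypothetical helper mirroring the comparison in utterance_loop():
 * returns 1 when more than pause_ms of audio has passed since the most
 * recent speech timestamp, 0 otherwise. Both timestamps are in samples.
 */
int end_of_utterance(int32_t read_ts, int32_t last_speech_ts, int32_t pause_ms)
{
    return (read_ts - last_speech_ts) > pause_ms_to_samples(pause_ms);
}
```

With a 1000 ms pause this reduces to the original comparison against 16000 samples; a 500 ms pause compares against 8000 samples.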

    With regards
    Pankaj

     
  • Anuj Kumar

    Anuj Kumar - 2011-06-02

    Hi Pankaj,

    DEFAULT_SAMPLES_PER_SEC is set to 16000 samples per second in ad.h, which
    essentially means that if statement checks if there are fewer than 16000
    samples in a second to detect if it is a pause or not. I'm not sure if that
    will work in my scenario, which is, to start decoding and printing as soon as
    the user has spoken -- I think the current codebase waits for the buffer to
    fill up before it sends the audio file to be decoded. Would you know what
    parameter to tweak for that?

    I think in continuous.c, there's a line "(k = cont_ad_read(cont, adbuf, 4096))
    == 0" that waits for the buffer to fill up with 4096 samples in the buffer.
    The cont_ad_read function is defined in cont_ad.h. Do you think bringing the
    value of 4096 down to say 256 will start the decoding sooner, and make the
    decoder print the hypothesis without waiting for the entire sentence to be
    completed and then followed by a pause?

     
  • Pankaj

    Pankaj - 2011-06-03

    Hi,
    1. To change the time required to detect end of utterance you need not
       change DEFAULT_SAMPLES_PER_SEC. The condition is not counting samples
       as such; it checks the time elapsed, measured in samples. Since 1 sec
       corresponds to 16000 samples by default, the programmer must have used
       DEFAULT_SAMPLES_PER_SEC for that reason. In my application I wanted a
       time gap of 500 ms, and hence I used (DEFAULT_SAMPLES_PER_SEC/2). There
       is an application cont_ad_fileseg.c in sphinxbase where, instead of
       using DEFAULT_SAMPLES_PER_SEC, the programmer compares the time elapsed
       with a hardcoded value.

    2. If you change the value 4096 to, say, 256, the process of decoding will
       start earlier, but the printing happens only when the complete
       utterance is passed to the get_hyp() function. You will have to modify
       the utterance_loop() function to suit your requirements, and that will
       take considerable effort. Simply tweaking parameters will not be
       sufficient.
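
For scale, here is the arithmetic behind the two read sizes discussed above, as a small sketch. The function name is hypothetical, and the result is only the amount of audio a single read of that size can hold at the given rate, not a measurement of decoder latency.

```c
#include <stdint.h>

/*
 * Hypothetical helper: duration in milliseconds of chunk_samples samples
 * of audio at samples_per_sec samples per second.
 */
double chunk_duration_ms(int32_t chunk_samples, int32_t samples_per_sec)
{
    return 1000.0 * chunk_samples / samples_per_sec;
}
```

At 16000 samples per second, 4096 samples cover 256 ms of audio and 256 samples cover 16 ms, so the read size mainly bounds how much audio is handed over per call.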

    Regards
    Pankaj

     
  • Anuj Kumar

    Anuj Kumar - 2011-06-03

    Thanks, Pankaj!

    What changes would be needed, then, to start the decoding earlier and have
    the hypothesis printed earlier? I know that pocketSphinx supports printing
    partial hypotheses, but I don't know how. Would you know?

     
  • Nickolay V. Shmyrev

    You can get the partial hypothesis using the same function, ps_get_hyp().
    You can call it during recognition; there is no need to wait for the end
    of the utterance.
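
One way to use this inside a read loop is to query the hypothesis after each chunk and print it only when it has changed. The sketch below is an assumption-laden illustration, not PocketSphinx code: get_hyp_fn is a hypothetical callback standing in for ps_get_hyp() (whose exact signature depends on the PocketSphinx version), and fake_decoder exists only so the example is self-contained.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback standing in for the decoder's hypothesis query
 * (ps_get_hyp() in PocketSphinx). Returns NULL when nothing is available. */
typedef const char *(*get_hyp_fn)(void *decoder);

/*
 * Print the current partial hypothesis only if it differs from the last
 * one printed. `last` is a caller-owned buffer of last_size bytes that
 * remembers the previous text; hypotheses longer than the buffer are
 * stored truncated. Returns 1 if something was printed, 0 otherwise.
 */
int report_partial(get_hyp_fn get_hyp, void *decoder, char *last, size_t last_size)
{
    const char *hyp = get_hyp(decoder);
    if (hyp == NULL || strcmp(hyp, last) == 0)
        return 0;
    snprintf(last, last_size, "%s", hyp);
    printf("partial: %s\n", hyp);
    return 1;
}

/* Stand-in decoder for demonstration: walks a fixed list of hypotheses. */
struct fake_decoder {
    const char **hyps;
    int pos;
    int count;
};

/* Returns the next canned hypothesis, or NULL once the list is exhausted. */
const char *fake_get_hyp(void *decoder)
{
    struct fake_decoder *d = decoder;
    return d->pos < d->count ? d->hyps[d->pos++] : NULL;
}
```

In a real loop, report_partial() would be called after each chunk of audio is passed to the decoder, with a small wrapper around ps_get_hyp() supplied as the callback.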

     
