Detecting end of speech

Forum: Help

Creator: Anuj Kumar

Created: 2011-06-01

Updated: 2012-09-22

Anuj Kumar - 2011-06-01

Hi,

Would anyone know which parameter controls the length of pause before
end-of-speech is detected and the audio is sent to the decoder? Which file
has that parameter in pocketSphinx?

Anuj

Pankaj - 2011-06-02

Hi,

In continuous.c, in the utterance_loop() function, the following code
decides the end of an utterance:

/*
 * No speech data available; check current timestamp with most recent
 * speech to see if more than 1 sec elapsed. If so, end of utterance.
 */
if ((cont->read_ts - ts) > DEFAULT_SAMPLES_PER_SEC)
    break;

Changing the value compared in the if condition controls how long a pause
must last before end of speech is detected.

With regards
Pankaj

Anuj Kumar - 2011-06-02

Hi Pankaj,

DEFAULT_SAMPLES_PER_SEC is set to 16000 samples per second in ad.h, which
essentially means that the if statement checks whether there are fewer than
16000 samples in a second to decide if it is a pause or not. I'm not sure if
that will work in my scenario, which is to start decoding and printing as soon
as the user has spoken. I think the current codebase waits for the buffer to
fill up before it sends the audio to be decoded. Would you know what
parameter to tweak for that?

I think in continuous.c there's a line "(k = cont_ad_read(cont, adbuf, 4096))
== 0" that waits for the buffer to fill up with 4096 samples. The
cont_ad_read function is declared in cont_ad.h. Do you think bringing the
value of 4096 down to, say, 256 will start the decoding sooner and make the
decoder print the hypothesis without waiting for the entire sentence to be
completed and followed by a pause?

Pankaj - 2011-06-03

Hi,
1. To change the time required to detect end of utterance you need not change DEFAULT_SAMPLES_PER_SEC itself. The check does not count the number of samples within a second; it measures the time elapsed since the most recent speech, expressed in samples. Since 1 sec corresponds to 16000 samples by default, the programmer presumably used DEFAULT_SAMPLES_PER_SEC as the threshold. In my application I wanted a time gap of 500 ms, and hence I used (DEFAULT_SAMPLES_PER_SEC/2). There is also an application, cont_ad_fileseg.c, in sphinxbase where the programmer compares the elapsed time with a hardcoded value instead of using DEFAULT_SAMPLES_PER_SEC.

2. If you change the value 4096 to, say, 256, the process of decoding will start earlier, but the hypothesis is printed only once the complete utterance has been passed to the get_hyp() function. You will have to modify the utterance_loop() function to suit your requirements, which will take considerable effort. Simply tweaking parameters will not be sufficient.

Regards
Pankaj
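
The sample-count arithmetic in the reply above can be sketched in C. This is only an illustration, assuming the 16000 samples-per-second default quoted from ad.h. The helpers ms_to_samples, samples_to_ms and pause_exceeded are hypothetical names made up for this example and are not part of sphinxbase or pocketsphinx.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: matches the default the thread quotes from ad.h. */
#define DEFAULT_SAMPLES_PER_SEC 16000

/* Hypothetical helper: duration in milliseconds -> number of samples. */
static int32_t ms_to_samples(int32_t ms, int32_t samples_per_sec)
{
    assert(ms >= 0 && samples_per_sec > 0);
    return (int32_t)(((int64_t)ms * samples_per_sec) / 1000);
}

/* Hypothetical helper: number of samples -> duration in milliseconds. */
static int32_t samples_to_ms(int32_t n_samples, int32_t samples_per_sec)
{
    assert(n_samples >= 0 && samples_per_sec > 0);
    return (int32_t)(((int64_t)n_samples * 1000) / samples_per_sec);
}

/*
 * Hypothetical helper with the same shape as the quoted check
 *     (cont->read_ts - ts) > DEFAULT_SAMPLES_PER_SEC
 * but with the pause length given in milliseconds.
 */
static int pause_exceeded(int32_t read_ts, int32_t last_speech_ts,
                          int32_t pause_ms)
{
    return (read_ts - last_speech_ts)
           > ms_to_samples(pause_ms, DEFAULT_SAMPLES_PER_SEC);
}
```

Under these assumptions, ms_to_samples(500, DEFAULT_SAMPLES_PER_SEC) gives 8000, the same value as DEFAULT_SAMPLES_PER_SEC/2, and samples_to_ms(4096, DEFAULT_SAMPLES_PER_SEC) gives 256, so a read that returned a full 4096 samples would span about a quarter of a second of audio, against 16 ms for 256 samples.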

Anuj Kumar - 2011-06-03

Thanks, Pankaj!

What changes would be needed, then, to start the decoding earlier and have
the hypothesis printed earlier? I know that pocketSphinx supports printing a
partial hypothesis, but I don't know how. Would you know?

Nickolay V. Shmyrev - 2011-06-03

You can get the partial hypothesis using the same function, ps_get_hyp(). You
can call it during recognition; there is no need to wait for the utterance to
end.
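
The shape of such a loop can be sketched without the library. The following is a self-contained simulation, not pocketsphinx code: stub_decoder, stub_process, stub_get_hyp, sim_source, sim_read and run_utterance are all hypothetical stand-ins invented for this example. In a real build the stub calls would be replaced by the decoder's own processing call and by ps_get_hyp(), whose exact signatures depend on the pocketsphinx version. The sketch assumes a 16000 Hz sample rate and 256-sample reads, and it ends the utterance with the same kind of timestamp comparison quoted earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SAMPLES_PER_SEC 16000   /* assumed default rate from the thread */
#define CHUNK_SAMPLES   256     /* small read size, as discussed above */

/* Hypothetical stand-in for the decoder; NOT the pocketsphinx API. */
typedef struct {
    int32_t samples_seen;
    char hyp[64];
} stub_decoder;

static void stub_process(stub_decoder *dec, const int16_t *buf, int32_t n)
{
    (void)buf;  /* a real decoder would search over these samples */
    dec->samples_seen += n;
    snprintf(dec->hyp, sizeof dec->hyp, "%d samples decoded",
             (int)dec->samples_seen);
}

static const char *stub_get_hyp(const stub_decoder *dec)
{
    return dec->hyp;
}

/* Hypothetical simulated audio source: one flag per chunk, 1 = speech. */
typedef struct {
    const unsigned char *speech;
    size_t n_chunks;
    size_t pos;
    int32_t read_ts;    /* total samples read so far, like cont->read_ts */
} sim_source;

/* Returns CHUNK_SAMPLES for a speech chunk, 0 for silence, -1 at the end. */
static int32_t sim_read(sim_source *src, int16_t *buf)
{
    if (src->pos >= src->n_chunks)
        return -1;
    src->read_ts += CHUNK_SAMPLES;
    if (!src->speech[src->pos++])
        return 0;
    memset(buf, 0, CHUNK_SAMPLES * sizeof *buf);
    return CHUNK_SAMPLES;
}

/* Runs one utterance; returns how many partial hypotheses were printed. */
static int run_utterance(sim_source *src, stub_decoder *dec,
                         int32_t pause_samples)
{
    int16_t buf[CHUNK_SAMPLES];
    int32_t last_speech_ts = src->read_ts;
    int partials = 0;

    assert(pause_samples >= 0);
    for (;;) {
        int32_t k = sim_read(src, buf);
        if (k < 0)
            break;                      /* source exhausted */
        if (k == 0) {
            /* Silence: end the utterance once the pause is long enough. */
            if (src->read_ts - last_speech_ts > pause_samples)
                break;
            continue;
        }
        last_speech_ts = src->read_ts;
        stub_process(dec, buf, k);
        /* Query the hypothesis mid-utterance instead of waiting for the end. */
        printf("partial: %s\n", stub_get_hyp(dec));
        ++partials;
    }
    printf("final: %s\n", stub_get_hyp(dec));
    return partials;
}
```

The point of the sketch is only the ordering inside the loop: each chunk is handed to the decoder as soon as it is read, and the hypothesis is queried right after, so output appears before the end-of-utterance pause is reached.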
