I am extracting keywords from an audio file.
The audio is a sentence containing the words "w1 w2" in sequence.
The decoder identifies these words (as well as some variants)
but returns the following segment start and end times:
w1: a-x
w2: y-b
with y > x by three or four frames.
Is this normal behaviour? I can allow for a certain tolerance when processing the segments
further.
- Which values are expected with the default arguments?
- Is it possible to control this margin?
Many thanks,
Yuval
Is this a normal behaviour? I can assume a certain tolerance when processing the segments
Yes, it should be OK. Because this is keyword spotting, the search for the next word does not restart directly when the previous word ends, so the decoder can decide that the next word started a bit later.
Is it possible to control this margin?
I don't think so.
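Since the margin apparently cannot be controlled in the decoder, the tolerance has to be applied when post-processing the segments. A minimal sketch in Python of one way to do that follows. The `Segment` tuple, the `is_adjacent` and `adjacent_pairs` helpers, the frame numbers, and the default tolerance of 5 frames are all assumptions made for illustration; none of them come from the decoder's API or its default arguments.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One keyword hit: the word plus its start and end frame (hypothetical layout)."""
    word: str
    start: int
    end: int


def is_adjacent(prev: Segment, nxt: Segment, tolerance: int = 5) -> bool:
    """True if nxt starts within `tolerance` frames of where prev ends.

    The absolute value also accepts a small overlap, not only a small gap.
    """
    return abs(nxt.start - prev.end) <= tolerance


def adjacent_pairs(segments: List[Segment], tolerance: int = 5) -> List[Tuple[Segment, Segment]]:
    """Return neighbouring hits (ordered by start frame) that pass the tolerance check."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if is_adjacent(a, b, tolerance)]


# Made-up example: w2 starts four frames after w1 ends.
hits = [Segment("w1", 100, 140), Segment("w2", 144, 190)]
pairs = adjacent_pairs(hits)
print([(a.word, b.word) for a, b in pairs])
```

With a gap of three or four frames, any tolerance of four or more treats "w1 w2" as a contiguous sequence; a stricter value such as 2 would reject it.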
I see, thank you.