CMU Sphinx / Forums / Help: Segment times with JSGF grammar

Hello all,

I have used PocketSphinx several times now for speech recognition tasks on Android devices. Most recently, I tried to detect a single word with a grammar and afterwards obtain time stamps for each of the segments. However, the time stamps do not match the actual times in the file: the word is always reported right at the beginning of the file, after an initial silence (sil) segment that ends at frame 7.
This is my JSGF grammar:

#JSGF V1.0;
grammar neighbors;
public <kompliment> = sil <k> <oo> <m> <p> <l> <iy> <m> <eh> <n> <t> [ sil ];
<aa> = /2/ aa | /1/ eh | /1/ ax | /1/ ehh | /1/ ae | /1/ ex;
<zh> = zh;
<vv> = vv;
<dh> = dh;
<aah> = aah;
<yy> = yy;
<y> = y;
<hh> = hh;
<ch> = ch;
<jh> = jh;
<eeh> = eeh;
<x> = x;
<aaah> = aaah;
<erh> = erh;
<eh> = /2/ eh | /1/ ax | /1/ ehh | /1/ ae | /1/ ex;
<ohh> = ohh;
<rr> = rr;
<ts> = ts;
<ng> = ng;
<w> = w;
<ee> = ee;
<pf> = pf;
<th> = th;
<oooh> = oooh;
<nspc> = nspc;
<iih> = iih;
<uh> = uh;
<iy> = /2/ iy | /1/ ih | /1/ iih | /1/ ax;
<b> = b;
<ae> = ae;
<d> = d;
<g> = g;
<f> = f;
<cc> = cc;
<ah> = ah;
<k> = /2/ k | /1/ g;
<m> = m;
<l> = /15/ l | /1/ y;
<ao> = ao;
<n> = n;
<q> = q;
<p> = /2.5/ p | /1/ b;
<s> = s;
<r> = r;
<ex> = /2/ ex | /1/ eh | /1/ ah | /1/ aa;
<aw> = aw;
<v> = v;
<ay> = ay;
<ax> = ax;
<z> = z;
<ehh> = ehh;
<er> = er;
<oo> = /2/ oo | /1/ ooh | /1/ ex | /1/ eh | /1/ ah | /1/ aa;
<oi> = oi;
<ih> = ih;
<uu> = uu;
<oe> = oe;
<oh> = oh;
<ooh> = ooh;
<sh> = sh;
<uuh> = uuh;
<oy> = oy;
<ue> = ue;
<hhh> = hhh;
<yyh> = yyh;
<t> = /1.2/ t | /1/ d;

This is the output for any file where I say the German word "Kompliment":

11-15 12:15:08.680 5858-6719/de.---.app I/ASR: Phoneme: sil Start: 0 End: 7
11-15 12:15:08.686 5858-6719/de.---.app I/ASR: Phoneme: k Start: 8 End: 16
11-15 12:15:08.686 5858-6719/de.---.app I/ASR: Phoneme: oo Start: 17 End: 25
11-15 12:15:08.686 5858-6719/de.---.app I/ASR: Phoneme: m Start: 26 End: 35
11-15 12:15:08.686 5858-6719/de.---.app I/ASR: Phoneme: p Start: 36 End: 42
11-15 12:15:08.686 5858-6719/de.---.app I/ASR: Phoneme: y Start: 43 End: 49
11-15 12:15:08.687 5858-6719/de.---.app I/ASR: Phoneme: iy Start: 50 End: 56
11-15 12:15:08.687 5858-6719/de.---.app I/ASR: Phoneme: m Start: 57 End: 67
11-15 12:15:08.687 5858-6719/de.---.app I/ASR: Phoneme: ehh Start: 68 End: 80
11-15 12:15:08.687 5858-6719/de.---.app I/ASR: Phoneme: n Start: 81 End: 100
11-15 12:15:08.687 5858-6719/de.---.app I/ASR: Phoneme: t Start: 101 End: 113
11-15 12:15:08.687 5858-6719/de.---.app I/ASR: Phoneme: sil Start: 114 End: 229
11-15 12:15:08.689 5858-6719/de.---.app I/ASR: RESULT: sil k oo m p y iy m ehh n t sil
11-15 12:15:08.689 5858-6719/de.---.app I/ASR: SCORE: -3536.0
11-15 12:15:08.689 5858-6719/de.---.app I/ASR: FILE DURATION: 4.608

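The last line shows the duration of the recorded file in seconds. There is a clear mismatch between the frame numbers and the total time: if I assume the default rate of 100 frames per second, the last segment ends at frame 229, which is only about 2.3 seconds, while the file is 4.608 seconds long.
The audio is RIFF/WAVE, 16 kHz, 16 bit, mono.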
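
To make the mismatch concrete, here is a minimal sketch in Python (not part of my Android app) that parses log lines in the format shown above and converts frame numbers to seconds. It assumes a frame rate of 100 frames per second, which I believe is the PocketSphinx default; `FRAME_RATE`, `parse_segments`, `covered_seconds` and `expected_frames` are names made up for this illustration.

```python
import re

# Assumption: 100 frames per second (believed to be the PocketSphinx default).
FRAME_RATE = 100

# Matches the segment lines from the log output, e.g.
# "I/ASR: Phoneme: sil Start: 0 End: 7"
SEGMENT_RE = re.compile(r"Phoneme: (\S+) Start: (\d+) End: (\d+)")


def parse_segments(log_text):
    """Return a list of (phoneme, start_frame, end_frame) tuples."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in SEGMENT_RE.finditer(log_text)
    ]


def covered_seconds(segments, frame_rate=FRAME_RATE):
    """Time span covered by the segmentation, in seconds.

    End frames are inclusive, so the number of frames is last_end + 1.
    """
    if not segments:
        return 0.0
    last_end = max(end for _, _, end in segments)
    return (last_end + 1) / frame_rate


def expected_frames(duration_seconds, frame_rate=FRAME_RATE):
    """Approximate number of frames expected for a file of this duration."""
    return int(duration_seconds * frame_rate)


# Shortened excerpt of the log above (first, second and last segment).
LOG = """
I/ASR: Phoneme: sil Start: 0 End: 7
I/ASR: Phoneme: k Start: 8 End: 16
I/ASR: Phoneme: sil Start: 114 End: 229
"""

segments = parse_segments(LOG)
print("covered by segments:", covered_seconds(segments), "s")
print("frames expected for 4.608 s:", expected_frames(4.608))
```

Under that assumption the segmentation covers roughly 2.3 s, which is about half of the 4.608 s file, whereas a file of that length should produce around 460 frames.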

Can you please explain why this happens and how I can get correct frame numbers?
