CMU Sphinx / Forums / Help: Conversion from frame number to time stamps of words or phones

Diwakar.G - 2017-01-10

These are the sphinx3_align results. I am now trying to convert the frame numbers to the corresponding onset and offset times of the words.

SFrm  EFrm   SegAScr  Word
   0    58   -518766  <s>
  59   106   -288027  <sil>
 107   123    -42713  <s>
 124   138   -129898  THE
 139   171   -299927  QUICK
 172   206   -231186  BROWN
 207   263   -346435  FOX
 264   298   -313259  JUMPS
 299   321   -121317  OVER
 322   331   -114349  THE
 332   373   -390465  LAZY
 374   424   -634808  DOG
 425   433   -154447  <sil>
 434   459   -105426  </s>
 460   480   -138397  </s>
Total score: -3829420

I have not changed any configuration settings. These are the default PocketSphinx parameters for feature vector computation:
Frame rate = 100 frames/s
Window length = 0.0256 s
From these, with Fs = 16000 Hz, I calculated the number of samples per frame: 0.0256 * 16000 = 409.6, i.e. about 410 samples/frame.
Window shift = Fs / frame rate = 16000 / 100 = 160 samples
So the window shifts by 160 samples (10 ms) per frame. For the last word </s> (frames 460 to 480), this gives an onset of 4.6 s and an offset of 4.8 s. But the total length of the audio signal is about 6.195 s, so there is a mismatch. Please tell me if I am wrong. Thank you.
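The conversion described above can be sketched in Python. This is only an illustration, assuming the default frame rate of 100 frames/s quoted in the post; the function name frame_span_to_seconds is hypothetical and not part of any Sphinx tool. It also assumes the offset of a segment is the start time of the frame after its last frame, so frames 460 to 480 end at 4.81 s rather than 4.80 s. The analysis window length is ignored.

```python
FRAME_RATE = 100  # frames per second (default quoted in the post)


def frame_span_to_seconds(start_frame, end_frame, frame_rate=FRAME_RATE):
    """Convert an inclusive frame range to (onset, offset) in seconds.

    Assumption: the offset is the start time of the frame that follows
    end_frame, so consecutive segments share a boundary.
    """
    onset = start_frame / frame_rate
    offset = (end_frame + 1) / frame_rate
    return onset, offset


# A few rows taken from the sphinx3_align output above: (SFrm, EFrm, Word)
segments = [
    (124, 138, "THE"),
    (139, 171, "QUICK"),
    (460, 480, "</s>"),
]

for sfrm, efrm, word in segments:
    onset, offset = frame_span_to_seconds(sfrm, efrm)
    print(f"{word}\t{onset:.2f}\t{offset:.2f}")
```

These times are relative to the frames that were actually computed, so they only line up with the original audio if no frames were dropped during feature extraction.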

Last edit: Diwakar.G 2017-01-10
There seems to be a problem with feature extraction from wav files. When I extract features from .sph files with a NIST header, the file length and the number of feature vectors match, but for .wav files with a RIFF header they do not.

sitecsp@acl-pg-06:~$ sphinx_cepview -b 82 -e 98 -d 13 -f FC01-Session1-0054.mfc
Current configuration:
[NAME]      [DEFLT]     [VALUE]
-b      0       82
-d      10      13
-describe   0       0
-e      2147483647  98
-f              FC01-Session1-0054.mfc
-header     0       0
-i      13      13

INFO: main_cepview.c(152): Displaying 13 out of 13 columns per frame
INFO: main_cepview.c(153): Total 401 frames

 53.251  -6.944   1.407  -2.457  -6.865  -5.867   9.838   9.280   2.394   5.270  -0.121  -4.562  10.742 
 52.650  -6.407  -1.116  -3.184   4.144  -1.115   3.089  -1.796   1.651   2.165   1.681  -6.425   4.559 
 52.858  -9.507  -4.892 -13.696  -9.678  -4.131   3.844   4.681   6.215   0.275  -7.676   1.772   6.018 
 49.998  -9.793  -0.598 -12.458 -10.120  -5.591   8.318   4.080  -2.615   0.523  -7.819  -2.630   0.533 
 50.406 -12.133  -5.096 -10.152 -13.763  -9.402   3.709   4.309  -5.395  -2.200   1.035   4.833   3.242 
 51.560  -9.216  -4.687  -4.951 -12.551  -6.305  11.447   5.209   2.412   0.401  -2.296   4.876   7.482 
 49.922  -7.210   3.928   2.236  -0.227  -1.214   3.342   4.979   2.664   4.103  -0.035   1.079   1.714 
 49.982  -8.810   2.402  -3.998 -11.644  -5.471   9.500   9.433  -4.328   6.917  -1.290   2.723   7.185 
 48.995 -11.293   5.384  -1.381 -15.051  -4.903  12.733   2.482   1.283  -0.518   2.501   9.957   5.690 
 49.450  -7.544   4.241   0.479 -13.377  -4.857   2.167   2.199  -3.727  -1.452   0.471  -1.558   5.217 
 50.368  -8.751  -3.676  -9.867 -16.020   3.206  13.360   1.570  -4.307   5.506   9.127   2.034   1.171 
 49.394  -5.125   7.343  -8.553  -2.801   5.864   9.743   1.970   0.877   6.508   3.909   5.188   6.046 
 50.854  -9.557  -1.881  -4.577 -17.259  -4.722  13.825  14.667   7.021   4.386  -2.905  -0.568  14.095 
 50.453 -10.525   5.201   3.677 -11.436   0.758  21.489   6.983   0.205  17.617  -6.318   1.005  18.804 
 54.942  -6.155  14.309  21.192  -3.079   9.075  25.065  15.158   7.923  18.296   5.295   5.404  16.285 
 58.618   4.933  21.806  30.604   1.804   7.033  23.187  14.198   5.428   9.530   1.043   3.065  13.867

The wav file is actually 5.12 s long, but instead of 512 or 511 frames I am getting only 401 frames. The same problem occurs for all wav files. Please help me.

Last edit: Diwakar.G 2017-01-10

Nickolay V. Shmyrev - 2017-01-10

By default, the feature extractor removes silence. You can add -remove_silence no to sphinx_fe to disable that, but a large amount of silence in the audio is harmful for other reasons.
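To check whether frames were dropped, the observed frame count can be compared with a rough estimate from the audio duration. The sketch below is a sanity check under stated assumptions (16 kHz audio, 100 frames/s, a 0.0256 s window, and a simple "one frame per full window" framing rule). The helper names are hypothetical, and sphinx_fe's own framing may differ from this estimate by a frame or two.

```python
def expected_frames(duration_s, sample_rate=16000, frame_rate=100, window_s=0.0256):
    """Rough number of frames when no audio is discarded.

    Assumption: a frame is produced for every full analysis window,
    advancing by sample_rate // frame_rate samples each time.
    """
    shift = sample_rate // frame_rate        # 160 samples at the defaults
    window = round(window_s * sample_rate)   # about 410 samples at the defaults
    n_samples = round(duration_s * sample_rate)
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // shift


def missing_seconds(duration_s, observed_frames, frame_rate=100):
    """Audio duration not covered by the observed frames."""
    return duration_s - observed_frames / frame_rate


# Figures reported in this thread
print(expected_frames(5.12))          # estimate for the 5.12 s wav file
print(missing_seconds(5.12, 401))     # sphinx_cepview reported 401 frames
print(missing_seconds(6.195, 481))    # alignment covered frames 0 to 480
```

A gap of more than a second, as in both files here, is far larger than any framing edge effect, which is consistent with frames having been dropped as silence.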
