limitations of input file size

Forum: Speech Recognition Theory

Creator: Anurag Nilesh

Created: 2009-07-01

Updated: 2012-09-22

Anurag Nilesh - 2009-07-01

I have been using Sphinx, following the instructions described here:
http://sphinx.subwiki.com/sphinx/index.php/Hello_World_Decoder_QuickStart_Guide

I recorded a few audio files, saved them in WAV format, and then decoded them with the 64k nvp 3-gram LM available at http://www.inference.phy.cam.ac.uk/kv227/lm_giga/

These are the results I got out of it.
Here, ud = undetected words (present in the actual transcript file but not in the Sphinx-generated file)
wd = wrongly detected words (i.e. not present in the transcript but present in the Sphinx-generated file)
d = detected words
taw = total no. of words in the transcript file
tsw = total no. of words in the Sphinx-generated file
acc. = accuracy (d*100/(d+ud))
err. = error rate (wd*100/(wd+d))

filename         ud   wd    d   taw  tsw   acc.   err.

BB2009a.A.txt    77  114   23   100  137  23.00  83.21
BB2009a.B.txt   124  212   76   200  288  38.00  73.61
BB2009a.C.txt   183  292  117   300  409  39.00  71.39
BB2009a.D.txt   275  265  125   400  390  31.25  67.95
BB2009a.E.txt   354  251  146   500  397  29.20  63.22
BB2009a.F.txt   442  235  158   600  393  26.33  59.80
BB2009a.G.txt   529  219  171   700  390  24.43  56.15
BB2009a.H.txt   605  212  195   800  407  24.38  52.09
BB2009a.I.txt   717  212  183   900  395  20.33  53.67
BB2009a.J.txt   803  193  197  1000  390  19.70  49.49
BB2009a.K.txt   905  202  195  1100  397  17.73  50.88
BB2009a.L.txt   998  189  202  1200  391  16.83  48.34
BB2009a.M.txt  1092  182  208  1300  390  16.00  46.67
BB2009a.N.txt  1197  187  203  1400  390  14.50  47.95
BB2009a.O.txt  1293  179  218  1511  397  14.43  45.09
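
For anyone who wants to reproduce the acc. and err. columns, here is a minimal Python sketch of the two formulas given above. The function name `score` is just an illustrative choice, and computing taw as d+ud and tsw as wd+d is an assumption read off the table (it matches every row), not something taken from the Sphinx tools.

```python
def score(ud, wd, d):
    """Return (taw, tsw, acc, err) from the three word counts.

    ud: words in the transcript that were not detected
    wd: words in the Sphinx output that are not in the transcript
    d:  words detected correctly
    """
    taw = d + ud                 # assumed: total words in the transcript
    tsw = wd + d                 # assumed: total words in the Sphinx output
    acc = d * 100 / (d + ud)     # accuracy, in percent
    err = wd * 100 / (wd + d)    # error rate, in percent
    return taw, tsw, round(acc, 2), round(err, 2)


# First and last rows of the table, as (ud, wd, d).
rows = {
    "BB2009a.A.txt": (77, 114, 23),
    "BB2009a.O.txt": (1293, 179, 218),
}

for name, (ud, wd, d) in rows.items():
    taw, tsw, acc, err = score(ud, wd, d)
    print(f"{name} {taw} {tsw} {acc:.2f} {err:.2f}")
```

Note that acc. is measured against the transcript length while err. is measured against the Sphinx output length, so the two percentages do not have to add up to 100.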

As you may notice, after the 3rd row the number of words generated by sphinx3 (tsw) stays more or less the same, even though the number of words in the audio file keeps increasing. It seems to me that there is some limit on the output buffer size, but I am unable to find out which piece of code handles this.

What changes can I make to ensure that sphinx3 works on bigger audio files?

- n lindle - 2009-07-01
  
  sphinx3_decode and sphinx3_livepretend are limited by a constant called S3_MAX_FRAMES. It is set to 15000 frames initially, which is 150 seconds of audio (i.e. 100 frames per second). You can edit the header files to increase this limit if you want. It is defined in feat.h and s3types.h.
  
  Alternatively, you can try sphinx3_continuous and see if that works for your needs.
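
To get a feel for how large S3_MAX_FRAMES would need to be for a longer recording, the arithmetic in the reply can be sketched in Python as below. This is only a rough estimate: the rate of 100 frames per second is an assumption inferred from "15000 frames = 150 seconds", and the helper names `max_seconds` and `frames_needed` are made up for illustration and are not part of Sphinx.

```python
import math

S3_MAX_FRAMES = 15000       # initial limit quoted in the reply above
FRAMES_PER_SECOND = 100     # assumption: inferred from 15000 frames = 150 s


def max_seconds(max_frames=S3_MAX_FRAMES, frate=FRAMES_PER_SECOND):
    """Longest audio (in seconds) that fits within max_frames."""
    return max_frames / frate


def frames_needed(audio_seconds, frate=FRAMES_PER_SECOND):
    """Smallest frame limit that covers audio_seconds of audio."""
    return math.ceil(audio_seconds * frate)


print(max_seconds())          # seconds covered by the initial limit
print(frames_needed(600))     # frame limit needed for a 10-minute file
```

Under these assumptions, a 10-minute file would need the constant raised to at least 60000 before rebuilding.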
  
- Anurag Nilesh - 2009-07-01
  
  thanks.
  
