Greetings! I am attempting to write an application that will perform
transcription of various videos. Since the videos could come from anywhere, I
am using the HUB4 model that pocketsphinx seems to use by default, and an
unrestricted grammar. Just to get started, I am using the
pocketsphinx_continuous app with the -infile option on the latest builds to get
an idea of what accuracy I can achieve. With fairly high-quality audio, in
16 kHz mono PCM, I am getting an accuracy of less than 60%.
This leaves me with a few questions:
- What type of accuracy can I expect in this scenario?
- How can I improve the accuracy (assuming an unrestricted grammar and an untrained model)?
- Is pocketsphinx or sphinx4 better suited for this use case?
- Is there a model that may be better suited for my use case?
Any help would be appreciated. Thanks!
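For reference, the invocation described above is roughly the following; the
file names are placeholders, and -infile expects audio matching the model's
format (here 16 kHz, 16-bit, mono):

```shell
# Decode a 16 kHz, 16-bit, mono WAV file with the default models
# and write the hypothesis transcript to a file.
pocketsphinx_continuous -infile video_audio.wav > hypothesis.txt
```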
Decoding is always restricted somehow; for example, there is a language model
or a grammar. To learn more about the way CMUSphinx decoders work, please read
the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorial
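As an illustration of a grammar-restricted search (the grammar name and words
here are made up for the example), a minimal JSGF grammar looks like:

```
#JSGF V1.0;

grammar commands;

public <command> = (play | pause | stop) [the video];
```

With a grammar like this the decoder can only ever output one of the listed
phrases, which is why "unrestricted" decoding really means decoding against a
large-vocabulary language model instead.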
The default language model used by pocketsphinx is not very good
(hub4.5000.DMP); it has just 5000 words. It makes sense to build your own
language model from existing video transcripts, for example from closed
captions.
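Language-model toolkits generally want a plain-text corpus, one utterance per
line, so the first step is stripping the caption formatting. A minimal sketch
for SubRip (.srt) captions, assuming an English, Latin-alphabet transcript
(the sample text is made up):

```python
import re

def srt_to_corpus(srt_text):
    """Strip SubRip (.srt) cue numbers, timestamps, and markup, leaving
    one lowercase line per caption -- the plain-text corpus format that
    LM toolkits typically expect."""
    lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit():
            continue                        # blank separator or cue index
        if re.match(r"\d{2}:\d{2}:\d{2},\d{3} --> ", line):
            continue                        # timestamp line
        line = re.sub(r"<[^>]+>", "", line)             # drop <i>, <b>, ... tags
        line = re.sub(r"[^a-z' ]", " ", line.lower())   # keep letters/apostrophes
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)

sample = """1
00:00:01,000 --> 00:00:03,000
<i>Hello, world!</i>

2
00:00:03,500 --> 00:00:05,000
This is a caption."""
corpus = srt_to_corpus(sample)
```

The resulting text file can then be fed to an LM toolkit to produce an n-gram
model for the decoder.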
The default acoustic model of pocketsphinx is quite good; it makes sense to
try it instead of HUB4.
What type of accuracy can I expect in this scenario?
Accuracy depends on many factors, including, for example, the decoder used to
extract the audio track. Overall, with the default models, 60% is the expected
accuracy. Further improvements require you to adapt the models and to
implement custom components.
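Accuracy figures like the 60% above are conventionally reported as one minus
the word error rate (WER): the word-level edit distance between the reference
transcript and the hypothesis, divided by the number of reference words. A
minimal sketch (the example sentences are made up):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# The hypothesis drops one word out of six: WER = 1/6, i.e. ~83% accuracy.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Scoring each component change with the same metric on the same test set is
what makes the component-by-component improvements below comparable.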
How can I improve the accuracy (assuming an unrestricted grammar and an untrained model)?
I suggest you set up a prototype and then improve components one by one. Try
to build a better language model, adapt the acoustic model, and implement
improvements. For example, video decoding often requires a specialized music
filtering component.
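As a very rough starting point for such a filter (this is a generic heuristic,
not a CMUSphinx component): per-frame RMS energy and zero-crossing rate are
two cheap features sometimes used to flag steady, tonal, music-like segments
before passing the rest to the recognizer. A sketch on a synthetic signal:

```python
import math

def frame_features(samples, rate=16000, frame_ms=25):
    """Split a mono signal into fixed-size frames and compute RMS energy
    and zero-crossing rate (ZCR) per frame."""
    n = int(rate * frame_ms / 1000)
    feats = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        rms = math.sqrt(sum(x * x for x in frame) / n)
        zcr = sum(1 for a, b in zip(frame, frame[1:])
                  if (a < 0) != (b < 0)) / (n - 1)
        feats.append((rms, zcr))
    return feats

# Synthetic demo: one second of a pure 440 Hz tone. A steady tone shows
# near-constant energy and a low, stable ZCR across frames, unlike speech.
rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
feats = frame_features(tone, rate)
```

A real music detector would need far more than this, but it illustrates the
kind of front-end component the pipeline can grow.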
Is pocketsphinx or sphinx4 better suited for this use case?
For server-based applications it is better to use sphinx4. For more details
see:
http://cmusphinx.sourceforge.net/wiki/versions