Pocketsphinx recognition speed

Forum: Help

Creator: Anonymous

Created: 2012-05-29

Updated: 2012-09-22

Anonymous - 2012-05-29

Hi. I'm developing a command and control application for Windows CE (ARMV4I
based). I'm using the VoxForge Spanish acoustic model. It's working OK, except
for the time it takes to recognize the commands. I've trained a statistical
language model for the 13 commands I'm using. Each command has 1 or 2 words. I
used a statistical language model because somewhere it said that this was
better (in terms of user experience) than using a grammar for command and
control (is this true? in which way?). The recognition of a command takes an
average of 4 seconds, with a minimum of 2 seconds and a maximum of 8 seconds.
In which ways do you think I may improve the performance? Is it worth
changing the language model to a grammar to improve the recognition speed?

Thanks in advance,

Federico

Nickolay V. Shmyrev - 2012-05-29

> I used a statistical language model because somewhere it said that this was
> better (in terms of user experience) than using a grammar for command and
> control (is this true? in which way?).

Be careful with claims like that. The statement is not true. For command and
control you need to use a grammar.
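
A grammar for a small command set can be generated from the command list itself. The following is a minimal sketch in Python that builds a JSGF grammar string. The command phrases, the grammar name `commands`, and the helper `build_jsgf` are hypothetical stand-ins, since the thread does not list the actual 13 commands.

```python
def build_jsgf(commands, grammar_name="commands", rule_name="command"):
    """Return a JSGF grammar string that accepts exactly one of the phrases."""
    # Normalize whitespace and drop empty entries.
    phrases = [" ".join(c.split()) for c in commands if c.strip()]
    if not phrases:
        raise ValueError("at least one command phrase is required")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


# Hypothetical Spanish commands, one or two words each (not from the thread).
EXAMPLE_COMMANDS = ["encender", "apagar", "subir volumen", "bajar volumen"]

if __name__ == "__main__":
    print(build_jsgf(EXAMPLE_COMMANDS))
```

The resulting text would be saved to a file and given to the decoder in place of the statistical language model (pocketsphinx accepts a grammar file through its `-jsgf` option). Every word used in the grammar also has to be present in the pronunciation dictionary.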

> The recognition of a command takes an average of 4 seconds, with a minimum
> of 2 seconds and a maximum of 8 seconds. In which ways do you think I may
> improve the performance?

Please see the documentation:

http://cmusphinx.sourceforge.net/wiki/pocketsphinxhandhelds

There are other options that might help as well; everything depends on the
particular configuration you are using. For best performance the acoustic
model must be semi-continuous. The default VoxForge Spanish model is
continuous, and it's hard to expect good performance from it.
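
To illustrate why the model type matters for speed: in a continuous model each senone has its own Gaussians, so the number of density evaluations per frame grows with the number of active senones, while a semi-continuous model shares a small codebook across all senones. The sketch below is rough back-of-the-envelope arithmetic only. Every number in it is an illustrative assumption, not a measurement of the VoxForge model, and the function names are hypothetical.

```python
def continuous_evals(active_senones, gaussians_per_senone):
    """Density evaluations per frame when each senone owns its Gaussians."""
    return active_senones * gaussians_per_senone


def semi_continuous_evals(codebook_size, feature_streams):
    """Density evaluations per frame when all senones share the codebooks."""
    return codebook_size * feature_streams


if __name__ == "__main__":
    # Assumed, illustrative figures (not taken from any particular model).
    cont = continuous_evals(active_senones=1000, gaussians_per_senone=8)
    semi = semi_continuous_evals(codebook_size=256, feature_streams=4)
    print("continuous:", cont, "semi-continuous:", semi)
```

The counts are not directly comparable in cost, because the Gaussians differ in dimensionality and the semi-continuous model still has to combine mixture weights per senone, but the sketch shows which quantity scales with the number of active senones.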

Nickolay V. Shmyrev - 2012-05-29

And, as is usual in any performance optimization, you need to profile your
application first and then decide on the particular optimization method.
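
One simple first measurement is the real-time factor: decoding time divided by the duration of the audio. The following is a minimal sketch in Python. The `fake_decode` function is a placeholder for the real recognizer call, and the 16 kHz sample rate is an assumption that should be replaced by the rate the acoustic model was trained on.

```python
import time


def real_time_factor(decode, audio, sample_rate=16000):
    """Time one call to decode(audio); return (elapsed_seconds, xRT)."""
    audio_seconds = len(audio) / float(sample_rate)
    if audio_seconds <= 0:
        raise ValueError("audio must contain at least one sample")
    start = time.perf_counter()
    decode(audio)
    elapsed = time.perf_counter() - start
    # xRT > 1 means decoding is slower than the speech itself.
    return elapsed, elapsed / audio_seconds


def fake_decode(samples):
    # Placeholder workload standing in for the real recognizer call.
    return sum(abs(s) for s in samples)


if __name__ == "__main__":
    one_second = [0] * 16000  # one second of silence at the assumed rate
    elapsed, xrt = real_time_factor(fake_decode, one_second)
    print(f"decode took {elapsed:.4f} s, {xrt:.2f} x real time")
```

Measuring this separately for feature extraction, Gaussian scoring, and search (where the engine exposes such timings) shows which stage to optimize first.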

Anonymous - 2012-05-30

Thanks. This may seem naive, but I'm new to speech recognition engines. Is
it possible to convert an existing continuous acoustic model to a
semi-continuous one? Is there any tool to help with this?

Nickolay V. Shmyrev - 2012-05-30

> Is it possible to convert an existing continuous acoustic model to a
> semi-continuous one? Is there any tool to help with this?

It is not possible. You need to train a semi-continuous model from the
VoxForge data available for download. If you are new to this technology, read
the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorial
