Sphinx technology for toys.

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others



Forum: Speech Recognition Theory

Creator: Mukund Ghosh

Created: 2014-07-12

Updated: 2014-07-13

Mukund Ghosh - 2014-07-12

Is Sphinx technology (especially PocketSphinx) suitable for deployment in toys? I've read somewhere that speech recognition in general doesn't work very well for kids, and that special techniques like VTLN (vocal tract length normalization) are required to achieve reasonable accuracy.

Thanks.


Nickolay V. Shmyrev - 2014-07-12

Is Sphinx technology (especially PocketSphinx) suitable for deployment in toys?

Yes

I've read somewhere that speech recognition in general doesn't work very well for kids, and that special techniques like VTLN are required to achieve reasonable accuracy

This is correct, but it doesn't mean that you can't implement the required features.
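For readers unfamiliar with VTLN: it rescales the frequency axis of each speaker's spectrum by a warp factor before features are computed, so that a speaker with a shorter vocal tract (such as a child) is mapped closer to the speakers the acoustic model was trained on. The factor is usually chosen per speaker by trying a small grid of values and keeping the one that gives the highest model likelihood (not shown here). Below is a minimal Python sketch of one common variant, a piecewise-linear warp; the function name `vtln_warp`, the 8 kHz Nyquist default and the 0.85 cutoff ratio are illustrative assumptions, not PocketSphinx code or parameters.

```python
def vtln_warp(freq_hz, alpha, nyquist_hz=8000.0, cutoff_ratio=0.85):
    """Piecewise-linear VTLN warp of a single frequency (hypothetical helper).

    Below a breakpoint the frequency is simply scaled by alpha; above it a
    second linear segment maps the breakpoint's image onto the Nyquist
    frequency, so the warped axis still spans [0, nyquist_hz].
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("frequency out of range")
    # Place the breakpoint so that alpha * f0 stays below Nyquist.
    f0 = cutoff_ratio * nyquist_hz / max(alpha, 1.0)
    if freq_hz <= f0:
        return alpha * freq_hz
    # Straight line through (f0, alpha * f0) and (nyquist_hz, nyquist_hz).
    slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + slope * (freq_hz - f0)
```

With `alpha < 1` the lower part of the axis is compressed (for example `vtln_warp(1000.0, 0.9)` gives about 900 Hz), which in this convention pulls a child's higher formants down toward an adult model. Toolkits differ in whether they warp the signal spectrum or the filterbank, so the factor may be inverted in practice.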


Mukund Ghosh - 2014-07-12

How bad is it without VTLN (for kids)? Is there a plan to support VTLN in PocketSphinx? Will existing models be usable with VTLN?

Thanks.

Nickolay V. Shmyrev - 2014-07-12

How bad is it without VTLN (for kids)?

Why don't you try it first, and ask if you run into issues?

Is there a plan to support VTLN in PocketSphinx?

No

Will existing models be usable with VTLN?

It's not clear what you mean by "usable".

Mukund Ghosh - 2014-07-13

Well, the situation is that I have some familiarity with PocketSphinx/Sprec. I know a group that is planning to use speech recognition in toys and has asked me for advice. They are looking for something compact, since it will be an embedded application, and PocketSphinx (PS) seems to be a good fit. They want to do some top-level timeline estimation, so we have to see what already exists and what needs to be developed.
In light of this, my questions boil down to:

a) Can PS be used "as is", with an existing acoustic model, and give reasonable accuracy for kids?
b) Would an acoustic model trained on kids help over an existing model (say en-us)?
c) Would VTLN be absolutely necessary to get something reasonably demoable? If so, is it possible to use one of the existing models (say en-us) with VTLN applied only in decoding, or do new models need to be trained (with VTLN used in both training and decoding)?

We'll of course be doing rigorous testing later, but for now we just need some ballpark answers for planning.

Thanks.


Nickolay V. Shmyrev - 2014-07-13

a) Can PS be used "as is", with an existing acoustic model, and give reasonable accuracy for kids?

No

b) Would an acoustic model trained on kids help over an existing model (say en-us)?

Yes

c) Would VTLN be absolutely necessary to get something reasonably demoable?

No

If so, is it possible to use one of the existing models (say en-us) with VTLN applied only in decoding, or do new models need to be trained (with VTLN used in both training and decoding)?

You need new models trained for children.
