Sphinx4 or PocketSphinx for a smart home?

Forum: Speech Recognition Theory

Creator: Thomas Cherryhomes

Created: 2011-11-22

Updated: 2012-09-22

Thomas Cherryhomes - 2011-11-22

I have spent the last decade doing smart home research, and currently work on
the LinuxMCE project (http://www.linuxmce.org/), and I find myself curious as
to which engine would be better suited for ALL of the following use cases:

Simple declarative commands, with or without qualifiers and extraneous vocabulary: "Computer, turn on the lights, please. Only halfway...more...more...thank you."

Dealing with names of media: "Play 2001: A Space Odyssey", "Play everything by The Beatles".

These would be the simplest examples of the extremes of what I would ideally
like to research for this. Would I use PocketSphinx, Sphinx4, a hybrid of
both running in tandem, or something else?

It is worth noting that LinuxMCE already has a highly distributed
message-passing architecture that everything sits on top of.

Thanks,
-Thom

Nickolay V. Shmyrev - 2011-11-22

Since LinuxMCE is mostly written in C++ and is supposed to work in
low-resource environments, you need to use pocketsphinx. It has most of the
functionality required for remote control.

I need to warn you that if you have to build a real working system, you will
have issues with distant microphones. You will either have to build a
processing module for a microphone array (not a part of CMUSphinx) or use a
close-talking microphone.

Nickolay V. Shmyrev - 2011-11-22

Well, for microphone arrays there are open source packages too, like ManyEars.

Thomas Cherryhomes - 2011-11-25

I understand the issues with dealing with microphone arrays. It's one of the
reasons I am working with the microphone array present in the Kinect for my
initial experiments.

Is pocketsphinx still a viable option when I want to deal beyond a simple
"remote control?" Please understand, that I am trying to also be able to
SELECT media, deal with names of people, phone numbers, and other "not so
black and white" aspects whilst building a vocabulary/grammar for this thing.

-Thom

Nickolay V. Shmyrev - 2011-11-25

> Is pocketsphinx still a viable option when I want to deal beyond a simple
> "remote control?" Please understand, that I am trying to also be able to
> SELECT media, deal with names of people, phone numbers, and other "not so
> black and white" aspects whilst building a vocabulary/grammar for this thing.

Please understand that software is also "not so black and white". There are
always things which you will need to implement yourself. CMUSphinx is just a
tool, it's not a product. However, it's a good starting point for your project,
as I already told you above.

Thomas Cherryhomes - 2011-11-25

I understand. I worked with Sphinx2 in the 2001-2002 time frame for a limited
domain speech recognizer project.

I'll stick with PocketSphinx for now.
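
As a rough illustration of the first use case in the opening post (a command
with qualifiers and extraneous vocabulary), the sketch below maps the text a
recognizer might return to a device command. It does not call PocketSphinx or
LinuxMCE at all; it assumes the recognizer has already produced a plain text
hypothesis. The function name, the word lists, and the 10% step for "more" and
"less" are all hypothetical choices made for the example.

```python
import re

# All names and values below are hypothetical, chosen for illustration only.
ACTIONS = {("turn", "on"): 100, ("turn", "off"): 0}  # action -> starting level (%)
DEVICES = {"lights", "fan", "tv"}
ABSOLUTE = {"halfway": 50, "full": 100}              # qualifiers that set a level
RELATIVE = {"more": 10, "less": -10}                 # qualifiers that nudge a level


def interpret(hypothesis):
    """Map recognized text to a command dict, or None if no command is found."""
    # Lowercase and drop punctuation so "lights," and "lights" match alike.
    words = re.findall(r"[a-z0-9']+", hypothesis.lower())
    level = None
    device = None
    for i, word in enumerate(words):
        pair = tuple(words[i:i + 2])
        if level is None and pair in ACTIONS:
            level = ACTIONS[pair]
        elif device is None and word in DEVICES:
            device = word
        elif level is not None and word in ABSOLUTE:
            level = ABSOLUTE[word]
        elif level is not None and word in RELATIVE:
            # Clamp to the 0..100 range after each adjustment.
            level = max(0, min(100, level + RELATIVE[word]))
        # Any other word ("computer", "please", "thank", ...) is ignored.
    if level is None or device is None:
        return None
    return {"device": device, "level": level}


if __name__ == "__main__":
    print(interpret("Computer, turn on the lights, please. "
                    "Only halfway...more...more...thank you."))
```

With the example sentence from the opening post, this returns the device
"lights" at level 70: "turn on" starts at 100, "halfway" resets it to 50, and
each "more" adds 10. The media use case ("Play everything by The Beatles") is
deliberately out of scope here and returns None.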
