I have spent the last decade doing smart home research, and currently work on
the LinuxMCE project
(http://www.linuxmce.org/), and I find
myself curious as to which engine would be better suited for ALL of the
following use cases:
Simple declarative commands, with or without qualifiers and extraneous vocabulary: "Computer, turn on the lights, please. Only halfway... more... more... thank you."
Dealing with names of media: "Play 2001: A Space Odyssey", "Play everything by The Beatles"
This would be the simplest example of the extremes of what I would ideally
like to research. Would I use PocketSphinx, Sphinx4, a hybrid of both running
in tandem, or something else?
It is worth noting that LinuxMCE already has a highly distributed
message-passing architecture that everything sits on top of.
Thanks,
-Thom
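To make the first use case (qualifiers plus extraneous vocabulary) concrete, here is a minimal sketch of the kind of post-processing a home-automation layer might apply to the recognizer's text output. It is not part of PocketSphinx or LinuxMCE: the function name, the 10% step and the single dimmable light are assumptions, and every word without an action (the wake word, "please", "thank you") is simply treated as filler.

```python
import re

STEP_PERCENT = 10  # assumed: how much each "more" or "less" changes the level


def apply_light_commands(level, hypothesis):
    """Fold the words of a recognizer hypothesis, in order, into a new dimmer level
    (0-100). Words with no action here (wake word, "turn", "lights", "please",
    "thank you", ...) are skipped, which is how extraneous vocabulary is tolerated."""
    for word in re.findall(r"[a-z]+", hypothesis.lower()):
        if word == "on":
            level = 100
        elif word == "off":
            level = 0
        elif word == "halfway":
            level = 50
        elif word == "more":
            level = min(100, level + STEP_PERCENT)
        elif word == "less":
            level = max(0, level - STEP_PERCENT)
    return level


utterance = ("Computer, turn on the lights, please. "
             "Only halfway...more...more...thank you.")
print(apply_light_commands(0, utterance))  # prints 70
```

Processing the words in order is what lets this single utterance end at 70%: "on" sets 100, "halfway" drops it to 50, and each "more" adds 10. Matching bare words like "on" is fragile on its own; a grammar can constrain what the recognizer returns in the first place.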
Since LinuxMCE is mostly written in C++ and is supposed to work in a
low-resource environment, you need to use PocketSphinx. It has most of the
functionality required for remote control.
I need to warn you that if you want to build a real working system, you will
have issues with distant microphones, and you will have to build a processing
module for a microphone array (not a part of CMUSphinx). Otherwise, you will
need to use a close-talking microphone.
Well, for microphone arrays there are open source packages too, like ManyEars.
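For a sense of what a microphone-array processing module does, the simplest technique is a delay-and-sum beamformer: shift each channel so sound from the talker's direction lines up, then average, which reinforces the talker relative to off-axis noise. This is only an illustration, not ManyEars or any CMUSphinx code; it assumes a uniform linear array, a far-field source, integer-sample delays and a speed of sound of 343 m/s, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: speed of sound in air at about 20 degrees C


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Arrival delay, in whole samples relative to mic 0, of a far-field plane wave at
    each mic of a uniform linear array. Convention (assumed): 0 degrees is broadside,
    and a positive angle means the wavefront reaches higher-index mics later."""
    step_s = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S
    return [round(i * step_s * sample_rate) for i in range(num_mics)]


def delay_and_sum(channels, delays):
    """Align every channel on the target's wavefront and average them.

    channels: equal-length lists of samples, one per mic.
    delays:   arrival delay of the target at each mic, in samples.
    Samples that would be read from outside a channel are treated as silence."""
    length = len(channels[0])
    earliest = min(delays)
    output = []
    for t in range(length):
        total = 0.0
        for samples, delay in zip(channels, delays):
            # Read later in channels where the target arrives later, so all line up.
            index = t + delay - earliest
            if 0 <= index < length:
                total += samples[index]
        output.append(total / len(channels))
    return output


delays = steering_delays(num_mics=4, spacing_m=0.05, angle_deg=30.0, sample_rate=16000)
print(delays)  # prints [0, 1, 2, 3]
```

Integer delays are coarse at 16 kHz; practical beamformers use fractional delays or work in the frequency domain, and they also have to estimate the talker's direction rather than being told it.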
I understand the issues with microphone arrays. It's one of the reasons I am
working with the microphone array in the Kinect for my initial experiments.
Is PocketSphinx still a viable option when I want to go beyond a simple
"remote control"? Please understand that I am also trying to be able to
SELECT media and deal with names of people, phone numbers, and other "not so
black and white" aspects whilst building a vocabulary/grammar for this thing.
-Thom
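One way to handle media titles, names and phone numbers with PocketSphinx is to generate the grammar from the media library and contact list instead of writing it by hand. This sketch emits a JSGF grammar (a format PocketSphinx can load) and turns spoken digit words back into a dialable number. The rule layout, the `overrides` table for titles such as "2001", and all function names are assumptions; every emitted word must also exist in the pronunciation dictionary, and each input list is assumed to be non-empty.

```python
import re

# Spoken forms of the digits, in order, so DIGITS.index(word) is the digit's value.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spoken_form(text, overrides):
    """Lower-case `text`, drop punctuation, and expand tokens listed in the
    (hypothetical) `overrides` table, e.g. "2001" -> "two thousand one"."""
    words = []
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        words.extend(overrides.get(token, token).split())
    return " ".join(words)


def build_grammar(titles, artists, people, overrides=None):
    """Return JSGF text covering "play <title>", "play everything by <artist>",
    "call <person>" and "call" followed by one or more digits."""
    overrides = overrides or {}

    def alternatives(items):
        # Deduplicate and sort so the output is stable between rebuilds.
        return " | ".join(sorted({spoken_form(item, overrides) for item in items}))

    return "\n".join([
        "#JSGF V1.0;",
        "grammar home;",
        "public <command> = play ( <title> | everything by <artist> )"
        " | call ( <person> | <digit>+ );",
        "<title> = " + alternatives(titles) + ";",
        "<artist> = " + alternatives(artists) + ";",
        "<person> = " + alternatives(people) + ";",
        "<digit> = " + " | ".join(DIGITS) + ";",
    ])


def digits_from_words(hypothesis):
    """Map the digit words in a recognizer hypothesis to a dialable string."""
    return "".join(str(DIGITS.index(w)) for w in hypothesis.split() if w in DIGITS)


grammar = build_grammar(
    titles=["2001: A Space Odyssey", "Abbey Road"],
    artists=["The Beatles"],
    people=["Alice", "Bob"],
    overrides={"2001": "two thousand one"},
)
print(grammar)
print(digits_from_words("call five five five one two one two"))  # prints 5551212
```

Regenerating the grammar whenever the library or contact list changes keeps the active vocabulary small, which helps both accuracy and speed on low-resource hardware.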
Please understand that software is also "not so black and white". There are
always things you will need to implement yourself. CMUSphinx is just a tool,
not a product. However, it's a good starting point for your project, as I
already told you above.
I understand. I worked with Sphinx2 in the 2001-2002 time frame on a
limited-domain speech recognizer project.
I'll stick with PocketSphinx for now.