I am developing an application where users talk with a virtual character in
order to train their voice (confidence, prosody, etc.). Most of the time I
don't even need speech recognition, since I'm trying to detect voice quality
features.
But at various points in the dialogue it would be very useful to let the user
choose between two branches, for example between two topics.
I'm really a beginner in speech recognition, but my intuition was to display
two sentences on the screen and have the user read one of them out loud. I
believe that if the two sentences are different enough (e.g. "I'm a big rugby
fan" vs. "Sorry but I don't have a clue about rugby"), it should be very easy
to tell which sentence is more likely, even if only about half of it is
recognized.
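To make this intuition concrete, here is a minimal Python sketch. It assumes some recognizer already returns a text hypothesis (possibly only part of what was said) and simply keeps the on-screen sentence with the highest word-level similarity, using the standard library's difflib rather than anything Sphinx-specific. The helper names tokenize and best_match are hypothetical.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into words, keeping apostrophes (e.g. "don't")."""
    return re.findall(r"[a-z']+", text.lower())


def best_match(hypothesis: str, candidates: list[str]) -> tuple[str, float]:
    """Return (sentence, score) for the candidate closest to the hypothesis.

    The score is difflib's word-level ratio (0.0 to 1.0), so a hypothesis
    containing only part of a sentence can still score clearly higher for
    that sentence than for a very different one.
    """
    hyp_words = tokenize(hypothesis)
    scored = [
        (cand, difflib.SequenceMatcher(None, hyp_words, tokenize(cand)).ratio())
        for cand in candidates
    ]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    options = [
        "I'm a big rugby fan",
        "Sorry but I don't have a clue about rugby",
    ]
    # Hypothetical, half-recognized output from some recognizer.
    print(best_match("sorry i don't clue", options))
```

With the half-recognized hypothesis "sorry i don't clue", the second sentence scores about 0.62 and the first 0.0, so the choice is clear even though half the words are missing.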
I've looked around the web a lot, and I can't seem to find a system that would
let me do this relatively easily. I've tried SR systems such as Sphinx and
Julius, but they really seem like big "power-houses", and I have no clue where
to start adapting them. I've also considered keyword spotting: I've seen a
solution posted around here and started playing with it, but I don't know
whether it's the best solution to my problem.
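If keyword spotting is the route, one way to make the decision more robust is to spot only the words that actually distinguish the sentences. A minimal sketch, assuming the spotter gives you a plain list of detected words; the function names are hypothetical and nothing here is tied to a particular spotter.

```python
import re
from typing import Optional


def words(text: str) -> set[str]:
    """Lowercased set of words, keeping apostrophes so "don't" stays one word."""
    return set(re.findall(r"[a-z']+", text.lower()))


def distinguishing_keywords(candidates: list[str]) -> list[set[str]]:
    """For each sentence, the words that appear in none of the other candidates.

    Words shared by the options (here "a" and "rugby") cannot tell the
    branches apart, so only these unique words are worth spotting.
    """
    sets = [words(c) for c in candidates]
    unique = []
    for i, own in enumerate(sets):
        others = set().union(*(s for j, s in enumerate(sets) if j != i))
        unique.append(own - others)
    return unique


def choose_by_keywords(spotted: list[str], candidates: list[str]) -> Optional[int]:
    """Index of the candidate with the most spotted unique keywords.

    Returns None when nothing decisive was spotted (no hits, or a tie),
    e.g. so the application can ask the user to repeat.
    """
    spotted_set = {w.lower() for w in spotted}
    hits = [len(k & spotted_set) for k in distinguishing_keywords(candidates)]
    best = max(hits)
    if best == 0 or hits.count(best) > 1:
        return None
    return hits.index(best)


if __name__ == "__main__":
    options = ["I'm a big rugby fan", "Sorry but I don't have a clue about rugby"]
    print(choose_by_keywords(["clue", "rugby"], options))  # 1
```

Spotting "rugby" alone returns None here, because that word appears in both sentences and says nothing about which branch was chosen.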
So after days of looking around various forums, I thought it would be better
to just ask people with more SR experience. I would be very grateful if someone
could share some ideas about how they would attack this problem.
Thanks a lot!
Mat
Take a look at:
http://cmusphinx.sourceforge.net/wiki/faq#qhow_to_implement_pronunciation_evaluation
It is easy to get it to differentiate between two sentences like the ones you
listed above: just use a grammar with two entries, one for each sentence. But I
don't think that's terribly useful. Read the above link; it is a good starting
point. You will probably need to read more literature and then customize
whichever CMUSphinx version you choose to use.
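For illustration, a two-entry grammar in JSGF (a grammar format CMUSphinx decoders accept) could be generated as below. This is only a sketch: to_jsgf, the grammar name "branch" and the rule name <choice> are hypothetical, every word must exist in your pronunciation dictionary, and how the grammar is loaded into the decoder depends on the Sphinx version you use.

```python
import re


def to_jsgf(sentences: list[str], grammar_name: str = "branch") -> str:
    """Build a JSGF grammar whose single public rule accepts exactly one sentence.

    Words are lowercased and punctuation other than apostrophes is dropped,
    assuming the pronunciation dictionary uses lowercase entries like "don't".
    """
    alternatives = [" ".join(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    body = " | ".join(alternatives)  # "|" separates alternatives in JSGF
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <choice> = {body};\n"
    )


if __name__ == "__main__":
    print(to_jsgf(["I'm a big rugby fan", "Sorry but I don't have a clue about rugby"]))
```

For the two rugby sentences the public rule becomes `public <choice> = i'm a big rugby fan | sorry but i don't have a clue about rugby;`, so the decoder can only ever answer with one of those two word sequences.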