Hello,
I would like to use the PocketSphinx library to analyse an answer to a
specific question. For example, the app could ask: "What is the capital of
Australia?" and the user would have to answer "Canberra".
I'm really not familiar with Sphinx yet, so I guess that I would need a
dictionary of every possible answer.
So my first question: is there a way to start a voice recognition and give it
the expected correct word, so that the recognition algorithm doesn't try to
recognize what the user said, but only tests it against the expected answer?
Secondly, for an app of this kind (a finite dictionary that I provide), what
are all the steps needed to recognize what the user says (with or without the
specific mechanism I mentioned above)?
Would a text file of the words be enough, or do I need to adapt/build the
acoustic model? It seems that if I have 1000 possible answers, recording them
would take a long time...
To be more specific, the questions of the app could also be "Who sang the song
Angie?" and the answer would be "The Rolling Stones", so it seems to me that
the kind of recognition I want to do is pretty hard, because these are proper
names. Finally, the app will run on smartphones (Android and maybe iPhone).
Thanks in advance!
> I would like to use the PocketSphinx library to analyse an answer to a
> specific question. For example, the app could ask: "What is the capital of
> Australia?" and the user would have to answer "Canberra". I'm really not
> familiar with Sphinx yet, so I guess that I would need a dictionary of every
> possible answer.
This task is called "utterance verification". For example, see:
Hui Jiang and Chin-Hui Lee, "A New Approach to Utterance Verification Based on
Neighborhood Information in Model Space"
http://www.cse.yorku.ca/~hj/mypubs/Jiang_sap03.pdf
> is there a way to start a voice recognition and give it the expected correct
> word, so that the recognition algorithm doesn't try to recognize what the
> user said, but only tests it against the expected answer?
This task requires you to implement a specific algorithm for utterance
verification, like the one described in the paper above. This algorithm is not
implemented in pocketsphinx right now; however, pocketsphinx is a good base
for its implementation.
> what are all the steps needed?
The first step is to decide on the approach and estimate the amount of work
required. It will be significant if you want an application that really works.
Expect roughly 2-3 months of active work.
> Would a text file of the words be enough?
Yes, if you implement the algorithm properly.
> or do I need to adapt/build the acoustic model? It seems that if I have
> 1000 possible answers, recording them would take a long time...
Implementing the algorithm would take a long time, but the existing acoustic
model can be reused.
> it seems to me that the kind of recognition I want to do is pretty hard,
> because these are proper names
When the set of possible answers is fixed, it's actually easy to recognize the
correct one. The real issue is filtering out incorrect answers. You'll have to
implement a background phone loop search and write a good confidence estimator.
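To make the idea of a confidence estimator a little more concrete, here is a minimal sketch of one common form: a per-frame log-likelihood ratio between the score of the expected answer and the score of an unconstrained background search. It is not the method from the paper above and it does not call pocketsphinx; the two log-scores, the frame count, and the threshold value are all assumed inputs that you would have to obtain and tune yourself.

```python
def confidence(answer_logprob: float, background_logprob: float, n_frames: int) -> float:
    """Per-frame log-likelihood ratio: answer model versus background model.

    Values close to 0 mean the expected answer explains the audio almost as
    well as the unconstrained background search; strongly negative values
    mean the audio probably contains something else.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (answer_logprob - background_logprob) / n_frames


def verify(answer_logprob: float, background_logprob: float, n_frames: int,
           threshold: float = -1.0) -> bool:
    """Accept the answer when the ratio is at or above the threshold.

    The default threshold is an arbitrary placeholder, not a recommended value.
    """
    return confidence(answer_logprob, background_logprob, n_frames) >= threshold


if __name__ == "__main__":
    # Made-up scores for a 200-frame utterance.
    print(verify(-4150.0, -4100.0, 200))  # small gap: accepted
    print(verify(-4500.0, -4100.0, 200))  # large gap: rejected
```

The threshold trades false rejections against false acceptances, so it has to be chosen on recordings of both correct and incorrect answers.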
Hello,
Thanks for the quick reply!
This algorithm seems far too complicated for me, and I don't have 2 months to
spend on this problem. I asked the question to find out whether this specific
mechanism (testing the recognition against the one expected answer) would make
the voice recognition easier and quicker to implement.
I think that the classical way will be easier. So I guess that I only need to
specify the dictionary of all answers, and when I get the result in my app, I
check whether it matches the expected answer.
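A rough sketch of this "classical way", under some assumptions: the list of answers is turned into a JSGF grammar (a grammar format that pocketsphinx can load), and the recognized text is then compared with the expected answer after normalization. The function names are hypothetical, the recognizer call itself is left out, and every word in the grammar would still need an entry in the pronunciation dictionary.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters, digits and apostrophes as words."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def build_jsgf(answers, name: str = "answers") -> str:
    """Build a JSGF grammar that accepts exactly one of the given answers."""
    phrases = sorted({normalize(a) for a in answers} - {""})
    if not phrases:
        raise ValueError("at least one non-empty answer is required")
    alternatives = " | ".join(phrases)
    return f"#JSGF V1.0;\ngrammar {name};\npublic <answer> = {alternatives};\n"


def is_correct(hypothesis: str, expected: str) -> bool:
    """Compare the recognizer output with the expected answer."""
    return normalize(hypothesis) == normalize(expected)


if __name__ == "__main__":
    print(build_jsgf(["Canberra", "The Rolling Stones"]))
    print(is_correct("the rolling stones", "The Rolling Stones"))
```

Note that a grammar like this forces every utterance onto one of the listed answers, which is exactly why wrong or out-of-list answers are hard to reject without a confidence measure.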
What worries me is that I don't really know whether the accuracy will be good
enough, especially across all the different users of the mobile app. It would
be a great waste of time if I spent 2 weeks getting this voice recognition
running only to end up with an accuracy rate below 90%. I can't afford to have
users answering "Canberra" and being told by the app that their answer is
wrong...
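One way to check the accuracy concern before committing two weeks is to record a small set of test answers and count the errors. The sketch below assumes you have logged, for each trial, the expected answer, what the user actually said, and what the recognizer returned (all already normalized to the same form); the `Trial` type and `error_rates` function are illustrative names, not part of any library.

```python
from typing import Iterable, NamedTuple


class Trial(NamedTuple):
    expected: str    # the correct answer to the question
    spoken: str      # what the user actually said (hand-labelled)
    recognized: str  # what the recognizer returned


def error_rates(trials: Iterable[Trial]) -> dict:
    """Return the false-reject and false-accept rates over the trials."""
    right = wrong = false_reject = false_accept = 0
    for t in trials:
        accepted = t.recognized == t.expected
        if t.spoken == t.expected:
            right += 1
            false_reject += not accepted   # correct answer marked wrong
        else:
            wrong += 1
            false_accept += accepted       # wrong answer marked correct
    return {
        "false_reject_rate": false_reject / right if right else 0.0,
        "false_accept_rate": false_accept / wrong if wrong else 0.0,
    }


if __name__ == "__main__":
    trials = [
        Trial("canberra", "canberra", "canberra"),
        Trial("canberra", "canberra", "sydney"),
        Trial("canberra", "sydney", "sydney"),
        Trial("canberra", "sydney", "canberra"),
    ]
    print(error_rates(trials))
```

The false-reject rate is the case described above, where a user says "Canberra" and the app marks it wrong.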
And how can we help you?
You already have, with the first answer.
My point in my second reply was to ask whether the classical way, without an
utterance verification algorithm, seems doable and accurate enough. I don't
have enough perspective on the technology to judge that myself.