Checking only one expected sentence

Help
buchay
2011-06-15
2012-09-22
  • buchay

    buchay - 2011-06-15

    Hello,

    I would like to use the pocketsphinx library to analyse an answer to a
    specific question. For example, the app could ask: "What is the capital of
    Australia?" and the user would have to answer "Canberra".
    I'm really not familiar with Sphinx yet, so I guess that I would need a
    dictionary of every possible answer.

    So my first question: is there a way to start voice recognition and give it
    the expected correct word, so that the recognition algorithm doesn't try to
    recognize whatever the user said, but only tests it against the expected
    answer?

    Secondly, for an app of this kind (with a finite dictionary that I provide),
    what are all the steps needed to recognize what the user says (with or
    without the specific mechanism I mentioned above)?

    Would a text file of the words be enough, or do I need to adapt/build the
    acoustic model? It seems that if I have 1000 possible answers, recording
    them would take a long time...

    To be more specific, the app's questions could also be "Who sang the song
    Angie?" with the answer "The Rolling Stones", so it seems to me that the
    kind of recognition I want to do is pretty hard, because these are proper
    names. Finally, the app will run on smartphones (Android and maybe iPhone).

    Thanks in advance!

     
  • Nickolay V. Shmyrev

    I would like to use the pocketSphinx lib to be able to analyse an answer
    to a specific question. For example, the app could ask: "what is the capital
    of Australia?" and the user would have to answer "Canberra". I'm really not
    familiar with sphinx yet, so I guess that I would have a dictionary of every
    possible answer.

    This task is called "utterance verification". For example, see:

    A New Approach to Utterance Verification Based on Neighborhood Information in
    Model Space
    Hui Jiang, Member, IEEE, and Chin-Hui Lee, Fellow, IEEE
    http://www.cse.yorku.ca/~hj/mypubs/Jiang_sap03.pdf

    is there a way to just start a voice recognition, and give the expected
    correct word, so the recognition algorithm won't try to recognize what the
    user said, but only test it against the expected answer.

    This task requires you to implement a specific utterance verification
    algorithm, like the one described in the paper above. No such algorithm is
    implemented in pocketsphinx right now; however, pocketsphinx is a good base
    for its implementation.

    what would be all the steps of the work ?

    The first step is to evaluate which approach to take and the amount of work
    required. It will be significant if you want a really working application:
    something like 2-3 months of active work.

    Just having a txt of the words would be enough ?

    Yes, if you implement the algorithm properly.

    or do I need to adapt/build the acoustic model ? It seems that if I have
    1000 possible answers, it would take a long time of recording...

    It would take a long time to implement the algorithm, but the existing
    acoustic model could be reused.

    it seems to me that the kind of recognition that I want to do is pretty
    hard, because these are proper names

    When the possible set of answers is fixed, it's actually easy to recognize
    the correct one. The real issue is filtering out incorrect answers. You'll
    have to implement a background phone loop search and write a good confidence
    estimator.
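
    As a rough illustration of what such a confidence estimator could look
    like, here is a minimal Python sketch. It assumes you already have two
    log-likelihood scores for the same audio (one from decoding against the
    expected answer, one from the background phone loop) plus the number of
    frames. The function name, its arguments and the threshold value are
    hypothetical, and none of this is an existing pocketsphinx API.

```python
def accept_answer(expected_logprob, background_logprob, n_frames, threshold=-0.5):
    """Accept or reject an utterance with a per-frame log-likelihood ratio.

    expected_logprob:   log-likelihood of the audio given the expected answer
    background_logprob: log-likelihood of the same audio under an
                        unconstrained background phone loop
    n_frames:           number of audio frames, used for length normalization
    threshold:          placeholder value; it has to be tuned on real recordings
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # The background loop is unconstrained, so this ratio is usually <= 0.
    # The closer it is to 0, the better the expected answer explains the audio.
    score = (expected_logprob - background_logprob) / n_frames
    return score >= threshold, score
```

    Dividing by the frame count keeps long and short answers on a comparable
    scale, so a single threshold has a chance of working for both "Canberra"
    and "The Rolling Stones".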

     
  • buchay

    buchay - 2011-06-16

    Hello,

    Thanks for the quick reply!

    This algorithm seems far too complicated for me; I don't have 2 months to
    spend on this problem. I asked the question to find out whether this
    specific mechanism (testing the recognition against the one expected answer)
    would make the voice recognition easier and quicker to implement.

    I think that the classical way will be easier. So I guess that I only need to
    specify the dictionary of all answers and, when I get the result in my app,
    check whether it matches the expected answer.
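
    A minimal Python sketch of that classical approach, under some assumptions:
    the answer list is turned into a JSGF grammar (a grammar format that
    pocketsphinx can load), and the recognizer's text result is compared with
    the expected answer after normalization. The helper names are hypothetical,
    the normalization assumes plain ASCII answers, and every word in the
    grammar would still need an entry in the pronunciation dictionary.

```python
import re


def normalize(text):
    """Lowercase and keep only ASCII letters, digits and single spaces."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def build_jsgf(answers, name="answers"):
    """Build a JSGF grammar that accepts exactly one of the given answers."""
    alternatives = " | ".join(sorted({normalize(a) for a in answers}))
    return f"#JSGF V1.0;\ngrammar {name};\npublic <answer> = {alternatives};\n"


def is_correct(hypothesis, expected):
    """Compare the recognizer's text result with the expected answer."""
    return normalize(hypothesis) == normalize(expected)
```

    For example, build_jsgf(["Canberra", "The Rolling Stones"]) gives a
    grammar with two alternatives, and is_correct(result, "Canberra") can be
    called on whatever string the decoder returns.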

    What scares me is that I don't really know whether the accuracy will be good
    enough, especially across all the different users of the mobile app. It would
    be a great waste of time if I spent 2 weeks getting this voice recognition
    running for an accuracy rate below 90%. I can't afford to have users answer
    "Canberra" and get marked wrong by the app...
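
    One way to put numbers on this concern before committing two weeks is to
    record a small set of test answers and count the two kinds of error
    separately. The sketch below is only an assumption about how such a check
    could be scripted; the function name and the input format are hypothetical.

```python
def acceptance_rates(trials):
    """Return (false_reject_rate, false_accept_rate) from labelled trials.

    Each trial is a pair (said_expected, accepted):
      said_expected: True if the speaker really said the expected answer
      accepted:      True if the app marked the answer as correct
    """
    correct_total = correct_rejected = 0
    wrong_total = wrong_accepted = 0
    for said_expected, accepted in trials:
        if said_expected:
            correct_total += 1
            if not accepted:
                correct_rejected += 1  # right answer marked wrong
        else:
            wrong_total += 1
            if accepted:
                wrong_accepted += 1  # wrong answer marked right
    frr = correct_rejected / correct_total if correct_total else 0.0
    far = wrong_accepted / wrong_total if wrong_total else 0.0
    return frr, far
```

    The first rate covers users who say "Canberra" and are marked wrong; the
    second covers wrong answers that slip through as correct.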

     
  • Nickolay V. Shmyrev

    And how can we help you?

     
  • buchay

    buchay - 2011-06-16

    You already have, with the first answer.
    My point in my second reply was to ask whether the classical way, rather than
    an utterance verification algorithm, seems doable and accurate. I don't have
    enough perspective on the technology to judge that.

     
