
Pronunciation scoring with Sphinx4

  • Anonymous - 2010-08-17

    Hello,

    I am developing a pronunciation scoring application for non-native speakers
    using Sphinx4. The main goal is to give the user feedback about his pronunciation
    of single isolated words. The score should contain a correctness rating for each
    phoneme of the word. Additional information will cover the mistakes made by the
    user - mispronounced phonemes (missing phonemes, substituted phonemes and
    extra phonemes added to the pronunciation).

    The simplest way to achieve that is to get the phoneme transcription of the spoken
    phrase (whatever was spoken) and compare this transcription to the correct one.
    However, the problem is that getting an exact transcription from speech with
    Sphinx is very inaccurate, and it would be very difficult to build a grammar
    flexible enough to produce a transcription that corresponds exactly to what the
    user said.
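
    For the comparison itself, a plain edit-distance alignment between the reference
    phones and whatever phone sequence comes back seems like the natural starting
    point. A rough sketch in plain Java (no Sphinx API involved; the phone sequences
    are just the APPLE example):

        import java.util.ArrayDeque;
        import java.util.Deque;

        /** Rough sketch: align a recognized phone sequence against the reference
         *  pronunciation with plain edit distance and report per-phone differences. */
        public class PhoneAligner {

            public static void align(String[] ref, String[] hyp) {
                int n = ref.length, m = hyp.length;
                int[][] d = new int[n + 1][m + 1];
                for (int i = 0; i <= n; i++) d[i][0] = i;
                for (int j = 0; j <= m; j++) d[0][j] = j;
                for (int i = 1; i <= n; i++) {
                    for (int j = 1; j <= m; j++) {
                        int sub = d[i - 1][j - 1]
                                + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                        d[i][j] = Math.min(sub,
                                Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
                    }
                }
                // Backtrace; operations come out last-to-first, so push onto a stack.
                Deque<String> ops = new ArrayDeque<String>();
                int i = n, j = m;
                while (i > 0 || j > 0) {
                    if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1]
                            && ref[i - 1].equals(hyp[j - 1])) {
                        ops.push("OK    " + ref[--i]); --j;
                    } else if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                        ops.push("SUBST " + ref[--i] + " -> " + hyp[--j]);
                    } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                        ops.push("MISS  " + ref[--i]);
                    } else {
                        ops.push("EXTRA " + hyp[--j]);
                    }
                }
                for (String op : ops) {
                    System.out.println(op);
                }
            }

            public static void main(String[] args) {
                // reference APPLE vs. what was actually said, APPLES
                align(new String[] {"AE", "P", "AH", "L"},
                      new String[] {"AE", "P", "AH", "L", "S"});
                // prints OK for the first four phones and EXTRA S at the end
            }
        }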

    I am now using the WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz model from Sphinx and a
    grammar that contains only the single word whose pronunciation I want to score.
    The dictionary has an entry with the correct phoneme transcription of that word,
    and possibly a couple of entries containing the most common mistakes made by
    non-native speakers, i.e. commonly substituted phonemes.
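
    For APPLE, I imagine the dictionary and the grammar would look roughly like this
    (the APPLE(2) variant is just a made-up example of a substitution, not taken from
    any real study of learner mistakes). Dictionary entries:

        APPLE        AE P AH L
        APPLE(2)     AA P AH L

    And a JSGF grammar containing only the single word to be scored:

        #JSGF V1.0;
        grammar single_word;
        public <word> = APPLE;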

    Using that strategy, I have a couple of questions about how I could develop the
    rest of the system:

    I would like to access the log-likelihood scores of all recognized phonemes of
    that word. What would be the best approach to getting a score for each phoneme?
    Could it be the plain sum or average of the acoustic scores given in the Sphinx
    recognition result? Is it possible to get these scores even if Sphinx doesn't
    recognize the given speech sample?
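
    What I imagine is walking the best token chain backwards and averaging the frame
    acoustic scores per unit - a rough sketch, assuming the search manager keeps the
    emitting tokens in the predecessor chain (e.g. keepAllTokens with the word-pruning
    search manager) and that I read the Token/UnitSearchState javadoc correctly:

        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;

        import edu.cmu.sphinx.decoder.search.Token;
        import edu.cmu.sphinx.linguist.UnitSearchState;
        import edu.cmu.sphinx.result.Result;

        /** Rough sketch: walk back from the best token and average the frame
         *  acoustic scores belonging to each phone (unit). */
        public class PhoneScores {

            /** Returns lines like "AE  -123.4  (7 frames)" in time order. */
            public static List<String> perPhoneAverages(Result result) {
                List<String> phones = new ArrayList<String>();
                List<Float> frameScores = new ArrayList<Float>();

                for (Token t = result.getBestToken(); t != null;
                        t = t.getPredecessor()) {
                    if (t.isEmitting()) {
                        // emitting token: one acoustic score per frame
                        frameScores.add(t.getAcousticScore());
                    } else if (t.getSearchState() instanceof UnitSearchState) {
                        // unit-entry token: the frames collected so far (we are
                        // walking backwards) belong to this unit
                        String name = ((UnitSearchState) t.getSearchState())
                                .getUnit().getName();
                        if (!frameScores.isEmpty()) {
                            double sum = 0;
                            for (float s : frameScores) {
                                sum += s;
                            }
                            phones.add(name + "  " + (sum / frameScores.size())
                                    + "  (" + frameScores.size() + " frames)");
                            frameScores.clear();
                        }
                    }
                }
                Collections.reverse(phones); // we walked backwards in time
                return phones;
            }
        }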

    For example: I want to score the pronunciation of the word APPLE (AE P AH L) and
    I run recognition on an utterance containing a bad pronunciation like APPLES
    (AE P AH L S) - in this case Sphinx keeps returning a 'null' result, as it
    doesn't recognize the word from the grammar (APPLE). Is it possible to get scores
    for the first four phonemes anyway and give the user feedback that something is
    wrong with his pronunciation at the end of the word?
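
    One thing I am considering, if I understand the FlatLinguist javadoc correctly, is
    letting the linguist add an out-of-grammar phone-loop branch, so that the
    recognizer returns some result instead of null when the utterance doesn't match
    the grammar. The relevant fragment of the XML config would be something like this
    (property and component names written from memory from the demo configs, and "wsj"
    assumed to be the acoustic model component defined elsewhere in the same config -
    please correct me if this is wrong):

        <component name="flatLinguist"
                   type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
            <property name="grammar" value="jsgfGrammar"/>
            <property name="acousticModel" value="wsj"/>
            <property name="unitManager" value="unitManager"/>
            <!-- out-of-grammar branch: a phone loop that catches speech
                 which does not match the single-word grammar -->
            <property name="addOutOfGrammarBranch" value="true"/>
            <property name="outOfGrammarProbability" value="1E-20"/>
            <property name="phoneInsertionProbability" value="1E-10"/>
            <property name="phoneLoopAcousticModel" value="wsj"/>
        </component>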

    I will be very thankful for answers to those questions. Or maybe you have a
    better idea/strategy for how to solve this problem - scoring pronunciation with
    Sphinx.

    Regards,
    Tomek

     
  • Anonymous - 2010-08-20

    There are three measures of pronunciation quality that are popular in many papers:
    - segment log-likelihood (for each phoneme)
    - segment duration in frames
    - phone log-posterior probability scores

    I am wondering if it is possible to easily get the last one from the Sphinx4
    result? This measure is described, for example, in "AUTOMATIC PRONUNCIATION
    SCORING FOR LANGUAGE INSTRUCTION" by SRI International
    (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.4549&rep=rep1&type=pdf).

    Do you know any easy way to calculate this measure (phone log-posterior
    probability score)?

    Regards,
    Michał

     
  • Nickolay V. Shmyrev

    No, there is no easy way to calculate this. It requires you both to develop a
    search manager that uses the phone space to match the audio and to develop
    accumulators that store phone likelihoods and turn them into posteriors.
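
    Just to illustrate the last step: once such accumulators give you a segment
    log-likelihood for every phone model, turning that into a log-posterior for the
    intended phone is only a normalization over the phone set (assuming a uniform
    phone prior; the numbers below are made up):

        /** Sketch of the normalization step only: per-phone segment
         *  log-likelihoods to a log-posterior for the intended phone,
         *  with a uniform prior over the phone set. */
        public class PhonePosterior {

            /** log( exp(target) / sum_i exp(logLiks[i]) ), computed stably. */
            public static double logPosterior(double target, double[] logLiks) {
                double max = Double.NEGATIVE_INFINITY;
                for (double ll : logLiks) {
                    max = Math.max(max, ll);
                }
                double sum = 0;
                for (double ll : logLiks) {
                    sum += Math.exp(ll - max);   // log-sum-exp for stability
                }
                return target - (max + Math.log(sum));
            }

            public static void main(String[] args) {
                // made-up segment log-likelihoods for a handful of phone models
                double[] liks = {-1200.0, -1185.5, -1230.2, -1190.1};
                double target = -1190.1;  // the phone the learner should have said
                System.out.println("log P(phone | segment) = "
                        + logPosterior(target, liks));
            }
        }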

     
  • Nickolay V. Shmyrev

    And, looking at the age of that article, I wouldn't recommend following their
    methods. Nowadays language acquisition tools target the specific mistakes
    non-natives make. For example, they try to catch the typical mistakes US
    students make in French. You will not get that by looking at posteriors.

     
  • Anonymous - 2010-08-23

    Thanks for your replies, I'll consider your ideas...

    I also have a slightly different question.
    I want to use Sphinx4 as a recognition engine in a Java applet on my website.
    The problem is that this site will be reloaded by the user very often. The
    grammar contains only one word to be recognized, but unfortunately the whole WSJ
    package must be loaded every time. This package is quite heavy (10 MB), but the
    created search graph is just a small part of it.

    So, my question is: Is it possible to separately create the SearchGraph from the
    linguist and somehow save it to a file, and then only send this specific search
    graph to the applet to perform recognition on it? I think it would greatly
    improve the performance of this application.

    Regards,
    Michał

     
  • Nickolay V. Shmyrev

    > This package is quite heavy (10 MB), but the created search graph is just a
    > small part of it.

    I'm not sure what you mean by search graph, but the recognizer search space is
    really huge (millions of nodes).

    > So, my question is: Is it possible to separately create the SearchGraph from
    > the linguist and somehow save it to a file, and then only send this specific
    > search graph to the applet to perform recognition on it? I think it would
    > greatly improve the performance of this application.

    Being heavyweight by nature, recognition is unlikely to fit into a lightweight-
    client paradigm. I suggest you consider using another technology as the base for
    your application. There has been a lot of success recently with Red5 + Flash
    setups. For example, you can visit http://speechapi.com

     
