pronunciation evaluation with forced alignment

Help
qiqi
2015-02-15
2016-03-24
  • qiqi

    qiqi - 2015-02-15

    Hi, dear all,

    I am working on a pronunciation evaluation task. I need to evaluate whether speakers pronounce words well or not, given a "standard" model. My idea is to do forced alignment (I have the transcript for each recording) and get the probability p(o|model), which is the likelihood. However, I see that Sphinx only outputs an acoustic score, which is a "normalized" state likelihood plus transition probability.

    Now I just want to know if it's OK that I directly use this score to evaluate the pronunciation. To clarify, I have the transcripts and just want to evaluate the words in the given text.

    Thank you.

     
    • Nickolay V. Shmyrev

      if it's OK

      It is not clear what you mean by "OK".

      I directly use this score

      It is also not clear what you mean by "use". Since you do not describe the algorithm in detail, it is hard to evaluate it.

       
      • tfpeach

        tfpeach - 2015-02-16

        Thank you.

        I would determine some threshold on this score and say that words are pronounced well if their scores are higher than the threshold and not pronounced well if their scores are lower than the threshold.

        Of course, I will work on how to get the threshold. But for now I just want to verify that this idea can work.

        Thank you.
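
The thresholding idea described in this post can be sketched as follows. This is only a minimal illustration, assuming per-word scores are already available from a forced alignment; the function name `classify_words`, the example scores, and the threshold value are all hypothetical and are not Sphinx output.

```python
from typing import List, Tuple


def classify_words(
    word_scores: List[Tuple[str, float]], threshold: float
) -> List[Tuple[str, bool]]:
    """Label each word as accepted (True) when its score reaches the threshold.

    Scores are assumed to be log-domain values where higher means a better
    fit to the model. How to choose the threshold is left open here.
    """
    return [(word, score >= threshold) for word, score in word_scores]


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = [("hello", -2.1), ("world", -7.4)]
    for word, accepted in classify_words(example, threshold=-5.0):
        print(word, "accepted" if accepted else "rejected")
```

Whether a single fixed threshold separates good and bad pronunciations in practice is exactly the question raised in the thread, so the sketch shows the mechanics only.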

         
        • Nickolay V. Shmyrev

          I will work on how

          Are you, "tfpeach", the same person as "qiqi"?

          I would determine some threshold on this score and say that words are pronounced well if their scores are higher than the threshold and not pronounced well if their scores are lower than the threshold.

          The score measures how well the data fits the model. There may be perfectly pronounced words that have a worse score (because they do not fit the model) than badly pronounced words. The score will be best for the speakers from the training database, not for the speakers who pronounce words properly.

          Another issue is that the score is computed over the whole utterance. If someone mispronounces just a single phone, the score difference will be small. If there is noise but the pronunciation is perfect, the overall score will be very bad.

          So there are disadvantages to the approach you selected.
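
The point about a single mispronounced phone being diluted in an utterance-level score can be illustrated with a small sketch. It assumes the forced alignment yields, for each phone segment, a frame count and a total log-domain score; the `Segment` type, its field names, and all numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    phone: str        # phone label from the alignment
    frames: int       # number of frames assigned to the phone
    log_score: float  # total log-domain score over those frames


def utterance_score(segments: List[Segment]) -> float:
    """Sum of all segment scores, i.e. a single score for the utterance."""
    return sum(s.log_score for s in segments)


def per_frame_scores(segments: List[Segment]) -> List[Tuple[str, float]]:
    """Average score per frame for each phone, so durations are comparable."""
    return [(s.phone, s.log_score / s.frames) for s in segments]


def worst_phone(segments: List[Segment]) -> Tuple[str, float]:
    """Phone with the lowest per-frame score."""
    return min(per_frame_scores(segments), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up alignments: identical except for the score of "AH".
    good = [Segment("HH", 10, -50.0), Segment("AH", 10, -50.0),
            Segment("L", 10, -50.0), Segment("OW", 10, -50.0)]
    bad = [Segment("HH", 10, -50.0), Segment("AH", 10, -80.0),
           Segment("L", 10, -50.0), Segment("OW", 10, -50.0)]
    print(utterance_score(good), utterance_score(bad))
    print(worst_phone(bad))
```

With these invented numbers the utterance totals differ by 30 out of 200, while the per-frame score of the affected phone drops from -5 to -8, which is a much clearer signal. Per-phone normalization does not address the model-fit and noise issues mentioned above.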

           
          • tfpeach

            tfpeach - 2015-02-16

            Thank you.

            Sorry for the confusion about the user name. When I posted the first time, I was using someone else's computer, so it showed his name.

            I see your point here. Yes, I agree there will be disadvantages. However, if I assume the trained model is a "standard" model, speech that does not fit this model can be treated as bad pronunciation.

            Regarding the noise, it is a problem because it will affect the acoustic features. I had better develop some approach that can adjust the threshold according to the environment.

            BTW, do you have any suggestion on this topic? Thank you very much!

             
            • Nickolay V. Shmyrev

              BTW, do you have any suggestion on this topic? Thank you very much!

              Please read the theory first before asking questions.

               
