What is acoustic score?

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

Forum: Speech Recognition Theory

Creator: Davide Mangiameli

Created: 2014-02-21

Updated: 2014-02-27

Davide Mangiameli - 2014-02-21

Hi all,
I read that the acoustic score is a likelihood. Is that correct?
But I don't understand very well what it is.

Can I define the acoustic score as
P(O|X), where O is an observation and X is the model?

Thanks, and sorry if I'm completely wrong :)

Nickolay V. Shmyrev - 2014-02-21

I read that acoustic score is a likelihood. Is that correct?

Not really. The score might undergo some normalization during decoding, which destroys its probabilistic nature.

P(O|X) where O is an observation and X is the model?

No, the score is usually scaled, so it is not P(O|X) but something like C * P(O|X), or close to it.
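
To make the scaling point concrete, here is a minimal sketch. It is not Sphinx code. It assumes the scores are kept in the log domain, so a multiplicative constant C becomes an additive offset; the values and the names `log_likelihoods` and `log_c` are made up for illustration.

```python
import math

# Hypothetical log-likelihoods log P(O|X) for three competing models
# (made-up values, not produced by any real decoder).
log_likelihoods = {"AH": -12.0, "AA": -14.5, "EH": -17.0}

# An arbitrary constant C applied to every score; in the log domain
# multiplying by C is the same as adding log(C).
log_c = 5.0
scores = {ph: ll + log_c for ph, ll in log_likelihoods.items()}

# The ranking of the hypotheses is unchanged by the offset...
best_raw = max(log_likelihoods, key=log_likelihoods.get)
best_scaled = max(scores, key=scores.get)

# ...but the scaled values differ from the original likelihoods by the
# factor C, so they cannot be read as probabilities without knowing C.
raw_sum = sum(math.exp(v) for v in log_likelihoods.values())
scaled_sum = sum(math.exp(v) for v in scores.values())
```

The scaled scores still pick the same best hypothesis, which is all a decoder needs, but their absolute values carry the unknown factor C.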

Davide Mangiameli - 2014-02-21

Thank you very much

Davide Mangiameli - 2014-02-23

I'm sorry, I have another question :)
Is the acoustic score computed by the forward algorithm in Sphinx?
Thank you in advance :)

Nickolay V. Shmyrev - 2014-02-23

Is acoustic score computed by forward algorithm in Sphinx?

You need to be more specific about what you mean by "Sphinx". The acoustic score is computed in various parts of the CMUSphinx toolkit: in the decoders, the aligner, and the trainer. Different variations of the Viterbi (Forward) algorithm are used.
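
As a rough illustration of how the two recursions relate, here is a sketch on a toy HMM. It is not CMUSphinx code, and all numbers are made up. Both walk the same trellis; the forward recursion sums over incoming paths, while Viterbi keeps only the best one, so the Viterbi score can never exceed the forward score. Real decoders work in the log domain rather than with raw probabilities.

```python
import math

# Toy 2-state left-to-right HMM (all numbers are made up for illustration).
init = [1.0, 0.0]                      # always start in state 0
trans = [[0.6, 0.4],                   # trans[i][j] = P(next state j | state i)
         [0.0, 1.0]]
# emit[t][j] = likelihood of the observation at frame t under state j
emit = [[0.5, 0.1],
        [0.4, 0.3],
        [0.1, 0.6]]

def score(combine):
    """Run the trellis recursion; `combine` is sum (forward) or max (Viterbi)."""
    n = len(init)
    alpha = [init[j] * emit[0][j] for j in range(n)]
    for t in range(1, len(emit)):
        alpha = [combine(alpha[i] * trans[i][j] for i in range(n)) * emit[t][j]
                 for j in range(n)]
    return combine(alpha)

forward_loglik = math.log(score(sum))   # total over all state sequences
viterbi_loglik = math.log(score(max))   # best single state sequence only
```

The only difference between the two calls is `sum` versus `max`, which is why the two names are often mentioned together.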

Davide Mangiameli - 2014-02-23

Hi! Thank you for your reply.
I'm talking about the acoustic score that I obtain from forced alignment. I give Sphinx the transcription and the corresponding audio as input, and it gives me an acoustic score per phoneme. Is that acoustic score computed by the forward algorithm?
Thank you

Last edit: Davide Mangiameli 2014-02-24

Nickolay V. Shmyrev - 2014-02-24

Is that acoustic score computed by forward algorithm

Yes

Davide Mangiameli - 2014-02-25

Thank you!
You said that decoding destroys the probabilistic nature of the acoustic score. Could it still be useful for comparison? For example, suppose I have the mean acoustic score from correct pronunciations of the phoneme AH, and an acoustic score from a good pronunciation of the same phoneme. If I compare them, they should be similar. Is that correct?
Thank you :)

Nickolay V. Shmyrev - 2014-02-26

If I compare them, they should be similar. Is that correct?

No, it is not going to work this way.

You can learn more about pronunciation scoring methods and applications from the papers on the web.

Davide Mangiameli - 2014-02-27

Hi Nickolay.

I have read a lot of papers on the web. Some of them use a log-posterior probability for pronunciation scoring. But I found a paper in which they seem to use the acoustic score: they save the mean and standard deviation of the acoustic scores per phoneme from correct pronunciations aligned with the text, and then calculate a z-score to evaluate new pronunciations. Is that possible, or have I misunderstood it all?
Ref: http://aclweb.org/anthology/W/W12/W12-5808.pdf

Thank you.
Bye
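
For reference, the z-score procedure as described in this post could be sketched like this. It only illustrates the arithmetic and is not taken from the paper or from Sphinx. The scores are made-up numbers, and `reference_scores` and `z_score` are hypothetical names. Whether raw acoustic scores are comparable across utterances in this way is exactly what is being questioned in this thread.

```python
from statistics import mean, stdev

# Hypothetical acoustic scores for the phoneme AH, collected from
# pronunciations assumed to be correct (made-up values).
reference_scores = [-3200.0, -3100.0, -3350.0, -3000.0, -3250.0]

def z_score(score, reference):
    """Number of sample standard deviations `score` lies from the reference mean."""
    return (score - mean(reference)) / stdev(reference)

# A new score near the reference mean gives a z-score close to zero;
# one far below the mean gives a large negative z-score.
z_close = z_score(-3180.0, reference_scores)
z_far = z_score(-4500.0, reference_scores)
```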

Last edit: Davide Mangiameli 2014-02-27

Nickolay V. Shmyrev - 2014-02-27

standard deviation of acoustic scores per phoneme from correct pronunciation aligned with text, and then they calculate z-score to evaluate new pronunciations. Is it possible?

I doubt you are describing the paper's method properly. Most likely the score is per state, not per phoneme.

If you want to discuss a specific paper, give a link to it.

Nickolay V. Shmyrev - 2014-02-27

ref http://aclweb.org/anthology/W/W12/W12-5808.pdf

This paper is not very professional and not worth attention; it contains a few conceptual mistakes.

Davide Mangiameli - 2014-02-27

OK... :) My mistake... Sorry.
So if I understand correctly, the acoustic score is not useful for pronunciation scoring.
A confidence score seems to be better. But if I understand correctly, Sphinx does not return this score automatically; I need to build it myself. Is that correct?

Thank you for your patience
