Hello,
I'm using the s3 N-best list to compute the LVCSR log-likelihood ratio as in this paper: "LVCSR log-likelihood ratio scoring for keyword spotting", http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.669 (please see Eqn. 5).
Since the computation involves sums and ratios of probabilities, I thought it might be better to use a high-precision computation library (GMP, http://gmplib.org/) to compute 1.0003^logLikelihood.
Anyway, the probabilities I'm seeing are on the order of 1e-14 per word! Is this alright? Is the GMM spread over the feature space so much that it gives rise to such low probabilities?
Please advise me as to how much precision (how many bytes) I should keep while doing probability summing/multiplying operations. If I want to compute P(W)*P(O|W), I first do logP(W) + logP(O|W); then I do 1.0003^(logP(W) + logP(O|W)).
Thanks.
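(For concreteness, a minimal plain-C sketch of the computation described above; the scores are hypothetical and only the log base 1.0003 is taken from this thread, so treat it as an illustration rather than working decoder code.)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double base = 1.0003;

        double log_pw  = -50000.0;   /* hypothetical LM score log P(W), base 1.0003        */
        double log_pow = -95000.0;   /* hypothetical acoustic score log P(O|W), base 1.0003 */

        /* P(W) * P(O|W) is just a sum in the log domain. */
        double log_joint = log_pw + log_pow;

        /* Converting back to the linear scale works for this example, but for
         * very negative scores (long utterances) base^score underflows a double,
         * which is one reason to stay in the log domain instead. */
        double joint = pow(base, log_joint);

        printf("log score = %.1f, linear = %g\n", log_joint, joint);
        return 0;
    }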
This is a bad idea. The log base exists exactly to save precision.
Yes, the dimension is very high (39). That's why the log base is used for calculations.
A probability sum can be computed in the log domain without converting to the linear scale; the logmath_add function exists exactly for that reason.
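(For reference, the identity behind this, written in plain notation: for a log base B and two log-domain scores x >= y,

    log_B(B^x + B^y) = x + log_B(1 + B^(y - x))

so a probability sum only ever needs the difference x - y, and nothing has to be converted back to the linear scale.)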
Hello,
I had thought about that, but I can't directly use the sphinxbase function (logmath_add), as it requires an lmath object which I don't know how to create in my external C code. Please let me know if some easy solution exists.
logmath.c:
http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/sphinxbase/src/libsphinxbase/util/logmath.c?revision=11275&view=markup
As I gather, logmath_add uses a prespecified log table, and if the value (x - y) is not present in the table, it falls back to logmath_add_exact, which uses the pow function from the math library. If using the pow function is "exact" (or is that a misnomer here?), why is it not a good idea to use it by default?
I figured out how to get the logmath object: logmath_t *lmath = logmath_init(1.0001, 0, 0);
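(A minimal sketch of how this could be wired up from external C code, assuming a standard sphinxbase install (header <sphinxbase/logmath.h>, link with -lsphinxbase) and its logmath API: logmath_init, logmath_log, logmath_add, logmath_exp, logmath_free. The probabilities and the base 1.0003 are just example values matching this thread.)

    #include <stdio.h>
    #include <sphinxbase/logmath.h>

    int main(void)
    {
        /* Base 1.0003 to match the scores in this thread; the third argument
         * controls whether an add table is precomputed for logmath_add
         * (here 0, matching the call quoted above). */
        logmath_t *lmath = logmath_init(1.0003, 0, 0);

        /* Two example linear-domain probabilities. */
        double p = 1e-14, q = 3e-14;

        /* Convert to integer log-domain scores. */
        int logp = logmath_log(lmath, p);
        int logq = logmath_log(lmath, q);

        /* Sum the probabilities without leaving the log domain. */
        int logsum = logmath_add(lmath, logp, logq);

        printf("log(p + q) = %d, linear = %g\n",
               logsum, logmath_exp(lmath, logsum));

        logmath_free(lmath);
        return 0;
    }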
Not like that: it falls back if there is no table; it doesn't fall back if there is no value.
Because it loses precision by doing the exponentiation two times. The right way is to use a formula involving x - y, which is what is used during the construction of the table.
The name is not good
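(To make the "one exponentiation instead of two" point concrete, a small stand-alone illustration, not sphinxbase code; the table is built from the same kind of x - y expression:)

    #include <math.h>
    #include <stdio.h>

    /* Add two log-domain values x and y (log base `base`) with a single
     * exponentiation, using  log_B(B^x + B^y) = x + log_B(1 + B^(y - x))
     * for x >= y.  The naive route would exponentiate x and y separately. */
    static double logadd_single_exp(double x, double y, double base)
    {
        double lnb = log(base);
        if (y > x) { double t = x; x = y; y = t; }     /* ensure x >= y      */
        return x + log1p(exp((y - x) * lnb)) / lnb;    /* one exp, one log1p */
    }

    int main(void)
    {
        /* Hypothetical word scores in log base 1.0003. */
        printf("%f\n", logadd_single_exp(-40000.0, -35000.0, 1.0003));
        return 0;
    }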
Thanks for the clarification. I went through logmath_init() to see how the log table is created. We save precision by doing only one exponentiation (instead of two).
Another thing: the log table seems to be constrained, by its logic, to give only INTEGER answers (I think), which may introduce some small errors. (Actually, the errors may come from the 1/x function; I'm not sure.) For example, when I add 1000 and 2000 in the log domain, I get 3848.
i.e.
log(1.0003^1000 + 1.0003^2000) = 3848 = log(1.0003^3848)
or 1.0003^1000 + 1.0003^2000 = 1.0003^3848
This result is off by 1.8E-4 according to my calculator. If one needs more accuracy, then perhaps using more precision while creating the log table might help.
In sphinx4, for example, the log table uses floats.
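(As a quick double-precision check of that example, independent of sphinxbase and sphinx4; compare its output with the integer table result 3848 quoted above:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double base = 1.0003, lnb = log(base);

        /* log_1.0003(1.0003^1000 + 1.0003^2000), computed in double precision;
         * the linear values stay small here, so the naive route is safe. */
        double exact = log(pow(base, 1000.0) + pow(base, 2000.0)) / lnb;

        printf("double-precision value: %.4f (integer table result: 3848)\n", exact);
        return 0;
    }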