Hello,
I'm using the s3 N-best list to compute the LVCSR log-likelihood ratio as in this paper: "LVCSR log-likelihood ratio scoring for keyword spotting", http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.669 (please see Eqn. 5).
Since the computation involves sums and ratios of probabilities, I thought it might be better to use a high-precision computation library (GMP, http://gmplib.org/) to compute 1.0003^logLikelihood.
Anyway, the probabilities I'm seeing are on the order of 1e-14 per word! Is this alright? Is the GMM spread over the feature space so much that it gives rise to such low probabilities?
Please advise me as to how much precision (how many bytes) I should keep while doing probability summing/multiplying operations. If I want to compute P(W)*P(O|W), I first do logP(W) + logP(O|W); then I do 1.0003^(logP(W) + logP(O|W)).
Thanks.
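(For concreteness, a minimal plain-C sketch of the computation described above; the scores are hypothetical and only the log base 1.0003 is taken from this thread, so treat it as an illustration rather than working decoder code.)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double base = 1.0003;

        double log_pw  = -50000.0;   /* hypothetical LM score log P(W), base 1.0003        */
        double log_pow = -95000.0;   /* hypothetical acoustic score log P(O|W), base 1.0003 */

        /* P(W) * P(O|W) is just a sum in the log domain. */
        double log_joint = log_pw + log_pow;

        /* Converting back to the linear scale works for this example, but for
         * very negative scores (long utterances) base^score underflows a double,
         * which is one reason to stay in the log domain instead. */
        double joint = pow(base, log_joint);

        printf("log score = %.1f, linear = %g\n", log_joint, joint);
        return 0;
    }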
This is a bad idea. The log base exists exactly to save precision.
Yes, the dimension is very high (39). That's why the log base is used for calculations.
A probability sum can be computed in the log domain without converting to the linear scale; the logmath_add function exists exactly for that reason.
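(For reference, the identity behind this, written in plain notation: for a log base B and two log-domain scores x >= y,

    log_B(B^x + B^y) = x + log_B(1 + B^(y - x))

so a probability sum only ever needs the difference x - y, and nothing has to be converted back to the linear scale.)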
Hello,
I had thought about that, but I can't directly use the sphinxbase function (logmath_add), as it requires an lmath object which I don't know how to create in my external C code. Please let me know if some easy solution exists.
logmath.c:
http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/sphinxbase/src/libsphinxbase/util/logmath.c?revision=11275&view=markup
As I gather, logmath_add uses a prespecified log table, and if the value (x - y) is not present in the table, it falls back to logmath_add_exact, which uses the pow function from the math library. If using the pow function is "exact" (or is that a misnomer here?), why is it not a good idea to use it by default?
I figured out how to get the logmath object: logmath_t *lmath = logmath_init(1.0001, 0, 0);
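(A minimal sketch of how this could be wired up from external C code, assuming a standard sphinxbase install (header <sphinxbase/logmath.h>, link with -lsphinxbase) and its logmath API: logmath_init, logmath_log, logmath_add, logmath_exp, logmath_free. The probabilities and the base 1.0003 are just example values matching this thread.)

    #include <stdio.h>
    #include <sphinxbase/logmath.h>

    int main(void)
    {
        /* Base 1.0003 to match the scores in this thread; the third argument
         * controls whether an add table is precomputed for logmath_add
         * (here 0, matching the call quoted above). */
        logmath_t *lmath = logmath_init(1.0003, 0, 0);

        /* Two example linear-domain probabilities. */
        double p = 1e-14, q = 3e-14;

        /* Convert to integer log-domain scores. */
        int logp = logmath_log(lmath, p);
        int logq = logmath_log(lmath, q);

        /* Sum the probabilities without leaving the log domain. */
        int logsum = logmath_add(lmath, logp, logq);

        printf("log(p + q) = %d, linear = %g\n",
               logsum, logmath_exp(lmath, logsum));

        logmath_free(lmath);
        return 0;
    }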
Not like that: it falls back if there is no table; it doesn't fall back if there is no value.
Because it loses precision by doing the exponentiation two times. The right way is to use a formula involving x - y, which is what is used during the construction of the table.
The name is not good
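(To make the "one exponentiation instead of two" point concrete, a small stand-alone illustration, not sphinxbase code; the table is built from the same kind of x - y expression:)

    #include <math.h>
    #include <stdio.h>

    /* Add two log-domain values x and y (log base `base`) with a single
     * exponentiation, using  log_B(B^x + B^y) = x + log_B(1 + B^(y - x))
     * for x >= y.  The naive route would exponentiate x and y separately. */
    static double logadd_single_exp(double x, double y, double base)
    {
        double lnb = log(base);
        if (y > x) { double t = x; x = y; y = t; }     /* ensure x >= y      */
        return x + log1p(exp((y - x) * lnb)) / lnb;    /* one exp, one log1p */
    }

    int main(void)
    {
        /* Hypothetical word scores in log base 1.0003. */
        printf("%f\n", logadd_single_exp(-40000.0, -35000.0, 1.0003));
        return 0;
    }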
Thanks for the clarification. I went through logmath_init() to see how the log table is created. We save precision by doing only one exponentiation (instead of two).
Another thing: the log table seems to be constrained, by its logic, to give only INTEGER answers (I think), which may introduce some small errors. (Actually, the errors may come from the 1/x function; I'm not sure.) For example, when I add 1000 and 2000 in the log domain, I get 3848.
i.e.
log(1.0003^1000 + 1.0003^2000) = 3848 = log(1.0003^3848)
or 1.0003^1000 + 1.0003^2000 = 1.0003^3848
This result is off by 1.8E-4 according to my calculator. If one needs more accuracy, then perhaps using more precision while creating the log table might help.
In sphinx4, for example, the log table uses floats.
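(As a quick double-precision check of that example, independent of sphinxbase and sphinx4; compare its output with the integer table result 3848 quoted above:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double base = 1.0003, lnb = log(base);

        /* log_1.0003(1.0003^1000 + 1.0003^2000), computed in double precision;
         * the linear values stay small here, so the naive route is safe. */
        double exact = log(pow(base, 1000.0) + pow(base, 2000.0)) / lnb;

        printf("double-precision value: %.4f (integer table result: 3848)\n", exact);
        return 0;
    }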