I know that if a probability is greater than 1, its value in the log domain can be positive.
~~~~~~~~~~~~~~~~
1gram:
-4.1258 above 0.0169
2gram:
-1.6579 above
-1.6579 above all
-1.6579 above him
-1.6579 above his
-1.6579 above the
~~~~~~~~~~~~~~~~~~~~
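To make the listing easier to read, here is a minimal sketch that converts the log-domain values back to the linear domain. It assumes the numbers are base-10 logarithms, as in ARPA-format language model files; the post does not state the base, and the function and variable names are only illustrative.

```python
def from_log10(x: float) -> float:
    """Convert a base-10 log value back to the linear domain (assumed base)."""
    return 10.0 ** x

# Values copied from the listing above.
unigram_logprob = -4.1258    # 1gram entry for "above"
backoff_logweight = 0.0169   # second number on the 1gram line
bigram_logprob = -1.6579     # each 2gram entry starting with "above"

unigram_prob = from_log10(unigram_logprob)
backoff_weight = from_log10(backoff_logweight)
bigram_prob = from_log10(bigram_logprob)

print(f"unigram prob   = {unigram_prob:.6f}")
print(f"bigram prob    = {bigram_prob:.6f}")
print(f"backoff weight = {backoff_weight:.4f}")
```

Under that assumption the positive log value corresponds to a backoff weight slightly above 1, while the probabilities themselves stay below 1.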
The total count of "above" is 5, and the total vocabulary size is 6171.
How does the toolkit calculate the backoff weight (BW) here with Good-Turing discounting?
The frequency of each of those bigrams is 1.
How will it estimate a new count for frequency 1 with Good-Turing (GT) when 1 is the highest frequency? I won't get a count for frequency 2 either.
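For reference, here is a rough sketch of the textbook formulas behind this question. It is not taken from the toolkit's code, so the toolkit may differ in details. It assumes the standard Good-Turing adjusted count c* = (c + 1) * N_{c+1} / N_c, where the counts of counts N_c are collected over all bigrams in the corpus rather than only those starting with "above", and a Katz-style backoff weight. All counts, unigram probabilities, and function names below are hypothetical.

```python
from collections import Counter

def counts_of_counts(ngram_counts):
    """Map each frequency c to N_c, the number of distinct n-grams seen c times."""
    return Counter(ngram_counts.values())

def good_turing_count(c, n_c):
    """Textbook Good-Turing adjusted count c* = (c + 1) * N_{c+1} / N_c.

    Returns None when N_c or N_{c+1} is zero, because the plain formula
    is then undefined or collapses to zero.
    """
    if n_c.get(c, 0) == 0 or n_c.get(c + 1, 0) == 0:
        return None
    return (c + 1) * n_c[c + 1] / n_c[c]

def katz_backoff_weight(seen_bigram_probs, seen_unigram_probs):
    """Leftover bigram mass divided by leftover unigram mass for one history."""
    return (1.0 - sum(seen_bigram_probs)) / (1.0 - sum(seen_unigram_probs))

# Hypothetical toy counts: only the bigrams that start with "above".
only_above = {("above", "all"): 1, ("above", "him"): 1,
              ("above", "his"): 1, ("above", "the"): 1}

# Hypothetical toy counts for a whole corpus, which also has bigrams seen twice.
whole_corpus = dict(only_above)
whole_corpus.update({("over", "there"): 1, ("far", "away"): 1,
                     ("of", "the"): 2, ("in", "the"): 2})

# With one history alone there is no N_2, so the formula cannot be applied.
local_estimate = good_turing_count(1, counts_of_counts(only_above))

# Over the whole corpus N_1 = 6 and N_2 = 2, so c* = 2 * 2 / 6.
global_estimate = good_turing_count(1, counts_of_counts(whole_corpus))

# Five listed bigram entries, each with log10 probability -1.6579 (base assumed).
bigram_probs = [10.0 ** -1.6579] * 5
# Hypothetical unigram probabilities of the same five successors.
unigram_probs = [0.05, 0.04, 0.03, 0.02, 0.01]
alpha = katz_backoff_weight(bigram_probs, unigram_probs)

print(local_estimate, global_estimate, alpha)
```

In this sketch the adjusted count for frequency 1 exists only because bigrams with frequency 2 occur elsewhere in the corpus, and the backoff weight exceeds 1 whenever the seen successors hold more unigram mass than discounted bigram mass.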
Last edit: Nickolay V. Shmyrev 2014-08-16
Any clarification on this?
For the most precise and exact explanation of the calculation process you can read the code.