#103 Bugs(?) in GoodTuringProbDist and WittenBellProbDist


When training and testing HMM tagging using different estimators, there are problems with GoodTuring and WittenBell.

GoodTuring gets extremely bad accuracy, and WittenBell raises a ZeroDivisionError. I'm not an expert on GoodTuring and WittenBell, so there's a small chance I have called them incorrectly, but my guess is that there are bugs in their implementations.

Attached is the Python file I used for testing; its output is shown below. NOTE: the test file needs the patched version of, which was submitted as patch #1997742.

Training 450 sentences, 10431 tokens

Training using estimator: Laplace
Testing 50 sentences, 1280 tokens
Test result: 67.6%

Training using estimator: ELE
Testing 50 sentences, 1280 tokens
Test result: 75.2%

Training using estimator: Lidstone 0.1
Testing 50 sentences, 1280 tokens
Test result: 82.6%

Training using estimator: GoodTuring
Testing 50 sentences, 1280 tokens
Test result: 13.1%

Training using estimator: WittenBell
Testing 50 sentences, 1280 tokens
Traceback (most recent call last):
  File "/var/folders/Hl/Hldpn1ooEeSui0EMU5q1zE+++TM/-Tmp-/py1805CCK", line 34, in <module>
    acc = nltk.tag.accuracy(hmm, testC)
  File "/Library/Python/2.5/site-packages/nltk/tag/", line 82, in accuracy
    test_tokens += list(tagger.tag(untag(sent)))
  File "/Library/Python/2.5/site-packages/nltk/tag/", line 182, in tag
    path = self.best_path(unlabeled_sequence)
  File "/Library/Python/2.5/site-packages/nltk/tag/", line 226, in best_path
  File "/Library/Python/2.5/site-packages/nltk/tag/", line 204, in _create_cache
    X[i, j] = self._transitions[si].logprob(self._states[j])
  File "/Library/Python/2.5/site-packages/nltk/", line 316, in logprob
    p = self.prob(sample)
  File "/Library/Python/2.5/site-packages/nltk/", line 897, in prob
    return self._T / float(self._Z * (self._N + self._T))
ZeroDivisionError: float division
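
A possible reading of the ZeroDivisionError, sketched outside NLTK: the last frame divides by self._Z * (self._N + self._T). Assuming the usual Witten-Bell definitions (T = number of distinct sample types observed, N = total number of observations, Z = number of sample types never observed), Z is zero whenever the assumed number of bins equals the number of observed types, and the probability of any unseen sample then divides by zero. This has not been checked against the NLTK source; the function name and its parameters below are hypothetical and only mirror the expression in the traceback.

```python
def witten_bell_unseen_prob(T, N, bins):
    """Probability of one unseen sample, using the expression from the traceback.

    Assumed meanings (not confirmed against NLTK):
      T    -- number of distinct sample types observed
      N    -- total number of observations
      bins -- total number of possible sample types
    """
    Z = bins - T  # number of sample types never observed
    return T / float(Z * (N + T))


# Some types remain unseen (bins > T): the estimate is well defined.
print(witten_bell_unseen_prob(T=3, N=10, bins=5))

# bins == T gives Z == 0, so the same expression divides by zero.
try:
    witten_bell_unseen_prob(T=3, N=10, bins=3)
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)
```

If this reading is right, the error would show up whenever the distribution is asked about a sample it has not seen while its bin count equals the number of observed types, which seems plausible for transition probabilities between tags that never co-occur in a small training set.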


    peter ljunglöf - 2008-06-19

    Logged In: YES
    Originator: YES

    File Added:

