

#103 Bugs(?) in GoodTuringProbDist and WittenBellProbDist

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2008-06-19
Created: 2008-06-19
Creator: peter ljunglöf
Private: No

When I train and test an HMM tagger with different probability estimators, GoodTuring and WittenBell cause problems.

GoodTuring gives extremely poor accuracy, and WittenBell raises a ZeroDivisionError. I'm not an expert on Good-Turing or Witten-Bell estimation, so there is a small chance that I have called them incorrectly, but my guess is that their implementations contain bugs.
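One possible explanation for the poor GoodTuring accuracy, offered only as a hypothesis: the textbook Good-Turing estimator, used without smoothing the frequency-of-frequency counts N_c, gives probability zero to any sample with count c when no sample has count c+1. The plain-Python sketch below is not NLTK code; the function name `unsmoothed_good_turing` and the toy counts are hypothetical.

```python
from collections import Counter

def unsmoothed_good_turing(counts):
    """Raw Good-Turing: P(x) = c* / N with c* = (c + 1) * N_{c+1} / N_c.

    Textbook estimator with no smoothing of the N_c values; it is NOT
    taken from NLTK and may differ from GoodTuringProbDist.
    """
    n = sum(counts.values())
    # N_c: number of distinct samples observed exactly c times
    # (a Counter returns 0 for counts that never occur)
    freq_of_freqs = Counter(counts.values())
    probs = {}
    for sample, c in counts.items():
        adjusted = (c + 1) * freq_of_freqs[c + 1] / freq_of_freqs[c]
        probs[sample] = adjusted / n
    # Total mass reserved for unseen samples: N_1 / N
    unseen_mass = freq_of_freqs[1] / n
    return probs, unseen_mass

# Hypothetical transition counts out of a single tag
counts = {"NN": 5, "VB": 2, "JJ": 1, "DT": 1}
probs, unseen_mass = unsmoothed_good_turing(counts)
# NN and VB get probability 0.0 because no sample has count 6 or 3,
# so the two most frequent transitions are ruled out entirely.
```

In a Viterbi search, a zero transition or emission probability makes every path through it impossible, which would be consistent with (though not proof of) the 13.1% GoodTuring result in the output. Smoothing the N_c values, as in Gale and Sampson's Simple Good-Turing, is the usual remedy.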

Attached is the Python file I used for testing; its output is shown below. NOTE: the test file needs the patched version of hmm.py, which was submitted as patch #1997742.

Training 450 sentences, 10431 tokens

Training using estimator: Laplace
Testing 50 sentences, 1280 tokens
Test result: 67.6%

Training using estimator: ELE
Testing 50 sentences, 1280 tokens
Test result: 75.2%

Training using estimator: Lidstone 0.1
Testing 50 sentences, 1280 tokens
Test result: 82.6%

Training using estimator: GoodTuring
Testing 50 sentences, 1280 tokens
Test result: 13.1%

Training using estimator: WittenBell
Testing 50 sentences, 1280 tokens
Traceback (most recent call last):
  File "/var/folders/Hl/Hldpn1ooEeSui0EMU5q1zE+++TM/-Tmp-/py1805CCK", line 34, in <module>
    acc = nltk.tag.accuracy(hmm, testC)
  File "/Library/Python/2.5/site-packages/nltk/tag/util.py", line 82, in accuracy
    test_tokens += list(tagger.tag(untag(sent)))
  File "/Library/Python/2.5/site-packages/nltk/tag/hmm.py", line 182, in tag
    path = self.best_path(unlabeled_sequence)
  File "/Library/Python/2.5/site-packages/nltk/tag/hmm.py", line 226, in best_path
    self._create_cache()
  File "/Library/Python/2.5/site-packages/nltk/tag/hmm.py", line 204, in _create_cache
    X[i, j] = self._transitions[si].logprob(self._states[j])
  File "/Library/Python/2.5/site-packages/nltk/probability.py", line 316, in logprob
    p = self.prob(sample)
  File "/Library/Python/2.5/site-packages/nltk/probability.py", line 897, in prob
    return self._T / float(self._Z * (self._N + self._T))
ZeroDivisionError: float division
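The failing line in the traceback is `return self._T / float(self._Z * (self._N + self._T))`. Here is a minimal plain-Python sketch of the Witten-Bell estimate behind that expression. The function name `witten_bell_prob` is hypothetical, and the sketch assumes (the report does not say) that T is the number of distinct samples seen, N is the total count, and Z is the number of unseen bins, i.e. bins minus T. Under those assumptions, the denominator is zero whenever Z is 0 (bins equal to the number of seen types) or N + T is 0 (an empty distribution, for example a state with no recorded transitions).

```python
def witten_bell_prob(counts, sample, bins):
    """Witten-Bell estimate mirroring the expression in the traceback.

    Assumed meanings (not confirmed by the report):
      T = number of distinct samples seen
      N = total number of observations
      Z = number of bins never seen (bins - T)
    """
    t = len(counts)
    n = sum(counts.values())
    z = bins - t
    c = counts.get(sample, 0)
    if c > 0:
        # Seen sample: discounted relative frequency c / (N + T)
        return c / (n + t)
    # Unseen sample: the reserved mass T / (N + T) is split evenly over
    # the Z unseen bins; this is the expression that raised the error
    return t / (z * (n + t))

counts = {"NN": 3, "VB": 1}
p_unseen = witten_bell_prob(counts, "JJ", bins=3)  # 2 / (1 * 6), about 0.333
# witten_bell_prob(counts, "JJ", bins=2) -> Z == 0, ZeroDivisionError
# witten_bell_prob({}, "JJ", bins=3)     -> N + T == 0, ZeroDivisionError
```

In this sketch, passing `bins` strictly larger than the number of observed types and handling empty distributions separately avoids both zero denominators.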

Discussion

  • Logged In: YES
    user_id=1932543
    Originator: YES

    File Added: test_probdistbug.py