I have attached some false positives in a text file. Most conspicuously, the first two or three examples suggest that this rule treats 'wir' as an article rather than a pronoun.
False positives of DE_AGREEMENT rule
I fixed some false positives, but there is still a problem with "wir", "denselben", and "diejenigen". These words are tagged as "ALG", which might be a problem; ll. 416-423 seem to contain code to handle that postag. PRONOUNS_TO_BE_IGNORED contains "wir", but it is commented out for some reason.
I fixed the issue with "wir", but we now miss the error in "Erst recht wir fleißiges Arbeiter." The rule could use a general rewrite anyway.
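To illustrate the trade-off, here is a minimal sketch of the ignore-list idea. It is not the actual rule code: the class name, the method names, and the contents of the set are assumptions made for illustration. It only shows why skipping "wir" removes the false positives and, at the same time, makes the rule blind to sentences like "Erst recht wir fleißiges Arbeiter."

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the real DE_AGREEMENT implementation.
class IgnoredPronounSketch {

    // Assumed contents; the real PRONOUNS_TO_BE_IGNORED set may look different.
    static final Set<String> IGNORED =
            new HashSet<>(Arrays.asList("wir", "denselben", "diejenigen"));

    // True if the token is on the ignore list (case-insensitive).
    static boolean isIgnored(String token) {
        return IGNORED.contains(token.toLowerCase(Locale.GERMAN));
    }

    // An agreement check would only start at a token that the tagger
    // reports as determiner-like AND that is not on the ignore list.
    static boolean startsAgreementCheck(String token, boolean taggedAsDeterminer) {
        return taggedAsDeterminer && !isIgnored(token);
    }

    public static void main(String[] args) {
        // "wir" is skipped even if the tagger offers a determiner-like reading,
        // so no agreement error can be reported for what follows it.
        System.out.println(startsAgreementCheck("wir", true));
        // A regular article still triggers the check.
        System.out.println(startsAgreementCheck("der", true));
    }
}
```

Because the skip happens before any agreement logic runs, every sentence that starts a phrase with an ignored word is exempt, whether or not it contains a real error. That is the false negative described above.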
Thanks for the fast responses. I'll download a nightly build later this week and test it on Wikipedia data.
Hitting the right balance between true and false positives will always be a problem. Maybe we can cover some of the newly introduced false negatives of the Java rule by an XML rule?
BTW I'm planning to do a small comparison Hunspell+LanguageTool vs. MS Word (built-in spell-check and grammar capabilities) vs. Duden Korrektor in the near future.
Another false positive of the same rule I happened to find a moment ago: "allen Ernstes". I didn't double-check this one, but I think it is a fairly common phrase.
Demnach erlangt niemand Eigentumstitel über ...
...Erklärungen der verschiedenen Bedeutungen eines jeden Ausdrucks ...
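For fixed phrases such as "allen Ernstes", one possible approach is a small phrase exemption that is consulted before the agreement check. The sketch below is only an assumption about how that could look; the class, the method, and the list are hypothetical, and only the phrase "allen Ernstes" is taken from this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a phrase exemption; not taken from the rule's source.
class PhraseExemptionSketch {

    // Assumed list of exempt phrases, stored as lowercase token sequences.
    static final List<List<String>> EXEMPT =
            Arrays.asList(Arrays.asList("allen", "ernstes"));

    // True if one of the exempt phrases begins at position 'start' in 'tokens'.
    static boolean isExemptAt(List<String> tokens, int start) {
        if (start < 0) {
            return false;
        }
        for (List<String> phrase : EXEMPT) {
            if (start + phrase.size() > tokens.size()) {
                continue; // phrase would run past the end of the sentence
            }
            boolean match = true;
            for (int i = 0; i < phrase.size(); i++) {
                String token = tokens.get(start + i).toLowerCase(Locale.GERMAN);
                if (!token.equals(phrase.get(i))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Er", "meint", "das", "allen", "Ernstes");
        // Position 3 is where "allen Ernstes" begins in this token list.
        System.out.println(isExemptAt(tokens, 3));
    }
}
```

An exemption like this is narrower than ignoring a single word everywhere, so it should cost fewer false negatives, but each phrase has to be added by hand.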