File Date Author Commit
 bin 2011-02-15 rentier [r1] initial import: rudify-0.1.14
 lib 2011-02-17 rentier [r2] copyright statements corrected
 share 2011-02-15 rentier [r1] initial import: rudify-0.1.14
 COPYING 2011-02-15 rentier [r1] initial import: rudify-0.1.14
 Changelog 2011-02-15 rentier [r1] initial import: rudify-0.1.14
 README 2011-02-15 rentier [r1] initial import: rudify-0.1.14
 README.taggers 2011-02-15 rentier [r1] initial import: rudify-0.1.14

Read Me

Taggers provided with this release
----------------------------------

All taggers in share/taggers/ were created using bin/mktagger.py.
The naming scheme for all taggers in this directory is:

<ISO language code>-<training corpus>-<tagger type>-tagger.pickled

Every tagger is accompanied by a logfile that documents the training.
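
The naming scheme above can be split into its parts mechanically. The
following is a minimal sketch, not part of rudify: the names TAGGER_NAME,
TaggerName and parse_tagger_name are hypothetical, and the pattern assumes
a three-letter lowercase language code and no hyphens inside the corpus
or tagger type fields, which holds for the taggers listed in this file.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for:
#   <ISO language code>-<training corpus>-<tagger type>-tagger.pickled
# (three lowercase letters for the language, no hyphens inside a field)
TAGGER_NAME = re.compile(
    r"^(?P<lang>[a-z]{3})-(?P<corpus>[^-]+)-(?P<kind>[^-]+)-tagger\.pickled$"
)


class TaggerName(NamedTuple):
    lang: str    # language code, e.g. "deu"
    corpus: str  # training corpus, e.g. "conll2006"
    kind: str    # tagger type, e.g. "3gram"


def parse_tagger_name(filename: str) -> Optional[TaggerName]:
    """Return the parts of a tagger file name, or None if it does not match."""
    match = TAGGER_NAME.match(filename)
    if match is None:
        return None
    return TaggerName(match.group("lang"), match.group("corpus"), match.group("kind"))


if __name__ == "__main__":
    print(parse_tagger_name("deu-conll2006-3gram-tagger.pickled"))
```

File names that do not follow the scheme, such as the accompanying
logfiles, yield None and can simply be skipped.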

Available taggers:

 * deu-conll2006-3gram-tagger.pickled
   (TIGER data sets for CoNLL-X shared task of 2006, 39573 sentences)
 * eng-brown-3gram-tagger.pickled
   (the Brown corpus, 57340 sentences)
 * esp-conll2002-3gram-tagger.pickled
   (data set for the CoNLL 2002 shared task, 11755 sentences)
 * eus-conll2007-3gram-tagger.pickled
   (data set for the CoNLL 2007 shared task, 3175 sentences)
 * ita-evalita2009-3gram-tagger.pickled
   (TANL dependency data set for EVALITA 2009 pilot task, 3247 sentences)
 * nld-conll2002-3gram-tagger.pickled
   (data set for the CoNLL 2002 shared task, 23896 sentences)
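
Loading one of these files from your own code might look like the sketch
below. It assumes that the .pickled files are ordinary Python pickle
files, as the extension suggests; load_tagger is a hypothetical helper,
not a rudify function. NLTK must be importable for unpickling to work,
and pickles written with an old NLTK release may not load under a newer
NLTK or Python version.

```python
import pickle
from pathlib import Path


def load_tagger(path):
    """Unpickle a tagger file and return the stored object.

    Only unpickle files from a source you trust: loading a pickle
    can execute arbitrary code.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

For example, load_tagger("share/taggers/eng-brown-3gram-tagger.pickled")
would return the stored object. If that object is an NLTK tagger, it can
then be applied to a list of tokens with its tag() method.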


Building taggers using non-NLTK resources
-----------------------------------------

As of NLTK-0.9.9, the NLTK corpus collection provides no tagged corpus
resources for Italian or German. To train taggers for these languages
yourself for use with rudify, you need to obtain additional resources
and incorporate them into your $NLTK_DATA/corpora directory.

Corpora that are easy to integrate include:

 * the Italian training/test corpora for the EVALITA 2009 Italian parsing task
   (http://poesix1.ilc.cnr.it/evalita2009/)

 * the German training/test corpora for the CoNLL-X shared task
   (http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/)

See lib/Rudify/non_nltk/ for further information on how to
access the data sets.
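
Copying a downloaded corpus into $NLTK_DATA/corpora can be scripted. The
following is a minimal sketch under stated assumptions: install_corpus is
a hypothetical helper, not part of rudify or NLTK, and the directory name
that rudify expects for each corpus is not given here, so check
lib/Rudify/non_nltk/ before choosing corpus_name.

```python
import os
import shutil
from pathlib import Path


def install_corpus(source_dir, corpus_name, nltk_data=None):
    """Copy source_dir to <NLTK_DATA>/corpora/<corpus_name>.

    nltk_data overrides the NLTK_DATA environment variable when given.
    Returns the path of the newly created corpus directory.
    """
    root = nltk_data or os.environ.get("NLTK_DATA")
    if not root:
        raise RuntimeError("NLTK_DATA is not set and no nltk_data path was given")
    target = Path(root) / "corpora" / corpus_name
    # Create <NLTK_DATA>/corpora if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copytree raises FileExistsError if the target directory already
    # exists, so an installed corpus is never silently overwritten.
    shutil.copytree(source_dir, target)
    return target
```

The helper refuses to overwrite an existing corpus directory; remove the
old directory first if you want to replace it.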