NLTK-Lite version 0.8 has been released.
This version is substantially revised and expanded from version 0.7.
The code now includes improved interfaces to chunkers, grammars, and
frequency distributions; full integration with WordNet 3.0 and
implementations of WordNet similarity measures; the Lancaster Stemmer;
simpler conventions for importing modules; and simpler installation.
The top-level package is renamed nltk (formerly nltk_lite). A
new corpus package supports caching, slicing, and a corpus search path
permitting corpora to be stored in multiple locations, and provides a
more convenient API. The book contains substantial revisions of Part I
(tokenization, tagging, chunking) and Part II (grammars and parsing),
making it accessible to a broader audience. NLTK-Lite 0.8 has several
new corpora and interfaces including the Switchboard Telephone Speech
Corpus transcript sample (Talkbank Project), CMU Problem Reports Corpus
sample, CONLL2002 POS+NER data, Patient Information Leaflet corpus
sample, Indian POS-Tagged data (Bangla, Hindi, Marathi, Telugu),
Shakespeare XML corpus sample, and the UDHR corpus with text samples in
300+ languages. The nltk.contrib package is now a new top-level
nltk_contrib package, and includes DRT and Glue Semantics (Dan
Garrette), Punkt sentence segmenter (Willy), LPath interpreter
(Haejoong Lee), classifiers (Sumukh Ghodke), Kimmo finite-state
morphology system (Rob Speer), and Lambek calculus system (Edward Loper).
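
Because the top-level package was renamed from nltk_lite to nltk, scripts
that must run against both older and newer installations may want to try
the new name first and fall back to the old one. The following is only a
minimal sketch of that idea using the standard library; the helper name
import_first_available is hypothetical and is not part of NLTK.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that can be imported.

    Hypothetical helper for illustration; not part of NLTK.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # this name is not installed; try the next one
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(candidates)
    )


# Assumed usage: prefer the new package name, fall back to the old one.
# nltk = import_first_available(["nltk", "nltk_lite"])
```

Code that only targets version 0.8 or later can simply use "import nltk".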
For installation instructions, please see:
This version is released to coincide with the start of the LSA Linguistic Institute at Stanford University, where two courses are based on Python and NLTK.