OpenCCG: The OpenNLP CCG Library

File Release Notes and Changelog

Release Name: 0.8.4 - FLMs, packing

Notes:
This release includes initial support for factored language models, packing/unpacking during realization (note: generally slower than the anytime method), and various other improvements.

Changes:

0.8.4 - Factored language models (initial support), packing/unpacking, and more
---------------------------------------------------------------

* Added Alex's LaTeX visualization of derivations (nb: launch of previewer works better on Windows than Linux)
* Added customizable tokenization and expansion routines for dates/times/nums/amounts and other named entities.
* Added -2apml option to ccg-test.
* Added Word class and many related changes to tokenization.
* Added -textf|-textfsc options to ccg-test, for writing files in the format expected by the SRILM toolkit for factored language models.
* Updated copyright notices.
* Changed ngram model to use canonical lists of words as keys, removing size restriction.
* Added -aanfilter option to ccg-test, with an optional list of exceptions, which may be culled from bigram counts.
* Added keep-words-with-sem-classes option to grammar.xsd, to specify exceptional semantic classes where the word form is also considered relevant for scoring models. NB: Also changed grammar.xsd to specify a custom tokenizer class name and/or keep-words-with-sem-classes on a separate tokenizer element.
* Added support for factored language models with fixed backoff paths, arranged into families of models for different child variables, and with the option to have secondary models for shorter available histories. Also added corresponding -flm|-flmsc options to ccg-test.
* Added option to do scoring in a second stage, starting from a packed representation.
* Switched from cached combos to collected combos, making the anytime case more like the packed case.
* Added compacting of gen forest when unpacking is turned off.
* Added pretty-printing of regex-like gen forest.
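
For readers unfamiliar with the term, the idea of a fixed backoff path can be sketched roughly as follows. This is an illustration only and not OpenCCG's implementation or API: the class name FixedPathBackoffModel, the table layout, and the example factor values are all hypothetical. It assumes the parent variables are listed from most to least important, so that backing off always drops the last remaining parent.

```python
from typing import Dict, Sequence, Tuple

Context = Tuple[str, ...]


class FixedPathBackoffModel:
    """Toy model of P(child | parents) with a single, fixed backoff path.

    Hypothetical illustration; not the OpenCCG or SRILM data structures.
    """

    def __init__(
        self,
        probs: Dict[Tuple[Context, str], float],
        backoff_weights: Dict[Context, float],
    ) -> None:
        # probs[(context, child)] is the stored probability for that event.
        self.probs = probs
        # backoff_weights[context] scales the estimate when backing off
        # from that context to the next shorter one.
        self.backoff_weights = backoff_weights

    def score(self, child: str, parents: Sequence[str]) -> float:
        context: Context = tuple(parents)
        weight = 1.0
        while True:
            p = self.probs.get((context, child))
            if p is not None:
                return weight * p
            if not context:
                # Nothing left to drop and no estimate for the bare child.
                return 0.0
            # Missing weights default to 1.0 in this sketch.
            weight *= self.backoff_weights.get(context, 1.0)
            # Fixed path: always drop the last (least important) parent.
            context = context[:-1]


# Hypothetical tables: parents are (POS tag of previous word, previous word).
model = FixedPathBackoffModel(
    probs={
        (("DT", "the"), "dog"): 0.5,
        (("DT",), "cat"): 0.2,
        ((), "fish"): 0.01,
    },
    backoff_weights={("DT", "the"): 0.4, ("DT",): 0.5},
)
```

Here model.score("dog", ["DT", "the"]) finds a full-context entry directly, while model.score("cat", ["DT", "the"]) backs off once, dropping the word form and keeping the tag. Using tuples as dictionary keys also means the context length is not limited in this sketch.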