File Release Notes and Changelog
Notes:
This release adds initial support for factored language models and for packing/unpacking during realization (generally slower than the anytime method!), along with various other improvements.
Changes:
0.8.4 - Factored language models (initial support), packing/unpacking, and more
---------------------------------------------------------------
* Added Alex's LaTeX visualization of derivations
(NB: launching the previewer works better on Windows than on Linux)
* Added customizable tokenization and expansion routines for
dates/times/nums/amounts and other named entities.
* Added -2apml option to ccg-test.
* Added Word class and many related changes to tokenization.
* Added -textf|-textfsc options to ccg-test, for writing files in the format
expected by the SRILM toolkit for factored language models.
* Updated copyright notices.
* Changed ngram model to use canonical lists of words as keys,
removing size restriction.
* Added -aanfilter option to ccg-test, with an optional list of
exceptions, which may be culled from bigram counts.
* Added keep-words-with-sem-classes option to grammar.xsd, to
specify exceptional semantic classes where the word form is also
considered relevant for scoring models.
NB: Also changed grammar.xsd to specify a custom tokenizer class name
and/or keep-words-with-sem-classes on a separate
tokenizer element.
* Added support for factored language models with fixed backoff paths,
arranged into families of models for different child variables,
and with the option to have secondary models for shorter available
histories. Also added corresponding -flm|-flmsc options to
ccg-test.
* Added option to do scoring in a second stage, starting from a packed
representation.
* Switched from cached combos to collected combos, making the anytime case
more like the packed case.
* Added compacting of gen forest when unpacking is turned off.
* Added pretty-printing of regex-like gen forest.
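
The fixed backoff paths mentioned above for factored language models can be
illustrated with a small sketch. This is not the code shipped in this release,
and it is not SRILM's implementation: the table contents, the factor tags
(W- for word form, P- for part of speech), and the names LOGPROBS, BACKOFF,
UNSEEN_LOGPROB and score are all hypothetical, chosen only to show the idea.
The sketch assumes that the parents of a child variable are listed in a fixed
order and that backing off always drops the last remaining parent.

```python
import math

# Hypothetical toy model: log10 probabilities keyed by (child, context).
# The context tuple follows the fixed backoff path: the last parent is
# always the first one to be dropped.
LOGPROBS = {
    ("W-dog", ("W-the", "P-DT")): -0.5,
    ("W-dog", ("W-the",)): -1.0,
    ("W-dog", ()): -3.0,
    ("W-cat", ()): -3.2,
}

# Hypothetical backoff weights (log10), keyed by the context being left.
BACKOFF = {
    ("W-the", "P-DT"): -0.2,
    ("W-the",): -0.3,
}

# Assumed floor score for a child value that is missing even with no context.
UNSEEN_LOGPROB = -99.0


def score(child, parents):
    """Return log10 P(child | parents), backing off along the fixed path."""
    context = tuple(parents)
    penalty = 0.0
    while True:
        key = (child, context)
        if key in LOGPROBS:
            # Found an entry at this level of the path.
            return penalty + LOGPROBS[key]
        if not context:
            # Nothing left to drop: fall back to the floor score.
            return penalty + UNSEEN_LOGPROB
        # Pay the backoff weight for this context, then drop the last parent.
        penalty += BACKOFF.get(context, 0.0)
        context = context[:-1]


if __name__ == "__main__":
    print(score("W-dog", ["W-the", "P-DT"]))
    print(score("W-cat", ["W-the", "P-DT"]))
```

In this sketch a seen event is scored directly from the longest context, while
an unseen one accumulates one backoff weight per dropped parent before a
shorter context is consulted. A family of models, as described above, would
amount to keeping one such pair of tables per child variable.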