A handful of major new features have been implemented:
* Fast, robust stochastic gradient descent using Periodic Stepsize Adjustment (PSA).
* Disk caching: instantiated features for sequences can be cached on disk, allowing training on datasets that do not fit in main memory.
* Trained models now store all of the options used for the training run, so the decoder no longer needs to keep track of which options were passed to the trainer (this information is stored in the model file).
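To illustrate the first feature, here is a minimal sketch of SGD with a periodic stepsize rule on a toy 1-D least-squares problem. This is an illustrative simplification only: the actual PSA algorithm in the library may adjust the stepsize differently, and the names `sgd_psa`, `loss`, and the adjustment rule (shrink on divergence, grow up to the initial stepsize on progress) are assumptions, not the library's API.

```python
# Hypothetical sketch of SGD with a periodic stepsize adjustment
# (not the library's actual PSA implementation).

def loss(data, w):
    """Mean squared error of the 1-D linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd_psa(data, w0, eta=0.01, period=10, epochs=50):
    """Run SGD, re-evaluating the stepsize every `period` updates."""
    eta0 = eta          # keep the initial stepsize as an upper bound
    w = w0
    prev_loss = loss(data, w)
    step = 0
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x   # gradient of one squared error
            w -= eta * grad
            step += 1
            if step % period == 0:
                cur = loss(data, w)
                if cur > prev_loss:
                    eta *= 0.5             # diverging: shrink the stepsize
                else:
                    eta = min(eta * 1.05, eta0)  # progressing: grow, capped
                prev_loss = cur
    return w
```

On data generated from `y = 2x`, the estimate converges to `w ≈ 2` while the stepsize is adjusted only at the fixed period, rather than after every update as in per-step adaptive schemes.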
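The disk-caching feature can be pictured as follows: instantiate each sequence's features once, write them to disk, and stream them back on every training pass so that only one sequence is in memory at a time. The class below is a sketch of that idea using pickle files; the library's on-disk cache format and API are its own, and `DiskFeatureCache` is an invented name.

```python
import os
import pickle

# Hypothetical disk cache for instantiated features (an illustration of
# the idea, not the library's actual cache format).

class DiskFeatureCache:
    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.paths = []

    def add(self, seq_id, features):
        """Instantiate once: serialize one sequence's features to disk."""
        path = os.path.join(self.cache_dir, f"{seq_id}.pkl")
        with open(path, "wb") as f:
            pickle.dump(features, f)
        self.paths.append(path)

    def __iter__(self):
        """Stream sequences back one at a time, e.g. once per epoch."""
        for path in self.paths:
            with open(path, "rb") as f:
                yield pickle.load(f)
```

A training loop would then iterate over the cache each epoch instead of over an in-memory list, so memory use stays bounded by the largest single sequence.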
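The third feature amounts to bundling the training options with the weights in a single model file, so the decoder recovers them from the file rather than being told them again. A minimal sketch of that pattern, assuming a simple JSON layout (the library's actual model-file format and the `save_model`/`load_model` names are hypothetical):

```python
import json

# Hypothetical model file that stores training options alongside the
# weights (layout is an illustration, not the library's real format).

def save_model(path, weights, options):
    """Write weights plus the full set of training options to one file."""
    with open(path, "w") as f:
        json.dump({"options": options, "weights": weights}, f)

def load_model(path):
    """The decoder reads both back; no options need to be re-supplied."""
    with open(path) as f:
        model = json.load(f)
    return model["weights"], model["options"]
```

With this layout, a decoding call only needs the model path; everything the trainer was configured with travels inside the file.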