Carafe: ConditionAl RAndom Fields, Etc. / News: Recent posts

New Features

A handful of major new features have been implemented:

* Fast, robust stochastic gradient descent using Periodic Stepsize Adjustment (PSA)

* Disk-caching. Instantiated features for sequences can be cached on disk, allowing training over datasets that can't fit in main memory.

* Trained models now store all the options used for that training run. The call to the decoder no longer has to keep track of which options were passed to the trainer, because this information is stored in the model file.
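
To make the disk-caching idea in the list above concrete, here is a minimal Python sketch. It is not Carafe's implementation (Carafe is built with OCaml, and its cache format is not described in this post). The class `DiskFeatureCache`, the helper functions, and the toy features are all hypothetical names chosen for illustration. The sketch only shows the general pattern: instantiate the features once, write them to disk one sequence at a time, and stream them back on each training pass so that only one sequence is held in memory.

```python
import os
import pickle
import tempfile


class DiskFeatureCache:
    """Append-only on-disk store of per-sequence features (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.count = 0
        # Start from an empty cache file.
        open(self.path, "wb").close()

    def add(self, features):
        # Append one pickled record per sequence.
        with open(self.path, "ab") as f:
            pickle.dump(features, f)
        self.count += 1

    def __iter__(self):
        # Stream records back one at a time; each call starts a fresh pass.
        with open(self.path, "rb") as f:
            for _ in range(self.count):
                yield pickle.load(f)


def extract_features(tokens):
    # Toy feature extractor (an assumption): a few string features per token.
    return [["word=" + t.lower(), "cap=" + str(t[0].isupper())] for t in tokens]


def build_cache(sequences, path):
    # Instantiate features once and spill them to disk immediately.
    cache = DiskFeatureCache(path)
    for tokens in sequences:
        cache.add(extract_features(tokens))
    return cache
```

A trainer written against this sketch would iterate over the cache once per epoch, for example `for features in cache: ...`, instead of holding a list of all instantiated sequences.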

Posted by Ben Wellner 2008-01-11

Discriminative Word Alignment

A general framework for using non-factored features has been added to Carafe. Unlike typical CRF features, these features don't explicitly predicate over the output variable assignments. The framework can be used for discriminative word alignment or "sequence re-ranking" tasks.

Posted by Ben Wellner 2007-10-04

Ranking MaxEnt

An initial version of the standard "ranking formulation" of Maximum Entropy has been added to Carafe. This is useful for using MaxEnt as a re-ranker for parsing, semantic role labeling, ranking candidate answers to questions, learning similarity metrics, etc.

It is enabled with the "-rank" option to the training and testing programs, "mxtrain(.opt)" and "mxtest(.opt)", which are described in the file 'maxeml/README' in the distribution.
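
For readers unfamiliar with the ranking formulation, the following Python sketch shows the general idea under stated assumptions. It is not Carafe's code and says nothing about the input format that "mxtrain" expects. Every function name here is hypothetical, and the optimizer is plain stochastic gradient ascent with no regularization. Each training instance is a set of candidates plus the index of the correct one. The model assigns each candidate a score (weights dotted with that candidate's features) and normalizes the scores over the candidate set only.

```python
import math


def candidate_log_probs(weights, candidates):
    """Log-probabilities over one candidate set.

    Each candidate is a dict mapping feature name -> feature value.
    """
    scores = [sum(weights.get(f, 0.0) * v for f, v in c.items())
              for c in candidates]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def gradient(weights, candidates, gold_index):
    """Gradient of log p(gold | candidate set): observed minus expected features."""
    probs = [math.exp(lp) for lp in candidate_log_probs(weights, candidates)]
    grad = {}
    for f, v in candidates[gold_index].items():
        grad[f] = grad.get(f, 0.0) + v
    for p, c in zip(probs, candidates):
        for f, v in c.items():
            grad[f] = grad.get(f, 0.0) - p * v
    return grad


def train(instances, epochs=50, rate=0.5):
    """Stochastic gradient ascent over (candidates, gold_index) pairs."""
    weights = {}
    for _ in range(epochs):
        for candidates, gold in instances:
            for f, g in gradient(weights, candidates, gold).items():
                weights[f] = weights.get(f, 0.0) + rate * g
    return weights
```

At test time, a re-ranker built this way would return the candidate with the highest log-probability from `candidate_log_probs`. Because normalization runs over each instance's own candidate set, different instances may have different numbers of candidates.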

Posted by Ben Wellner 2006-12-08

Help With and Use Of Carafe

If you are interested in using Carafe and are having problems compiling or using the software, please let me know by email (wellner _at_ cs _dot_ brandeis _dot_ edu) or through one of the forums on this page. I'll be happy to help. Releases are not tested on all recent versions of the OCaml compiler, and the build process may be sensitive to compiler version and platform variations. I fix these as I see them; remember, this is grad-student-ware of the single-programmer variety.

Posted by Ben Wellner 2006-12-08

Carafe Is the Top-Performing De-identification System

At the First Workshop on NLP Challenges in Clinical Data, a Carafe-built system achieved the best overall performance (out of 7 teams) in the "De-identification" challenge task. The task required identifying DATEs, LOCATIONs, PATIENTs, DOCTORs and other information in medical records. The plan is to make pre-built binary versions and source code available for this specific task soon. More to follow.

Posted by Ben Wellner 2006-11-14

New Pre-Processor!

Carafe now includes a long-awaited Pre-Processor, which takes care of tokenization and sentence detection. This is an early release of the pre-processor and currently targets Latin-1 character sets. A general Unicode tokenizer is planned for the future.

Posted by Ben Wellner 2006-10-20

Critical Bug Fix

Due to files missing from the previous distribution, that release (0.6.6) was completely broken. The new release (0.6.7) includes all the library files and works properly.

Posted by Ben Wellner 2006-09-29