DeSR (c) Giuseppe Attardi 2005-2014
DeSR is a statistical dependency parser for natural languages.
DeSR can be trained from annotated corpora like those supplied in the
CoNLL 2006, 2007, 2008 and 2009 shared tasks.
0. WORD EMBEDDINGS
This is a prototype version using a Deep Learning architecture.
It exploits word embeddings, as provided by either:
- Polyglot: https://sites.google.com/site/rmyeid/projects/polyglot
- SENNA: https://ronan.collobert.com/senna/
1. BUILDING
This version requires the Eigen 3 algebra library (http://eigen.tuxfamily.org/).
Run configure from this directory and then build the parser with make.
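Assuming the usual autotools layout (i.e. that the distribution ships a configure script, which the step above suggests), the build amounts to:

```
./configure
make
```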
This will produce the following program:
2. CLASSIFIERS
This version provides only one classifier:
Multi-Layer Perceptron (with help from Joseph Turian)
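As a rough sketch of what a one-hidden-layer MLP classifier computes (this is not DeSR's actual implementation; the function names, shapes, and use of NumPy are illustrative), including the four activation functions offered in the configuration options:

```python
import numpy as np

# Activation functions named in the configuration options (illustrative).
ACTIVATIONS = {
    "softsign": lambda x: x / (1.0 + np.abs(x)),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "cube": lambda x: x ** 3,
}

def mlp_forward(x, W1, b1, W2, b2, activation="softsign"):
    """Input -> hidden layer (activation) -> softmax over parser actions."""
    h = ACTIVATIONS[activation](x @ W1 + b1)
    scores = h @ W2 + b2
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()
```

The returned vector can be read as a probability distribution over the parser's next actions (shift, left-arc, right-arc, ...).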
The classifier can be tuned by setting these parameters in the configuration file:
# Number of hidden variables
# Max number of iterations
# Terminate if no updates occur for this many iterations
# Learning rate
# Activation function: softsign, tanh, sigmoid, cube
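An illustrative configuration fragment might look as follows; note that the parameter names and values below are placeholders (only the comments appear in the original template):

```
# Number of hidden variables
HiddenSize      200        # hypothetical name and value
# Max number of iterations
MaxIterations   40         # hypothetical
# Terminate if no updates occur for this many iterations
NoUpdateStop    5          # hypothetical
# Learning rate
LearningRate    0.01       # hypothetical
# Activation function: softsign, tanh, sigmoid, cube
Activation      softsign   # hypothetical
```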
The full version of DeSR includes additional classifiers:
SVM (using libSVM code from http://www.csie.ntu.edu.tw/~cjlin/libsvm)
Averaged Perceptron (by Massimiliano Ciaramita)
Passive Aggressive Perceptron
3. TRAINING and PARSING
Training the parser requires an annotated corpus in the CoNLL-X tsv
(tab separated) format.
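The CoNLL-X format has ten tab-separated columns per token (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with sentences separated by blank lines and underscores for missing values. A minimal reader, as a sketch (not part of DeSR):

```python
# Column names of the CoNLL-X tab-separated format.
COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
           "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]

def read_conllx(lines):
    """Group tab-separated token lines into sentences (lists of dicts)."""
    sentence, sentences = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:                      # blank line ends a sentence
            if sentence:
                sentences.append(sentence)
                sentence = []
            continue
        sentence.append(dict(zip(COLUMNS, line.split("\t"))))
    if sentence:                          # flush a trailing sentence
        sentences.append(sentence)
    return sentences
```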
The word embeddings must be converted to DeSR format using either:
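The conversion tools are not named above, and the exact DeSR embedding format is not documented here; as an illustration only, a converter writing the common plain-text layout of one "word v1 v2 ..." line per entry might look like:

```python
import numpy as np

def write_embeddings_text(words, vectors, path):
    """Write embeddings as plain text, one 'word v1 v2 ...' line per word.
    NOTE: this target layout is an assumption for illustration; it is not
    a documented DeSR format."""
    with open(path, "w", encoding="utf-8") as out:
        for word, vec in zip(words, vectors):
            out.write(word + " " + " ".join("%g" % v for v in vec) + "\n")
```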
When using the SENNA embeddings, whose vocabulary is lowercased, one should add
FormReplace .+ \L
in the configuration file.
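The directive applies the regular expression `.+` to each word form and lowercases the match (`\L` is a case-conversion escape, as in sed/Perl), so tokens line up with SENNA's lowercase vocabulary. A Python sketch of the same transformation (Python's re module has no `\L`, so the lowercasing is applied directly to the match):

```python
import re

def form_replace_lower(form, pattern=r".+"):
    """Lowercase every part of the form matched by the pattern,
    mimicking the 'FormReplace .+ \\L' configuration directive."""
    return re.sub(pattern, lambda m: m.group(0).lower(), form)
```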
The feature templates to use and other parameters for tuning the model
are supplied in a configuration file, which defaults to:
A visual tool for creating annotated corpora is available at:
The home page for the project is:
The code is available on Sourceforge at: