DeSR
(c) Giuseppe Attardi 2005-2014

DeSR is a statistical dependency parser for natural languages.
DeSR can be trained from annotated corpora like those supplied in the
CoNLL 2006, 2007, 2008 and 2009 shared tasks.


0. WORD EMBEDDINGS

This is a prototype version using a Deep Learning architecture.
It exploits word embeddings, as provided by either:

  - Polyglot: https://sites.google.com/site/rmyeid/projects/polyglot
  - SENNA


1. INSTALLATION

This version requires the Eigen 3 linear algebra library
(http://eigen.tuxfamily.org/).

Run configure from this directory:

  > ./configure

then build the parser with:

  > make

This produces the parser executable:

  src/desr


2. CLASSIFIERS

This version provides only one classifier, a Multi Layer Perceptron
(with help from Joseph Turian).

The classifier can be tuned by setting these parameters in the
configuration file:

  # Number of hidden variables
  DlHidden 300
  # Maximum number of iterations
  DlIterations 30
  # Terminate if no updates occur for this many iterations
  DlVainIterations 3
  # Learning rate
  DlLearningRate 0.01
  # Activation function: softsign, tanh, sigmoid, cube
  DlActivationFunction softsign

The full version of DeSR includes other classifiers instead:

  - Maximum Entropy
  - SVM (using libSVM code from http://www.csie.ntu.edu.tw/~cjlin/libsvm)
  - Averaged Perceptron (by Massimiliano Ciaramita)
  - Passive Aggressive Perceptron


3. TRAINING and PARSING

Training the parser requires an annotated corpus in the CoNLL-X
format (tab-separated values).

The word embeddings must be converted to DeSR format using either:

  - script/polyglot2desr.py
  - script/senna2desr.py

SENNA embeddings use lowercase words, so when using them one should add

  FormReplace .+ \L

to the configuration file.

The feature templates and the other parameters for tuning the model are
supplied in a configuration file, which defaults to:

  desr.conf

A visual tool for creating annotated corpora is available at:

  http://medialab.di.unipi.it/Project/QA/Parser/DgAnnotator


4. INFORMATION

The home page for the project is:

  https://sites.google.com/site/desrparser/

The code is available on SourceForge at:

  http://sourceforge.net/projects/desr

Enjoy

Giuseppe Attardi
attardi@di.unipi.it
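The Multi Layer Perceptron settings listed under CLASSIFIERS can be read roughly as follows. This is a minimal Python sketch, not DeSR's code: it gives the textbook definitions of the four functions accepted by DlActivationFunction and one plausible reading of how DlIterations and DlVainIterations bound training. The callback `train_epoch`, assumed to run one pass over the data and return the number of updates made, is hypothetical.

```python
import math
from typing import Callable, Dict


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function: never calls math.exp on a large positive value.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Textbook definitions of the values accepted by DlActivationFunction.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "softsign": lambda x: x / (1.0 + abs(x)),
    "tanh": math.tanh,
    "sigmoid": _sigmoid,
    "cube": lambda x: x ** 3,
}


def train(train_epoch: Callable[[float], int],
          iterations: int = 30,        # DlIterations
          vain_iterations: int = 3,    # DlVainIterations
          learning_rate: float = 0.01  # DlLearningRate
          ) -> int:
    """Run at most `iterations` epochs, stopping early after
    `vain_iterations` consecutive epochs that make no updates.

    `train_epoch` is a hypothetical callback: it performs one pass over the
    training data and returns how many updates it made.
    Returns the number of epochs actually run.
    """
    vain = 0
    for epoch in range(1, iterations + 1):
        updates = train_epoch(learning_rate)
        # Count consecutive update-free epochs; any update resets the counter.
        vain = vain + 1 if updates == 0 else 0
        if vain >= vain_iterations:
            return epoch
    return iterations


if __name__ == "__main__":
    # Fake training run: no updates from epoch 4 onward.
    counts = iter([5, 3, 1, 0, 0, 0, 7])
    print(train(lambda lr: next(counts)))  # prints 6
    print(ACTIVATIONS["softsign"](1.0))    # prints 0.5
```

With the defaults above, training stops at the 30th epoch at the latest, or earlier once three consecutive epochs change nothing.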
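The TRAINING and PARSING section requires a corpus in the CoNLL-X format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line after each sentence. The Python sketch below is not part of DeSR; it only illustrates that layout by reading it into per-sentence token lists. Its optional lowercasing of the FORM column is an assumed approximation of what the `FormReplace .+ \L` setting is described as doing for SENNA embeddings. The names `read_conllx` and `lowercase` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-X columns, in file order.
FIELDS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
          "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def read_conllx(lines: Iterable[str],
                lowercase: bool = False) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of {column name: value} dicts.

    A blank line ends a sentence. With lowercase=True the FORM column is
    lowercased (an assumed stand-in for the FormReplace setting).
    """
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        if lowercase:
            token["FORM"] = token["FORM"].lower()
        sentence.append(token)
    if sentence:
        # The last sentence may lack a trailing blank line.
        yield sentence


if __name__ == "__main__":
    sample = [
        "1\tThe\tthe\tD\tDT\t_\t2\tNMOD\t_\t_\n",
        "2\tParser\tparser\tN\tNN\t_\t0\tROOT\t_\t_\n",
        "\n",
    ]
    for sent in read_conllx(sample, lowercase=True):
        print([(t["FORM"], t["HEAD"]) for t in sent])
    # prints [('the', '2'), ('parser', '0')]
```

An open text file can be passed in place of the list, since iterating over it yields lines in the same way.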