DeSR (c) Giuseppe Attardi 2005-2014

DeSR is a statistical dependency parser for natural languages.
DeSR can be trained from annotated corpora like those supplied in the
CoNLL 2006, 2007, 2008 and 2009 shared tasks.

0. WORD EMBEDDINGS

This is a prototype version using a Deep Learning architecture.
It exploits word embeddings, as provided by either:

- Polyglot: https://sites.google.com/site/rmyeid/projects/polyglot
- SENNA: 
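
As a rough illustration of how word embeddings can feed a neural classifier, the Python sketch below looks up a vector for each feature word and concatenates the results into a single input vector. It is not taken from DeSR. The function name `embed_features`, the toy table and the unknown-word fallback are assumptions made for the example.

```python
# Hypothetical sketch: pretrained word embeddings as the input layer of a
# neural classifier. NOT DeSR's code; the table, dimension and tokens are
# made-up illustrations.
from typing import Dict, List

Vector = List[float]


def embed_features(tokens: List[str],
                   table: Dict[str, Vector],
                   unknown: Vector) -> Vector:
    """Concatenate the embedding of each feature token into one vector.

    Tokens missing from the table fall back to `unknown`, so the result
    always has len(tokens) * len(unknown) components.
    """
    features: Vector = []
    for token in tokens:
        features.extend(table.get(token, unknown))
    return features


if __name__ == "__main__":
    # Toy 2-dimensional table (real Polyglot/SENNA vectors are larger).
    table = {"the": [0.1, 0.2], "cat": [0.3, -0.4]}
    unk = [0.0, 0.0]
    # e.g. the words selected by the parser's feature templates
    print(embed_features(["the", "cat", "sat"], table, unk))
    # [0.1, 0.2, 0.3, -0.4, 0.0, 0.0]
```

The unknown vector keeps the input size fixed, which a fixed-width network layer requires.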

1. INSTALLATION

This version requires the Eigen 3 linear algebra library (http://eigen.tuxfamily.org/).

Run configure from this directory:

	> ./configure

then build the parser with:

	> make

This will produce the following program:

	src/desr

2. CLASSIFIERS

This version provides only one classifier:

	Multi-Layer Perceptron	(with help from Joseph Turian)

The classifier can be tuned by setting these parameters in the configuration
file:

   # Number of hidden variables
   DlHidden	300
   # Max number of iterations
   DlIterations	30
   # Terminate if no updates occur for this many iterations
   DlVainIterations 3
   # Learning rate
   DlLearningRate	0.01
   # Activation function: softsign, tanh, sigmoid, cube
   DlActivationFunction softsign
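
The sketch below shows, under stated assumptions, what two of these settings mean: the candidate activation functions written as scalar Python functions (cube taken as x³), and one plausible reading of the stopping rule set by DlIterations and DlVainIterations, namely stopping once that many consecutive passes make no update. The names `ACTIVATIONS`, `train` and the `run_epoch` callback are hypothetical and do not correspond to DeSR's internals.

```python
# Sketch of two MLP settings from the configuration above.
# ACTIVATIONS, train() and run_epoch are hypothetical names, not DeSR's code.
import math
from typing import Callable, Dict

# Possible values of DlActivationFunction, as scalar functions.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "softsign": lambda x: x / (1.0 + abs(x)),
    "tanh": math.tanh,
    # sigmoid(x) = 1 / (1 + e^-x), computed via tanh to avoid overflow
    "sigmoid": lambda x: 0.5 * (1.0 + math.tanh(0.5 * x)),
    "cube": lambda x: x ** 3,
}


def train(run_epoch: Callable[[], int],
          dl_iterations: int = 30,
          dl_vain_iterations: int = 3) -> int:
    """Call run_epoch() repeatedly and return the number of passes made.

    run_epoch performs one pass over the training data and returns how many
    updates it made. Training stops after dl_iterations passes, or earlier
    once dl_vain_iterations consecutive passes make no update (an assumed
    reading of DlVainIterations).
    """
    vain = 0  # consecutive passes without any update
    for iteration in range(1, dl_iterations + 1):
        if run_epoch() == 0:
            vain += 1
            if vain >= dl_vain_iterations:
                return iteration
        else:
            vain = 0
    return dl_iterations


if __name__ == "__main__":
    print(ACTIVATIONS["softsign"](1.0))  # 0.5
    # Passes making 5, 2, 0, 0, 0 updates: stops after the 5th pass.
    updates = iter([5, 2, 0, 0, 0])
    print(train(lambda: next(updates)))  # 5
```

The defaults of `train` mirror the example values above (30 iterations, 3 vain iterations).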

The full version of DeSR includes other classifiers:

	Maximum Entropy
	SVM			(using libSVM code from http://www.csie.ntu.edu.tw/~cjlin/libsvm)
	Averaged Perceptron	(by Massimiliano Ciaramita)
	Passive Aggressive Perceptron

3. TRAINING AND PARSING

Training the parser requires an annotated corpus in the CoNLL-X TSV
(tab-separated values) format.
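
A CoNLL-X file holds one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and sentences are separated by blank lines. A minimal Python reader for such a training corpus might look as follows. `read_conllx` and `Token` are illustrative names, and the sketch assumes HEAD always holds an integer, as it does in an annotated corpus.

```python
# Minimal CoNLL-X reader (illustrative; not part of DeSR).
import io
from typing import Iterator, List, NamedTuple, TextIO


class Token(NamedTuple):
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int      # index of the head token, 0 for the root
    deprel: str
    phead: str
    pdeprel: str


def read_conllx(stream: TextIO) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence; blank lines end sentences."""
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], cols[5], int(cols[6]), cols[7],
                              cols[8], cols[9]))
    if sentence:  # last sentence may lack a trailing blank line
        yield sentence


if __name__ == "__main__":
    sample = (
        "1\tJohn\tjohn\tN\tNNP\t_\t2\tSBJ\t_\t_\n"
        "2\tsleeps\tsleep\tV\tVBZ\t_\t0\tROOT\t_\t_\n"
        "\n"
    )
    for sent in read_conllx(io.StringIO(sample)):
        for tok in sent:
            print(tok.id, tok.form, tok.head, tok.deprel)
```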

The word embeddings must be converted to DeSR format using either:

    - script/polyglot2desr.py
    - script/senna2desr.py

When using SENNA embeddings, which use lowercase words, one should add

    FormReplace	.+   \L

in the configuration file.
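
The reason for this setting is that a lowercased vocabulary has no entry for capitalized forms such as "The". The Python sketch below illustrates the effect by lowercasing a form before the lookup. It assumes `FormReplace .+ \L` replaces the whole form with its lowercase version; the vocabulary and the `<unk>` token are made up for the example.

```python
# Illustration only: why a lowercased vocabulary (as with SENNA) needs forms
# to be lowercased before lookup. The vocabulary and "<unk>" are made up.
import re
from typing import Dict


def lowercase_form(form: str) -> str:
    """Replace the whole form (regex .+) with its lowercase version,
    which is the assumed effect of `FormReplace .+ \\L`."""
    return re.sub(r".+", lambda m: m.group(0).lower(), form)


def lookup(form: str, vocab: Dict[str, int],
           unknown: str = "<unk>", lowercase: bool = True) -> int:
    """Return the vocabulary index of a form, or the index of `unknown`."""
    key = lowercase_form(form) if lowercase else form
    return vocab.get(key, vocab[unknown])


if __name__ == "__main__":
    vocab = {"the": 0, "parser": 1, "<unk>": 2}  # toy lowercased vocabulary
    print(lookup("The", vocab, lowercase=False))  # 2: "The" is not in vocab
    print(lookup("The", vocab))                   # 0: found as "the"
```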

The feature templates to use and other parameters for tuning the model
are supplied in a configuration file, which defaults to:

	desr.conf
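
The parameter examples above suggest a simple line format: a line starting with `#` is a comment, and every other line gives a parameter name followed by whitespace and its value. Assuming that format, a configuration file could be read as below. `read_config` is a hypothetical helper, not part of DeSR, and keeping the last value when a name repeats is an arbitrary choice of this sketch.

```python
# Hypothetical reader for DeSR-style configuration lines such as
# "DlHidden 300". Assumed format: blank lines and lines starting with '#'
# are ignored; otherwise a name, whitespace, then the rest of the line is
# the value. This is not DeSR's own parser.
import io
from typing import Dict, TextIO


def read_config(stream: TextIO) -> Dict[str, str]:
    params: Dict[str, str] = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split at the first run of whitespace
        # A later occurrence of the same name overrides an earlier one.
        params[parts[0]] = parts[1] if len(parts) > 1 else ""
    return params


if __name__ == "__main__":
    text = (
        "# Number of hidden variables\n"
        "DlHidden\t300\n"
        "DlActivationFunction softsign\n"
        "FormReplace\t.+   \\L\n"
    )
    conf = read_config(io.StringIO(text))
    print(int(conf["DlHidden"]), conf["DlActivationFunction"])  # 300 softsign
```

Values are kept as strings, so numeric parameters such as DlLearningRate must be converted by the caller.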

A visual tool for creating annotated corpora is available at:

	http://medialab.di.unipi.it/Project/QA/Parser/DgAnnotator

4. INFORMATION

The home page for the project is:

	https://sites.google.com/site/desrparser/

The code is available on SourceForge at:

	http://sourceforge.net/projects/desr

Enjoy

Giuseppe Attardi
attardi@di.unipi.it
Source: README, updated 2014-11-04