RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition.

It implements the Long Short-Term Memory (LSTM) architecture [1], as well as more
traditional neural network structures, such as Multilayer Perceptrons
and standard recurrent networks with nonlinear hidden units. Its most
important features are:

- bidirectional Long Short-Term Memory [2], which provides access to long-range context in both input directions
- Connectionist Temporal Classification (CTC) [3], which allows the network to be trained on unsegmented sequence data
- multidimensional recurrent networks [4], which extend the architecture to data with more than one spatiotemporal dimension
All of these are explained in more detail in my Ph.D. thesis [5]. The
library also implements the multilayer, subsampling structure developed
for offline Arabic handwriting recognition [6]. This structure allows the
network to efficiently label high-resolution data such as raw images and
speech waveforms.
Taken together, the above components make RNNLIB a generic system for
labelling and classifying data with one or more spatiotemporal
dimensions. Perhaps its greatest strength is its flexibility: as well as
speech and handwriting recognition [7], it has so far been applied (with
varying degrees of success) to image classification, object recognition,
facial expression recognition, EEG and fMRI classification, motion
capture labelling, robot localisation, wind turbine energy prediction,
signature verification, image compression and touch sensor
classification. RNNLIB can also accept a wide variety of
different input representations for the same task, e.g. raw sensor data
or hand-crafted features (as shown for online handwriting [8]). See my
homepage for more publications.
RNNLIB also implements adaptive weight noise regularisation [14], which makes it possible to train an arbitrary neural network with stochastic variational inference (or, equivalently, to minimise the two-part description length of the training data given the network weights, plus the weights themselves). This form of regularisation makes overfitting virtually impossible; however, it can lead to very long training times.
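As a rough sketch of the objective being minimised (following [14]; the notation below is mine, not anything defined by RNNLIB): with a diagonal Gaussian posterior q(w) over the weights and a prior p(w), training minimises the variational free energy

$$
\mathcal{F} = \mathbb{E}_{q(w)}\left[-\log p(\mathcal{D}\mid w)\right] + \mathrm{KL}\left(q(w)\,\|\,p(w)\right),
$$

where the first term corresponds to the description length of the training data given the weights, and the second to the description length of the weights themselves.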
RNNLIB is written in C++ and should compile on any platform. However, it
has currently only been tested on Linux and OS X.
Building it requires the following:
In addition, the following python packages are needed for the auxiliary
scripts in the ‘utils’ directory:
And these packages are needed to create and manipulate netcdf data files
with python, and to run the experiments in the ‘examples’ directory:
To build RNNLIB, first download the source, then enter the root
directory and type
./configure
make
This should create the binary file ‘rnnlib’ in the ‘src’ directory. Note
that on most Linux systems the default installation directory for the
Boost headers is ‘/usr/local/include/boost-VERSION_NUMBER’, which is not
on the standard include path. In this case type

CXXFLAGS=-I/usr/local/include/boost-VERSION_NUMBER/ ./configure
make
If you wish to install the binary, type:
make install
By default this will use ‘/usr’ as the installation root (for which you
will usually need administrator privileges). You can change the install
path with the --prefix option of the configure script (use ./configure
--help for other options).
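For example, to build and install under your home directory instead (the prefix here is just a placeholder):

./configure --prefix=$HOME/rnnlib
make
make install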
It is recommended that you add the directory containing the ‘rnnlib’
binary to your path, as otherwise the tools in the ‘utilities’ directory
will not work.
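For example, with a bash shell (the path is a placeholder for wherever the ‘rnnlib’ binary was built or installed):

export PATH="$PATH:/path/to/rnnlib/src"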
Project files are provided for the following integrated development
environments in the ‘ide’ directory:
RNNLIB can be run from the command line as follows:
Usage: rnnlib [config_options] config_file
config_options syntax: --<variable_name>=<variable_value>
whitespace not allowed in variable names or values
all config_file variables overwritten by config_options
setting <variable_value> = "" removes the variable from the config
repeated variables overwritten by last specified
All the parameters determining the network structure, experimental setup
etc. can be specified either in the config file or on the command line.
The main parameters are as follows:
Parameter | Type | Allowed Values | Default | Comment |
---|---|---|---|---|
autosave | boolean | true,false | false | see below |
batchLearn | boolean | true,false | true if RPROP is used, false otherwise | false => gradient descent updates at the end of each sequence, true => at the end of epochs only |
dataFraction | real | 0-1 | 1 | determines fraction of the data to load |
hiddenBlock | list of integer lists | all >= 1 | | Hidden layer block dimensions |
hiddenSize | integer list | all >= 1 | | Sizes of the hidden layers |
hiddenType | string | tanh, linear, logistic, lstm, linear_lstm, softsign | lstm | Type of units in the hidden layers |
inputBlock | integer list | all >= 1 | | Input layer block dimensions |
maxTestsNoBest | integer | >= 0 | 20 | Number of error tests without improvement on the validation set before early stopping |
optimiser | string | steepest, rprop | steepest | |
learnRate | real | 0-1 | 1e-4 | Learning rate (steepest descent optimiser only) |
momentum | real | 0-1 | 0.9 | Momentum (steepest descent optimiser only) |
subsampleSize | integer list | all >= 1 | | Sizes of the hidden subsample layers |
task | string | classification, sequence_classification, transcription | | Network task. sequence_* => one target for the whole sequence (not for each point in the sequence). transcription => unsegmented sequence labelling with CTC |
trainFile | string list | | | Netcdf files used for training. Note that all datasets can consist of multiple files; during each training epoch the files are cycled through in random order, with the sequences cycled randomly within each file |
valFile | string list | | | Netcdf files used for validation / early stopping |
testFile | string list | | | Netcdf files used for testing |
verbose | boolean | true,false | false | Verbose console output |
mdl | boolean | true,false | false | Use adaptive weight noise (M)inimum (D)escription (L)ength regularisation |
mdlWeight | real | 0-1 | 1 | Weight for MDL regularisation (0 => no regularisation; 1 => true MDL) |
mdlInitStdDev | real | > 0 | 0.075 | Initial std. dev. for MDL adaptive weight noise |
mdlSamples | integer | >= 1 | 1 | Number of Monte Carlo samples per sequence used for the stochastic MDL derivatives (more samples => less noisy derivatives, but more computational cost) |
mdlSymmetricSampling | boolean | true,false | false | If true, use symmetric (AKA antithetical) sampling to reduce the variance of the derivatives |
Parameter names and values are separated by whitespace, and must
themselves contain no whitespace. Lists are comma separated, e.g.:
trainFile a.nc,b.nc,c.nc
and lists of lists are semicolon separated, e.g.:
hiddenBlock 3,3;4,4
See the ‘examples’ directory for examples of config files.
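As an illustration only (the file names and sizes below are placeholders, not a recommended setup), a config file for a CTC transcription task might look like:

task transcription
hiddenType lstm
hiddenSize 100
trainFile train.nc
valFile val.nc
testFile test.nc
learnRate 1e-4
momentum 0.9
maxTestsNoBest 20
autosave true

All of the parameter names are taken from the table above.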
To override parameters at the command line, the syntax is:
rnnlib --OPTION_NAME=VALUE CONFIG_FILE
so e.g.
rnnlib --learnRate=1e-5 CONFIG_FILE
will override the learnRate set in the config file.
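Similarly (illustrative only, following the usage message above), a variable can be removed from the config by setting it to the empty string:

rnnlib --valFile="" CONFIG_FILE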
If the 'autosave' option is true, the system will store all dynamic
information (e.g. network weights) as it runs. Without this there will
be no way to resume an interrupted experiment (e.g. if a computer
crashes) and the final trained system will not be saved. If saving is
activated, timestamped config files with dynamic information appended
will be saved after each training epoch, and whenever one of the error
measures for the given task is improved on. In addition a timestamped
log file will be saved, containing all the console output. For example,
for a classification task, the command
rnnlib --autosave=true classification.config
might create the following files
All RNNLIB data files (for training, testing and validation) are in
netCDF format, a binary
file format designed for large scientific datasets.
A netCDF file has the following basic structure:

- Dimensions: …
- Variables: …
- Data: …
Following the statement ‘Variables’, the variables that will be listed in
the ‘Data’ section are declared. For example
float foo[ 3 ]
would declare an array of floats with size 3. To save variable-sized
arrays, the size can be declared after ‘Dimensions’, so the example would
look like:

Dimensions:
fooSize = 3;
Variables:
float foo[ fooSize ];
Following ‘Data’ the actual values are stored:
Data:
foo = 1,2,3;
The data format for RNNLIB is specified below. The codes at the start
determine which tasks the dimension/variable is required for:
Dimensions:
Variables:
The netCDF Operators provide several tools
for creating, manipulating and displaying netCDF files, and are
recommended for anyone wanting to make their own datasets. In particular,
the tools ncgen and ncdump convert ASCII text files to and from netCDF
format.
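For example, to dump an existing netCDF file to text, edit it, and then regenerate a binary file from the edited text (file names are placeholders):

ncdump data.nc > data.cdl
ncgen -o new_data.nc data.cdl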
The ‘examples’ directory provides example experiments that can be run
with RNNLIB. To run the experiments, the ‘utilities’ directory must be
added to your pythonpath, and the following python packages must be
installed:
In each subdirectory type
./build_netcdf
to build the netcdf datasets, then
rnnlib SAMPLE_NAME.config
to run the experiments. Note that some directories may contain more than
one config file, since different tasks may be defined for the same data.
The results of these experiments will not correspond to published
results, because only a fraction of the complete dataset is used in each
case (to keep the size of the distribution down). In addition, early
stopping is not used, because no validation files are created. However
the same scripts can be used to build realistic experiments, given more
data.
If you want to adapt the python scripts to create netcdf files for your own experiments, here is a useful tutorial on using netcdf with python.
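As a minimal sketch of writing a netCDF file from Python with the netCDF4 package (the dimension and variable names below are placeholders, not necessarily the names RNNLIB expects; see the build_netcdf scripts in the ‘examples’ directory for the exact format):

import numpy as np
from netCDF4 import Dataset

# Create a new netCDF file (placeholder name)
nc = Dataset('toy.nc', 'w')

# Declare dimensions (the 'Dimensions' section of the file)
nc.createDimension('numTimesteps', 4)
nc.createDimension('inputPattSize', 2)

# Declare a variable (the 'Variables' section)
inputs = nc.createVariable('inputs', 'f4', ('numTimesteps', 'inputPattSize'))

# Write the data (the 'Data' section)
inputs[:] = np.arange(8, dtype='f4').reshape(4, 2)

nc.close()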
The ‘utilities’ directory provides a range of auxiliary tools for
RNNLIB. In order for these to work, the directory containing the
‘rnnlib’ binary must be added to your path. The ‘utilities’ directory
must be added to your pythonpath for the experiments in the ‘examples’
directory to work. The most important utilities are:
All files should provide a list of arguments if called with no
arguments. The python scripts will give a list of optional arguments,
defaults etc. if called with the single argument ‘-h’. The following
python libraries are required for some of the scripts:
RNNLIB’s best results so far have been in speech and handwriting
recognition. It has matched the best recorded performance in phoneme
recognition on the TIMIT database [9], and recently won three handwriting
recognition competitions at the ICDAR 2009 conference, for offline
French [10], offline Arabic [11] and offline Farsi character
classification [12]. Unlike the competing systems, RNNLIB worked entirely
on raw inputs, and therefore did not require any preprocessing or
alphabet-specific feature extraction. It also has among the best
published results on the IAM Online and IAM Offline English handwriting
databases [13].
If you use RNNLIB for your research, please cite it with the following
reference:
@misc{rnnlib,
  author = {Alex Graves},
  title = {RNNLIB: A recurrent neural network library for sequence learning problems},
  howpublished = {\url{http://sourceforge.net/projects/rnnl/}}
}
1 Sepp Hochreiter and Jürgen Schmidhuber
Long Short-Term Memory
Neural Computation, 9(8):1735-1780, 1997
2 Alex Graves and Jürgen Schmidhuber.
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
Neural Networks, 18(5-6):602-610, June 2005
3 Alex Graves, Santiago Fernández, Faustino Gomez and Jürgen Schmidhuber.
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
International Conference on Machine Learning, June 2006, Pittsburgh
4 Alex Graves, Santiago Fernández and Jürgen Schmidhuber.
Multidimensional recurrent neural networks
International Conference on Artificial Neural Networks, September 2007,
Porto
5 Alex Graves.
Supervised Sequence Labelling with Recurrent Neural Networks
PhD thesis, July 2008, Technische Universität München
6 Alex Graves and Jürgen Schmidhuber.
Offline handwriting recognition with multidimensional recurrent neural networks
Advances in Neural Information Processing Systems, December 2008,
Vancouver
7 Alex Graves, Marcus Liwicki, Santiago Fernández, Roman Bertolami, Horst Bunke, and Jürgen Schmidhuber.
A novel connectionist system for unconstrained handwriting recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence,
31(5):855-868, May 2009
8 Alex Graves, Santiago Fernández, Marcus Liwicki, Horst Bunke, and Jürgen Schmidhuber.
Unconstrained online handwriting recognition with recurrent neural networks
Advances in Neural Information Processing Systems, December 2007,
Vancouver
9 Santiago Fernández, Alex Graves, and Jürgen Schmidhuber.
Phoneme recognition in TIMIT with BLSTM-CTC
Technical Report IDSIA-04-08, IDSIA, April 2008.
10 E. Grosicki and H. El Abed.
ICDAR 2009 Handwriting Recognition Competition
International Conference on Document Analysis and Recognition, July
2009, Barcelona
11 V. Märgner and H. El Abed.
ICDAR 2009 Arabic Handwriting Recognition Competition
International Conference on Document Analysis and Recognition, July
2009, Barcelona
12 S. Mozaffari and H. Soltanizadeh.
ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
International Conference on Document Analysis and Recognition, July
2009, Barcelona
13 Alex Graves, Marcus Liwicki, Santiago Fernández, Roman Bertolami,
Horst Bunke, and Jürgen Schmidhuber.
A novel connectionist system for unconstrained handwriting recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence,
31(5):855-868, May 2009
14 Alex Graves.
Practical Variational Inference For Neural Networks
Advances in Neural Information Processing Systems, December 2011, Granada, Spain