RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition.

It implements the Long Short-Term Memory (LSTM) architecture [1], as well as more
traditional neural network structures, such as Multilayer Perceptrons
and standard recurrent networks with nonlinear hidden units. Its most
important features are:

- bidirectional Long Short-Term Memory [2], which provides access to long-range context in both input directions
- Connectionist Temporal Classification (CTC) [3], which allows the network to be trained on unsegmented sequence data
- multidimensional recurrent networks [4], which extend the architecture to data with more than one spatiotemporal dimension
All of these are explained in more detail in my Ph.D. thesis [5]. The
library also implements the multilayer, subsampling structure developed
for offline Arabic handwriting recognition [6]. This structure allows the
network to efficiently label high-resolution data such as raw images and
speech waveforms.
Taken together, the above components make RNNLIB a generic system for
labelling and classifying data with one or more spatiotemporal
dimensions. Perhaps its greatest strength is its flexibility: as well as
speech and handwriting recognition [7], it has so far been applied (with
varying degrees of success) to image classification, object recognition,
facial expression recognition, EEG and fMRI classification, motion
capture labelling, robot localisation, wind turbine energy prediction,
signature verification, image compression and touch sensor
classification. RNNLIB can also accept a wide variety of
different input representations for the same task, e.g. raw sensor data
or hand-crafted features (as shown for online handwriting [8]). See my
homepage for more publications.
RNNLIB also implements adaptive weight noise regularisation [14], which makes it possible to train an arbitrary neural network with stochastic variational inference (or, equivalently, to minimise the two-part description length of the training data given the network weights, plus the weights themselves). This form of regularisation makes overfitting virtually impossible; however, it can lead to very long training times.
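As a rough sketch of the objective being minimised (following [14]; the notation below is mine, not anything defined by RNNLIB): with a diagonal Gaussian posterior q(w) over the weights and a prior p(w), training minimises the variational free energy

$$
\mathcal{F} = \mathbb{E}_{q(w)}\left[-\log p(\mathcal{D}\mid w)\right] + \mathrm{KL}\left(q(w)\,\|\,p(w)\right),
$$

where the first term corresponds to the description length of the training data given the weights, and the second to the description length of the weights themselves.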
RNNLIB is written in C++ and should compile on any platform. However, it
has currently only been tested on Linux and OS X.
Building it requires the following:
In addition, the following python packages are needed for the auxiliary
scripts in the ‘utils’ directory:
And these packages are needed to create and manipulate netcdf data files
with python, and to run the experiments in the ‘examples’ directory:
To build RNNLIB, first download the source, then enter the root
directory and type
./configure
make
This should create the binary file ‘rnnlib’ in the ‘src’ directory. Note
that on most Linux systems the default installation directory for the
Boost headers is ‘/usr/local/include/boost-VERSION_NUMBER’, which is not
on the standard include path. In this case type

CXXFLAGS=-I/usr/local/include/boost-VERSION_NUMBER/ ./configure
make
If you wish to install the binary, type:
make install
By default this will use ‘/usr’ as the installation root (for which you
will usually need administrator privileges). You can change the install
path with the --prefix option of the configure script (use ./configure
--help for other options).
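For example, to build and install under your home directory instead (the prefix here is just a placeholder):

./configure --prefix=$HOME/rnnlib
make
make install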
It is recommended that you add the directory containing the ‘rnnlib’
binary to your path, as otherwise the tools in the ‘utilities’ directory
will not work.
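For example, with a bash shell (the path is a placeholder for wherever the ‘rnnlib’ binary was built or installed):

export PATH="$PATH:/path/to/rnnlib/src"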
Project files are provided for the following integrated development
environments in the ‘ide’ directory:
RNNLIB can be run from the command line as follows:
Usage: rnnlib [config_options] config_file
config_options syntax: --<variable_name>=<variable_value>
whitespace not allowed in variable names or values
all config_file variables overwritten by config_options
setting <variable_value> = "" removes the variable from the config
repeated variables overwritten by last specified
All the parameters determining the network structure, experimental setup
etc. can be specified either in the config file or on the command line.
The main parameters are as follows:
Parameter | Type | Allowed Values | Default | Comment |
---|---|---|---|---|
autosave | boolean | true,false | false | see below |
batchLearn | boolean | true,false | true if RPROP is used, false otherwise | false => gradient descent updates at the end of each sequence, true => at the end of epochs only |
dataFraction | real | 0-1 | 1 | determines fraction of the data to load |
hiddenBlock | list of integer lists | all >= 1 | | Hidden layer block dimensions |
hiddenSize | integer list | all >= 1 | | Sizes of the hidden layers |
hiddenType | string | tanh, linear, logistic, lstm, linear_lstm, softsign | lstm | Type of units in the hidden layers |
inputBlock | integer list | all >= 1 | | Input layer block dimensions |
maxTestsNoBest | integer | >= 0 | 20 | Number of error tests without improvement on the validation set before early stopping |
optimiser | string | steepest, rprop | steepest | |
learnRate | real | 0-1 | 1e-4 | Learning rate (steepest descent optimiser only) |
momentum | real | 0-1 | 0.9 | Momentum (steepest descent optimiser only) |
subsampleSize | integer list | all >= 1 | | Sizes of the hidden subsample layers |
task | string | classification, sequence_classification, transcription | | Network task. sequence_* => one target for the whole sequence (not for each point in the sequence). transcription => unsegmented sequence labelling with CTC |
trainFile | string list | | | Netcdf files used for training. Note that all datasets can consist of multiple files; during each training epoch the files are cycled through in random order, with the sequences cycled randomly within each file |
valFile | string list | | | Netcdf files used for validation / early stopping |
testFile | string list | | | Netcdf files used for testing |
verbose | boolean | true,false | false | Verbose console output |
mdl | boolean | true,false | false | Use adaptive weight noise (M)inimum (D)escription (L)ength regularisation |
mdlWeight | real | 0-1 | 1 | Weight for MDL regularisation (0 => no regularisation; 1 => true MDL) |
mdlInitStdDev | real | > 0 | 0.075 | Initial std. dev. for MDL adaptive weight noise |
mdlSamples | integer | >= 1 | 1 | Number of Monte Carlo samples per sequence used for the stochastic MDL derivatives (more samples => less noisy derivatives, but more computational cost) |
mdlSymmetricSampling | boolean | true,false | false | If true, use symmetric (AKA antithetical) sampling to reduce the variance of the derivatives |
Parameter names and values are separated by whitespace, and must
themselves contain no whitespace. Lists are comma separated, e.g.:
trainFile a.nc,b.nc,c.nc
and lists of lists are semicolon separated, e.g.:
hiddenBlock 3,3;4,4
See the ‘examples’ directory for examples of config files.
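As an illustration only (the file names and sizes below are placeholders, not a recommended setup), a config file for a CTC transcription task might look like:

task transcription
hiddenType lstm
hiddenSize 100
trainFile train.nc
valFile val.nc
testFile test.nc
learnRate 1e-4
momentum 0.9
maxTestsNoBest 20
autosave true

All of the parameter names are taken from the table above.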
To override parameters at the command line, the syntax is:
rnnlib --OPTION_NAME=VALUE CONFIG_FILE
so e.g.
rnnlib --learnRate=1e-5 CONFIG_FILE
will override the learnRate set in the config file.
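Similarly (illustrative only, following the usage message above), a variable can be removed from the config by setting it to the empty string:

rnnlib --valFile="" CONFIG_FILE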
If the 'autosave' option is true, the system will store all dynamic
information (e.g. network weights) as it runs. Without this there will
be no way to resume an interrupted experiment (e.g. if a computer
crashes) and the final trained system will not be saved. If saving is
activated, timestamped config files with dynamic information appended
will be saved after each training epoch, and whenever one of the error
measures for the given task is improved on. In addition a timestamped
log file will be saved, containing all the console output. For example,
for a classification task, the command
rnnlib --autosave=true classification.config
might create the following files
All RNNLIB data files (for training, testing and validation) are in
netCDF format, a binary
file format designed for large scientific datasets.
A netCDF file has the following basic structure:

- Dimensions: …
- Variables: …
- Data: …
Following the statement ‘Variables’, the variables that will be listed in
the ‘Data’ section are declared. For example
float foo[ 3 ]
would declare an array of floats with size 3. To save variable-sized
arrays, the size can be declared after ‘Dimensions’, so the example would
look like:

Dimensions:
fooSize = 3;
Variables:
float foo[ fooSize ];
Following ‘Data’ the actual values are stored:
Data:
foo = 1,2,3;
The data format for RNNLIB is specified below. The codes at the start
determine which tasks the dimension/variable is required for:
Dimensions:
Variables:
The netCDF Operators provide several tools
for creating, manipulating and displaying netCDF files, and are
recommended for anyone wanting to make their own datasets. In particular,
the tools ncgen and ncdump convert ASCII text files to and from netCDF
format.
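For example, to dump an existing netCDF file to text, edit it, and then regenerate a binary file from the edited text (file names are placeholders):

ncdump data.nc > data.cdl
ncgen -o new_data.nc data.cdl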
The ‘examples’ directory provides example experiments that can be run
with RNNLIB. To run the experiments, the ‘utilities’ directory must be
added to your pythonpath, and the following python packages must be
installed:
In each subdirectory type
./build_netcdf
to build the netcdf datasets, then
rnnlib SAMPLE_NAME.config
to run the experiments. Note that some directories may contain more than
one config file, since different tasks may be defined for the same data.
The results of these experiments will not correspond to published
results, because only a fraction of the complete dataset is used in each
case (to keep the size of the distribution down). In addition, early
stopping is not used, because no validation files are created. However
the same scripts can be used to build realistic experiments, given more
data.
If you want to adapt the python scripts to create netcdf files for your own experiments, here is a useful tutorial on using netcdf with python.
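As a minimal sketch of writing a netCDF file from Python with the netCDF4 package (the dimension and variable names below are placeholders, not necessarily the names RNNLIB expects; see the build_netcdf scripts in the ‘examples’ directory for the exact format):

import numpy as np
from netCDF4 import Dataset

# Create a new netCDF file (placeholder name)
nc = Dataset('toy.nc', 'w')

# Declare dimensions (the 'Dimensions' section of the file)
nc.createDimension('numTimesteps', 4)
nc.createDimension('inputPattSize', 2)

# Declare a variable (the 'Variables' section)
inputs = nc.createVariable('inputs', 'f4', ('numTimesteps', 'inputPattSize'))

# Write the data (the 'Data' section)
inputs[:] = np.arange(8, dtype='f4').reshape(4, 2)

nc.close()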
The ‘utilities’ directory provides a range of auxiliary tools for
RNNLIB. In order for these to work, the directory containing the
‘rnnlib’ binary must be added to your path. The ‘utilities’ directory
must be added to your pythonpath for the experiments in the ‘examples’
directory to work. The most important utilities are:
All files should provide a list of arguments if called with no
arguments. The python scripts will give a list of optional arguments,
defaults etc. if called with the single argument ‘-h’. The following
python libraries are required for some of the scripts:
RNNLIB’s best results so far have been in speech and handwriting
recognition. It has matched the best recorded performance in phoneme
recognition on the TIMIT database [9], and recently won three handwriting
recognition competitions at the ICDAR 2009 conference, for offline
French [10], offline Arabic [11] and offline Farsi character
classification [12]. Unlike the competing systems, RNNLIB worked entirely
on raw inputs, and therefore did not require any preprocessing or
alphabet-specific feature extraction. It also has among the best
published results on the IAM Online and IAM Offline English handwriting
databases [13].
If you use RNNLIB for your research, please cite it with the following
reference:
@misc{rnnlib,
  author = {Alex Graves},
  title = {RNNLIB: A recurrent neural network library for sequence learning problems},
  howpublished = {\url{http://sourceforge.net/projects/rnnl/}}
}
1 Sepp Hochreiter and Jürgen Schmidhuber
Long Short-Term Memory
Neural Computation, 9(8):1735-1780, 1997
2 Alex Graves and Jürgen Schmidhuber.
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
Neural Networks, 18(5-6):602-610, June 2005
3 Alex Graves, Santiago Fernández, Faustino Gomez and Jürgen Schmidhuber.
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
International Conference on Machine Learning, June 2006, Pittsburgh
4 Alex Graves, Santiago Fernández and Jürgen Schmidhuber.
Multidimensional recurrent neural networks
International Conference on Artificial Neural Networks, September 2007,
Porto
5 Alex Graves.
Supervised Sequence Labelling with Recurrent Neural Networks
PhD thesis, July 2008, Technische Universität München
6 Alex Graves and Jürgen Schmidhuber.
Offline handwriting recognition with multidimensional recurrent neural networks
Advances in Neural Information Processing Systems, December 2008,
Vancouver
7 Alex Graves, Marcus Liwicki, Santiago Fernández, Roman Bertolami, Horst Bunke, and Jürgen Schmidhuber.
A novel connectionist system for unconstrained handwriting recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence,
31(5):855-868, May 2009
8 Alex Graves, Santiago Fernández, Marcus Liwicki, Horst Bunke, and Jürgen Schmidhuber.
Unconstrained online handwriting recognition with recurrent neural networks
Advances in Neural Information Processing Systems, December 2007,
Vancouver
9 Santiago Fernández, Alex Graves, and Jürgen Schmidhuber.
Phoneme recognition in TIMIT with BLSTM-CTC
Technical Report IDSIA-04-08, IDSIA, April 2008.
10 E. Grosicki and H. El Abed.
ICDAR 2009 Handwriting Recognition Competition
International Conference on Document Analysis and Recognition, July
2009, Barcelona
11 V. Märgner and H. El Abed.
ICDAR 2009 Arabic Handwriting Recognition Competition
International Conference on Document Analysis and Recognition, July
2009, Barcelona
12 S. Mozaffari and H. Soltanizadeh.
ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
International Conference on Document Analysis and Recognition, July
2009, Barcelona
13 Alex Graves, Marcus Liwicki, Santiago Fernández, Roman Bertolami,
Horst Bunke, and Jürgen Schmidhuber.
A novel connectionist system for unconstrained handwriting recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence,
31(5):855-868, May 2009
14 Alex Graves.
Practical Variational Inference For Neural Networks
Advances in Neural Information Processing Systems, December 2011, Granada, Spain