Read Me
ECOC PAK is a C++ Library for the Error Correcting Output
Codes classification framework. It supports several coding
and decoding designs as well as several base classifiers.
ECOC PAK is available at
http://ecocpak.sourceforge.net/
Table of Contents
=================
- Quick Start
- Installation and Data Format
- `ecoc_pak' Program Usage
- Tips on Practical Use
- Examples
- Library Usage
- MEXfiles
- Building Windows Binaries
- Licence
- Additional Information
Quick Start
===========
Usage: ecoc_pak [options] training_file [test_file]
Examples:
1) For datasets with a training file (e.g., vowel.scale) and a test file (e.g., vowel.scale.t):
$./ecoc_pak vowel.scale vowel.scale.t
2) For datasets with only a training file (e.g., glass-scale), you can use K-fold cross-validation:
$./ecoc_pak glass-scale
If the number of folds K is not specified, as in the example above, it defaults to K = 10.
Both examples above use the default coding (i.e., One Versus One), the default
decoding (i.e., Hamming) and the default classifier (i.e., NCC).
In order to see all available options run `ecoc_pak' without any arguments.
On POSIX operating systems where the NCurses library is installed,
you can also activate the user-friendly menus by adding the -M option.
The user-friendly menus are not available on Windows systems.
Examples:
1) For datasets with a training file (e.g., vowel.scale) and a test file (e.g., vowel.scale.t):
$./ecoc_pak -M vowel.scale vowel.scale.t
2) For datasets with only a training file, you can use cross-validation (here with 5 folds):
$./ecoc_pak -M -v 5 glass-scale
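For readers unfamiliar with K-fold cross-validation, the following C++ sketch
shows one simple way to split N samples into K folds. It is an illustration
only and not ECOC PAK's implementation: the function names assign_folds and
training_indices are hypothetical, and the round-robin assignment is an
assumption (the program may shuffle or stratify the samples differently).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assign each of n_samples to one of k folds in round-robin order
// (an assumed scheme; no shuffling or stratification is done here).
// fold_of[i] is the fold in which sample i is held out for testing.
std::vector<std::size_t> assign_folds(std::size_t n_samples, std::size_t k) {
    if (k == 0) {
        throw std::invalid_argument("number of folds must be positive");
    }
    std::vector<std::size_t> fold_of(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        fold_of[i] = i % k;
    }
    return fold_of;
}

// Indices of the samples used for training when `fold` is held out.
std::vector<std::size_t> training_indices(const std::vector<std::size_t>& fold_of,
                                          std::size_t fold) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < fold_of.size(); ++i) {
        if (fold_of[i] != fold) {
            indices.push_back(i);
        }
    }
    return indices;
}
```

Each fold is held out once for testing while the remaining folds are used for
training, so every sample is tested exactly once.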
Installation and Data Format
============================
ECOC PAK is based on the Armadillo C++ Linear Algebra Library,
which must be installed before ECOC PAK can be compiled.
The Armadillo C++ Linear Algebra Library is available
at: http://arma.sourceforge.net
On Unix systems ECOC PAK uses the NCurses library to enable the
user-friendly interface menus. Provided that the Armadillo C++ Linear
Algebra Library and the NCurses library are installed on your system,
type `make' to build the ecoc_pak binary. Once compilation has
finished, run the binary without arguments to see its usage.
On other systems, consult `Makefile' to build the binary (e.g., see
'Building Windows Binaries' in this file) or use the pre-built
binaries. Binaries for various operating systems and architectures
(i.e., Windows 32-bit, Windows 64-bit, Linux 32-bit, Linux 64-bit and
Mac OS X 64-bit) are available at: http://ecocpak.sourceforge.net/download.html
ECOC PAK uses the LIBSVM implementation as its SVM classifier.
ECOC PAK ships with the latest release of LIBSVM. If we have missed
a newer release, just copy the svm.h and svm.cpp files from the new
LIBSVM tarball into the ECOC PAK folder and recompile.
The format of training and test data files follows the LIBSVM -
SVMLIGHT format, that is:
<label> <index1>:<value1> <index2>:<value2> ...
.
.
.
Each line contains an instance and is ended by a '\n' character.
For classification, <label> is an integer indicating the class label
(multi-class is supported). <index> is an integer starting from 1
and <value> is a real number. Indices must be in ASCENDING order.
Labels in the test file are only used to calculate accuracy or
errors. If they are unknown, just fill the first column with any numbers.
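As an illustration of the format rules above, here is a minimal C++ sketch
that parses one line into a label and a sparse list of (index, value) pairs
and rejects indices that are not in ascending order. The names Instance and
parse_instance are hypothetical and are not part of the ECOC PAK library,
which has its own reader.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One parsed instance: a class label plus sparse (index, value) features.
struct Instance {
    int label = 0;
    std::vector<std::pair<int, double>> features;
};

// Parse one "<label> <index>:<value> ..." line.
// Throws std::invalid_argument on malformed input or non-ascending indices
// (std::stoi and std::stod may also throw std::out_of_range).
Instance parse_instance(const std::string& line) {
    std::istringstream in(line);
    Instance inst;
    if (!(in >> inst.label)) {
        throw std::invalid_argument("missing or non-integer label");
    }
    std::string token;
    int previous_index = 0;  // indices start at 1, so 0 is a safe sentinel
    while (in >> token) {
        const std::size_t colon = token.find(':');
        if (colon == std::string::npos) {
            throw std::invalid_argument("expected <index>:<value>, got: " + token);
        }
        const int index = std::stoi(token.substr(0, colon));
        const double value = std::stod(token.substr(colon + 1));
        if (index <= previous_index) {
            throw std::invalid_argument("indices must start at 1 and ascend");
        }
        inst.features.emplace_back(index, value);
        previous_index = index;
    }
    return inst;
}
```

For example, the line "3 1:0.5 4:-1.25 7:2" yields label 3 and three features;
indices that are skipped (here 2, 3, 5 and 6) are simply absent from the list.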
The sample classification datasets included in this package are
`glass-scale', `iris-scale', `vowel.scale' and `vowel.scale.t'.
You can find the sample datasets under the directory `sample_datasets/'.
`ecoc_pak' Program Usage
========================
Usage: ecoc_pak [options] training_set_file [test_set_file]
> ECOC PAK options:
-M: activates user menu
-V filename: activates verbose output. If no output file is specified,
standard output is used. If no output file is specified
and the user menus are activated, the default
output file is OUTPUT.txt
-C type_of_classifier:
- 0 -- Nearest Class Centroid Classifier (NCC)
- 1 -- Fisher's Linear Discriminant followed by NCC (FLDA+NCC)
- 2 -- Support Vector Machine (SVM)
- 3 -- AdaBoost
- 4 -- Sum of Error Squares Classifier
- 5 -- Custom Classifier
-G coding_strategy:
- 0 -- One Versus One (default)
- 1 -- One Versus All
- 2 -- DECOC
- 3 -- subDECOC (DECOC with subclasses)
- 4 -- Dense Random
- 5 -- Sparse Random
- 6 -- ECOC One
- 7 -- Forest ECOC
- 8 filename -- Custom Coding with coding matrix stored in filename
-D decoding_strategy:
- 0 -- Hamming (default)
- 1 -- Euclidean
- 2 -- Laplacian
- 3 -- Hamming Attenuated
- 4 -- Euclidean Attenuated
- 5 -- Linear Loss Weighted Decoding
- 6 -- Exponential Loss Weighted Decoding
- 7 -- Linear Loss Based Decoding
- 8 -- Exponential Loss Based Decoding
- 9 -- Beta Density Decoding
- 10 -- Probabilistic Based Decoding
- 11 -- Inverse Hamming Decoding
- 12 -- Custom Decoding
-P performance: set performance threshold for subDECOC (default 0%)
-I improvement: set improvement threshold for subDECOC (default 1%)
-S size: set minimum cluster size threshold for subDECOC (default 2)
-R criterion: optimization criterion for the SFFS algorithm
- 0 -- Fast Quadratic Mutual Information (FQMI)
- 1 -- Fisher's Linear Discriminant Ratio (FLDR)
- 2 -- Custom Decomposition Criterion
-A n_trees: set n_trees to the number of maximum created trees in
the Forest ECOC coding strategy
-A n_matrices: set n_matrices to the number of valid matrices to be
examined for dense or sparse random coding
-L n_columns: set n_columns to the number of columns of the coding
matrices for dense or sparse random coding
-Q validation: percentage of validation set for ECOC ONE coding strategy
(default 15%)
-N init_coding: initial coding for ECOC ONE coding strategy (see option -G)
(default one versus all)
-U ecocone_mode: ECOC ONE mode
- 0 -- Pair of classes with highest error (default)
- 1 -- Exhaustive search via SFFS with specified criterion (see option -R)
-W wvalidation: Weight for validation set for ECOC ONE (default 0.5)
-E epsilon: ECOC ONE epsilon (default 0.05)
-X max_iter: Maximum iterations for ECOC ONE (default 10)
> Sum of Error Squares Classifier options:
-c reg_param: Value of regularization parameter (default 1)
> AdaBoost Classifier options:
-d n_weakclassifiers: Maximum number of weak classifiers (default 3)
> LIBSVM options:
-s svm_type: set type of SVM (default 0)
- 0 -- C-SVC
- 1 -- nu-SVC
-t kernel_type : set type of kernel function (default 2)
- 0 -- linear: u'*v
- 1 -- polynomial: (gamma*u'*v + coef0)^degree
- 2 -- radial basis function: exp(-gamma*|u-v|^2)
- 3 -- sigmoid: tanh(gamma*u'*v + coef0)
- 4 -- precomputed kernel (kernel values in training_set_file)
-d degree: set degree in kernel function (default 3)
-g gamma: set gamma in kernel function (default 1/k)
-r coef0: set coef0 in kernel function (default 0)
-c cost: set the parameter C of C-SVC (default 1)
-n nu: set the parameter nu of nu-SVC (default 0.5)
-m cachesize: set cache memory size in MB (default 100)
-e epsilon: set tolerance of termination criterion (default 0.001)
-h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates: whether to train a SVC for probability estimates,
0 or 1 (default 0)
-wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n: n-fold cross validation mode
(Important!!!): The -M option does not cover the lowercase options; set those on the command line.
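To make the default settings -G 0 (One Versus One) and -D 0 (Hamming) more
concrete, here is a minimal C++ sketch of how a one-versus-one coding matrix
can be built and how Hamming decoding picks a class from the outputs of the
binary classifiers. The names CodingMatrix, one_vs_one and hamming_decode are
hypothetical and do not come from the ECOC PAK library. Counting a 0 entry as
half a disagreement is a common convention for ternary codes and is an
assumption here, not something taken from the ECOC PAK source.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Rows are classes, columns are binary problems; entries are -1, 0 or +1.
using CodingMatrix = std::vector<std::vector<int>>;

// Build the one-versus-one coding matrix for n_classes classes.
// Column for pair (a, b), a < b: +1 for class a, -1 for class b, 0 elsewhere.
CodingMatrix one_vs_one(std::size_t n_classes) {
    const std::size_t n_columns = n_classes * (n_classes - 1) / 2;
    CodingMatrix m(n_classes, std::vector<int>(n_columns, 0));
    std::size_t column = 0;
    for (std::size_t a = 0; a < n_classes; ++a) {
        for (std::size_t b = a + 1; b < n_classes; ++b) {
            m[a][column] = 1;
            m[b][column] = -1;
            ++column;
        }
    }
    return m;
}

// Hamming decoding: return the class whose codeword is closest to the
// vector of binary predictions (each prediction is +1 or -1).
std::size_t hamming_decode(const CodingMatrix& m, const std::vector<int>& predictions) {
    if (m.empty()) {
        throw std::invalid_argument("empty coding matrix");
    }
    std::size_t best_class = 0;
    double best_distance = 0.0;
    for (std::size_t c = 0; c < m.size(); ++c) {
        if (m[c].size() != predictions.size()) {
            throw std::invalid_argument("one prediction per column is required");
        }
        double distance = 0.0;
        for (std::size_t j = 0; j < predictions.size(); ++j) {
            // Agreement adds 0, disagreement adds 1, a 0 entry adds 1/2.
            distance += (1.0 - predictions[j] * m[c][j]) / 2.0;
        }
        if (c == 0 || distance < best_distance) {
            best_distance = distance;
            best_class = c;
        }
    }
    return best_class;
}
```

For K classes the one-versus-one matrix has K(K-1)/2 columns, so that many
binary classifiers are trained.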
Tips on Practical Use
=====================
* In dense and sparse random codings, choosing a large number of columns
for the valid coding matrices can severely slow down your experiments. A good
choice is a number close to the number of classes in the dataset.
* Choosing FQMI as the decomposition criterion for DECOC involves a complexity of
O(N^2), where N is the number of samples in the dataset. For datasets with a
large number of samples this can, depending on your system, result in slow experiments.
* The attributes of the classifiers are set via the command line. This is done
for backward compatibility with LIBSVM.
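The first tip follows from the fact that each column of a coding matrix
defines one binary problem, so more columns mean more binary classifiers to
train. The C++ sketch below draws a dense random coding matrix and tests
whether it is usable. It is only an illustration: the names random_dense and
is_valid are hypothetical, and the validity rule used here (every column
contains both +1 and -1, and no two classes share a codeword) is an
assumption, not necessarily the rule ECOC PAK applies.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Rows are classes, columns are binary problems.
using CodingMatrix = std::vector<std::vector<int>>;

// Draw a dense random coding matrix with entries in {-1, +1}.
CodingMatrix random_dense(std::size_t n_classes, std::size_t n_columns,
                          std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5);
    CodingMatrix m(n_classes, std::vector<int>(n_columns));
    for (auto& row : m) {
        for (int& entry : row) {
            entry = coin(rng) ? 1 : -1;
        }
    }
    return m;
}

// Assumed validity test: every column separates at least two classes
// (contains both +1 and -1) and no two classes share the same codeword.
bool is_valid(const CodingMatrix& m) {
    if (m.empty()) {
        return false;
    }
    const std::size_t n_columns = m.front().size();
    for (std::size_t j = 0; j < n_columns; ++j) {
        bool has_positive = false;
        bool has_negative = false;
        for (const auto& row : m) {
            has_positive = has_positive || row[j] == 1;
            has_negative = has_negative || row[j] == -1;
        }
        if (!has_positive || !has_negative) {
            return false;
        }
    }
    for (std::size_t a = 0; a < m.size(); ++a) {
        for (std::size_t b = a + 1; b < m.size(); ++b) {
            if (m[a] == m[b]) {
                return false;
            }
        }
    }
    return true;
}
```

A search over random matrices would call random_dense repeatedly and keep
only the candidates for which is_valid returns true.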
Examples
========
Several example programs showing the usage of the ECOC PAK library are under the `examples'
directory. For the ECOC PAK program, see the examples below:
1) 7-fold cross-validation with the SVM classifier, with the user menu activated,
setting C = 100 and a linear kernel for the SVM.
$./ecoc_pak -M -v 7 -C 2 -c 100 -t 0 glass-scale
2) Training and testing with FLDA followed by the nearest centroid classifier, with command
line options, using the subDECOC framework and setting the thresholds to (perf = 0%,
size = 2, impr = 1%).
$./ecoc_pak -C 1 -G 3 -P 0 -S 2 -I 1 vowel.scale vowel.scale.t
3) 5-fold cross-validation with the SVM classifier, with command line options,
using One Versus One coding and Hamming decoding.
$./ecoc_pak -v 5 -C 2 -G 0 -D 0 glass-scale
Library Usage
=============
To see the library usage you can check the library reference available
in the ECOC PAK site at:
http://ecocpak.sourceforge.net/libref.html
MEXfiles
========
ECOC PAK also provides MATLAB MEXfile interfaces for the majority of
its library functions. MEXfile executables for various operating systems
and architectures, as well as the source code to build them, are available
at:
http://ecocpak.sourceforge.net/download.html
Building Windows Binaries
=========================
Although Windows binaries are available and can be downloaded at:
http://ecocpak.sourceforge.net/download.html
you can also build your own binaries. This requires that your
project already includes the Armadillo C++ Linear Algebra
library as well as precompiled binaries of the BLAS and LAPACK
libraries.
Additionally, in the SVN repository of the ECOC PAK project at:
http://ecocpak.svn.sourceforge.net/viewvc/ecocpak/windows/
you can download the file ecocpak_msvs2010.zip which contains
a Microsoft Visual Studio 2010 Project with the ECOC PAK program.
Licence
=======
The source code for the ECOC PAK library can be distributed
and/or modified under the terms of the GNU Lesser General
Public License (LGPL) as published by the Free Software
Foundation, either version 3 of the License or (at your option)
any later version.
LGPL v3 is formulated as an extension/modification of the
GPL v3 license. You are free to choose the license for work that
uses the ECOC PAK library (e.g. a proprietary application),
provided some conditions are met. Please see:
http://www.opensource.org/licenses/lgpl-3.0.html
and
http://www.opensource.org/licenses/gpl-3.0.html
Additional Information
======================
If you find ECOC PAK C++ helpful, please cite it as
Nikolaos Arvanitopoulos, Dimitrios Bouzas and Anastasios Tefas, Subclass
Error Correcting Output Codes using Fisher's Linear Discriminant Ratio,
International Conference on Pattern Recognition (ICPR), 2010.
Software available at http://ecocpak.sourceforge.net
For any questions and comments, please email bouzas@ieee.org or
niarvani@ieee.org or tefas@aiia.csd.auth.gr
Acknowledgements:
We would like to thank the Aristotle University of Thessaloniki
in general and in particular the staff of the Artificial
Intelligence and Information Analysis (AIIA) laboratory for the
substantial support both in resources and motivation.