ECOC PAK is a C++ library for the Error Correcting Output Codes (ECOC)
classification framework. It supports several coding and decoding designs
as well as several base classifiers.

ECOC PAK is available at 
http://ecocpak.sourceforge.net/


Table of Contents
=================

 - Quick Start
 - Installation and Data Format
 - `ecoc_pak' Program Usage
 - Tips on Practical Use
 - Examples
 - Library Usage
 - MEXfiles
 - Building Windows Binaries
 - Licence
 - Additional Information


Quick Start
===========

 Usage: ecoc_pak [options] training_file [test_file]

 Examples:

 1) For datasets with a training file (e.g., vowel.scale) and a test file (e.g., vowel.scale.t):

  $./ecoc_pak vowel.scale vowel.scale.t
  
 2) For datasets with only a training file (e.g., glass-scale) you can use K-fold cross-validation:

  $./ecoc_pak glass-scale
  
  If the number of folds K is not specified, as in the example above, the default of K = 10 folds is used.
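
  To set the number of folds explicitly, use the -v option; e.g., for 5 folds:

  $./ecoc_pak -v 5 glass-scale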
  
 In all of the above examples we are using the default coding (i.e., one versus
 one), the default decoding (i.e., Hamming) and the default classifier (i.e.,
 the Nearest Class Centroid Classifier, NCC).

 To see all available options, run `ecoc_pak' without any arguments.

 On POSIX operating systems where the NCurses library is installed, you can
 also activate the user-friendly menus by adding the -M option. The
 user-friendly menus are not available on Windows systems.

 Examples:

 1) For datasets with a training file (e.g., vowel.scale) and a test file (e.g., vowel.scale.t):

  $./ecoc_pak -M vowel.scale vowel.scale.t
  
 2) For datasets with only a training file you can use cross-validation:

  $./ecoc_pak -M -v 5 glass-scale


Installation and Data Format
============================

 ECOC PAK is based on the Armadillo C++ Linear Algebra Library, so this
 library must be installed in order to compile ECOC PAK. The Armadillo C++
 Linear Algebra Library is available at: http://arma.sourceforge.net

 On Unix systems ECOC PAK uses the NCurses library to enable the
 user-friendly interface menus. Provided that the Armadillo C++ Linear
 Algebra Library and the NCurses library are installed on your system,
 type `make' to build the ecoc_pak binary. Once compilation has finished,
 run the binary without arguments to see its usage.
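
 For example, on a typical Linux system (with g++, Armadillo and NCurses
 already installed) the whole build boils down to:

  $ make
  $ ./ecoc_pak

 The second command prints the usage message and the full list of options.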

 On other systems, consult `Makefile' to build ECOC PAK (e.g., see
 'Building Windows Binaries' in this file) or use the pre-built binaries.
 Binaries for various operating systems and architectures (i.e., Windows
 32-bit, Windows 64-bit, Linux 32-bit, Linux 64-bit and Mac OS X 64-bit)
 are available at: http://ecocpak.sourceforge.net/download.html

 ECOC PAK uses the LIBSVM implementation for its SVM classifier and comes
 with the latest LIBSVM release. If we have missed a release, just copy the
 svm.h and svm.cpp files located in the new LIBSVM tarball into the ECOC PAK
 folder and recompile (it is that easy!).
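
 For example, assuming the new LIBSVM tarball has been extracted to a
 directory next to the ECOC PAK folder (the directory name below is only
 illustrative):

  $ cp ../libsvm-3.x/svm.h ../libsvm-3.x/svm.cpp .
  $ make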

 The format of the training and test data files follows the LIBSVM /
 SVMLIGHT format, that is:

 <label> <index1>:<value1> <index2>:<value2> ...
 .
 .
 .

 Each line contains an instance and is ended by a '\n' character.  
 For classification, <label> is an integer indicating the class label
 (multi-class is supported). <index> is an integer starting from 1 
 and <value> is a real number. Indices must be in ASCENDING order. 
 Labels in the test file are only used to calculate accuracy or 
 errors. If they are unknown, just fill the first column with any numbers.
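
 For example, the first few lines of a (hypothetical) three-class training
 file with four features could look like this:

  1 1:0.23 2:-1 4:0.5
  3 1:-0.11 3:0.75
  2 2:1 3:-0.4 4:0.9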

 The sample classification datasets included in this package are
 `glass-scale', `iris-scale', `vowel.scale' and `vowel.scale.t'.
 You can find them under the `sample_datasets/' directory.


`ecoc_pak' Program Usage
========================

 Usage: ecoc_pak [options] training_set_file [test_set_file]

 > ECOC PAK options:
  -M: activates user menu
  -V filename: activates verbose output. If no output file is specified,
               standard output is used. If no output file is specified and
               the user menus are activated, the default output file is
               OUTPUT.txt
  -C type_of_classifier:
     - 0 -- Nearest Class Centroid Classifier (NCC)
     - 1 -- Fisher's Linear Discriminant followed by NCC (FLDA+NCC)
     - 2 -- Support Vector Machine (SVM)
     - 3 -- AdaBoost
     - 4 -- Sum of Error Squares Classifier
     - 5 -- Custom Classifier
  -G coding_strategy:
     - 0 -- One Versus One (default)
     - 1 -- One Versus All
     - 2 -- DECOC
     - 3 -- subDECOC (DECOC with subclasses)
     - 4 -- Dense Random
     - 5 -- Sparse Random
     - 6 -- ECOC One
     - 7 -- Forest ECOC
     - 8 filename -- Custom Coding with coding matrix stored in filename
  -D decoding_strategy:
     - 0  -- Hamming (default)
     - 1  -- Euclidean
     - 2  -- Laplacian
     - 3  -- Hamming Attenuated
     - 4  -- Euclidean Attenuated
     - 5  -- Linear Loss Weighted Decoding
     - 6  -- Exponential Loss Weighted Decoding
     - 7  -- Linear Loss Based Decoding
     - 8  -- Exponential Loss Based Decoding
     - 9  -- Beta Density Decoding
     - 10 -- Probabilistic Based Decoding
     - 11 -- Inverse Hamming Decoding
     - 12 -- Custom Decoding
  -P performance: set performance threshold for subDECOC (default 0%)
  -I improvement: set improvement threshold for subDECOC (default 1%)
  -S size: set minimum cluster size threshold for subDECOC (default 2)
  -R criterion: optimization criterion for the SFFS algorithm
     - 0 -- Fast Quadratic Mutual Information (FQMI)
     - 1 -- Fisher's Linear Discriminant Ratio (FLDR)
     - 2 -- Custom Decomposition Criterion
  -A n_trees: set n_trees to the number of maximum created trees in
              the Forest ECOC coding strategy
  -A n_matrices: set n_matrices to the number of valid matrices to be
                 examined for dense or sparse random coding
  -L n_columns:  set n_columns to the number of columns of the coding
                 matrices for dense or sparse random coding
  -Q validation: percentage of validation set for ECOC ONE coding strategy
                 (default 15%)
  -N init_coding: initial coding for ECOC ONE coding strategy (see option -G)
                  (default one versus all)
  -U ecocone_mode: ECOC ONE mode
     - 0 -- Pair of classes with highest error (default)
     - 1 -- Exhaustive search via SFFS with specified criterion (see option -R)
  -W wvalidation: Weight for validation set for ECOC ONE (default 0.5)
  -E epsilon: ECOC ONE epsilon (default 0.05)
  -X max_iter: Maximum iterations for ECOC ONE (default 10)

> Sum of Error Squares Classifier options:
  -c reg_param: Value of regularization parameter (default 1)

> AdaBoost Classifier options:
  -d n_weakclassifiers: Maximum number of weak classifiers (default 3)

> LIBSVM options:
  -s svm_type: set type of SVM (default 0)
    - 0 -- C-SVC
    - 1 -- nu-SVC
  -t kernel_type : set type of kernel function (default 2)
    - 0 -- linear: u'*v
    - 1 -- polynomial: (gamma*u'*v + coef0)^degree
    - 2 -- radial basis function: exp(-gamma*|u-v|^2)
    - 3 -- sigmoid: tanh(gamma*u'*v + coef0)
    - 4 -- precomputed kernel (kernel values in training_set_file)
  -d degree: set degree in kernel function (default 3)
  -g gamma: set gamma in kernel function (default 1/k)
  -r coef0: set coef0 in kernel function (default 0)
  -c cost: set the parameter C of C-SVC (default 1)
  -n nu: set the parameter nu of nu-SVC (default 0.5)
  -m cachesize: set cache memory size in MB (default 100)
  -e epsilon: set tolerance of termination criterion (default 0.001)
  -h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)
  -b probability_estimates: whether to train a SVC for probability estimates,
                            0 or 1 (default 0)
  -wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)
  -v n: n-fold cross validation mode

  (Important!): The -M user menu cannot be used to set the lower-case
                options; these must still be given on the command line.
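
  For example, to use the menus while still setting SVM parameters, pass the
  lower-case options on the command line (the values here are illustrative):

  $./ecoc_pak -M -C 2 -c 100 -t 0 vowel.scale vowel.scale.t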


Tips on Practical Use
=====================

 * In dense and sparse random coding, choosing a large number of columns for
   the coding matrices can severely slow down your experiments. A good choice
   is a number close to the number of classes in the dataset (see the example
   after this list).

 * Choosing FQMI as the decomposition criterion for DECOC involves a
   complexity of O(N^2) (where N is the number of samples in the dataset), so
   for datasets with a large number of samples the experiments can be slow,
   depending on your system.
  
 * The attributes of the classifiers are set via the command line. This is done
   for backward compatibility with LIBSVM.
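
 For instance, a dense random coding run on the vowel dataset (which has 11
 classes) could set the number of coding matrix columns close to the class
 count; the exact value is a tuning choice:

  $./ecoc_pak -G 4 -L 11 vowel.scale vowel.scale.t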


Examples
========

 For ECOC PAK library usage there are several example programs under the
 `examples' directory. For the ecoc_pak program see the examples below:

 1) 7-fold cross-validation with the SVM classifier, with the user menu
    activated, setting C = 100 and a linear kernel for the SVM:

  $./ecoc_pak -M -v 7 -C 2 -c 100 -t 0 glass-scale
  
 2) Training and testing with FLDA followed by the nearest class centroid
    classifier, using command line options and the subDECOC framework, with
    the thresholds set to perf = 0%, size = 2, impr = 1%:
   
  $./ecoc_pak -C 1 -P 0 -S 2 -I 1 vowel.scale vowel.scale.t
  
 3) 5-fold cross-validation with the SVM classifier, using command line
    options, one-versus-one coding and Hamming decoding:
   
  $./ecoc_pak -v 5 -C 2 -G 0 -D 0 glass-scale


Library Usage
=============
 
 To see the library usage, check the library reference available on the
 ECOC PAK site at:
 
 http://ecocpak.sourceforge.net/libref.html

 
MEXfiles
========

 ECOC PAK also provides MEX-file MATLAB interfaces for the majority of
 its library functions. MEX-file executables for various operating systems
 and architectures, as well as the source code to build them, are available
 at:
 
 http://ecocpak.sourceforge.net/download.html 


Building Windows Binaries
=========================

 Although Windows binaries are available and can be downloaded at:
 http://ecocpak.sourceforge.net/download.html

 you can also build your own binaries. This requires that you have
 previously included in your project the Armadillo C++ Linear Algebra
 Library as well as the precompiled binaries for the BLAS and LAPACK
 libraries.
 
 Additionally, from the SVN repository of the ECOC PAK project at:
 http://ecocpak.svn.sourceforge.net/viewvc/ecocpak/windows/

 you can download the file ecocpak_msvs2010.zip, which contains a
 Microsoft Visual Studio 2010 project with the ECOC PAK program.


Licence
=======

 The source code for the ECOC PAK library can be distributed 
 and/or modified under the terms of the GNU Lesser General 
 Public License (LGPL) as published by the Free Software 
 Foundation, either version 3 of the License or (at your option) 
 any later version. 

 LGPL v3 is formulated as an extension/modification of the 
 GPL v3 license. You are free to choose the license for work that 
 uses the ECOC PAK library (e.g. a proprietary application), 
 provided some conditions are met. Please see:

 http://www.opensource.org/licenses/lgpl-3.0.html

 and

 http://www.opensource.org/licenses/gpl-3.0.html


Additional Information
======================

 If you find the ECOC PAK C++ library helpful, please cite it as:

 Nikolaos Arvanitopoulos, Dimitrios Bouzas and Anastasios Tefas, Subclass 
 Error Correcting Output Codes using Fisher's Linear Discriminant Ratio,
 International Conference on Pattern Recognition (ICPR), 2010.
 Software available at http://ecocpak.sourceforge.net

 For any questions and comments, please email bouzas@ieee.org or
 niarvani@ieee.org or tefas@aiia.csd.auth.gr

 Acknowledgements:
 We would like to thank the Aristotle University of Thessaloniki
 in general and in particular the staff of the Artificial 
 Intelligence and Information Analysis (AIIA) laboratory for the 
 substantial support both in resources and motivation.