In data classification, no single classifier performs consistently well in every case. The problem is even worse for datasets that are both high-dimensional and class-imbalanced.

To overcome the limitations of class-imbalanced data, we use random sub-sampling to split the dataset into balanced subsets. We then apply the (alpha,beta)-k feature set method to each subset to select a better subset of features, and combine the results into a consolidated feature set for classifier training.
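The page does not spell out how the sub-sampling is done. Below is a minimal Java sketch of one common reading, assuming binary labels: the majority class is shuffled and cut into disjoint chunks the size of the minority class, and each chunk is paired with every minority sample. The (alpha,beta)-k selection itself is not shown; the `consolidate` helper, which simply takes the union of the per-subset feature indices, is a hypothetical stand-in for the combination step. All class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Sketch (assumption): balance a binary dataset by randomly partitioning the
 * majority class into disjoint chunks the size of the minority class and
 * pairing each chunk with every minority sample. Samples are row indices.
 */
public class BalancedSubSampling {

    static List<List<Integer>> balancedSubsets(int[] labels, int minorityLabel, long seed) {
        List<Integer> minority = new ArrayList<>();
        List<Integer> majority = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == minorityLabel) {
                minority.add(i);
            } else {
                majority.add(i);
            }
        }
        if (minority.isEmpty()) {
            throw new IllegalArgumentException("no samples of the minority class");
        }
        Collections.shuffle(majority, new Random(seed));
        int chunk = minority.size();
        List<List<Integer>> subsets = new ArrayList<>();
        // Leftover majority rows that do not fill a whole chunk are dropped here.
        for (int start = 0; start + chunk <= majority.size(); start += chunk) {
            List<Integer> subset = new ArrayList<>(minority);
            subset.addAll(majority.subList(start, start + chunk));
            subsets.add(subset);
        }
        return subsets;
    }

    /** Hypothetical consolidation: union of the feature indices selected per subset. */
    static SortedSet<Integer> consolidate(List<Set<Integer>> selectedPerSubset) {
        SortedSet<Integer> merged = new TreeSet<>();
        for (Set<Integer> selected : selectedPerSubset) {
            merged.addAll(selected);
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 0, 0, 1};
        List<List<Integer>> subsets = balancedSubsets(labels, 1, 42L);
        System.out.println(subsets.size() + " balanced subsets"); // 8 majority / 2 minority -> 4
        // Stand-in for the per-subset (alpha,beta)-k selections, which are not shown here.
        System.out.println(consolidate(List.of(Set.of(3, 7), Set.of(7, 12)))); // [3, 7, 12]
    }
}
```

Dropping leftover majority rows keeps every subset exactly balanced; an alternative would be to top up the last chunk with randomly re-drawn majority rows.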

To improve classification performance, we propose an ensemble of classifiers that combines the outputs of the base classifiers using majority voting, a simple and widely used approach.
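Majority voting can be sketched in a few lines of Java, assuming each base classifier outputs an integer class label. The class name `MajorityVoting` and the smallest-label tie-break are illustrative assumptions, not GA-EoC's own code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: unweighted majority voting over the labels predicted by base classifiers. */
public class MajorityVoting {

    /**
     * Returns the label predicted by the most base classifiers. Ties go to the
     * smallest label; this tie-breaking rule is an assumption, not documented behaviour.
     */
    static int vote(int[] predictions) {
        if (predictions.length == 0) {
            throw new IllegalArgumentException("no predictions to vote on");
        }
        Map<Integer, Integer> counts = new TreeMap<>(); // sorted keys give the tie-break
        for (int p : predictions) {
            counts.merge(p, 1, Integer::sum);
        }
        int winner = 0;
        int winnerCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > winnerCount) { // strict '>' keeps the smallest label on ties
                winner = e.getKey();
                winnerCount = e.getValue();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Five base classifiers predicting the class of one test instance.
        System.out.println(vote(new int[] {1, 0, 1, 2, 1})); // 1
        System.out.println(vote(new int[] {0, 1}));          // tie -> 0
    }
}
```

Ties cannot occur with an odd number of voters on a two-class problem; since a GA may pick ensembles of any size, some tie-break rule is needed.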

Instead of building the ensemble from all base classifiers, we use a genetic algorithm (GA) to search for the best combination of heterogeneous base classifiers.
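The GA search can be pictured with the following hedged sketch, which is not the project's implementation. Each chromosome is a bit mask over the base classifiers, and its fitness is the majority-vote accuracy of the selected classifiers, computed from predictions made in advance (mirroring the pre-generated cross-validation models). Binary tournament selection, uniform crossover, bit-flip mutation, elitism, and seeding the population with the all-classifiers mask are assumptions chosen for brevity; `GaEnsembleSearch` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/**
 * Sketch of a GA over ensemble subsets. A chromosome is a bit mask over the
 * base classifiers; fitness is the majority-vote accuracy of the selected
 * classifiers on precomputed validation predictions (preds[classifier][sample]).
 */
public class GaEnsembleSearch {

    static int vote(List<Integer> labels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int l : labels) counts.merge(l, 1, Integer::sum);
        int winner = 0, winnerCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > winnerCount) { winner = e.getKey(); winnerCount = e.getValue(); }
        }
        return winner; // ties go to the smallest label
    }

    static double fitness(boolean[] mask, int[][] preds, int[] truth) {
        int correct = 0;
        for (int s = 0; s < truth.length; s++) {
            List<Integer> votes = new ArrayList<>();
            for (int c = 0; c < preds.length; c++) {
                if (mask[c]) votes.add(preds[c][s]);
            }
            if (votes.isEmpty()) return 0.0; // an empty ensemble is worthless
            if (vote(votes) == truth[s]) correct++;
        }
        return (double) correct / truth.length;
    }

    // Binary tournament selection: the fitter of two random individuals.
    static boolean[] tournament(List<boolean[]> pop, int[][] preds, int[] truth, Random rnd) {
        boolean[] a = pop.get(rnd.nextInt(pop.size()));
        boolean[] b = pop.get(rnd.nextInt(pop.size()));
        return fitness(a, preds, truth) >= fitness(b, preds, truth) ? a : b;
    }

    static boolean[] search(int[][] preds, int[] truth, int popSize, int generations,
                            double mutationRate, long seed) {
        Random rnd = new Random(seed);
        int n = preds.length;
        List<boolean[]> pop = new ArrayList<>();
        boolean[] all = new boolean[n];
        Arrays.fill(all, true);
        pop.add(all); // include the "use every base classifier" baseline
        while (pop.size() < popSize) {
            boolean[] m = new boolean[n];
            for (int i = 0; i < n; i++) m[i] = rnd.nextBoolean();
            pop.add(m);
        }
        boolean[] best = all.clone();
        double bestFit = fitness(best, preds, truth);
        for (int g = 0; ; g++) {
            for (boolean[] m : pop) {
                double f = fitness(m, preds, truth);
                if (f > bestFit) { bestFit = f; best = m.clone(); }
            }
            if (g == generations) return best;
            List<boolean[]> next = new ArrayList<>();
            next.add(best.clone()); // elitism: the best mask always survives
            while (next.size() < popSize) {
                boolean[] p1 = tournament(pop, preds, truth, rnd);
                boolean[] p2 = tournament(pop, preds, truth, rnd);
                boolean[] child = new boolean[n];
                for (int i = 0; i < n; i++) {
                    child[i] = rnd.nextBoolean() ? p1[i] : p2[i];              // uniform crossover
                    if (rnd.nextDouble() < mutationRate) child[i] = !child[i]; // bit-flip mutation
                }
                next.add(child);
            }
            pop = next;
        }
    }

    public static void main(String[] args) {
        int[] truth = {0, 1, 0, 1, 1, 0};
        int[][] preds = {
            {0, 1, 0, 1, 1, 0},
            {1, 0, 1, 0, 0, 1},
            {1, 0, 1, 0, 0, 1},
            {0, 1, 0, 1, 1, 0}};
        boolean[] best = search(preds, truth, 10, 20, 0.1, 1L);
        System.out.println(Arrays.toString(best) + " accuracy=" + fitness(best, preds, truth));
    }
}
```

Because the all-classifiers mask starts in the population and the best mask is always carried forward, the returned ensemble is never worse on the validation predictions than simply using every base classifier. In the toy data the full ensemble splits 2-2 on every sample and scores 0.5, while a subset containing only the accurate classifiers scores 1.0.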

The classification performance achieved by the proposed method on the chosen datasets is promising.

Features

  • Generate cross-validation folds and save the datasets to disk in ARFF format for later use
  • Generate classifier models for all cross-validation training folds and serialize them to disk for use by GA-EoC
  • Generate all base classifier models from the full training dataset and serialize them to disk
  • Search for the best ensemble combination to build a heterogeneous ensemble of classifiers, using k-fold cross-validation on the training dataset (with the pre-generated CV datasets and models)
  • Evaluate the performance of the best ensemble combination on unseen testing data (using the pre-generated models built from the full training data)
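The first feature, generating cross-validation folds and saving them as ARFF, might look roughly like this in plain Java. Stratified round-robin fold assignment and the minimal `toArff` writer (numeric features named `f0`, `f1` and so on, plus a nominal `class` attribute) are assumptions for illustration, and `CvFoldWriter` is a hypothetical name. If the tool is built on Weka, which the ARFF format suggests but the page does not state, `Instances.stratify`, `Instances.trainCV`/`testCV`, and `ArffSaver` provide the same functionality.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: stratified k-fold assignment plus a minimal ARFF writer for one fold. */
public class CvFoldWriter {

    /** Returns fold[i] in [0, k) for every row, spreading each class round-robin. */
    static int[] assignFolds(int[] labels, int k, long seed) {
        Map<Integer, List<Integer>> rowsByClass = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            rowsByClass.computeIfAbsent(labels[i], c -> new ArrayList<>()).add(i);
        }
        Random rnd = new Random(seed);
        int[] fold = new int[labels.length];
        int next = 0; // carried across classes so fold sizes differ by at most one
        for (List<Integer> rows : rowsByClass.values()) {
            Collections.shuffle(rows, rnd);
            for (int r : rows) {
                fold[r] = next;
                next = (next + 1) % k;
            }
        }
        return fold;
    }

    /** Rows in fold f (test == true) or in every other fold (test == false). */
    static List<Integer> rows(int[] fold, int f, boolean test) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < fold.length; i++) {
            if ((fold[i] == f) == test) out.add(i);
        }
        return out;
    }

    /** Renders the selected rows as ARFF text: numeric features, nominal class. */
    static String toArff(String relation, double[][] x, int[] y, List<Integer> rows) {
        StringBuilder sb = new StringBuilder("@relation ").append(relation).append('\n');
        for (int j = 0; j < x[0].length; j++) {
            sb.append("@attribute f").append(j).append(" numeric\n");
        }
        // Declare every label in y, so all fold files share the same header.
        TreeSet<Integer> classes = new TreeSet<>();
        for (int c : y) classes.add(c);
        StringJoiner nominal = new StringJoiner(",", "{", "}");
        for (int c : classes) nominal.add(Integer.toString(c));
        sb.append("@attribute class ").append(nominal).append("\n@data\n");
        for (int r : rows) {
            for (double v : x[r]) sb.append(v).append(',');
            sb.append(y[r]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] y = {0, 0, 0, 0, 0, 0, 1, 0, 0, 1};
        double[][] x = new double[y.length][2];
        for (int i = 0; i < y.length; i++) { x[i][0] = i; x[i][1] = i * 0.5; }
        int k = 5;
        int[] fold = assignFolds(y, k, 7L);
        for (int f = 0; f < k; f++) {
            System.out.println("fold " + f + ": " + rows(fold, f, false).size()
                    + " train / " + rows(fold, f, true).size() + " test");
        }
        System.out.print(toArff("fold0-test", x, y, rows(fold, 0, true)));
    }
}
```

Writing these strings to disk (for example with `java.nio.file.Files.writeString`) gives reusable fold files of the kind the feature list describes.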

Categories

Machine Learning

License

Creative Commons Attribution Non-Commercial License V2.0

Additional Project Details

Operating Systems

Linux, BSD

Intended Audience

Information Technology, Science/Research, Education, Advanced End Users, Developers

User Interface

Console/Terminal, Command-line

Programming Language

Java

Related Categories

Java Machine Learning Software

Registered

2013-08-07