This project is being moved to GitHub. In the process, I am merging it with two other libraries, libpetey and ctraj, since the three libraries have a number of interdependencies. You can find the project here:
https://github.com/Peteysoft/libmsci
There is a very serious bug in the class-borders sampling routines that affects small datasets only. Discovering this bug has influenced my decision to start writing unit tests and to improve test coverage overall (black-box as well as white-box testing). See my blog post: "Test driven development". The library is due for a new release anyway since there are many new developments, including continuum retrievals based on classification results. Until that time, however, you can download the affected file, "sample_class_borders.cc", directly from the SVN repository.
A major new feature of libAGF is the ability to build arbitrary multi-class classification models by combining LIBSVM binary models. These models can then be "accelerated" by converting them completely to libAGF-native, border-sampling models. See my blog post for details.
New in this version:
- bug fixes: svm file conversion works properly and is more general
- non-hierarchical multi-borders has 3 options for solving for the conditional
  probabilities: matrix inversion, voting, and matrix inversion overridden
  by voting, with re-normalization
- multi-borders now works with external binary classifiers (especially LIBSVM)
- random numbers resolve a tie when selecting classes based on probabilities
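To illustrate the tie-breaking item, here is a minimal sketch of how a class could be chosen from a vector of probabilities when two or more classes share the maximum. The function name `select_class` and its signature are assumptions made for this example; they are not taken from the libagf source, which may resolve ties differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not the libagf API): returns the index of the
// largest probability in p. If several classes share the maximum,
// one of them is picked uniformly at random.
template <class RNG>
std::size_t select_class(const std::vector<double>& p, RNG& rng) {
    assert(!p.empty());
    std::vector<std::size_t> best;  // indices of all classes at the maximum
    double pmax = p[0];
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] > pmax) {          // new maximum: forget earlier candidates
            pmax = p[i];
            best.clear();
        }
        if (p[i] == pmax) best.push_back(i);
    }
    // best is never empty here; draw one candidate at random.
    std::uniform_int_distribution<std::size_t> pick(0, best.size() - 1);
    return best[pick(rng)];
}
```

With a generator such as `std::mt19937`, a call like `select_class(p, rng)` always returns the unique winner when there is one and only consults the random numbers when there is a genuine tie.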
In this version:
- hierarchical clustering
- multi-class classification using a recursive control language
For more details, check the NEWS file.
I'm ramping up for a big new release. There are many changes and (hopefully) improvements, but the most significant is the addition of "multi-borders" classification: multi-class classification with the AGF borders technique. I've tested the technique already and I'm very pleased with just how well it works. Also on the plate: improved pre-processing including singular value decomposition (SVD), dendrograms, re-organizing the library and main routines, as well as two new test cases. Getting all this up and running is a fair bit of effort, so the release may not be for a while yet.
Thanks to those who submitted bug reports. The previous version was loaded with them. Hopefully the latest release is a little more stable.
Not much really new here. The most exciting addition is a script for validating probability density function (PDF) estimates. It works by first generating a simulated dataset using the Metropolis method. The distribution of this simulated dataset should roughly match that of the training data. The training dataset is then split into two or more subsets, PDFs are estimated at each point in the simulated dataset, and a cross-correlation matrix is calculated. Note that the parameters used for testing (k, W) will need to be scaled up to match the full size of the training dataset; that is, multiply them by the number of divisions.
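For readers unfamiliar with the Metropolis method, the following is a minimal random-walk sketch of the sampling step. It is not the validation script itself: the function name `metropolis`, the Gaussian proposal and the `step` parameter are all assumptions made for illustration, and `f` stands in for whatever density estimate is being sampled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical sketch (not libagf code): draws n points from a density f,
// which need not be normalized, using random-walk Metropolis with Gaussian
// proposals of standard deviation `step` in each dimension.
std::vector<std::vector<double>> metropolis(
        const std::function<double(const std::vector<double>&)>& f,
        std::vector<double> x,   // starting point; must satisfy f(x) > 0
        std::size_t n, double step, std::mt19937& rng) {
    std::normal_distribution<double> jump(0.0, step);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double fx = f(x);
    assert(fx > 0.0);

    std::vector<std::vector<double>> samples;
    samples.reserve(n);
    while (samples.size() < n) {
        std::vector<double> y = x;            // propose a nearby point
        for (double& yi : y) yi += jump(rng);
        const double fy = f(y);
        // Accept with probability min(1, fy / fx); otherwise stay at x.
        if (unif(rng) * fx < fy) {
            x = y;
            fx = fy;
        }
        samples.push_back(x);                 // a rejected move repeats x
    }
    return samples;
}
```

Successive samples from such a chain are correlated, so in practice the early part of the chain is often discarded and the remainder thinned before use.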
The conversion utility, svm2agf, will allow you to convert many of the datasets collected on the LIBSVM website:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
for comparison and testing.
One of the file conversion utilities (svm2agf -- for converting LIBSVM-compatible ASCII files to libagf-compatible binary files) did not work as expected. The patch addresses this problem.
New in this version:
- all functions (except IO) have been templated
- all variables in main routines have been typedef'd
- improved file conversion utilities
Several changes to this version: the "libpetey" library is no longer part of the "libagf" distribution. Instead, look for it in "msci". Nonetheless, I have placed the latest version in the download repository. Also, the class borders codes will no longer generate duplicate border samples.
A paper on the AGF algorithm has been accepted for publication in the International Journal of Remote Sensing. It is entitled, "Efficient statistical classification of satellite measurements," and should be appearing in print shortly.
1. The project website boasts about performing k-nearest-neighbours searches in O(n log k) time. Without any initial binning, this could, in fact, be done in only O(n) time on average if the selection algorithm were updated to one based on quicksort-style partitioning (quickselect). Code already exists in the repository. Tests show a significant speed improvement.
2. Updating the n-fold cross-validation program to work with PDF estimation and non-linear regression.
3. There may be duplicates in the border samples, especially with small datasets. This needs to be fixed.
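The first item in the list above can be sketched with the standard library's `std::nth_element`, which performs a quickselect-style partial partition in linear time on average. This is only an illustration of the idea, not the code in the repository: the function name `k_nearest`, the row-wise data layout and the use of squared Euclidean distance are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the libagf API): returns the indices of the k
// training samples closest to x, in no particular order.
std::vector<std::size_t> k_nearest(
        const std::vector<std::vector<double>>& train,
        const std::vector<double>& x, std::size_t k) {
    const std::size_t n = train.size();
    assert(k <= n);

    // Squared Euclidean distance from x to every sample: one pass over the data.
    std::vector<double> d2(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(train[i].size() == x.size());
        for (std::size_t j = 0; j < x.size(); ++j) {
            const double diff = train[i][j] - x[j];
            d2[i] += diff * diff;
        }
    }

    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = i;

    // Partition the indices so that the k smallest distances come first.
    // Unlike a full sort, this takes linear time on average.
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                     [&d2](std::size_t a, std::size_t b) { return d2[a] < d2[b]; });
    idx.resize(k);
    return idx;
}
```

The k indices come back unordered, which is enough for kernel sums and voting; if the neighbours are needed in order of distance, sorting just those k entries adds O(k log k).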
The project website has been updated. While the basic content remains the same, figures have been improved for better viewing. A donation button has been added. Also: it works properly on Internet Explorer! The previous version had only been tested on Firefox.
AGF has been used to retrieve water vapour in the upper troposphere. The work has been published in Computers & Geosciences, volume 35, pages 2020-2031. See project website for details.
If you have any problems with this software, please e-mail me at:
peteymills@hotmail.com
If you can describe the problem clearly and succinctly, I can probably have a patch ready for you within the next day or two.
A simple clustering analysis program has been added to the mix. My initial goal was to use a threshold density and find all the iso-surfaces for that density, possibly using an algorithm similar to that for finding class borders. This should be simpler and faster than a hierarchical clustering, although less general since the analysis will need to be repeated every time we try a different threshold. I didn't want to just write a hierarchical clustering algorithm since it doesn't really "fit" with the rest of the library.
The new release includes many bug fixes--they were legion--but also some improved functionality.
Direct classification routines will now return the joint probabilities in addition to conditional probabilities. Since you can use either for classification, the joint probabilities provide more information.
It is now possible to search for a class border at some point other than R=0. This is useful if the classes differ greatly in size or if the relative numbers of samples do not reflect the actual class sizes.
Through my own use of the software, at least two shortcomings have become apparent:
1. When the classes are broadly separated, class_borders fails to converge. The fix is likely fairly simple, but will involve a minor "hack."
2. To calculate joint probabilities, you need to make two calls: one to a classification routine and then another to a PDF calculation routine. This is wasteful as it could be done in a single step.
The libagf project has been sitting on SourceForge for over two years now and I've witnessed a steady stream of downloads throughout that time. In spite of this, I have yet to see a single e-mail either asking for help, requesting bug fixes, thanking me or simply informing me of its use.
If you use this package in your work, please help spread the word by referencing either the website, the included documentation or the following paper:
Peter Mills 2009, "Isoline retrieval: An optimal method for validation of advected contours." Computers & Geosciences, in press (available online).
Version 0.91 has been released. No substantial changes, just bug fixes.
For a concise introduction to the method along with interesting and original applications, visit:
http://libagf.sourceforge.net
The first release of the Adaptive Gaussian Filtering library (libagf) is now available. It provides very high performance Gaussian kernel estimators and KNN algorithms. Training can be done in as little as 1/25 of the time required by LIBSVM, and classification in less than 1/100 of the time.
libagf, a suite of software for statistical classification, PDF estimation and non-linear regression, has just been posted to SourceForge. The software implements an algorithm called Adaptive Gaussian Filtering (AGF), a high-speed, variable-bandwidth kernel estimation technique. Stay tuned for the first official release!