MOLL Code
Brought to you by: opx
File | Date | Commit
---|---|---
doc | 2009-06-21 | [c086db] created git repo
examples | 2009-09-08 | [06c580] transition mostly done
java | 2009-06-21 | [c086db] created git repo
python | 2009-09-09 | [a56168] cleanup
scripts | 2009-06-21 | [c086db] created git repo
tests | 2009-06-21 | [c086db] created git repo
LICENSE | 2009-06-21 | [c086db] created git repo
README | 2009-06-21 | [c086db] created git repo
TODO | 2009-06-21 | [c086db] created git repo
This is MOLL, the Machine Learning Library, version 0.3 beta.

Features
--------

* multi-core support
* run and analyse everything from Python
* makes it very easy to test different algorithms and parameters
* tested on large data (around a gigabyte per set)
* currently implemented algorithms: MLP, GA, GP, ESN and RBF

History
-------

I originally created a few Python scripts that let me run ML experiments
using multiple ML libraries at once. Then came the need to run them in
parallel; I added some result analysis tools, and, here we go, I decided
to call it a library. The company I was doing research for, RSJ Invest,
was kind enough to allow releasing the code to the public under the
Apache license. After all, we were using other libraries, so it is only
natural to give something back to the community.

Status
------

This is the first public release. From the user's point of view it has
some rough edges, but it is usable: it was actually used to run quite a
few large-scale experiments (gigabytes of training data). From the
developer's point of view, there are parts that deserve a rewrite,
especially the dataset pipelines and caching. In the likely case that I
use the lib for future projects, I will keep updating it so that it fits
my needs. Should anyone else from the community want to add some
features or rewrite some parts, I will add documentation where needed.

License
-------

Apache License 2.0 - see the LICENSE file.

Contents
--------

* python/   - the core Python sources and wrappers for various ML models
* java/     - Java sources for the ECJ wrapper
* examples/ - some examples

Installing
----------

There is no Python egg installer or distribution-specific package for
Moll yet, so grab the source and sort out the dependencies you need.

Core dependencies:

* NumPy
* multiprocessing (or Python >= 2.6)

Specific dependencies:

* ESN: Aureservoir (tested with SVN rev. 60), http://aureservoir.sf.net
* Genetic (GP / GA): ECJ (tested with v18), http://www.cs.gmu.edu/~eclab/projects/ecj/
* MLP nets: ffnet (tested with SVN rev. 272), http://ffnet.sf.net/
* Plotting: pylab / pygraphviz

Moll was tested on Linux amd64 (Debian).

Directory structure
-------------------

* java/ and python/ contain the source files.
* examples/ - Moll usage examples
* analyse/  - result analysis utils
* scripts/  - shell scripts for easier invocation of multiple
  experiments, joining the results etc.

Running
-------

Make sure you have moll and the needed dependencies in your PYTHONPATH.

Use moll.data to create a dataset. Probably the easiest way is to use
MatrixDataset, which just wraps an ordinary numpy array, say arr. You
can either create and train a model of your choice on that dataset
directly:

    import moll.data
    import moll.ml.nn

    # arr is a numpy array holding the training data
    dataset = moll.data.MatrixDataset(arr, ins=1)
    nn = moll.ml.nn.FFNET((1, 10, 1))
    nn.train(dataset, iterations=1000, descent_algo='cg')
    output = nn.run(dataset.inputs)

.. or have Hustler deal with the dirty stuff. In that case, the main
difference is that you define the jobs not by actually creating datasets
and models, but by supplying the appropriate parameters:

    hustler = Hustler()
    kfold = CrossValidator(5, hustler, TrainJob)
    kfold.add_job(MatrixDataset, {'_arr': arr, 'ins': 1},
                  FFNET, {'topology': (1, 10, 1)},
                  {'iterations': 5000, 'descent_algo': 'tnc'})
    kfold.add_job(MatrixDataset, {'_arr': arr, 'ins': 1},
                  RBF, {'nodes': 15},
                  {})
    hustler.go()

As you might have noticed, we have employed a 5-fold cross-validator as
well, so each job is actually run 5 times on the folded parts of the
dataset. By default, Hustler uses all available cores to run the jobs.
The resulting models and errors are then in hustler.jobs. Voila!
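Just to illustrate, here is a minimal sketch of what you might do with
hustler.jobs afterwards. Only hustler.jobs itself is described above;
the per-job attributes used below (model, error) are assumed names for
this sketch, not a documented part of the MOLL API:

    # minimal sketch -- assumes each finished job exposes its trained
    # model as `job.model` and its validation error as `job.error`
    # (these attribute names are assumptions; check the moll sources)
    best = None
    for job in hustler.jobs:
        if best is None or job.error < best.error:
            best = job
    print 'best model:', best.model, 'error:', best.error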
Todo's
------

See the TODO file.