| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.libesom.txt | 2011-09-16 | 3.2 kB | |
| libesom1_1.0_amd64.deb | 2011-09-15 | 16.0 kB | |
| libesom-dev_1.0_amd64.deb | 2011-09-15 | 19.6 kB | |
| libesom-1.0.zip | 2011-09-15 | 40.3 kB | |
| Totals: 4 Items | | 79.1 kB | 0 |
API:
- Basic example:
-------------------------------------------------------------------
esom::ToroidGrid grid(150, 200, subset.dimension());
const int epochs = 10;
esom::distance::Correlation distance;
esom::bestmatch::Linear bestmatch(distance);
esom::neighbourhood::Gauss neighbourhood;
esom::cooling::Linear cooling(.5, .1, epochs);
esom::cooling::Linear radiusCooling(50, 1, epochs);
esom::OnlineSOM som(grid, bestmatch, neighbourhood, cooling, radiusCooling);
// srand(time(NULL));
som.init();
for (int i = 0; i < epochs; i++) {
    som.train(norm_subset.data);
    som.endEpoch();
}
std::vector<int> bestmatches;
for (unsigned int i = 0; i < subset.inputs(); i++) {
    esom::Vector ve = norm_subset.data(i);
    bestmatches.push_back(bestmatch(ve));
}
esom::UMatrix um(distance, grid);
um.calculate();
esom::Watershed ws(um);
esom::LabelTree tree = ws.tree(bestmatches);
-------------------------------------------------------------------
- The data types esom::Vector and esom::Matrix are defined in esom/Data.h.
Both are glorified pointers to memory holding doubles.
- Right now the OnlineSOM class stores references to all of its constructor
arguments, so make sure not to destroy them while the OnlineSOM object is
still in use.
- init() initializes the grid with random values between 0 and 2 by calling
rand().
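To make the init() note concrete, here is a minimal sketch of what a rand()-based fill of a block of doubles in [0, 2] could look like. The helper name fillUniform and the scaling formula are assumptions for illustration only; the README does not show how init() is actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper, not part of libesom: fills a flat buffer of doubles
// with pseudo-random values in [lo, hi] drawn from rand().
inline void fillUniform(std::vector<double>& weights,
                        double lo = 0.0, double hi = 2.0) {
    assert(lo <= hi);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        // rand() returns an int in [0, RAND_MAX]; map it to [0, 1] first.
        const double unit = static_cast<double>(std::rand()) / RAND_MAX;
        weights[i] = lo + unit * (hi - lo);
    }
}
```

Because rand() is deterministic for a given seed, leaving the srand() call commented out (as in the basic example) means every run starts from the same initial values. Calling srand() with a varying seed before the fill gives a different start on each run.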
Suggestions for improvement:
- Remove the arbitrary initialization of the grid to [0, 2]. Instead, add an
interface to esom::Grid for modifying models and use that to initialize
the grid. Provide some basic initialization functions.
- Remove the som.init() function if nothing needs initializing anymore (also
see the comment below regarding bestmatch functions).
- The API could use some improvement and streamlining regarding its memory
allocation policy. Storing references to the distance, cooling, etc. functions
in the SOM object is a bad idea, since it can lead to non-obvious crashes.
For example:
    OnlineSOM *som;
    {
        EuclideanDistanceFunction distance;
        som = new OnlineSOM(grid, distance, ...);
    }
    som->train(sample);
can crash, since the distance object is destroyed when it goes out of scope
while the som object still holds a reference to it. The API should be
modified so that copies of small objects are stored and pointers are used for
larger ones.
- Make bestmatch functions take a Grid& argument in their operator() method so
they can be copied around. The current API was conceived with SAT trees in
mind, which need to keep a persistent map of the data space.
It is overly complex for the common case of a stateless function.
- In Watershed::tree(), make the LabelTree an output argument instead of a
return value. This prevents unnecessary copying when the tree is going to be
stored in a struct somewhere else in the program anyway.
- Also in Watershed: right now, when multiple basins are flooded simultaneously,
they are saved into the binary LabelTree in an arbitrary hierarchy. Use n-ary
trees to improve the accuracy of the cluster representation.
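Two of the suggestions above (storing copies of small objects, and passing the grid to operator() of the bestmatch function) can be sketched together as follows. Every type here (Grid, SquaredEuclidean, LinearBestMatch, Som) is a hypothetical stand-in written for this illustration, not the libesom class of the same or similar name.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical grid: `count` models of `dim` doubles in one flat buffer.
struct Grid {
    std::size_t count;
    std::size_t dim;
    std::vector<double> models;
    Grid(std::size_t c, std::size_t d) : count(c), dim(d), models(c * d, 0.0) {}
    const double* model(std::size_t i) const { return models.data() + i * dim; }
    double* model(std::size_t i) { return models.data() + i * dim; }
};

// Stateless distance functor: small, so copying it is cheap.
struct SquaredEuclidean {
    double operator()(const double* a, const double* b, std::size_t dim) const {
        double sum = 0.0;
        for (std::size_t k = 0; k < dim; ++k) {
            const double d = a[k] - b[k];
            sum += d * d;
        }
        return sum;
    }
};

// Stateless linear best-match search. The grid is an argument of
// operator() rather than a stored reference, so the functor can be copied.
template <typename Distance>
struct LinearBestMatch {
    Distance distance;  // stored by value, not by reference
    explicit LinearBestMatch(const Distance& d) : distance(d) {}
    std::size_t operator()(const Grid& grid, const double* input) const {
        assert(grid.count > 0);
        std::size_t best = 0;
        double bestDist = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < grid.count; ++i) {
            const double d = distance(grid.model(i), input, grid.dim);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
};

// SOM skeleton: copies its small helper, only points at the large grid.
template <typename BestMatch>
class Som {
public:
    Som(Grid* grid, const BestMatch& bm) : grid_(grid), bestmatch_(bm) {}
    std::size_t bestMatch(const double* input) const {
        return bestmatch_(*grid_, input);
    }
private:
    Grid* grid_;           // large object: the caller keeps ownership
    BestMatch bestmatch_;  // small object: stored as a copy
};
```

With this layout, constructing the distance and bestmatch objects in an inner scope and using the Som after that scope ends is safe, because the Som holds its own copies. Only the grid still has to outlive the Som.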
Notes:
- I tried using ATLAS BLAS to improve performance in the loop that searches
for the best-matching vector, but there were no gains.