Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.libesom.txt | 2011-09-16 | 3.2 kB | |
libesom1_1.0_amd64.deb | 2011-09-15 | 16.0 kB | |
libesom-dev_1.0_amd64.deb | 2011-09-15 | 19.6 kB | |
libesom-1.0.zip | 2011-09-15 | 40.3 kB | |
Totals: 4 Items | | 79.1 kB | 0 |
API:

- Basic example:

-------------------------------------------------------------------
esom::ToroidGrid grid(150, 200, subset.dimension());
const int epochs = 10;

esom::distance::Correlation distance;
esom::bestmatch::Linear bestmatch(distance);
esom::neighbourhood::Gauss neighbourhood;
esom::cooling::Linear cooling(.5, .1, epochs);
esom::cooling::Linear radiusCooling(50, 1, epochs);

esom::OnlineSOM som(grid, bestmatch, neighbourhood, cooling, radiusCooling);

// srand(time(NULL));
som.init();

for(int i = 0; i < epochs; i++) {
    som.train(norm_subset.data);
    som.endEpoch();
}

std::vector<int> bestmatches;
for(unsigned int i=0; i < subset.inputs(); i++) {
    esom::Vector ve = norm_subset.data(i);
    bestmatches.push_back(bestmatch(ve));
}

esom::UMatrix um(distance, grid);
um.calculate();

esom::Watershed ws(um);
esom::LabelTree tree = ws.tree(bestmatches);
-------------------------------------------------------------------

- Data types esom::Vector and esom::Matrix are defined in esom/Data.h
  Both are glorified pointers to memory spaces containing doubles.

- Right now the OnlineSOM class stores references to all its arguments,
  so make sure not to deallocate them while still using the OnlineSOM
  object.

- init() initializes the grid with random values between 0 and 2 by
  calling rand()


Suggestions for improvement:

- Remove the arbitrary initialization of the grid to [0 2], instead add
  an interface to esom::Grid for modifying models and use that for
  initializing the Grid. Provide some basic initialization functions.

- Remove the som.init() function if nothing needs initializing anymore
  (also see the comment below regarding bestmatch functions)

- The API could use some improvements and streamlining regarding memory
  allocation policy. Storing references to the distance, cooling etc.
  functions in the SOM object is a bad idea, since it can lead to
  non-obvious crashes.
  For example:

      OnlineSOM *som;
      {
          EuclideanDistanceFunction distance;
          som = new OnlineSOM(grid, distance, ...);
      }
      som->train(sample);

  crashes, since the distance object is destroyed when it goes out of
  scope, but the som object still holds a reference to it. The API
  should be modified so that copies of small objects are stored and
  pointers are used for larger ones.

- Make bestmatch functions take a Grid& argument for the operator()
  method so they can be copied around. The current API was conceived
  with the possibility for SAT trees in mind that need to keep a
  persistent map of the dataspace. It is overly complex for the common
  case of a stateless function.

- In Watershed::tree(), make the LabelTree an output argument instead
  of a return value. This prevents unnecessary copying when the tree is
  going to be stored in a struct somewhere else in the program anyway.

- Also in Watershed, right now when multiple basins are flooded
  simultaneously, they are saved into the binary LabelTree in an
  arbitrary hierarchy. Use n-ary trees to improve the accuracy of the
  cluster representation.


Notes:

- I tried using ATLAS BLAS to improve performance in the loop where we
  try to find the best matching vector; there were no gains.
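
Sketch for the memory allocation and bestmatch suggestions above: a
minimal illustration of storing small functors by value and passing the
grid to operator(). It does not use libesom. Sample, Models,
SquaredEuclidean, LinearBestMatch, ValueSom and scopedConstructionDemo
are hypothetical names invented for this example, and the grid is
reduced to a plain vector of models.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; none of these types are part of libesom.
typedef std::vector<double> Sample;
typedef std::vector<Sample> Models;   // plays the role of the grid

// Small, stateless distance functor: cheap to copy.
struct SquaredEuclidean {
    double operator()(const Sample &a, const Sample &b) const {
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
};

// Stateless best-match functor: the grid is an argument of operator(),
// so the functor holds no reference to it and can be copied freely.
template <typename Distance>
struct LinearBestMatch {
    Distance distance;  // stored by value, not by reference

    explicit LinearBestMatch(const Distance &d) : distance(d) {}

    // Returns the index of the model closest to the sample.
    std::size_t operator()(const Models &grid, const Sample &sample) const {
        assert(!grid.empty());
        std::size_t best = 0;
        double bestDist = distance(grid[0], sample);
        for (std::size_t i = 1; i < grid.size(); ++i) {
            const double d = distance(grid[i], sample);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }
};

// SOM-like object that owns a copy of its best-match functor. Only the
// (large) grid is held by pointer, and the caller must keep it alive.
template <typename BestMatch>
class ValueSom {
public:
    ValueSom(Models *grid, const BestMatch &bm) : grid_(grid), bestmatch_(bm) {}
    std::size_t bestMatch(const Sample &s) const { return bestmatch_(*grid_, s); }
private:
    Models *grid_;
    BestMatch bestmatch_;
};

// Mirrors the crashing example, but the functors are copied into the
// SOM before the originals go out of scope.
inline std::size_t scopedConstructionDemo() {
    Models grid;
    grid.push_back(Sample(2, 0.0));   // model 0: (0, 0)
    grid.push_back(Sample(2, 1.0));   // model 1: (1, 1)

    ValueSom<LinearBestMatch<SquaredEuclidean> > *som = 0;
    {
        SquaredEuclidean distance;
        LinearBestMatch<SquaredEuclidean> bestmatch(distance);
        som = new ValueSom<LinearBestMatch<SquaredEuclidean> >(&grid, bestmatch);
    }   // distance and bestmatch are destroyed here; som keeps its copies

    const std::size_t winner = som->bestMatch(Sample(2, 0.9));
    delete som;
    return winner;
}
```

Because the functors are copied, the inner scope can end without
leaving the SOM object with a dangling reference. The grid is still
shared by pointer, which matches the suggestion to use pointers for
larger objects.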
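
Sketch for the two Watershed suggestions above: one possible shape for
an n-ary label tree that is filled through an output argument. The
layout of libesom's LabelTree is not described in this README, so
LabelNode, mergeSimultaneous and depth are hypothetical names and the
use of -1 as the label of a merge node is an assumption made for this
example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical n-ary label tree; not the LabelTree shipped with libesom.
struct LabelNode {
    int label;                        // basin label, or -1 for a merge node
    std::vector<LabelNode> children;  // any number of children

    explicit LabelNode(int l = -1) : label(l) {}
};

// Builds one merge node whose children are all the basins that flood at
// the same time. The result is written into `out` (an output argument),
// so a caller can fill a tree that already lives inside its own struct
// instead of copying a returned tree into it.
inline void mergeSimultaneous(const std::vector<int> &basins, LabelNode &out) {
    out.label = -1;
    out.children.clear();
    for (std::size_t i = 0; i < basins.size(); ++i)
        out.children.push_back(LabelNode(basins[i]));
}

// Depth of the tree; a leaf has depth 1.
inline std::size_t depth(const LabelNode &node) {
    std::size_t deepest = 0;
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const std::size_t d = depth(node.children[i]);
        if (d > deepest) deepest = d;
    }
    return deepest + 1;
}
```

With three basins flooding at once, this gives one merge node with
three sibling leaves. A binary tree would need a second, nested merge
node for the same case, which implies a hierarchy between the basins
that the data does not contain.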