Name                        Modified    Size     Downloads / Week
README.libesom.txt          2011-09-16   3.2 kB
libesom1_1.0_amd64.deb      2011-09-15  16.0 kB
libesom-dev_1.0_amd64.deb   2011-09-15  19.6 kB
libesom-1.0.zip             2011-09-15  40.3 kB
Totals: 4 Items                         79.1 kB  0
API:
- Basic example:
-------------------------------------------------------------------
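  // subset and norm_subset (the input data set) are assumed to be set up
  // by the caller; they are not defined in this snippet.
  esom::ToroidGrid grid(150, 200, subset.dimension());
  const int epochs = 10;
  esom::distance::Correlation distance;
  esom::bestmatch::Linear bestmatch(distance);
  esom::neighbourhood::Gauss neighbourhood;
  esom::cooling::Linear cooling(.5, .1, epochs);
  esom::cooling::Linear radiusCooling(50, 1, epochs);
  esom::OnlineSOM som(grid, bestmatch, neighbourhood, cooling, radiusCooling);

  // Uncomment to seed rand() so that init() differs between runs.
//  srand(time(NULL));

  som.init();
  // Train on the whole data set once per epoch.
  for(int i = 0; i < epochs; i++) {
    som.train(norm_subset.data);
    som.endEpoch();
  }

  // Collect the best match for every input vector.
  std::vector<int> bestmatches;
  for(unsigned int i=0; i < subset.inputs(); i++) {
    esom::Vector ve = norm_subset.data(i);
    bestmatches.push_back(bestmatch(ve));
  }

  // Compute the U-matrix of the trained grid and build the label tree
  // from its watershed segmentation.
  esom::UMatrix um(distance, grid);
  um.calculate();
  esom::Watershed ws(um);
  esom::LabelTree tree = ws.tree(bestmatches);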
-------------------------------------------------------------------

- Data types esom::Vector and esom::Matrix are defined in esom/Data.h
  Both are glorified pointers to memory spaces containing doubles.

- Right now the OnlineSOM class stores references to all its arguments,
  so make sure not to deallocate them while still using the OnlineSOM object.

- init() initializes the grid with random values between 0 and 2 by calling
  rand(). Call srand() beforehand to vary the values between runs.
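
  A minimal sketch of what this kind of initialization might look like,
  assuming rand() is scaled linearly into [0, 2]. The actual init() may
  differ, and randomModels is a hypothetical name, not part of libesom.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper, not part of libesom: returns n values drawn with
// rand() and scaled linearly into the closed interval [0, 2].
std::vector<double> randomModels(std::size_t n) {
  std::vector<double> values;
  values.reserve(n);
  for (std::size_t i = 0; i < n; i++) {
    values.push_back(2.0 * std::rand() / RAND_MAX);
  }
  return values;
}
```

  Calling std::srand() with a fixed seed before randomModels() reproduces
  the same values on every run.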

Suggestions for improvement:
- Remove the arbitrary initialization of the grid to [0, 2]. Instead, add an
  interface to esom::Grid for modifying models and use that for initializing
  the Grid. Provide some basic initialization functions.
- Remove the som.init() function if nothing needs initializing anymore (also
  see the comment below regarding bestmatch functions).
- The API could use some improvements and streamlining regarding memory 
  allocation policy. Storing references to the distance, cooling etc. functions
  in the SOM object is a bad idea, since it can lead to non-obvious crashes.
  For example:

  OnlineSOM *som;
  {
    EuclideanDistanceFunction distance;
    som = new OnlineSOM(grid, distance, ...);
  }
  som->train(sample);

  typically crashes (the behaviour is undefined), since the distance object
  is destroyed when it goes out of scope, but the som object still holds a
  reference to it. The API should be modified so that copies of small
  objects are stored and pointers are used for larger ones.
- Make bestmatch functions take a Grid& argument for the operator() method so
  they can be copied around. The current API was conceived with SAT trees in
  mind, which need to keep a persistent map of the data space. It is overly
  complex for the common case of a stateless function.
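
  One way the two suggestions above could fit together, as a sketch only: a
  stateless best-match functor that receives the grid as an argument can be
  stored by value, so the SOM no longer depends on the lifetime of the
  caller's objects. All names here (Vector, Grid, SquaredEuclidean,
  LinearBestMatch, Som) are hypothetical stand-ins, not the libesom classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the libesom types.
typedef std::vector<double> Vector;
typedef std::vector<Vector> Grid;  // one model vector per grid position

// Small and stateless, so cheap to copy.
struct SquaredEuclidean {
  double operator()(const Vector& a, const Vector& b) const {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
      const double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;  // the squared distance is enough for ranking
  }
};

// The grid is passed to operator() instead of being kept as a member,
// so the functor holds no references and can be copied freely.
struct LinearBestMatch {
  SquaredEuclidean distance;  // stored by value
  explicit LinearBestMatch(const SquaredEuclidean& d) : distance(d) {}
  std::size_t operator()(const Grid& grid, const Vector& sample) const {
    std::size_t best = 0;
    for (std::size_t i = 1; i < grid.size(); i++) {
      if (distance(grid[i], sample) < distance(grid[best], sample)) best = i;
    }
    return best;
  }
};

// Stores a copy of the small functor and a pointer to the large grid.
class Som {
 public:
  Som(Grid* grid, const LinearBestMatch& bestmatch)
      : grid_(grid), bestmatch_(bestmatch) {}
  std::size_t bestMatch(const Vector& sample) const {
    return bestmatch_(*grid_, sample);
  }
 private:
  Grid* grid_;                 // not owned; must outlive the Som
  LinearBestMatch bestmatch_;  // owned copy
};
```

  With this layout, the distance and best-match objects may go out of scope
  right after the Som is constructed; only the grid has to outlive it.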

- In Watershed::tree(), make the LabelTree an output argument instead of a
  return value. This prevents unnecessary copying when the tree is going to
  be stored in a struct somewhere else in the program anyway.
- Also in Watershed: right now, when multiple basins are flooded
  simultaneously, they are saved into the binary LabelTree in an arbitrary
  hierarchy. Use n-ary trees to improve the accuracy of the cluster
  representation.
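
  A minimal sketch of an n-ary label tree node, assuming that basins which
  merge at the same flood level should become siblings under one parent.
  LabelNode, mergeBasins and countLeaves are hypothetical names and do not
  reflect the actual LabelTree layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical n-ary tree node; not the libesom LabelTree.
struct LabelNode {
  double height;                    // flood level at which this node formed
  std::vector<int> labels;          // labels held by this node (leaves)
  std::vector<LabelNode> children;  // any number of merged basins
};

// Merges all basins that flood together at `height` into a single parent,
// so no artificial hierarchy is imposed among them.
LabelNode mergeBasins(const std::vector<LabelNode>& basins, double height) {
  LabelNode parent;
  parent.height = height;
  parent.children = basins;
  return parent;
}

// Counts the leaves below a node.
std::size_t countLeaves(const LabelNode& node) {
  if (node.children.empty()) return 1;
  std::size_t n = 0;
  for (std::size_t i = 0; i < node.children.size(); i++) {
    n += countLeaves(node.children[i]);
  }
  return n;
}
```

  Three basins flooded at the same level then end up as three children of
  one node, rather than being nested in an arbitrary binary order.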

Notes:
- I tried using ATLAS BLAS to improve performance in the loop that searches
  for the best-matching vector, but there were no gains.
Source: README.libesom.txt, updated 2011-09-16