From: <fa...@in...> - 2005-04-30 12:42:44
Hi Michael, let me just add a few comments to what Mario said.

- you don't necessarily need more neurons than data points. it all depends on the view of your data that you want to have. first off, ESOM need to be large, otherwise they would be k-Means-like SOM where one neuron is one cluster. large starts somewhere above 1000 neurons; we hardly ever go below 50x82=4100 (~4096=64x64, which we used before we found out about rectangular maps). with a small dataset you will have enough room on the map to see cluster structure (if there is any) _and_ inner-cluster relations. the more data you have, the less room there will be, and data points are placed on top of each other. you will still see the global structure, but fewer details. enlarging the map will help, but of course slows down the training. i would always start with the default size and go larger if it seems necessary based on the result. i have also successfully used sampling on a large (30K) dataset: i trained 50x82 maps on a 3K sample, identified clusters with the class mask tool, and used the classification mode to transfer the result to the complete data.

- start radius: like Mario said, about half the smaller grid dimension. a value that is too large will 'waste' the early training episodes, because almost the whole map will be pulled back and forth by the updates. what you want is to have part of the map pulled towards a cluster by the updates of the corresponding data points, and other parts towards other clusters. if you start with too small a radius, there is a danger of 'losing' neurons that will never be pulled anywhere and keep their random values from the initialization.

and even though you didn't ask, someone else might soon:

- end radius: small (=1) if you want a lot of detail, a little larger to concentrate on coarser structures.

- episodes: the number of training episodes isn't closely related to the choice of the other parameters. some publications mention several thousand training episodes; this is a complete waste of computing power. somewhere between 20 and 50 should provide a slow enough cooling of the parameters.

the toy examples like hepta are a good starting point to explore the behaviour of ESOM under different parameter settings. you do have to make some fairly extreme choices, however, to really make it not work. from our experience over the years we consider it fairly robust w.r.t. the parameters.

please note the technical report, which covers some of the above questions, but with a less hands-on touch:

[Ultsch 2005b] Ultsch, A., Moerchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM, Technical Report, Dept. of Mathematics and Computer Science, University of Marburg, Germany, No. 46, (2005)
http://www.mathematik.uni-marburg.de/~databionics/downloads/papers/ultsch05esom.pdf

bye
fabian

p.s. Michael did reply to the list, only a sf filter sent it to the list admin (me) for approval first.
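[Editor's note: a minimal sketch illustrating the parameter advice above. This is not the Databionics ESOM tool itself; the data, the linear cooling schedule, the learning-rate decay, and the `class_mask` array are placeholder assumptions. It shows a 50x82 toroidal map, a start radius of about half the smaller grid dimension, an end radius of 1, around 30 episodes, and the "train on a sample, transfer labels to the full data" workflow mentioned in the mail.]

```python
import numpy as np

rows, cols = 50, 82            # "large" ESOM grid, as recommended above
start_radius = rows / 2        # ~half the smaller grid dimension
end_radius = 1                 # small end radius -> fine detail
episodes = 30                  # 20-50 episodes is usually enough

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 4))                 # placeholder sample data
weights = rng.normal(size=(rows, cols, data.shape[1]))

# grid coordinates, precomputed once for the neighborhood computation
grid_r, grid_c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")

def toroidal_dist(r, c):
    """Grid distance from neuron (r, c) to every neuron, wrapping around
    the edges so the map behaves like a borderless torus."""
    dr = np.minimum(np.abs(grid_r - r), rows - np.abs(grid_r - r))
    dc = np.minimum(np.abs(grid_c - c), cols - np.abs(grid_c - c))
    return np.sqrt(dr ** 2 + dc ** 2)

def best_match(x):
    """Best-matching unit: neuron whose weight vector is closest to x."""
    diff = weights - x
    return np.unravel_index(np.argmin(np.einsum("ijk,ijk->ij", diff, diff)),
                            (rows, cols))

for epoch in range(episodes):
    # linear cooling of the neighborhood radius from start to end value
    radius = start_radius + (end_radius - start_radius) * epoch / (episodes - 1)
    lr = 0.5 * (1.0 - epoch / episodes)           # simple learning-rate decay
    for x in rng.permutation(data):
        bmu = best_match(x)
        # Gaussian neighborhood around the BMU on the toroidal grid:
        # a large radius early on moves big parts of the map towards clusters,
        # a small radius later refines local detail.
        h = np.exp(-(toroidal_dist(*bmu) ** 2) / (2 * radius ** 2))
        weights += lr * h[:, :, None] * (x - weights)

# "classification mode" step: transfer cluster labels found on the sample to
# the complete data set by assigning each point to its best-matching neuron.
# class_mask stands in for the per-neuron labels painted with the class mask
# tool; here it is just a random placeholder.
full_data = rng.normal(size=(5000, 4))            # placeholder "complete" data
class_mask = rng.integers(0, 5, size=(rows, cols))
labels = np.array([class_mask[best_match(x)] for x in full_data])
```

The toroidal distance is what makes the map borderless; on a bordered map, neurons at the edges have fewer neighbors and the picture gets distorted, which is one reason the large rectangular/toroidal grids are preferred here.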