From: Christian S. <st...@Ma...> - 2005-05-29 15:20:53
Hi Michael,

sorry for the delay. I'll try to answer some of your questions.

1) I can't base it on hard empirical results, but imho the grid size should be chosen from the number of data points only. The dimension of the dataset can be neglected.

2) PCA initialization will work on toroidal maps, too. There won't be a smooth transition where the grid is flipped, but the algorithm will still work and produce meaningful results.

3) A density-based visualization is implemented with the P-Matrix. The ESOM algorithm inherently overrepresents sparsely populated regions of the data space, i.e. you will find more bestmatches per neuron in dense regions. Rare events will be rather isolated. UMatrix and twoMatch are a good choice for identifying those.

regards
Christian

Michael Dell Junior said:
> Dear All,
>
> Thanks for your help and your warm welcome to the community.
>
> There are a couple more issues that came up as a result of your previous
> communication that I would like to address:
>
> 1) Grid size: We have talked about this extensively in relation to the
> number of instances present but not in relation to the dimensionality of
> the data. Fabian's tutorial on ESOM deals with, I believe, a six
> dimensional problem. What has been your experience in dealing with ESOMs
> that have an order of magnitude larger dimensionality (e.g. 100
> dimensions)? Does the grid size matter in terms of getting robust results?
> Is there any scientific literature that deals with the
> dimensionality vs. "size of the grid" issue? Your comments will be very
> much appreciated.
>
> 2) I understand that there is a random initialization of weights that goes
> on up front. One of the remedies that has been suggested was to try to
> initialize using the -pca command. The assumption here is that the -pca
> would produce more comparable maps in different runs. However, there
> appears to be a conflict with the usage of the -pca as it appears to deal
> with *border only* maps.
> By reading the ESOM tutorial it becomes evident
> that the conflict arises between the utility/usefulness of the border-less
> map (e.g. toroid), which appears to add value, and the utility/usefulness
> of having comparable maps using the -pca command. Do you have any
> suggestions/workarounds on this issue?
>
> 3) My last question is a little bit more conceptual in nature. Most of the
> literature on ESOMs, and the visualization development that accompanies
> the literature, focuses on the objective of identifying regions of high
> density that will eventually be identified as clusters.
>
> I am more interested in the topological assortment of instances according
> to a degree of similarity provided by the ESOM and not necessarily in the
> formation of a high-density cluster (whatever the density definition might
> be). So, in U-matrix terms, my interest might lie more on the mountains
> and less on the valleys, as the area I am researching deals more with rare
> events. However, it often appears that these "rare events" occur in areas
> of low density (however that is defined) but at the same time appear to
> be *adjacent* to each other on the 2-D map. In U-matrix terms, this would
> be a series of mountains on the map that would form a mountain range. This
> mountain range might be of immense interest to my research.
>
> Would you happen to have any understanding/awareness as to:
>
> 1) What kind of parameter tuning can best sort out / display these "rare
> instances"? Would it be any different from the current methodology, in
> your opinion?
>
> 2) The availability of any scientific literature that deals with this
> issue?
>
> Again, your comments / help will be extremely appreciated. I hope I will
> be able to contribute more *actively* to the list in the future, as my
> understanding of the Databionic ESOM tool grows.
>
> Regards,
> Michael
>
> dat...@li... wrote:
> Send Databionic-ESOM-User mailing list submissions to
> dat...@li...
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/databionic-esom-user
> or, via email, send a message with subject or body 'help' to
> dat...@li...
>
> You can reach the person managing the list at
> dat...@li...
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Databionic-ESOM-User digest..."
>
> Today's Topics:
>
> 1. Re: Parameter Selection & ...More (Mario Noecker)
> 2. Re: Parameter Selection & ...More (Christian Stamm)
> 3. Re: Parameter Selection & ...More (Fabian Mörchen)
>
> --__--__--
>
> Message: 1
> Date: Sat, 30 Apr 2005 17:08:52 +0000
> From: Mario Noecker
> To: Michael Dell Junior
> CC: dat...@li...
> Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More
>
> Hi Michael,
>
> only a short answer, because I am in a hurry.
>
> yes, there is a random component, the grid initialisation. Try the pca
> initialisation to get maybe more similar maps.
>
> mario
>
> Michael Dell Junior wrote:
>
>> Dear Mario & Fabian,
>>
>> Thank you very much for your valuable comments and prompt response. I
>> will implement your recommendations on an immediate basis.
>>
>> I have one other request pertaining to the training using the
>> Databionics ESOM tool. What actions does one need to take if
>> two *identical runs* (in terms of parameter selection and training
>> set) produce a different end result in terms of the
>> proximity/clustering of different instances to one another? (I am
>> aware that the Visual Part might be different on each run but my
>> expectation would be that the underlying structural sorting of
>> the instances should be the same. E.g. in Run 1, Instance "234" is
>> surrounded by instances "456", "789" & "123". Shouldn't the same "234"
>> instance be surrounded by the same "456", "789" & "123" instances in
>> Run 2?)
>>
>> Is this a sign of non-convergence?
>> Is this a sign of some other
>> underlying process that I am not aware of? Is there a random component
>> that I am not aware of?
>>
>> Your comments and suggestions on this issue will be very much
>> appreciated.
>>
>> Regards,
>> Michael
>
> --__--__--
>
> Message: 2
> Date: Sat, 30 Apr 2005 17:15:15 +0200 (CEST)
> Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More
> From: "Christian Stamm"
> To: dat...@li...
>
> Dear Michael,
>
> the ESOM algorithm is indeed non-convergent. Every map you train will be
> unique. This is because of the random initialization of the map and the
> (optional) permutation of the input data during the training process. The
> overall structure of the map will be similar, but inter- as well as
> intra-cluster neighbourhoods may be twisted or sorted in another fashion,
> without being any less meaningful. E.g. the U-Matrix view on the map
> will reveal where large or small distances are present.
>
> welcome to the user community!
>
> Best regards, Christian
>
> --__--__--
>
> Message: 3
> Date: Sat, 30 Apr 2005 19:49:40 +0200
> From: Fabian Mörchen
> Organization: Databionics Research Group
> To: Christian Stamm, dat...@li...
> Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More
>
> hi,
>
> the ESOM training has not been proven to converge in the sense k-Means
> or more generally EM does, except for simple 1D settings. the
> 'convergence' is simulated by the cooling of the parameters, the
> neighborhood in particular (a constant learning rate doesn't hurt).
>
> you can obtain exactly reproducible maps by _not_ using "-p" (permute
> data) and by using "-i pca", as explained by Mario and Christian, but this
> may not be what you want. "-p" in particular should always be active if
> your data is sorted in a particular order, e.g. by known clusters as for
> many of the toy examples. training the map with all data points from one
> cluster before introducing points from other regions in the data space
> may distort the map towards the first cluster. the pca initialization is
> a good way of obtaining visually more comparable maps from several runs.
>
> if there are clusters in the data, they should show up on different maps
> run with the same parameters. the local neighborhood of a few points you
> described is usually not expected to be the same over several runs. i
> don't know the dimensionality of your data, but assume it is more than 3D.
> imagine what happens to the 2D grid of ESOM prototypes in the high
> dimensional space. it adjusts to the data: many prototypes are placed
> where the data resides, while the grid is stretched in the regions
> between. the following picture shows this for the chainlink dataset:
>
> http://www.mathematik.uni-marburg.de/~databionics/de//images/chainlink_esom3d.png
>
> if you have, say, a 10 dimensional space and are looking at some data
> points inside a densely populated region, there is no way of predicting
> how the 2D ESOM grid will locally adjust to this 10 dimensional cloud,
> thus the local neighborhood relations will not show reproducible
> behaviour. if your 5 points are relatively far from each other in an
> otherwise empty region, they should be represented on different maps in
> a similar way, but this is a special case.
>
> bye
> fabian
>
> --__--__--
>
> End of Databionic-ESOM-User Digest
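
The reproducibility point from Fabian's message in the digest (no data permutation plus a deterministic initialisation gives identical maps, while permuting the data makes every run unique) can be tried out with a small toy SOM. What follows is only a rough sketch in plain Python and not the Databionic ESOM tools' code: the function names (sq_dist, torus_dist, linear_init, train, bestmatches), the grid size, the learning rate and the cooling schedule are all made up for illustration. The bounding-box initialisation is just a deterministic stand-in for "-i pca" (it is not a real PCA), and the permute switch plays the role of "-p".

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def torus_dist(a, b, rows, cols):
    """Grid distance between two neurons on a toroidal (border-less) map."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))


def linear_init(data, rows, cols):
    """Deterministic start: spread the grid over the bounding box of the
    first two coordinates. ASSUMPTION: a stand-in for PCA initialisation."""
    dims = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dims)]
    hi = [max(p[d] for p in data) for d in range(dims)]
    mean = [sum(p[d] for p in data) / len(data) for d in range(dims)]
    grid = {}
    for r in range(rows):
        for c in range(cols):
            w = list(mean)
            w[0] = lo[0] + (hi[0] - lo[0]) * r / max(rows - 1, 1)
            if dims > 1:
                w[1] = lo[1] + (hi[1] - lo[1]) * c / max(cols - 1, 1)
            grid[(r, c)] = w
    return grid


def train(data, rows, cols, epochs=20, permute=False, seed=None):
    """Online SOM training; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    grid = linear_init(data, rows, cols)
    order = list(range(len(data)))
    rate = 0.5  # constant learning rate (hypothetical value)
    for epoch in range(epochs):
        # Cool the neighbourhood radius linearly over the epochs.
        radius = max(rows, cols) / 2 * (1 - epoch / epochs) + 0.5
        if permute:
            rng.shuffle(order)  # the only source of randomness here
        for i in order:
            x = data[i]
            best = min(grid, key=lambda n: sq_dist(grid[n], x))
            for node, w in grid.items():
                d = torus_dist(node, best, rows, cols)
                if d <= radius:
                    h = math.exp(-(d * d) / (2 * radius * radius))
                    for k in range(len(w)):
                        w[k] += rate * h * (x[k] - w[k])
    return grid


def bestmatches(grid, data):
    """Grid position of the best-matching neuron for every data point."""
    return [min(grid, key=lambda n: sq_dist(grid[n], x)) for x in data]


# Toy data, sorted by cluster (the case where permutation is recommended).
data = [[0.0, 0.0, 0.1], [0.1, 0.2, 0.0], [0.2, 0.1, 0.1],
        [5.0, 5.1, 4.9], [5.2, 4.9, 5.0], [4.9, 5.0, 5.1]]

run_a = train(data, 4, 4, permute=False)
run_b = train(data, 4, 4, permute=False)
print(run_a == run_b)  # True: nothing random is left in the procedure

run_c = train(data, 4, 4, permute=True, seed=1)
run_d = train(data, 4, 4, permute=True, seed=2)
print(bestmatches(run_c, data))
print(bestmatches(run_d, data))
```

The first comparison is exact because both runs perform the same arithmetic in the same order. The two permuted runs will generally place the points on different neurons, so comparing their bestmatch positions shows how much the local neighbourhoods move between runs, even if the two clusters stay separated.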