From: Michael D. J. <mic...@ya...> - 2005-05-02 21:18:13
Dear All,

Thanks for your help and your warm welcome to the community. There are a couple more issues that came up as a result of your previous communication that I would like to address:

1) Grid size: We have talked about this extensively in relation to the number of instances present, but not in relation to the dimensionality of the data. Fabian's tutorial on ESOM deals with, I believe, a six-dimensional problem. What has been your experience in dealing with ESOMs that have an order of magnitude larger dimensionality (e.g. 100 dimensions)? Does the grid size matter in terms of getting robust results? Is there any scientific literature that deals with the dimensionality vs. "size of the grid" issue? Your comments will be very much appreciated.

2) I understand that there is a random initialization of weights that goes on up front. One of the remedies suggested was to initialize using the -pca command. The assumption here is that -pca would produce more comparable maps across different runs. However, there appears to be a conflict, as -pca appears to deal with *border only* maps. From the ESOM tutorial it becomes evident that the conflict is between the usefulness of the borderless map (e.g. toroid), which appears to add value, and the usefulness of having comparable maps using the -pca command. Do you have any suggestions or workarounds for this issue?

3) My last question is a little more conceptual in nature. Most of the literature on ESOMs, and the visualization development that accompanies it, focuses on the objective of identifying regions of high density that will eventually be identified as clusters. I am more interested in the topological arrangement of instances according to the degree of similarity provided by the ESOM, and not necessarily in the formation of a high-density cluster (whatever the density definition might be).
So, in U-matrix terms, my interest might lie more on the mountains and less on the valleys, as the area I am researching deals more with rare events. However, it often appears that these "rare events" occur in areas of low density (however that is defined) but at the same time appear to be *adjacent* to each other on the 2-D map. In U-matrix terms, this would be a series of mountains on the map that would form a mountain range. This mountain range might be of immense interest to my research.

Would you happen to have any understanding/awareness as to:

1) What kind of parameter tuning can best sort out / display these "rare instances"? Would it be any different from the current methodology, in your opinion?

2) The availability of any scientific literature that deals with this issue?

Again, your comments / help will be extremely appreciated. I hope I will be able to contribute more *actively* to the list in the future, as my understanding of the Databionic ESOM tool grows.

Regards,
Michael

dat...@li... wrote:

Send Databionic-ESOM-User mailing list submissions to
dat...@li...

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/databionic-esom-user
or, via email, send a message with subject or body 'help' to
dat...@li...

You can reach the person managing the list at
dat...@li...

When replying, please edit your Subject line so it is more specific than "Re: Contents of Databionic-ESOM-User digest..."

Today's Topics:

1. Re: Parameter Selection & ...More (Mario Noecker)
2. Re: Parameter Selection & ...More (Christian Stamm)
3. Re: Parameter Selection & ...More (Fabian Mörchen)

--__--__--

Message: 1
Date: Sat, 30 Apr 2005 17:08:52 +0000
From: Mario Noecker
To: Michael Dell Junior
CC: dat...@li...
Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More

Hi Michael,

only a short answer, because I am in a hurry. yes, there is a random component, the grid initialisation.
Try the pca initialisation; it may give you more similar maps.

mario

Michael Dell Junior wrote:
> Dear Mario & Fabian,
>
> Thank you very much for your valuable comments and prompt response. I
> will implement your recommendations on an immediate basis.
>
> I have one other request pertaining to the training using the
> Databionics ESOM tool. What actions does one need to take if two
> *identical runs* (in terms of parameter selection and training set)
> yield a different end result in terms of the proximity/clustering of
> different instances to one another? (I am aware that the Visual Part
> might be different on each run, but my expectation would be that the
> underlying structural sorting of the instances should be the same.
> e.g. In Run 1, Instance "234" is surrounded by instances "456", "789"
> & "123". Shouldn't the same "234" instance be surrounded by the same
> "456", "789" & "123" instances in Run 2?)
>
> Is this a sign of non-convergence? Is this a sign of some other
> underlying process that I am not aware of? Is there a random component
> that I am not aware of?
>
> Your comments and suggestions on this issue will be very much appreciated.
>
> Regards,
> Michael

--__--__--

Message: 2
Date: Sat, 30 Apr 2005 17:15:15 +0200 (CEST)
Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More
From: "Christian Stamm"
To: dat...@li...

Dear Michael,

the ESOM algorithm is indeed non-convergent. Every map you train will be unique. This is because of the random initialization of the map and the (optional) permutation of the input data during the training process. The overall structure of the map will be similar, but inter- as well as intra-cluster neighbourhoods may be twisted or sorted in another fashion, without being any less meaningful. e.g.
the U-Matrix view on the map will reveal where large or small distances are present.

welcome to the user community!

mfg Christian

--__--__--

Message: 3
Date: Sat, 30 Apr 2005 19:49:40 +0200
From: Fabian Mörchen
Organization: Databionics Research Group
To: Christian Stamm, dat...@li...
Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More

hi,

the ESOM training has not been proven to converge in the sense that k-Means, or more generally EM, does, except for simple 1D settings. the 'convergence' is simulated by the cooling of the parameters, the neighborhood in particular (a constant learning rate doesn't hurt).
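
To make the 'cooling' mentioned above concrete, here is a minimal Python sketch. It is not taken from the Databionic ESOM tool: the function names, the linear schedule and the start/end values are assumptions made up for illustration. It only shows the idea that the neighborhood radius shrinks over the epochs while the learning rate can stay constant.

```python
import math


def linear_cooling(start, end, epoch, n_epochs):
    """Linearly interpolate from `start` (first epoch) to `end` (last epoch)."""
    if n_epochs <= 1:
        return end
    frac = epoch / (n_epochs - 1)
    return start + (end - start) * frac


def gaussian_neighborhood(grid_distance, radius):
    """Update weight for a neuron `grid_distance` cells away from the best match."""
    return math.exp(-(grid_distance ** 2) / (2.0 * radius ** 2))


# Hypothetical settings: radius cooled from 24 to 1, learning rate kept constant.
n_epochs = 5
learning_rate = 0.5
radii = [linear_cooling(24.0, 1.0, e, n_epochs) for e in range(n_epochs)]

for epoch, radius in enumerate(radii):
    # How strongly a neuron 5 grid cells away from the best match is pulled.
    pull = learning_rate * gaussian_neighborhood(5, radius)
    print(f"epoch {epoch}: radius={radius:.2f}, pull at distance 5 = {pull:.4f}")
```

With a large radius early on, distant neurons still move with the best match, so the map orders itself globally; once the radius is small, only the immediate neighbors are adjusted, which is what makes the map settle even though no convergence proof is involved.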
you can obtain exactly reproducible maps by _not_ using "-p" (permute data) and by using "-i pca", as explained by Mario and Christian, but this may not be what you want. "-p" in particular should always be active if your data is sorted in a particular order, e.g. by known clusters, as in many of the toy examples. training the map with all data points from one cluster before introducing points from other regions in the data space may distort the map towards the first cluster.

the pca initialization is a good way of obtaining visually more comparable maps from several runs. if there are clusters in the data, they should show up on different maps run with the same parameters. the local neighborhood of a few points you described is usually not expected to be the same over several runs.

i don't know the dimensionality of your data, but assume it is > 3d. imagine what happens to the 2D grid of ESOM prototypes in the high dimensional space. it adjusts to the data: many prototypes are placed where the data resides, while the grid is stretched in the regions between. the following picture shows this for the chainlink dataset:
http://www.mathematik.uni-marburg.de/~databionics/de//images/chainlink_esom3d.png

if you have a, say, 10 dimensional space and are looking at some data points inside a densely populated region, there is no way of predicting how the 2D ESOM grid will locally adjust to this 10 dimensional cloud, thus the local neighborhood relations will not show reproducible behaviour. if your 5 points are relatively far from each other in an otherwise empty region, they should be represented on different maps in a similar way, but this is a special case.

bye
fabian

Christian Stamm wrote:
> Dear Michael,
>
> the ESOM algorithm is indeed non-convergent. Every map you train will be
> unique. This is because of the random initialization of the map and the
> (optional) permutation of the input data during the training process.
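
The two sources of randomness discussed in this thread (grid initialization and permutation of the input data) can be illustrated with a toy self-organizing map in Python. This is only a sketch under stated assumptions, not the Databionic ESOM implementation: `pca_init` and `train` are hypothetical helpers, the `permute` and `init` arguments merely play the roles that "-p" and "-i pca" play in the discussion, the grid is a small planar one, and the data is random noise.

```python
import numpy as np


def pca_init(data, rows, cols):
    """Deterministic init: spread the grid over the plane of the first two
    principal components of the data (hypothetical helper)."""
    mean = data.mean(axis=0)
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    scale = s[:2] / np.sqrt(len(data))  # standard deviation along each component
    a = np.linspace(-1.0, 1.0, rows)[:, None, None]
    b = np.linspace(-1.0, 1.0, cols)[None, :, None]
    return mean + a * scale[0] * vt[0] + b * scale[1] * vt[1]


def train(data, rows, cols, epochs=5, permute=False, seed=None, init="pca"):
    """Toy online SOM training on a planar grid (illustration only)."""
    rng = np.random.default_rng(seed)
    if init == "pca":
        w = pca_init(data, rows, cols)
    else:
        w = rng.normal(size=(rows, cols, data.shape[1]))  # random init
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    for epoch in range(epochs):
        # Cool the neighborhood radius; the learning rate stays constant.
        radius = max(1.0, (rows / 2.0) * (1.0 - epoch / epochs))
        order = rng.permutation(len(data)) if permute else np.arange(len(data))
        for x in data[order]:
            dist = np.linalg.norm(w - x, axis=2)
            bi, bj = np.unravel_index(np.argmin(dist), dist.shape)
            h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2.0 * radius ** 2))
            w += 0.5 * h[:, :, None] * (x - w)
    return w


data = np.random.default_rng(0).normal(size=(60, 10))  # made-up 10-d data

# No permutation + PCA init: no random component is left.
run_a = train(data, 8, 8, permute=False, init="pca")
run_b = train(data, 8, 8, permute=False, init="pca")

# Permutation switched on, with two different random seeds.
run_c = train(data, 8, 8, permute=True, seed=1, init="pca")
run_d = train(data, 8, 8, permute=True, seed=2, init="pca")

print("identical without permutation:", np.array_equal(run_a, run_b))
print("identical with permutation:   ", np.array_equal(run_c, run_d))
```

In this sketch, switching off the permutation and using the PCA initialization leaves nothing random, so two runs give bit-identical weights. As soon as the presentation order is permuted, two runs with different seeds diverge, even though both start from the same PCA grid.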

_______________________________________________
Databionic-ESOM-User mailing list
Dat...@li...
https://lists.sourceforge.net/lists/listinfo/databionic-esom-user

End of Databionic-ESOM-User Digest