From: <mit...@we...> - 2005-06-19 18:44:18
Dear Mario,

I followed the whole procedure:
- used one training data set (about 1000 records)
- applied z-transformation
- trained the data
- loaded the cls file
- created the class mask .cls file
- loaded the test data (about 2250 records)
- applied z-transformation
- projected the data (while the projection was running it reported bm==-1
  for every row; I don't know if this is relevant to the problem below)

Then, when I selected Tools -> Classify from the menu, no cls file was
created ("error creating cls file"). What did I do wrong? Should the test
data set have the same size (in records) as the training data set? Can I
use this tool for really big files of about 60,000 records (10 features),
or is it too slow?

Thank you in advance; it is really important for me to get the
classification of the test data.

Katerina

> Hi Katerina
>
>> ok, but does the training data have to include a column with the
>> class of each sample, or is it the same as having a separate .cls
>> file? (in some example .lrn files there is a column for the class of
>> the data) Is it necessary for one of the fields of the .lrn file to be
>> labeled as unique key (9)?
>
> if a *.cls file exists you do not have to create a column in your lrn
> file. But you need a column labeled with 9. This column has to contain
> the unique keys of the datasets.
>
>> - optional: load *.cls with known classification of training data
>> should the *.cls file be loaded before or after the training process?
>> (I suppose before)
>> Is it loaded from the tab classes or class mask?
>
> classes! the classes tab shows the classification of the data. the
> classmask tab shows the classification of the neurons.
>
>>> - identify clusters and create class mask (also *.cls)
>> how do I identify clusters?? and create class mask?
>> Do I use the classify selection from the tools menu?
>> for some reason it doesn't seem to work: although I press the start
>> button, the procedure does not start and no output .cls file is
>> created.
>
> just enable the classmask tab and select regions on the map with the
> polygon. a right mouse click will finish the selection and one class
> will be created and shown in the tab. and so on...
>
> you can save this classmask by saving as *.cls. (button in tab)
>
> There are 2 kinds of *.cls: 1. classification of data points (classes)
> and 2. classification of neurons (classmask)
>
>>> - save newly created *.cls for test data
>> How do I create the new *.cls data?
>
> maybe the user guide is not up to date. you can project new data on the
> map. (Tools - Project...) A loaded *.lrn file will be projected on the
> map and a *.bm file will be created. A *.bm file holds the information
> of the datasets' positions on the map.
>
> You can also classify the loaded bestmatches by using a classmask which
> has been created before. (Tools - Classify...) Every bm (from the
> loaded *.bm) looks up its class number in the classmask, and that
> number will be written to the new classification. After that you can
> save the classification (file menu).
>
> hope that helps
> mario
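Mario's description of Tools -> Classify (every loaded bestmatch looks up its class number in the classmask) can be sketched as follows. This is a hypothetical illustration, not the tool's actual code: the in-memory representations (a key -> (row, col) dict for the *.bm contents and a 2D list for the classmask) are simplifying assumptions.

```python
# Hypothetical sketch of what Tools -> Classify does internally:
# each bestmatch is assigned the class of the neuron it sits on.
# Representations are assumptions, not the tool's real file formats.

def classify_bestmatches(bestmatches, classmask):
    """Map each data key to the class of its bestmatch neuron."""
    classification = {}
    for key, (row, col) in bestmatches.items():
        classification[key] = classmask[row][col]
    return classification

# three bestmatches on a 2x2 map whose mask splits it into classes 1 and 2
bm = {1: (0, 0), 2: (0, 1), 3: (1, 1)}
mask = [[1, 2],
        [1, 2]]
print(classify_bestmatches(bm, mask))  # {1: 1, 2: 2, 3: 2}
```

If a bestmatch falls on a neuron outside every mask region, the real tool presumably leaves it unclassified; the sketch would raise an IndexError instead, so bounds handling is left to the reader.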
From: <fa...@in...> - 2005-06-01 14:00:41
mit...@we... wrote:
> Dear Mr. Mörchen and Mr. Efthymiou,
> I really appreciate your response, which did help me understand and
> make clear some things, but I still have some questions and problems
> with ESOM. I tried to follow the steps of your guidelines.
>
> - create two separate *.lrn for training and test data
>
> ok, but does the training data have to include a column with the class
> of each sample, or is it the same as having a separate .cls file? (in
> some example .lrn files there is a column for the class of the data)

the training should not be performed using class information, it is an
unsupervised process, after all. creating a class mask for the esom can
be done without any prior classification. having a *.cls file for the
training data comes in handy when you are actually searching for
clusters corresponding, or at least similar, to some known classes.
displaying these known classes as the best match color might help you in
finding a better class mask corresponding to your ground truth. if the
known classes do not correspond to the bm positions and the distance
structures at all, they might not be related to the data vectors.

> Is it necessary for one of the fields of the .lrn file to be labeled
> as unique key (9)?

we highly recommend using a unique key column (9) in the *.lrn files,
because this key will be used to map bestmatches to data vectors and
classification labels. the class column in *.lrn files is deprecated and
should not be used.

> - train ESOM with training data
> ok
>
> - optional: load *.cls with known classification of training data
> should the *.cls file be loaded before or after the training process?
> (I suppose before)
> Is it loaded from the tab classes or class mask?

no, afterwards, as explained above. the classes tab.

>> - identify clusters and create class mask (also *.cls)
>
> how do I identify clusters?? and create class mask?
> Do I use the classify selection from the tools menu?
> for some reason it doesn't seem to work: although I press the start
> button, the procedure does not start and no output .cls file is
> created.

you can identify distance based clusters with the U-Matrix display.
depending on the color gradient used (e.g. gray), dark regions
correspond to valleys, i.e. core cluster regions, and light regions
correspond to mountains, i.e. cluster boundaries. the class masks are
created with the 3rd toolbar button. only if bestmatches and a classmask
are loaded can you classify the bestmatches according to their positions
and possible masks in that area. the result will be displayed in the
classes tab.

> - load *.lrn with test data
> ok
>
>> - project this data on ESOM
> ok
>
>> - save newly created *.cls for test data
> How do I create the new *.cls data?

see above.

>> - optional: analyze *.cls for test data, e.g. compare to *.cls with
>> known classification of test data.
>>
>> we offer no tools for the last step, which is rather easy however. i
>> could post some matlab code, if you wish.
>
> Yes, please send me the matlab code (e-mail mit...@un...)

  % using loadcls that ships with the ESOM tools
  true_cls = loadcls('true.cls');
  esom_cls = loadcls('esom.cls');
  % check if key columns are the same
  if sum(true_cls(:,1)~=esom_cls(:,1))>0
    error('key columns do not match');
  end
  % accuracy
  acc = sum(true_cls(:,2)==esom_cls(:,2))/size(true_cls,1);
  % contingency table (using stats toolbox)
  ct = crosstab(true_cls(:,2),esom_cls(:,2));

bye
fabian
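For readers without MATLAB (or its stats toolbox), a rough Python equivalent of Fabian's comparison snippet might look like this. The *.cls parsing is an assumption (one whitespace-separated key/class pair per line, header lines starting with '%' skipped), so check it against your actual files.

```python
# Hedged Python port of the MATLAB comparison above. The load_cls
# parser assumes a simple "key class" per-line format with '%'
# header lines; this is an assumption about the *.cls format.
from collections import Counter

def load_cls(path):
    """Read a *.cls file into a {key: class} dict (format assumed)."""
    pairs = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('%'):
                continue
            key, cls = line.split()[:2]
            pairs[int(key)] = int(cls)
    return pairs

def compare(true_cls, esom_cls):
    """Accuracy and contingency table for two {key: class} dicts."""
    if set(true_cls) != set(esom_cls):
        raise ValueError('key columns do not match')
    hits = sum(true_cls[k] == esom_cls[k] for k in true_cls)
    acc = hits / len(true_cls)
    # contingency table as a Counter of (true, predicted) pairs,
    # standing in for MATLAB's crosstab
    ct = Counter((true_cls[k], esom_cls[k]) for k in true_cls)
    return acc, ct
```

Like the MATLAB version, this only checks that the key sets match and then counts agreements; it does not attempt any relabeling of cluster numbers between the two files.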
From: <mit...@we...> - 2005-06-01 13:16:15
Dear Mr. Mörchen and Mr. Efthymiou,

I really appreciate your response, which did help me understand and make
clear some things, but I still have some questions and problems with
ESOM. I tried to follow the steps of your guidelines.

- create two separate *.lrn for training and test data

ok, but does the training data have to include a column with the class
of each sample, or is it the same as having a separate .cls file? (in
some example .lrn files there is a column for the class of the data)
Is it necessary for one of the fields of the .lrn file to be labeled as
unique key (9)?

- train ESOM with training data

ok

- optional: load *.cls with known classification of training data

should the *.cls file be loaded before or after the training process?
(I suppose before) Is it loaded from the tab classes or class mask?

> - identify clusters and create class mask (also *.cls)

how do I identify clusters?? and create class mask? Do I use the
classify selection from the tools menu? for some reason it doesn't seem
to work: although I press the start button, the procedure does not start
and no output .cls file is created.

- load *.lrn with test data

ok

> - project this data on ESOM

ok

> - save newly created *.cls for test data

How do I create the new *.cls data?

> - optional: analyze *.cls for test data, e.g. compare to *.cls with
> known classification of test data.

I would really appreciate it if you could answer my questions; it is
very, very important for me. I really admire your work and am looking
forward to your response and your future work. Please reply to this
e-mail address: mit...@un... (not the address I am currently sending
from, because I won't receive it there). Thank you in advance.

Katerina

>> @fabian could you please set the reply-to headers for the list?
>
> done
>
>> Katerina Mitrokotsa wrote:
>>
>>> I have recently tried to use ESOM and although I have found it
>>> really interesting, I can't understand if there is a way to inspect
>>> neuron values. Does this tool permit us to see which samples of data
>>> correspond to which neuron?
>>
>> You can select the samples in the Data tab at the bottom, which are
>> then highlighted.
>
> addition: you can also select neurons in the map with the data mouse
> (activate the leftmost icon in the toolbar). the data points assigned
> to these neurons will be displayed in the data tab at the bottom. you
> can also load a *.names file with text labels for the data points.
> these will be displayed in the last columns of the data table.
>
>>> Furthermore, what is the procedure in order to use one dataset for
>>> training and then another dataset for testing?
>>
>> To do this you have to add classmasks to your ESOM and then use the
>> Project tool to see if the test set is projected into the correct
>> classes. The process isn't automated as far as I know (fabian?)
>
> creating the class masks is manual (in cvs there is some semi-automated
> support with flood filling already). projection is automated and can be
> run via the menu or the command line. short summary:
>
> - create two separate *.lrn for training and test data
> - train ESOM with training data
> - optional: load *.cls with known classification of training data
> - identify clusters and create class mask (also *.cls)
> - load *.lrn with test data
> - project this data on ESOM
> - save newly created *.cls for test data
> - optional: analyze *.cls for test data, e.g. compare to *.cls with
>   known classification of test data.
>
> we offer no tools for the last step, which is rather easy however. i
> could post some matlab code, if you wish.
>
> bye
> fabian
From: Christian S. <st...@Ma...> - 2005-05-29 15:20:53
Hi Michael,

sorry for the delay. I'll try and answer some of your questions.

1) I can't base it on hard empirical results, but imho the grid size
should be chosen from the number of datapoints only. The dimension of
the dataset can be neglected.

2) pca initialization will work on toroidal maps, too. There won't be a
smooth transition where the grid is flipped, but the algorithm will
still work and produce meaningful results.

3) A density based visualization is implemented with the P-Matrix. The
ESOM algorithm inherently overrepresents sparsely populated regions of
the dataspace, i.e. you will find more bestmatches per neuron in dense
regions. Rare events will be rather isolated. U-Matrix and twoMatch are
a good choice for identifying those.

regards
Christian

Michael Dell Junior said:
> Dear All,
>
> Thanks for your help and your warm welcome to the community.
>
> There are a couple of further issues, arising from your previous
> communication, that I would like to address:
>
> 1) Grid size: We have talked about this extensively in relation to the
> number of instances present, but not in relation to the dimensionality
> of the data. Fabian's tutorial on ESOM deals with, I believe, a six
> dimensional problem. What has been your experience in dealing with
> ESOMs that have an order of magnitude larger dimensionality (e.g. 100
> dimensions)? Does the grid size matter in terms of getting robust
> results? Is there any scientific literature/papers that deal with the
> dimensionality vs. "size of the grid" issue? Your comments will be
> very much appreciated.
>
> 2) I understand that there is a random initialization of weights that
> goes on up front. One of the remedies that has been suggested was to
> initialize using the -pca command. The assumption here is that -pca
> would produce better comparable maps in different runs. However, there
> appears to be a conflict with the usage of -pca, as it appears to deal
> with *border only* maps. By reading the ESOM tutorial it becomes
> evident that there is a conflict between the utility/usefulness of the
> borderless map, which appears to add value (e.g. toroid), and the
> utility/usefulness of having comparable maps using the -pca command.
> Do you have any suggestions/workaround on this issue?
>
> 3) My last question is a little bit more conceptual in nature. Most of
> the literature on ESOMs, and the visualization development that
> accompanies the literature, focuses on the objective of identifying
> regions of high density that will eventually be identified as
> clusters.
>
> I am more interested in the topological assortment of instances
> according to a degree of similarity provided by the ESOM, and not
> necessarily in the formation of a high-density cluster (whatever the
> density definition might be). So, in U-Matrix terms, my interest might
> lie more on the mountains and less on the valleys, as the area I am
> researching deals more with rare events. However, it often appears
> that these "rare events" occur in areas of low density (however that
> is defined), but at the same time appear to be *adjacent* to each
> other on the 2-D map. In U-Matrix terms, this would be a series of
> mountains on the map that would form a mountain range. This mountain
> range might be of immense interest to my research.
>
> Would you happen to have any understanding/awareness as to:
>
> 1) What kind of parameter tuning can best sort out / display these
> "rare instances"? Would it be any different from the current
> methodology, in your opinion?
>
> 2) The availability of any scientific literature that deals with this
> issue?
>
> Again, your comments / help will be extremely appreciated. I hope I
> will be able to contribute more *actively* to the list in the future,
> as my understanding of the Databionic ESOM tool grows.
>
> Regards,
> Michael
>
> dat...@li... wrote:
> Send Databionic-ESOM-User mailing list submissions to
> dat...@li...
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/databionic-esom-user
>
> Today's Topics:
>
> 1. Re: Parameter Selection & ...More (Mario Noecker)
> 2. Re: Parameter Selection & ...More (Christian Stamm)
> 3. Re: Parameter Selection & ...More (Fabian Mörchen)
>
> --__--__--
>
> Message: 1
> Date: Sat, 30 Apr 2005 17:08:52 +0000
> From: Mario Noecker
> To: Michael Dell Junior
> CC: dat...@li...
> Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More
>
> Hi Michael,
>
> only a short answer, because I am in a hurry.
>
> yes, there is a random component, the grid initialisation. Try the pca
> initialisation to get maybe more similar maps.
>
> mario
>
> --__--__--
>
> Message: 2
> Date: Sat, 30 Apr 2005 17:15:15 +0200 (CEST)
> Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More
> From: "Christian Stamm"
> To: dat...@li...
>
> Dear Michael,
>
> the ESOM algorithm is indeed non-convergent. Every map you train will
> be unique. This is because of the random initialization of the map and
> the (optional) permutation of the input data during the training
> process. The overall structure of the map will be similar, but inter
> as well as intra cluster neighbourhoods may be twisted or sorted in
> another fashion, without being less meaningful. e.g. the U-Matrix view
> on the map will reveal where large or low distances are present.
>
> welcome to the user community!
>
> mfg Christian
>
> Michael Dell Junior said:
>> Dear Mario & Fabian,
>>
>> Thank you very much for your valuable comments and prompt response. I
>> will implement your recommendations on an immediate basis.
>>
>> I have one other request pertaining to training using the Databionics
>> ESOM tool. What actions does one need to take if two *identical runs*
>> (in terms of parameter selection and training set) obtain a different
>> end result in terms of the proximity/clustering of different
>> instances to one another? (I am aware that the visual part might be
>> different on each run, but my expectation would be that the
>> underlying structural sorting of the instances should be the same.
>> e.g. In Run 1, Instance "234" is surrounded by instances "456", "789"
>> & "123".
>> Shouldn't the same "234" instance be surrounded by the same "456",
>> "789" & "123" instances in Run 2?)
>>
>> Is this a sign of non-convergence? Is this a sign of some other
>> underlying process that I am not aware of? Is there a random
>> component that I am not aware of?
>>
>> Your comments and suggestions on this issue will be very much
>> appreciated.
>>
>> Regards,
>> Michael
>
> --__--__--
>
> Message: 3
> Date: Sat, 30 Apr 2005 19:49:40 +0200
> From: Fabian Mörchen
> Organization: Databionics Research Group
> To: Christian Stamm, dat...@li...
> Subject: Re: [Databionic-ESOM-User] Parameter Selection & ...More
>
> hi,
>
> the ESOM training has not been proven to converge in the sense that
> k-Means, or more generally EM, does, except for simple 1D settings.
> the 'convergence' is simulated by the cooling of the parameters, the
> neighborhood in particular (a constant learning rate doesn't hurt).
>
> you can obtain exactly reproducible maps by _not_ using "-p" (permute
> data) and by using "-i pca", as explained by Mario and Christian, but
> this may not be what you want. "-p" in particular should always be
> active if your data is sorted in a particular order, e.g. by known
> clusters, as for many of the toy examples. training the map with all
> data points from one cluster before introducing points from other
> regions in the data space may distort the map towards the first
> cluster. the pca initialization is a good way of obtaining visually
> more comparable maps from several runs.
>
> if there are clusters in the data, they should show up on different
> maps run with the same parameters. the local neighborhood of a few
> points you described is usually not expected to be the same over
> several runs. i don't know the dimensionality of your data, but assume
> it is > 3d. imagine what happens to the 2D grid of ESOM prototypes in
> the high dimensional space. it adjusts to the data: many prototypes
> are placed where the data resides, while the grid is stretched in the
> regions between. the following picture shows this for the chainlink
> datasets:
>
> http://www.mathematik.uni-marburg.de/~databionics/de//images/chainlink_esom3d.png
>
> if you have a, say, 10 dimensional space and are looking at some data
> points inside a densely populated region, there is no way of
> predicting how the 2D ESOM grid will locally adjust to this 10
> dimensional cloud, thus the local neighborhood relations will not show
> reproducible behaviour. if your 5 points are relatively far from each
> other in an otherwise empty region, they should be represented on
> different maps in a similar way, but this is a special case.
>
> bye
> fabian
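Fabian's two points in the quoted digest — that 'convergence' is simulated by cooling the neighborhood radius, and that "-p" (permuting the input order) guards against sorted data biasing the map — can be illustrated with a toy online SOM loop. This is emphatically not the ESOM tool's implementation: it uses a planar (non-toroidal) grid, a made-up linear cooling schedule, and hypothetical parameter names.

```python
# Toy online SOM sketch (NOT the ESOM tool's code) illustrating two
# points from the thread: the neighborhood radius is cooled over
# epochs, and the presentation order is permuted each epoch.
import math
import random

def train_som(data, rows=8, cols=8, epochs=30, lr=0.1, permute=True, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # random initialization: the source of run-to-run differences
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    for epoch in range(epochs):
        # cool the neighborhood radius linearly toward 1 (assumed schedule)
        radius = max(1.0, (rows / 2) * (1 - epoch / epochs))
        order = list(range(len(data)))
        if permute:
            rng.shuffle(order)  # the role of the "-p" option
        for i in order:
            x = data[i]
            # bestmatch: the prototype closest to x
            br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                         key=lambda rc: sum((grid[rc[0]][rc[1]][d] - x[d]) ** 2
                                            for d in range(dim)))
            # pull every prototype within the radius toward x,
            # weighted by a Gaussian of its grid distance
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    if d2 <= radius ** 2:
                        h = lr * math.exp(-d2 / (2 * radius ** 2))
                        for d in range(dim):
                            grid[r][c][d] += h * (x[d] - grid[r][c][d])
    return grid
```

With `permute=False` and a fixed seed, two runs are bit-identical, mirroring Fabian's remark that reproducibility is possible but usually not what you want; with sorted input and no permutation, early clusters dominate the map.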
From: <fa...@in...> - 2005-05-27 15:25:11

> @fabian could you please set the reply-to headers for the list?

done

> Katerina Mitrokotsa wrote:
>
>> I have recently tried to use ESOM and although I have found it really
>> interesting, I can't understand if there is a way to inspect neuron
>> values. Does this tool permit us to see which samples of data
>> correspond to which neuron?
>
> You can select the samples in the Data tab at the bottom, which are
> then highlighted.

addition: you can also select neurons in the map with the data mouse
(activate the leftmost icon in the toolbar). the data points assigned to
these neurons will be displayed in the data tab at the bottom. you can
also load a *.names file with text labels for the data points. these
will be displayed in the last columns of the data table.

>> Furthermore, what is the procedure in order to use one dataset for
>> training and then another dataset for testing?
>
> To do this you have to add classmasks to your ESOM and then use the
> Project tool to see if the test set is projected into the correct
> classes. The process isn't automated as far as I know (fabian?)

creating the class masks is manual (in cvs there is some semi-automated
support with flood filling already). projection is automated and can be
run via the menu or the command line. short summary:

- create two separate *.lrn for training and test data
- train ESOM with training data
- optional: load *.cls with known classification of training data
- identify clusters and create class mask (also *.cls)
- load *.lrn with test data
- project this data on ESOM
- save newly created *.cls for test data
- optional: analyze *.cls for test data, e.g. compare to *.cls with
  known classification of test data.

we offer no tools for the last step, which is rather easy however. i
could post some matlab code, if you wish.

bye
fabian
From: Niko E. <ne...@Ma...> - 2005-05-27 14:41:01
@fabian could you please set the reply-to headers for the list?

--
Niko Efthymiou
Geschw.-Scholl-Str. 11a
35039 Marburg
Tel: 06421/898565
www: www.mathematik.uni-marburg.de/~nefthy
pgp-key: 0xE6BF2487 @ www.keyserver.net
From: Katerina M. <mit...@un...> - 2005-05-27 13:15:42
I have recently tried to use ESOM and although I have found it really
interesting, I can't understand if there is a way to inspect neuron
values. Does this tool permit us to see which samples of data correspond
to which neuron?

Furthermore, what is the procedure in order to use one dataset for
training and then another dataset for testing?

This information is very valuable for me; looking forward to your
response. Thank you in advance.

Katerina Mitrokotsa
From: <fa...@in...> - 2005-05-04 16:57:48

forwarded from Alfred Ultsch:

"From the cases we have seen, such data are typically placed within a
little funnel-like structure if there are more than one in close
neighborhood. On a P-Matrix the densities would be rather low; however,
on a U-Matrix they would lie in a valley. So I would try the standard
workhorse ESOM with it."

> 3) My last question is a little bit more conceptual in nature. Most of
> the literature on ESOMs, and the visualization development that
> accompanies the literature, focuses on the objective of identifying
> regions of high density that will eventually be identified as
> clusters.
>
> I am more interested in the topological assortment of instances
> according to a degree of similarity provided by the ESOM, and not
> necessarily in the formation of a high-density cluster (whatever the
> density definition might be). So, in U-Matrix terms, my interest might
> lie more on the mountains and less on the valleys, as the area I am
> researching deals more with rare events. However, it often appears
> that these "rare events" occur in areas of low density (however that
> is defined), but at the same time appear to be *adjacent* to each
> other on the 2-D map. In U-Matrix terms, this would be a series of
> mountains on the map that would form a mountain range. This mountain
> range might be of immense interest to my research.
>
> Would you happen to have any understanding/awareness as to:
>
> 1) What kind of parameter tuning can best sort out / display these
> "rare instances"? Would it be any different from the current
> methodology, in your opinion?
>
> 2) The availability of any scientific literature that deals with this
> issue?

not really. i know of a paper using SOM for novelty detection [1],
however. SOMs are trained with normal data and novelties are recognized
by the projection error. is this what you are looking for? we have not
implemented displays for projection errors, but it would be rather
straightforward to do so.

bye
fabian

[1] A. Ypma and R.P.W. Duin, Novelty Detection using Self-Organizing
Maps, Progress in Connectionist-Based Information Systems - Proceedings
of ICONIP'97, Dunedin (New Zealand), 1997
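The U-Matrix idea this thread keeps returning to (valleys = core cluster regions, mountains = cluster boundaries) can be sketched briefly: each neuron's U-height is the average distance between its weight vector and those of its immediate grid neighbors. A planar grid is assumed here for simplicity, although ESOM itself favors borderless/toroidal maps, where neighbors would wrap around.

```python
# Hedged sketch of a U-Matrix computation: per-neuron average
# distance to 4-connected grid neighbors. Planar grid assumed;
# a toroidal map would wrap the neighbor indices instead.
import math

def u_matrix(grid):
    """grid[r][c] is a weight vector; returns U-heights per neuron."""
    rows, cols = len(grid), len(grid[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(grid[r][c], grid[nr][nc]))
            heights[r][c] = sum(dists) / len(dists)
    return heights
```

On a 1x4 grid holding weights [0], [0], [10], [10], the two middle neurons get U-height 5 (the "mountain" between two flat "valleys"), which is exactly the boundary signal Michael's "mountain range" question is about.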
From: Michael D. J. <mic...@ya...> - 2005-05-02 21:18:13
|
Dear All, Thanks for your help and your warm welcome to the community. There a couple of more issues that came up as a result of your previous commmunication that I would like to address: 1) Grid size: We have talked about this extensively in relation to the number of instances present but not in relation to the dimensionality of the data. Fabian's tutorial on ESOM deals with, I believe, a six dimensional problem. What has been your experience in dealing with ESOM's that have an order of magnitude larger dimensionality (e.g 100 dimensions)? Does the grid size matter in terms of getting robust results? Is there any scientific literature/papers that deal with the dimensionality Vs "size of the grid" issue? Your comments will be very much appreciated. 2) I understand that there is a random initialization of weights that goes on up front. One of the remedies that has been suggested was to try to initialize using the -pca command. The assumption here is that the -pca would produce better comparable maps in different runs. However, there appears to be a conflict with the usage of the -pca as it appears to deal with *border only* maps. By reading the ESOM tutorial it becomes evident that the conflict appears by the utility/usefulness of the border-less map-which appears to add value- (e.g. toroid) with the utility/usefullnes of having comparable maps using the -pca command. Do you have any suggestions/work around on this issue? 3) My last question is a little bit more conceptual in nature. Most of the literature on ESOM's and the visualization development that accompanies the literature, focuses on the objective of identifying regions of high density that will eventually be identified as clusters. I am more interested in the topological assortment of instances according to a degree of similarity provided by the ESOM and not necessarily in the formation of a high-density cluster (whatever the density definition might be). 
So, in U-Matrix terms, my interest might lie more on the mountains and less on the valleys, as the area I am researching deals more with rare events. However, it often appears that these "rare events" occur in areas of low density (however that is defined) but at the same time appear to be *adjacent* to each other on the 2-D map. In U-Matrix terms, this would be a series of mountains on the map forming a mountain range. This mountain range might be of immense interest to my research. Would you happen to have any understanding or awareness as to: 1) What kind of parameter tuning can best sort out and display these "rare instances"? Would it be any different from the current methodology, in your opinion? 2) The availability of any scientific literature that deals with this issue? Again, your comments and help will be extremely appreciated. I hope I will be able to contribute more *actively* to the list in the future, as my understanding of the Databionic ESOM tool grows. Regards, Michael |
From:
<fa...@in...> - 2005-04-30 17:50:01
|
hi, the ESOM training has not been proven to converge in the sense that k-Means or, more generally, EM does, except for simple 1D settings. the 'convergence' is simulated by the cooling of the parameters, the neighborhood in particular (a constant learning rate doesn't hurt). you can obtain exactly reproducible maps by _not_ using "-p" (permute data) and by using "-i pca", as explained by Mario and Christian, but this may not be what you want. "-p" in particular should always be active if your data is sorted in a particular order, e.g. by known clusters as in many of the toy examples. training the map with all data points from one cluster before introducing points from other regions of the data space may distort the map towards the first cluster. the pca initialization is a good way of obtaining visually more comparable maps from several runs. if there are clusters in the data, they should show up on different maps run with the same parameters. the local neighborhood of a few points, as you described, is usually not expected to be the same over several runs. i don't know the dimensionality of your data, but assume it is > 3d. imagine what happens to the 2D grid of ESOM prototypes in the high dimensional space. it adjusts to the data: many prototypes are placed where the data resides, while the grid is stretched in the regions between. the following picture shows this for the chainlink dataset: http://www.mathematik.uni-marburg.de/~databionics/de//images/chainlink_esom3d.png if you have a, say, 10-dimensional space and are looking at some data points inside a densely populated region, there is no way of predicting how the 2D ESOM grid will locally adjust to this 10-dimensional cloud, thus the local neighborhood relations will not show reproducible behaviour. if your 5 points are relatively far from each other in an otherwise empty region, they should be represented on different maps in a similar way, but this is a special case. 
bye fabian |
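The run-to-run variation discussed in the messages above can be reproduced with a toy NumPy SOM. This is only a minimal sketch, not the Databionic tool's actual training code; the grid size, learning rate, and linear cooling schedule here are illustrative assumptions. It shows that a fixed seed makes a run exactly reproducible, while different seeds (random grid initialization plus data permutation) yield different maps:

```python
import numpy as np

def train_som(data, n_rows=6, n_cols=10, epochs=20, seed=None, permute=True):
    """Toy online SOM training loop (illustration only)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # random grid initialization -- the random component discussed above
    grid = rng.uniform(data.min(), data.max(), size=(n_rows, n_cols, dim))
    rows, cols = np.mgrid[0:n_rows, 0:n_cols]
    for epoch in range(epochs):
        # cool the neighborhood radius from half the smaller grid side towards 1
        radius = max(1.0, (min(n_rows, n_cols) / 2) * (1 - epoch / epochs))
        # optional permutation of the input data -- the "-p" flag's role
        order = rng.permutation(len(data)) if permute else np.arange(len(data))
        for i in order:
            x = data[i]
            # best-matching unit
            d = np.linalg.norm(grid - x, axis=2)
            br, bc = np.unravel_index(d.argmin(), d.shape)
            # Gaussian neighborhood update with a constant learning rate
            h = np.exp(-((rows - br) ** 2 + (cols - bc) ** 2) / (2 * radius ** 2))
            grid += 0.1 * h[..., None] * (x - grid)
    return grid

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
a = train_som(data, seed=1)   # two runs with different seeds: maps differ
b = train_som(data, seed=2)
c = train_som(data, seed=1)   # same seed: exactly reproducible
print(np.allclose(a, c), np.allclose(a, b))  # -> True False
```

The global structure of the maps will still be similar, as Christian notes; only the local arrangement of prototypes changes between seeds.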
From: Christian S. <st...@Ma...> - 2005-04-30 15:15:22
|
Dear Michael, the ESOM algorithm is indeed non-convergent. Every map you train will be unique. This is because of the random initialization of the map and the (optional) permutation of the input data during the training process. The overall structure of the map will be similar, but inter- as well as intra-cluster neighbourhoods may be twisted or sorted in a different fashion, without being any less meaningful. E.g., the U-Matrix view on the map will reveal where large or small distances are present. welcome to the user community! mfg Christian |
From: Mario N. <noe...@Ma...> - 2005-04-30 15:07:46
|
Hi Michael, only a short answer, because I am in a hurry. Yes, there is a random component: the grid initialisation. Try the pca initialisation; it may give you more similar maps. mario |
From: Michael D. J. <mic...@ya...> - 2005-04-30 14:57:51
|
Dear Mario & Fabian, Thank you very much for your valuable comments and prompt response. I will implement your recommendations immediately. I have one other question pertaining to training with the Databionics ESOM tool. What actions does one need to take if two *identical runs* (in terms of parameter selection and training set) yield different end results in terms of the proximity/clustering of instances to one another? (I am aware that the visual part might be different on each run, but my expectation would be that the underlying structural sorting of the instances should be the same. E.g., in Run 1, instance "234" is surrounded by instances "456", "789" & "123". Shouldn't the same "234" instance be surrounded by the same "456", "789" & "123" instances in Run 2?) Is this a sign of non-convergence? Is this a sign of some other underlying process that I am not aware of? Is there a random component that I am not aware of? Your comments and suggestions on this issue will be very much appreciated. Regards, Michael |
From:
<fa...@in...> - 2005-04-30 12:42:44
|
Hi Michael, let me just add a few comments to what Mario said. - you don't necessarily need more neurons than data points. it all depends on the view of your data that you want to have. first off, ESOMs need to be large, otherwise they would be k-Means SOMs where one neuron is one cluster. large starts somewhere > 1000; we hardly ever go below 50x82=4100 (~4096=64x64, what we used before we found out about rectangular maps). with a small dataset you will have enough room on the map to see cluster structure (if there is any) _and_ inner-cluster relations. the more data you have, the less room there will be, and data points are placed on top of each other. you will still see the global structure, but less detail. enlarging the map will help, but will of course slow down the training. i would always start with the default size and go larger if it seems necessary based on the result. i have also successfully used sampling on a large (30K) dataset: i trained 50x82 maps on a 3K sample, identified clusters with the class mask tool, and used the classification mode to transfer the result to the complete data. - start radius: as Mario said, about half the smaller grid dimension. a too large value will 'waste' the early training episodes, because almost the whole map will be pulled back and forth by the updates. what you want is to have part of the map pulled towards a cluster by the updates of the corresponding data points, and other parts towards other clusters. if you start with too small a radius, there is a danger of 'losing' neurons that will never be pulled anywhere and keep their random values from the initialization. and even though you didn't ask, someone else might soon: - end radius: small (=1) if you want a lot of detail, a little larger to concentrate on coarser structures. - episodes: the number of training episodes isn't closely related to the choice of the other parameters. in some publications several thousand training episodes are mentioned. this is a complete waste of computing power. somewhere between 20 and 50 should provide a slow enough cooling of the parameters. the toy examples like hepta are a good starting point to explore the behaviour of ESOM given different parameter settings. you do have to make some extreme choices, however, to really make it not work. we consider it fairly robust w.r.t. the parameters, from our long experience. please note the technical report, which covers some of the above questions, but with a less hands-on touch: [Ultsch 2005b] Ultsch, A., Moerchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM, Technical Report, Dept. of Mathematics and Computer Science, University of Marburg, Germany, No. 46, (2005) http://www.mathematik.uni-marburg.de/~databionics/downloads/papers/ultsch05esom.pdf bye fabian p.s. Michael did reply to the list, only a sf filter sent it to the list admin (me) for approval first. |
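The heuristics in the message above (start radius about half the smaller grid dimension, end radius around 1, somewhere between 20 and 50 training episodes) can be written down as a small schedule helper. This is only a sketch; linear cooling is an assumption, and the actual ESOM tool may use a different cooling scheme:

```python
def radius_schedule(n_rows, n_cols, episodes=30, end_radius=1):
    """Linearly cool the neighborhood radius from half the smaller
    grid dimension down to end_radius over the given episodes."""
    start = min(n_rows, n_cols) // 2
    step = (start - end_radius) / (episodes - 1)
    return [round(start - step * e) for e in range(episodes)]

sched = radius_schedule(50, 82, episodes=30)
print(sched[0], sched[-1])  # -> 25 1
```

For Mario's 90x125 grid this starts at 45, matching his suggested starting value.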
From: Michael D. J. <mic...@ya...> - 2005-04-30 12:07:52
|
Hi Mario, Thank you for the valuable information you have provided. Michael Mario Noecker <noe...@Ma...> wrote: Hi Michael, there is no real research on grid size, but I can tell you what I prefer. For a dataset with 5000 instances, I would use a grid with about 90 rows and 125 columns: if you have 5000 instances, then you need at least 10000 neurons. In any event, the neighborhood radius is a question of the kind of neighborhood function you are using. Mostly I use the Gaussian neighborhood. With the Gaussian neighborhood, a good starting value is half of the smaller dimension of the grid. In a training with our 90*125 grid, this will be about 45. More than this isn't really useful, because some neurons would get changed more than once during a single update (if you are using the toroid grid, which we recommend). But in the first epoch it is useful to change nearly the whole map in one update. If you are using the bubble neighborhood function, it is probably senseless to update the whole map, because every neuron would be changed in the same way. A training with such a big dataset, grid and radius will take a really long time. Maybe you can use the batch version, which gives a speed-up of 25%. send your next message to the whole list, maybe some other guys want to reply. bye mario |
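Mario's numbers (5000 instances, roughly a 90x125 grid, at least two neurons per data point, start radius of half the smaller dimension) can be turned into a rough calculator. The ratios below are assumptions read off the examples in this thread, not an official rule of the tool:

```python
import math

def suggest_grid(n_instances, neurons_per_point=2.25, aspect=82 / 50):
    """Rough ESOM grid-size heuristic (assumed, based on the figures above):
    at least ~2 neurons per data point, never below the 50x82 default,
    keeping the rectangular row:column ratio the list recommends."""
    neurons = max(4100, int(neurons_per_point * n_instances))
    rows = round(math.sqrt(neurons / aspect))
    cols = round(rows * aspect)
    return rows, cols

rows, cols = suggest_grid(5000)
print(rows, cols, min(rows, cols) // 2)  # grid size and suggested start radius
```

For 5000 instances this gives a grid close to Mario's 90x125 suggestion, with a start radius near his value of 45; small datasets fall back to the 50x82 default mentioned by Fabian.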
From: Mario N. <noe...@Ma...> - 2005-04-30 11:53:55
|
...to the whole list |
From: Michael D. J. <mic...@ya...> - 2005-04-30 11:09:42
|
Dear Mario, Thank you very much for your VERY QUICK response. It is very much appreciated. I am glad that you mention that the size of the grid depends on the size of the dataset. I am aware that we need many more neurons than data points. However, is there an *approximate* rule for this? (E.g., if you have a dataset made up of 5000 instances, what would be a range for the size of the grid?) From your experience, what is the size of the dataset you use when the size of the grid is 50 by 82 and 70 by 110? In addition, in terms of the size of the neighbourhood, what is a good approximate rule to follow? Moreover, in a grid where there are many more neurons than data points (e.g. a 100 to 1 ratio), what would the impact of a bigger or smaller neighborhood be? I would like to thank you in advance for your continued support. Regards, Michael |
From: Mario N. <noe...@Ma...> - 2005-04-30 09:48:36
|
Hi Michael, the best size of the grid depends on the size of the dataset. Using emergent SOMs you of course need many more neurons than data points. Mostly we use grids with 50 rows and 82 columns, or 70 rows and 110 columns. We have found that grids with a rectangular shape offer the best results. We mostly use 20-30 epochs of training, regardless of the grid size. if there are more questions, please ask. mario |
From: Michael D. J. <mic...@ya...> - 2005-04-30 09:22:57
|
Hi all, Congratulations on the development effort for the production of such a good tool. I have a few questions regarding the use of the tool: 1) Is there a "right number" for the size of the grid for convergence to occur? 2) Is there a relationship (or a rule of thumb) between the number of training instances used and the right size of the grid? Your help will be very much appreciated. Regards, Michael Dell Junior |
From:
<fa...@in...> - 2005-04-22 08:15:45
|
changes: http://databionic-esom.sourceforge.net/changes-report.html known bugs: - bestmatch size gets bigger (to 5) if you toggle the display of class letters. - bold display of classes doesn't make all bestmatches of the class bold. - class "0" (= no class) doesn't always contain the complement of the classified bestmatches. workaround: press the clear button once before starting to classify. - autoload is sometimes active even if the checkbox is turned off. feature request: - colored letter bestmatches. bye fabian |