From: Harris, K. D <ken...@im...> - 2011-02-28 16:55:10
|
It doesn't just do one M E and C step - it starts a whole new KlustaKwik from the randomly chosen start point, and iterates until finished. We have found this often splits clusters that need splitting. -----Original Message----- From: Peter Nathan Steinmetz [mailto:pet...@st...] Sent: 26 February 2011 04:38 To: klu...@li... Cc: mic...@co...; Harris, Kenneth D Subject: does splitting work Hi All, While I was looking through this, I started to wonder if the current splitting strategy results in useful splits? We don't do much with the TrySplits set in our lab, but it appears the points in the existing cluster are randomly allocated to two different clusters and an M E and C step are then taken. It seems like this will result in the means of the two new clusters being quite close to the old cluster center and each other. Would a better strategy be a PC decomposition of the points, then either looking for minima in the empirical distribution of the coordinates of the 1st PC as the place to split, or even just setting them to the mean + - one or two standard deviations? Peter P.S. for Ken: do you want to have your new email in the klustakwik-develop mailing list? And Michael, do you want to join? These options available at https://lists.sourceforge.net/lists/listinfo/klustakwik-develop -- Peter N. Steinmetz, M.D.,Ph.D. Program Director, Neuroengineering Barrow Neurological Institute Pet...@st... 602-406-3258 http://steinmetz.org/peter |
From: Peter N. S. <pet...@st...> - 2011-02-26 04:37:57
|
Hi All, While I was looking through this, I started to wonder if the current splitting strategy results in useful splits? We don't do much with the TrySplits set in our lab, but it appears the points in the existing cluster are randomly allocated to two different clusters and an M E and C step are then taken. It seems like this will result in the means of the two new clusters being quite close to the old cluster center and each other. Would a better strategy be a PC decomposition of the points, then either looking for minima in the empirical distribution of the coordinates of the 1st PC as the place to split, or even just setting them to the mean +/- one or two standard deviations? Peter P.S. for Ken: do you want to have your new email in the klustakwik-develop mailing list? And Michael, do you want to join? These options available at https://lists.sourceforge.net/lists/listinfo/klustakwik-develop -- Peter N. Steinmetz, M.D.,Ph.D. Program Director, Neuroengineering Barrow Neurological Institute Pet...@st... 602-406-3258 http://steinmetz.org/peter |
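The simpler of the two ideas in this message (seeding the two new clusters at the mean plus or minus one or two standard deviations along the first principal component) can be sketched as follows. This is only an illustration under stated assumptions: `Point` and `SplitSeedsAlongFirstPC` are hypothetical names that do not exist in KlustaKwik, the first PC is found by plain power iteration rather than a full decomposition, and the starting vector could in contrived cases be orthogonal to the leading direction.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

typedef std::vector<double> Point;

// Hypothetical helper (not part of KlustaKwik): returns two candidate
// cluster centers at mean -/+ nSd standard deviations along the first
// principal component of the points.
std::pair<Point, Point> SplitSeedsAlongFirstPC(const std::vector<Point>& pts,
                                               double nSd) {
    if (pts.size() < 2) throw std::invalid_argument("need at least two points");
    const std::size_t dim = pts[0].size();
    const double n = static_cast<double>(pts.size());

    // Mean of the cluster.
    Point mean(dim, 0.0);
    for (const Point& p : pts)
        for (std::size_t d = 0; d < dim; ++d) mean[d] += p[d] / n;

    // Sample covariance matrix.
    std::vector<Point> cov(dim, Point(dim, 0.0));
    for (const Point& p : pts)
        for (std::size_t i = 0; i < dim; ++i)
            for (std::size_t j = 0; j < dim; ++j)
                cov[i][j] += (p[i] - mean[i]) * (p[j] - mean[j]) / (n - 1.0);

    // Power iteration for the leading eigenvector; the start vector is
    // deliberately asymmetric so it is unlikely to be orthogonal to it.
    Point pc(dim);
    for (std::size_t d = 0; d < dim; ++d) pc[d] = 1.0 + static_cast<double>(d);
    double eigenvalue = 0.0;
    for (int iter = 0; iter < 200; ++iter) {
        Point next(dim, 0.0);
        for (std::size_t i = 0; i < dim; ++i)
            for (std::size_t j = 0; j < dim; ++j) next[i] += cov[i][j] * pc[j];
        double norm = 0.0;
        for (std::size_t i = 0; i < dim; ++i) norm += next[i] * next[i];
        norm = std::sqrt(norm);
        if (norm == 0.0) { eigenvalue = 0.0; break; }  // no variance at all
        for (std::size_t i = 0; i < dim; ++i) pc[i] = next[i] / norm;
        eigenvalue = norm;  // variance along the current direction estimate
    }

    // Place the seeds symmetrically about the mean along the first PC.
    const double sd = std::sqrt(eigenvalue);
    Point lo(dim), hi(dim);
    for (std::size_t d = 0; d < dim; ++d) {
        lo[d] = mean[d] - nSd * sd * pc[d];
        hi[d] = mean[d] + nSd * sd * pc[d];
    }
    return std::make_pair(lo, hi);
}
```

The two returned points would serve only as starting centers; whether such seeds actually split better than a random start is the open question raised in the message.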
From: Peter N. S. <pet...@st...> - 2011-02-26 04:18:46
|
Hi Michael, I was just looking at this a bit in the repository available at: http://klustakwik.git.sourceforge.net/git/gitweb-index.cgi I've also never heard of a 2.0, and there wasn't such a tag in the repository. Perhaps it was done outside the project itself. In the old release system, a release could have a name as well as a number. Looking at the dates, I think Subsetter was just the name for the 1.8 release. It was just after this that I decided to start on the Java port. The changes made to the C++ version along the 1.8 branch had been made for a number of reasons. Mostly to allow use of the core computing functionality in a more flexible way (e.g. not having to have features in a file named .fet.1, not having to set binary flags for the features to use, etc) as well as to try and clarify the structure of the code. Since we use primarily Java in the lab, I just decided it would be easier to have a Java version that would fit in with our other code and be more readily callable from it. There were a number of other perceived advantages that also motivated this: Easy use of the JUnit testing framework to have explicit tests for things like the M-step working correctly, even after changes are made; binary multi-platform compatibility (don't need to recompile can just run it from the jar on any system with a JVM); and easier programming (I think I program ten times more quickly in Java than in C++, though I am expert in both, likely due to the simpler syntax of Java and only one file for each class, instead of a .h and .cpp that have to be kept in sync). In terms of performance, the Java version runs perhaps 10-30% longer than the C++ version, really not bad at all. We've used Java on a number of clustering systems, the commands queue up just fine. 
Another reason for this move was our desire to migrate toward using Hadoop (http://hadoop.apache.org/), which permits loosely coupled cluster computing on disparate hardware, and this is more easily done with Java (though can also be done with C++). The Java port was really a background project, way down in priority below grants and papers, and so I just released R1.0 last week. Your testing and comments on it would certainly be most welcome if you'd like to give it a spin. cheers, Peter On Feb 25, 2011, at 2:51 AM, Michael Zugaro wrote: > Hi Peter, > > Actually, I am not even sure versions 1.8 and 2.0 are actually different. > Also, there is the 'subsetter' version which may or may not be the same as > these. Finally, it's the first time I've heard of a Java port... Why was > KlustaKwik ported to Java (I could guess 'platform independence', but > gui-less C/C++ is portable)? How efficient is the Java code compared to > C/C++? How easy is it to have jobs managed by a queuing system? > > Cheers, > > -- > Michaël Zugaro > CNRS - Collège de France, LPPA > 11, place Marcelin Berthelot > 75005 Paris > Tel (33) 1 44 27 12 93 > Fax (33) 1 44 27 13 82 -- Peter N. Steinmetz, M.D.,Ph.D. Program Director, Neuroengineering Barrow Neurological Institute Pet...@st... 602-406-3258 http://steinmetz.org/peter |
From: Michael Z. <mic...@co...> - 2011-02-25 09:52:13
|
Hi Peter, Actually, I am not even sure versions 1.8 and 2.0 are actually different. Also, there is the 'subsetter' version which may or may not be the same as these. Finally, it's the first time I've heard of a Java port... Why was KlustaKwik ported to Java (I could guess 'platform independence', but gui-less C/C++ is portable)? How efficient is the Java code compared to C/C++? How easy is it to have jobs managed by a queuing system? Cheers, -- Michaël Zugaro CNRS - Collège de France, LPPA 11, place Marcelin Berthelot 75005 Paris Tel (33) 1 44 27 12 93 Fax (33) 1 44 27 13 82 > Hi Michael, > > Thanks for the update. I've committed the same change in the jKlustaKwik > version. > > As you are working on the R1.5 C++ branch, let me know if you'd like any > assistance with check in and check out of the source code from the git > repository. I'd never heard of a version 2 and if we can keep things > checked in, it may flow more smoothly for the next person or group in the > future. > > cheers, > Peter > > On Feb 24, 2011, at 4:35 AM, Michael Zugaro wrote: > > Hi Peter and Ken, > > > > Yes, that was my concern too. Some versions of KlustaKwik had > > subdirectories (linux, mac, windows) and others didn't. But there were > > many versions floating around (some with a number, e.g. 2.0, others with > > a suffix, e.g. Subsetter), and nobody seemed to remember which was the > > latest. > > > > Anyway, although debugging the recurrent problems we and others were > > experiencing did take some effort, in the end the fix turned out to be > > *very* simple. The problem arises when KlustaKwik tries to split a > > cluster, ends up with all points in the same cluster, then later aborts > > because splitting should have produced two clusters. 
> > > > To correct this, it is sufficient to replace the following code (on line > > 700 of KlustaKwik 1.5, but this may vary depending on the version): > > > > if(SplitScore<UnsplitScore) { > > > > with: > > > > if(K2.nClustersAlive<2) Output("Split failed - leaving alone\n"); > > if(SplitScore<UnsplitScore&K2.nClustersAlive>=2) { > > > > That's it! > > > > Michaël > > > > -- > > Michaël Zugaro > > CNRS - Collège de France, LPPA > > 11, place Marcelin Berthelot > > 75005 Paris > > Tel (33) 1 44 27 12 93 > > Fax (33) 1 44 27 13 82 |
From: Peter N. S. <pet...@st...> - 2011-02-24 21:49:34
|
Hi Michael, Thanks for the update. I've committed the same change in the jKlustaKwik version. As you are working on the R1.5 C++ branch, let me know if you'd like any assistance with check in and check out of the source code from the git repository. I'd never heard of a version 2 and if we can keep things checked in, it may flow more smoothly for the next person or group in the future. cheers, Peter On Feb 24, 2011, at 4:35 AM, Michael Zugaro wrote: > > Hi Peter and Ken, > > Yes, that was my concern too. Some versions of KlustaKwik had subdirectories > (linux, mac, windows) and others didn't. But there were many versions > floating around (some with a number, e.g. 2.0, others with a suffix, > e.g. Subsetter), and nobody seemed to remember which was the latest. > > Anyway, although debugging the recurrent problems we and others were > experiencing did take some effort, in the end the fix turned out to be *very* > simple. The problem arises when KlustaKwik tries to split a cluster, ends up > with all points in the same cluster, then later aborts because splitting > should have produced two clusters. > > To correct this, it is sufficient to replace the following code (on line > 700 of KlustaKwik 1.5, but this may vary depending on the version): > > if(SplitScore<UnsplitScore) { > > with: > > if(K2.nClustersAlive<2) Output("Split failed - leaving alone\n"); > if(SplitScore<UnsplitScore&K2.nClustersAlive>=2) { > > That's it! > > Michaël > > -- > Michaël Zugaro > CNRS - Collège de France, LPPA > 11, place Marcelin Berthelot > 75005 Paris > Tel (33) 1 44 27 12 93 > Fax (33) 1 44 27 13 82 |
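The guard in the quoted fix can be sketched in isolation. This is a minimal illustration rather than the KlustaKwik source: `TrialSplit` and `AcceptSplit` are hypothetical names, and only `nClustersAlive` and the score comparison are taken from the message. The sketch uses `&&` where the posted line uses `&`; the result is the same because the relational comparisons bind more tightly than `&`.

```cpp
// Hypothetical stand-in for the outcome of a trial split. Only the
// nClustersAlive field name comes from the message; the rest is assumed.
struct TrialSplit {
    int nClustersAlive;  // clusters still alive after the trial run
    double score;        // score of the split solution (lower is better)
};

// Accept the split only if it really produced two live clusters and
// its score beats the score of the unsplit cluster.
bool AcceptSplit(const TrialSplit& split, double unsplitScore) {
    if (split.nClustersAlive < 2) {
        return false;  // corresponds to "Split failed - leaving alone"
    }
    return split.score < unsplitScore;
}
```

With this check, a trial that collapses back into a single cluster is rejected up front instead of reaching code that expects two clusters.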
From: Michael Z. <mic...@co...> - 2011-02-24 11:36:09
|
> Hi Ken & Michael, > > In terms of your question: > > Does the latest C++ version compile on a Mac? (Warnings are one thing, > > errors another...) > > Yes, it compiles and runs, it just throws a whole bunch of these warnings, > mostly about const char * > > In terms of the bug which was fixed, can you indicate which lines or files > were affected? I'd like to make sure the java version incorporates any fix, > if needed. > > thanks, > Peter Hi Peter and Ken, Yes, that was my concern too. Some versions of KlustaKwik had subdirectories (linux, mac, windows) and others didn't. But there were many versions floating around (some with a number, e.g. 2.0, others with a suffix, e.g. Subsetter), and nobody seemed to remember which was the latest. Anyway, although debugging the recurrent problems we and others were experiencing did take some effort, in the end the fix turned out to be *very* simple. The problem arises when KlustaKwik tries to split a cluster, ends up with all points in the same cluster, then later aborts because splitting should have produced two clusters. To correct this, it is sufficient to replace the following code (on line 700 of KlustaKwik 1.5, but this may vary depending on the version): if(SplitScore<UnsplitScore) { with: if(K2.nClustersAlive<2) Output("Split failed - leaving alone\n"); if(SplitScore<UnsplitScore&K2.nClustersAlive>=2) { That's it! Michaël -- Michaël Zugaro CNRS - Collège de France, LPPA 11, place Marcelin Berthelot 75005 Paris Tel (33) 1 44 27 12 93 Fax (33) 1 44 27 13 82 |
From: Peter N. S. <pet...@st...> - 2011-02-23 17:01:07
|
Hi Ken & Michael, In terms of your question: > Does the latest C++ version compile on a Mac? (Warnings are one thing, errors another...) Yes, it compiles and runs, it just throws a whole bunch of these warnings, mostly about const char * In terms of the bug which was fixed, can you indicate which lines or files were affected? I'd like to make sure the java version incorporates any fix, if needed. thanks, Peter |
From: Harris, K. D <ken...@im...> - 2011-02-23 15:06:06
|
Hi Peter, The reason for the update was that Michael Zugaro (CC'ed) found and fixed a bug. I'm afraid the versions have got in a bit of a mess. When Michael found the bug, he asked me what was the latest version and I had to admit I didn't know. We are going to be working on it here soon (trying to get it to work for large channel counts). We will only work on the C++ version, and will do our best to avoid more version forks. Does the latest C++ version compile on a Mac? (Warnings are one thing, errors another...) All the best, Kenneth. -----Original Message----- From: Peter Nathan Steinmetz [mailto:pet...@st...] Sent: 20 February 2011 20:23 To: klu...@li... Subject: [Klustakwik-develop] could always reverse master/ warnings Hi All, I suppose we could always change the master branch to be the C++ development line and branch the java line, just a matter of the branch you checkout and work on. BTW, I just tried compiling the C++ version here on a Mac and it throws a log of warnings about a redefinition of M_PI and a lot about conversions from a string constant to char* (in general, variables that are string constants should be const char* to avoid this). cheers, Peter |
From: Peter N. S. <pet...@st...> - 2011-02-23 00:16:58
|
Hi All, Just put up a rev 1.0 for jKlustaKwik. At least it is now sorting the test case properly. Any further testing and bashing on would be much appreciated. Note for Ken: I changed the copyright claim in the readme for the Java version to be me, as it is mostly a big rewrite, but if you'd rather I put it back to you, that is fine also. Peter |
From: Peter N. S. <pet...@st...> - 2011-02-20 20:23:20
|
Hi All, I suppose we could always change the master branch to be the C++ development line and branch the java line, just a matter of the branch you check out and work on. BTW, I just tried compiling the C++ version here on a Mac and it throws a lot of warnings about a redefinition of M_PI and a lot about conversions from a string constant to char* (in general, variables that are string constants should be const char* to avoid this). cheers, Peter |
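Both kinds of warning have conventional fixes, sketched below. The function names (`DefaultFileBase`, `NameLength`, `CircleArea`) and the literal `"tetrode"` are hypothetical and chosen only for illustration; they are not taken from the KlustaKwik sources.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>

// Define M_PI only if the math header has not already provided it,
// which avoids the redefinition warning.
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// String literals are read-only, so anything that holds or receives one
// should be declared const char* rather than char*.
const char* DefaultFileBase() {
    return "tetrode";  // hypothetical default name
}

std::size_t NameLength(const char* name) {
    return std::strlen(name);
}

// Small use of M_PI so the guarded definition is exercised.
double CircleArea(double radius) {
    return M_PI * radius * radius;
}
```

Changing a parameter from `char*` to `const char*` is source-compatible for callers that pass literals, so this kind of cleanup can usually be done one function at a time.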
From: Peter N. S. <pet...@st...> - 2011-02-20 20:15:25
|
Hi All, Just made a set of branches in the git source code repository. The code for the 2.01 release is quite close to the older R1.5, so I created a R1.5 branch and a R2.01 release tag so we can always get this code back. I would suggest that further development of the simplified C++ version take place on the R1.5 branch. Java development can remain on the master branch. cheers, Peter |
From: Peter N. S. <pet...@st...> - 2011-02-20 04:55:53
|
Hi All, Just turned off subversion source code control after converting source repository to git. Peter |
From: Peter N. S. <pet...@st...> - 2011-02-20 01:46:51
|
Hi All, I just noticed the check-in of the new files, but these aren't in the source repository. They also look like older sources, if memory serves. How did we want to have this set up? cheers, Peter -- Peter N. Steinmetz, M.D.,Ph.D. Program Director, Neuroengineering Barrow Neurological Institute Pet...@st... 602-406-3258 http://steinmetz.org/peter |
From: Kenneth D. H. <kdh...@ru...> - 2009-06-22 17:15:29
|
Hi Peter, I think this was because most of the data sets wound up being split into at least 20 clusters, even if there were only 2 real cells in the recording. So if we started with 2 it just split them up so much anyway. On this subject: one thing that many people have found useful is to use ellipsoidal t-distributions instead of Gaussians for the cluster models (best ref I know for this is Shy Shoham et al). I hear this leads to a big reduction in overclustering, because a lot of the overclustering occurs because it is trying to fit Gaussians to distributions that are not truly Gaussian ... In this case, starting with 2 to 10 clusters might actually work. It should be fairly simple to incorporate into KK ... -----Original Message----- From: Peter N. Steinmetz [mailto:Pet...@st...] Sent: Thursday, June 18, 2009 8:37 PM To: klu...@li... Subject: [Klustakwik-develop] why 20-30 clusters? Does anyone know why a MinClusters of 20 and MaxClusters of 30 was introduced as the default values? This is now the setting in the SubSetter release, which incorporates the initial subset code from one of Ken's versions. Should this be kept or changed back to 2-10, which is what it was in the initial release? cheers, Peter -- Peter N. Steinmetz, M.D.,Ph.D. Program Director, Neuroengineering Barrow Neurological Institute Pet...@st... 602-406-3258 http://steinmetz.org/peter |
From: Peter N. S. <Pet...@st...> - 2009-06-19 02:39:54
|
Does anyone know why a MinClusters of 20 and MaxClusters of 30 was introduced as the default values? This is now the setting in the SubSetter release, which incorporates the initial subset code from one of Ken's versions. Should this be kept or changed back to 2-10, which is what it was in the initial release? cheers, Peter -- Peter N. Steinmetz, M.D.,Ph.D. Program Director, Neuroengineering Barrow Neurological Institute Pet...@st... 602-406-3258 http://steinmetz.org/peter |
From: Peter N. S. <Pet...@st...> - 2008-12-17 04:35:54
|
I just checked in a Java version of KlustaKwik as the jKlustaKwik directory of the source tree. This initial version includes the option for sorting of a random subset of the data, resulting in significant speed improvement. In general, this runs in about 130% of the time of an optimized C++ compilation, but is multi-platform. Options similar to the C++ command are available as the jKlustaKwik.LegacyCmd class. I anticipate adding another command front end, which doesn't require specific file naming conventions, in the near future. Other changes needed are a change of all the logging and output to use Log4J, as well as modifications of the build.xml file, to incorporate jars from more generic locations. cheers, Peter |
From: Peter N. S. <Pet...@st...> - 2008-12-09 00:54:26
|
Hi All, OK, I just released two C++ versions. R1-8 along the main line, permitting saves of state and with the refactorings. The Subsetter release has Ken's code for subsets of points, branched from R1-5. I changed the source code management to SVN, for the future, and incorporated previous CVS history into it. I'm going to soon release a new package under this project, jKlustaKwik. This is a java version of KlustaKwik, incorporating features from R1-7 and Subsetter. This seems to run at about 130% of the time for the C++ version and is completely multi-platform. I also find it easier to use unit testing and refactoring from Java, and so will confine my future efforts to this java version. cheers, Peter |
From: Kenneth D. H. <kdh...@ru...> - 2008-12-01 19:15:40
|
Cluster 0 corresponds to a uniform distribution. First the data is all normalized to lie in the range 0 to 1. The probability model is a mixture of a variable number of Gaussians and a non-negotiable uniform distribution (cluster 0). This way, any crazy outliers will be assigned to cluster 0 rather than one of the Gaussians (which would mess them up). Cluster 0 is not allowed to die, because any point has to be assignable to cluster 0 at any time. Ken. -----Original Message----- From: Peter N. Steinmetz [mailto:Pet...@st...] Sent: Saturday, November 29, 2008 7:18 PM To: klu...@li... Subject: [Klustakwik-develop] what happens with cluster 0 Can someone (perhaps Ken) let me know what is going on with the noise cluster? I can't understand exactly how this is used, even back in rev 1.5. The description for nClustersAlive in KlustaKwik.h there says it is the number of clusters alive, excluding the noise cluster Yet the code for Reindex, explicitly sets AliveIndex[0] = 0 and sets nClustersAlive=1, as though the cluster 0 is counted and is alive, and is indexed by 0. Then the m-step iterates the cc index variable from 0 only to nClustersAlive-1 (inclusive), which then acts like cluster 0 is considered alive, but supposedly nClustersAlive didn't count the noise cluster, cluster 0. The output cluster file code talks about needing to add 1 to the cluster numbers in the array when writing, as though 0 was the value in Class[j] if the point j was considered to belong to class 1. So how does a point ever belong to the noise cluster? If so, what would it's entry in Class be? 
thanks, Peter |
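Ken's description of cluster 0 can be sketched as a classification step. This is an illustration under stated assumptions, not the KlustaKwik code: `DiagGaussian`, `LogGaussian` and `Classify` are hypothetical names, the Gaussians use diagonal covariances purely to keep the example short, and the mixing weights are passed in rather than estimated. Because the data are normalized to the range 0 to 1, the uniform density is 1 everywhere, so the noise cluster's log score is simply the log of its weight.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified Gaussian component with a diagonal covariance (an assumption
// made for brevity in this sketch).
struct DiagGaussian {
    double weight;             // mixing weight of the component
    std::vector<double> mean;  // per-dimension mean
    std::vector<double> var;   // per-dimension variance
};

// log(weight * density) of point x under one Gaussian component.
double LogGaussian(const DiagGaussian& g, const std::vector<double>& x) {
    const double kTwoPi = 6.283185307179586;
    double logp = std::log(g.weight);
    for (std::size_t d = 0; d < x.size(); ++d) {
        const double diff = x[d] - g.mean[d];
        logp += -0.5 * std::log(kTwoPi * g.var[d])
                - 0.5 * diff * diff / g.var[d];
    }
    return logp;
}

// Returns 0 for the uniform noise cluster, or i + 1 for gaussians[i].
// Assumes x has already been normalized to lie in [0, 1] per dimension.
int Classify(const std::vector<double>& x, double noiseWeight,
             const std::vector<DiagGaussian>& gaussians) {
    int best = 0;
    double bestLogP = std::log(noiseWeight);  // uniform density is 1
    for (std::size_t i = 0; i < gaussians.size(); ++i) {
        const double logp = LogGaussian(gaussians[i], x);
        if (logp > bestLogP) {
            bestLogP = logp;
            best = static_cast<int>(i) + 1;
        }
    }
    return best;
}
```

A point far from every Gaussian gets a very low Gaussian log score, while the noise cluster's score is constant, so outliers fall into cluster 0 without distorting the Gaussian fits.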
From: Peter N. S. <Pet...@st...> - 2008-11-30 00:17:57
|
Can someone (perhaps Ken) let me know what is going on with the noise cluster? I can't understand exactly how this is used, even back in rev 1.5. The description for nClustersAlive in KlustaKwik.h there says it is the number of clusters alive, excluding the noise cluster. Yet the code for Reindex explicitly sets AliveIndex[0] = 0 and sets nClustersAlive=1, as though cluster 0 is counted and is alive, and is indexed by 0. Then the M-step iterates the cc index variable from 0 only to nClustersAlive-1 (inclusive), which acts as though cluster 0 is considered alive, even though nClustersAlive supposedly doesn't count the noise cluster, cluster 0. The output cluster file code talks about needing to add 1 to the cluster numbers in the array when writing, as though 0 was the value in Class[j] if the point j was considered to belong to class 1. So how does a point ever belong to the noise cluster? If it does, what would its entry in Class be? thanks, Peter |
From: <kdh...@an...> - 2008-11-29 22:28:26
|
Hi, I have a modification that makes it run up to 10 times faster, which I wrote years ago and have unfortunately not got round to releasing. It works through a -Subset option, which uses a subset of points to fit the Gaussians, then classifies all points accordingly. The reason it was never released is that I had written it on top of my last code version (1.5, I think), and so it forked the old code. I'm afraid I don't have time to merge them, but I have attached the code. Ken. > Hi All, > > I know there hasn't been much work lately, but I've decided to abandon > the main source tree with the extensive refactorings and prepare a > release to bring the header files and using std:: items up to date. > > The 1.7 release source no longer compiles out of the box on ubuntu and > some other linuxes. > > I've created a branch, named StdlibHeaders, that appears to correct > these problems, and is based on the 1.7 release version. > > Several questions: > > 1. Can Ken merge the kdharris branch into this? > > 2. Do we want to provide the binaries, if so, I would need a windows > build. > > 3. Shall we delete the copies of the test data from the main directory > and leave it in the test subdir? > > cheers, > Peter > > > -- > Peter N. Steinmetz, M.D.,Ph.D. > Program Director, Neuroengineering > Barrow Neurological Institute > Pet...@st... > 602-406-3258 > http://steinmetz.org/peter |
From: Peter N. S. <Pet...@st...> - 2008-11-29 04:53:14
|
Hi All, I know there hasn't been much work lately, but I've decided to abandon the main source tree with the extensive refactorings and prepare a release to bring the header files and using std:: items up to date. The 1.7 release source no longer compiles out of the box on ubuntu and some other linuxes. I've created a branch, named StdlibHeaders, that appears to correct these problems, and is based on the 1.7 release version. Several questions: 1. Can Ken merge the kdharris branch into this? 2. Do we want to provide the binaries, if so, I would need a windows build. 3. Shall we delete the copies of the test data from the main directory and leave it in the test subdir? cheers, Peter -- Peter N. Steinmetz, M.D.,Ph.D. Program Director, Neuroengineering Barrow Neurological Institute Pet...@st... 602-406-3258 http://steinmetz.org/peter |
From: Kenneth D. H. <kdh...@an...> - 2005-09-07 13:07:34
|
Hi everyone I am going to try to speed up KlustaKwik using the k-means subclustering trick of Kleinfeld et al. I don't have much time to devote to this. I also haven't looked at the code in a long time, and I don't know what changes people have made since the last version I worked on (1.6). My current plan is to edit version 1.6 directly; I don't have time to learn what changes have been made since then, but I'll try to make the changes as modular as possible, and leave incorporating them with other people's recent edits to someone else. All the best, Ken. |
From: Ken H. <kdh...@an...> - 2004-10-19 21:44:39
|
The way I understood it was that k-means is a quick shortcut to an approximate solution, which is then used as an initialization for CEM. This is not what I originally had in mind, which was to do what Kleinfeld, Fee and Mitra do, i.e. first run k-means to produce a large number of clusters containing a small number of points (say 10); then treat each of these clusters as a single point for CEM. However, if it saves time without lowering performance, then that's good, however it's done. I would make one further suggestion. For k-means you want approximately spherical clusters. Since all clusters have vaguely similar shapes, this can be done with a linear transformation. The question is which one. I saw a poster a few years ago where they did something like this with a Hadamard transform (a specific matrix of -1s and 1s). This might be an idea - Hadamard transform the data before running k-means, then run CEM on the original, untransformed data. Ken. -----Original Message----- From: klu...@li... [mailto:klu...@li...] On Behalf Of Chris Thorp Sent: Thursday, October 14, 2004 11:54 PM To: Peter N.Steinmetz Cc: klu...@li... Subject: Re: [Klustakwik-develop] should k-means go in a new release? Peter, The k-means will only reduce the number of clusters going into CEM if it is poorly initialized. The reduction in cluster number is because of the deletion of clusters with only a small number of points contained within them during CEM. This results in a decrease in the final score because in almost all trials, exactly two clusters were produced at the end of CEM. If the k-means clusters are initialized with random points from the data set (Forgy) instead of truly random starting clusters, the result is an overall slowdown (though the result is marginally better). -Chris Peter N.Steinmetz wrote: > Chris T. has been looking at initialization and performance and it > seems like k-means will reduce the number of starting clusters going > into the CEM steps. And that is the main thing responsible for speed-up. |
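For reference, the Hadamard transform mentioned in this message can be applied to a feature vector with the standard fast Walsh-Hadamard butterfly, sketched below. This shows only the transform itself; `HadamardTransform` is a hypothetical name, the version shown is unnormalized, and it assumes the vector length is a power of two (a feature vector of another length would need padding, which is not addressed here). Whether the transform actually makes the clusters more spherical for k-means is the open suggestion in the message, not something this sketch demonstrates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// In-place, unnormalized fast Walsh-Hadamard transform. Equivalent to
// multiplying v by the n x n Hadamard matrix of +1 and -1 entries.
void HadamardTransform(std::vector<double>& v) {
    const std::size_t n = v.size();
    if (n == 0 || (n & (n - 1)) != 0) {
        throw std::invalid_argument("length must be a power of two");
    }
    // Butterfly stages: combine elements that are `half` apart.
    for (std::size_t half = 1; half < n; half *= 2) {
        for (std::size_t start = 0; start < n; start += 2 * half) {
            for (std::size_t i = start; i < start + half; ++i) {
                const double a = v[i];
                const double b = v[i + half];
                v[i] = a + b;
                v[i + half] = a - b;
            }
        }
    }
}
```

Applying the transform twice returns the original vector scaled by its length, so the data passed to CEM can stay untransformed, as suggested, simply by keeping a copy.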
From: Chris T. <th...@sp...> - 2004-10-15 04:02:22
|
Peter, The k-means will only reduce the number of clusters going into CEM if it is poorly initialized. The reduction in cluster number is because of the deletion of clusters with only a small number of points contained within them during CEM. This results in a decrease in the final score because in almost all trials, exactly two clusters were produced at the end of CEM. If the k-means clusters are initialized with random points from the data set (Forgy) instead of truly random starting clusters, the result is an overall slowdown (though the result is marginally better). -Chris Peter N.Steinmetz wrote: > Chris T. has been looking at initialization and performance and it > seems like k-means will reduce the number of starting clusters going > into the CEM steps. And that is the main thing responsible for speed-up. |
From: Peter N. S. <pe...@tc...> - 2004-10-14 16:28:17
|
Hi All, I'm getting ready to make a new KlustaKwik release (for Neuroscience). I'm wondering if the latest k-means option should be included? Chris T. has been looking at initialization and performance and it seems like k-means will reduce the number of starting clusters going into the CEM steps. And that is the main thing responsible for speed-up. cheers, Peter -- Peter N. Steinmetz, M.D.,Ph.D. Asst. Professor, Biomedical Eng., U. of Minnesota 612.624.7158 office 801.409.1839 fax pe...@tc... http://www.tc.umn.edu/~peter |