From: <fa...@my...> - 2007-12-27 14:05:09
Dick,

Prof. Ultsch has developed a method for automated clustering, see
http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/2005/ultsch05clustering
http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/2005/ultsch05ustarc

If you contact him directly, he might be willing to share the MATLAB code with you. I know that a student is working on integrating the method into the ESOM tools, but I don't know if and when it will be available.

For comparing clusterings or evaluating them against known results, I recommend looking up measures for comparing classifications, for example precision/recall or sensitivity/specificity. You can average these measures over multiple clustering runs. Please be more specific if these general methods do not suffice.

best
fabian

Quoting Greg Dick <gd...@be...>:

> Dear ESOM users,
>
> I am using ESOM to cluster DNA sequences from environmental
> microorganisms, based on genome-wide signatures (tetranucleotide
> frequency). Overall I am very happy with the results, and it has
> proven to be an extremely valuable tool for our research group.
> There are two areas that we are hoping to develop further, and I am
> curious if anyone has suggestions or comments:
>
> (1) Are there any automated methods for clustering data? The
> boundaries for our clusters range from obvious to questionable.
> While this variable strength of clustering is useful information in
> itself, we would like to develop an automated method for defining
> clusters in order to avoid potential errors in where we draw the
> lines (it is not always entirely clear how to do so).
>
> (2) Are there statistical tools that have been developed or applied
> to ESOM to evaluate the robustness of clustering (ideally on a
> per-cluster basis)? We are interested in such an analysis, which
> would be based on the U-matrix distance structure and/or an
> evaluation of the accuracy of the clustering (for much of our data
> we know the true cluster affiliations).
>
> Any suggestions or references relevant to these areas would be
> greatly appreciated.
>
> Greg Dick
> Postdoctoral Research
> University of California, Berkeley
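
As a rough illustration of the evaluation suggested in the reply (per-cluster precision/recall, averaged over several clustering runs), here is a minimal Python sketch. It is not part of the ESOM tools or of Prof. Ultsch's code. The function names are hypothetical, and it assumes that each cluster has already been given the name of the true class it is meant to represent, so that predicted and true labels can be compared directly.

```python
from collections import defaultdict


def per_cluster_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label that occurs.

    Assumption: cluster names have already been matched to true class names.
    """
    true_pos = defaultdict(int)    # points correctly assigned to a label
    pred_count = defaultdict(int)  # points assigned to a label by the clustering
    true_count = defaultdict(int)  # points that truly belong to a label
    for t, p in zip(true_labels, predicted_labels):
        true_count[t] += 1
        pred_count[p] += 1
        if t == p:
            true_pos[t] += 1

    scores = {}
    for label in set(true_count) | set(pred_count):
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def average_over_runs(true_labels, runs):
    """Average per-cluster precision and recall over several clustering runs.

    A label that does not occur at all in some run contributes 0 for that run.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for predicted_labels in runs:
        run_scores = per_cluster_precision_recall(true_labels, predicted_labels)
        for label, (precision, recall) in run_scores.items():
            totals[label][0] += precision
            totals[label][1] += recall
    n_runs = len(runs)
    return {label: (p / n_runs, r / n_runs) for label, (p, r) in totals.items()}
```

Calling average_over_runs(true_labels, [run1, run2, ...]) with one list of predicted labels per run gives one (precision, recall) pair per cluster. A cluster whose scores stay high across runs is more robust than one whose scores fluctuate, which addresses the per-cluster part of question (2) for data with known affiliations.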