From: Greg D. <gd...@be...> - 2007-12-20 23:00:37
Dear ESOM users,

I am using ESOM to cluster DNA sequences from environmental microorganisms, based on genome wide signatures (tetranucleotide frequency). Overall I am very happy with the results and it has proven to be an extremely valuable tool for our research group. There are two areas that we are hoping to develop further and I am curious if anyone has suggestions or comments:

(1) Are there any automated methods for clustering data? The boundaries for our clusters range from obvious to questionable. While this variable strength of clustering is useful information in itself, we would like to develop an automated method for defining clusters in order to avoid potential errors in where we draw the lines (it is not always entirely clear how to do so).

(2) Are there statistical tools that have been developed or applied to ESOM to evaluate the robustness of clustering (ideally on a per-cluster basis)? We are interested in such an analysis, which would either be based on the U-matrix distance structure and/or an evaluation of the accuracy of the clustering (for much of our data we know the true cluster affiliations).

Any suggestions or references relevant to these areas would be greatly appreciated.

Greg Dick
Postdoctoral Research
University of California, Berkeley