[Databionic-ESOM-User] automated data clustering?

Dear ESOM users,

I am using ESOM to cluster DNA sequences from environmental
microorganisms, based on genome-wide signatures (tetranucleotide
frequency).  Overall I am very happy with the results, and it has proven
to be an extremely valuable tool for our research group.  There are two
areas that we are hoping to develop further, and I am curious whether
anyone has suggestions or comments:

(1) Are there any automated methods for clustering data?  The boundaries
for our clusters range from obvious to questionable.  While this
variable strength of clustering is useful information in itself, we
would like to develop an automated method for defining clusters in order
to avoid potential errors in where we draw the lines (it is not always
entirely clear how to do so).

(2) Are there statistical tools that have been developed or applied to
ESOM to evaluate the robustness of clustering (ideally on a per-cluster
basis)?  We are interested in such an analysis, which could be based on
the U-matrix distance structure, on an evaluation of the accuracy of the
clustering, or on both (for much of our data we know the true cluster
affiliations).
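
To make concrete the kind of per-cluster accuracy evaluation meant in
(2), here is a minimal sketch in Python.  It is not part of ESOM; the
function name per_cluster_scores and the toy labels are hypothetical,
and it assumes each sequence already has both a known true affiliation
and an assigned cluster.  For every cluster it reports the majority true
label, the fraction of the cluster carrying that label (purity), and the
fraction of that label's sequences captured by the cluster (recall).

```python
from collections import Counter


def per_cluster_scores(true_labels, cluster_labels):
    """Return {cluster: (majority_true_label, purity, recall)}."""
    # How many sequences carry each true label overall.
    true_totals = Counter(true_labels)

    # For each cluster, count the true labels of its members.
    members = {}
    for true_label, cluster in zip(true_labels, cluster_labels):
        members.setdefault(cluster, Counter())[true_label] += 1

    scores = {}
    for cluster, counts in members.items():
        label, hits = counts.most_common(1)[0]
        purity = hits / sum(counts.values())   # share of cluster that is `label`
        recall = hits / true_totals[label]     # share of `label` in this cluster
        scores[cluster] = (label, purity, recall)
    return scores


# Hypothetical toy data: six sequences, three true groups, three clusters.
true_labels = ["A", "A", "A", "B", "B", "C"]
cluster_labels = [1, 1, 2, 2, 2, 3]

scores = per_cluster_scores(true_labels, cluster_labels)
for cluster, (label, purity, recall) in sorted(scores.items()):
    print(cluster, label, round(purity, 2), round(recall, 2))
```

A cluster with low purity or low recall would be a candidate for a
questionable boundary; this says nothing about the U-matrix itself.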

Any suggestions or references relevant to these areas would be greatly
appreciated.

Greg Dick
Postdoctoral Research
University of California, Berkeley