[Clusterviz-devel] Backporting and handling 0-responsibilities

I merged a couple of patches that had been applied to the RC branch back to
trunk. As a consequence, I had to make sure that the data center
calculation happens at a later point, namely after the data transformation
is done.

Apart from that, I finally settled on a way to handle the case of 0
responsibilities of individual data samples in soft k-means and mixture
models. This problem matters more for higher-dimensional data: there our
usual Euclidean distance amounts to larger relative differences between data
samples and cluster centers, which leads to a more uneven distribution of
responsibilities among the data samples. When the responsibilities of all
clusters for an individual sample sum to a value that gets rounded down to 0,
we run into numerical problems, since the responsibilities can no longer be
normalized.
As this is a non-trivial issue with these algorithms, I decided that we
should at least prompt the user and point them to the issue.
Possibilities in this case include restarting the algorithm and hoping for
the best, or using hard k-means, which does not seem to suffer from this
problem.
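
To illustrate the failure mode, here is a minimal Python sketch. It is not
taken from the Clusterviz code; the function name `responsibilities` and the
stiffness parameter `beta` are hypothetical. It assumes the usual soft
k-means weights exp(-beta * squared distance) and returns None when their
sum underflows to 0, which is the situation in which the user would be
prompted.

```python
import math

def responsibilities(sample, centers, beta):
    """Soft k-means responsibilities of one sample (illustrative sketch).

    Returns None when all unnormalized weights underflow to 0.0, so the
    caller can warn the user instead of dividing by zero.
    """
    weights = []
    for center in centers:
        # Squared Euclidean distance between the sample and this center.
        sq_dist = sum((x - c) ** 2 for x, c in zip(sample, center))
        # math.exp silently returns 0.0 for very negative arguments.
        weights.append(math.exp(-beta * sq_dist))
    total = sum(weights)
    if total == 0.0:
        return None  # the "0 responsibilities" case
    return [w / total for w in weights]

# Two dimensions: distances are small and normalization works.
low_dim = responsibilities([1.0, 1.0], [[0.0, 0.0], [3.0, 3.0]], 1.0)

# 500 dimensions: the same per-coordinate offsets add up to squared
# distances so large that every weight underflows.
high_dim = responsibilities([1.0] * 500, [[0.0] * 500, [3.0] * 500], 2.0)
```

In the second call the squared distances are 500 and 2000, so both weights
underflow and the function reports the problem rather than producing NaNs.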

We might consider implementing algorithmic remedies to deal with this
problem and/or the problem of tiny cluster center standard deviations. For
the latter we could maybe enforce minimal values for the standard deviations.
This might even help in some of the former cases. For the former cases we
could possibly also resort to a technique along the lines of pseudo counts,
but I must say I have not thought this through yet.
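
One possible reading of these two remedies, as a rough Python sketch rather
than a worked-out design: the names `floor_stddev` and
`smoothed_responsibilities` and the constants `MIN_STDDEV` and
`PSEUDO_COUNT` are hypothetical, and their values are arbitrary. The
weights are unnormalized Gaussian-style terms; a real mixture model would
also include mixing weights and normalization constants. "Pseudo counts" is
interpreted here as adding a small constant to every weight before
normalizing.

```python
import math

MIN_STDDEV = 1e-3     # hypothetical lower bound for a cluster's std. deviation
PSEUDO_COUNT = 1e-12  # hypothetical constant added to every weight

def floor_stddev(stddev, minimum=MIN_STDDEV):
    """Enforce a minimal value for a standard deviation."""
    return max(stddev, minimum)

def smoothed_responsibilities(sample, centers, stddevs,
                              pseudo_count=PSEUDO_COUNT):
    """Responsibilities with a stddev floor and pseudo counts (sketch)."""
    weights = []
    for center, stddev in zip(centers, stddevs):
        sigma = floor_stddev(stddev)  # avoids division by zero below
        sq_dist = sum((x - c) ** 2 for x, c in zip(sample, center))
        weight = math.exp(-sq_dist / (2.0 * sigma ** 2))
        # The pseudo count keeps the sum of weights strictly positive.
        weights.append(weight + pseudo_count)
    total = sum(weights)
    return [w / total for w in weights]

# Both raw weights underflow here; the pseudo counts turn the result into
# a uniform assignment instead of a division by zero.
fallback = smoothed_responsibilities(
    [1.0] * 500, [[0.0] * 500, [3.0] * 500], [0.0, 0.5])
```

The trade-off is that a sample which is far from every cluster ends up
shared equally among all of them, which may or may not be the behaviour we
want.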

Feedback highly appreciated - as always!

With this issue settled for the moment, I think the time has come to
release version 0.2 of Clusterviz - at this point still without Karsten's
new additions in trunk. What are your thoughts? Give this RC a test and tell
me if you think there are any issues left.

Kind regards, Jonas.