python-cluster / News: Recent posts

python-cluster release 1.1.2

Fixed bug caused by numpy arrays and KMeansClustering.

The KMeansClustering constructor now accepts an optional function to test for item equality. If the clusters contain numpy arrays, you can pass "numpy.array_equal".
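
As a rough illustration of why a pluggable equality test is useful: a membership check normally relies on "==", which does not give a single True/False answer for every item type (for numpy arrays, "==" compares element by element and returns an array). The sketch below is plain Python and is not the library's code; the helper names "contains" and "approx_equal" are hypothetical and only show the idea of passing an equality function.

```python
import operator


def contains(items, target, equality=None):
    """Return True if any element of items equals target.

    `equality` is an optional two-argument predicate; it defaults to
    the == operator. Hypothetical helper, for illustration only.
    """
    if equality is None:
        equality = operator.eq
    return any(equality(item, target) for item in items)


def approx_equal(a, b, tol=1e-9):
    """Compare two equal-length sequences element by element with a tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


points = [(0.1 + 0.2, 1.0), (2.0, 3.0)]
print(contains(points, (0.3, 1.0)))                # False: 0.1 + 0.2 != 0.3 exactly
print(contains(points, (0.3, 1.0), approx_equal))  # True: custom equality is used
```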

Posted by Michel Albert 2013-03-11

Project moved to github

The project is now available on GitHub:
http://www.github.com/exhuma/python-cluster

The news feed and project page will remain on SourceForge for now.

Posted by Michel Albert 2013-03-11

Migrated to SVN

Finally migrated to SVN (https://sourceforge.net/svn/?group_id=170665).

I did not bother to migrate the history; it is not really necessary for such a small project. I will leave CVS access enabled for a while but will take it down eventually. So if you pull from CVS, make sure that you switch to the SVN repo as soon as possible.

Posted by Michel Albert 2007-10-14

Broken Link

A helpful hand discovered a broken link in the source.

Broken link: http://mail.python.org/pipermail/python-list/2004-December/253517.html

Alternative link: http://mail.python.org/pipermail/python-list/2004-December/294990.html

It's not the same post, but it contains similar information. I hope this link will stay.

Posted by Michel Albert 2007-05-24

python-cluster release 1.1.1b1

1.1.1b1
- Applied patch [1535137] (thanks ajaksu)
--> Topology output supported
--> data and raw_data are now properties.

Posted by Michel Albert 2006-08-19

python-cluster release 1.1.0b1

K-Means clustering is now implemented

Posted by Michel Albert 2006-07-12

python-cluster release 1.0.1b3

This is a bug-fix release!
It fixes bug #1516204 that caused the clustering algorithm to raise an exception if an empty list or a list of only one item was supplied as argument.

In addition, I added some unit tests. This makes development and bug-tracking a lot easier.

Posted by Michel Albert 2006-07-06

Typo. Not "ngram" but "cluster"

Eeeks. The last header in the news read "new release for python-ngram". Although that is somewhat true, it really should have read "python-cluster".

Sorry.

Posted by Michel Albert 2006-07-05

Release of python-ngram 1.0.1b2

I finally got around to building the dist-files and uploading them. Enjoy.

You might notice that there are only the source distribution and the Windows binary distribution. I decided to drop the rest because they are, in my opinion, unneeded. They all behave exactly the same as the source distribution anyway. Yes, I could have created the other files as well for convenience, but how hard is it really to type "python setup.py install"? ;)

Posted by Michel Albert 2006-07-05

New version in CVS (1.0.1b2)

I finished a new version today. The hierarchical clustering now works twice(!) as fast. This is possible because the distance matrix that is generated internally is symmetric, so I only need to calculate one half of the possible combinations.
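
The symmetry argument can be sketched as follows. This is a minimal illustration, not the code that ships with the package: the function names "pairwise_distances" and "lookup" and the sample data are assumptions made for the example. Each unordered pair is evaluated once, and a lookup in the "other half" simply swaps the indices.

```python
from itertools import combinations


def pairwise_distances(items, distance):
    """Compute the distance of each unordered pair exactly once.

    Returns a dict keyed by index pairs (i, j) with i < j, i.e. the
    upper triangle of the symmetric distance matrix.
    """
    return {
        (i, j): distance(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    }


def lookup(half_matrix, i, j):
    """Read d(i, j) from the half matrix, using symmetry when i > j."""
    if i == j:
        return 0
    return half_matrix[(i, j) if i < j else (j, i)]


data = [1, 5, 12, 20]
half = pairwise_distances(data, lambda a, b: abs(a - b))
print(len(half))           # 6 stored pairs instead of 16 matrix cells
print(lookup(half, 3, 0))  # 19, read from the entry stored under (0, 3)
```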

More improvement is possible. But I gave up on that today. Too complex.

I also started to work out the details for a K-Means algorithm at the airport this weekend. On paper it looks sensible, and I believe it should work as I wrote it down. Now I only need to beam my scribblings onto the hard disk. And hope it all works ;)

Posted by Michel Albert 2006-07-03

K-Means

Late last night I started having a look into the K-Means algorithm. It seems easy as such. But as I want to keep a general approach, so that one can cluster any kind of object, it becomes a less trivial task.

The problem/assumption of the K-Means algorithm is that the data-elements need to be representable in vector-space. This is something I cannot get around.

It does not look too complicated. I just have to find a general way to calculate the centroid of a set of objects. Maybe this requires that the user supplies a utility function (similar to the distance function for the hierarchical clustering). But I'll try to avoid that.
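
For items that are already plain numeric vectors, the centroid is just the component-wise mean. The following is a minimal sketch under that assumption; the function name "centroid" is made up for the example and is not taken from the package. For arbitrary objects, something like this would have to be supplied by the user.

```python
def centroid(vectors):
    """Return the component-wise mean of equal-length numeric sequences."""
    if not vectors:
        raise ValueError("cannot compute the centroid of an empty set")
    dimensions = len(vectors[0])
    if any(len(v) != dimensions for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    count = len(vectors)
    # zip(*vectors) groups the first components, the second components, ...
    return tuple(sum(component) / count for component in zip(*vectors))


print(centroid([(0, 0), (4, 0), (4, 6), (0, 6)]))  # (2.0, 3.0)
```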

Posted by Michel Albert 2006-06-27

New Release

1.0.1b1 is released.
This now supports different linkage algorithms.
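
The post does not list the linkage methods, so the following is only a sketch of three common ones (single, complete and average linkage). The function name "linkage_distance" and the "method" strings are assumptions for the example, not the package's API. A linkage method decides how the distance between two clusters is derived from the distances between their members.

```python
def linkage_distance(cluster_a, cluster_b, distance, method="single"):
    """Distance between two non-empty clusters under a given linkage method."""
    pairwise = [distance(a, b) for a in cluster_a for b in cluster_b]
    if method == "single":      # closest pair of members
        return min(pairwise)
    if method == "complete":    # farthest pair of members
        return max(pairwise)
    if method == "average":     # mean over all pairs of members
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage method: %r" % (method,))


def absolute_difference(x, y):
    return abs(x - y)


left, right = [1, 2], [6, 9]
print(linkage_distance(left, right, absolute_difference, "single"))    # 4
print(linkage_distance(left, right, absolute_difference, "complete"))  # 8
print(linkage_distance(left, right, absolute_difference, "average"))   # 6.0
```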

It is becoming obvious though that some rethinking is needed soon to implement other clustering algorithms. We will see.

I am hoping that this won't be the case, as I intend to enable the different algorithms first and worry about optimizing later.

Posted by Michel Albert 2006-06-26

Revision Control

I gave up on SVN for now. Somehow SourceForge does not like me. As this module has a pretty simple file structure anyway, I can live with that.

For now it's available in CVS. Information on accessing CVS can be found here: http://sourceforge.net/cvs/?group_id=170665

Posted by Michel Albert 2006-06-25

Organization

Alright. The project home page is up. There's not much on it yet, but it's there so people are no longer presented with an empty directory listing.

I tried to get the project into SVN, but I get some errors. I am still investigating that.

Posted by Michel Albert 2006-06-24

Where do we go from here

The first algorithm is finished. Next I will implement the different methods of calculating the distance between one cluster and another. Once that is done I will implement the other clustering algorithms. This second part could take a while as I don't need it. I would only do it to make this package more complete.

Posted by Michel Albert 2006-06-23

Optimization

I identified two opportunities for optimisation.
- In every iteration during clustering, the matrix is completely regenerated. Instead, when clustering a pair of items, it should only remove those two elements from the list and append the new cluster. This would save an awful lot of operations.
- The distance from A to B is the same as from B to A. That means that the matrix is symmetric. Therefore, we only need to generate and examine half of the matrix. Again, that would be a massive speedup.
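
The first point could look roughly like the sketch below. It is an illustration under stated assumptions, not the package's implementation: the items are plain numbers, clusters are tuples, the cluster distance is single linkage, and the names "cluster_distance" and "agglomerate" are made up. After a merge, only the entries involving the two merged clusters are dropped, and only the distances from the new cluster to the remaining ones are computed.

```python
from itertools import combinations


def cluster_distance(a, b):
    """Single-linkage distance between two clusters of numbers (an assumption)."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(items):
    """Merge the closest pair repeatedly, updating the distances incrementally."""
    clusters = {i: (item,) for i, item in enumerate(items)}
    # Each unordered pair is stored once, keyed by (smaller_id, larger_id).
    distances = {
        (i, j): cluster_distance(clusters[i], clusters[j])
        for i, j in combinations(sorted(clusters), 2)
    }
    next_id = len(clusters)
    merges = []
    while len(clusters) > 1:
        i, j = min(distances, key=distances.get)
        merged = clusters.pop(i) + clusters.pop(j)
        # Drop only the entries that involve the two merged clusters;
        # no distance is recomputed here.
        distances = {
            pair: d for pair, d in distances.items()
            if i not in pair and j not in pair
        }
        # Compute only the distances from the new cluster to the others.
        for other_id, other in clusters.items():
            distances[(other_id, next_id)] = cluster_distance(other, merged)
        clusters[next_id] = merged
        merges.append(merged)
        next_id += 1
    return merges


for step in agglomerate([1, 2, 10, 11, 30]):
    print(step)
```

An empty list or a list with a single item yields no merges at all, because the loop body is never entered.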

Posted by Michel Albert 2006-06-23

Bugs fixed

Good good. It's working.
One last test with a larger data set is currently running. Once that's done and shows proper results, I'll submit new files. Due to the horrid complexity of most clustering algorithms, and because I have not yet worried about optimizing this stuff, it runs terribly slowly on large data sets.

Posted by Michel Albert 2006-06-22

Still not quite done

The crash bug is now solved.
But now something else has cropped up. Somehow the data returned is not quite correct. Fiddling around with that method again makes me shiver. I have tried many times, but somehow I always screw up with that one, although it should be really easy.

Posted by Michel Albert 2006-06-22

Oh bugger

Great. I hammered together a quick webpage for this thing, only to realise that the only way to upload it is via SSH. That means no webpage for this project before I get home. I'm disappointed :|

Oh well. All you need is in the python-docs anyway. If you need to know how to work things, start a Python shell and do this:

from cluster import *
help(HierarchicalClustering)

Posted by Michel Albert 2006-06-22