[Rdkit-discuss] Clustering
From: Chris S. <sw...@ma...> - 2017-06-04 07:08:14
Hi,

I want to cluster around 4 million structures. The RDKit Cookbook (http://www.rdkit.org/docs/Cookbook.html) suggests: “For large sets of molecules (more than 1000-2000), it’s most efficient to use the Butina clustering algorithm”. However, going from a few thousand to several million structures is quite a step up, and I wondered whether anyone has used this algorithm on larger data sets.

As far as I can tell, it is not possible to define the number of clusters. Is this correct?

Cheers,
Chris
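The following sketch illustrates the two points in the question. It shows why millions of structures are a big step up from thousands, and why no cluster-count parameter is involved. It is a minimal pure-Python version of Taylor-Butina clustering written for this note, not RDKit's code. Fingerprints are modeled as Python sets of on-bit indices, and `tanimoto_distance`, `butina_cluster` and `lower_triangle_size` are hypothetical names. In RDKit itself, `rdkit.ML.Cluster.Butina.ClusterData` likewise takes a distance threshold (`distThresh`) rather than a number of clusters. The Cookbook example builds a flattened lower-triangle distance list before calling it.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - Tanimoto similarity for fingerprints stored as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def butina_cluster(fps, dist_thresh):
    """Sketch of Taylor-Butina clustering.

    Returns a list of clusters (tuples of indices), each starting with its
    centroid. The number of clusters is an output: it is determined only by
    dist_thresh, never requested directly.
    """
    n = len(fps)
    # Neighbour lists for every pair within the threshold: this is the O(N^2) step.
    neighbours = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if tanimoto_distance(fps[i], fps[j]) <= dist_thresh:
            neighbours[i].append(j)
            neighbours[j].append(i)
    # Candidate centroids are visited in order of decreasing neighbour count
    # (ties keep index order because sorted() is stable).
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # The centroid claims all of its neighbours not already in a cluster.
        members = [j for j in neighbours[i] if j not in assigned]
        cluster = (i, *members)
        assigned.update(cluster)
        clusters.append(cluster)
    return clusters


def lower_triangle_size(n: int) -> int:
    """Number of pairwise distances in a flattened lower-triangle matrix."""
    return n * (n - 1) // 2


# Toy fingerprints (hypothetical data): two loose groups of bit sets.
fps = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2}),
    frozenset({10, 11}),
    frozenset({10, 11, 12}),
]

print(butina_cluster(fps, 0.4))  # [(0, 1, 2), (3, 4)]
print(butina_cluster(fps, 0.3))  # [(0, 1), (2,), (3,), (4,)]
print(lower_triangle_size(4_000_000))  # 7999998000000
```

With the toy fingerprints, loosening the threshold from 0.3 to 0.4 merges four clusters into two. The count is only ever a consequence of the threshold. The pair count shows the scaling issue. A full lower-triangle distance list for 4 million structures has about 8 × 10^12 entries, which is at least 64 TB at 8 bytes per entry.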