From: Rajarshi G. <rg...@in...> - 2008-02-19 18:45:31
On Feb 19, 2008, at 12:03 PM, Peter Maas wrote:

> Well I copied most of your examples.
> Please find it enclosed.
> I'm running it against our 10 mg stock (>250k structures).

Hmm, I ran it on OS X (2.2 GHz, 1 GB RAM, JDK 1.5) and it processed 277 relatively small molecules in 7 s.

I did some rough testing of my own using a molecule from PubChem (CID 52), which has 55 heavy atoms. It turns out that the polarizability code alone takes nearly a minute to run on it. I tweaked the polarizability calculation so that it now takes 4 s, bringing the total processing time for this molecule down to 4.9 s. The 277-molecule SDF file mentioned above now takes 3.1 s.

However, it still slows down for some large molecules (such as PubChem CID 182), and I suspect the path length calculation could be improved. I'll look at that in a few days. Are your molecules very large?

In any case, the latest improvements are in SVN, so you should sync and recompile. Things should go faster.

> I like to give R a shot clustering it but I'm afraid R also will
> not be up to it.

Well, creating a 250K x 250K distance matrix will bring most machines to their knees unless you have a very large amount of RAM. But you could look at methods such as spectral clustering, which can be more efficient for larger datasets.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
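To put a number on the "very large amount of RAM" remark about a 250K x 250K distance matrix, here is a small back-of-the-envelope sketch. It is not part of any clustering package; the class and method names are made up for illustration. It assumes 8-byte doubles (or 4-byte floats) per distance and compares a full n x n matrix with a condensed upper-triangle layout (n(n-1)/2 entries, the form R's `dist` stores). For n = 250,000 that works out to roughly 500 GB for the full matrix and about 250 GB for the condensed one in double precision.

```java
/**
 * Hypothetical helper: estimates memory needed to hold a pairwise
 * distance matrix for n structures. Illustrative arithmetic only.
 */
public class DistanceMatrixMemory {

    /** Number of entries in a full n x n matrix. */
    static long fullEntries(long n) {
        return n * n;
    }

    /** Number of entries in a condensed matrix (upper triangle, no diagonal). */
    static long condensedEntries(long n) {
        return n * (n - 1) / 2;
    }

    /** Converts a byte count to decimal gigabytes (1 GB = 10^9 bytes). */
    static double toGB(long bytes) {
        return bytes / 1e9;
    }

    public static void main(String[] args) {
        long n = 250_000L; // roughly the size of the compound collection discussed above
        long full = fullEntries(n);
        long condensed = condensedEntries(n);

        System.out.printf("full matrix:      %d entries, %.1f GB as double%n",
                full, toGB(full * Double.BYTES));
        System.out.printf("condensed matrix: %d entries, %.1f GB as double, %.1f GB as float%n",
                condensed, toGB(condensed * Double.BYTES), toGB(condensed * Float.BYTES));

        // A single Java array is indexed by int, so it cannot hold more than
        // about Integer.MAX_VALUE (~2.1 billion) elements; the condensed matrix
        // alone would have to be split across many arrays or kept off-heap.
        System.out.println("fits in one Java array: " + (condensed <= Integer.MAX_VALUE));
    }
}
```

Even in single precision the condensed form needs on the order of 125 GB, which is why methods that avoid materializing all pairwise distances are attractive at this scale.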