From: richard a. <ric...@ya...> - 2009-02-13 16:09:29
--- On Fri, 2/13/09, Andrew Dalke <da...@da...> wrote:

> Does anyone have pointers to a better data set? Ideal would be 5,000
> structures randomly selected from PubChem submissions. I'm going to
> ask them, but even though it's public data I suspect they might
> (rightly) consider the submissions as private.

You might check out the links contained in the following - some of them appear to match your criteria:

http://tinyurl.com/chemdatasets

The terms of use for PubChem data are here, AFAIK:

http://www.ncbi.nlm.nih.gov/About/disclaimer.html

You can easily create your own molecular dataset by saving a PubChem query as XML:

http://tinyurl.com/pugml

I'm not quite sure I understand your concern about public/private. Which specific right do you have in mind? Copyright? Patent? Trade secret?

In MX:

http://code.google.com/p/mx-java/

and cheminfbenchmark:

http://github.com/egonw/cheminfbenchmark/tree/master

I've taken the same approach. It's worked pretty well so far. The nice thing is I can keep a copy of the PubChem XML query that generated the results under revision control, for example:

http://tinyurl.com/chemdata

If I ever need a variant of the dataset (more structures, for example), I just edit the query, submit it to PubChem, and I'm done.

> I've never heard of such a data set, and looking around now I didn't
> find anything. Most of the fingerprint interest I know of is for
> similarity searching, and so not really appropriate.
>
> Andrew
> da...@da...

___________________________________
Richard L. Apodaca
http://depth-first.com Blog
http://metamolecular.com Company

> _______________________________________________
> Cdk-devel mailing list
> Cdk...@li...
> https://lists.sourceforge.net/lists/listinfo/cdk-devel
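
The edit-and-resubmit workflow described in the message (keep the saved query under revision control, change it when a variant of the dataset is needed) could be sketched roughly as below. This is only an illustration: the XML layout, the element names (`query`, `term`, `max-records`) and the helper `make_variant` are hypothetical placeholders, not the actual PubChem query schema, and submitting the result to PubChem is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a saved query file kept under revision control.
# The element names are placeholders, NOT the real PubChem XML schema.
SAVED_QUERY = """<query>
  <database>pccompound</database>
  <term>example search term</term>
  <max-records>5000</max-records>
</query>"""


def make_variant(query_xml: str, max_records: int) -> str:
    """Return a copy of the query text with a different record limit."""
    root = ET.fromstring(query_xml)
    limit = root.find("max-records")
    if limit is None:
        raise ValueError("query has no <max-records> element")
    limit.text = str(max_records)
    # encoding="unicode" makes tostring() return str rather than bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Produce a larger variant; the original query text is left unchanged.
    print(make_variant(SAVED_QUERY, 10000))
```

Because the variant is derived from the stored query rather than edited by hand, both the original and the variant can be committed, and the difference between two datasets shows up as a one-line diff.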