From: Joos K. <jo...@su...> - 2012-02-05 17:34:56
Dear CDK Users,

for my Master Thesis in computer science (Master of Advanced Studies) I have created a simple framework based on the CDK for storing and searching chemical structures in a relational database. The framework can be found on Bitbucket: https://bitbucket.org/kienerj/chemdb/overview

If you are interested, please have a look at it, and if questions arise feel free to ask me. Please also report any bugs you encounter. For information on how to use it, see the wiki: https://bitbucket.org/kienerj/chemdb/wiki/Home

The target audience is users who need to quickly and easily create a database application with chemical structure search capabilities. It is not meant to break any search speed records (it does not), although databases containing 100'000 records can be handled on standard desktop hardware (dual-core, 4 GB RAM).

Technical Info + Features:
- Search is done in the application. The database is only used for storing the data.
- Runs on typical open-source relational databases: MySQL, PostgreSQL, HSQLDB
- Structure formats supported: mol files and SMILES (but you could easily create an additional interface implementation for any other format)
- Import of SDF including the data part

For code examples see: https://bitbucket.org/kienerj/chemdb/wiki/Code%20Examples

For "playing around" I suggest downloading the jar with dependencies. It also contains the needed parts of the CDK (not the full version). Note that the newest CDK versions won't work, as the IMolecule interface is used in the framework. I will need to change that for the next release.

Best Regards,
Joos
From: Egon W. <ego...@gm...> - 2012-02-06 13:18:09
Hi Joos,

On Sun, Feb 5, 2012 at 6:34 PM, Joos Kiener <jo...@su...> wrote:
> for my Master Thesis in computer science (Master of Advanced Studies) I have
> created a simple framework based on the CDK for storing and searching
> chemical structures in a relational database.

How does it optimize substructure searching?

> Note that newest
> CDK versions won't work as IMolecule interface is used in the framework.
> Will need to change that for next release.

CDK 1.4.x will be the stable release for at least the next 6 months... Also, the 'master' branch is far from being frozen and we expect more API changes... just keep that in mind :)

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
From: Joos K. <jo...@su...> - 2012-02-06 14:47:51
Hi Egon,

About substructure search: the basic principle is simple. It first screens based on fingerprints, then uses UIT (the CDK's UniversalIsomorphismTester) for subgraph matching. The user can configure any implementation of IFingerprinter.

"Optimizations":
- You can specify the number of search threads (e.g. threads running UIT), so one can make use of modern multi-core CPUs.
- You can configure it to keep the molecules in memory (in the format used in OrChem, i.e. as Strings).
- You can limit the maximum number of hits.
- Results are immediately available, so the first search hits can already be displayed while the search continues in the background.
- Search can be limited to a list of IDs (ID = primary key column). The idea behind this is that if you search for substructure + property, you limit the substructure search to the structures matching the property value.

Best Regards,
Joos
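
For readers who want to see the shape of the screen-then-verify scheme described above, here is a minimal sketch in plain Java. It is not chemdb's actual code or API: the class `ScreenThenMatch`, the `Entry` type, and the `matcher` parameter are all hypothetical names made up for illustration. Fingerprints are modelled as `java.util.BitSet`, and the expensive subgraph test (UIT in the framework) is passed in as a generic predicate so that the example runs without the CDK on the classpath.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiPredicate;

/** Hypothetical sketch of fingerprint screening followed by exact matching. */
final class ScreenThenMatch {

    /** One stored structure: primary key, fingerprint, and a String form kept in memory. */
    static final class Entry {
        final int id;
        final BitSet fingerprint;
        final String structure;

        Entry(int id, BitSet fingerprint, String structure) {
            this.id = id;
            this.fingerprint = fingerprint;
            this.structure = structure;
        }
    }

    /** Screening: a candidate can only match if it has every bit the query has. */
    static boolean passesScreen(BitSet query, BitSet candidate) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(candidate);      // bits set in the query but not in the candidate
        return missing.isEmpty();
    }

    /**
     * Screens all entries cheaply on the calling thread, then runs the expensive
     * matcher (an assumed stand-in for a subgraph isomorphism test) on a thread pool.
     * allowedIds may be null (no restriction); maxHits caps the result size.
     */
    static List<Integer> search(List<Entry> entries, BitSet queryFp, String query,
                                BiPredicate<String, String> matcher,
                                Set<Integer> allowedIds, int maxHits, int threads) {
        List<Integer> hits = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Entry e : entries) {
            if (allowedIds != null && !allowedIds.contains(e.id)) continue; // ID restriction
            if (!passesScreen(queryFp, e.fingerprint)) continue;            // fingerprint screen
            pool.submit(() -> {
                if (hits.size() >= maxHits) return;      // enough hits, skip the costly test
                if (matcher.test(query, e.structure)) {
                    synchronized (hits) {                // re-check under the list's own lock
                        if (hits.size() < maxHits) hits.add(e.id);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException ex) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(hits);   // order depends on thread scheduling
    }
}
```

With the CDK available, the `matcher` would presumably wrap a fingerprint from the configured IFingerprinter and a UIT subgraph test on the parsed molecules; the sketch leaves that part open. Because hits are appended as worker threads finish, their order is not deterministic, and a variant that hands the shared list to the caller before the pool has finished would be one way to show the first hits while the search is still running.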
From: Joos K. <jo...@su...> - 2012-02-08 17:11:02
Attachments:
ChemDBSimpleWebApp.png
Hi all,

for anyone who is interested, please try out the Simple Web Application I created based on the framework. See the attachment for a screenshot.

You can try it out at: http://joos.home.dyndns.org:35903/ChemDBSimpleWebApp

If it asks you to install something, that is the JChemPaint Applet. Note that the application might not always be running (e.g. during night time in Europe), since it is hosted on my PC, which does not run 24/7. Also note that the actual search is much faster than the "hits counter" may lead you to believe.

"Fun Facts":
- HSQLDB database (memory tables)
- Database size: ~21'000 structures
- RAM usage: 1 GB (Tomcat, application and database)
- Lines of code required for this application: < 1000 (including Java, HTML and JavaScript)

Best Regards,
Joos