[Rdkit-discuss] Using SQLAlchemy with the RDKit database cartridge
From: Riccardo V. <ric...@gm...> - 2011-07-01 16:22:26
Hi all,

I've started working on an extension of the SQLAlchemy database toolkit that aims to support direct access from python to the functions and data types exposed by the database chemical cartridge.

In brief, this means that instead of interacting with the RDBMS through raw SQL queries, it may become possible to execute the entire workflow (data preprocessing and cleanup, insertion, selection and further processing) without leaving the python interpreter, while delegating the construction of the required SQL expressions to a higher-level API.

As a simple example, instead of using

    select count(*) from molecules where structure @> 'O=C1OC2=CC=CC=C2C=C1';

one might type something like the following:

    >>> constraint = Molecule.structure.contains('O=C1OC2=CC=CC=C2C=C1')
    >>> print(session.query(Molecule).filter(constraint).count())

(ok, in this specific case the python expression is a bit more verbose, but it's a very simple SQL query :-)

The project is still in an initial phase and the code is far from mature, but development is currently strongly focused on the RDKit postgresql extension. Structure searches and molecular descriptors should be fully supported, and bit fingerprints and the associated similarity operators are also available (but modifying the default similarity threshold values is not yet possible).

The code is currently hosted on github

    https://github.com/rvianello/razi

and some draft documentation (at the moment intended more to illustrate the idea than to provide a detailed reference) is also available:

    http://razi.readthedocs.org

If you use the RDKit chemical cartridge or SQLAlchemy (or both), I hope you will find the idea interesting, and I'd love to hear from you. Comments, ideas and suggestions would be very welcome.

Cheers,
Riccardo
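
To make the idea of delegating SQL construction to a higher-level API more concrete, here is a minimal, standard-library-only sketch of how a `contains()` method could be turned into the cartridge's `@>` operator. It is not Razi's or SQLAlchemy's implementation: the `Column`, `Clause` and `count_query` names are hypothetical and exist only for this illustration, and the `%s` placeholder style is an assumption about the database driver.

```python
class Clause:
    """Hypothetical container for a SQL fragment and its bound parameters."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = params


class Column:
    """Hypothetical stand-in for a mapped column (not a Razi or SQLAlchemy class)."""

    def __init__(self, table, name):
        self.table = table
        self.name = name

    def contains(self, smiles):
        # Render the substructure operator with a placeholder, so the SMILES
        # string travels as a bound parameter instead of being pasted into SQL.
        return Clause("{}.{} @> %s".format(self.table, self.name), [smiles])


def count_query(table, clause):
    """Build a counting query from a table name and a filter clause."""
    sql = "SELECT count(*) FROM {} WHERE {}".format(table, clause.sql)
    return sql, clause.params


structure = Column("molecules", "structure")
constraint = structure.contains("O=C1OC2=CC=CC=C2C=C1")
sql, params = count_query("molecules", constraint)
print(sql)
print(params)
```

The caller never writes the `@>` operator by hand; the column object decides how the constraint is rendered, which is the division of labour the message describes.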