Re: [Rdkit-discuss] molecule standardization in cartridge search
From: Tim D. <tdu...@gm...> - 2015-09-27 07:58:46
Jan and others,

Thanks for all your suggestions. Just what I need.

Tim

On 26/09/2015 12:26, Jan Holst Jensen wrote:
> Hi Tim,
>
> Soren (cc:ed) wrote me and asked about molvs. Thanks to Soren for
> reminding that the original question was about standardization more
> than calling Python code from Postgres :-).
> http://molvs.readthedocs.org/en/latest/
>
> Take a look at molvs - it's got lots of functionality that you will
> need. We also use molvs as the backbone for much of our standardization.
>
> Cheers
> -- Jan
> ________________________
>
> Hi Tim,
>
> A simple getting-started example is:
>
> CREATE FUNCTION smiles2molfile(smiles text) RETURNS text
> LANGUAGE plpythonu AS $$
> from rdkit import Chem
>
> mol = Chem.MolFromSmiles(smiles)
> if mol is None:
>     # Unparsable SMILES: return SQL NULL instead of raising an error
>     return None
> return Chem.MolToMolBlock(mol)
> $$;
>
> and you can then
>
> select smiles2molfile('CC');
>
> and get back a molfile.
>
> For more advanced usage it is worth taking a look at the rdchord
> project that TJ has sent links to.
>
> Cheers
> -- Jan
>
> On 2015-09-25 15:54, Tim Dudgeon wrote:
>> Jan,
>>
>> thanks for that. I'll give it a try.
>> Are there any examples of writing RDKit functions and procedures for
>> postgres in python?
>> I see these general postgres docs:
>> http://www.postgresql.org/docs/9.4/static/plpython.html
>> but wondered if there are any RDKit specific examples anywhere?
>>
>> Tim
>>
>> On 25/09/2015 08:30, Jan Holst Jensen wrote:
>>> On 2015-09-24 16:22, Tim Dudgeon wrote:
>>>> I'm trying to get to grips with using the RDKit cartridge, and so far
>>>> it's going well.
>>>> One thing I'm concerned about is molecule standardization, along the
>>>> lines of the ChemAxon Standardizer that allows substructure searches to
>>>> be done in a way that is largely independent of the quirks of structure
>>>> representation.
>>>> The classic example would be how nitro groups are
>>>> represented, so that it didn't matter which nitro representation was in
>>>> the query or target structures, because both were converted to a
>>>> canonical form.
>>>>
>>>> My initial thoughts are that this would be done by:
>>>> 1. loading the "raw" structures into a source column that would never be
>>>> changed
>>>> 2. defining a function that performed the necessary transform to
>>>> generate the canonical form of a molecule.
>>>> 3. generating a "canonical" structure column that was the result of
>>>> passing the raw structures through that function
>>>> 4. building the SSS index on that canonical column
>>>> 5. executing queries using that function to canonicalize the query
>>>> structure
>>>>
>>>> The problem I'm finding is that there do not seem to be postgres
>>>> functions defined for doing molecular transforms (essentially a reaction
>>>> transform) and doing things like removing explicit hydrogens. At least
>>>> not in the functions listed on this page:
>>>> http://rdkit.org/docs/Cartridge.html#functions
>>>>
>>>> Am I missing something here, or might I be barking up completely the
>>>> wrong tree?
>>>>
>>>> Tim
>>> Hi Tim,
>>>
>>> We have about the same situation and we're adding standardization
>>> (beyond what RDKit implicitly does when it sanitizes the molecule)
>>> through Python stored procedures. You will need to build and maintain
>>> a normal Python-enabled RDKit installation in parallel to the
>>> cartridge. The Python stored procedures can access the normal RDKit
>>> installation and then run whatever Python code is necessary to do
>>> additional molecule cleanup.
>>>
>>> You will need to tweak your Postgres environment so the Python stored
>>> procedures can load RDKit.
>>> This is what I have defined in an environment file on CentOS:
>>>
>>> RDBASE=/opt/rdkit
>>> LD_LIBRARY_PATH=/opt/rdkit/lib
>>> PYTHONPATH=/opt/rdkit
>>>
>>> On Ubuntu this would go into /etc/postgresql/9.x/main/environment (in
>>> a slightly different format where the values have to be single-quoted).
>>>
>>> Cheers
>>> -- Jan, Biochemfusion
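Jan's remark that RDKit already normalizes some representations when it sanitizes a molecule covers the nitro case Tim raises. The cleanup step of sanitization converts the neutral pentavalent N(=O)=O form into the charge-separated [N+](=O)[O-] form, so both spellings give the same canonical SMILES. A minimal check, assuming a Python-enabled RDKit installation; `canonical_smiles` is a hypothetical helper (not part of the RDKit API) that also applies `Chem.RemoveHs` for the explicit-hydrogen case:

```python
from rdkit import Chem


def canonical_smiles(mol):
    """Hypothetical helper: canonical SMILES after removing explicit H atoms."""
    return Chem.MolToSmiles(Chem.RemoveHs(mol))


# Two spellings of nitromethane. MolFromSmiles sanitizes by default, and
# sanitization converts the neutral pentavalent N(=O)=O nitro group into
# the charge-separated [N+](=O)[O-] form.
pentavalent = Chem.MolFromSmiles("CN(=O)=O")
charge_separated = Chem.MolFromSmiles("C[N+](=O)[O-]")

print(canonical_smiles(pentavalent))
print(canonical_smiles(charge_separated))

# Explicit hydrogen atoms, added here with AddHs, are dropped by RemoveHs.
methanol_with_hs = Chem.AddHs(Chem.MolFromSmiles("CO"))
print(methanol_with_hs.GetNumAtoms(), canonical_smiles(methanol_with_hs))
```

Representation conventions that sanitization does not touch still need an explicit standardization step, such as the MolVS-based one Jan describes.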