Re: [Rdkit-discuss] A couple of questions regarding ReactionFromSmarts
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Greg L. <gre...@gm...> - 2010-06-04 04:13:43
|
Dear James, On Thu, Jun 3, 2010 at 7:51 PM, James Davidson <J.D...@ve...> wrote: > > First of all, I'd like to start by saying how much I've been enjoying > exploring the functionality of RDKit - great job, Greg! Thanks! > I have a couple of questions regarding > 'rdkit.Chem.AllChem.ReactionFromSmarts': > > (1) I see that the reaction objects can be created from MDL Reaction > Files/Blocks - is there a way to do the reverse, and save a reaction object > in MDL .rxn format? I tried using investigating the rxn.ToBinary() > attribute, but didn't get very far... The reason I wanted to do this, is > that I was trying to figure-out how to generate a form of the reaction > object (generated from reaction SMARTS) that was suitable for converting > into a 2D depiction of the transformation. At the moment the reactions are essentially input-only. There's really no way to get them out in any format that could be used elsewhere. This is a sadly missing feature: it would be really nice to be able to generate either .rxn files (or at least reaction smarts) from reactions. I will add a feature request for this, but it may take a while to happen.[1] A workaround that kind of works is to paste the reaction smarts into something like Marvin Sketch. It will normally display something that at least gives some idea of what the reaction is. > (2) I know that reaction SMARTS isn't SMIRKS, but I have noticed some > behaviour that I would not expect - however, this could be down to my > SMARTS-naivety; my SMIRKS-naivety; or both! Anytime reactions behave in ways you don't expect, it's probably best to just blame me for coming up with yet another way of expressing them that is slightly incompatible with the existing ones. :-) > I initially tried the > following: > > from rdkit import Chem > from rdkit.Chem import AllChem > rxn_smarts = > '[!#1:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[!#1:1]-[C:3](=[O:4])-[NH:2]-[C,c:5]' > sm = Chem.MolFromSmiles('CC(=O)NC') > rxn = AllChem.ReactionFromSmarts(rxn_smarts) > prods = rxn.RunReactants((sm,)) > prod = Chem.MolToSmiles(prod[0][0]) > > > This gives me prod = '[H]C(=O)NC' There's a discussion of this kind of case in the "RDKit Book" ($RDBASE/Docs/Book/RDKit_Book.pdf) starting on page 3. The short answer is that if you have a query feature (atom list, recursive smarts, etc.) in the reactants and you would like the matching atom to be copied into the products you should include a dummy for that atom in the products. A working form of your example is then: [11]>>> rxn_smarts = '[!#1:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[*:1]-[C:3](=[O:4])-[NH:2]-[*:5]' [12]>>> rxn = AllChem.ReactionFromSmarts(rxn_smarts) [13]>>> prods = rxn.RunReactants((Chem.MolFromSmiles('c1ccccc1C(=O)NCC1CC1'),)) [14]>>> Chem.MolToSmiles(prods[0][0]) Out[14] 'O=C(CC1CC1)Nc1ccccc1' As an aside, in SMARTS it's shorter (and I think clearer) to write [C,c] as [#6]. It also produces a query that runs a bit quicker, but you probably won't notice that difference in most cases. > If I replace with <<rxn_smarts = > '[!H:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[!H:1]-[C:3](=[O:4])-[NH:2]-[C,c:5]'>>, > I get the behaviour I want - with prod = 'CNC(=O)C'. So I think I can get > the behaviour I want, but was curious if I am using the SMARTS ! operator > incorrectly in conjunction with atomic numbers, or whether this may be a > bug? Not really a bug. The behavior when you have queries in the products is undocumented: depending on the details of the query it will sometimes do the right thing, sometimes not. It's much safer to just use "*". What I probably should do is add a warning message if the reaction contains a query in the products, I will think about this. Best Regards, -greg [1] The underlying problem isn't actually generating the rxn files themselves, they are just a collection of mol blocks with a bit of extra verbiage sprinkled around. The problem is generating reasonable mol blocks for molecules with query features. I already have a feature request in for that one (http://sourceforge.net/tracker/?group_id=160139&atid=814653), but it turns out to not be quite as easy as it sounds. |