Re: [Rdkit-discuss] Isomeric smiles and explicit hydrogens
From: Greg L. <gre...@gm...> - 2008-04-14 16:25:46
Hi Noel,

You already figured out the problem with the chirality of chlorobromomethane, but I want to clarify a couple of things below.

On Mon, Apr 14, 2008 at 12:50 PM, Noel O'Boyle <bao...@gm...> wrote:
>
> I'm trying to specify the chirality of the carbon in
> chlorobromomethane, but RDKit is not picking up on the chirality:
>
> >>> rdk.readstring("smi", "[C](Cl)Br").write("iso")
> 'ClCBr'
> (No chirality, as expected)

Just to be clear on this one, the output here is not technically correct; you've input a molecule with the formula CClBr (you told the software that the C has no implicit Hs by putting it in square brackets), the output however is for something with the formula CH2ClBr. This is actually a bug; thanks for finding it. :-)
https://sourceforge.net/tracker/index.php?func=detail&aid=1942220&group_id=160139&atid=814650

> >>> rdk.readstring("smi", "[C@@H](Cl)Br").write("iso")
> 'Cl[CH]Br'
> >>> rdk.readstring("smi", "[C@](Cl)Br").write("iso")
> 'ClCBr'
> >>> rdk.readstring("smi", "Cl[C@]Br").write("iso")
> 'ClCBr'
> >>> rdk.readstring("smi", "Cl[C@@H]Br").write("iso")
> 'Cl[CH]Br'
> (Expected chirality, but didn't get it)

As you've realized: this molecule isn't chiral, so the RDKit is doing the right thing by not marking chirality. It's doing something arguable with the canonical smiles though, because it's showing the explicit H (inside the square brackets). If you input exactly the same molecule as ClCBr, you'd get a different canonical smiles. This is a known oddity of the way things are currently handled internally and I haven't quite figured out a solution yet. Basically explicit Hs remain always explicit, even if they don't need to be.

> Let's try 1-chloro,1-bromoethane:
>
> >>> rdk.readstring("smi", "Cl[C@@](Br)C").write("iso")
> 'CC(Cl)Br'
> (Expected chirality, but didn't get it)

Again, the molecule as provided isn't chiral because carbon 1 only has three neighbors (you've told it that there are no implicit Hs).
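The neighbor-counting argument above can be sketched in a few lines of plain Python. This is a toy model and not RDKit's stereochemistry perception: the function name `can_be_stereocentre` and the string labels for substituents are made up for illustration, and comparing labels only approximates a real symmetry check. It encodes just the necessary condition discussed in the thread: a tetrahedral carbon needs four substituents (hydrogens included), all different.

```python
def can_be_stereocentre(heavy_neighbours, num_hs):
    """Toy check: four substituents in total, no two of them alike.

    heavy_neighbours: labels for the non-hydrogen substituents
                      (hypothetical representation, e.g. "Cl", "CH3").
    num_hs:           hydrogens on the atom, as stated by the SMILES.
    """
    substituents = list(heavy_neighbours) + ["H"] * num_hs
    return len(substituents) == 4 and len(set(substituents)) == 4


# The cases from the thread, written as (heavy neighbours, H count).
cases = {
    "[C](Cl)Br":     (["Cl", "Br"], 0),         # brackets, no H: 2 substituents
    "[C@@H](Cl)Br":  (["Cl", "Br"], 1),         # one H: 3 substituents
    "ClCBr":         (["Cl", "Br"], 2),         # CH2ClBr: two identical Hs
    "Cl[C@@](Br)C":  (["Cl", "Br", "CH3"], 0),  # no H: 3 substituents
    "Cl[C@@H](Br)C": (["Cl", "Br", "CH3"], 1),  # 4 distinct substituents
}

for smiles, (neighbours, hs) in cases.items():
    print(smiles, can_be_stereocentre(neighbours, hs))
```

Only the last entry, `Cl[C@@H](Br)C`, passes the check, which matches the one case in the thread where a chiral SMILES came back.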
> >>> rdk.readstring("smi", "Cl[C@@H](Br)C").write("iso")
> 'C[C@@H](Cl)Br'
> (Expected chirality, and got it)

It's even the right chirality, which is good to see. :-)

> Is the problem with me or with RDKit?

I'll answer that "or" question with a "yes", because it's a little of both. :-)

> On a related note, I have found that RDKit, when reading SDF files,
> turns all of the hydrogens into implicit hydrogens.

Correct.

> However, when
> reading SMILES strings, it retains any explicit hydrogens specified in
> C@@H expressions. This doesn't seem to be consistent and requires the
> user to remove hydrogens if he/she wants to create a canonical smiles
> string.

I commented on this above. It's a known problem and I've been stewing over how to solve it for a while. Now that someone other than me is complaining I'll bump it up a bit in priority.

-greg
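For readers following the explicit-hydrogen discussion, here is a minimal sketch of when a hydrogen count actually has to be written in square brackets. It is not RDKit code and not the fix that was eventually adopted. It assumes the usual SMILES convention that an unbracketed organic-subset atom gets enough implicit hydrogens to reach its lowest normal valence that is not below its bond-order sum. The names `NORMAL_VALENCES`, `implicit_h_count`, and `needs_brackets` are hypothetical, the valence table is deliberately tiny, and charges and isotopes are ignored.

```python
# Normal valences for a few organic-subset elements (SMILES convention).
NORMAL_VALENCES = {"C": (4,), "N": (3, 5), "O": (2,), "Cl": (1,), "Br": (1,)}


def implicit_h_count(symbol, bond_order_sum):
    """Hydrogens an unbracketed atom would receive implicitly."""
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # bond-order sum already exceeds every normal valence


def needs_brackets(symbol, h_count, bond_order_sum, chiral=False):
    """True if the atom's H count cannot be left implicit.

    Brackets are required when a chirality mark must be written, or when
    the actual H count differs from what the implicit rule would give.
    """
    if chiral:
        return True
    return h_count != implicit_h_count(symbol, bond_order_sum)


print(needs_brackets("C", 0, 2))               # carbon of [C](Cl)Br
print(needs_brackets("C", 2, 2))               # carbon of ClCBr (CH2ClBr)
print(needs_brackets("C", 1, 3))               # CH in CC(Cl)Br, no stereo
print(needs_brackets("C", 1, 3, chiral=True))  # CH in C[C@@H](Cl)Br
```

Under these assumptions the first and last calls report that brackets are needed and the middle two report that they are not. That is the distinction behind both points in the thread: dropping the brackets from `[C](Cl)Br` changes the formula, while a non-stereo CH with three heavy neighbors can be written without them.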