Re: [Rdkit-discuss] MolFromPDBBlock and heterocycles
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Sereina <ser...@gm...> - 2016-09-07 19:47:02
|
Hi Steven, The PDB reader in the RDKit doesn’t determine any bond orders - everything is read as a single bond. In order to set the bond orders, you need to call the AssignBondOrdersFromTemplate() function using a reference molecule generated from SMILES (or SDF). Here is some example code from the docs: >>> from rdkit.Chem import AllChem >>> template = AllChem.MolFromSmiles("CN1C(=NC(C1=O)(c2ccccc2)c3ccccc3)N") >>> mol = AllChem.MolFromPDBFile(os.path.join(RDConfig.RDCodeDir, 'Chem', 'test_data', '4DJU_lig.pdb')) >>> len([1 for b in template.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 8 >>> len([1 for b in mol.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 22 Now assign the bond orders based on the template molecule >>> newMol = AllChem.AssignBondOrdersFromTemplate(template, mol) >>> len([1 for b in newMol.GetBonds() if b.GetBondTypeAsDouble() == 1.0]) 8 Note that the template molecule should have no explicit hydrogens else the algorithm will fail. Hope this helps. Best, Sereina On 07 Sep 2016, at 17:16, Steven Combs <ste...@gm...> wrote: > Hello! > > I have a pdb block that I am working with, which is attached to this email. The ligand has aromatic ring structures in it; however, when it is read into RDKit and converted into a smiles string, the aromatic rings are converted into aliphatic rings. Any thoughts? > > Here is the python code: > > def extract_data( filename): > extracted_info = "" > with open(filename) as f: > for line in f.readlines(): > if "HETATM" in line: > extracted_info += ( line) > return extracted_info > > for index, filename in enumerate(solution_pdb_filenames): > row = extract_data( filename) > m = Chem.MolFromPDBBlock(row, sanitize=True, removeHs=False ) > Chem.SetHybridization(m) > Chem.SetAromaticity(m) > Chem.SanitizeMol(m, sanitizeOps=Chem.rdmolops.SanitizeFlags.SANITIZE_ALL) #not needed since sanitizing during read in, but trying to figure out if it actually worked > print ("Parsing file " + str(index) + " of " + str(len(solution_pdb_filenames))) > print (Chem.MolToSmiles(m, kekuleSmiles=True, allHsExplicit=True)) > > The output smile string is: > > [H][O][CH]1[NH][CH]([C]([H])([H])[CH]([OH])[OH])[CH]([C]([H])([H])[C]([H])([H])[H])[CH]([CH]([OH])[CH]2[CH]([H])[CH]([H])[CH]([H])[CH]([N]([H])[H])[CH]2[H])[CH]1[N]([C]([H])([H])[H])[C]([H])([H])[H] > > Steven Combs > > > <test.pdb>------------------------------------------------------------------------------ > _______________________________________________ > Rdkit-discuss mailing list > Rdk...@li... > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |