Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond
From: Paul E. <pe...@mr...> - 2021-09-28 08:45:42
> PDB files have no bond information,

This is not true. The chemistry is specified in the Chemical Component
Dictionary using the residue identifier (so it's a reference to a chemical
description; it's not embedded).

https://www.wwpdb.org/data/ccd
https://github.com/pdbeurope/ccdutils

Paul.

On 27/09/2021 11:22, Lewis Martin wrote:
> Very interesting - thank you Francois! PDB-REDO does the trick:
>
>     import requests
>     from rdkit import Chem
>
>     def getPDB(code):
>         out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
>         return out.content
>
>     pdb_string = getPDB('3udn')
>     Chem.MolFromPDBBlock(pdb_string)
>
> I think this solves it for me, but if anyone knows how to infer correct
> bonding information without relying on distances, I'd love to hear it too!
> So far I've noticed that ParmEd and PDBFixer infer correct bonds, but they
> don't determine bond orders, so it's difficult to port the molecule into
> RDKit.
>
> Cheers
> Lewis
>
> On Mon, Sep 27, 2021 at 5:55 PM Francois Berenger <ml...@li...> wrote:
>
>     Hi Lewis,
>
>     Just an idea: you might try to load your PDB in UCSF Chimera, then
>     save it as a mol2 or sdf file.
>     Then, try to read this sdf file from rdkit.
>
>     Another idea: try to get your pdb file through the pdbredo service.
>     https://pdb-redo.eu/
>     They might have fixed a few things; maybe this PDB will read better in
>     rdkit.
>
>     Regards,
>     F.
>
> On 26/09/2021 17:02, Lewis Martin wrote:
> > Hi RDKit,
> > While parsing proteins from the PDB with RDKit, I've come across
> > situations where the distance-based bond determination leads to
> > 'incorrect' bonds between atoms that are erroneously too close
> > together. PDB files have no bond information, so it's not really
> > 'incorrect' (rather the model coordinates are off), but the bonds are
> > nonphysical - and it means the Mol objects won't sanitize.
> >
> > Here's an example:
> >
> >     import requests
> >     from io import BytesIO
> >     import gzip
> >     from rdkit import Chem
> >
> >     def getPDB(code):
> >         out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
> >         binary_stream = BytesIO(out.content)
> >         return gzip.open(binary_stream).read()
> >
> >     pdb_string = getPDB('3udn')
> >     Chem.MolFromPDBBlock(pdb_string)
> >
> > Error is:
> >
> > RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is greater than permitted
> >
> > This is caused by the threonine 72 sidechain being too close to the
> > TYR71 backbone carbonyl oxygen (this can be visualized at
> > https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction&sele=09B ,
> > TYR71 is near the ligand).
> >
> > Does anyone know how to avoid this to create a Chem.Mol? I've tried
> > using ParmEd and PDBFixer, since they use residue templates to
> > generate the correct bonding topology, but they don't write CONECT
> > records or SDFs, so the bonds are still lost to RDKit.
> >
> > Thanks for your time!
> > Lewis
> > PS - why not just use PDBFixer? I'm trying to calculate atom
> > invariants using RDKit's Morgan fingerprinter implementation, so
> > ultimately I want a sanitized Mol object
> >
> > _______________________________________________
> > Rdkit-discuss mailing list
> > Rdk...@li...
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
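
For what it's worth, the template idea discussed in this thread (bonds looked up by residue name and atom names, rather than inferred from distances) can be sketched in plain Python. This is only an illustration under stated assumptions: the TEMPLATES dictionary is hand-written for glycine heavy atoms rather than read from the Chemical Component Dictionary, the function names parse_atoms and template_bonds are hypothetical, and only intra-residue bonds are handled (no peptide bonds, no hydrogens).

```python
# Hypothetical, hand-written template for glycine heavy atoms.
# A real workflow would load such tables from the Chemical Component Dictionary.
TEMPLATES = {
    "GLY": {
        ("N", "CA"): 1,
        ("CA", "C"): 1,
        ("C", "O"): 2,
        ("C", "OXT"): 1,
    },
}


def parse_atoms(pdb_block):
    """Return (serial, atom_name, res_name, chain, res_seq) per ATOM/HETATM line."""
    atoms = []
    for line in pdb_block.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append((
                int(line[6:11]),      # atom serial number
                line[12:16].strip(),  # atom name
                line[17:20].strip(),  # residue name
                line[21],             # chain identifier
                int(line[22:26]),     # residue sequence number
            ))
    return atoms


def template_bonds(atoms, templates=TEMPLATES):
    """Assign intra-residue bonds by atom name; coordinates are never consulted."""
    by_residue = {}
    for serial, name, res_name, chain, res_seq in atoms:
        by_residue.setdefault((chain, res_seq, res_name), {})[name] = serial
    bonds = []
    for (_chain, _res_seq, res_name), names in by_residue.items():
        for (a, b), order in templates.get(res_name, {}).items():
            if a in names and b in names:
                bonds.append((names[a], names[b], order))
    return sorted(bonds)


# All atoms deliberately share the same coordinates: a distance-based
# approach would be confused here, the template lookup is not.
EXAMPLE_PDB = "\n".join([
    "ATOM      1  N   GLY A   1       0.000   0.000   0.000",
    "ATOM      2  CA  GLY A   1       0.000   0.000   0.000",
    "ATOM      3  C   GLY A   1       0.000   0.000   0.000",
    "ATOM      4  O   GLY A   1       0.000   0.000   0.000",
])

print(template_bonds(parse_atoms(EXAMPLE_PDB)))
# [(1, 2, 1), (2, 3, 1), (3, 4, 2)]
```

Each tuple is (serial_a, serial_b, bond_order). Turning such a bond list into a sanitized RDKit Mol is a separate step that this sketch does not attempt.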