Re: [Rdkit-discuss] fingerprint a molecule with pseudoatoms denoted by 'Du'
From: Paolo T. <pao...@gm...> - 2019-10-30 13:25:46
Hi Jenke,

I have put together a small gist showing a slightly hacky way to round-trip a molecule containing dummy atoms through a PDB block (assuming that your molecules do not contain astatine). If your dummy atoms are called "DU" rather than " *", you can just change the replace() expression to something that fits your needs.

HTH, cheers
p.

On 10/30/19 12:06, SCHEEN Jenke wrote:
> Hi RDKitters,
>
> I'm trying to use RDKit to generate molecular fingerprints (such as AP
> or ECFP) for molecules that have non-interacting pseudoatoms ('dummy
> atoms', denoted by Du). I attached a sample PDB file containing the
> dummy atoms at positions 21-24. Reading this file
> (Chem.rdmolfiles.MolFromPDBFile("test.pdb", sanitize=False)) throws a
> post-condition violation because the element 'Du' isn't recognised,
> which makes sense. I've been searching online and haven't been able to
> find any workarounds; do you have any suggestions?
>
> Some notes:
>
> * I'm hoping that once RDKit is able to read in the PDB file, the mol
>   object can be processed by the FP constructor (e.g.
>   AllChem.GetMorganFingerprint) without it complaining.
> * The use of the term dummy atoms here should not be confused with
>   the dummy atoms used when fragmenting molecules in RDKit
>   (written as * in SMILES).
> * For this project all I aim to do is generate structural
>   fingerprints for these types of ligands. This means I won't have
>   to worry about assigning chemical properties to Du.
> * The context for this issue is that we're aiming to featurise the
>   ligands for an ML protocol where the dummy atoms are one of the
>   major descriptors of the problem.
>
> * I thought manually inserting a 119th element in atomic_data.cpp
>   might resolve the issue, but I've been unable to locate the file in
>   my conda installation.
> * The ODDT Python API seems to parse the Du element without any
>   issues, but is limited in its FP generator diversity.
>
> Best,
>
> Jenke
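
The gist itself is not reproduced in this archive. As an illustration only, the reading side of the same idea (temporarily relabel the pseudoatoms as astatine so that RDKit's PDB parser accepts them, then turn them into RDKit dummy atoms with atomic number 0 before fingerprinting) might be sketched in Python as follows. The helper names pdb_elements, replace_pdb_element, mol_from_pdb_with_dummies and fingerprints are hypothetical, and the sketch assumes the element symbol sits in columns 77-78 of each ATOM/HETATM record, as the PDB format specifies. Editing only those columns, rather than calling replace() on the whole block, avoids rewriting an atom or residue name that happens to contain "DU".

```python
# Sketch only: the helper names below are hypothetical, not RDKit API.
# Assumes element symbols are in PDB columns 77-78 (Python slice [76:78]).

PDB_ATOM_RECORDS = ("ATOM  ", "HETATM")


def pdb_elements(pdb_block):
    """Return the element symbol of every ATOM/HETATM record ('' if absent)."""
    return [line[76:78].strip() for line in pdb_block.splitlines()
            if line.startswith(PDB_ATOM_RECORDS)]


def replace_pdb_element(pdb_block, old, new):
    """Swap element `old` for `new` in columns 77-78 only; names are left intact."""
    out = []
    for line in pdb_block.splitlines():
        if (line.startswith(PDB_ATOM_RECORDS)
                and line[76:78].strip().upper() == old.upper()):
            line = line[:76] + new.rjust(2) + line[78:]
        out.append(line)
    return "\n".join(out) + "\n"


def mol_from_pdb_with_dummies(pdb_block, dummy="Du", placeholder="At"):
    """Parse a PDB block whose pseudoatoms use element `dummy`.

    The pseudoatoms are first relabelled as `placeholder` (astatine by default),
    which RDKit's PDB parser accepts, and are then converted to atomic number 0.
    """
    if placeholder.upper() in (e.upper() for e in pdb_elements(pdb_block)):
        # Same caveat as in the reply: the placeholder element must not occur already.
        raise ValueError(f"block already contains element {placeholder!r}")
    from rdkit import Chem  # imported here so the text helpers run without RDKit

    relabelled = replace_pdb_element(pdb_block, dummy, placeholder)
    # Parse unsanitized: astatine's allowed valence may not fit the pseudoatom.
    mol = Chem.MolFromPDBBlock(relabelled, sanitize=False)
    if mol is None:
        raise ValueError("RDKit could not parse the PDB block")
    for atom in mol.GetAtoms():
        if atom.GetSymbol() == placeholder:
            atom.SetAtomicNum(0)  # RDKit dummy atom, written '*' in SMILES
    Chem.SanitizeMol(mol)
    return mol


def fingerprints(mol, radius=2, n_bits=2048):
    """Morgan (ECFP-like) and atom-pair bit vectors, pseudoatoms included."""
    from rdkit.Chem import rdFingerprintGenerator

    morgan = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    pairs = rdFingerprintGenerator.GetAtomPairGenerator(fpSize=n_bits)
    return morgan.GetFingerprint(mol), pairs.GetFingerprint(mol)
```

With RDKit installed, something like `mol = mol_from_pdb_with_dummies(open("test.pdb").read())` followed by `morgan_fp, ap_fp = fingerprints(mol)` should yield the two bit vectors. PDB records carry no bond orders, so the final SanitizeMol() call can still fail for some ligands; if bond orders matter, AllChem.AssignBondOrdersFromTemplate with a SMILES template is one option to try.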