Re: [Rdkit-discuss] Molecule with no atoms, so is it valid?
From: JP <jea...@in...> - 2012-01-30 20:24:27
Thanks for the explanation and the helpful-as-ever code snippets, Greg.

A molecule without atoms sounds a bit weird to me, but it seems to be perfectly legal (per the CTFile definition) to have a no-atom molecule:

"Each record can hold one molecule (which may be blank)."

from this very inspiring article, "Why not to use SDF":
http://molmatinf.com/whynotmolsdf.html

So a case for it exists.

On the practical side, in terms of the API, some things to think about:

- Would you define the checks to exclude, or the ones to include, in SanitizeMol?
- Shouldn't the default behaviour be to run all checks (or not)?
- How would SanitizeMol be called from other methods such as SDMolSupplier etc. (the methods which allow for optional sanitization)? Are you going to allow for flags to be passed there too? Isn't this going to make the API unwieldy?
- Will the flags be in the form of ints which you OR together, e.g. CHECK_EMPTY_MOL | CHECK_KEKULIZE (like Java/C++ kind of options)?

PS: the salt removal code is working fine as is. I'd rather remove both [Li].[Br] and check for the empty molecule as Nik suggested than keep one of those "little" buggers.

Have a nice evening!

- Jean-Paul Ebejer
Early Stage Researcher

On 30 January 2012 19:20, Greg Landrum <gre...@gm...> wrote:

> On Mon, Jan 30, 2012 at 1:26 PM, JP <jea...@in...> wrote:
> >
> > But then I will have to add the "if not clean_mol.GetNumAtoms():"
> > before/after replacing/editing molecule parts, after reading molecules,
> > before writing them etc. i.e. I'd need this statement in a lot of places.
> > This is why I asked if it should be considered a valid molecule:
> > if this check moved into SanitizeMol I wouldn't need any of that, e.g. I
> > could assume that the molecule I have in hand is valid, and if I still
> > wanted these molecules (for some not so clear reason) I could just switch
> > sanitization off on the methods that allow it.
> >
> > Just asking, there is probably some good design decision for this which
> > I am missing... (hence the question)
>
> It's not an easy one. I believe there's not really a strong argument
> for either behavior. As you've seen, the current behavior of the RDKit
> is to treat molecules without atoms as completely legal entities. You
> can test for this case the way Nik pointed out.
>
> I'm playing with the idea of making the SanitizeMol routine
> configurable, so you could pass in a set of flags to control which
> operations are carried out. If this happens, a "check for zero atoms"
> flag that defaults to false could be added. I just created a feature
> request for this:
>
> https://sourceforge.net/tracker/?func=detail&aid=3481729&group_id=160139&atid=814653
>
> In the meantime, if you'd like to change the definition of
> sanitization, the easiest way to do so would be to write your own
> function, perhaps something like this (not tested):
>
> from rdkit import Chem
>
> def mySanitize(mol):
>     # reject zero-atom molecules before running the standard sanitization
>     if not mol.GetNumAtoms():
>         raise ValueError('molecule has no atoms')
>     Chem.SanitizeMol(mol)
>
> Note: for the particular case of salt stripping, you can ensure that
> the salt stripper doesn't remove all atoms using the
> dontRemoveEverything optional argument. Take a look at the help for
> SaltRemover.StripMol:
>
> http://rdkit.org/docs/api/rdkit.Chem.SaltRemover.SaltRemover-class.html#StripMol
>
> -greg
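
The question raised in this thread about OR-ing integer flags together (CHECK_EMPTY_MOL | CHECK_KEKULIZE) can be sketched in plain Python. This is only an illustration of the idea, not how RDKit implements it: the SanitizeCheck class, its members, and both helper functions are hypothetical names, and the molecule is reduced to an atom count so that the snippet runs without RDKit installed.

```python
from enum import IntFlag


class SanitizeCheck(IntFlag):
    """Hypothetical check flags; these names are NOT part of the RDKit API."""
    NONE = 0
    CHECK_EMPTY_MOL = 1
    CHECK_KEKULIZE = 2
    ALL = CHECK_EMPTY_MOL | CHECK_KEKULIZE


def selected_checks(flags):
    """Return the names of the individual checks enabled in `flags`."""
    singles = (SanitizeCheck.CHECK_EMPTY_MOL, SanitizeCheck.CHECK_KEKULIZE)
    return [check.name for check in singles if flags & check]


def check_not_empty(num_atoms, flags=SanitizeCheck.ALL):
    """Raise ValueError for a zero-atom molecule if the empty check is enabled.

    `num_atoms` stands in for the result of mol.GetNumAtoms().
    """
    if flags & SanitizeCheck.CHECK_EMPTY_MOL and num_atoms == 0:
        raise ValueError('molecule has no atoms')


# Include style: name the checks you want.
wanted = SanitizeCheck.CHECK_EMPTY_MOL | SanitizeCheck.CHECK_KEKULIZE

# Exclude style: start from everything and switch one check off.
no_empty_check = SanitizeCheck.ALL ^ SanitizeCheck.CHECK_EMPTY_MOL

print(selected_checks(wanted))
print(selected_checks(no_empty_check))
```

With this layout, "run all checks" can be the default argument, and both the include style and the exclude style asked about above come down to ordinary bitwise operations on one integer value.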