Re: [Rdkit-discuss] Count carbon atoms
From: Joos K. <joo...@gm...> - 2015-10-14 05:10:44
Hi Greg,

thanks for your reply. I can confirm it's way faster than the other 2 options. For my use case it's fast enough.

Best Regards,
Joos

2015-10-14 6:55 GMT+02:00 Greg Landrum <gre...@gm...>:

> On Wed, Oct 7, 2015 at 11:12 AM, Joos Kiener <joo...@gm...> wrote:
>
>> Hi all,
>>
>> is there an easy way I'm missing to get the number of C-Atoms in a
>> molecule?
>>
>> Currently I iterate all atoms and check if its symbol is C. Doesn't seem
>> very efficient.
>
> There's an, obviously under-documented, simple (and fast) way to do this:
>
>     from rdkit.Chem import rdqueries
>     q = rdqueries.AtomNumEqualsQueryAtom(6)
>     len(mol.GetAtomsMatchingQuery(q))
>
> This is, based on the testing I just did, significantly faster than either
> the substructure-based or GetAtoms() based approaches. I'm not sure how
> generally useful it is, but this could be made even faster by adding a
> Mol.CountAtomsMatchingQuery() function to the C++ interface.
>
> You can find a bit of explanation, as well as a list of the types of
> QueryAtom functions available, by searching for QueryAtom here:
> https://github.com/rdkit/UGM_2014/blob/master/Notebooks/Whats_new.ipynb
>
> Best,
> -greg
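For readers who want to compare the three approaches mentioned in the thread, here is a minimal sketch. The helper names (count_carbons_query, count_carbons_loop, count_carbons_smarts) and the example molecules are illustrative assumptions, not part of RDKit or of the original messages; only the rdqueries call is taken from Greg's reply. It assumes RDKit is installed.

```python
from rdkit import Chem
from rdkit.Chem import rdqueries


def count_carbons_query(mol):
    """Query-atom approach from the thread: match atoms with atomic number 6."""
    q = rdqueries.AtomNumEqualsQueryAtom(6)
    return len(mol.GetAtomsMatchingQuery(q))


def count_carbons_loop(mol):
    """GetAtoms()-based approach: iterate over every atom in Python."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 6)


def count_carbons_smarts(mol):
    """Substructure-based approach: [#6] matches aliphatic and aromatic carbon.

    GetSubstructMatches() caps the number of matches (maxMatches defaults
    to 1000), so this can undercount on very large molecules unless the
    limit is raised.
    """
    patt = Chem.MolFromSmarts("[#6]")
    return len(mol.GetSubstructMatches(patt))


if __name__ == "__main__":
    # Hypothetical test molecules: ethanol and benzene.
    for smiles in ("CCO", "c1ccccc1"):
        mol = Chem.MolFromSmiles(smiles)
        print(smiles,
              count_carbons_query(mol),
              count_carbons_loop(mol),
              count_carbons_smarts(mol))
```

All three functions should agree on ordinary molecules; the relative speed reported in the thread is Greg's observation and is not re-measured here.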