Re: [Rdkit-discuss] calculating molecular properties on a Pandas dataframe Molecule
From: Mike M. <mi...@no...> - 2019-11-01 10:47:35
Hi,
Thanks for your response.
The problem is that I’d like to split Pandas dataframes into chunks, send them to different processors, and remove, as efficiently as possible, the rows that fail to convert into RDKit Mols. What I find, however, is that the entire process dies if PandasTools fails to convert a SMILES string into a Mol. Using a chunk size of one row would isolate each failure in its own dataframe, so it could not affect the “good” molecules, but that isn’t very efficient with Pool. I’d rather split the dataframe into chunks of 5-10%.
So the question is: how do I catch failed compounds within a dataframe and still write something into the new fields (e.g. None for ROMol and HAC)?
Does that make sense? Sorry if this isn’t very clear.
Cheers,
mike
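A minimal sketch of one way to do this, assuming a "Smiles" column and that a failed row should simply carry None in both ROMol and HAC. Chem.MolFromSmiles returns None for an unparsable SMILES rather than raising, so the crash usually comes from calling a descriptor on that None; guarding the descriptor keeps each chunk alive. The helper names process_chunk, split_frame and parallel_process are hypothetical, not part of RDKit or pandas.

```python
import math
import multiprocessing as mp

import pandas as pd
from rdkit import Chem, RDLogger
from rdkit.Chem import Lipinski

# Silence RDKit's SMILES parse-error messages; failures are recorded as None instead.
RDLogger.DisableLog("rdApp.*")


def process_chunk(chunk: pd.DataFrame, smiles_col: str = "Smiles") -> pd.DataFrame:
    """Add ROMol and HAC columns to one chunk; unparsable rows get None instead of raising."""
    chunk = chunk.copy()
    # Chem.MolFromSmiles returns None (it does not raise) for an unparsable SMILES string.
    mols = [Chem.MolFromSmiles(s) if isinstance(s, str) else None for s in chunk[smiles_col]]
    # Only call the descriptor on real molecules.
    hac = [Lipinski.HeavyAtomCount(m) if m is not None else None for m in mols]
    # Object dtype keeps None as None instead of converting it to NaN.
    chunk["ROMol"] = pd.Series(mols, index=chunk.index, dtype=object)
    chunk["HAC"] = pd.Series(hac, index=chunk.index, dtype=object)
    return chunk


def split_frame(df: pd.DataFrame, n_chunks: int) -> list:
    """Split df into at most n_chunks contiguous row blocks of (nearly) equal size."""
    size = max(1, math.ceil(len(df) / n_chunks))
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


def parallel_process(df: pd.DataFrame, n_chunks: int = 10, processes: int = 4) -> pd.DataFrame:
    """Process df in n_chunks pieces across a worker Pool and reassemble them in order."""
    with mp.Pool(processes) as pool:
        results = pool.map(process_chunk, split_frame(df, n_chunks))
    return pd.concat(results)
```

With n_chunks=10 each chunk is roughly 10% of the rows. Call parallel_process(df) from under an `if __name__ == "__main__":` guard so it also works with the "spawn" start method; afterwards, out.dropna(subset=["ROMol"]) drops the failures if you don't want to keep them.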
From: Greg Landrum <gre...@gm...>
Sent: 01 November 2019 10:40
To: Mike Mazanetz <mi...@no...>; RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] calculating molecular properties on a Pandas dataframe Molecule
What I'm failing to understand here is what you want to do.
Do you want the rows with molecules that failed to parse to remain in the DataFrame?
If not you can just remove them (there's probably a simpler way to do this, but Pandas never fails to surprise me):
filtered_df = df[df['ROMol'].astype(str).ne('None')]
-greg
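For the filtering step, pandas treats None in an object column as a missing value, so a sketch of the same filter without the string comparison, assuming the failed molecules are stored as None in ROMol, might look like this:

```python
import pandas as pd
from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # hide the parse error for the deliberately bad SMILES

df = pd.DataFrame({"Smiles": ["CCO", "C1CC", "c1ccccc1"]})
df["ROMol"] = [Chem.MolFromSmiles(s) for s in df["Smiles"]]  # "C1CC" (unclosed ring) gives None

# Boolean mask on missing values instead of comparing string representations.
filtered_df = df[df["ROMol"].notna()]

# The same filter written with dropna.
filtered_df2 = df.dropna(subset=["ROMol"])
```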
On Thu, Oct 31, 2019 at 11:32 AM Mike Mazanetz <mi...@no...> wrote:
Hi Taka and Jan,
Thanks for your help.
I worked out that I shouldn’t have passed names=[] when I read in my CSV file (whoops).
It fails if a mol is None, so I’ll have to add a line that checks that ROMol isn’t None first. Annoying.
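The thread doesn't say exactly what names=[] contained. One common way it goes wrong, assuming the CSV already has a header row: pd.read_csv with names= and no header= argument treats the header line as a data row, so the literal string "Smiles" ends up in the SMILES column and will most likely fail to parse. The file contents and column names below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical file contents that already include a header row.
csv_text = "Smiles,Name\nCCO,ethanol\nc1ccccc1,benzene\n"

# Passing names= without header=0 makes the header line a data row.
wrong = pd.read_csv(io.StringIO(csv_text), names=["Smiles", "Name"])

# Either let pandas read the header, or pass header=0 so names= replaces it.
right = pd.read_csv(io.StringIO(csv_text))
renamed = pd.read_csv(io.StringIO(csv_text), names=["SMILES", "Compound"], header=0)
```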
Thanks for your help,
mike
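The None check can be written once and reused for every descriptor. A minimal sketch, assuming failed molecules are stored as None in ROMol; none_safe is a hypothetical helper, not an RDKit function. Note that pandas will typically store the missing results as NaN in a numeric column.

```python
import pandas as pd
from rdkit import Chem, RDLogger
from rdkit.Chem import Descriptors, Lipinski

RDLogger.DisableLog("rdApp.*")  # hide the parse error for the deliberately bad SMILES


def none_safe(func):
    """Wrap a one-molecule descriptor so a failed (None) molecule yields None instead of an error."""
    def wrapper(mol):
        return None if mol is None else func(mol)
    return wrapper


DF = pd.DataFrame({"Smiles": ["CCO", "C1CC", "c1ccccc1"]})
DF["ROMol"] = [Chem.MolFromSmiles(s) for s in DF["Smiles"]]  # "C1CC" gives None

# The same wrapper works for any descriptor that takes a single Mol.
DF["HAC"] = DF["ROMol"].apply(none_safe(Lipinski.HeavyAtomCount))
DF["MolWt"] = DF["ROMol"].apply(none_safe(Descriptors.MolWt))
```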
From: Taka Seri <ser...@gm...>
Sent: 31 October 2019 10:15
To: Jan Halborg Jensen <jhj...@ch...>
Cc: Mike Mazanetz <mi...@no...>; RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] calculating molecular properties on a Pandas dataframe Molecule
Hi,
The pandas apply function will work too.
First, call PandasTools.AddMoleculeColumnToFrame(DF, "Smiles").
With the default settings, the RDKit mol objects are added to your dataframe in a column named "ROMol".
https://www.rdkit.org/docs/source/rdkit.Chem.PandasTools.html
Then use apply to run a calculation function over each entry of the ROMol column:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
from rdkit.Chem import Lipinski
DF['HAC'] = DF['ROMol'].apply(Lipinski.HeavyAtomCount)
Best regards,
Taka
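An end-to-end sketch of the two steps above, assuming every SMILES in the (made-up) example data parses; a None molecule would still make HeavyAtomCount fail here.

```python
import pandas as pd
from rdkit.Chem import Lipinski, PandasTools

DF = pd.DataFrame({"Smiles": ["CCO", "c1ccccc1", "CC(=O)O"]})

# Adds an "ROMol" column of RDKit Mol objects to DF in place (the default column name).
PandasTools.AddMoleculeColumnToFrame(DF, "Smiles")

# apply() calls HeavyAtomCount once per molecule rather than once on the whole column.
DF["HAC"] = DF["ROMol"].apply(Lipinski.HeavyAtomCount)
```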
On Thu, 31 Oct 2019 at 18:30, Jan Halborg Jensen <jhj...@ch...> wrote:
Hi Mike
This should work:
from rdkit.Chem import Lipinski
DF['HAC'] = [Lipinski.HeavyAtomCount(mol) for mol in DF['Molecule']]
Best regards, Jan
On 31 Oct 2019, at 10.16, Mike Mazanetz <mi...@no...> wrote:
Hi RDKit Gurus,
I’ve followed the docs and created a molecule column in my Pandas dataframe.
However, I do not seem to be able to do molecular operations on the column.
For example, if you had a SMILES column, how would you calculate heavy atom count and append this result to a new column?
This doesn’t work:
DF['HAC'] = Chem.Lipinski.HeavyAtomCount(DF['Molecule'])
where the Molecule column was generated by PandasTools.AddMoleculeColumnToFrame.
Thanks,
mike
_______________________________________________
Rdkit-discuss mailing list
Rdk...@li...
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss