Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
From: Markus H. <mar...@mo...> - 2013-05-07 11:13:37
Thanks again for your reply. This is what I have tried:

    from rdkit import Chem
    from rdkit.Chem import AllChem
    import pandas as pd
    from rdkit.Chem import PandasTools
    from rdkit.Chem.Draw import IPythonConsole
    from IPython.core.display import HTML

    df = PandasTools.LoadSDF('test.sdf', includeFingerprints=False)
    display(HTML(df.to_html()))

So it is a dataframe, and .to_html() works fine in general: I see all SDF fields. It's just that the molecule column contains string values of this kind:

    <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAASwAAAEsCAYAAAB ...

The notebook somehow does not recognize it as an HTML tag containing an image, and instead renders it as a normal string (just like before with the single molecule).

Best wishes,
Markus

On 05/07/2013 12:57 PM, Nikolas Fechner wrote:
> Just for clarification, are you trying to render a dataframe or a
> series/single column? The pandas Series object has no to_html() method
> and is therefore rendered as a string only. Moreover, if you select a
> single column from a dataframe, e.g. 'ROMol' via df['ROMol'], you will
> get a Series object that is rendered as a string. If you select a set
> of columns, you get a dataframe, for which the HTML rendering should
> work. The latter also works for a single column if you enclose it in
> double brackets, df[['ROMol']], which gives a single-column dataframe.
> This took me some time to figure out, and the silent conversion that
> sometimes occurs can be quite confusing.
> Best,
> Niko
>
> On May 7, 2013 at 11:33 AM Markus Hartenfeller
> <mar...@mo...> wrote:
>> Thanks for your help, Niko. Importing IPythonConsole from RDKit and
>> removing the 'print' command did the trick for a single molecule :)
>>
>> Unfortunately, molecules in data frames are still shown as strings,
>> even when forcing HTML rendering. I will try to get this working and
>> report here if I make any progress. In case somebody has already
>> faced the same problem, please let me know.
>>
>> Best,
>> Markus
>>
>>
>> On 05/07/2013 10:27 AM, Nikolas Fechner wrote:
>>> Hi Markus,
>>> glad you think it could be useful :). Regarding the problem, there
>>> are two things: you have to import the RDKit IPythonConsole to
>>> enable molecule rendering (from rdkit.Chem.Draw import
>>> IPythonConsole), and if you trigger the output using 'print', the
>>> notebook will always use string rendering (AFAIK). Just try 'm'
>>> alone (instead of 'print m'). Alternatively, you can always force
>>> the notebook to do HTML rendering (useful for large dataframes):
>>>
>>>     from IPython.core.display import HTML, display
>>>     display(HTML(dataframe.to_html()))  # any HTML string works here
>>>
>>> I hope that helps.
>>> Best,
>>> Niko
>>>
>>> On May 7, 2013 at 10:02 AM Markus Hartenfeller
>>> <mar...@mo...> wrote:
>>>> Hi Nikolas,
>>>>
>>>> I had a first look at the PandasTools package: very cool! I think
>>>> this is going to be useful for many RDKit users. I'm looking
>>>> forward to using it in the future. Thanks for sharing this module.
>>>>
>>>> I'm having trouble seeing the molecule depictions in the IPython
>>>> notebook, though (both in tables and when just printing out a
>>>> single molecule).
>>>>
>>>> This code in an IPython notebook
>>>>
>>>>     from rdkit import Chem
>>>>     from rdkit.Chem import PandasTools
>>>>     m=Chem.MolFromSmiles('N1CCNCC1')
>>>>     print m
>>>>
>>>> gives me
>>>>
>>>>     <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAASwAAAEsCAYAAAB ...
>>>>
>>>> a very long string with the base64 encoding of the image, but not
>>>> the image itself. Plotting from matplotlib works fine. Did I forget
>>>> to import something, or could it be a browser issue? I am using
>>>> CentOS 6 and Firefox.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Best,
>>>> Markus
>>>>
>>>>
>>>> On 04/19/2013 11:56 AM, Nikolas Fechner wrote:
>>>>> Dear all,
>>>>> We developed a new module (rdkit.Chem.PandasTools.py) that
>>>>> allows using RDKit molecule objects directly in pandas
>>>>> dataframes. Pandas (http://pandas.pydata.org/) is a Python
>>>>> library that offers table-like data containers, which are
>>>>> incredibly useful for anything related to data mining. Moreover,
>>>>> it integrates nicely with the IPython notebook, producing rendered
>>>>> HTML tables for the dataframes. The RDKit integration provides
>>>>> molecule-type columns and functionality to perform
>>>>> substructure-based row filtering directly on the pandas table.
>>>>> Additionally, if a dataframe is exported as HTML or shown within
>>>>> an IPython notebook, the molecules in the table are rendered as 2D
>>>>> structures.
>>>>> The new module is available in the current SF trunk and contains a
>>>>> doctest header that provides examples of how to use it.
>>>>> I hope some of you find that interesting. As always, bug reports,
>>>>> comments, ideas... are very much appreciated.
>>>>> Best,
>>>>> Nikolas
>>>>>
>>>>> _______________________________________________
>>>>> Rdkit-discuss mailing list
>>>>> Rdk...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
Markus Hartenfeller
Chemoinformatics Specialist
Molecular Health GmbH
Belfortstr. 2
69115 Heidelberg
Germany
Tel: +49 6221 43851 209
Fax: +49 6221 43851 100
Email: mar...@mo...
www.molecularhealth.com
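For readers hitting the same symptom with a current pandas, here is a minimal sketch that uses only pandas (no RDKit) and a hypothetical, shortened <img> string in place of a real molecule depiction. It illustrates Niko's point that df['ROMol'] yields a Series while df[['ROMol']] yields a one-column DataFrame. It also shows one plausible cause of the escaped tags Markus describes: in current pandas releases, DataFrame.to_html() escapes <, > and & by default (escape=True), so an <img> tag shows up as text unless escape=False is passed. Whether this was also the cause with the 2013 pandas version in this thread is an assumption. Long base64 strings may additionally be cut off with "..." by the display.max_colwidth option, so the sketch lifts that limit while rendering.

```python
import pandas as pd

# Hypothetical, shortened stand-in for a molecule depiction; real PandasTools
# output would be a much longer base64-encoded PNG.
IMG = '<img src="data:image/png;base64,AAAA">'
df = pd.DataFrame({"ID": ["mol1", "mol2"], "ROMol": [IMG, IMG]})

# Single brackets select a Series, double brackets a one-column DataFrame.
col_series = df["ROMol"]
col_frame = df[["ROMol"]]
print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame

# Lift the column-width limit so long strings are not truncated with "...".
with pd.option_context("display.max_colwidth", None):
    html_escaped = col_frame.to_html()          # default escape=True
    html_raw = col_frame.to_html(escape=False)  # tags passed through as-is

print("&lt;img" in html_escaped)  # True: a browser would show the tag as text
print(IMG in html_raw)            # True: a browser would render the image
```

In a notebook, the unescaped table could then be shown along the lines of Niko's suggestion, e.g. display(HTML(html_raw)) with HTML and display imported from IPython.display.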
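Niko's advice to write 'm' instead of 'print m' follows from how IPython chooses a representation: print always writes the object's plain-text str(), whereas an object left as the last expression of a cell (or passed to display()) is offered to IPython's rich display protocol, which uses methods such as _repr_html_ when the object defines them. The toy class below is hypothetical (it is not RDKit's Mol) and only mimics the symptom in this thread, where the text form of the molecule was itself an <img> tag; it runs without IPython.

```python
class ToyMol:
    """Hypothetical stand-in for a molecule object; not RDKit's Mol class."""

    def __init__(self, smiles):
        self.smiles = smiles

    def _img_tag(self):
        # Placeholder tag; a real depiction would embed base64 PNG data.
        return f'<img alt="{self.smiles}" src="data:image/png;base64,AAAA">'

    def __str__(self):
        # Used by print(): the notebook shows this text verbatim.
        return self._img_tag()

    def _repr_html_(self):
        # Used by IPython when the object is a cell's last expression or is
        # passed to display(); the notebook renders the returned HTML.
        return self._img_tag()


m = ToyMol("N1CCNCC1")
print(m)  # plain text: the raw <img ...> tag, as reported in this thread
# In a notebook, ending a cell with just `m` would render the HTML instead.
```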