rdkit-devel Mailing List for RDKit (Page 13)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
You can subscribe to this list here.
| 2006 |
Jan
|
Feb
|
Mar
|
Apr
|
May
(24) |
Jun
(20) |
Jul
|
Aug
(2) |
Sep
(4) |
Oct
(39) |
Nov
(33) |
Dec
(8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007 |
Jan
(17) |
Feb
(13) |
Mar
(35) |
Apr
(10) |
May
(1) |
Jun
(2) |
Jul
(3) |
Aug
(4) |
Sep
(4) |
Oct
(7) |
Nov
(1) |
Dec
|
| 2008 |
Jan
(10) |
Feb
(2) |
Mar
(2) |
Apr
(10) |
May
(8) |
Jun
(2) |
Jul
(1) |
Aug
(1) |
Sep
(3) |
Oct
(1) |
Nov
|
Dec
|
| 2009 |
Jan
(2) |
Feb
(1) |
Mar
(1) |
Apr
(1) |
May
|
Jun
(1) |
Jul
(7) |
Aug
(2) |
Sep
(6) |
Oct
(12) |
Nov
|
Dec
|
| 2010 |
Jan
(1) |
Feb
|
Mar
|
Apr
(2) |
May
(4) |
Jun
(2) |
Jul
(17) |
Aug
(7) |
Sep
(20) |
Oct
(8) |
Nov
(1) |
Dec
(12) |
| 2011 |
Jan
(8) |
Feb
(15) |
Mar
(20) |
Apr
(5) |
May
(8) |
Jun
(2) |
Jul
(17) |
Aug
(8) |
Sep
(4) |
Oct
(15) |
Nov
|
Dec
(2) |
| 2012 |
Jan
(3) |
Feb
|
Mar
(23) |
Apr
(2) |
May
(2) |
Jun
(8) |
Jul
(7) |
Aug
(18) |
Sep
(8) |
Oct
(10) |
Nov
(2) |
Dec
(7) |
| 2013 |
Jan
(6) |
Feb
(3) |
Mar
|
Apr
(3) |
May
(1) |
Jun
(1) |
Jul
(1) |
Aug
(2) |
Sep
|
Oct
(5) |
Nov
|
Dec
|
| 2014 |
Jan
(1) |
Feb
|
Mar
|
Apr
|
May
(10) |
Jun
|
Jul
(2) |
Aug
|
Sep
|
Oct
(7) |
Nov
(1) |
Dec
(6) |
| 2015 |
Jan
(22) |
Feb
|
Mar
(2) |
Apr
(5) |
May
(10) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(3) |
Nov
(9) |
Dec
(3) |
| 2016 |
Jan
(2) |
Feb
(5) |
Mar
|
Apr
(31) |
May
(3) |
Jun
(2) |
Jul
|
Aug
|
Sep
|
Oct
(4) |
Nov
(10) |
Dec
(7) |
| 2017 |
Jan
|
Feb
(7) |
Mar
(3) |
Apr
(6) |
May
(4) |
Jun
(6) |
Jul
(5) |
Aug
(1) |
Sep
(7) |
Oct
(1) |
Nov
|
Dec
|
| 2018 |
Jan
|
Feb
|
Mar
(11) |
Apr
(13) |
May
(18) |
Jun
(1) |
Jul
|
Aug
|
Sep
|
Oct
(5) |
Nov
(3) |
Dec
|
| 2019 |
Jan
|
Feb
|
Mar
|
Apr
(10) |
May
(4) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(2) |
Nov
(1) |
Dec
(2) |
| 2020 |
Jan
(2) |
Feb
|
Mar
(5) |
Apr
(2) |
May
|
Jun
|
Jul
(4) |
Aug
|
Sep
|
Oct
(2) |
Nov
|
Dec
|
| 2021 |
Jan
|
Feb
|
Mar
(4) |
Apr
|
May
(1) |
Jun
|
Jul
(3) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Andrew D. <da...@da...> - 2012-12-10 15:52:09
|
On Dec 8, 2012, at 7:01 AM, Greg Landrum wrote:
> It's a pretty small API change, but there's a huge amount of code that needs to be changed in the back and lots and lots of testing that has to be done, so this is going to take a while.
Speaking of hydrogens, I came across a strange query (part of the
BindingDB_structure set from my Structure Query Collection).
C1N[H]OC[H]1
Because of RDKit's sanitization, the '[H]' gets absorbed into the atoms.
This breaks a bond, so what started as a ring with poor chemistry ends
up as two linear pieces with sane chemistry.
>>> mol = Chem.MolFromSmiles("C1N[H]OC[H]1")
>>> Chem.MolToSmiles(mol)
'CN.OC'
>>>
The above query is nonsensical, but I don't think sanitization should
modify the topology of the input.
Divalent hydrogen does exist in a very small number of real records.
I once came across a couple of structure with a 4-membered boron/hydrogen
ring, like this:
>>> mol = Chem.MolFromSmiles("[B]1[H][B][H]1")
>>> Chem.MolToSmiles(mol)
'[BH].[BH]'
>>>
I am unable to find a public record with that core, to act as a
more realistic test case.
Andrew
da...@da...
|
|
From: Andrew D. <da...@da...> - 2012-12-10 12:32:42
|
On Dec 9, 2012, at 8:06 PM, Gianluca Sforna wrote: > I believe a commonly accepted approach to this kind of issues is something like: > > 1. add warnings about the deprecated or changed methods, but keep the > current behavior so code do not break, but becomes verbose when called ... That progressive set of changes is very context dependent. Some projects have the above policy, but not all. Even within a project, some parts of a package can break backwards compatibility while others do not. For an obvious example, if the API is broken, then there's no need to go through this very careful transition plan. No one can be using it so just fix the API. Similarly, even if the API is working (as this one is), it may be that no one is using it. Or it may be that one person is using it (like I am) but the migration path from the old API to the new one is only a few lines of code, while the internal changes to support both APIs is very complicated. I believe Greg's proposal is an example of the latter. That is, I suspect that: 1) few people currently use the API, 2) of those, most can trivially port to the new API, and 3) supporting both the old and new APIs is complicated I have no problems with doing a flag-day change for the Q1 or Q2 release. Andrew da...@da... |
|
From: Andrew D. <da...@da...> - 2012-12-10 12:12:02
|
On Dec 8, 2012, at 7:01 AM, Greg Landrum wrote: > Please let me know if this sounds reasonable or if you have suggestions on how to handle the logistics more cleanly. It looks quite reasonable to me. Only a bit of my code will be affected by the change, and that's code which is just at the beginning of development, so it's not deployed anywhere. Andrew da...@da... |
|
From: Christos K. <chr...@gm...> - 2012-12-09 19:16:03
|
Gianluca 's plan is better. I think step 5 can be the step where depreciated methods exist but they are redirecting to the new methods if this is possible, otherwise there have to be removed. The key point to the whole plan is to give users the time to update their code. Regards, Christos Sent from my Galaxy Note! On Dec 9, 2012 8:06 PM, "Gianluca Sforna" <gi...@gm...> wrote: > On Sat, Dec 8, 2012 at 7:01 AM, Greg Landrum <gre...@gm...> > wrote: > > My current plan is to keep this branch in sync with the trunk through (at > > least) the Q4 2012 release and then consider moving it onto the trunk and > > creating a v1 API branch in Q1 of next year. I guess I should be able to > > keep the v1 API branch in sync, at least in terms of bug fixes, for > another > > 2-3 releases. > > I believe a commonly accepted approach to this kind of issues is something > like: > > 1. add warnings about the deprecated or changed methods, but keep the > current behavior so code do not break, but becomes verbose when called > 2. add the v2 methods. > 3. release Q4 2012 > 4. keep the warnings around as much as you feel is fair to give time > for the clients to port code > 5. make the real break (remove deprecated methods, change method signature) > 6. release a new version with the api break > > Cheers > > G. > > -- > Gianluca Sforna > > http://morefedora.blogspot.com > http://identi.ca/giallu - http://twitter.com/giallu > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Rdkit-devel mailing list > Rdk...@li... > https://lists.sourceforge.net/lists/listinfo/rdkit-devel > |
|
From: Gianluca S. <gi...@gm...> - 2012-12-09 19:06:27
|
On Sat, Dec 8, 2012 at 7:01 AM, Greg Landrum <gre...@gm...> wrote: > My current plan is to keep this branch in sync with the trunk through (at > least) the Q4 2012 release and then consider moving it onto the trunk and > creating a v1 API branch in Q1 of next year. I guess I should be able to > keep the v1 API branch in sync, at least in terms of bug fixes, for another > 2-3 releases. I believe a commonly accepted approach to this kind of issues is something like: 1. add warnings about the deprecated or changed methods, but keep the current behavior so code do not break, but becomes verbose when called 2. add the v2 methods. 3. release Q4 2012 4. keep the warnings around as much as you feel is fair to give time for the clients to port code 5. make the real break (remove deprecated methods, change method signature) 6. release a new version with the api break Cheers G. -- Gianluca Sforna http://morefedora.blogspot.com http://identi.ca/giallu - http://twitter.com/giallu |
|
From: Greg L. <gre...@gm...> - 2012-12-08 06:01:55
|
Hi,
This is an unusually long email for me, so here's the TL;DR version:
The RDKit handling of implicit hydrogens is confusing. This leads to
needlessly complicated code and, almost certainly, bugs. I'm proposing to
fix it, but doing so requires an API change. This work will be done on a
branch until we decide that it should move over to the trunk.
# Background
Like most cheminformatics toolkits, the RDKit by default stores molecules
without hydrogens being explicitly present in the molecular graph. This is
done for efficency reasons: there's no need to store all those extra atoms
if you can implicitly know that they're there. Thanks to the octet rule,
it's pretty easy to know when Hs should be attached to the common elements
encountered in organic chemistry. This leads to the concept of the
"implicit hydrogen": an H that you know is there but which isn't explicitly
represented.
This is wonderfully clear in SMILES: you can write ethane as "CC" instead
of "[H]C([H])([H])C([H])([H])[H]" or "[CH3][CH3]". For elements out of the
"organic subset" -- where the octet rule may be a bit more relaxed -- or in
cases where the octet rule is violated, you have to provide the H count in
the SMILES. For example: "[SiH3][SiH3]" or the radical "[CH2]C". The rules
for SMILES stipulate that if the atom is in square brackets, for whatever
reason, you must provide the H count. For example "Cl[C@H](C)F". As an
aside: you also sometimes need to do this in aromatic heterocycles, where
it's not always clear where if an H should be present. So pyrrole is
written as "c1cccc[nH]1", not "c1ccccn1".
Back in the very early days of designing and implementing the RDKit, for
reasons I can no longer really remember (give me a break! it was more than
10 years ago!), I decided that there should be a difference between
"specified implicit Hs", those that are inside square brackets, and
"default implicit Hs", those that don't show up in the SMILES at all
(thanks to Andrew Dalke for the terms). Thus there are the methods
Atom.GetNumImplicitHs and Atom.GetNumExplicitHs (I will use the Python API
names throughout this since I'm showing examples from Python). These don't
necessarily behave the way one would expect them to:
In [2]: m = Chem.MolFromSmiles('C[CH3]')
In [3]: m.GetAtomWithIdx(0).GetNumImplicitHs()
Out[3]: 3
In [4]: m.GetAtomWithIdx(1).GetNumImplicitHs()
Out[4]: 0
In [5]: m.GetAtomWithIdx(0).GetNumExplicitHs()
Out[5]: 0
In [6]: m.GetAtomWithIdx(1).GetNumExplicitHs()
Out[6]: 3
You really need to use Atom.GetTotalNumHs() to safely figure out how many
Hs an atom has:
In [7]: m.GetAtomWithIdx(0).GetTotalNumHs()
Out[7]: 3
In [8]: m.GetAtomWithIdx(1).GetTotalNumHs()
Out[8]: 3
This has been bothering me for a long time, but fixing it requires a change
to the API which will break code. I'm really, really resistant to doing
that, but I don't see any way to clean things up without an API change.
# Fixing the problem
Here's a preliminary list of API changes:
The following methods will be removed from the API:
- Atom.{Get,Set}NumExplicitHs()
- Atom.{Get,Set}NoImplicit()
The following methods will be added:
- Atom.{Get,Set}NoImplicitCalculation(): equivalent to the current
Atom.{Get,Set}NoImplicit()
- Atom.SetNumImplicitHs()
The following methods will be modified:
- Atom.GetTotalNumHs(): the optional "includeNeighbors" argument will be
removed; it will always include neighbors.
Things may be added to this over time.
# Practical considerations
It's a pretty small API change, but there's a huge amount of code that
needs to be changed in the back and lots and lots of testing that has to be
done, so this is going to take a while.
I'm currently working on a branch (note: this is definitely not yet stable
enough to rely on it working at all):
https://svn.code.sf.net/p/rdkit/code/branches/AlternateValence_21Nov2012
My current plan is to keep this branch in sync with the trunk through (at
least) the Q4 2012 release and then consider moving it onto the trunk and
creating a v1 API branch in Q1 of next year. I guess I should be able to
keep the v1 API branch in sync, at least in terms of bug fixes, for another
2-3 releases.
Please let me know if this sounds reasonable or if you have suggestions on
how to handle the logistics more cleanly.
# Thanks
Roger Sayle, perhaps inadvertently, provide the final bit of impetus
required to make this happen by submitting a nice little patch to update
the way implicit valence is calculated. Integrating this simple patch
forced me to confront how ugly the current handling of implicit Hs is.
Best Regards,
-greg
|
|
From: Greg L. <gre...@gm...> - 2012-11-17 04:01:19
|
Dear all, On Mon, Oct 29, 2012 at 10:57 AM, Nicholas Firth <Nic...@ic...>wrote: > Hi Guys, > I have managed to be allowed to open source the code from my paper. Greg I > couldn't work if I was supposed to add this to the contrib or if you do it, > so I've just attached it to the email. The code is a 'working' > implementation although the error handling is not my finest work. > Nick and I did some revisions of his code. The most recent version is now checked into $RDBASE/Contrib/PBF It's C++ only at the moment. Many thanks to Nick for the contribution. -greg |
|
From: Greg L. <gre...@gm...> - 2012-11-01 11:35:38
|
Dear all, After some key assistance from a colleague (thanks Nico!), I've managed to build C# wrappers for the RDKit that are useable from within Visual Studio projects. These use the same SWIG configuration as the RDKit java wrappers, so the API should be more or less the same in both Java and C#. I suspect it's going to take a few iterations to get the packaging and documentation of these things done in a form that's actually useful to others, so I'm looking for volunteers. If anyone in the community is interested in trying out the new wrappers and helping to refine the build process, please let me know. If you're really feeling brave, the wrappers are in $RDBASE/Code/JavaWrappers/csharp_wrapper. -greg |
|
From: Greg L. <gre...@gm...> - 2012-10-29 12:32:15
|
Dear Nick, On Mon, Oct 29, 2012 at 10:57 AM, Nicholas Firth <Nic...@ic...>wrote: > Hi Guys, > I have managed to be allowed to open source the code from my paper. Greg I > couldn't work if I was supposed to add this to the contrib or if you do it, > so I've just attached it to the email. The code is a 'working' > implementation although the error handling is not my finest work. > Great! Thanks for the contribution. I will get it integrated into the contrib are in SVN sometime in the next couple days and let you know when it's there. Best, -greg |
|
From: Nicholas F. <Nic...@ic...> - 2012-10-29 10:18:55
|
<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"><div></div></body></html> |
|
From: Greg L. <gre...@gm...> - 2012-10-24 06:09:44
|
Dear all, Starting with the 2012.12 release, I would like to stop doing python 2.6 builds of the RDKit for Windows. In addition to the "administrative" overhead of having to do the build, package it, and upload it, having a second Python installation on my Windows machine is making the process of configuring and running the build more complex than I think it needs to be. This has an impact on my ability to update and simplify the windows build instructions (one of my ToDo items from the UGM). The Python 2.6 binaries are downloaded a lot less often than the 2.7 versions, so it seems like this is unlikely to cause major problems, but I'm willing to keep doing the builds for at least a few more releases if they are important to someone here. Best, -greg |
|
From: Greg L. <gre...@gm...> - 2012-10-21 03:12:37
|
I'm very happy to announce that the next version of the RDKit -- 2012.09 (a.k.a Q3 2012) -- is released. The release notes are below. The source release is on the sourceforge downloads page: http://sourceforge.net/projects/rdkit/files/rdkit/Q3_2012/ The files can also be downloaded from the google project page: http://code.google.com/p/rdkit/downloads/list The binaries for Windows, Python 2.6 and Python 2.7, have also been uploaded. The online version of the documentation at rdkit.org has been updated. Thanks to the everyone who submitted bug reports and suggestions for this release! Please let me know if you find any problems with the release or have suggestions for the next one. -greg ****** Release_2012.09.1 ******* (Changes relative to Release_2012.06.1) !!!!!! IMPORTANT !!!!!! - Some of the bug fixes affect the generation of SMILES. Canonical SMILES generated with this version of the RDKit will be different from previous versions. - The fix to Issue 252 (see below) will lead to changes in calculated logP and MR values for some compounds. - The fix to Issue 254 (see below) will lead to changes in some descriptors and geometries for sulfur-containing compounds. - The fix to Issue 256 (see below) has changed the name of the optional argument to mol.GetNumAtoms from onlyHeavy to onlyExplicit. For compatibility reasons, Python code that uses explicitly uses onlyHeavy will still work, but it will generate warnings. This compatibility will be removed in a future release. Acknowledgements: Gianpaolo Bravi, David Cosgrove, Andrew Dalke, Fabian Dey, James Davidson, JP Ebejer, Gabriele Menna, Stephan Reiling, Roger Sayle, James Swetnam Bug Fixes: - The molecules that come from mergeQueryHs() now reset the RingInfo structure. (issue 245) - The output from MurckoScaffold.MakeScaffoldGeneric no longer includes stereochemistry or explicit Hs. (issue 246) - D and T atoms in CTABs now have their isotope information set. (issue 247) - Some problems with ring finding in large, complex molecules have been fixed. (issue 249) - The "rootedAtAtom" argument for FindAllSubgraphsOfLengthN is now handled properly. (issue 250) - Bonds now have a SetProp() method available in Python. (issue 251) - A number of problems with the Crippen atom parameters have been fixed. (issue 252) - Ring closure digits are no longer repeated on the same atom in SMILES generated by the RDKit. (issue 253) - Non-ring sulfur atoms adjacent to aromatic atoms are no longer set to be SP2 hybridized. This allows them to be stereogenic. (issue 254) - The combineMols() function now clears computed properties on the result molecule. - A couple of problems with the pickling functions on big endian hardware were fixed. - The molecule drawing code now uses isotope information - Superscript/Subscript handling in the agg canvas has been improved. - SKP lines in CTABS are now propertly handled. (Issue 255) - The name of the optional argument to mol.GetNumAtoms has been changed from onlyHeavy to onlyExplicit. The method counts the number of atoms in the molecular graph, not the number of heavy atoms. These numbers happen to usually be the same (which is why this has taken so long to show up), but there are exceptions if Hs or dummy atoms are in the graph. (Issue 256) - Unknown bonds in SMILES are now output using '~' instead of '?'. The SMILES parser now recognizes '~' as an "any bond" query. (Issue 257) - Lines containing only white space in SDF property blocks are no longer treated as field separators. - Transition metals and lanthanides no longer have default valences assigned. New Features: - The RDKit now has a maximum common substructure (MCS) implementation contributed by Andrew Dalke. This is currently implemented in Python and is available as: from rdkit.Chem import MCS Documentation is available as a docstring for the function MCS.FindMCS and in the GettingStarted document. - A few new functions have been added to rdkit.Chem.Draw: MolsToImage(), MolsToGridImage(), ReactionToImage() - CalcMolFormula() now provides the option to include isotope information. - The RDKit and Layered fingerprinters both now accept "fromAtoms" arguments that can be used to limit which atoms contribute to the fingerprint. - Version information is now available in the Java wrapper. - The descriptor NumRadicalElectrons is now available. - The PyMol interface now supports a GetPNG() method which returns the current contents of the viewer window as an PIL Image object. - Molecules (ROMol in C++, rdkit.Chem.Mol in Python) now have a getNumHeavyAtoms() method. - Component-level grouping (parens) can be used in reaction SMARTS. New Database Cartridge Features: - support for molecule <-> pickle conversion via the functions mol_to_pkl, mol_from_pkl, and is_valid_mol_pkl. - support for bit vector <-> binary text conversion via the functions bfp_to_binary_text, bfp_from_binary_text New Java Wrapper Features: Deprecated modules (to be removed in next release): Removed modules: Other: - During this release cycle, the sourceforge project was updated to their new hosting system. This explains the change in bug/issue ids. - the SMILES parser is now substantially faster. - The molecular drawings generated by Code/Demo/RDKit/Draw/MolDrawing.h have been improved. - There is now demo code availble for using the C++ drawing code within Qt applications. (contributed by David Cosgrove) - The directory $RDBASE/Regress now contains sample data and scripts for benchmarking the database cartridge. - Fused-ring aromaticity is now only considered in rings of up to size 24. - It is no longer necessary to have flex and bison installed in order to build the RDKit. |
|
From: James D. <J.D...@ve...> - 2012-10-15 05:20:54
|
Dear All, I wondered if I could take a moment to alert you to a new position here at Vernalis. The job is advertised on our site (http://www.vernalis.com/about-us/careers-at-vernalis) and on CCL (http://www.ccl.net/cca/jobs/joblist/mess0025266.shtml). Kind regards James ______________________________________________________________________ PLEASE READ: This email is confidential and may be privileged. It is intended for the named addressee(s) only and access to it by anyone else is unauthorised. If you are not an addressee, any disclosure or copying of the contents of this email or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If you have received this email in error, please notify the sender or pos...@ve.... Email is not a secure method of communication and the Company cannot accept responsibility for the accuracy or completeness of this message or any attachment(s). Please check this email for virus infection for which the Company accepts no responsibility. If verification of this email is sought then please request a hard copy. Unless otherwise stated, any views or opinions presented are solely those of the author and do not represent those of the Company. The Vernalis Group of Companies 100 Berkshire Place Wharfedale Road Winnersh, Berkshire RG41 5RD, England Tel: +44 (0)118 938 0000 To access trading company registration and address details, please go to the Vernalis website at www.vernalis.com and click on the "Company address and registration details" link at the bottom of the page.. ______________________________________________________________________ |
|
From: Greg L. <gre...@gm...> - 2012-10-09 05:36:41
|
Dear all, Because of some last minute bug fixing, particularly issue 256 (http://sourceforge.net/p/rdkit/bugs/256/), I've decided to do a second beta before the next RDKit release. Today I tagged the beta for the 2012.09 release in svn: http://rdkit.svn.sourceforge.net/viewvc/rdkit/tags/Release_2012_09_1beta2/ and uploaded a source distribution to the google code site: http://rdkit.googlecode.com/files/RDKit_2012_09_1beta2.tgz The google code page also has binaries for python2.7 on win32 (http://rdkit.googlecode.com/files/RDKit_2012_09_1beta2.win32.py27.zip). If no show-stopper bugs appear, I will do the release itself at the end of next week. Excerpts from the updated release notes are below. Best Regards, -greg ****** Release_2012.09.1 ******* (Changes relative to Release_2012.06.1) !!!!!! IMPORTANT !!!!!! - Some of the bug fixes affect the generation of SMILES. Canonical SMILES generated with this version of the RDKit will be different from previous versions. - The fix to Issue 252 (see below) will lead to changes in calculated logP and MR values for some compounds. - The fix to Issue 254 (see below) will lead to changes in some descriptors and geometries for sulfur-containing compounds. - The fix to Issue 256 (see below) has changed the name of the optional argument to mol.GetNumAtoms from onlyHeavy to onlyExplicit. For compatibility reasons, Python code that uses explicitly uses onlyHeavy will still work, but it will generate warnings. This compatibility will be removed in a future release. Acknowledgements: Gianpaolo Bravi, David Cosgrove, Andrew Dalke, Fabian Dey, James Davidson, JP Ebejer, Gabriele Menna, Stephan Reiling, James Swetnam Bug Fixes: - The molecules that come from mergeQueryHs() now reset the RingInfo structure. (issue 245) - The output from MurckoScaffold.MakeScaffoldGeneric no longer includes stereochemistry or explicit Hs. (issue 246) - D and T atoms in CTABs now have their isotope information set. (issue 247) - Some problems with ring finding in large, complex molecules have been fixed. (issue 249) - The "rootedAtAtom" argument for FindAllSubgraphsOfLengthN is now handled properly. (issue 250) - Bonds now have a SetProp() method available in Python. (issue 251) - A number of problems with the Crippen atom parameters have been fixed. (issue 252) - Ring closure digits are no longer repeated on the same atom in SMILES generated by the RDKit. (issue 253) - Non-ring sulfur atoms adjacent to aromatic atoms are no longer set to be SP2 hybridized. This allows them to be stereogenic. (issue 254) - The combineMols() function now clears computed properties on the result molecule. - A couple of problems with the pickling functions on big endian hardware were fixed. - The molecule drawing code now uses isotope information - Superscript/Subscript handling in the agg canvas has been improved. - The name of the optional argument to mol.GetNumAtoms has been changed from onlyHeavy to onlyExplicit. The method counts the number of atoms in the molecular graph, not the number of heavy atoms. These numbers happen to usually be the same (which is why this has taken so long to show up), but there are exceptions if Hs or dummy atoms are in the graph. (Issue 256) New Features: - The RDKit now has a maximum common substructure (MCS) implementation contributed by Andrew Dalke. This is currently implemented in Python and is available as: from rdkit.Chem import MCS Documentation is available as a docstring for the function MCS.FindMCS and in the GettingStarted document. - A few new functions have been added to rdkit.Chem.Draw: MolsToImage(), MolsToGridImage(), ReactionToImage() - CalcMolFormula() now provides the option to include isotope information. - The RDKit and Layered fingerprinters both now accept "fromAtoms" arguments that can be used to limit which atoms contribute to the fingerprint. - Version information is now available in the Java wrapper. - The descriptor NumRadicalElectrons is now available. - The PyMol interface now supports a GetPNG() method which returns the current contents of the viewer window as an PIL Image object. - Molecules (ROMol in C++, rdkit.Chem.Mol in Python) now have a getNumHeavyAtoms() method. New Database Cartridge Features: - support for molecule <-> pickle conversion via the functions mol_to_pkl, mol_from_pkl, and is_valid_mol_pkl. - support for bit vector <-> binary text conversion via the functions bfp_to_binary_text, bfp_from_binary_text New Java Wrapper Features: Deprecated modules (to be removed in next release): Removed modules: Other: - During this release cycle, the sourceforge project was updated to their new hosting system. This explains the change in bug/issue ids. - the SMILES parser is now substantially faster. - The molecular drawings generated by Code/Demo/RDKit/Draw/MolDrawing.h have been improved. - There is now demo code availble for using the C++ drawing code within Qt applications. (contributed by David Cosgrove) - The directory $RDBASE/Regress now contains sample data and scripts for benchmarking the database cartridge. - Fused-ring aromaticity is now only considered in rings of up to size 24. - It is no longer necessary to have flex and bison installed in order to build the RDKit. |
|
From: Gianluca S. <gi...@gm...> - 2012-10-05 19:15:13
|
On Fri, Oct 5, 2012 at 4:46 PM, Greg Landrum <gre...@gm...> wrote: > On Fri, Oct 5, 2012 at 9:27 AM, Gianluca Sforna <gi...@gm...> wrote: > What is involved in doing the mirroring? We could explore having it > run from the VM that hosts rdkit.org if it would help. I run this script hourly: cd /home/giallu/sources/rdkit-code git checkout -q master git svn -qq rebase > /dev/null git push -q rdkit@github master Running in the VM is doable, I just need to dig the commands I used to clone the repo in the first place -- Gianluca Sforna http://morefedora.blogspot.com http://identi.ca/giallu - http://twitter.com/giallu |
|
From: Greg L. <gre...@gm...> - 2012-10-05 14:47:03
|
On Fri, Oct 5, 2012 at 9:27 AM, Gianluca Sforna <gi...@gm...> wrote: > > Just one note about the repo. I'm mirroring it hourly from SVN, but > this is done from my own PC so, if it is offline, there could be > delays on updates. Otherwise, it should be pretty much in sync with > the master SVN repo. What is involved in doing the mirroring? We could explore having it run from the VM that hosts rdkit.org if it would help. -greg |
|
From: Gianluca S. <gi...@gm...> - 2012-10-05 07:27:56
|
On Fri, Oct 5, 2012 at 7:36 AM, Greg Landrum <gre...@gm...> wrote: > I mentioned at the UGM that Gianluca Sforna has set up a mirror of the > RDKit subversion repository. > The repository is here: https://github.com/rdkit/rdkit > Thanks Greg, I really looked forward to join you for the UGM, but unfortunately could not. Thanks also to Nathan Brown for covering the event on Twitter. Just one note about the repo. I'm mirroring it hourly from SVN, but this is done from my own PC so, if it is offline, there could be delays on updates. Otherwise, it should be pretty much in sync with the master SVN repo. Cheers Gianluca -- Gianluca Sforna http://morefedora.blogspot.com http://identi.ca/giallu - http://twitter.com/giallu |
|
From: Greg L. <gre...@gm...> - 2012-10-05 05:36:39
|
Dear all, I mentioned at the UGM that Gianluca Sforna has set up a mirror of the RDKit subversion repository. The repository is here: https://github.com/rdkit/rdkit I plan to continue to use svn as the primary repository of the code for the foreseeable future, but the git repository provides a nice option for people who want to use a distributed version control system. Feel free to submit suggested changes/patches via git pull requests. I will figure out how to re-integrated accepted patches into the svn trunk. In the near term, it is probably a good idea to assume that I won't see pull requests (since I'm not usually using git) and to either send me email or post to rdkit-devel. Many thanks for Gianluca for setting the git mirror up! -greg |
|
From: Andrew D. <da...@da...> - 2012-09-25 20:42:12
|
I made the last of the code changes. The public API now uses RDKit-style
mixedCase parameters instead of PEP 8-style underscore_names.
On Sep 25, 2012, at 4:44 AM, Greg Landrum wrote:
> Is there a python documentation generator that can process that and do
> something sensible with it? If so, I'm fine with it. If not, then
> maybe the @type bits can be combined into the @param somehow (yes, it
> would just be text then).
That was the theory. I've looked it over and greatly improved the
documentation by removing that autodoc section and using prose, like:
"""Find the maximum common substructure of a set of molecules
In the simplest case, pass in a list of molecules and get back
an MCSResult object which describes the MCS:
>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles("C#CCP"), Chem.MolFromSmiles("C=CCO")]
>>> from rdkit.Chem import MCS
>>> MCS.FindMCS(mols)
MCSResult(numAtoms=2, numBonds=1, smarts='[#6]-[#6]', completed=1)
The SMARTS '[#6]-[#6]' matches the largest common substructure of
the input structures. It has 2 atoms and 1 bond. If there is no
MCS which is at least `minNumAtoms` in size then the result will set
numAtoms and numBonds to -1 and set smarts to None.
...
> The API docs are automatically generated. It would be nice to have a
> brief section for the GettingStarted document
> ($RDBASE/Docs/Book/GettingStartedInPython.rst), but I am willing to
> write that if you don't have time.
I've copied the doctoring, with slight modifications, into the Getting
Started document. One of the modifications was to show an example of
using three molecules, since the docstring only shows pairwise examples.
I don't like the document duplication. I'm not going to worry much
about it though.
Feel free to change as you wish.
Andrew
da...@da...
|
|
From: Stiefl, N. <nik...@no...> - 2012-09-25 08:02:07
|
Hi Andrew - that's great - thanks! Maybe SubsetMCS would be good since it will be for a subset of the entire system? Ciao Nik On 9/25/12 4:44 AM, "Greg Landrum" <gre...@gm...> wrote: >On Tue, Sep 25, 2012 at 4:30 AM, Andrew Dalke <da...@da...> >wrote: >> The next quarter release is coming soon, which was good incentive for >>me to >> work on the new MCS code for RDKit. > >This is great news. Andrew's MCS code fills an important gap in the >current RDKit functionality. > > >> Here's the API and documentation. >> >> >> def FindMCS(mols, min_num_atoms=2, >> maximize = Default.maximize, >> atom_compare = Default.atom_compare, >> bond_compare = Default.bond_compare, >> match_valences = Default.match_valences, >> ring_matches_ring_only = False, >> complete_rings_only = False, >> timeout=Default.timeout, >> ): >> """Find the maximum common substructure of a set of molecules >> >> @type mols: molecule iterator >> @param mols: find the MCS of these molecules >> @type min_num_atoms: integer >> @param min_num_atoms: The minimum number of atoms which must be in >>the MCS. >> The minimim value is 2. >> @type maximize: "atoms" or "bonds" >> @param maximize: The default "atoms" maximizes the number of atoms >>in >> the MCS. Use "bonds" to maximize the number of bonds >>instead. >> @type atom_compare: "any", "elements", or "isotopes" >> @param atom_compare: Specify the atom comparison function. The >>default "elements" >> says that two atoms are the same if and only if they have >>the same >> element number. Use "isotopes" if you are using isotope >>labels to >> define your own atom classs. With "any", all atoms match >>each other. >> @type bond_compare: "any" or "bondtypes" >> @param bond_compare: Specify the bond comparison function. The >>default >> "bondtypes" says that two bonds are the same if and only >>if they >> have the same bond type. With "any", all bonds match each >>other. >> @type match_valences: boolean >> @param match_valences: If True, atoms must also have matching >>valences >> to match. By default this is False. >> @type ring_matches_ring_only: boolean >> @param ring_matches_ring_only: If True, then both bonds must either >>be in >> a ring or not in a ring in order to match. By default >>this is False. >> @type ring_matches_ring_only: boolean >> @param ring_matches_ring_only: If True, then both bonds must either >>be in >> a ring or not in a ring in order to match. By default >>this is False. >> @type complete_rings_only: boolean >> @param complete_rings_only: If True, then if a ring bond of a >>molecule >> is in the MCS then the corresponding MCS bond is also in >>a ring. >> @type timeout: float >> @param timeout: stop search after 'timeout' seconds and report the >>current >> best MCS. >> >> @rtype: MCSResult >> @return: Information about the MCS search results. Attributes are >>'completed' >> (0 if timeout reached, otherwise 1), 'num_atoms', 'num_bonds', >>and 'smarts'. >> >> >> Is that too unreadable? I can rewrite that to be more prose than >> this sort of auto-documentation format. > >Is there a python documentation generator that can process that and do >something sensible with it? If so, I'm fine with it. If not, then >maybe the @type bits can be combined into the @param somehow (yes, it >would just be text then). > > >> I wasn't sure of how/where to update the documentation. Greg? Do you >>have >> an idea of where it might go? Or perhaps want to do it yourself? > >The API docs are automatically generated. It would be nice to have a >brief section for the GettingStarted document >($RDBASE/Docs/Book/GettingStartedInPython.rst), but I am willing to >write that if you don't have time. > >> In other news, I'll be presenting the MCS work at the Goslar conference >> this fall, and I've got some funding to work on finding a threshold MCS, >> that is, the maximum common substructure which is in at least some >>percentage >> threshold of the entire system. > >Congrats! That would definitely be useful. > >> Can anyone suggest a good name for that? Threshold MCS? Something else? > >"Threshold MCS" sounds ok to me, but I'm not great at naming things. > >-greg > >-------------------------------------------------------------------------- >---- >Live Security Virtual Conference >Exclusive live event will cover all the ways today's security and >threat landscape has changed and how IT managers can respond. Discussions >will include endpoint security, mobile security and the latest in malware >threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >_______________________________________________ >Rdkit-devel mailing list >Rdk...@li... >https://lists.sourceforge.net/lists/listinfo/rdkit-devel |
|
From: Greg L. <gre...@gm...> - 2012-09-25 02:45:18
|
On Tue, Sep 25, 2012 at 4:30 AM, Andrew Dalke <da...@da...> wrote: > The next quarter release is coming soon, which was good incentive for me to > work on the new MCS code for RDKit. This is great news. Andrew's MCS code fills an important gap in the current RDKit functionality. > Here's the API and documentation. > > > def FindMCS(mols, min_num_atoms=2, > maximize = Default.maximize, > atom_compare = Default.atom_compare, > bond_compare = Default.bond_compare, > match_valences = Default.match_valences, > ring_matches_ring_only = False, > complete_rings_only = False, > timeout=Default.timeout, > ): > """Find the maximum common substructure of a set of molecules > > @type mols: molecule iterator > @param mols: find the MCS of these molecules > @type min_num_atoms: integer > @param min_num_atoms: The minimum number of atoms which must be in the MCS. > The minimim value is 2. > @type maximize: "atoms" or "bonds" > @param maximize: The default "atoms" maximizes the number of atoms in > the MCS. Use "bonds" to maximize the number of bonds instead. > @type atom_compare: "any", "elements", or "isotopes" > @param atom_compare: Specify the atom comparison function. The default "elements" > says that two atoms are the same if and only if they have the same > element number. Use "isotopes" if you are using isotope labels to > define your own atom classs. With "any", all atoms match each other. > @type bond_compare: "any" or "bondtypes" > @param bond_compare: Specify the bond comparison function. The default > "bondtypes" says that two bonds are the same if and only if they > have the same bond type. With "any", all bonds match each other. > @type match_valences: boolean > @param match_valences: If True, atoms must also have matching valences > to match. By default this is False. > @type ring_matches_ring_only: boolean > @param ring_matches_ring_only: If True, then both bonds must either be in > a ring or not in a ring in order to match. By default this is False. > @type ring_matches_ring_only: boolean > @param ring_matches_ring_only: If True, then both bonds must either be in > a ring or not in a ring in order to match. By default this is False. > @type complete_rings_only: boolean > @param complete_rings_only: If True, then if a ring bond of a molecule > is in the MCS then the corresponding MCS bond is also in a ring. > @type timeout: float > @param timeout: stop search after 'timeout' seconds and report the current > best MCS. > > @rtype: MCSResult > @return: Information about the MCS search results. Attributes are 'completed' > (0 if timeout reached, otherwise 1), 'num_atoms', 'num_bonds', and 'smarts'. > > > Is that too unreadable? I can rewrite that to be more prose than > this sort of auto-documentation format. Is there a python documentation generator that can process that and do something sensible with it? If so, I'm fine with it. If not, then maybe the @type bits can be combined into the @param somehow (yes, it would just be text then). > I wasn't sure of how/where to update the documentation. Greg? Do you have > an idea of where it might go? Or perhaps want to do it yourself? The API docs are automatically generated. It would be nice to have a brief section for the GettingStarted document ($RDBASE/Docs/Book/GettingStartedInPython.rst), but I am willing to write that if you don't have time. > In other news, I'll be presenting the MCS work at the Goslar conference > this fall, and I've got some funding to work on finding a threshold MCS, > that is, the maximum common substructure which is in at least some percentage > threshold of the entire system. Congrats! That would definitely be useful. > Can anyone suggest a good name for that? Threshold MCS? Something else? "Threshold MCS" sounds ok to me, but I'm not great at naming things. -greg |
|
From: Andrew D. <da...@da...> - 2012-09-25 02:30:27
|
The next quarter release is coming soon, which was good incentive for me to work on the new MCS code for RDKit. I checked in the actual MCS code a few weeks ago, and Greg's reviewed it at least lightly. This evening I checked in the test cases. They are at: http://sourceforge.net/p/rdkit/code/2197/tree/trunk/rdkit/Chem/MCS.py http://sourceforge.net/p/rdkit/code/2197/tree/trunk/rdkit/Chem/UnitTestMCS.py Here's the API and documentation. def FindMCS(mols, min_num_atoms=2, maximize = Default.maximize, atom_compare = Default.atom_compare, bond_compare = Default.bond_compare, match_valences = Default.match_valences, ring_matches_ring_only = False, complete_rings_only = False, timeout=Default.timeout, ): """Find the maximum common substructure of a set of molecules @type mols: molecule iterator @param mols: find the MCS of these molecules @type min_num_atoms: integer @param min_num_atoms: The minimum number of atoms which must be in the MCS. The minimim value is 2. @type maximize: "atoms" or "bonds" @param maximize: The default "atoms" maximizes the number of atoms in the MCS. Use "bonds" to maximize the number of bonds instead. @type atom_compare: "any", "elements", or "isotopes" @param atom_compare: Specify the atom comparison function. The default "elements" says that two atoms are the same if and only if they have the same element number. Use "isotopes" if you are using isotope labels to define your own atom classs. With "any", all atoms match each other. @type bond_compare: "any" or "bondtypes" @param bond_compare: Specify the bond comparison function. The default "bondtypes" says that two bonds are the same if and only if they have the same bond type. With "any", all bonds match each other. @type match_valences: boolean @param match_valences: If True, atoms must also have matching valences to match. By default this is False. @type ring_matches_ring_only: boolean @param ring_matches_ring_only: If True, then both bonds must either be in a ring or not in a ring in order to match. By default this is False. @type ring_matches_ring_only: boolean @param ring_matches_ring_only: If True, then both bonds must either be in a ring or not in a ring in order to match. By default this is False. @type complete_rings_only: boolean @param complete_rings_only: If True, then if a ring bond of a molecule is in the MCS then the corresponding MCS bond is also in a ring. @type timeout: float @param timeout: stop search after 'timeout' seconds and report the current best MCS. @rtype: MCSResult @return: Information about the MCS search results. Attributes are 'completed' (0 if timeout reached, otherwise 1), 'num_atoms', 'num_bonds', and 'smarts'. Is that too unreadable? I can rewrite that to be more prose than this sort of auto-documentation format. I wasn't sure of how/where to update the documentation. Greg? Do you have an idea of where it might go? Or perhaps want to do it yourself? In other news, I'll be presenting the MCS work at the Goslar conference this fall, and I've got some funding to work on finding a threshold MCS, that is, the maximum common substructure which is in at least some percentage threshold of the entire system. Can anyone suggest a good name for that? Threshold MCS? Something else? Cheers, Andrew da...@da... |
|
From: Andrew D. <da...@da...> - 2012-09-01 18:43:31
|
On Sep 1, 2012, at 6:00 PM, James Swetnam wrote:
> Sorry, hit send prematurely.
>
> I'd define a separate function FindMCSWithTimeout for the timeout parameter unless a majority of users will be concerned with this method's running time, but I'm not familiar with the algorithm's details.
The MCS algorithm is one of those normally fast but
not infrequently exponential slow algorithms. In my various
tests, including simple pairwise tests of 10,000 randomly
chosen pairs of structures from ChEMBL 13, I found that:
7 pairs took longer than 30 seconds,
9977 took less than 2 seconds,
9958 took less than 1 second,
9420 took less than 0.1 seconds,
4820 took less than 0.01 seconds.
1513 took less than 0.001 seconds
As I recall, sometimes the search can take many hours.
My belief is that the slow runtimes occur often enough
that everyone will use a timeout. I also believe that early
on, people who use the API will test with structures where
the MCS is easy to find. Only later on will they get stuck
in some slow-downs (which is especially a problem on a
web server), and then have to diagnose it.
BTW, unlike, say, the CDK code, timeout failures do NOT
raise an exception. One of the advantages of my algorithm
over some of the other ones is that I can report the best
size to date. The returned MCSResult contains:
- number of atoms in the MCS
- number of bonds in the MCS
- SMARTS pattern
- a 'completed' flag, to know if MCS search completed
(True) or stopped due to some other reason,
like a timeout (False).
Indigo's scaffold finder uses a per-indigo-instance variable
to set the timeout. My testing showed that Indigo's code,
while faster than fmcs, still has timeout issues.
CDK's MCS code implements a timeout.
Small Molecule Subgraph Detector (SMSD) does not have a timeout.
I had to write my own wrapper around it to kill it when it took
too long.
OEChem's MCS code does not seem to have a timeout. They have
two different methods, one approximate and the other exact.
I don't have the experience to judge if there are problems
when using the exact method in practice.
The existence of timeouts in other codes suggests that
it's important, and it's hard to remove the need.
// This is OpenBabel like. It's orthogonal to the method itself.
I don't know what you mean by orthogonal. That is,
why does the run-time allocation any more orthogonal
than the binary options
ring_matches_ring_only = False,
complete_rings_only = False,
There are several ways to handle all of the parameters:
- have a function which takes all of the parameters
- create a search class, where some parameters can be set in
the constructor.
- and/or, which has methods methods like "SetTimeout(double seconds)"
- create a "search parameters" class, which holds all of the
values and is passed to the search function
- create different classes, one for each parameterization
- like Indigo, support multiple toolkit instance, where
each instance can have its own settings
- introduce global (or per-thread) variables
Of these, I thought that a single function call interface
was the easiest to understand.
Also, I lean towards the functional programming ideal of
not changing state, so don't like "SetTimeout" or the
approaches which set some sort of (semi-) global parameter.
> Agree with Greg's assertion to rename to FindMCS.
Yes, I'll be changing the name.
> // Rename Default to something like MCSParams, and have a factory function returning a DefaultMCSParams I assume this is the command pattern?
It's not a command pattern. It's minimal data only.
=================
### A place to set global options
# (Is this really useful?)
class Default(object):
timeout = None
timeout_string = "none"
maximize = "bonds"
atom_compare = "elements"
bond_compare = "bondtypes"
match_valences = False
ring_matches_ring_only = False
complete_rings_only = False
==================
> // Logging should really be separated from API methods. You should be
> using the python logging framework such that users
> // can modify these globally and not through the API
Then perhaps "logging" is the wrong name for it. "Progress" is
much better.
It's used in the command-line tool to show progress information.
When something takes 30 seconds to run, it's comforting to know
that the underlying code is doing something. This would also be
true for a GUI display. Ideally the API would support a progress
handler, along with an option to abort.
I think it's best to strip it out of the RDKit version of the code.
What's there is only for the command-line tool, and I don't have
the time/funding to develop a proper API.
BTW, I don't think Python's logging system is appropriate for this.
Actually, I don't like using logs for most things, but that's
a different topic.
> verbose_delay=1.0,
> // Also remove from this function
Yes. See above.
This is an example of something which is hard to integrate with
the logging API. I could log every event, but that's useless.
Instead, I need to provide some sample. I could do it every
N events, but that's CPU- and load-specific. So I chose to use
a time delay.
I don't know how to configure logging to handle that case.
On Fri, Aug 31, 2012 at 8:37 PM, Greg Landrum <gre...@gm...> wrote:
> I would have no objection to the command-line driver being there in a
> separate file in the package; it could well be useful to people.
It could well be. However:
1) this for me is more of a contractual obligation, in that
my client asked me nicely (it's not actually in the contract)
to put it into RDKit. I don't have much free time/funding
to do this.
2) The command-line tool was designed for my client, and while
I think it's general enough, I haven't gotten any feedback
about it. The isotope-label-as-atom-classes hack is my biggest
concern here.
3) As you all can see, the API is designed specifically for
the command-line tool. Stripping out the currently named
'logging' code would mean also modifying the command-line
driver. That's more work. Especially since the work I would
like to do is to improve the algorithm.
4) I would also have to get a sense of what RDKit's command-line
parameter style looks like, so that the code fits in.
Hence, rather more work than I would like to take on
when so far neither Greg nor my clients have asked for it.
Andrew
da...@da...
|
|
From: James S. <jsw...@gm...> - 2012-09-01 16:00:47
|
Sorry, hit send prematurely. I'd define a separate function FindMCSWithTimeout for the timeout parameter unless a majority of users will be concerned with this method's running time, but I'm not familiar with the algorithm's details. Best, James On Sat, Sep 1, 2012 at 8:57 AM, James Swetnam <jsw...@gm...> wrote: > Some pendantic and unsolicited API critiques, but as a user I figured I > should chime in. > > Agree with Greg's assertion to rename to FindMCS. > > def MCS(mols, min_num_atoms=2, > maximize = Default.maximize, > // Rename Default to something like MCSParams, and have a factory > function returning a DefaultMCSParams I assume this is the command pattern? > > atom_compare = Default.atom_compare, > bond_compare = Default.bond_compare, > match_valences = Default.match_valences, > ring_matches_ring_only = False, > complete_rings_only = False, > timeout=Default.timeout, > // Could you remove timeout? This is OpenBabel likeIt's orthogonal to > the method itself. You could define an additional method with > times=None, > > verbose=False, > // Logging should really be separated from API methods. You should be > using the python logging framework > <http://docs.python.org/library/logging.html>such that users > // can modify these globally and not through the API > verbose_delay=1.0, > // Also remove from this function > ): > > > On Fri, Aug 31, 2012 at 8:37 PM, Greg Landrum <gre...@gm...>wrote: > >> On Sat, Sep 1, 2012 at 1:28 AM, Andrew Dalke <da...@da...> >> wrote: >> > I've started to add the fmcs MCS search code to RDKit. >> > By that I mean it does not include the command-line driver >> > which is part of the fmcs distribution; just the MCS search code. >> >> I would have no objection to the command-line driver being there in a >> separate file in the package; it could well be useful to people. >> >> > The interface to it is a single function which takes these >> > parameters. >> > >> > def MCS(mols, min_num_atoms=2, >> > maximize = Default.maximize, >> > atom_compare = Default.atom_compare, >> > bond_compare = Default.bond_compare, >> > match_valences = Default.match_valences, >> > ring_matches_ring_only = False, >> > complete_rings_only = False, >> > timeout=Default.timeout, >> > times=None, >> > verbose=False, >> > verbose_delay=1.0, >> > ): >> >> Any objection to calling it FindMCS? >> >> > >> > The steps I've done so far are: >> > - rename all other functions and classes to use the >> > "_" prefix to indicate that they are internal >> > - remove the command-line driver code >> > - removed all code uses of the word 'fmcs' >> > >> > The steps still to do are: >> > - use the right module name >> > - reformat the doc strings >> > - make it build properly >> > - add basic tests >> > - add the above to SVN and commit to SF >> > >> > I have some questions about those steps. >> > >> > 1) What is the right module name? Greg and I talked >> > about it a couple of months ago. I can't find the >> > paper where I wrote it down. >> > >> > Right now it is "rdkit.Chem.MCS", so that the actual >> > function to call is "rdkit.Chem.MCS.MCS". >> >> MCS is fine with me, the function would then be rdkit.Chem.MCS.FindMCS >> assuming that you agree with my function renaming suggestion. >> >> > 2) Is there any other documentation I should update >> > other than the docstrings? >> >> It would be nice to have something about it in the Getting Started >> guide, but I can do that. >> >> > >> > I'll have more questions which I get to the build system. >> > >> > >> > I'm not sure of the long-term coordination between >> > the RDKit and fmcs code bases. Greg would like to >> > push more of the RDKit code into C++ while I would >> > like fmcs to be portable across other toolkits. >> >> Since the rest of the list haven't been involved in these discussions, >> here's my motivation for that: >> having the MCS code in C++ increases the impact of the code: it allows >> it to be used from the cartridge, in the knime nodes, from other Java >> programs, and from C++ itself. There may also be performance benefits >> (I say "may" because I know that Andrew writes really good Python >> code), but those are at most a secondary motivation >> >> -greg >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Rdkit-devel mailing list >> Rdk...@li... >> https://lists.sourceforge.net/lists/listinfo/rdkit-devel >> > > |
|
From: James S. <jsw...@gm...> - 2012-09-01 15:58:11
|
Some pendantic and unsolicited API critiques, but as a user I figured I
should chime in.
Agree with Greg's assertion to rename to FindMCS.
def MCS(mols, min_num_atoms=2,
maximize = Default.maximize,
// Rename Default to something like MCSParams, and have a factory
function returning a DefaultMCSParams I assume this is the command pattern?
atom_compare = Default.atom_compare,
bond_compare = Default.bond_compare,
match_valences = Default.match_valences,
ring_matches_ring_only = False,
complete_rings_only = False,
timeout=Default.timeout,
// Could you remove timeout? This is OpenBabel likeIt's orthogonal to the
method itself. You could define an additional method with
times=None,
verbose=False,
// Logging should really be separated from API methods. You should be
using the python logging framework
<http://docs.python.org/library/logging.html>such that users
// can modify these globally and not through the API
verbose_delay=1.0,
// Also remove from this function
):
On Fri, Aug 31, 2012 at 8:37 PM, Greg Landrum <gre...@gm...>wrote:
> On Sat, Sep 1, 2012 at 1:28 AM, Andrew Dalke <da...@da...>
> wrote:
> > I've started to add the fmcs MCS search code to RDKit.
> > By that I mean it does not include the command-line driver
> > which is part of the fmcs distribution; just the MCS search code.
>
> I would have no objection to the command-line driver being there in a
> separate file in the package; it could well be useful to people.
>
> > The interface to it is a single function which takes these
> > parameters.
> >
> > def MCS(mols, min_num_atoms=2,
> > maximize = Default.maximize,
> > atom_compare = Default.atom_compare,
> > bond_compare = Default.bond_compare,
> > match_valences = Default.match_valences,
> > ring_matches_ring_only = False,
> > complete_rings_only = False,
> > timeout=Default.timeout,
> > times=None,
> > verbose=False,
> > verbose_delay=1.0,
> > ):
>
> Any objection to calling it FindMCS?
>
> >
> > The steps I've done so far are:
> > - rename all other functions and classes to use the
> > "_" prefix to indicate that they are internal
> > - remove the command-line driver code
> > - removed all code uses of the word 'fmcs'
> >
> > The steps still to do are:
> > - use the right module name
> > - reformat the doc strings
> > - make it build properly
> > - add basic tests
> > - add the above to SVN and commit to SF
> >
> > I have some questions about those steps.
> >
> > 1) What is the right module name? Greg and I talked
> > about it a couple of months ago. I can't find the
> > paper where I wrote it down.
> >
> > Right now it is "rdkit.Chem.MCS", so that the actual
> > function to call is "rdkit.Chem.MCS.MCS".
>
> MCS is fine with me, the function would then be rdkit.Chem.MCS.FindMCS
> assuming that you agree with my function renaming suggestion.
>
> > 2) Is there any other documentation I should update
> > other than the docstrings?
>
> It would be nice to have something about it in the Getting Started
> guide, but I can do that.
>
> >
> > I'll have more questions which I get to the build system.
> >
> >
> > I'm not sure of the long-term coordination between
> > the RDKit and fmcs code bases. Greg would like to
> > push more of the RDKit code into C++ while I would
> > like fmcs to be portable across other toolkits.
>
> Since the rest of the list haven't been involved in these discussions,
> here's my motivation for that:
> having the MCS code in C++ increases the impact of the code: it allows
> it to be used from the cartridge, in the knime nodes, from other Java
> programs, and from C++ itself. There may also be performance benefits
> (I say "may" because I know that Andrew writes really good Python
> code), but those are at most a secondary motivation
>
> -greg
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Rdkit-devel mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-devel
>
|