rdkit-devel Mailing List for RDKit (Page 13)

Open-Source Cheminformatics and Machine Learning

Brought to you by: glandrum

rdkit-devel — Developers information and discussion.

You can subscribe to this list here.

2006	Jan	Feb	Mar	Apr	May (24)	Jun (20)	Jul	Aug (2)	Sep (4)	Oct (39)	Nov (33)	Dec (8)
2007	Jan (17)	Feb (13)	Mar (35)	Apr (10)	May (1)	Jun (2)	Jul (3)	Aug (4)	Sep (4)	Oct (7)	Nov (1)	Dec
2008	Jan (10)	Feb (2)	Mar (2)	Apr (10)	May (8)	Jun (2)	Jul (1)	Aug (1)	Sep (3)	Oct (1)	Nov	Dec
2009	Jan (2)	Feb (1)	Mar (1)	Apr (1)	May	Jun (1)	Jul (7)	Aug (2)	Sep (6)	Oct (12)	Nov	Dec
2010	Jan (1)	Feb	Mar	Apr (2)	May (4)	Jun (2)	Jul (17)	Aug (7)	Sep (20)	Oct (8)	Nov (1)	Dec (12)
2011	Jan (8)	Feb (15)	Mar (20)	Apr (5)	May (8)	Jun (2)	Jul (17)	Aug (8)	Sep (4)	Oct (15)	Nov	Dec (2)
2012	Jan (3)	Feb	Mar (23)	Apr (2)	May (2)	Jun (8)	Jul (7)	Aug (18)	Sep (8)	Oct (10)	Nov (2)	Dec (7)
2013	Jan (6)	Feb (3)	Mar	Apr (3)	May (1)	Jun (1)	Jul (1)	Aug (2)	Sep	Oct (5)	Nov	Dec
2014	Jan (1)	Feb	Mar	Apr	May (10)	Jun	Jul (2)	Aug	Sep	Oct (7)	Nov (1)	Dec (6)
2015	Jan (22)	Feb	Mar (2)	Apr (5)	May (10)	Jun	Jul	Aug	Sep	Oct (3)	Nov (9)	Dec (3)
2016	Jan (2)	Feb (5)	Mar	Apr (31)	May (3)	Jun (2)	Jul	Aug	Sep	Oct (4)	Nov (10)	Dec (7)
2017	Jan	Feb (7)	Mar (3)	Apr (6)	May (4)	Jun (6)	Jul (5)	Aug (1)	Sep (7)	Oct (1)	Nov	Dec
2018	Jan	Feb	Mar (11)	Apr (13)	May (18)	Jun (1)	Jul	Aug	Sep	Oct (5)	Nov (3)	Dec
2019	Jan	Feb	Mar	Apr (10)	May (4)	Jun	Jul	Aug	Sep	Oct (2)	Nov (1)	Dec (2)
2020	Jan (2)	Feb	Mar (5)	Apr (2)	May	Jun	Jul (4)	Aug	Sep	Oct (2)	Nov	Dec
2021	Jan	Feb	Mar (4)	Apr	May (1)	Jun	Jul (3)	Aug	Sep	Oct	Nov	Dec

Flat | Threaded

<< < 1 .. 11 12 13 14 15 .. 35 > >> (Page 13 of 35)

Re: [Rdkit-devel] Rethinking the RDKit's implicit hydrogen handling

From: Andrew D. <da...@da...> - 2012-12-10 15:52:09

On Dec 8, 2012, at 7:01 AM, Greg Landrum wrote:
> It's a pretty small API change, but there's a huge amount of code that needs to be changed in the back and lots and lots of testing that has to be done, so this is going to take a while.

Speaking of hydrogens, I came across a strange query (part of the
BindingDB_structure set from my Structure Query Collection).

     C1N[H]OC[H]1

Because of RDKit's sanitization, the '[H]' gets absorbed into the atoms.
This breaks a bond, so what started as a ring with poor chemistry ends
up as two linear pieces with sane chemistry.

>>> mol = Chem.MolFromSmiles("C1N[H]OC[H]1")
>>> Chem.MolToSmiles(mol)
'CN.OC'
>>> 

The above query is nonsensical, but I don't think sanitization should
modify the topology of the input.

Divalent hydrogen does exist in a very small number of real records.
I once came across a couple of structure with a 4-membered boron/hydrogen
ring, like this:

>>> mol = Chem.MolFromSmiles("[B]1[H][B][H]1")
>>> Chem.MolToSmiles(mol)
'[BH].[BH]'
>>> 

I am unable to find a public record with that core, to act as a
more realistic test case.

				Andrew
				da...@da...

Re: [Rdkit-devel] Rethinking the RDKit's implicit hydrogen handling

From: Andrew D. <da...@da...> - 2012-12-10 12:32:42

On Dec 9, 2012, at 8:06 PM, Gianluca Sforna wrote:
> I believe a commonly accepted approach to this kind of issues is something like:
> 
> 1. add warnings about the deprecated or changed methods, but keep the
> current behavior so code do not break, but becomes verbose when called
   ...

That progressive set of changes is very context dependent.

Some projects have the above policy, but not all. Even within a
project, some parts of a package can break backwards compatibility
while others do not.

For an obvious example, if the API is broken, then there's no need
to go through this very careful transition plan. No one can be using
it so just fix the API.

Similarly, even if the API is working (as this one is), it may be
that no one is using it. Or it may be that one person is using it
(like I am) but the migration path from the old API to the new one
is only a few lines of code, while the internal changes to support
both APIs is very complicated.

I believe Greg's proposal is an example of the latter. That is,
I suspect that:
  1) few people currently use the API,
  2) of those, most can trivially port to the new API, and
  3) supporting both the old and new APIs is complicated

I have no problems with doing a flag-day change for the Q1 or Q2 release.

				Andrew
				da...@da...

Re: [Rdkit-devel] Rethinking the RDKit's implicit hydrogen handling

From: Andrew D. <da...@da...> - 2012-12-10 12:12:02

On Dec 8, 2012, at 7:01 AM, Greg Landrum wrote:
> Please let me know if this sounds reasonable or if you have suggestions on how to handle the logistics more cleanly.

It looks quite reasonable to me.

Only a bit of my code will be affected by the change, and that's code
which is just at the beginning of development, so it's not deployed anywhere.



				Andrew
				da...@da...

Re: [Rdkit-devel] Rethinking the RDKit's implicit hydrogen handling

From: Christos K. <chr...@gm...> - 2012-12-09 19:16:03

Gianluca 's plan is better.
I think step 5 can be the step where depreciated methods exist but they
are  redirecting to the new methods if this is possible, otherwise there
have to be removed.
The key point to the whole plan is to give users the time to update their
code.

Regards,
Christos

Sent from my Galaxy Note!
On Dec 9, 2012 8:06 PM, "Gianluca Sforna" <gi...@gm...> wrote:

> On Sat, Dec 8, 2012 at 7:01 AM, Greg Landrum <gre...@gm...>
> wrote:
> > My current plan is to keep this branch in sync with the trunk through (at
> > least) the Q4 2012 release and then consider moving it onto the trunk and
> > creating a v1 API branch in Q1 of next year. I guess I should be able to
> > keep the v1 API branch in sync, at least in terms of bug fixes, for
> another
> > 2-3 releases.
>
> I believe a commonly accepted approach to this kind of issues is something
> like:
>
> 1. add warnings about the deprecated or changed methods, but keep the
> current behavior so code do not break, but becomes verbose when called
> 2. add the v2 methods.
> 3. release Q4 2012
> 4. keep the warnings around as much as you feel is fair to give time
> for the clients to port code
> 5. make the real break (remove deprecated methods, change method signature)
> 6. release a new version with the api break
>
> Cheers
>
> G.
>
> --
> Gianluca Sforna
>
> http://morefedora.blogspot.com
> http://identi.ca/giallu - http://twitter.com/giallu
>
>
> ------------------------------------------------------------------------------
> LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
> Remotely access PCs and mobile devices and provide instant support
> Improve your efficiency, and focus on delivering more value-add services
> Discover what IT Professionals Know. Rescue delivers
> http://p.sf.net/sfu/logmein_12329d2d
> _______________________________________________
> Rdkit-devel mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-devel
>

Re: [Rdkit-devel] Rethinking the RDKit's implicit hydrogen handling

From: Gianluca S. <gi...@gm...> - 2012-12-09 19:06:27

On Sat, Dec 8, 2012 at 7:01 AM, Greg Landrum <gre...@gm...> wrote:
> My current plan is to keep this branch in sync with the trunk through (at
> least) the Q4 2012 release and then consider moving it onto the trunk and
> creating a v1 API branch in Q1 of next year. I guess I should be able to
> keep the v1 API branch in sync, at least in terms of bug fixes, for another
> 2-3 releases.

I believe a commonly accepted approach to this kind of issues is something like:

1. add warnings about the deprecated or changed methods, but keep the
current behavior so code do not break, but becomes verbose when called
2. add the v2 methods.
3. release Q4 2012
4. keep the warnings around as much as you feel is fair to give time
for the clients to port code
5. make the real break (remove deprecated methods, change method signature)
6. release a new version with the api break

Cheers

G.

-- 
Gianluca Sforna

http://morefedora.blogspot.com
http://identi.ca/giallu - http://twitter.com/giallu

[Rdkit-devel] Rethinking the RDKit's implicit hydrogen handling

From: Greg L. <gre...@gm...> - 2012-12-08 06:01:55

Hi,

This is an unusually long email for me, so here's the TL;DR version:
The RDKit handling of implicit hydrogens is confusing. This leads to
needlessly complicated code and, almost certainly, bugs. I'm proposing to
fix it, but doing so requires an API change. This work will be done on a
branch until we decide that it should move over to the trunk.



# Background

Like most cheminformatics toolkits, the RDKit by default stores molecules
without hydrogens being explicitly present in the molecular graph. This is
done for efficency reasons: there's no need to store all those extra atoms
if you can implicitly know that they're there. Thanks to the octet rule,
it's pretty easy to know when Hs should be attached to the common elements
encountered in organic chemistry. This leads to the concept of the
"implicit hydrogen": an H that you know is there but which isn't explicitly
represented.

This is wonderfully clear in SMILES: you can write ethane as "CC" instead
of "[H]C([H])([H])C([H])([H])[H]" or "[CH3][CH3]". For elements out of the
"organic subset" -- where the octet rule may be a bit more relaxed -- or in
cases where the octet rule is violated, you have to provide the H count in
the SMILES. For example: "[SiH3][SiH3]" or the radical "[CH2]C". The rules
for SMILES stipulate that if the atom is in square brackets, for whatever
reason, you must provide the H count. For example "Cl[C@H](C)F". As an
aside: you also sometimes need to do this in aromatic heterocycles, where
it's not always clear where if an H should be present. So pyrrole is
written as "c1cccc[nH]1", not "c1ccccn1".

Back in the very early days of designing and implementing the RDKit, for
reasons I can no longer really remember (give me a break! it was more than
10 years ago!), I decided that there should be a difference between
"specified implicit Hs", those that are inside square brackets, and
"default implicit Hs", those that don't show up in the SMILES at all
(thanks to Andrew Dalke for the terms). Thus there are the methods
Atom.GetNumImplicitHs and Atom.GetNumExplicitHs (I will use the Python API
names throughout this since I'm showing examples from Python). These don't
necessarily behave the way one would expect them to:

     In [2]: m = Chem.MolFromSmiles('C[CH3]')

     In [3]: m.GetAtomWithIdx(0).GetNumImplicitHs()
     Out[3]: 3

     In [4]: m.GetAtomWithIdx(1).GetNumImplicitHs()
     Out[4]: 0

     In [5]: m.GetAtomWithIdx(0).GetNumExplicitHs()
     Out[5]: 0

     In [6]: m.GetAtomWithIdx(1).GetNumExplicitHs()
     Out[6]: 3

You really need to use Atom.GetTotalNumHs() to safely figure out how many
Hs an atom has:

    In [7]: m.GetAtomWithIdx(0).GetTotalNumHs()
    Out[7]: 3

    In [8]: m.GetAtomWithIdx(1).GetTotalNumHs()
    Out[8]: 3


This has been bothering me for a long time, but fixing it requires a change
to the API which will break code. I'm really, really resistant to doing
that, but I don't see any way to clean things up without an API change.


# Fixing the problem

Here's a preliminary list of API changes:

The following methods will be removed from the API:
  - Atom.{Get,Set}NumExplicitHs()
  - Atom.{Get,Set}NoImplicit()
The following methods will be added:
  - Atom.{Get,Set}NoImplicitCalculation(): equivalent to the current
Atom.{Get,Set}NoImplicit()
  - Atom.SetNumImplicitHs()
The following methods will be modified:
  - Atom.GetTotalNumHs(): the optional "includeNeighbors" argument will be
removed; it will always include neighbors.

Things may be added to this over time.

# Practical considerations

It's a pretty small API change, but there's a huge amount of code that
needs to be changed in the back and lots and lots of testing that has to be
done, so this is going to take a while.

I'm currently working on a branch (note: this is definitely not yet stable
enough to rely on it working at all):
https://svn.code.sf.net/p/rdkit/code/branches/AlternateValence_21Nov2012

My current plan is to keep this branch in sync with the trunk through (at
least) the Q4 2012 release and then consider moving it onto the trunk and
creating a v1 API branch in Q1 of next year. I guess I should be able to
keep the v1 API branch in sync, at least in terms of bug fixes, for another
2-3 releases.

Please let me know if this sounds reasonable or if you have suggestions on
how to handle the logistics more cleanly.


# Thanks

Roger Sayle, perhaps inadvertently, provide the final bit of impetus
required to make this happen by submitting a nice little patch to update
the way implicit valence is calculated. Integrating this simple patch
forced me to confront how ugly the current handling of implicit Hs is.


Best Regards,
-greg

Re: [Rdkit-devel] [Rdkit-discuss] PBF Code

From: Greg L. <gre...@gm...> - 2012-11-17 04:01:19

Dear all,

On Mon, Oct 29, 2012 at 10:57 AM, Nicholas Firth
<Nic...@ic...>wrote:

> Hi Guys,
> I have managed to be allowed to open source the code from my paper. Greg I
> couldn't work if I was supposed to add this to the contrib or if you do it,
> so I've just attached it to the email. The code is a 'working'
> implementation although the error handling is not my finest work.
>

Nick and I did some revisions of his code. The most recent version is now
checked into $RDBASE/Contrib/PBF
It's C++ only at the moment.

Many thanks to Nick for the contribution.

-greg

[Rdkit-devel] C# wrappers for the RDKit

From: Greg L. <gre...@gm...> - 2012-11-01 11:35:38

Dear all,

After some key assistance from a colleague (thanks Nico!), I've managed to
build C# wrappers for the RDKit that are useable from within Visual Studio
projects. These use the same SWIG configuration as the RDKit java wrappers,
so the API should be more or less the same in both Java and C#.

I suspect it's going to take a few iterations to get the packaging and
documentation of these things done in a form that's actually useful to
others, so I'm looking for volunteers.

If anyone in the community is interested in trying out the new wrappers and
helping to refine the build process, please let me know. If you're really
feeling brave, the wrappers are in $RDBASE/Code/JavaWrappers/csharp_wrapper.

-greg

Re: [Rdkit-devel] [Rdkit-discuss] PBF Code

From: Greg L. <gre...@gm...> - 2012-10-29 12:32:15

Dear Nick,

On Mon, Oct 29, 2012 at 10:57 AM, Nicholas Firth
<Nic...@ic...>wrote:

> Hi Guys,
> I have managed to be allowed to open source the code from my paper. Greg I
> couldn't work if I was supposed to add this to the contrib or if you do it,
> so I've just attached it to the email. The code is a 'working'
> implementation although the error handling is not my finest work.
>

Great! Thanks for the contribution. I will get it integrated into the
contrib are in SVN sometime in the next couple days and let you know when
it's there.

Best,
-greg

[Rdkit-devel] PBF Code

From: Nicholas F. <Nic...@ic...> - 2012-10-29 10:18:55

Attachments: PBFRDKit.cpp ATT00001.htm bestfit.cpp ATT00002.htm PBFRDKit.h ATT00003.htm bestfit.h ATT00004.htm

<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"><div></div></body></html>

[Rdkit-devel] Question: Does anyone still need the Windows python 2.6 RDKit build?

From: Greg L. <gre...@gm...> - 2012-10-24 06:09:44

Dear all,

Starting with the 2012.12 release, I would like to stop doing python
2.6 builds of the RDKit for Windows. In addition to the
"administrative" overhead of having to do the build, package it, and
upload it, having a second Python installation on my Windows machine
is making the process of configuring and running the build more
complex than I think it needs to be. This has an impact on my ability
to update and simplify the windows build instructions (one of my ToDo
items from the UGM).

The Python 2.6 binaries are downloaded a lot less often than the 2.7
versions, so it seems like this is unlikely to cause major problems,
but I'm willing to keep doing the builds for at least a few more
releases if they are important to someone here.

Best,
-greg

[Rdkit-devel] 2012.09 (Q3 2012) RDKit release

From: Greg L. <gre...@gm...> - 2012-10-21 03:12:37

I'm very happy to announce that the next version of the RDKit --
2012.09 (a.k.a Q3 2012) -- is released.

The release notes are below.

The source release is on the sourceforge downloads page:
http://sourceforge.net/projects/rdkit/files/rdkit/Q3_2012/
The files can also be downloaded from the google project page:
http://code.google.com/p/rdkit/downloads/list
The binaries for Windows, Python 2.6 and Python 2.7, have also been uploaded.

The online version of the documentation at rdkit.org has been updated.

Thanks to the everyone who submitted bug reports and suggestions for
this release!

Please let me know if you find any problems with the release or have
suggestions for the next one.

-greg

******  Release_2012.09.1 *******
(Changes relative to Release_2012.06.1)

!!!!!! IMPORTANT !!!!!!
 - Some of the bug fixes affect the generation of SMILES. Canonical
   SMILES generated with this version of the RDKit will be different
   from previous versions.
 - The fix to Issue 252 (see below) will lead to changes in calculated
   logP and MR values for some compounds.
 - The fix to Issue 254 (see below) will lead to changes in some
   descriptors and geometries for sulfur-containing compounds.
 - The fix to Issue 256 (see below) has changed the name of the
   optional argument to mol.GetNumAtoms from onlyHeavy to
   onlyExplicit. For compatibility reasons, Python code that uses
   explicitly uses onlyHeavy will still work, but it will generate
   warnings. This compatibility will be removed in a future release.

Acknowledgements:
Gianpaolo Bravi, David Cosgrove, Andrew Dalke, Fabian Dey, James
Davidson, JP Ebejer, Gabriele Menna, Stephan Reiling, Roger Sayle,
James Swetnam

Bug Fixes:
- The molecules that come from mergeQueryHs() now reset the RingInfo
  structure. (issue 245)
- The output from MurckoScaffold.MakeScaffoldGeneric no longer
  includes stereochemistry or explicit Hs. (issue 246)
- D and T atoms in CTABs now have their isotope information
  set. (issue 247)
- Some problems with ring finding in large, complex molecules have
  been fixed. (issue 249)
- The "rootedAtAtom" argument for FindAllSubgraphsOfLengthN is now
  handled properly. (issue 250)
- Bonds now have a SetProp() method available in Python. (issue 251)
- A number of problems with the Crippen atom parameters have been
  fixed. (issue 252)
- Ring closure digits are no longer repeated on the same atom in
  SMILES generated by the RDKit. (issue 253)
- Non-ring sulfur atoms adjacent to aromatic atoms are no longer set
  to be SP2 hybridized. This allows them to be stereogenic. (issue
  254)
- The combineMols() function now clears computed properties on the
  result molecule.
- A couple of problems with the pickling functions on big endian
  hardware were fixed.
- The molecule drawing code now uses isotope information
- Superscript/Subscript handling in the agg canvas has been improved.
- SKP lines in CTABS are now propertly handled. (Issue 255)
- The name of the optional argument to mol.GetNumAtoms has been
  changed from onlyHeavy to onlyExplicit. The method counts the number
  of atoms in the molecular graph, not the number of heavy
  atoms. These numbers happen to usually be the same (which is why
  this has taken so long to show up), but there are exceptions if Hs
  or dummy atoms are in the graph. (Issue 256)
- Unknown bonds in SMILES are now output using '~' instead of '?'. The
  SMILES parser now recognizes '~' as an "any bond" query. (Issue 257)
- Lines containing only white space in SDF property blocks are no
  longer treated as field separators.
- Transition metals and lanthanides no longer have default valences
  assigned.

New Features:
- The RDKit now has a maximum common substructure (MCS) implementation
  contributed by Andrew Dalke. This is currently implemented in Python
  and is available as: from rdkit.Chem import MCS Documentation is
  available as a docstring for the function MCS.FindMCS and in the
  GettingStarted document.
- A few new functions have been added to rdkit.Chem.Draw:
  MolsToImage(), MolsToGridImage(), ReactionToImage()
- CalcMolFormula() now provides the option to include isotope
  information.
- The RDKit and Layered fingerprinters both now accept "fromAtoms"
  arguments that can be used to limit which atoms contribute to the
  fingerprint.
- Version information is now available in the Java wrapper.
- The descriptor NumRadicalElectrons is now available.
- The PyMol interface now supports a GetPNG() method which returns the
  current contents of the viewer window as an PIL Image object.
- Molecules (ROMol in C++, rdkit.Chem.Mol in Python) now have a
  getNumHeavyAtoms() method.
- Component-level grouping (parens) can be used in reaction SMARTS.


New Database Cartridge Features:
- support for molecule <-> pickle conversion via the functions
  mol_to_pkl, mol_from_pkl, and is_valid_mol_pkl.
- support for bit vector <-> binary text conversion via the functions
  bfp_to_binary_text, bfp_from_binary_text

New Java Wrapper Features:

Deprecated modules (to be removed in next release):

Removed modules:

Other:
- During this release cycle, the sourceforge project was updated to
  their new hosting system. This explains the change in bug/issue
  ids.
- the SMILES parser is now substantially faster.
- The molecular drawings generated by Code/Demo/RDKit/Draw/MolDrawing.h
  have been improved.
- There is now demo code availble for using the C++ drawing code
  within Qt applications. (contributed by David Cosgrove)
- The directory $RDBASE/Regress now contains sample data and
  scripts for benchmarking the database cartridge.
- Fused-ring aromaticity is now only considered in rings of up to size
  24.
- It is no longer necessary to have flex and bison installed in order
  to build the RDKit.

[Rdkit-devel] Cheminformatician / developer position at Vernalis

From: James D. <J.D...@ve...> - 2012-10-15 05:20:54

Dear All,

I wondered if I could take a moment to alert you to a new position here
at Vernalis. The job is advertised on our site
(http://www.vernalis.com/about-us/careers-at-vernalis) and on CCL
(http://www.ccl.net/cca/jobs/joblist/mess0025266.shtml).

Kind regards

James

______________________________________________________________________
PLEASE READ: This email is confidential and may be privileged. It is intended for the named addressee(s) only and access to it by anyone else is unauthorised. If you are not an addressee, any disclosure or copying of the contents of this email or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If you have received this email in error, please notify the sender or pos...@ve.... Email is not a secure method of communication and the Company cannot accept responsibility for the accuracy or completeness of this message or any attachment(s). Please check this email for virus infection for which the Company accepts no responsibility. If verification of this email is sought then please request a hard copy. Unless otherwise stated, any views or opinions presented are solely those of the author and do not represent those of the Company.

The Vernalis Group of Companies
100 Berkshire Place
Wharfedale Road
Winnersh, Berkshire
RG41 5RD, England
Tel: +44 (0)118 938 0000

To access trading company registration and address details, please go to the Vernalis website at www.vernalis.com and click on the "Company address and registration details" link at the bottom of the page..
______________________________________________________________________

[Rdkit-devel] Beta 2 of Q3 2012 release available

From: Greg L. <gre...@gm...> - 2012-10-09 05:36:41

Dear all,

Because of some last minute bug fixing, particularly issue 256
(http://sourceforge.net/p/rdkit/bugs/256/), I've decided to do a
second beta before the next RDKit release.

Today I tagged the beta for the 2012.09 release in svn:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/tags/Release_2012_09_1beta2/

and uploaded a source distribution to the google code site:
http://rdkit.googlecode.com/files/RDKit_2012_09_1beta2.tgz
The google code page also has binaries for python2.7 on win32
(http://rdkit.googlecode.com/files/RDKit_2012_09_1beta2.win32.py27.zip).

If no show-stopper bugs appear, I will do the release itself at the
end of next week.

Excerpts from the updated release notes are below.

Best Regards,
-greg

******  Release_2012.09.1 *******
(Changes relative to Release_2012.06.1)

!!!!!! IMPORTANT !!!!!!
 - Some of the bug fixes affect the generation of SMILES. Canonical
   SMILES generated with this version of the RDKit will be different
   from previous versions.
 - The fix to Issue 252 (see below) will lead to changes in calculated
   logP and MR values for some compounds.
 - The fix to Issue 254 (see below) will lead to changes in some
   descriptors and geometries for sulfur-containing compounds.
 - The fix to Issue 256 (see below) has changed the name of the
   optional argument to mol.GetNumAtoms from onlyHeavy to
   onlyExplicit. For compatibility reasons, Python code that uses
   explicitly uses onlyHeavy will still work, but it will generate
   warnings. This compatibility will be removed in a future release.

Acknowledgements:
Gianpaolo Bravi, David Cosgrove, Andrew Dalke, Fabian Dey, James
Davidson, JP Ebejer, Gabriele Menna, Stephan Reiling, James Swetnam

Bug Fixes:
- The molecules that come from mergeQueryHs() now reset the RingInfo
  structure. (issue 245)
- The output from MurckoScaffold.MakeScaffoldGeneric no longer
  includes stereochemistry or explicit Hs. (issue 246)
- D and T atoms in CTABs now have their isotope information
  set. (issue 247)
- Some problems with ring finding in large, complex molecules have
  been fixed. (issue 249)
- The "rootedAtAtom" argument for FindAllSubgraphsOfLengthN is now
  handled properly. (issue 250)
- Bonds now have a SetProp() method available in Python. (issue 251)
- A number of problems with the Crippen atom parameters have been
  fixed. (issue 252)
- Ring closure digits are no longer repeated on the same atom in
  SMILES generated by the RDKit. (issue 253)
- Non-ring sulfur atoms adjacent to aromatic atoms are no longer set
  to be SP2 hybridized. This allows them to be stereogenic. (issue
  254)
- The combineMols() function now clears computed properties on the
  result molecule.
- A couple of problems with the pickling functions on big endian
  hardware were fixed.
- The molecule drawing code now uses isotope information
- Superscript/Subscript handling in the agg canvas has been improved.
- The name of the optional argument to mol.GetNumAtoms has been
  changed from onlyHeavy to onlyExplicit. The method counts the number
  of atoms in the molecular graph, not the number of heavy
  atoms. These numbers happen to usually be the same (which is why
  this has taken so long to show up), but there are exceptions if Hs
  or dummy atoms are in the graph. (Issue 256)

New Features:
- The RDKit now has a maximum common substructure (MCS) implementation
  contributed by Andrew Dalke. This is currently implemented in Python
  and is available as: from rdkit.Chem import MCS Documentation is
  available as a docstring for the function MCS.FindMCS and in the
  GettingStarted document.
- A few new functions have been added to rdkit.Chem.Draw:
  MolsToImage(), MolsToGridImage(), ReactionToImage()
- CalcMolFormula() now provides the option to include isotope
  information.
- The RDKit and Layered fingerprinters both now accept "fromAtoms"
  arguments that can be used to limit which atoms contribute to the
  fingerprint.
- Version information is now available in the Java wrapper.
- The descriptor NumRadicalElectrons is now available.
- The PyMol interface now supports a GetPNG() method which returns the
  current contents of the viewer window as an PIL Image object.
- Molecules (ROMol in C++, rdkit.Chem.Mol in Python) now have a
  getNumHeavyAtoms() method.

New Database Cartridge Features:
- support for molecule <-> pickle conversion via the functions
  mol_to_pkl, mol_from_pkl, and is_valid_mol_pkl.
- support for bit vector <-> binary text conversion via the functions
  bfp_to_binary_text, bfp_from_binary_text

New Java Wrapper Features:

Deprecated modules (to be removed in next release):

Removed modules:

Other:
- During this release cycle, the sourceforge project was updated to
  their new hosting system. This explains the change in bug/issue
  ids.
- the SMILES parser is now substantially faster.
- The molecular drawings generated by Code/Demo/RDKit/Draw/MolDrawing.h
  have been improved.
- There is now demo code availble for using the C++ drawing code
  within Qt applications. (contributed by David Cosgrove)
- The directory $RDBASE/Regress now contains sample data and
  scripts for benchmarking the database cartridge.
- Fused-ring aromaticity is now only considered in rings of up to size
  24.
- It is no longer necessary to have flex and bison installed in order
  to build the RDKit.

Re: [Rdkit-devel] RDKit on Github

From: Gianluca S. <gi...@gm...> - 2012-10-05 19:15:13

On Fri, Oct 5, 2012 at 4:46 PM, Greg Landrum <gre...@gm...> wrote:
> On Fri, Oct 5, 2012 at 9:27 AM, Gianluca Sforna <gi...@gm...> wrote:
> What is involved in doing the mirroring? We could explore having it
> run from the VM that hosts rdkit.org if it would help.

I run this script hourly:
cd /home/giallu/sources/rdkit-code
git checkout -q master
git svn -qq rebase > /dev/null
git push -q rdkit@github master

Running in the VM is doable, I just need to dig the commands I used to
clone the repo in the first place

-- 
Gianluca Sforna

http://morefedora.blogspot.com
http://identi.ca/giallu - http://twitter.com/giallu

Re: [Rdkit-devel] RDKit on Github

From: Greg L. <gre...@gm...> - 2012-10-05 14:47:03

On Fri, Oct 5, 2012 at 9:27 AM, Gianluca Sforna <gi...@gm...> wrote:
>
> Just one note about the repo. I'm mirroring it hourly from SVN, but
> this is done from my own PC so, if it is offline, there could be
> delays on updates. Otherwise, it should be pretty much in sync with
> the master SVN repo.

What is involved in doing the mirroring? We could explore having it
run from the VM that hosts rdkit.org if it would help.

-greg

Re: [Rdkit-devel] RDKit on Github

From: Gianluca S. <gi...@gm...> - 2012-10-05 07:27:56

On Fri, Oct 5, 2012 at 7:36 AM, Greg Landrum <gre...@gm...> wrote:
> I mentioned at the UGM that Gianluca Sforna has set up a mirror of the
> RDKit subversion repository.
> The repository is here: https://github.com/rdkit/rdkit
>

Thanks Greg, I really looked forward to join you for the UGM, but
unfortunately could not. Thanks also to Nathan Brown for covering the
event on Twitter.

Just one note about the repo. I'm mirroring it hourly from SVN, but
this is done from my own PC so, if it is offline, there could be
delays on updates. Otherwise, it should be pretty much in sync with
the master SVN repo.

Cheers

Gianluca

-- 
Gianluca Sforna

http://morefedora.blogspot.com
http://identi.ca/giallu - http://twitter.com/giallu

[Rdkit-devel] RDKit on Github

From: Greg L. <gre...@gm...> - 2012-10-05 05:36:39

Dear all,

I mentioned at the UGM that Gianluca Sforna has set up a mirror of the
RDKit subversion repository.
The repository is here: https://github.com/rdkit/rdkit

I plan to continue to use svn as the primary repository of the code
for the foreseeable future, but the git repository provides a nice
option for people who want to use a distributed version control
system. Feel free to submit suggested changes/patches via git pull
requests. I will figure out how to re-integrated accepted patches into
the svn trunk. In the near term, it is probably a good idea to assume
that I won't see pull requests (since I'm not usually using git) and
to either send me email or post to rdkit-devel.

Many thanks for Gianluca for setting the git mirror up!

-greg

Re: [Rdkit-devel] fmcs in RDKit

From: Andrew D. <da...@da...> - 2012-09-25 20:42:12

I made the last of the code changes. The public API now uses RDKit-style
mixedCase parameters instead of PEP 8-style underscore_names.

On Sep 25, 2012, at 4:44 AM, Greg Landrum wrote:
> Is there a python documentation generator that can process that and do
> something sensible with it? If so, I'm fine with it. If not, then
> maybe the @type bits can be combined into the @param somehow (yes, it
> would just be text then).

That was the theory. I've looked it over and greatly improved the
documentation by removing that autodoc section and using prose, like:

    """Find the maximum common substructure of a set of molecules

    In the simplest case, pass in a list of molecules and get back
    an MCSResult object which describes the MCS:

    >>> from rdkit import Chem
    >>> mols = [Chem.MolFromSmiles("C#CCP"), Chem.MolFromSmiles("C=CCO")]
    >>> from rdkit.Chem import MCS
    >>> MCS.FindMCS(mols)
    MCSResult(numAtoms=2, numBonds=1, smarts='[#6]-[#6]', completed=1)

    The SMARTS '[#6]-[#6]' matches the largest common substructure of
    the input structures. It has 2 atoms and 1 bond. If there is no
    MCS which is at least `minNumAtoms` in size then the result will set
    numAtoms and numBonds to -1 and set smarts to None.
    ...

> The API docs are automatically generated. It would be nice to have a
> brief section for the GettingStarted document
> ($RDBASE/Docs/Book/GettingStartedInPython.rst), but I am willing to
> write that if you don't have time.

I've copied the doctoring, with slight modifications, into the Getting
Started document. One of the modifications was to show an example of
using three molecules, since the docstring only shows pairwise examples.

I don't like the document duplication. I'm not going to worry much
about it though.

Feel free to change as you wish.

				Andrew
				da...@da...

Re: [Rdkit-devel] fmcs in RDKit

From: Stiefl, N. <nik...@no...> - 2012-09-25 08:02:07

Hi Andrew - that's great - thanks!
Maybe SubsetMCS would be good since it will be for a subset of the entire
system?
Ciao
Nik



On 9/25/12 4:44 AM, "Greg Landrum" <gre...@gm...> wrote:

>On Tue, Sep 25, 2012 at 4:30 AM, Andrew Dalke <da...@da...>
>wrote:
>> The next quarter release is coming soon, which was good incentive for
>>me to
>> work on the new MCS code for RDKit.
>
>This is great news. Andrew's MCS code fills an important gap in the
>current RDKit functionality.
>
>
>> Here's the API and documentation.
>>
>>
>> def FindMCS(mols, min_num_atoms=2,
>>             maximize = Default.maximize,
>>             atom_compare = Default.atom_compare,
>>             bond_compare = Default.bond_compare,
>>             match_valences = Default.match_valences,
>>             ring_matches_ring_only = False,
>>             complete_rings_only = False,
>>             timeout=Default.timeout,
>>             ):
>>     """Find the maximum common substructure of a set of molecules
>>
>>     @type  mols: molecule iterator
>>     @param mols: find the MCS of these molecules
>>     @type min_num_atoms: integer
>>     @param min_num_atoms: The minimum number of atoms which must be in
>>the MCS.
>>               The minimim value is 2.
>>     @type maximize: "atoms" or "bonds"
>>     @param maximize: The default "atoms" maximizes the number of atoms
>>in
>>               the MCS. Use "bonds" to maximize the number of bonds
>>instead.
>>     @type atom_compare: "any", "elements", or "isotopes"
>>     @param atom_compare: Specify the atom comparison function. The
>>default "elements"
>>               says that two atoms are the same if and only if they have
>>the same
>>               element number. Use "isotopes" if you are using isotope
>>labels to
>>               define your own atom classs. With "any", all atoms match
>>each other.
>>     @type bond_compare: "any" or "bondtypes"
>>     @param bond_compare: Specify the bond comparison function. The
>>default
>>               "bondtypes" says that two bonds are the same if and only
>>if they
>>               have the same bond type. With "any", all bonds match each
>>other.
>>     @type match_valences: boolean
>>     @param match_valences: If True, atoms must also have matching
>>valences
>>               to match. By default this is False.
>>     @type ring_matches_ring_only: boolean
>>     @param ring_matches_ring_only: If True, then both bonds must either
>>be in
>>               a ring or not in a ring in order to match. By default
>>this is False.
>>     @type ring_matches_ring_only: boolean
>>     @param ring_matches_ring_only: If True, then both bonds must either
>>be in
>>               a ring or not in a ring in order to match. By default
>>this is False.
>>     @type complete_rings_only: boolean
>>     @param complete_rings_only: If True, then if a ring bond of a
>>molecule
>>               is in the MCS then the corresponding MCS bond is also in
>>a ring.
>>     @type timeout: float
>>     @param timeout: stop search after 'timeout' seconds and report the
>>current
>>               best MCS.
>>
>>     @rtype: MCSResult
>>     @return: Information about the MCS search results. Attributes are
>>'completed'
>>         (0 if timeout reached, otherwise 1), 'num_atoms', 'num_bonds',
>>and 'smarts'.
>>
>>
>> Is that too unreadable? I can rewrite that to be more prose than
>> this sort of auto-documentation format.
>
>Is there a python documentation generator that can process that and do
>something sensible with it? If so, I'm fine with it. If not, then
>maybe the @type bits can be combined into the @param somehow (yes, it
>would just be text then).
>
>
>> I wasn't sure of how/where to update the documentation. Greg? Do you
>>have
>> an idea of where it might go? Or perhaps want to do it yourself?
>
>The API docs are automatically generated. It would be nice to have a
>brief section for the GettingStarted document
>($RDBASE/Docs/Book/GettingStartedInPython.rst), but I am willing to
>write that if you don't have time.
>
>> In other news, I'll be presenting the MCS work at the Goslar conference
>> this fall, and I've got some funding to work on finding a threshold MCS,
>> that is, the maximum common substructure which is in at least some
>>percentage
>> threshold of the entire system.
>
>Congrats! That would definitely be useful.
>
>> Can anyone suggest a good name for that? Threshold MCS? Something else?
>
>"Threshold MCS" sounds ok to me, but I'm not great at naming things.
>
>-greg
>
>--------------------------------------------------------------------------
>----
>Live Security Virtual Conference
>Exclusive live event will cover all the ways today's security and
>threat landscape has changed and how IT managers can respond. Discussions
>will include endpoint security, mobile security and the latest in malware
>threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>_______________________________________________
>Rdkit-devel mailing list
>Rdk...@li...
>https://lists.sourceforge.net/lists/listinfo/rdkit-devel

Re: [Rdkit-devel] fmcs in RDKit

From: Greg L. <gre...@gm...> - 2012-09-25 02:45:18

On Tue, Sep 25, 2012 at 4:30 AM, Andrew Dalke <da...@da...> wrote:
> The next quarter release is coming soon, which was good incentive for me to
> work on the new MCS code for RDKit.

This is great news. Andrew's MCS code fills an important gap in the
current RDKit functionality.


> Here's the API and documentation.
>
>
> def FindMCS(mols, min_num_atoms=2,
>             maximize = Default.maximize,
>             atom_compare = Default.atom_compare,
>             bond_compare = Default.bond_compare,
>             match_valences = Default.match_valences,
>             ring_matches_ring_only = False,
>             complete_rings_only = False,
>             timeout=Default.timeout,
>             ):
>     """Find the maximum common substructure of a set of molecules
>
>     @type  mols: molecule iterator
>     @param mols: find the MCS of these molecules
>     @type min_num_atoms: integer
>     @param min_num_atoms: The minimum number of atoms which must be in the MCS.
>               The minimim value is 2.
>     @type maximize: "atoms" or "bonds"
>     @param maximize: The default "atoms" maximizes the number of atoms in
>               the MCS. Use "bonds" to maximize the number of bonds instead.
>     @type atom_compare: "any", "elements", or "isotopes"
>     @param atom_compare: Specify the atom comparison function. The default "elements"
>               says that two atoms are the same if and only if they have the same
>               element number. Use "isotopes" if you are using isotope labels to
>               define your own atom classs. With "any", all atoms match each other.
>     @type bond_compare: "any" or "bondtypes"
>     @param bond_compare: Specify the bond comparison function. The default
>               "bondtypes" says that two bonds are the same if and only if they
>               have the same bond type. With "any", all bonds match each other.
>     @type match_valences: boolean
>     @param match_valences: If True, atoms must also have matching valences
>               to match. By default this is False.
>     @type ring_matches_ring_only: boolean
>     @param ring_matches_ring_only: If True, then both bonds must either be in
>               a ring or not in a ring in order to match. By default this is False.
>     @type ring_matches_ring_only: boolean
>     @param ring_matches_ring_only: If True, then both bonds must either be in
>               a ring or not in a ring in order to match. By default this is False.
>     @type complete_rings_only: boolean
>     @param complete_rings_only: If True, then if a ring bond of a molecule
>               is in the MCS then the corresponding MCS bond is also in a ring.
>     @type timeout: float
>     @param timeout: stop search after 'timeout' seconds and report the current
>               best MCS.
>
>     @rtype: MCSResult
>     @return: Information about the MCS search results. Attributes are 'completed'
>         (0 if timeout reached, otherwise 1), 'num_atoms', 'num_bonds', and 'smarts'.
>
>
> Is that too unreadable? I can rewrite that to be more prose than
> this sort of auto-documentation format.

Is there a python documentation generator that can process that and do
something sensible with it? If so, I'm fine with it. If not, then
maybe the @type bits can be combined into the @param somehow (yes, it
would just be text then).


> I wasn't sure of how/where to update the documentation. Greg? Do you have
> an idea of where it might go? Or perhaps want to do it yourself?

The API docs are automatically generated. It would be nice to have a
brief section for the GettingStarted document
($RDBASE/Docs/Book/GettingStartedInPython.rst), but I am willing to
write that if you don't have time.

> In other news, I'll be presenting the MCS work at the Goslar conference
> this fall, and I've got some funding to work on finding a threshold MCS,
> that is, the maximum common substructure which is in at least some percentage
> threshold of the entire system.

Congrats! That would definitely be useful.

> Can anyone suggest a good name for that? Threshold MCS? Something else?

"Threshold MCS" sounds ok to me, but I'm not great at naming things.

-greg

Re: [Rdkit-devel] fmcs in RDKit

From: Andrew D. <da...@da...> - 2012-09-25 02:30:27

The next quarter release is coming soon, which was good incentive for me to
work on the new MCS code for RDKit.

I checked in the actual MCS code a few weeks ago, and Greg's reviewed it at
least lightly. This evening I checked in the test cases. They are at:

http://sourceforge.net/p/rdkit/code/2197/tree/trunk/rdkit/Chem/MCS.py
http://sourceforge.net/p/rdkit/code/2197/tree/trunk/rdkit/Chem/UnitTestMCS.py

Here's the API and documentation.


def FindMCS(mols, min_num_atoms=2,
            maximize = Default.maximize,
            atom_compare = Default.atom_compare,
            bond_compare = Default.bond_compare,
            match_valences = Default.match_valences,
            ring_matches_ring_only = False,
            complete_rings_only = False,
            timeout=Default.timeout,
            ):
    """Find the maximum common substructure of a set of molecules

    @type  mols: molecule iterator
    @param mols: find the MCS of these molecules
    @type min_num_atoms: integer
    @param min_num_atoms: The minimum number of atoms which must be in the MCS.
              The minimim value is 2.
    @type maximize: "atoms" or "bonds"
    @param maximize: The default "atoms" maximizes the number of atoms in
              the MCS. Use "bonds" to maximize the number of bonds instead.
    @type atom_compare: "any", "elements", or "isotopes"
    @param atom_compare: Specify the atom comparison function. The default "elements"
              says that two atoms are the same if and only if they have the same
              element number. Use "isotopes" if you are using isotope labels to
              define your own atom classs. With "any", all atoms match each other.
    @type bond_compare: "any" or "bondtypes"
    @param bond_compare: Specify the bond comparison function. The default
              "bondtypes" says that two bonds are the same if and only if they
              have the same bond type. With "any", all bonds match each other.
    @type match_valences: boolean
    @param match_valences: If True, atoms must also have matching valences
              to match. By default this is False.
    @type ring_matches_ring_only: boolean
    @param ring_matches_ring_only: If True, then both bonds must either be in
              a ring or not in a ring in order to match. By default this is False.
    @type ring_matches_ring_only: boolean
    @param ring_matches_ring_only: If True, then both bonds must either be in
              a ring or not in a ring in order to match. By default this is False.
    @type complete_rings_only: boolean
    @param complete_rings_only: If True, then if a ring bond of a molecule
              is in the MCS then the corresponding MCS bond is also in a ring.
    @type timeout: float
    @param timeout: stop search after 'timeout' seconds and report the current
              best MCS.

    @rtype: MCSResult
    @return: Information about the MCS search results. Attributes are 'completed'
        (0 if timeout reached, otherwise 1), 'num_atoms', 'num_bonds', and 'smarts'.


Is that too unreadable? I can rewrite that to be more prose than
this sort of auto-documentation format.



I wasn't sure of how/where to update the documentation. Greg? Do you have
an idea of where it might go? Or perhaps want to do it yourself?



In other news, I'll be presenting the MCS work at the Goslar conference
this fall, and I've got some funding to work on finding a threshold MCS,
that is, the maximum common substructure which is in at least some percentage
threshold of the entire system.

Can anyone suggest a good name for that? Threshold MCS? Something else?

Cheers,


				Andrew
				da...@da...

Re: [Rdkit-devel] fmcs in RDKit

From: Andrew D. <da...@da...> - 2012-09-01 18:43:31

On Sep 1, 2012, at 6:00 PM, James Swetnam wrote:
> Sorry, hit send prematurely.
>
> I'd define a separate function FindMCSWithTimeout for the timeout parameter unless a majority of users will be concerned with this method's running time, but I'm not familiar with the algorithm's details.

The MCS algorithm is one of those normally fast but
not infrequently exponential slow algorithms. In my various
tests, including simple pairwise tests of 10,000 randomly
chosen pairs of structures from ChEMBL 13, I found that:

7 pairs took longer than 30 seconds,
9977 took less than 2 seconds,
9958 took less than 1 second,
9420 took less than 0.1 seconds,
4820 took less than 0.01 seconds.
1513 took less than 0.001 seconds

As I recall, sometimes the search can take many hours.

My belief is that the slow runtimes occur often enough
that everyone will use a timeout. I also believe that early
on, people who use the API will test with structures where
the MCS is easy to find. Only later on will they get stuck
in some slow-downs (which is especially a problem on a
web server), and then have to diagnose it.

BTW, unlike, say, the CDK code, timeout failures do NOT
raise an exception. One of the advantages of my algorithm
over some of the other ones is that I can report the best
size to date. The returned MCSResult contains:
- number of atoms in the MCS
- number of bonds in the MCS
- SMARTS pattern
- a 'completed' flag, to know if MCS search completed
(True) or stopped due to some other reason,
like a timeout (False).

Indigo's scaffold finder uses a per-indigo-instance variable
to set the timeout. My testing showed that Indigo's code,
while faster than fmcs, still has timeout issues.

CDK's MCS code implements a timeout.

Small Molecule Subgraph Detector (SMSD) does not have a timeout.
I had to write my own wrapper around it to kill it when it took
too long.

OEChem's MCS code does not seem to have a timeout. They have
two different methods, one approximate and the other exact.
I don't have the experience to judge if there are problems
when using the exact method in practice.

The existence of timeouts in other codes suggests that
it's important, and it's hard to remove the need.

// This is OpenBabel like. It's orthogonal to the method itself.

I don't know what you mean by orthogonal. That is,
why does the run-time allocation any more orthogonal
than the binary options

ring_matches_ring_only = False,
complete_rings_only = False,

There are several ways to handle all of the parameters:

- have a function which takes all of the parameters

- create a search class, where some parameters can be set in
the constructor.

- and/or, which has methods methods like "SetTimeout(double seconds)"

- create a "search parameters" class, which holds all of the
values and is passed to the search function

- create different classes, one for each parameterization

- like Indigo, support multiple toolkit instance, where
each instance can have its own settings

- introduce global (or per-thread) variables

Of these, I thought that a single function call interface
was the easiest to understand.

Also, I lean towards the functional programming ideal of
not changing state, so don't like "SetTimeout" or the
approaches which set some sort of (semi-) global parameter.

> Agree with Greg's assertion to rename to FindMCS.

Yes, I'll be changing the name.

> // Rename Default to something like MCSParams, and have a factory function returning a DefaultMCSParams I assume this is the command pattern?

It's not a command pattern. It's minimal data only.

=================
### A place to set global options
# (Is this really useful?)

class Default(object):
timeout = None
timeout_string = "none"
maximize = "bonds"
atom_compare = "elements"
bond_compare = "bondtypes"
match_valences = False
ring_matches_ring_only = False
complete_rings_only = False
==================

> // Logging should really be separated from API methods. You should be
> using the python logging framework such that users
> // can modify these globally and not through the API

Then perhaps "logging" is the wrong name for it. "Progress" is
much better.

It's used in the command-line tool to show progress information.
When something takes 30 seconds to run, it's comforting to know
that the underlying code is doing something. This would also be
true for a GUI display. Ideally the API would support a progress
handler, along with an option to abort.

I think it's best to strip it out of the RDKit version of the code.
What's there is only for the command-line tool, and I don't have
the time/funding to develop a proper API.

BTW, I don't think Python's logging system is appropriate for this.
Actually, I don't like using logs for most things, but that's
a different topic.

> verbose_delay=1.0,
> // Also remove from this function

Yes. See above.

This is an example of something which is hard to integrate with
the logging API. I could log every event, but that's useless.
Instead, I need to provide some sample. I could do it every
N events, but that's CPU- and load-specific. So I chose to use
a time delay.

I don't know how to configure logging to handle that case.

On Fri, Aug 31, 2012 at 8:37 PM, Greg Landrum <gre...@gm...> wrote:
> I would have no objection to the command-line driver being there in a
> separate file in the package; it could well be useful to people.

It could well be. However:

1) this for me is more of a contractual obligation, in that
my client asked me nicely (it's not actually in the contract)
to put it into RDKit. I don't have much free time/funding
to do this.

2) The command-line tool was designed for my client, and while
I think it's general enough, I haven't gotten any feedback
about it. The isotope-label-as-atom-classes hack is my biggest
concern here.

3) As you all can see, the API is designed specifically for
the command-line tool. Stripping out the currently named
'logging' code would mean also modifying the command-line
driver. That's more work. Especially since the work I would
like to do is to improve the algorithm.

4) I would also have to get a sense of what RDKit's command-line
parameter style looks like, so that the code fits in.

Hence, rather more work than I would like to take on
when so far neither Greg nor my clients have asked for it.

Andrew
da...@da...

Re: [Rdkit-devel] fmcs in RDKit

From: James S. <jsw...@gm...> - 2012-09-01 16:00:47

Sorry, hit send prematurely.

I'd define a separate function FindMCSWithTimeout for the timeout parameter
unless a majority of users will be concerned with this method's running
time, but I'm not familiar with the algorithm's details.

Best,
James
On Sat, Sep 1, 2012 at 8:57 AM, James Swetnam <jsw...@gm...> wrote:

> Some pendantic and unsolicited API critiques, but as a user I figured I
> should chime in.
>
> Agree with Greg's assertion to rename to FindMCS.
>
>  def MCS(mols, min_num_atoms=2,
>          maximize = Default.maximize,
>          // Rename Default to something like MCSParams, and have a factory
> function returning a DefaultMCSParams  I assume this is the command pattern?
>
>          atom_compare = Default.atom_compare,
>          bond_compare = Default.bond_compare,
>          match_valences = Default.match_valences,
>          ring_matches_ring_only = False,
>          complete_rings_only = False,
>          timeout=Default.timeout,
>  // Could you remove timeout?  This is OpenBabel likeIt's orthogonal to
> the method itself.  You could define an additional method with
>          times=None,
>
>          verbose=False,
> // Logging should really be separated from API methods.  You should be
> using the python logging framework
> <http://docs.python.org/library/logging.html>such that users
> // can modify these globally and not through the API
>          verbose_delay=1.0,
> // Also remove from this function
>          ):
>
>
> On Fri, Aug 31, 2012 at 8:37 PM, Greg Landrum <gre...@gm...>wrote:
>
>> On Sat, Sep 1, 2012 at 1:28 AM, Andrew Dalke <da...@da...>
>> wrote:
>> > I've started to add the fmcs MCS search code to RDKit.
>> > By that I mean it does not include the command-line driver
>> > which is part of the fmcs distribution; just the MCS search code.
>>
>> I would have no objection to the command-line driver being there in a
>> separate file in the package; it could well be useful to people.
>>
>> > The interface to it is a single function which takes these
>> > parameters.
>> >
>> > def MCS(mols, min_num_atoms=2,
>> >         maximize = Default.maximize,
>> >         atom_compare = Default.atom_compare,
>> >         bond_compare = Default.bond_compare,
>> >         match_valences = Default.match_valences,
>> >         ring_matches_ring_only = False,
>> >         complete_rings_only = False,
>> >         timeout=Default.timeout,
>> >         times=None,
>> >         verbose=False,
>> >         verbose_delay=1.0,
>> >         ):
>>
>> Any objection to calling it FindMCS?
>>
>> >
>> > The steps I've done so far are:
>> >   - rename all other functions and classes to use the
>> >       "_" prefix to indicate that they are internal
>> >   - remove the command-line driver code
>> >   - removed all code uses of the word 'fmcs'
>> >
>> > The steps still to do are:
>> >   - use the right module name
>> >   - reformat the doc strings
>> >   - make it build properly
>> >   - add basic tests
>> >   - add the above to SVN and commit to SF
>> >
>> > I have some questions about those steps.
>> >
>> > 1) What is the right module name? Greg and I talked
>> > about it a couple of months ago. I can't find the
>> > paper where I wrote it down.
>> >
>> > Right now it is "rdkit.Chem.MCS", so that the actual
>> > function to call is "rdkit.Chem.MCS.MCS".
>>
>> MCS is fine with me, the function would then be rdkit.Chem.MCS.FindMCS
>> assuming that you agree with my function renaming suggestion.
>>
>> > 2) Is there any other documentation I should update
>> > other than the docstrings?
>>
>> It would be nice to have something about it in the Getting Started
>> guide, but I can do that.
>>
>> >
>> > I'll have more questions which I get to the build system.
>> >
>> >
>> > I'm not sure of the long-term coordination between
>> > the RDKit and fmcs code bases. Greg would like to
>> > push more of the RDKit code into C++ while I would
>> > like fmcs to be portable across other toolkits.
>>
>> Since the rest of the list haven't been involved in these discussions,
>> here's my motivation for that:
>> having the MCS code in C++ increases the impact of the code: it allows
>> it to be used from the cartridge, in the knime nodes, from other Java
>> programs, and from C++ itself. There may also be performance benefits
>> (I say "may" because I know that Andrew writes really good Python
>> code), but those are at most a secondary motivation
>>
>> -greg
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Rdkit-devel mailing list
>> Rdk...@li...
>> https://lists.sourceforge.net/lists/listinfo/rdkit-devel
>>
>
>

Re: [Rdkit-devel] fmcs in RDKit

From: James S. <jsw...@gm...> - 2012-09-01 15:58:11

Some pendantic and unsolicited API critiques, but as a user I figured I
should chime in.

Agree with Greg's assertion to rename to FindMCS.

 def MCS(mols, min_num_atoms=2,
         maximize = Default.maximize,
         // Rename Default to something like MCSParams, and have a factory
function returning a DefaultMCSParams  I assume this is the command pattern?
         atom_compare = Default.atom_compare,
         bond_compare = Default.bond_compare,
         match_valences = Default.match_valences,
         ring_matches_ring_only = False,
         complete_rings_only = False,
         timeout=Default.timeout,
// Could you remove timeout?  This is OpenBabel likeIt's orthogonal to the
method itself.  You could define an additional method with
         times=None,

         verbose=False,
// Logging should really be separated from API methods.  You should be
using the python logging framework
<http://docs.python.org/library/logging.html>such that users
// can modify these globally and not through the API
         verbose_delay=1.0,
// Also remove from this function
         ):

On Fri, Aug 31, 2012 at 8:37 PM, Greg Landrum <gre...@gm...>wrote:

> On Sat, Sep 1, 2012 at 1:28 AM, Andrew Dalke <da...@da...>
> wrote:
> > I've started to add the fmcs MCS search code to RDKit.
> > By that I mean it does not include the command-line driver
> > which is part of the fmcs distribution; just the MCS search code.
>
> I would have no objection to the command-line driver being there in a
> separate file in the package; it could well be useful to people.
>
> > The interface to it is a single function which takes these
> > parameters.
> >
> > def MCS(mols, min_num_atoms=2,
> >         maximize = Default.maximize,
> >         atom_compare = Default.atom_compare,
> >         bond_compare = Default.bond_compare,
> >         match_valences = Default.match_valences,
> >         ring_matches_ring_only = False,
> >         complete_rings_only = False,
> >         timeout=Default.timeout,
> >         times=None,
> >         verbose=False,
> >         verbose_delay=1.0,
> >         ):
>
> Any objection to calling it FindMCS?
>
> >
> > The steps I've done so far are:
> >   - rename all other functions and classes to use the
> >       "_" prefix to indicate that they are internal
> >   - remove the command-line driver code
> >   - removed all code uses of the word 'fmcs'
> >
> > The steps still to do are:
> >   - use the right module name
> >   - reformat the doc strings
> >   - make it build properly
> >   - add basic tests
> >   - add the above to SVN and commit to SF
> >
> > I have some questions about those steps.
> >
> > 1) What is the right module name? Greg and I talked
> > about it a couple of months ago. I can't find the
> > paper where I wrote it down.
> >
> > Right now it is "rdkit.Chem.MCS", so that the actual
> > function to call is "rdkit.Chem.MCS.MCS".
>
> MCS is fine with me, the function would then be rdkit.Chem.MCS.FindMCS
> assuming that you agree with my function renaming suggestion.
>
> > 2) Is there any other documentation I should update
> > other than the docstrings?
>
> It would be nice to have something about it in the Getting Started
> guide, but I can do that.
>
> >
> > I'll have more questions which I get to the build system.
> >
> >
> > I'm not sure of the long-term coordination between
> > the RDKit and fmcs code bases. Greg would like to
> > push more of the RDKit code into C++ while I would
> > like fmcs to be portable across other toolkits.
>
> Since the rest of the list haven't been involved in these discussions,
> here's my motivation for that:
> having the MCS code in C++ increases the impact of the code: it allows
> it to be used from the cartridge, in the knime nodes, from other Java
> programs, and from C++ itself. There may also be performance benefits
> (I say "may" because I know that Andrew writes really good Python
> code), but those are at most a secondary motivation
>
> -greg
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Rdkit-devel mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-devel
>

4 messages has been excluded from this view by a project administrator.

Flat | Threaded

<< < 1 .. 11 12 13 14 15 .. 35 > >> (Page 13 of 35)