rdkit-devel Mailing List for RDKit (Page 25)

Open-Source Cheminformatics and Machine Learning

Brought to you by: glandrum

rdkit-devel — Developers information and discussion.

[Rdkit-devel] RDKit now buildable on Mac OS X

From: Greg L. <gre...@gm...> - 2008-09-03 20:24:19

This evening I checked in a bunch of changes (mostly to Jamfiles) that
allow the RDKit to be built under Mac OS X (at least under version
10.5.4). All the unit tests pass except for Numeric/EigenSolvers; I'm
looking into that one.

I am a very, very long way from being a Mac OS X expert, but things
look pretty good and I didn't have to hack around in order to make
things compile, so I think this is probably ok.

Some work is going to have to be done on installation instructions
because I didn't really install much of anything on this Mac other
than boost 1.36.0. The rest came pre-configured from IT. The one thing
that seems a bit dodgy is the location of the numpy include files
(needed for Jamroot):
/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/core/include
but maybe that's standard?

I would be very, very happy to have input from people more familiar
with the platform, particularly if anyone could even sketch out some
install instructions that would make sense to a Mac user.

Best Regards,
-greg

[Rdkit-devel] "Iterative" stereochemistry

From: Greg L. <gre...@gm...> - 2008-08-25 13:21:05

Dear all,

One of the long-time gaps/bugs in the RDKit handling of
stereochemistry has been what I call "dependent stereochemistry":
atoms or bonds that are stereogenic because some of their neighbors
are stereogenic.

A very simple, and well known, example is the molecule defined by the SMILES:
C[C@H]1[C@@H](F)CCC[C@H]1F
Carbon 1 (numbering from zero) here is a chiral center (absolute
stereochemistry S, or s, depending on which notation you use) because
its two neighbors are chiral centers with different chirality (one is
R, the other S).

Another example, this time with double bonds:
Cl\C=C(/C=C/F)/C=C\F
The second and third double bonds are E and Z, respectively. The first
bond is Z, but only because of the stereochemistry of the other two
bonds.

You can further elaborate this to double bonds that are stereogenic
because of the chirality of attached atoms:
C\C=C([C@@](C)(F)Br)/[C@@](Br)(F)C
or atoms that are chiral because of the stereochemistry of attached bonds:
C[C@](/C=C/C)(F)/C=C\C

I'm pretty sure this can be elaborated more or less arbitrarily. It's
enough to make a cheminformatician's heart go pitter-pat.

Handling these cases in the RDKit required a restructuring of the
stereochemistry perception code (which made that "pitter-pat" feel
more like palpitations) and some changes to the client-visible
interface. Specifically, the former division between perceiving atom
chirality and double bond stereochemistry no longer makes sense.

Since this is a pretty deep and complex change, I created a separate
branch for the work:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/branches/IterativeChirality_20Aug2008/
I believe the implementation that's currently checked in there is
correct. I added test cases for each of the scenarios I could think of
and those all pass. I still need to do a bit of optimization work, but
that should not affect the results.

Before merging this into the core, I'd like to ask anyone who has time
and interest to try it out and let me know if you find problems. When
testing, please keep in mind that not all cheminformatics systems
handle these cases correctly. I have checked Marvin, openbabel
(v2.2.0), and ChemDraw, and only ChemDraw gets all of the test cases
right.

Best regards,
-greg

[Rdkit-devel] switching from numeric to numpy

From: Greg L. <gre...@gm...> - 2008-07-05 14:45:46

Dear all,

This morning I checked in a bunch of changes that switch the rdkit
over from using the old Numeric python library to the newer numpy
library (http://numpy.scipy.org/). The changes were merged into the
trunk in revision 742:
http://rdkit.svn.sourceforge.net/viewvc/rdkit?view=rev&revision=742

This was an important step because Numeric python is no longer being
supported and it's becoming increasingly difficult to find binary
distributions. Numpy has an active user/developer community and is
likely the future of good numeric support in Python.

The details of what was changed are captured in the svn changelog
above and in some notes I took as I made the change:
http://code.google.com/p/rdkit/wiki/NumPyPort

If anyone has any questions or problems, please let me know.

-greg

[Rdkit-devel] Removing the dependency on Numeric Python

From: Greg L. <gre...@gm...> - 2008-06-28 15:30:06

Dear all,

I have finally managed to get a version of the RDKit working that does
not require an installation of the obsolete and no-longer supported
Numeric Python library. This new version uses instead numpy
(http://www.scipy.org/SciPy).

The new version is on the branch:
http://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/NumPyPort_27June2008
and the changes I made are described here:
http://code.google.com/p/rdkit/wiki/NumPyPort

I've tested this on 32bit linux and windows. I'll run the tests on
64bit linux sometime next week.

If anyone else wants to give this a try, I'd be grateful for any feedback.

Note that to get things to build you need to be sure that the numpy
includes are in the python include directory. For reasons I'm not
clever enough to understand, the numpy installation process puts its
headers into the python site-packages. Obviously this will differ
based on the details of your python installation, but on my linux box
I needed to:
  cp -r /usr/lib/python2.5/site-packages/numpy/core/include/numpy /usr/include/python2.5
On Windows the equivalent copy was:
  c:\python2.5\lib\site-packages\numpy\core\include\numpy -> c:\python2.5\include
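
Rather than hard-coding the paths, the two directories can be looked up from Python itself. This is a small sketch, not part of the original build instructions: numpy.get_include() and sysconfig.get_paths() are real functions in current numpy and the standard library, but whether they were usable with the Python 2.5 / numpy setup discussed here is an assumption. The helper names are made up.

```python
import sysconfig


def python_include_dir():
    """Directory holding Python.h for the running interpreter."""
    return sysconfig.get_paths()["include"]


def numpy_include_dir():
    """Directory holding the numpy headers, or None if numpy is missing."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.get_include()
```

Printing both values shows the source (numpy) and destination (Python) directories for the copy described above, or the path to add to the build's include list instead of copying.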


-greg

[Rdkit-devel] 64bit compatibility

From: Greg L. <gre...@gm...> - 2008-06-11 05:56:06

Dear all,

Last week I created a branch to get the RDKit working on 64bit
systems. After testing under windows (32 bit) and linux (32 and 64
bit) systems, I've merged those changes back onto the trunk:
http://rdkit.svn.sourceforge.net/viewvc/rdkit?view=rev&revision=718

I believe things should work without problems on 64bit systems now. If
anyone encounters problems, please let me know.

-greg

Re: [Rdkit-devel] collecting feature suggestions for the next release

From: Adrian S. <ma...@ad...> - 2008-05-29 09:39:55

Hi Greg,

I think a feature to handle PDB structures would be a great addition
to RDKit, but I guess this depends on whether it's possible to have
thousands of atoms in an rdmol object.

Maybe I should start to explain what kind of implementation I had in
mind. Since the hierarchy in PDB structures can be fragile sometimes,
the most convenient way would be to use a set theory-like
representation. Basically, there could be a class rdResidueAtom, a
subclass of rdAtom, which in addition to the inherited attributes and
functions also has PDB details as attributes such as

rdResidueAtom.IsHetatm
rdResidueAtom.SerialNumber
rdResidueAtom.AtomName
rdResidueAtom.AlternateLocation
rdResidueAtom.ResidueName
rdResidueAtom.ChainID
rdResidueAtom.ResidueNumber
rdResidueAtom.InsertCode
rdResidueAtom.Occupancy
rdResidueAtom.BFactor

the sets would be structure->NMR model->chain->residue->atom then. In
an ideal case, it would be possible to access individual sets through
a hierarchy/index object, e.g. -> getResidue(chainID, resName, resNum,
insCode) would return all atoms with ->atomIdx() that belong to that
set - and accordingly for chains, atoms etc.
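
The set-like lookup described above could be sketched roughly as follows. Everything here is hypothetical: ResidueAtom and ResidueIndex are illustration-only names mirroring the proposed attributes, not RDKit classes, the NMR-model level is left out for brevity, and the three sample atoms are invented values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueAtom:
    """Per-atom PDB details, mirroring the proposed attribute list."""
    atom_idx: int
    is_hetatm: bool
    serial_number: int
    atom_name: str
    alternate_location: str
    residue_name: str
    chain_id: str
    residue_number: int
    insert_code: str
    occupancy: float
    b_factor: float


class ResidueIndex:
    """Maps chains and residues to the atom indices they contain."""

    def __init__(self, atoms):
        self._by_residue = defaultdict(list)
        self._by_chain = defaultdict(list)
        for atom in atoms:
            key = (atom.chain_id, atom.residue_name,
                   atom.residue_number, atom.insert_code)
            self._by_residue[key].append(atom.atom_idx)
            self._by_chain[atom.chain_id].append(atom.atom_idx)

    def get_residue(self, chain_id, res_name, res_num, ins_code=""):
        """Atom indices of one residue (empty list if absent)."""
        key = (chain_id, res_name, res_num, ins_code)
        return list(self._by_residue.get(key, []))

    def get_chain(self, chain_id):
        """Atom indices of one chain (empty list if absent)."""
        return list(self._by_chain.get(chain_id, []))


# Invented sample data: two alanine atoms in chain A, one water in chain B.
atoms = [
    ResidueAtom(0, False, 1, "N", "", "ALA", "A", 1, "", 1.0, 20.0),
    ResidueAtom(1, False, 2, "CA", "", "ALA", "A", 1, "", 1.0, 21.5),
    ResidueAtom(2, True, 3, "O", "", "HOH", "B", 101, "", 1.0, 30.0),
]
index = ResidueIndex(atoms)
```

Keeping the hierarchy as an index over a flat atom list, rather than as nested containers, is one way to tolerate the fragile hierarchy mentioned above: an atom with odd residue data still has a valid atom index.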

It would be tremendously useful to have cheminformatics functions
available for protein-ligand complexes, particularly to determine
connectivity, aromaticity, assign implicit hydrogens, geometric
functions etc.

This, as a result, would simplify many tasks, for instance determining
hydrogen bonding, analysing the geometry of pi-pi interactions and so
on.

That would make my life a lot easier at least! ;)

Adrian

On Sun, May 25, 2008 at 7:55 AM, Greg Landrum <gre...@gm...> wrote:
> Dear all,
>
> I'm starting to collect suggestions for feature additions for the next
> RDKit release, which should happen sometime towards the end of Q3.
> These are in the feature tracker:
> http://sourceforge.net/tracker/?group_id=160139&atid=814653
> in the 2008_Q3 group.
>
> If you have things you'd like to see added, or cleanups that you
> believe need to be done, please either add them to the tracker
> directly, post them to the list, or email them to me.
>
> Thanks,
> -greg

[Rdkit-devel] collecting feature suggestions for the next release

From: Greg L. <gre...@gm...> - 2008-05-25 06:55:04

Dear all,

I'm starting to collect suggestions for feature additions for the next
RDKit release, which should happen sometime towards the end of Q3.
These are in the feature tracker:
http://sourceforge.net/tracker/?group_id=160139&atid=814653
in the 2008_Q3 group.

If you have things you'd like to see added, or cleanups that you
believe need to be done, please either add them to the tracker
directly, post them to the list, or email them to me.

Thanks,
-greg

[Rdkit-devel] New RDKit Release: May2008_1

From: Greg L. <gre...@gm...> - 2008-05-25 06:00:17

I'm very happy to announce that the next version of the RDKit --
May2008_1 -- is released.

The release notes are below. There are some changes that affect
backwards compatibility, so please at least skim them.

The source release and a windows binary are on the sourceforge downloads page:
http://sourceforge.net/project/showfiles.php?group_id=160139&package_id=180003&release_id=601793
The files can also be downloaded from the google project page:
http://code.google.com/p/rdkit/downloads/list

If you plan to build from source, please read the new build instructions:
1) For Linux: http://code.google.com/p/rdkit/wiki/BuildingOnLinux
2) For Windows: http://code.google.com/p/rdkit/wiki/BuildingOnWindows

An updated documentation distribution has been added to sourceforge:
http://sourceforge.net/project/showfiles.php?group_id=160139&package_id=230191&release_id=601803
note that this PDF is also included in the source and binary downloads.

I also updated the browseable documentation at rdkit.org:
http://www.rdkit.org/C++_Docs
http://www.rdkit.org/Python_Docs

Thanks to everyone who submitted bug reports and suggestions for
this release!

Please let me know if you find any problems with the release or have
any suggestions.

-greg

****** Release_May2008_1 *******
(Changes relative to Release_Jan2008_1)

!!!!!! IMPORTANT !!!!!!
- A fix to the values of the parameters for the Crippen LogP
calculator means that the values calculated with this version are
not backwards compatible. Old values should be recalculated.
- topological fingerprints generated with this version *may* not be
compatible with those from earlier versions. Please read the note
below in the "Other" section.
- Please read the point about dummy atoms in the "New Features"
section. It explains a change that affects backwards compatibility
when dealing with dummy atoms.

Acknowledgements:
- Some of the bugs fixed in this release were found and reported by
Adrian Schreyer, Noel O'Boyle, and Markus Kossner.

Bug Fixes
- A core leak in MolAlign::getAlignmentTransform was fixed (issue
1899787)
- Mol suppliers now reset the EOF flag on their stream after they run
off the end (issue 1904170)
- A problem causing the string "Sc" to not parse correctly in
recursive SMARTS was fixed (issue 1912895)
- Combined recursive smarts queries are now output correctly.
(issue 1914154)
- A bug in the handling of chirality in reactions was fixed (issue
1920627)
- Looping directly over a supplier no longer causes a crash (issue
1928819)
- a core leak in the smiles parser was fixed (issue 1929199)
- Se and Te are now potential aromatic atoms (per the proposed
OpenSmiles standard). (issue 1932365)
- isotope information (and other atomic modifiers) are now correctly
propagated by chemical reactions (issue 1934052)
- triple bonds no longer contribute 2 electrons to the count for
aromaticity (issue 1940646)
- Two bugs connected with square brackets in SMILES were fixed
(issues 1942220 and 1942657)
- atoms with coordination numbers higher than 4 now have tetrahedral
stereochemistry removed (issue 1942656)
- Bond.SetStereo() is no longer exposed to Python (issue 1944575)
- A few typos in the parameter data for the Crippen logp calculator
were fixed. Values calculated with this version should be assumed
to not be backwards compatible with older versions (issue 1950302)
- Isotope queries are now added correctly (if perhaps not optimally)
to SMARTS.
- some drawing-related bugs have been cleared up.
- A bug in Chem.WedgeMolBonds (used in the drawing code) that was
causing incorrect stereochemistry in drawn structures was
fixed. (issue 1965035)
- A bug causing errors or crashes on Windows with [r<n>] queries was
fixed. (issue 1968930)
- A bug in the calculation of TPSA values in molecules that have Hs
in the graph was fixed. (issue 1969745)

New Features
- Support for supplying dummy atoms as "[Du]", "[X]", "[Xa]", etc. is
now considered deprecated. In this release a warning will be
generated for these forms and in the next release the old form will
generate errors. Note that the output of dummy atoms has also
changed: the default output format is now "*", this means that the
canonical SMILES for molecules containing dummies are no longer
compatible with the canonical SMILES from previous releases.
(feature request 186217)
- Atom and bond query information is now serializable; i.e. query
molecules can now be pickled and not lose the query
information. (feature request 1756596)
- Query features from mol files are now fully supported. (feature
request 1756962)
- Conformations now support a dimensionality flag. Dimensionality
information is now read from mol blocks and TDT files. (feature request
1906758)
- Bulk Dice similarity functions have been added for IntSparseIntVect
and LongSparseIntVect (feature request 1936450)
- Exceptions are no longer thrown during molecule parsing. Failure in
molecule parsing is indicated by returning None. Failure to *open* a
file when reading a molecule throws BadFileExceptions (feature
requests 1932875 and 1938303)
- The various similarity functions for BitVects and SparseIntVects
now take an optional returnDistance argument. If this is provided,
the functions return the corresponding distance instead of
similarity.
- Some additional query information from Mol files is now translated
when generating SMARTS. Additional queries now translated:
  - number of ring bonds
  - unsaturation queries
  - atom lists are handled better as well
(feature request 1902466)
- A new algorithm for generating the bits for topological
fingerprints has been added. The new approach is a bit quicker and
more robust than the old, but is not backwards compatible.
Similarity trends are more or less conserved.
- The molecule drawing code in Chem.Draw.MolDrawing has been modified
so that it creates better drawings. A new option for drawing that
uses the aggdraw graphics library has been added.
- The RingInfo class supports two new methods: AtomRings() and
BondRings() that return tuples of tuples with indices of the atoms
or bonds that make up the molecule's rings.
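
To illustrate the returnDistance idea from the similarity entry above, here is a plain-Python sketch of Tanimoto on bit sets and Dice on sparse count vectors. It is not the RDKit implementation: the function names and the return_distance parameter are stand-ins, and treating distance as 1 - similarity (and two empty fingerprints as similarity 0) are assumptions.

```python
from collections import Counter


def tanimoto(bits_a, bits_b, return_distance=False):
    """Tanimoto similarity of two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    sim = len(a & b) / union if union else 0.0
    return 1.0 - sim if return_distance else sim


def dice_counts(counts_a, counts_b, return_distance=False):
    """Dice similarity of two sparse count vectors ({index: count})."""
    a, b = Counter(counts_a), Counter(counts_b)
    total = sum(a.values()) + sum(b.values())
    # Shared weight is the smaller of the two counts at each index.
    common = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    sim = 2.0 * common / total if total else 0.0
    return 1.0 - sim if return_distance else sim
```

For example, tanimoto({1, 2, 3}, {2, 3, 4}) shares two of four distinct bits, so the similarity and the distance are both 0.5.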

Other
- Changes in the underlying boost random-number generator in version
1.35 of the boost library may have broken backwards compatibility
of 2D fingerprints generated using the old fingerprinter. It is
strongly suggested that you regenerate any stored fingerprints (and
switch to the new fingerprinter if possible). There is an explicit
test for this in $RDBASE/Code/GraphMol/Fingerprints/test1.cpp
- The unofficial and very obsolete version of John Torjo's v1
boost::logging library that was included with the RDKit
distribution is no longer used. The logging library has been
replaced with the much less powerful and flexible approach of just
sending things to stdout or stderr. If and when the logging library
is accepted into Boost, it will be integrated.
- The DbCLI tools (in $RDBASE/Projects/DbCLI) generate topological
fingerprints using both the old and new algorithms (unless the
--noOldFingerprints option is provided). The default search
uses the newer fingerprint.
- The directory $RDBASE/Data/SmartsLib contains a library of sample
SMARTS contributed by Richard Lewis.

[Rdkit-devel] Release Candidate 2

From: Greg L. <gre...@gm...> - 2008-05-17 07:56:17

Dear all,

After fixing a couple of small bugs and making some other changes, I
just tagged a second release candidate for the May 2008 release:
http://rdkit.svn.sourceforge.net/svnroot/rdkit/tags/Release_May2008_1RC2

Release notes are here:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/tags/Release_May2008_1RC2/ReleaseNotes.txt?revision=675

A tgz with the source is up on the google code site:
http://code.google.com/p/rdkit/downloads/list
the windows binary is uploading as I write this message.

If there are no substantial bugs uncovered, I will plan on doing the
May2008 release next weekend (May 24th, 25th, or 26th). If bugs come
in, I will fix them as quickly as I can and restart the 1 week clock.

Best Regards,
-greg

Re: [Rdkit-devel] Boost 1.35.0

From: Greg L. <gre...@gm...> - 2008-05-15 17:37:04

On Thu, May 15, 2008 at 12:08 PM, Adrian Schreyer <am...@ca...> wrote:
> Is there any reason not to use the latest Boost release and stick to
> 1.34.1 for building RDKit? I have used version 1.35.0 to compile it on
> Kubuntu 8.04, works without problems so far.
>

I'm glad you brought this up; I was planning on mentioning the topic
sometime soon.

The short answer is that, with one caveat, everything works fine with
boost 1.35.0. The short form of the caveat is that the topological
fingerprints generated with Chem.DaylightFingerprint using version
1.34.x and 1.35.0 of Boost are not compatible with each other; so as
long as you aren't using older fingerprints (or regenerate the
fingerprints you have), everything is fine. If you switch to the new
(in this release) fingerprinter -- Chem.RDKFingerprint -- then it
doesn't matter which version of Boost you use.

Here's a more detailed explanation:
The algorithm that generates the topological fingerprints uses a
pseudo-random number generator (pRNG) provided by the Boost.Random
library. One of the properties of a pRNG is that as long as you seed
them with the same value, you should always get the same sequence of
random numbers. The fingerprinting algorithm uses this property to set
bits based on molecular paths: a number is generated for each path,
that number is used to seed the pRNG, and then several random bits are
set in the fingerprint.
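
The seeding scheme described above can be sketched in a few lines. This is only an illustration of the idea: Python's random module stands in for Boost.Random, and the function name, fp_size and bits_per_path are assumptions rather than RDKit's actual parameters.

```python
import random


def path_fingerprint(path_seeds, fp_size=2048, bits_per_path=4):
    """Set a few pseudo-random bits per path, seeded by the path's number.

    path_seeds: one integer per molecular path (how a path is turned
    into that integer is a separate question and not shown here).
    Returns the set of on-bit indices.
    """
    bits = set()
    for seed in path_seeds:
        # Same seed -> same sequence -> same bits, as long as the
        # generator itself does not change between library versions.
        rng = random.Random(seed)
        for _ in range(bits_per_path):
            bits.add(rng.randrange(fp_size))
    return bits
```

The fingerprint is reproducible only because the generator maps a given seed to a fixed sequence; if a library update changes that mapping, fingerprints stored earlier are no longer comparable with new ones.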

The problem is that the pRNG I use is generating different sequences of
numbers in 1.35.0 than it did in earlier versions. I've been trying
(by posting to mailing lists and submitting a bug report) to find out
if this is the result of a bug, a bug fix, or an error on my part, but
up to this point I have had no response from the boost community. In
case anyone is interested in following along at home, the bug is here:
http://svn.boost.org/trac/boost/ticket/1856

The new fingerprinter, RDKFingerprint, uses a different one of the
Boost.Random pRNGs. The alternative pRNG generates consistent values
between releases 1.34.1 and 1.35.0.

I believe that, in general, the new fingerprinter is more robust (it's
also a tiny bit faster, but this isn't a reason to switch) and I'd
suggest using that instead of the old one. If one takes this
suggestion, then the differences between Boost versions no longer
matter and backwards compatibility is not a problem since there is no
backwards to be compatible with. :-)

------------------
For those who really care about details:
Take a look at :
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/GraphMol/Fingerprints/Fingerprints.cpp?revision=655&view=markup

Both DaylightFingerprint and RDKFingerprint work from the same set of
molecular paths. The differences between the two algorithms are in how
they hash the paths (convert them into seeds for the pRNG) and which
pRNG they use. The old algorithm hashes paths by calculating the
Balaban J value (a classic topological descriptor) for each path and
then converts the bits of the J value (which is a real number) into an
int using "brute force" [lines 56-61]. That int is the seed. The new
algorithm (RDKFingerprint) assigns a value to each bond in the path
and then combines those into a single integer using some Boost
functionality for generating hashes [lines 125-185].
------------------
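
The two ways of turning a path into a seed can be contrasted with a rough sketch. Both functions are illustrations under assumptions: which bits of the J value the old code kept is not stated above, and the mixing step shown is the classic boost::hash_combine formula, which may or may not be the exact Boost facility the new code uses. The per-bond integer values are hypothetical inputs.

```python
import struct

_MASK32 = 0xFFFFFFFF


def seed_from_float_bits(value):
    """'Brute force': reinterpret the 8 bytes of a double as an integer."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    return as_int


def hash_combine(seed, value):
    """Mix one integer into a running 32-bit seed (hash_combine style)."""
    mixed = value + 0x9E3779B9 + (seed << 6) + (seed >> 2)
    return (seed ^ mixed) & _MASK32


def seed_from_bond_values(bond_values):
    """Combine the per-bond integers of a path into a single seed."""
    seed = 0
    for value in bond_values:
        seed = hash_combine(seed, value)
    return seed
```

The first approach depends on the exact floating-point result of a descriptor calculation, while the second works purely on integers, which is one plausible reason for the new scheme being the more robust of the two.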

I hope this helps,
-greg

[Rdkit-devel] Boost 1.35.0

From: Adrian S. <am...@ca...> - 2008-05-15 10:08:07

Is there any reason not to use the latest Boost release and stick to
1.34.1 for building RDKit? I have used version 1.35.0 to compile it on
Kubuntu 8.04, works without problems so far.

Adrian

[Rdkit-devel] May 2008 release candidate

From: Greg L. <gre...@gm...> - 2008-05-13 20:07:41

Dear all,

I just tagged a release candidate for the May 2008 release:

http://rdkit.svn.sourceforge.net/svnroot/rdkit/tags/Release_May2008_1RC1

Release notes are here:
http://rdkit.svn.sourceforge.net/viewvc/*checkout*/rdkit/tags/Release_May2008_1RC1/ReleaseNotes.txt?revision=659
(and there, I just found the first problem: the release notes call
this the "April 2008" release)

A tgz with the source is up on the google code site:
http://code.google.com/p/rdkit/downloads/list
I'll put a windows binary up on google code tomorrow morning (CET).

Please note that the build process has been somewhat simplified:
http://code.google.com/p/rdkit/wiki/BuildingOnLinux_May2008

If there are no substantial bugs uncovered, I will plan on doing the
May2008 release next Tuesday (May 20). If bugs come in, I will fix
them as quickly as I can and restart the 1 week clock.

Best Regards,
-greg

[Rdkit-devel] getting ready for the next release

From: Greg L. <gre...@gm...> - 2008-05-08 19:07:38

Hi,

I'd like to do a release candidate for the Q2 2008 release early next
week. Things currently seem to be in pretty good shape and I believe
that all of the April2008 bugs are fixed with the exception of 1896935
http://sourceforge.net/tracker/index.php?func=detail&aid=1896935&group_id=160139&atid=814650
which I'm postponing because it requires some more substantial code changes.

Release notes (also, I think, mostly up to date) are here:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/ReleaseNotes.txt?revision=HEAD&view=markup

If anyone has any last-minute comments, suggestions, or bug reports,
please let me know. Unless I hear otherwise, I'll try to put together
a release candidate on either Tuesday or Wednesday of next week.

-greg

[Rdkit-devel] FYI: I'll be out of town

From: Greg L. <gre...@gm...> - 2008-04-24 17:34:40

Dear RDKit community,

I will be on vacation from Saturday until the following Sunday (May
4). I will not be in email contact, so I won't be fixing bugs or
answering questions. You'll have to fend for yourselves. :-)

I would like to do the Q2 release of the RDKit sometime not too long
after I get back, so if anyone has bugs they've noticed and not gotten
around to reporting, please do so in the next week or so. Unless
something big comes up, I will do the first release candidate sometime
around May 10 or 11.

-greg

[Rdkit-devel] slight changes to the aromaticity rules

From: Greg L. <gre...@gm...> - 2008-04-12 13:54:27

Hi,

As part of the fix to issues 1934360:
http://sourceforge.net/tracker/index.php?func=detail&aid=1934360&group_id=160139&atid=814650
and 1940646:
http://sourceforge.net/tracker/index.php?func=detail&aid=1940646&group_id=160139&atid=814650
I made a couple of slight modifications to the aromaticity rules:
 - An atom is now no longer considered to be a candidate for
aromaticity if it has more than one double or triple bond
 - atoms with triple bonds contribute one electron to the pi electron
count for a ring.

Some rings that were previously considered aromatic that aren't anymore:
C1=C=NCN1
C1#CC=C1

A ring that was not previously considered aromatic that now is:
C1#CC=CC=C1
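
A quick way to check how a given RDKit build treats these rings is to list which atoms are flagged aromatic after parsing. This sketch assumes a current RDKit Python installation; the helper name is made up, and no claim is made here about what present-day versions report for the three SMILES above.

```python
def aromatic_atom_indices(smiles):
    """Return the indices of atoms perceived as aromatic.

    The RDKit import is local so the function can be defined even
    where RDKit is not installed; calling it then raises ImportError.
    """
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    return [atom.GetIdx() for atom in mol.GetAtoms() if atom.GetIsAromatic()]
```

Calling it on 'C1=C=NCN1', 'C1#CC=C1' and 'C1#CC=CC=C1' shows whether the installed version follows the rules described in this message; an empty list means no atom was perceived as aromatic.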

These are now checked in.

If anyone has objections, comments, or suggestions, please let me know.
-greg

Re: [Rdkit-devel] [ rdkit-Bugs-1932365 ]

From: Adrian S. <adr...@gm...> - 2008-04-09 21:09:05

Hi Greg,

I get the emails - I will test it tomorrow with the MSDchem set and
send you the results.

Adrian

On Wed, Apr 9, 2008 at 8:24 PM, Greg Landrum <gre...@gm...> wrote:
> Adrian,
>
>  I'm not sure if you get email when the bugs are updated or not, so
>  I'll go ahead and post. The proposed solution here has been
>  implemented and checked in:
>
>
>  >  On Wed, Apr 2, 2008 at 7:14 PM, Greg Landrum <gre...@gm...> wrote:
>  >  >
>  >  >  What about solution A (returning None on parse failure), with messages
>  >  >  displayed to stderr using the logging mechanism? I.e. when you'd see
>  >  >  something like this:
>  >  >  >>> m = Chem.MolFromSmiles('c1cccc1')
>  >  >  [TIMESTAMP] Sanitization error: Can't kekulize mol
>  >  >  >>> m is None
>  >  >  True
>
>  The various file parsers (including the suppliers) should no longer
>  throw exceptions when they fail. They should display error messages
>  and return None. The exception to this is when the file parsers (or
>  suppliers) fail to open the input file; in this case you'll get an
>  IOError, which is the Pythonic (IMO) way of doing things.
>
>  Let me know if you have a chance to try it and find any problems,
>  -greg
>

Re: [Rdkit-devel] [ rdkit-Bugs-1932365 ]

From: Greg L. <gre...@gm...> - 2008-04-09 19:24:51

Adrian,

I'm not sure if you get email when the bugs are updated or not, so
I'll go ahead and post. The proposed solution here has been
implemented and checked in:

>  On Wed, Apr 2, 2008 at 7:14 PM, Greg Landrum <gre...@gm...> wrote:
>  >
>  >  What about solution A (returning None on parse failure), with messages
>  >  displayed to stderr using the logging mechanism? I.e. when you'd see
>  >  something like this:
>  >  >>> m = Chem.MolFromSmiles('c1cccc1')
>  >  [TIMESTAMP] Sanitization error: Can't kekulize mol
>  >  >>> m is None
>  >  True

The various file parsers (including the suppliers) should no longer
throw exceptions when they fail. They should display error messages
and return None. The exception to this is when the file parsers (or
suppliers) fail to open the input file; in this case you'll get an
IOError, which is the Pythonic (IMO) way of doing things.
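
With parsers that return None instead of raising, calling code has to check each result. A minimal sketch of that pattern follows; split_parsed is a hypothetical helper, and the parser is passed in as an argument so the sketch does not depend on RDKit being importable.

```python
def split_parsed(inputs, parse):
    """Apply a None-on-failure parser to each input.

    parse: any callable that returns an object on success and None on
    failure. Returns (parsed_objects, failed_inputs).
    """
    parsed, failed = [], []
    for text in inputs:
        mol = parse(text)
        if mol is None:
            # The parser has already logged why; just record the input.
            failed.append(text)
        else:
            parsed.append(mol)
    return parsed, failed
```

With RDKit available, split_parsed(['CCO', 'c1cccc1'], Chem.MolFromSmiles) would put 'c1cccc1' in the failed list, matching the example quoted above, while the error text itself goes to stderr.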

Let me know if you have a chance to try it and find any problems,
-greg

[Rdkit-devel] changes to dummy atom handling

From: Greg L. <gre...@gm...> - 2008-04-06 10:45:29

Dear All,

Another quick note about a small change that may have large repercussions:

I just checked in (rev594) the next step in the change in dummy atom
handling that was started in the last release.

Here's an explanation from the release notes:
 - Support for supplying dummy atoms as "[Du]", "[X]", "[Xa]", etc. is
   now considered deprecated. In this release a warning will be
   generated for these forms and in the next release the old form will
   generate errors. Note that the output of dummy atoms has also
   changed: the default output format is now "*".  (feature request
   186217)
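
For stored SMILES that still use the deprecated spellings, a simple text rewrite may be enough. This is a sketch under assumptions: it covers only the three forms named above ("[Du]", "[X]", "[Xa]"), not whatever else the "etc." includes, it assumes a bare "*" is an acceptable replacement in every position, and modernize_dummies is a made-up name.

```python
import re

# Exactly the three deprecated bracket forms named in the release notes.
_OLD_DUMMY = re.compile(r"\[(?:Du|Xa|X)\]")


def modernize_dummies(smiles):
    """Rewrite the old dummy-atom spellings to the new '*' form."""
    return _OLD_DUMMY.sub("*", smiles)
```

Real elements that merely start with X, such as "[Xe]", are left untouched because the pattern requires the closing bracket right after the dummy symbol.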

Best Regards,
-greg

[Rdkit-devel] changes to the logging system

From: Greg L. <gre...@gm...> - 2008-04-06 09:17:47

Dear all,

I just checked in (rev593) a big simplification to the logging system
used in the RDKit. The changes shouldn't break existing code, but if
you made more than simple use of the logger, it's probably a good idea
to check that things still work.

The reason for the simplification is that the former system, based on
an old version of a proposed boost.logging library, was just too
cumbersome to build and work with. There were also problems with
operations across shared libraries (error log messages from some
shared libraries would not show up when using the python wrappers)
that were more or less unresolvable without large effort.

I will continue to keep an eye on the developments with the new
boost.logging. If it's ever approved as a standard boost library I
will redo the logging code to use that. In the meantime, we have
something basic that works.

If anyone has suggestions for another open source c++ logging library
with a compatible license (e.g. nothing GPL'ed) that is reasonably
lightweight, I'd be happy to hear about it.

Regards
-greg

Re: [Rdkit-devel] [ rdkit-Bugs-1932365 ]

From: Adrian S. <adr...@gm...> - 2008-04-02 18:19:54

Would be fine with me!

Adrian

On Wed, Apr 2, 2008 at 7:14 PM, Greg Landrum <gre...@gm...> wrote:
> On Wed, Apr 2, 2008 at 7:29 PM, Adrian Schreyer
>  <adr...@gm...> wrote:
>  > Personally, I would appreciate a solution which indicates why
>  >  structure parsing has failed (or at least allows some kind of debug).
>  >  I do not know if this is feasible in wrapped C++ code but it is really
>  >  useful to know where the problem arises from, particularly in the
>  >  context of bug reports.
>  >
>  >  I have to agree solution B is neither very elegant nor consistent but
>  >  maybe there is a way to implement some kind of debug mode?
>
>  What about solution A (returning None on parse failure), with messages
>  displayed to stderr using the logging mechanism? I.e. when you'd see
>  something like this:
>  >>> m = Chem.MolFromSmiles('c1cccc1')
>  [TIMESTAMP] Sanitization error: Can't kekulize mol
>  >>> m is None
>  True
>
>
>  -greg
>

Re: [Rdkit-devel] [ rdkit-Bugs-1932365 ]

From: Greg L. <gre...@gm...> - 2008-04-02 18:15:14

On Wed, Apr 2, 2008 at 7:29 PM, Adrian Schreyer
<adr...@gm...> wrote:
> Personally, I would appreciate a solution which indicates why
>  structure parsing has failed (or at least allows some kind of debug).
>  I do not know if this is feasible in wrapped C++ code but it is really
>  useful to know where the problem arises from, particularly in the
>  context of bug reports.
>
>  I have to agree solution B is neither very elegant nor consistent but
>  maybe there is a way to implement some kind of debug mode?

What about solution A (returning None on parse failure), with messages
displayed to stderr using the logging mechanism? I.e. you'd see
something like this:
>>> m = Chem.MolFromSmiles('c1cccc1')
[TIMESTAMP] Sanitization error: Can't kekulize mol
>>> m is None
True


-greg
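
The behaviour proposed in the message above (return None on failure, with the reason written to stderr through a logger) can be sketched in pure Python as follows. This is only an illustration under stated assumptions: `strict_parse`, `parse_or_none`, `SanitizationError` and the logger name `toy_parser` are hypothetical stand-ins, not RDKit functions, and the RDKit logging discussed in this thread lives on the C++ side.

```python
import logging
import sys

# Hypothetical logger name; this only mimics the "[TIMESTAMP] message" output.
logger = logging.getLogger("toy_parser")
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("[%(asctime)s] %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.ERROR)
logger.propagate = False


class SanitizationError(ValueError):
    """Stand-in for the error raised when a structure cannot be sanitized."""


def strict_parse(smiles):
    """Toy parser: rejects one known-bad input, otherwise returns a placeholder."""
    if smiles == "c1cccc1":
        raise SanitizationError("Can't kekulize mol")
    return {"smiles": smiles}


def parse_or_none(smiles):
    """Solution A: report the reason on stderr and return None on failure."""
    try:
        return strict_parse(smiles)
    except SanitizationError as exc:
        logger.error("Sanitization error: %s", exc)
        return None
```

With this arrangement the caller still gets a simple `None` test, while the reason for the failure remains visible in the log, which is what the request for a "debug mode" was after.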

Re: [Rdkit-devel] [ rdkit-Bugs-1932365 ]

From: Adrian S. <adr...@gm...> - 2008-04-02 17:29:07

Personally, I would appreciate a solution which indicates why
structure parsing has failed (or at least allows some kind of debug).
I do not know if this is feasible in wrapped C++ code but it is really
useful to know where the problem arises from, particularly in the
context of bug reports.

I have to agree solution B is neither very elegant nor consistent but
maybe there is a way to implement some kind of debug mode?

On Wed, Apr 2, 2008 at 5:40 PM, Greg Landrum <gre...@gm...> wrote:
> Hi Adrian,
>
>  Sorry for a CC to a list you're subscribed to, but I'm not sure if you
>  read the mailing list and this is a message/discussion that needs to
>  be on the mailing list.
>
>  Here's the text from your bug report:
>  ------------------------
>  RDKit neither returns a molecule nor raises an error if an element in a
>  SMILES string is lowercase:
>
>
>  In [15]: Chem.MolFromSmiles('c1c[se]c2=NCC(=c21)CC(C(=O)O)N')
>
>  In [16]: Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')
>  Out[16]: <Chem.rdchem.Mol object at 0x8bc93e4>
>  As far as I know, lowercase elements are not really standard SMILES;
>  therefore I would propose throwing some kind of error that can be handled
>  nicely in Python.
>  ------------------------
>
>  There are two things going on here:
>
>  1) Se is not recognized as an element that can be aromatic. The
>  consequence of this is that [se] is not recognized. In your line [16]
>  above, the molecule is processed correctly, but if you look at the
>  SMILES:
>  [2]>>> m = Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')
>
>  [3]>>> Chem.MolToSmiles(m)
>  Out[3] 'NC(CC1=C2C=C[Se]C2=NC1)C(=O)O'
>
>  you see that the ring is not recognized as aromatic. This is a very
>  straightforward fix. Within the proposed OpenSmiles spec
>  (http://opensmiles.org/spec/open-smiles-2-grammar.html), both Se and
>  As can be marked aromatic. Unless I hear otherwise from the list, I
>  will support this and modify the smiles parser and aromaticity
>  recognition code to accept Se and As as aromatic atoms.
>
>  2) When the smiles parser (or the other molecule parsers) fails, the
>  result is a "None" as a return value. The theory behind this was that
>  the code to catch bad molecules is then much simpler:
>   for s in smiles:
>    m = Chem.MolFromSmiles(s)
>    if not m: continue
>  instead of:
>   for s in smiles:
>     try:
>        m = Chem.MolFromSmiles(s)
>     except:
>        continue
>  That's the theory. The practice is that the try/except is still needed
>  because the sanitization code can throw exceptions:
>  [4]>>> Chem.MolFromSmiles('c1cccc1')
>  ---------------------------------------------------------------------------
>  ValueError                                Traceback (most recent call last)
>
>  /home/glandrum/RDKit/Code/GraphMol/SmilesParse/<ipython console> in <module>()
>
>  ValueError: Sanitization error: Can't kekulize mol
>
>  This is inconsistent, confusing, and wrong. So I'll fix it. The
>  question is how. I see two options:
>    A) Continue returning None for failed parsing, but catch the
>  kekulization exceptions on the C++ side so that they don't make it
>  through to Python (i.e. return None from line [4] above)
>    B) Have all parse failures raise exceptions.
>
>  I'm leaning towards A) because I think the resulting client code is
>  much more readable. This really becomes important with suppliers,
>  which I think should also be consistent. Solution A) allows code like
>  this:
>
>  suppl = Chem.SDMolSupplier('file.sdf')
>  for mol in suppl:
>   if not mol: continue
>   <do something>
>
>  instead of this, which solution B would require (not tested):
>
>  suppl = Chem.SDMolSupplier('file.sdf')
>  while 1:
>   try:
>     m = suppl.next()
>   except StopIteration:
>     break
>   except Exception:
>     continue  # parsing failed for this entry; skip it
>   < do something >
>
>  Any arguments against me implementing solution A?
>
>  Thanks again for the detailed and careful bug report,
>  -greg
>

Re: [Rdkit-devel] [ rdkit-Bugs-1932365 ]

From: Greg L. <gre...@gm...> - 2008-04-02 16:40:55

Hi Adrian,

Sorry for a CC to a list you're subscribed to, but I'm not sure if you
read the mailing list and this is a message/discussion that needs to
be on the mailing list.

Here's the text from your bug report:
------------------------
RDKit neither returns a molecule nor raises an error if an element in a
SMILES string is lowercase:


In [15]: Chem.MolFromSmiles('c1c[se]c2=NCC(=c21)CC(C(=O)O)N')

In [16]: Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')
Out[16]: <Chem.rdchem.Mol object at 0x8bc93e4>
As far as I know, lowercase elements are not really standard SMILES;
therefore I would propose throwing some kind of error that can be handled
nicely in Python.
------------------------

There are two things going on here:

1) Se is not recognized as an element that can be aromatic. The
consequence of this is that [se] is not recognized. In your line [16]
above, the molecule is processed correctly, but if you look at the
SMILES:
[2]>>> m = Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')

[3]>>> Chem.MolToSmiles(m)
Out[3] 'NC(CC1=C2C=C[Se]C2=NC1)C(=O)O'

you see that the ring is not recognized as aromatic. This is a very
straightforward fix. Within the proposed OpenSmiles spec
(http://opensmiles.org/spec/open-smiles-2-grammar.html), both Se and
As can be marked aromatic. Unless I hear otherwise from the list, I
will support this and modify the smiles parser and aromaticity
recognition code to accept Se and As as aromatic atoms.

2) When the smiles parser (or the other molecule parsers) fails, the
result is a "None" as a return value. The theory behind this was that
the code to catch bad molecules is then much simpler:
 for s in smiles:
   m = Chem.MolFromSmiles(s)
   if not m: continue
instead of:
 for s in smiles:
    try:
       m = Chem.MolFromSmiles(s)
    except:
       continue
That's the theory. The practice is that the try/except is still needed
because the sanitization code can throw exceptions:
[4]>>> Chem.MolFromSmiles('c1cccc1')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/glandrum/RDKit/Code/GraphMol/SmilesParse/<ipython console> in <module>()

ValueError: Sanitization error: Can't kekulize mol

This is inconsistent, confusing, and wrong. So I'll fix it. The
question is how. I see two options:
   A) Continue returning None for failed parsing, but catch the
kekulization exceptions on the C++ side so that they don't make it
through to Python (i.e. return None from line [4] above)
   B) Have all parse failures raise exceptions.

I'm leaning towards A) because I think the resulting client code is
much more readable. This really becomes important with suppliers,
which I think should also be consistent. Solution A) allows code like
this:

suppl = Chem.SDMolSupplier('file.sdf')
for mol in suppl:
  if not mol: continue
  <do something>

instead of this, which solution B would require (not tested):

suppl = Chem.SDMolSupplier('file.sdf')
while 1:
  try:
    m = suppl.next()
  except StopIteration:
    break
  except Exception:
    continue  # parsing failed for this entry; skip it
  < do something >

Any arguments against me implementing solution A?

Thanks again for the detailed and careful bug report,
-greg
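
The difference in client code between the two options in the message above can be made concrete with a small stand-in supplier. Everything here is an assumption for illustration: `ToySupplier`, `collect_style_a` and `collect_style_b` are hypothetical names, the records are plain strings, and a `None` record models an entry that fails to parse. It is not the `SDMolSupplier` implementation.

```python
class ToySupplier:
    """Stand-in for a molecule supplier; None records model bad entries."""

    def __init__(self, records, raise_on_failure=False):
        self._records = list(records)
        self._raise = raise_on_failure
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._records):
            raise StopIteration
        record = self._records[self._pos]
        self._pos += 1  # advance even on failure so the caller can move on
        if record is None and self._raise:
            raise ValueError("could not parse record %d" % (self._pos - 1))
        return record


def collect_style_a(supplier):
    """Solution A: failures come back as None and are skipped with one test."""
    good = []
    for mol in supplier:
        if mol is None:
            continue
        good.append(mol)
    return good


def collect_style_b(supplier):
    """Solution B: failures raise, so iteration has to be driven by hand."""
    good = []
    while True:
        try:
            mol = next(supplier)
        except StopIteration:
            break
        except ValueError:
            continue
        good.append(mol)
    return good


records = ["CCO", None, "c1ccccc1"]
```

Both helpers return the same list for the same input; the point is only that a plain `for` loop cannot skip a failing entry under option B, because an exception raised inside the iterator ends the loop.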

[Rdkit-devel] Interesting fact

From: Greg L. <gre...@gm...> - 2008-03-19 19:21:21

Today while browsing license documentation and a new Qt4 installation,
I noticed a very interesting file in the distribution directory. The
contents of said file are basically mirrored here:
http://trolltech.com/products/qt/gplexception
PyQt has a similar exception in it.

I read this to mean that it's possible to BSD license the GUI
components of the RDKit (at least new ones, developed with Qt4, since
I don't know when this exception appeared). This makes me very happy.
I was never happy with being stuck with the GPL on the GUI components,
and now it seems that is no longer necessary.

Thank you Trolltech and Riverbank Computing!

-greg

[Rdkit-devel] nicer molecular drawings

From: Greg L. <gre...@gm...> - 2008-03-16 18:19:16

Attachments: foo.png

I'm not sure what it was, but something set me off this weekend and
convinced me that the molecule drawing code absolutely *had* to be
improved.

I kicked around the idea of doing a C++ renderer based on AGG
(http://antigrain.com/), but that seemed like too much to get my brain
around on a Friday afternoon, so I looked around some more.

I've been using matplotlib (http://matplotlib.sourceforge.net/) a bit
lately and I knew from an old post by Andrew Dalke
(http://www.dalkescientific.com/writings/diary/archive/2005/04/23/matplotlib_without_gui.html)
that you could add arbitrary geometry to your matplotlib figures, so
I decided to give that a try.
[An aside: if you don't know about matplotlib, I strongly recommend
looking into it... it's a very nice tool for doing plotting from
python]

The matplotlib thing worked pretty well, but it's asking a lot to have
people install matplotlib just to be able to do molecule rendering.
And it somehow feels like a hack.

I took another swing at it using aggdraw, from the effbot:
http://effbot.org/zone/aggdraw-index.htm
and got something that is pretty nice, as the attached image demonstrates.

Since molecule drawing really just requires a few primitives, I
restructured the existing drawing code so that it will work with
matplotlib, aggdraw, or the old approach. The changes are in this
branch:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/branches/NewDrawing_16March2008/
and are mostly localized to this directory:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/branches/NewDrawing_16March2008/Python/Chem/Draw/

For creating images, I think the aggdraw renderer is the best. The
sping renderer can draw to pdf or svg (or qt!), which is nice, but one
really misses the antialiasing when it's used for images.

Back to thinking about rendering from C++: what would be really nice
would be to look into porting the AGG demo for molecule drawing to
work with RDKit molecules
(http://www.antigrain.com/demo/mol_view.cpp.html). But that's a
project for another weekend. And it would have to use a
pre-license-change version of AGG.

Enjoy and let me know what you think.
-greg
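
The idea in the message above, that molecule drawing needs only a few primitives and can therefore sit on top of interchangeable backends, can be sketched roughly as below. This is a hypothetical illustration, not the code in the branch: `RecordingCanvas`, `add_line`, `add_text` and `draw_molecule` are made-up names, and the backend here just records calls rather than wrapping matplotlib, aggdraw or sping.

```python
class RecordingCanvas:
    """Hypothetical backend that records primitive calls instead of drawing."""

    def __init__(self):
        self.ops = []

    def add_line(self, p1, p2, color=(0, 0, 0)):
        self.ops.append(("line", p1, p2, color))

    def add_text(self, pos, text, color=(0, 0, 0)):
        self.ops.append(("text", pos, text, color))


def draw_molecule(canvas, atoms, bonds):
    """Draw bonds as lines and heteroatoms as labels using only canvas primitives.

    atoms: list of (symbol, (x, y)); bonds: list of (begin_index, end_index).
    """
    for begin, end in bonds:
        canvas.add_line(atoms[begin][1], atoms[end][1])
    for symbol, pos in atoms:
        if symbol != "C":  # carbons are conventionally left unlabeled
            canvas.add_text(pos, symbol)


# Ethanol heavy atoms with made-up 2D coordinates.
atoms = [("C", (0.0, 0.0)), ("C", (1.0, 0.5)), ("O", (2.0, 0.0))]
bonds = [(0, 1), (1, 2)]
canvas = RecordingCanvas()
draw_molecule(canvas, atoms, bonds)
```

Any object offering the same two methods could be passed to `draw_molecule`, which is the sense in which a renderer can be swapped without touching the drawing logic.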

4 messages have been excluded from this view by a project administrator.
