rdkit-devel Mailing List for RDKit (Page 25)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Greg L. <gre...@gm...> - 2008-09-03 20:24:19
|
This evening I checked in a bunch of changes (mostly to Jamfiles) that allow the RDKit to be built under Mac OS X (at least under version 10.5.4). All the unit tests pass except for Numeric/EigenSolvers; I'm looking into that one.
I am a very, very long way from being a Mac OS X expert, but things look pretty good and I didn't have to hack around in order to make things compile, so I think this is probably ok.
Some work is going to have to be done on installation instructions because I didn't really install much of anything on this Mac other than boost 1.36.0. The rest came pre-configured from IT. The one thing that seems a bit dodgy is the location of the numpy include files (needed for Jamroot):
/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/core/include
but maybe that's standard?
I would be very, very happy to have input from people more familiar with the platform, particularly if anyone could even sketch out some install instructions that would make sense to a Mac user.
Best Regards,
-greg
|
|
From: Greg L. <gre...@gm...> - 2008-08-25 13:21:05
|
Dear all,
One of the long-time gaps/bugs in the RDKit handling of stereochemistry has been what I call "dependent stereochemistry": atoms or bonds that are stereogenic because some of their neighbors are stereogenic.
A very simple, and well known, example is the molecule defined by the SMILES:
C[C@H]1[C@@H](F)CCC[C@H]1F
Carbon 1 (numbering from zero) here is a chiral center (absolute stereochemistry S, or s, depending on which notation you use) because its two neighbors are chiral centers with different chirality (one is R, the other S).
Another example, this time with double bonds:
Cl\C=C(/C=C/F)/C=C\F
The second and third double bonds are E and Z, respectively. The first bond is Z, but only because of the stereochemistry of the other two bonds.
You can further elaborate this to double bonds that are stereogenic because of the chirality of attached atoms:
C\C=C([C@@](C)(F)Br)/[C@@](Br)(F)C
or atoms that are chiral because of the stereochemistry of attached bonds:
C[C@](/C=C/C)(F)/C=C\C
I'm pretty sure this can be pretty much arbitrarily elaborated. It's enough to make a cheminformatician's heart go pitter-patt.
Handling these cases in the RDKit required a restructuring of the stereochemistry perception code (which made that "pitter-patt" feel more like palpitations) and some changes to the client-visible interface. Specifically, the former division between perceiving atom chirality and double bond stereochemistry no longer makes sense.
Since this is a pretty deep and complex change, I created a separate branch for the work:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/branches/IterativeChirality_20Aug2008/
I believe the implementation that's currently checked in there is correct. I added test cases for each of the scenarios I could think of and those all pass. I still need to do a bit of optimization work, but that should not affect the results.
Before merging this into the core, I'd like to ask anyone who has time and interest to try it out and let me know if you find problems. When testing, please keep in mind that not all cheminformatics systems handle these cases correctly. I have checked Marvin, openbabel (v2.2.0), and ChemDraw, and only ChemDraw gets all of the test cases right.
Best regards,
-greg
|
|
From: Greg L. <gre...@gm...> - 2008-07-05 14:45:46
|
Dear all,
This morning I checked in a bunch of changes that switch the RDKit over from using the old Numeric python library to the newer numpy library (http://numpy.scipy.org/). The changes were merged into the trunk in revision 742:
http://rdkit.svn.sourceforge.net/viewvc/rdkit?view=rev&revision=742
This was an important step because Numeric python is no longer being supported and it's becoming increasingly difficult to find binary distributions. Numpy has an active user/developer community and is likely the future of good numeric support in Python.
The details of what was changed are captured in the svn changelog above and in some notes I took as I made the change:
http://code.google.com/p/rdkit/wiki/NumPyPort
If anyone has any questions or problems, please let me know.
-greg
|
|
From: Greg L. <gre...@gm...> - 2008-06-28 15:30:06
|
Dear all,
I have finally managed to get a version of the RDKit working that does not require an installation of the obsolete and no-longer supported Numeric Python library. This new version uses numpy instead (http://www.scipy.org/SciPy).
The new version is on the branch:
http://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/NumPyPort_27June2008
and the changes I made are described here:
http://code.google.com/p/rdkit/wiki/NumPyPort
I've tested this on 32bit linux and windows. I'll run the tests on 64bit linux sometime next week. If anyone else wants to give this a try, I'd be grateful for any feedback.
Note that to get things to build you need to be sure that the numpy includes are in the python include directory. For reasons I'm not clever enough to understand, the numpy installation process puts its headers into the python site-packages. Obviously this will differ based on the details of your python installation, but on my linux box I needed to:
cp -r /usr/lib/python2.5/site-packages/numpy/core/include/numpy /usr/include/python2.5
on windows it was:
c:\python2.5\lib\site-packages\numpy\core\include\numpy -> c:\python2.5\include
-greg
|
|
From: Greg L. <gre...@gm...> - 2008-06-11 05:56:06
|
Dear all, Last week I created a branch to get the RDKit working on 64bit systems. After testing under windows (32 bit) and linux (32 and 64 bit) systems, I've merged those changes back onto the trunk: http://rdkit.svn.sourceforge.net/viewvc/rdkit?view=rev&revision=718 I believe things should work without problems on 64bit systems now. If anyone encounters problems, please let me know. -greg |
|
From: Adrian S. <ma...@ad...> - 2008-05-29 09:39:55
|
Hi Greg,
I think a feature to handle PDB structures would be a great addition to RDKit, but I guess this depends on whether it's possible to have thousands of atoms in an rdmol object.
Maybe I should start by explaining what kind of implementation I had in mind. Since the hierarchy in PDB structures can be fragile sometimes, the most convenient way would be to use a set theory-like representation. Basically, there could be a class rdResidueAtom, a subclass of rdAtom, which in addition to the inherited attributes and functions also has PDB details as attributes, such as:
rdResidueAtom.IsHetatm
rdResidueAtom.SerialNumber
rdResidueAtom.AtomName
rdResidueAtom.AlternateLocation
rdResidueAtom.ResidueName
rdResidueAtom.ChainID
rdResidueAtom.ResidueNumber
rdResidueAtom.InsertCode
rdResidueAtom.Occupancy
rdResidueAtom.BFactor
The sets would then be structure->NMR model->chain->residue->atom. In an ideal case, it would be possible to access individual sets through a hierarchy/index object, e.g. ->getResidue(chainID, resName, resNum, insCode) would return all atoms with ->atomIdx() that belong to that set, and accordingly for chains, atoms etc.
It would be tremendously useful to have cheminformatics functions available for protein-ligand complexes, particularly to determine connectivity, aromaticity, assign implicit hydrogens, geometric functions etc. This, as a result, would simplify many tasks, for instance determining hydrogen bonding, analysing the geometry of pi-pi interactions and so on.
That would make my life a lot easier at least! ;)
Adrian
On Sun, May 25, 2008 at 7:55 AM, Greg Landrum <gre...@gm...> wrote:
> Dear all,
>
> I'm starting to collect suggestions for feature additions for the next
> RDKit release, which should happen sometime towards the end of Q3.
> These are in the feature tracker:
> http://sourceforge.net/tracker/?group_id=160139&atid=814653
> in the 2008_Q3 group.
>
> If you have things you'd like to see added, or cleanups that you
> believe need to be done, please either add them to the tracker
> directly, post them to the list, or email them to me.
>
> Thanks,
> -greg
|
|
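The set-like hierarchy Adrian proposes in the message above can be sketched in plain Python. This is only an illustration of the idea, not an RDKit API: the names `ResidueAtom`, `StructureIndex`, `get_residue` and `get_chain` are hypothetical, and only a few of the proposed PDB attributes are modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueAtom:
    """Hypothetical per-atom PDB record; not an RDKit class."""
    atom_idx: int
    atom_name: str
    residue_name: str
    chain_id: str
    residue_number: int
    insert_code: str = ""
    is_hetatm: bool = False


class StructureIndex:
    """Maps chains and residues to the indices of the atoms they contain."""

    def __init__(self, atoms):
        self._residues = defaultdict(list)
        self._chains = defaultdict(list)
        for atom in atoms:
            key = (atom.chain_id, atom.residue_name,
                   atom.residue_number, atom.insert_code)
            self._residues[key].append(atom.atom_idx)
            self._chains[atom.chain_id].append(atom.atom_idx)

    def get_residue(self, chain_id, res_name, res_num, ins_code=""):
        # Return a copy so callers cannot mutate the index.
        key = (chain_id, res_name, res_num, ins_code)
        return list(self._residues.get(key, []))

    def get_chain(self, chain_id):
        return list(self._chains.get(chain_id, []))


# Made-up records purely for demonstration.
atoms = [
    ResidueAtom(0, "N", "ALA", "A", 1),
    ResidueAtom(1, "CA", "ALA", "A", 1),
    ResidueAtom(2, "N", "GLY", "A", 2),
    ResidueAtom(3, "O", "HOH", "B", 101, is_hetatm=True),
]
index = StructureIndex(atoms)
ala_atoms = index.get_residue("A", "ALA", 1)
chain_a_atoms = index.get_chain("A")
```

Keeping the index separate from the atoms means a fragile or incomplete PDB hierarchy only affects the lookup tables, not the atom records themselves.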
From: Greg L. <gre...@gm...> - 2008-05-25 06:55:04
|
Dear all, I'm starting to collect suggestions for feature additions for the next RDKit release, which should happen sometime towards the end of Q3. These are in the feature tracker: http://sourceforge.net/tracker/?group_id=160139&atid=814653 in the 2008_Q3 group. If you have things you'd like to see added, or cleanups that you believe need to be done, please either add them to the tracker directly, post them to the list, or email them to me. Thanks, -greg |
|
From: Greg L. <gre...@gm...> - 2008-05-25 06:00:17
|
I'm very happy to announce that the next version of the RDKit -- May2008_1 -- is released. The release notes are below. There are some changes that affect backwards compatibility, so please at least skim them.
The source release and a windows binary are on the sourceforge downloads page:
http://sourceforge.net/project/showfiles.php?group_id=160139&package_id=180003&release_id=601793
The files can also be downloaded from the google project page:
http://code.google.com/p/rdkit/downloads/list
If you plan to build from source, please read the new build instructions:
1) For Linux: http://code.google.com/p/rdkit/wiki/BuildingOnLinux
2) For Windows: http://code.google.com/p/rdkit/wiki/BuildingOnWindows
An updated documentation distribution has been added to sourceforge:
http://sourceforge.net/project/showfiles.php?group_id=160139&package_id=230191&release_id=601803
Note that this PDF is also included in the source and binary downloads.
I also updated the browseable documentation at rdkit.org:
http://www.rdkit.org/C++_Docs
http://www.rdkit.org/Python_Docs
Thanks to everyone who submitted bug reports and suggestions for this release!
Please let me know if you find any problems with the release or have any suggestions.
-greg
****** Release_May2008_1 *******
(Changes relative to Release_Jan2008_1)
!!!!!! IMPORTANT !!!!!!
- A fix to the values of the parameters for the Crippen LogP calculator means that the values calculated with this version are not backwards compatible. Old values should be recalculated.
- topological fingerprints generated with this version *may* not be compatible with those from earlier versions. Please read the note below in the "Other" section.
- Please read the point about dummy atoms in the "New Features" section. It explains a change that affects backwards compatibility when dealing with dummy atoms.
Acknowledgements:
- Some of the bugs fixed in this release were found and reported by Adrian Schreyer, Noel O'Boyle, and Markus Kossner.
Bug Fixes
- A core leak in MolAlign::getAlignmentTransform was fixed (issue 1899787)
- Mol suppliers now reset the EOF flag on their stream after they run off the end (issue 1904170)
- A problem causing the string "Sc" to not parse correctly in recursive SMARTS was fixed (issue 1912895)
- Combined recursive smarts queries are now output correctly. (issue 1914154)
- A bug in the handling of chirality in reactions was fixed (issue 1920627)
- Looping directly over a supplier no longer causes a crash (issue 1928819)
- a core leak in the smiles parser was fixed (issue 1929199)
- Se and Te are now potential aromatic atoms (per the proposed OpenSmiles standard). (issue 1932365)
- isotope information (and other atomic modifiers) are now correctly propagated by chemical reactions (issue 1934052)
- triple bonds no longer contribute 2 electrons to the count for aromaticity (issue 1940646)
- Two bugs connected with square brackets in SMILES were fixed (issues 1942220 and 1942657)
- atoms with coordination numbers higher than 4 now have tetrahedral stereochemistry removed (issue 1942656)
- Bond.SetStereo() is no longer exposed to Python (issue 1944575)
- A few typos in the parameter data for the Crippen logp calculator were fixed. Values calculated with this version should be assumed to not be backwards compatible with older versions (issue 1950302)
- Isotope queries are now added correctly (if perhaps not optimally) to SMARTS.
- some drawing-related bugs have been cleared up.
- A bug in Chem.WedgeMolBonds (used in the drawing code) that was causing incorrect stereochemistry in drawn structures was fixed. (issue 1965035)
- A bug causing errors or crashes on Windows with [r<n>] queries was fixed. (issue 1968930)
- A bug in the calculation of TPSA values in molecules that have Hs in the graph was fixed. (issue 1969745)
New Features
- Support for supplying dummy atoms as "[Du]", "[X]", "[Xa]", etc. is now considered deprecated. In this release a warning will be generated for these forms and in the next release the old form will generate errors. Note that the output of dummy atoms has also changed: the default output format is now "*", this means that the canonical SMILES for molecules containing dummies are no longer compatible with the canonical SMILES from previous releases. (feature request 186217)
- Atom and bond query information is now serializable; i.e. query molecules can now be pickled and not lose the query information. (feature request 1756596)
- Query features from mol files are now fully supported. (feature request 1756962)
- Conformations now support a dimensionality flag. Dimensionality information is now read from mol blocks and TDT files. (feature request 1906758)
- Bulk Dice similarity functions have been added for IntSparseIntVect and LongSparseIntVect (feature request 1936450)
- Exceptions are no longer thrown during molecule parsing. Failure in molecule parsing is indicated by returning None. Failure to *open* a file when reading a molecule throws BadFileExceptions (feature requests 1932875 and 1938303)
- The various similarity functions for BitVects and SparseIntVects now take an optional returnDistance argument. If this is provided, the functions return the corresponding distance instead of similarity.
- Some additional query information from Mol files is now translated when generating SMARTS. Additional queries now translated:
  - number of ring bonds
  - unsaturation queries
  - atom lists are handled better as well
  (feature request 1902466)
- A new algorithm for generating the bits for topological fingerprints has been added. The new approach is a bit quicker and more robust than the old, but is not backwards compatible. Similarity trends are more or less conserved.
- The molecule drawing code in Chem.Draw.MolDrawing has been modified so that it creates better drawings. A new option for drawing that uses the aggdraw graphics library has been added.
- The RingInfo class supports two new methods: AtomRings() and BondRings() that return tuples of tuples with indices of the atoms or bonds that make up the molecule's rings.
Other
- Changes in the underlying boost random-number generator in version 1.35 of the boost library may have broken backwards compatibility of 2D fingerprints generated using the old fingerprinter. It is strongly suggested that you regenerate any stored fingerprints (and switch to the new fingerprinter if possible). There is an explicit test for this in $RDBASE/Code/GraphMol/Fingerprints/test1.cpp
- The unofficial and very obsolete version of John Torjo's v1 boost::logging library that was included with the RDKit distribution is no longer used. The logging library has been replaced with the much less powerful and flexible approach of just sending things to stdout or stderr. If and when the logging library is accepted into Boost, it will be integrated.
- The DbCLI tools (in $RDBASE/Projects/DbCLI) generate topological fingerprints using both the old and new algorithms (unless the --noOldFingerprints option is provided). The default search uses the newer fingerprint.
- The directory $RDBASE/Data/SmartsLib contains a library of sample SMARTS contributed by Richard Lewis.
|
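The Dice similarity and optional distance return mentioned in the release notes can be illustrated with a pure-Python sketch. This is not the RDKit implementation: sparse count vectors are modelled as plain dicts, the names `dice_similarity`, `bulk_dice_similarity` and `return_distance` are hypothetical stand-ins, and returning 0.0 for two empty vectors is an assumption.

```python
def dice_similarity(v1, v2, return_distance=False):
    """Dice similarity between two sparse count vectors stored as dicts.

    similarity = 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))
    """
    total = sum(v1.values()) + sum(v2.values())
    if total == 0:
        sim = 0.0  # assumption: two empty vectors have zero similarity
    else:
        shared = sum(min(count, v2.get(key, 0)) for key, count in v1.items())
        sim = 2.0 * shared / total
    return 1.0 - sim if return_distance else sim


def bulk_dice_similarity(probe, others, return_distance=False):
    """Compare one probe vector against a list of vectors."""
    return [dice_similarity(probe, other, return_distance) for other in others]


# Made-up count vectors: feature id -> count.
a = {1: 2, 5: 1, 9: 3}
b = {1: 1, 9: 3, 12: 2}

sim_ab = dice_similarity(a, b)
dist_ab = dice_similarity(a, b, return_distance=True)
bulk = bulk_dice_similarity(a, [a, b])
```

Here the shared counts sum to 4 and the two vectors sum to 12 in total, so the similarity is 8/12 and the distance is its complement.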
|
From: Greg L. <gre...@gm...> - 2008-05-17 07:56:17
|
Dear all, After fixing a couple of small bugs and making some other changes, I just tagged a second release candidate for the May 2008 release: http://rdkit.svn.sourceforge.net/svnroot/rdkit/tags/Release_May2008_1RC2 Release notes are here: http://rdkit.svn.sourceforge.net/viewvc/rdkit/tags/Release_May2008_1RC2/ReleaseNotes.txt?revision=675 A tgz with the source is up on the google code site: http://code.google.com/p/rdkit/downloads/list the windows binary is uploading as I write this message. If there are no substantial bugs uncovered, I will plan on doing the May2008 release next weekend (May 24th, 25th, or 26th). If bugs come in, I will fix them as quickly as I can and restart the 1 week clock. Best Regards, -greg |
|
From: Greg L. <gre...@gm...> - 2008-05-15 17:37:04
|
On Thu, May 15, 2008 at 12:08 PM, Adrian Schreyer <am...@ca...> wrote:
> Is there any reason not to use the latest Boost release and stick to
> 1.34.1 for building RDKit? I have used version 1.35.0 to compile it on
> Kubuntu 8.04, works without problems so far.
>
I'm glad you brought this up; I was planning on mentioning the topic sometime soon.
The short answer is that, with one caveat, everything works fine with boost 1.35.0. The short form of the caveat is that the topological fingerprints generated with Chem.DaylightFingerprint using version 1.34.x and 1.35.0 of Boost are not compatible with each other; so as long as you aren't using older fingerprints (or regenerate the fingerprints you have), everything is fine. If you switch to the new (in this release) fingerprinter -- Chem.RDKFingerprint -- then it doesn't matter which version of Boost you use.
Here's a more detailed explanation:
The algorithm that generates the topological fingerprints uses a pseudo-random number generator (pRNG) provided by the Boost.Random library. One of the properties of a pRNG is that as long as you seed it with the same value, you should always get the same sequence of random numbers. The fingerprinting algorithm uses this property to set bits based on molecular paths: a number is generated for each path, that number is used to seed the pRNG, and then several random bits are set in the fingerprint.
The problem is that the pRNG I use is generating different sequences of numbers in 1.35.0 than it did in earlier versions. I've been trying (by posting to mailing lists and submitting a bug report) to find out if this is the result of a bug, a bug fix, or an error on my part, but up to this point I have had no response from the boost community. In case anyone is interested in following along at home, the bug is here:
http://svn.boost.org/trac/boost/ticket/1856
The new fingerprinter, RDKFingerprint, uses a different one of the Boost.Random pRNGs. The alternative pRNG generates consistent values between releases 1.34.1 and 1.35.0. I believe that, in general, the new fingerprinter is more robust (it's also a tiny bit faster, but this isn't a reason to switch) and I'd suggest using that instead of the old one. If one takes this suggestion, then the differences between Boost versions no longer matter and backwards compatibility is not a problem since there is no backwards to be compatible with. :-)
------------------
For those who really care about details:
Take a look at:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/GraphMol/Fingerprints/Fingerprints.cpp?revision=655&view=markup
Both DaylightFingerprint and RDKFingerprint work from the same set of molecular paths. The differences between the two algorithms are in how they hash the paths (convert them into seeds for the pRNG) and which pRNG they use.
The old algorithm hashes paths by calculating the Balaban J value (a classic topological descriptor) for each path and then converts the bits of the J value (which is a real number) into an int using "brute force" [lines 56-61]. That int is the seed.
The new algorithm (RDKFingerprint) assigns a value to each bond in the path and then combines those into a single integer using some Boost functionality for generating hashes [lines 125-185].
------------------
I hope this helps,
-greg
|
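The seed-per-path scheme described in the message above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the RDKit code: Python's `random.Random` stands in for the Boost.Random generators, CRC32 stands in for the Boost hashing of bond values, paths are represented as tuples of made-up bond labels, and all function names are hypothetical.

```python
import random
import struct
import zlib


def seed_from_float(value):
    """Reinterpret the 8 bytes of a double as an unsigned integer.

    This mirrors the "brute force" idea of turning a real-valued
    descriptor into an integer seed.
    """
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def seed_from_path(path):
    """Hash a path (a tuple of bond labels) into an integer seed."""
    return zlib.crc32("|".join(path).encode("utf-8"))


def fingerprint(paths, n_bits=2048, bits_per_path=4):
    """Set a few pseudo-random bits per path, seeding the pRNG from the path."""
    bits = set()
    for path in paths:
        rng = random.Random(seed_from_path(path))
        for _ in range(bits_per_path):
            bits.add(rng.randrange(n_bits))
    return bits


# Made-up path labels purely for demonstration.
paths = [("C-C",), ("C-C", "C=O"), ("C-C", "C-N", "N-H")]
fp_first = fingerprint(paths)
fp_second = fingerprint(paths)
```

The same paths always give the same bits only while the generator keeps producing the same sequence for a given seed, which is exactly the property that changed between the Boost versions discussed above.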
|
From: Adrian S. <am...@ca...> - 2008-05-15 10:08:07
|
Is there any reason not to use the latest Boost release and stick to 1.34.1 for building RDKit? I have used version 1.35.0 to compile it on Kubuntu 8.04, works without problems so far. Adrian |
|
From: Greg L. <gre...@gm...> - 2008-05-13 20:07:41
|
Dear all, I just tagged a release candidate for the May 2008 release: http://rdkit.svn.sourceforge.net/svnroot/rdkit/tags/Release_May2008_1RC1 Release notes are here: http://rdkit.svn.sourceforge.net/viewvc/*checkout*/rdkit/tags/Release_May2008_1RC1/ReleaseNotes.txt?revision=659 (and there, I just found the first problem: the release notes call this the "April 2008" release) A tgz with the source is up on the google code site: http://code.google.com/p/rdkit/downloads/list I'll put a windows binary up on google code tomorrow morning (CET). Please note that the build process has been somewhat simplified: http://code.google.com/p/rdkit/wiki/BuildingOnLinux_May2008 If there are no substantial bugs uncovered, I will plan on doing the May2008 release next Tuesday (May 20). If bugs come in, I will fix them as quickly as I can and restart the 1 week clock. Best Regards, -greg |
|
From: Greg L. <gre...@gm...> - 2008-05-08 19:07:38
|
Hi, I'd like to do a release candidate for the Q2 2008 release early next week. Things currently seem to be in pretty good shape and I believe that all of the April2008 bugs are fixed with the exception of 1896935 http://sourceforge.net/tracker/index.php?func=detail&aid=1896935&group_id=160139&atid=814650 which I'm postponing because it requires some more substantial code changes. Release notes (also, I think, mostly up to date) are here: http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/ReleaseNotes.txt?revision=HEAD&view=markup If anyone has any last-minute comments, suggestions, or bug reports, please let me know. Unless I hear otherwise, I'll try to put together a release candidate on either Tuesday or Wednesday of next week. -greg |
|
From: Greg L. <gre...@gm...> - 2008-04-24 17:34:40
|
Dear RDKit community,
I will be on vacation from Saturday until the following Sunday (May 4). I will not be in email contact, so I won't be fixing bugs or answering questions. You'll have to fend for yourselves. :-)
I would like to do the Q2 release of the RDKit sometime not too long after I get back, so if anyone has bugs they've noticed and not gotten around to reporting, please do so in the next week or so. Unless something big comes up, I will do the first release candidate sometime around May 10 or 11.
-greg
|
|
From: Greg L. <gre...@gm...> - 2008-04-12 13:54:27
|
Hi,
As part of the fix to issues 1934360:
http://sourceforge.net/tracker/index.php?func=detail&aid=1934360&group_id=160139&atid=814650
and 1940646:
http://sourceforge.net/tracker/index.php?func=detail&aid=1940646&group_id=160139&atid=814650
I made a couple of slight modifications to the aromaticity rules:
- An atom is now no longer considered to be a candidate for aromaticity if it has more than one double or triple bond
- atoms with triple bonds contribute one electron to the pi electron count for a ring.
Some rings that were previously considered aromatic that aren't anymore:
C1=C=NCN1
C1#CC=C1
A ring that was not previously considered aromatic that now is:
C1#CC=CC=C1
These are now checked in. If anyone has objections, comments, or suggestions, please let me know.
-greg
|
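A toy electron count shows how the two rule changes above sort the three example rings. This is a deliberately simplified sketch, not the RDKit aromaticity code: each ring atom is described by hand as a (double bonds, triple bonds) pair, lone-pair donors are ignored, and the assumption that the old rule counted two electrons per triple-bonded atom is inferred from the examples rather than stated in the message.

```python
def ring_is_aromatic(ring_atoms, triple_bond_electrons=1):
    """Toy 4n+2 check.

    ring_atoms: list of (n_double_bonds, n_triple_bonds), one per ring atom.
    """
    electrons = 0
    for n_double, n_triple in ring_atoms:
        if n_double + n_triple > 1:
            return False  # more than one multiple bond: not a candidate
        if n_triple:
            electrons += triple_bond_electrons
        elif n_double:
            electrons += 1
        else:
            return False  # toy model: saturated atoms break the ring
    return electrons % 4 == 2


# Hand-written descriptions of the rings from the message.
ring_allene = [(1, 0), (2, 0), (1, 0), (0, 0), (0, 0)]  # C1=C=NCN1
ring_four = [(0, 1), (0, 1), (1, 0), (1, 0)]  # C1#CC=C1
ring_six = [(0, 1), (0, 1), (1, 0), (1, 0), (1, 0), (1, 0)]  # C1#CC=CC=C1

new_results = [ring_is_aromatic(r) for r in (ring_allene, ring_four, ring_six)]
old_four = ring_is_aromatic(ring_four, triple_bond_electrons=2)
old_six = ring_is_aromatic(ring_six, triple_bond_electrons=2)
```

With one electron per triple-bonded atom the four-membered ring has 4 pi electrons and the six-membered ring has 6, so only the latter satisfies 4n+2; with two electrons per atom the counts become 6 and 8 and the verdicts swap.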
|
From: Adrian S. <adr...@gm...> - 2008-04-09 21:09:05
|
Hi Greg,
I get the emails - I will test it tomorrow with the MSDchem set and
send you the results.
Adrian
On Wed, Apr 9, 2008 at 8:24 PM, Greg Landrum <gre...@gm...> wrote:
> Adrian,
>
> I'm not sure if you get email when the bugs are updated or not, so
> I'll go ahead and post. The proposed solution here has been
> implemented and checked in:
>
>
> > On Wed, Apr 2, 2008 at 7:14 PM, Greg Landrum <gre...@gm...> wrote:
> > >
>
> > > What about solution A (returning None on parse failure), with messages
> > > displayed to stderr using the logging mechanism? I.e. when you'd see
> > > something like this:
> > > >>> m = Chem.MolFromSmiles('c1cccc1')
> > > [TIMESTAMP] Sanitization error: Can't kekulize mol
> > > >>> m is None
> > > True
>
> The various file parsers (including the suppliers) should no longer
> throw exceptions when they fail. They should display error messages
> and return None. The exception to this is when the file parsers (or
> suppliers) fail to open the input file; in this case you'll get an
> IOError, which is the Pythonic (IMO) way of doing things.
>
> Let me know if you have a chance to try it and find any problems,
> -greg
>
|
|
From: Greg L. <gre...@gm...> - 2008-04-09 19:24:51
|
Adrian,
I'm not sure if you get email when the bugs are updated or not, so
I'll go ahead and post. The proposed solution here has been
implemented and checked in:
> On Wed, Apr 2, 2008 at 7:14 PM, Greg Landrum <gre...@gm...> wrote:
> >
> > What about solution A (returning None on parse failure), with messages
> > displayed to stderr using the logging mechanism? I.e. when you'd see
> > something like this:
> > >>> m = Chem.MolFromSmiles('c1cccc1')
> > [TIMESTAMP] Sanitization error: Can't kekulize mol
> > >>> m is None
> > True
The various file parsers (including the suppliers) should no longer
throw exceptions when they fail. They should display error messages
and return None. The exception to this is when the file parsers (or
suppliers) fail to open the input file; in this case you'll get an
IOError, which is the Pythonic (IMO) way of doing things.
Let me know if you have a chance to try it and find any problems,
-greg
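The error-handling convention described above (return None for a record that fails to parse, raise only when the input file cannot be opened) can be sketched without RDKit. In this illustration `parse_number` and `read_numbers` are hypothetical stand-ins for a molecule parser and a supplier, and integers stand in for molecules.

```python
import os
import sys
import tempfile


def parse_number(text):
    """Stand-in for a molecule parser: report the problem, return None."""
    try:
        return int(text)
    except ValueError:
        print(f"Parse error: cannot read {text!r}", file=sys.stderr)
        return None


def read_numbers(path):
    """Stand-in for a supplier: open failures raise, bad records give None."""
    with open(path) as handle:  # raises IOError (OSError) if unreadable
        return [parse_number(line.strip()) for line in handle]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.txt")
    with open(path, "w") as handle:
        handle.write("12\nnot-a-number\n7\n")

    parsed = read_numbers(path)
    good = [value for value in parsed if value is not None]

    try:
        read_numbers(os.path.join(tmp, "missing.txt"))
        open_failed = False
    except IOError:
        open_failed = True
```

The filter uses `is not None` rather than plain truthiness so that a valid but falsy result (here the number 0) would not be discarded along with the failures.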
|
|
From: Greg L. <gre...@gm...> - 2008-04-06 10:45:29
|
Dear All, Another quick note about a small change that may have large repercussions: I just checked in (rev594) the next step in the change in dummy atom handling that was started in the last release. Here's an explanation from the release notes: - Support for supplying dummy atoms as "[Du]", "[X]", "[Xa]", etc. is now considered deprecated. In this release a warning will be generated for these forms and in the next release the old form will generate errors. Note that the output of dummy atoms has also changed: the default output format is now "*". (feature request 186217) Best Regards, -greg |
|
From: Greg L. <gre...@gm...> - 2008-04-06 09:17:47
|
Dear all, I just checked in (rev593) a big simplification to the logging system used in the RDKit. The changes shouldn't break existing code, but if you made more than simple use of the logger, it's probably a good idea to check that things still work. The reason for the simplification is that the former system, based on an old version of a proposed boost.logging library, was just too cumbersome to build and work with. There were also problems with operations across shared libraries (error log messages from some shared libraries would not show up when using the python wrappers) that were more or less unresolvable without large effort. I will continue to keep an eye on the developments with the new boost.logging. If it's ever approved as a standard boost library I will redo the logging code to use that. In the meantime, we have something basic that works. If anyone has suggestions for another open source c++ logging library with a compatible license (e.g. nothing GPL'ed) that is reasonably lightweight, I'd be happy to hear about it. Regards -greg |
|
From: Adrian S. <adr...@gm...> - 2008-04-02 18:19:54
|
Would be fine with me!
Adrian
On Wed, Apr 2, 2008 at 7:14 PM, Greg Landrum <gre...@gm...> wrote:
> On Wed, Apr 2, 2008 at 7:29 PM, Adrian Schreyer
> <adr...@gm...> wrote:
> > Personally, I would appreciate a solution which indicates why
> > structure parsing has failed (or at least allows some kind of debug).
> > I do not know if this is feasible in wrapped C++ code but it is really
> > useful to know where the problem arises from, particularly in the
> > context of bug reports.
> >
> > I have to agree solution B is neither very elegant nor consistent but
> > maybe there is a way to implement some kind of debug mode?
>
> What about solution A (returning None on parse failure), with messages
> displayed to stderr using the logging mechanism? I.e. when you'd see
> something like this:
> >>> m = Chem.MolFromSmiles('c1cccc1')
> [TIMESTAMP] Sanitization error: Can't kekulize mol
> >>> m is None
> True
>
>
> -greg
>
|
|
From: Greg L. <gre...@gm...> - 2008-04-02 18:15:14
|
On Wed, Apr 2, 2008 at 7:29 PM, Adrian Schreyer
<adr...@gm...> wrote:
> Personally, I would appreciate a solution which indicates why
> structure parsing has failed (or at least allows some kind of debug).
> I do not know if this is feasible in wrapped C++ code but it is really
> useful to know where the problem arises from, particularly in the
> context of bug reports.
>
> I have to agree solution B is neither very elegant nor consistent but
> maybe there is a way to implement some kind of debug mode?
What about solution A (returning None on parse failure), with messages
displayed to stderr using the logging mechanism? I.e. when you'd see
something like this:
>>> m = Chem.MolFromSmiles('c1cccc1')
[TIMESTAMP] Sanitization error: Can't kekulize mol
>>> m is None
True
-greg
|
|
From: Adrian S. <adr...@gm...> - 2008-04-02 17:29:07
|
Personally, I would appreciate a solution which indicates why
structure parsing has failed (or at least allows some kind of debug).
I do not know if this is feasible in wrapped C++ code but it is really
useful to know where the problem arises from, particularly in the
context of bug reports.
I have to agree solution B is neither very elegant nor consistent but
maybe there is a way to implement some kind of debug mode?
On Wed, Apr 2, 2008 at 5:40 PM, Greg Landrum <gre...@gm...> wrote:
> Hi Adrian,
>
> Sorry for a CC to a list you're subscribed to, but I'm not sure if you
> read the mailing list and this is a message/discussion that needs to
> be on the mailing list.
>
> Here's the text from your bug report:
> ------------------------
> RDKit neither returns a molecule nor raises an error if an element in a
> SMILES string is lowercase:
>
>
> In [15]: Chem.MolFromSmiles('c1c[se]c2=NCC(=c21)CC(C(=O)O)N')
>
> In [16]: Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')
> Out[16]: <Chem.rdchem.Mol object at 0x8bc93e4>
> As far as I know, lowercase elements are not really standard SMILES, so
> I would propose throwing some kind of error that can be handled nicely
> in Python.
> ------------------------
>
> There are two things going on here:
>
> 1) Se is not recognized as an element that can be aromatic. The
> consequence of this is that [se] is not recognized. In your line [16]
> above, the molecule is processed correctly, but if you look at the
> SMILES:
> [2]>>> m = Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')
>
> [3]>>> Chem.MolToSmiles(m)
> Out[3] 'NC(CC1=C2C=C[Se]C2=NC1)C(=O)O'
>
> you see that the ring is not recognized as aromatic. This is a very
> straightforward fix. Within the proposed OpenSmiles spec
> (http://opensmiles.org/spec/open-smiles-2-grammar.html), both Se and
> As can be marked aromatic. Unless I hear otherwise from the list, I
> will support this and modify the smiles parser and aromaticity
> recognition code to accept Se and As as aromatic atoms.
>
> 2) When the smiles parser (or the other molecule parsers) fails, the
> result is a "None" as a return value. The theory behind this was that
> the code to catch bad molecules is then much simpler:
>   for s in smiles:
>       m = Chem.MolFromSmiles(s)
>       if not m: continue
> instead of:
>   for s in smiles:
>       try:
>           m = Chem.MolFromSmiles(s)
>       except:
>           continue
> That's the theory. The practice is that the try/except is still needed
> because the sanitization code can throw exceptions:
> [4]>>> Chem.MolFromSmiles('c1cccc1')
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
>
> /home/glandrum/RDKit/Code/GraphMol/SmilesParse/<ipython console> in <module>()
>
> ValueError: Sanitization error: Can't kekulize mol
>
> This is inconsistent, confusing, and wrong. So I'll fix it. The
> question is how. I see two options:
> A) Continue returning None for failed parsing, but catch the
> kekulization exceptions on the C++ side so that they don't make it
> through to Python (i.e. return None from line [4] above)
> B) Have all parse failures raise exceptions.
>
> I'm leaning towards A) because I think the resulting client code is
> much more readable. This really becomes important with suppliers,
> which I think should also be consistent. Solution A) allows code like
> this:
>
>   suppl = Chem.SDMolSupplier('file.sdf')
>   for mol in suppl:
>       if not mol: continue
>       <do something>
>
> instead of this, which solution B would require (not tested):
>
>   suppl = Chem.SDMolSupplier('file.sdf')
>   while 1:
>       try:
>           m = suppl.next()
>       except StopIteration:
>           break
>       except Exception:  # parse failure: skip this molecule
>           continue
>       <do something>
>
> Any arguments against me implementing solution A?
>
> Thanks again for the detailed and careful bug report,
> -greg
>
|
|
From: Greg L. <gre...@gm...> - 2008-04-02 16:40:55
|
Hi Adrian,
Sorry for a CC to a list you're subscribed to, but I'm not sure if you
read the mailing list and this is a message/discussion that needs to
be on the mailing list.
Here's the text from your bug report:
------------------------
RDKit neither returns a molecule nor raises an error if an element in a
SMILES string is lowercase:
In [15]: Chem.MolFromSmiles('c1c[se]c2=NCC(=c21)CC(C(=O)O)N')
In [16]: Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')
Out[16]: <Chem.rdchem.Mol object at 0x8bc93e4>
As far as I know, lowercase elements are not really standard SMILES, so
I would propose throwing some kind of error that can be handled nicely
in Python.
------------------------
There are two things going on here:
1) Se is not recognized as an element that can be aromatic. The
consequence of this is that [se] is not recognized. In your line [16]
above, the molecule is processed correctly, but if you look at the
SMILES:
[2]>>> m = Chem.MolFromSmiles('c1c[Se]c2=NCC(=c21)CC(C(=O)O)N')
[3]>>> Chem.MolToSmiles(m)
Out[3] 'NC(CC1=C2C=C[Se]C2=NC1)C(=O)O'
you see that the ring is not recognized as aromatic. This is a very
straightforward fix. Within the proposed OpenSmiles spec
(http://opensmiles.org/spec/open-smiles-2-grammar.html), both Se and
As can be marked aromatic. Unless I hear otherwise from the list, I
will support this and modify the smiles parser and aromaticity
recognition code to accept Se and As as aromatic atoms.
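
As a rough illustration of the change described in 1), the aromatic symbols that the OpenSMILES grammar allows inside brackets can be kept in a small lookup set. This is only a sketch: `classify_bracket_symbol` is a hypothetical helper, not the RDKit parser, and it does not check that the element actually exists.

```python
# Aromatic symbols allowed inside brackets by the OpenSMILES grammar.
AROMATIC_BRACKET_SYMBOLS = {"b", "c", "n", "o", "p", "s", "se", "as"}


def classify_bracket_symbol(symbol):
    """Return (element, is_aromatic) for a bracket-atom symbol such as 'se'.

    Hypothetical helper for illustration only; a real parser would also
    validate the element symbol.
    """
    if symbol in AROMATIC_BRACKET_SYMBOLS:
        return symbol.capitalize(), True
    return symbol, False


se_lower = classify_bracket_symbol("se")  # aromatic selenium
se_upper = classify_bracket_symbol("Se")  # aliphatic selenium
```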
2) When the smiles parser (or the other molecule parsers) fails, the
result is a "None" as a return value. The theory behind this was that
the code to catch bad molecules is then much simpler:
  for s in smiles:
      m = Chem.MolFromSmiles(s)
      if not m: continue
instead of:
  for s in smiles:
      try:
          m = Chem.MolFromSmiles(s)
      except:
          continue
That's the theory. The practice is that the try/except is still needed
because the sanitization code can throw exceptions:
[4]>>> Chem.MolFromSmiles('c1cccc1')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/glandrum/RDKit/Code/GraphMol/SmilesParse/<ipython console> in <module>()
ValueError: Sanitization error: Can't kekulize mol
This is inconsistent, confusing, and wrong. So I'll fix it. The
question is how. I see two options:
A) Continue returning None for failed parsing, but catch the
kekulization exceptions on the C++ side so that they don't make it
through to Python (i.e. return None from line [4] above)
B) Have all parse failures raise exceptions.
I'm leaning towards A) because I think the resulting client code is
much more readable. This really becomes important with suppliers,
which I think should also be consistent. Solution A) allows code like
this:
  suppl = Chem.SDMolSupplier('file.sdf')
  for mol in suppl:
      if not mol: continue
      <do something>
instead of this, which solution B would require (not tested):
  suppl = Chem.SDMolSupplier('file.sdf')
  while 1:
      try:
          m = suppl.next()
      except StopIteration:
          break
      except Exception:  # parse failure: skip this molecule
          continue
      <do something>
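
The two loop shapes above can be compared with a self-contained toy. `ToySupplier` is a hypothetical stand-in (it is not `Chem.SDMolSupplier`): records are plain strings, and an empty string counts as a record that fails to parse. The `raise_on_failure` flag is made up to switch between the two proposed behaviours.

```python
class ToySupplier:
    """Stand-in for a molecule supplier (NOT RDKit's SDMolSupplier)."""

    def __init__(self, records, raise_on_failure):
        self._records = list(records)
        self._pos = 0
        self._raise = raise_on_failure

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._records):
            raise StopIteration
        record = self._records[self._pos]
        self._pos += 1  # advance first so a failure does not stall the iterator
        if not record:
            if self._raise:
                raise ValueError("could not parse record %d" % (self._pos - 1))
            return None
        return record


records = ["CCO", "", "c1ccccc1"]

# Solution A: failures come back as None and are skipped inline.
kept_a = []
for mol in ToySupplier(records, raise_on_failure=False):
    if not mol:
        continue
    kept_a.append(mol)

# Solution B: failures raise, so the loop has to drive the iterator by hand.
kept_b = []
suppl = ToySupplier(records, raise_on_failure=True)
while True:
    try:
        mol = next(suppl)
    except StopIteration:
        break
    except ValueError:
        continue
    kept_b.append(mol)
```

Both loops keep the same two records; the difference is only in how much bookkeeping the client code has to carry.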
Any arguments against me implementing solution A?
Thanks again for the detailed and careful bug report,
-greg
|
|
From: Greg L. <gre...@gm...> - 2008-03-19 19:21:21
|
Today while browsing license documentation and a new Qt4 installation,
I noticed a very interesting file in the distribution directory. The
contents of said file are basically mirrored here:
http://trolltech.com/products/qt/gplexception

PyQt has a similar exception in it.

I read this to mean that it's possible to BSD license the GUI
components of the RDKit (at least new ones, developed with Qt4, since
I don't know when this exception appeared).

This makes me very happy. I was never happy with being stuck with the
GPL on the GUI components, and now it seems that is no longer
necessary.

Thank you Trolltech and Riverbank Computing!

-greg
|
|
From: Greg L. <gre...@gm...> - 2008-03-16 18:19:16
|
I'm not sure what it was, but something set me off this weekend and
convinced me that the molecule drawing code absolutely *had* to be
improved.

I kicked around the idea of doing a C++ renderer based on AGG
(http://antigrain.com/), but that seemed like too much to get my brain
around on a Friday afternoon, so I looked around some more. I've been
using matplotlib (http://matplotlib.sourceforge.net/) a bit lately and
I knew from an old post by Andrew Dalke
(http://www.dalkescientific.com/writings/diary/archive/2005/04/23/matplotlib_without_gui.html)
that you could add arbitrary geometry to your matplotlib figures, so I
decided to give that a try.

[An aside: if you don't know about matplotlib, I strongly recommend
looking into it... it's a very nice tool for doing plotting from
Python]

The matplotlib thing worked pretty well, but it's asking a lot to have
people install matplotlib just to be able to do molecule rendering.
And it somehow feels like a hack.

I took another swing at it using aggdraw, from the effbot:
http://effbot.org/zone/aggdraw-index.htm
and got something that is pretty nice, as the attached image
demonstrates.

Since molecule drawing really just requires a few primitives, I
restructured the existing drawing code so that it will work with
matplotlib, aggdraw, or the old approach. The changes are in branch:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/branches/NewDrawing_16March2008/
and are mostly localized to this directory:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/branches/NewDrawing_16March2008/Python/Chem/Draw/

For creating images, I think the aggdraw renderer is the best. The
sping renderer can draw to pdf or svg (or qt!), which is nice, but one
really misses the antialiasing when it's used for images.

Back to thinking about rendering from C++: what would be really nice
would be to look into porting the AGG demo for molecule drawing to
work with RDKit molecules
(http://www.antigrain.com/demo/mol_view.cpp.html). But that's a
project for another weekend. And it would have to use a
pre-license-change version of AGG.

Enjoy and let me know what you think.

-greg
|
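
The idea in the message above, that drawing needs only a few primitives and can therefore target several backends, might be sketched as follows. All names here (`RecordingCanvas`, `add_line`, `add_text`, `draw_molecule`) are hypothetical and are not taken from the RDKit drawing code; the recording backend simply stores the primitives it receives so the example runs without matplotlib or aggdraw.

```python
class RecordingCanvas:
    """Hypothetical backend that just records the primitives it receives."""

    def __init__(self):
        self.ops = []

    def add_line(self, p1, p2):
        self.ops.append(("line", p1, p2))

    def add_text(self, pos, text):
        self.ops.append(("text", pos, text))


def draw_molecule(canvas, atoms, bonds):
    """Draw using only the two canvas primitives.

    atoms: list of (symbol, (x, y)); bonds: list of (i, j) atom-index pairs.
    """
    for i, j in bonds:
        canvas.add_line(atoms[i][1], atoms[j][1])
    for symbol, pos in atoms:
        if symbol != "C":  # carbons are conventionally left unlabeled
            canvas.add_text(pos, symbol)


canvas = RecordingCanvas()
# Ethanol heavy atoms with made-up 2D coordinates.
atoms = [("C", (0.0, 0.0)), ("C", (1.0, 0.5)), ("O", (2.0, 0.0))]
draw_molecule(canvas, atoms, bonds=[(0, 1), (1, 2)])
```

Any backend that provides the same two methods could be passed to `draw_molecule` unchanged, which is the point of keeping the primitive set small.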