rdkit-devel Mailing List for RDKit (Page 16)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
|
From: Nicholas F. <Nic...@ic...> - 2012-03-08 17:04:44
|
Hi All,

I've run into a few more problems with fingerprinting. I'm trying to create a database of these fingerprints, and I've tried a few different methods to write the fingerprints out:

    SparseIntVect<boost::uint32_t> *finger;
    mol=SmilesToMol(line);
    finger = MorganFingerprints::getFingerprint(*mol, 2);
    cout << finger.toString() << endl;

which returns:

    Fingerprint.cpp:49: error: request for member ‘toString’ in ‘finger’, which is of non-class type ‘RDKit::SparseIntVect<unsigned int>*’

As a workaround I then tried iterating over the entries in the vector. However, I can't use the getLength function because of the same error as above, and when I put in an arbitrary depth I did manage to get some output, but it seemed to consist only of a huge number of zeros.

Thanks,
Nick
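A note on the error quoted above: the compiler message says that `finger` is a pointer, so its members have to be reached with `->` (or by dereferencing first) rather than with `.`. A long run of zeros is also what one would expect when stepping through every index of a sparse vector whose length spans the whole 32-bit range, because only a handful of entries are ever set. The sketch below illustrates both points with a small stand-in class. `SparseCounts`, `describe` and `makeExample` are hypothetical names invented for this example, not RDKit code, and the two indices are made-up values. As far as I know the real SparseIntVect offers a comparable getNonzeroElements() accessor, but that should be checked against the headers of the RDKit version in use.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for a sparse count vector. This is NOT the RDKit
// class; it only mimics the shape of the interface discussed in the thread.
class SparseCounts {
 public:
  explicit SparseCounts(std::uint32_t length) : d_length(length) {}

  std::uint32_t getLength() const { return d_length; }

  // Only non-zero values are stored; everything else is implicitly zero.
  void setVal(std::uint32_t idx, int val) {
    if (val != 0) {
      d_data[idx] = val;
    } else {
      d_data.erase(idx);
    }
  }

  int getVal(std::uint32_t idx) const {
    std::map<std::uint32_t, int>::const_iterator it = d_data.find(idx);
    return it == d_data.end() ? 0 : it->second;
  }

  const std::map<std::uint32_t, int> &getNonzeroElements() const {
    return d_data;
  }

 private:
  std::uint32_t d_length;
  std::map<std::uint32_t, int> d_data;
};

// Formats only the stored entries as "index:count" pairs. Note the use of
// "->" because fp is a pointer; "fp.getNonzeroElements()" would not compile.
std::string describe(const SparseCounts *fp) {
  std::ostringstream out;
  bool first = true;
  for (const auto &entry : fp->getNonzeroElements()) {
    if (!first) out << ' ';
    out << entry.first << ':' << entry.second;
    first = false;
  }
  return out.str();
}

// Builds a small example on the heap, the way a factory function that
// returns a pointer would. The indices and counts are invented.
std::unique_ptr<SparseCounts> makeExample() {
  std::unique_ptr<SparseCounts> fp(new SparseCounts(4294967295u));
  fp->setVal(98513984u, 2);
  fp->setVal(2246728737u, 1);
  return fp;
}
```

Writing out only the non-zero entries keeps each database record small, whereas looping from 0 to getLength() would visit billions of empty slots. To my knowledge, toString() on the real class returns a binary serialization rather than readable text, so printing it to a terminal is unlikely to be informative either.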
|
From: Gianluca S. <gi...@gm...> - 2012-03-07 11:45:41
|
On Wed, Mar 7, 2012 at 1:16 AM, Michael Banck <mb...@gm...> wrote:
> The general way of building Debian packages is:
>
> 1. Configure (run cmake in this case)
> 2. Make
> 3. Run testsuite, if there is any
> 4. Run Make install, overriding the target directory to a staging
>    directory under debian/
> 5. Assemble package from the contents of the staging directory plus
>    additional metadata.

This looks mostly the same for RPM based distros.

> This makes it impossible or at least hackish and awkward to run the
> testsuite after install.
>
> I will have to look how other cmake projects do this with compiled
> python modules. Note that I set RDK_INSTALL_INTREE=OFF.

I am not sure if you will find it hackish or not, but I'm using RDK_INSTALL_INTREE=OFF and actually running tests in my rpm .spec file here:
http://giallu.fedorapeople.org/rdkit.spec

I guess you should be ok with a similar arrangement.

--
Gianluca Sforna
http://morefedora.blogspot.com
http://identi.ca/giallu - http://twitter.com/giallu
|
From: Michael B. <mb...@de...> - 2012-03-07 11:26:59
|
Hi,

On Wed, Mar 07, 2012 at 01:16:11AM +0100, Michael Banck wrote:
> Maybe it would indeed be easiest to do an in-tree build, but in my
> opinion, it should not matter whether one builds in-tree or not.

Indeed, this works, provided I set both the RDBASE and PYTHONPATH environment variables to the top-level source directory before running the test suite (before, I set RDBASE to the top-level source directory, and PYTHONPATH to the build directory, after copying over all python files to the build directory).

However, this way, some binary files in CMakeFiles (apparently leftovers from some checks, like CMakeFiles/CompilerIdC/a.out or CMakeFiles/CompilerIdCXX/a.out) do not get removed on clean, and neither does Code/Geometry/junk.bin. Further, all the cmake files stay after clean (see attached list of files, if it makes it past the sourceforge mail server). Maybe this is an artifact of building in-tree (usually one just removes the build directory on clean, which is not possible now). The cmake-generated Makefile does not seem to have a "distclean" target, though.

Anyway, it is not a big problem for building the Debian packages, just slightly inelegant in my opinion. Having some cmake rules which copy over the python files to the corresponding directory under the build directory would fix this, I guess. Putting the python byte-code into the source tree even for out-of-tree builds looks wrong to me anyway.

Michael
|
From: Greg L. <gre...@gm...> - 2012-03-07 05:45:48
|
On Wed, Mar 7, 2012 at 1:16 AM, Michael Banck <mb...@gm...> wrote:
>
>> I'm not sure what you mean here. If you're doing an out-of-source
>> build (which it sounds like you are), then you need to do a "make
>> install" to get the built .so files copied into the $RDBASE/rdkit
>> directory so that the python tests can pass.
>
> The general way of building Debian packages is:
>
> 1. Configure (run cmake in this case)
> 2. Make
> 3. Run testsuite, if there is any
> 4. Run Make install, overriding the target directory to a staging
>    directory under debian/
> 5. Assemble package from the contents of the staging directory plus
>    additional metadata.
>
> This makes it impossible or at least hackish and awkward to run the
> testsuite after install.

Got it. For this, you really need to do an in-tree build.

> I will have to look how other cmake projects do this with compiled
> python modules. Note that I set RDK_INSTALL_INTREE=OFF.
>
> Maybe it would indeed be easiest to do an in-tree build, but in my
> opinion, it should not matter whether one builds in-tree or not.

When you run a plain "make" using the cmake-generated Makefile, you end up with all the binary files deposited under the directory where the main makefile sits. For example, here's some output from a partial RDKit build of one of the python modules:

/scratch/RDKit_trunk/build > make -j4 rdBase
-- Configuring done
-- Generating done
-- Build files have been written to: /scratch/RDKit_trunk/build
[ 0%] Building CXX object Code/RDBoost/CMakeFiles/RDBoost.dir/Wrap.cpp.o
<snip>
[100%] Building CXX object Code/RDBoost/Wrap/CMakeFiles/rdBase.dir/RDBase.cpp.o
Linking CXX shared module ../../../rdkit/rdBase.so
[100%] Built target rdBase
/scratch/RDKit_trunk/build > ls rdkit/*.so
rdkit/rdBase.so

In order for the python tests to run, however, this file needs to be located in ../rdkit, where the rest of the python files are. This is what make install would take care of.

The alternative is to do an in-tree build, shown here:

/scratch/RDKit_intree > make rdBase
Scanning dependencies of target RDBoost
[ 0%] Building CXX object Code/RDBoost/CMakeFiles/RDBoost.dir/Wrap.cpp.o
<snip>
Linking CXX shared module ../../../rdkit/rdBase.so
[100%] Built target rdBase
/scratch/RDKit_intree > ls rdkit/*.so
rdkit/rdBase.so

Now I can directly run the python tests.

To get the "make install" working correctly into a staging area, you can provide the name of the staging directory to cmake:

/scratch/RDKit_intree > cmake -DCMAKE_INSTALL_PREFIX=/scratch/usr .

and then do a "make install" after you've run "make" and "ctest":

/scratch/RDKit_intree > rm -rf /scratch/usr/*
/scratch/RDKit_intree > make -j2 install
[ 1%] Built target fastentropy
<snip>
-- Removed runtime path from "/scratch/usr/lib/python2.6/site-packages/rdkit/Chem/rdChemicalFeatures.so"
-- Up-to-date: /scratch/usr/lib/python2.6/site-packages/rdkit/RDPaths.py

And now quickly verify that something sensible happened:

/scratch/RDKit_intree > ls /scratch/usr/lib/python2.6/site-packages/rdkit
Avalon               DistanceGeometry  Makefile     RDRandom.py
Chem                 epydoc.config     ML           SimDivFilters
cmake_install.cmake  Excel             Numerics     sping
CTestTestfile.cmake  ForceField        rdBase.so    TestRunner.py
DataManip            Geometry          RDConfig.py  utils
DataStructs          __init__.py       RDLogger.py  VLib
Dbase                Logger            RDPaths.py

>> Up to this point I was fine, and can fix the problem, but this one:
>> > rdkit-201112/rdkit/__init__.py
>> is in svn, so it shouldn't be showing up as something that didn't
>> originally exist.
>
> Right, that was probably some leftover from some unsuccessful
> experiments I did with trying to add some stuff to __init__.py so it
> finds the corresponding .so files. So nevermind this.

ok, cool.

The rest of the changes you had suggested, other than the two .sqlt files in $RDBASE/Data, are checked in. The .sqlt files are checked into svn, so they shouldn't be showing up in a list of uncleaned up files. Could it be because at least one of those files is modified during the tests?

Thanks again for your work on this!

-greg
|
From: Michael B. <mb...@gm...> - 2012-03-07 00:16:19
|
Hi,

On Tue, Mar 06, 2012 at 04:40:27AM +0100, Greg Landrum wrote:
> On Sun, Mar 4, 2012 at 8:59 PM, Michael Banck <mb...@de...> wrote:
> > On Sun, Jan 15, 2012 at 10:12:03AM +0100, Greg Landrum wrote:
> > 1. Running the test-suite in-place is either impossible or really tricky
> > as the cmake puts the compiled objects below obj-* and leaves the python
> > code (including the .pyc byte-code files) in the main source tree. I
> > had to give up and just copy over rdkit recursively into obj-*.
>
> I'm not sure what you mean here. If you're doing an out-of-source
> build (which it sounds like you are), then you need to do a "make
> install" to get the built .so files copied into the $RDBASE/rdkit
> directory so that the python tests can pass.

The general way of building Debian packages is:

1. Configure (run cmake in this case)
2. Make
3. Run testsuite, if there is any
4. Run Make install, overriding the target directory to a staging
   directory under debian/
5. Assemble package from the contents of the staging directory plus
   additional metadata.

This makes it impossible or at least hackish and awkward to run the testsuite after install.

I will have to look how other cmake projects do this with compiled python modules. Note that I set RDK_INSTALL_INTREE=OFF.

Maybe it would indeed be easiest to do an in-tree build, but in my opinion, it should not matter whether one builds in-tree or not.

> Up to this point I was fine, and can fix the problem, but this one:
> > rdkit-201112/rdkit/__init__.py
> is in svn, so it shouldn't be showing up as something that didn't
> originally exist.

Right, that was probably some leftover from some unsuccessful experiments I did with trying to add some stuff to __init__.py so it finds the corresponding .so files. So nevermind this.

Michael
|
From: Riccardo V. <ric...@gm...> - 2012-03-06 14:55:46
|
Hi,

>> 1. Running the test-suite in-place is either impossible or really tricky
>> as the cmake puts the compiled objects below obj-* and leaves the python
>> code (including the .pyc byte-code files) in the main source tree. I
>> had to give up and just copy over rdkit recursively into obj-*.
>
> I'm not sure what you mean here. If you're doing an out-of-source
> build (which it sounds like you are), then you need to do a "make
> install" to get the built .so files copied into the $RDBASE/rdkit
> directory so that the python tests can pass.

alternatively (esp. in case a similar workflow were more suitable to packaging) one could build inside the source tree, run the test suite, and finally perform the installation to a different filesystem location. Building inside the source tree is often advised against, but the implementation of some tests only works if the binaries are in there.

HTH,
Riccardo
|
From: Greg L. <gre...@gm...> - 2012-03-06 03:40:55
|
Dear Michael,

On Sun, Mar 4, 2012 at 8:59 PM, Michael Banck <mb...@de...> wrote:
> On Sun, Jan 15, 2012 at 10:12:03AM +0100, Greg Landrum wrote:
>> I'm very happy to announce that the next version of the RDKit --
>> 2011.12 (a.k.a Q4 2011) -- is released.
>
> I have uploaded 2011.12 to Debian now.

Great! Thanks.

> Several minor issues have turned up, I believe they apply to previous
> versions as well though:

most likely.

> 1. Running the test-suite in-place is either impossible or really tricky
> as the cmake puts the compiled objects below obj-* and leaves the python
> code (including the .pyc byte-code files) in the main source tree. I
> had to give up and just copy over rdkit recursively into obj-*.

I'm not sure what you mean here. If you're doing an out-of-source build (which it sounds like you are), then you need to do a "make install" to get the built .so files copied into the $RDBASE/rdkit directory so that the python tests can pass.

> 2. cmake clean does not remove the .pyc files it generated.
>
> 3. Some .sqlt files appear in Projects/DbCLI/TestData/bzr (after running
> the test suite I assume) and they do not get removed by cmake clean
> either.

Both of those are easy to fix; I'll do it in the next day or so.

> 4. The following files get generated during build and are not cleaned by
> cmake:
>
> rdkit-201112/Code/RDGeneral/versions.h
> rdkit-201112/Code/GraphMol/SmilesParse/smiles.tab.hpp
> rdkit-201112/Code/GraphMol/SmilesParse/lex.yysmarts.cpp
> rdkit-201112/Code/GraphMol/SmilesParse/smarts.tab.cpp
> rdkit-201112/Code/GraphMol/SmilesParse/smarts.tab.hpp
> rdkit-201112/Code/GraphMol/SmilesParse/lex.yysmiles.cpp
> rdkit-201112/Code/GraphMol/SmilesParse/smiles.tab.cpp
> rdkit-201112/Code/GraphMol/Depictor/test_data/collisions.sdf
> rdkit-201112/Code/GraphMol/Depictor/test_data/test1out.sd
> rdkit-201112/Code/GraphMol/Depictor/test_data/cis_trans_cpp.sdf
> rdkit-201112/Code/GraphMol/Depictor/test_data/first_200.sdf
> rdkit-201112/Code/GraphMol/FileParsers/test_data/outNCI_few.sdf
> rdkit-201112/Code/GraphMol/FileParsers/test_data/outSmiles.csv
> rdkit-201112/Code/GraphMol/FileParsers/test_data/cdk2_stereo.sdf
> rdkit-201112/Code/GraphMol/FileParsers/test_data/outNCI_few.tdt
> rdkit-201112/Code/GraphMol/FileParsers/test_data/outNCI_first_200.props.sdf
> rdkit-201112/Code/GraphMol/SLNParse/sln.tab.cpp
> rdkit-201112/Code/GraphMol/SLNParse/lex.yysln.cpp
> rdkit-201112/Code/GraphMol/SLNParse/sln.tab.hpp
> rdkit-201112/Code/GraphMol/Wrap/test_data/outNCI_few.sdf
> rdkit-201112/Code/GraphMol/Wrap/test_data/outSmiles.txt
> rdkit-201112/rdkit/Chem/inchi.py

Up to this point I was fine, and can fix the problem, but this one:

> rdkit-201112/rdkit/__init__.py

is in svn, so it shouldn't be showing up as something that didn't originally exist.

> It could be this is due to how Debian runs cmake (we set
> -DRDK_INSTALL_INTREE=OFF and -DRDK_INSTALL_STATIC_LIBS=OFF) currently,
> not sure. If not, I think cmake and/or setup.py (is the latter one even
> ran during build) should not create/modify files in the main tree, but
> put those under obj-*.

to answer the question that's embedded in here: no, setup.py is not used for anything. I should remove it from svn.

> Anyway, I admit I haven't digged too deep into this yet, so I might be
> talking total nonsense.

definitely not. :-)

-greg
|
From: Greg L. <gre...@gm...> - 2012-03-05 07:40:31
|
Dear all, One of the joys of working with knime is that it makes it really easy to run RDKit code in parallel. One of the pains is that this is done using multi-threading, which quickly reveals that some of the RDKit code is not quite as thread safe as I had hoped that it was. In order to address some of these problems, I created a branch where I've fixed a fair number of things already: https://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/MultiThreading_25Feb2012 I'm going to hold off on merging this with the trunk until I've done some more testing with different compilers. If anyone else has noticed areas where the RDKit is behaving poorly under multi-threaded conditions, please let me know. -greg |
|
From: Michael B. <mb...@de...> - 2012-03-04 19:59:47
|
Hi,

On Sun, Jan 15, 2012 at 10:12:03AM +0100, Greg Landrum wrote:
> I'm very happy to announce that the next version of the RDKit --
> 2011.12 (a.k.a Q4 2011) -- is released.

I have uploaded 2011.12 to Debian now.

Several minor issues have turned up, I believe they apply to previous versions as well though:

1. Running the test-suite in-place is either impossible or really tricky
   as the cmake puts the compiled objects below obj-* and leaves the python
   code (including the .pyc byte-code files) in the main source tree. I
   had to give up and just copy over rdkit recursively into obj-*.

2. cmake clean does not remove the .pyc files it generated.

3. Some .sqlt files appear in Projects/DbCLI/TestData/bzr (after running
   the test suite I assume) and they do not get removed by cmake clean
   either.

4. The following files get generated during build and are not cleaned by
   cmake:

   rdkit-201112/Code/RDGeneral/versions.h
   rdkit-201112/Code/GraphMol/SmilesParse/smiles.tab.hpp
   rdkit-201112/Code/GraphMol/SmilesParse/lex.yysmarts.cpp
   rdkit-201112/Code/GraphMol/SmilesParse/smarts.tab.cpp
   rdkit-201112/Code/GraphMol/SmilesParse/smarts.tab.hpp
   rdkit-201112/Code/GraphMol/SmilesParse/lex.yysmiles.cpp
   rdkit-201112/Code/GraphMol/SmilesParse/smiles.tab.cpp
   rdkit-201112/Code/GraphMol/Depictor/test_data/collisions.sdf
   rdkit-201112/Code/GraphMol/Depictor/test_data/test1out.sd
   rdkit-201112/Code/GraphMol/Depictor/test_data/cis_trans_cpp.sdf
   rdkit-201112/Code/GraphMol/Depictor/test_data/first_200.sdf
   rdkit-201112/Code/GraphMol/FileParsers/test_data/outNCI_few.sdf
   rdkit-201112/Code/GraphMol/FileParsers/test_data/outSmiles.csv
   rdkit-201112/Code/GraphMol/FileParsers/test_data/cdk2_stereo.sdf
   rdkit-201112/Code/GraphMol/FileParsers/test_data/outNCI_few.tdt
   rdkit-201112/Code/GraphMol/FileParsers/test_data/outNCI_first_200.props.sdf
   rdkit-201112/Code/GraphMol/SLNParse/sln.tab.cpp
   rdkit-201112/Code/GraphMol/SLNParse/lex.yysln.cpp
   rdkit-201112/Code/GraphMol/SLNParse/sln.tab.hpp
   rdkit-201112/Code/GraphMol/Wrap/test_data/outNCI_few.sdf
   rdkit-201112/Code/GraphMol/Wrap/test_data/outSmiles.txt
   rdkit-201112/rdkit/__init__.py
   rdkit-201112/rdkit/Chem/inchi.py

It could be this is due to how Debian runs cmake (we set -DRDK_INSTALL_INTREE=OFF and -DRDK_INSTALL_STATIC_LIBS=OFF) currently, not sure. If not, I think cmake and/or setup.py (is the latter one even ran during build) should not create/modify files in the main tree, but put those under obj-*.

Anyway, I admit I haven't digged too deep into this yet, so I might be talking total nonsense.

Michael
|
From: Greg L. <gre...@gm...> - 2012-01-15 09:12:30
|
I'm very happy to announce that the next version of the RDKit -- 2011.12 (a.k.a Q4 2011) -- is released. The release notes are below.

The source release is on the sourceforge downloads page:
http://sourceforge.net/projects/rdkit/files/rdkit/Q4_2011/
The files can also be downloaded from the google project page:
http://code.google.com/p/rdkit/downloads/list

The binaries for Windows, Python 2.6 and Python 2.7 are uploaded already.

Thanks to everyone who submitted bug reports and suggestions for this release!

Please let me know if you find any problems with the release or have suggestions for the next one.

-greg

****** Release_2011.12.1 *******
(Changes relative to Release_2011.09.1)

!!!!!! IMPORTANT !!!!!!
 - The functions for creating bit vector fingerprints using atom pairs
   and topological torsions have been changed. The new default behavior
   will return different fingerprints than previous RDKit versions. This
   affects usage from c++, python, and within the postgresql cartridge.
   See the "Other" section below for more details.
 - Due to a bug fix in the parameter set, the MolLogP and MolMR
   descriptor calculators now return different values for some molecules.
   See the "Bug Fixes" section below for more details.
 - To make storage more efficient, the size of the fingerprint
   used to store morgan fingerprints in the database cartridge has been
   changed from 1024 bits to 512 bits. If you update the cartridge
   version all morgan and featmorgan fingerprints and indices will need
   to be re-generated.

Acknowledgements:
Andrew Dalke, JP Ebejer, Roger Sayle, Adrian Schreyer, Gianluca Sforna, Riccardo Vianello, Toby Wright

Bug Fixes:
 - molecules with polymeric S group information are now rejected by the
   Mol file parser. (Issue 3432136)
 - A bad atom type definition and a bad smarts definition were fixed in
   $RDBASE/Data/Crippen.txt. This affects the values returned by the
   logp and MR calculators. (Issue 3433771)
 - Unused atom-map numbers in reaction products now produce warnings
   instead of errors. (Issue 3434271)
 - rdMolDescriptors.GetHashedAtomPairFingerprint() now works.
   (Issue 3441641)
 - ReplaceSubstructs() now copies input molecule conformations to the
   output molecule. (Issue 3453144)
 - three-coordinate S and Se are now stereogenic (i.e. the
   stereochemistry of O=[S@](C)F is no longer ignored). (Issue 3453172)

New Features:
 - Integration with the new IPython graphical canvas has been added. For
   details see this wiki page:
   http://code.google.com/p/rdkit/wiki/IPythonIntegration
 - Input and output from Andrew Dalke's FPS format
   (http://code.google.com/p/chem-fingerprints/wiki/FPS) for fingerprints.
 - The descriptor CalcNumAmideBonds() was added.

New Database Cartridge Features:
 - Support for PostgreSQL v9.1
 - Integration with PostgreSQL's KNN-GIST functionality. (Thanks to
   Adrian Schreyer)
 - the functions all_values_gt(sfp,N) and all_values_lt(sfp,N) were added.

New Java Wrapper Features:
 - A function for doing diversity picking using fingerprint similarity.
 - support for the Avalon Toolkit (see below)

Deprecated modules (to be removed in next release):
 - rdkit.Excel
 - rdkit.ML.Descriptors.DescriptorsCOM
 - rdkit.ML.Composite.CompositeCOM

Removed modules:
 - rdkit.WebUtils
 - rdkit.Reports
 - rdkit.mixins

Other:
 - Improvements to the SMARTS parser (Roger Sayle)
 - The atom-pair and topological-torsion fingerprinting functions that
   return bit vectors now simulate counts by setting multiple bits in
   the fingerprint per atom-pair/torsion. The number of bits used is
   controlled by the nBitsPerEntry argument, which now defaults to 4.
   The new default behavior does a much better job of reproducing the
   similarities calculated using count-based fingerprints: 95% of
   calculated similarities are within 0.09 of the count-based value
   compared with 0.22 or 0.17 for torsions and atom-pairs previously.
   To get the old behavior, set nBitsPerEntry to 1.
 - Optional support has been added for the Avalon Toolkit
   (https://sourceforge.net/projects/avalontoolkit/) to provide an
   alternate smiles canonicalization, fingerprint, and 2D coordination
   generation algorithm.
 - The SLN support can now be switched off using the cmake variable
   RDK_BUILD_SLN_SUPPORT.
 - There are now instructions for building the RDKit and the SWIG
   wrappers in 64bit mode on windows.
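The nBitsPerEntry item in the "Other" section of these notes can be pictured with a small sketch. It assumes, purely for illustration, that each hashed feature owns a block of nBitsPerEntry adjacent bits and that a feature seen c times switches on the first min(c, nBitsPerEntry) bits of its block. The actual mapping from counts to bits inside the RDKit may well differ. `CountMap`, `countsToBits` and `tanimoto` are hypothetical helpers written for this note, not RDKit functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Feature id -> number of times the feature occurs (hypothetical type).
typedef std::map<std::uint32_t, int> CountMap;

// Folds sparse counts into a fixed-size bit vector. Each feature is hashed
// (here simply by modulo) onto one block of nBitsPerEntry bits, and a count
// of c sets the first min(c, nBitsPerEntry) bits of that block.
// This is an assumed scheme for illustration, not the RDKit implementation.
std::vector<bool> countsToBits(const CountMap &counts, std::size_t nBits,
                               unsigned int nBitsPerEntry) {
  std::vector<bool> bits(nBits, false);
  if (nBitsPerEntry == 0 || nBits < nBitsPerEntry) return bits;
  const std::size_t nBlocks = nBits / nBitsPerEntry;
  for (CountMap::const_iterator it = counts.begin(); it != counts.end(); ++it) {
    const std::size_t block = it->first % nBlocks;
    for (unsigned int i = 0; i < nBitsPerEntry; ++i) {
      if (it->second > static_cast<int>(i)) {
        bits[block * nBitsPerEntry + i] = true;
      }
    }
  }
  return bits;
}

// Plain bit-vector Tanimoto: shared on-bits divided by on-bits in either.
double tanimoto(const std::vector<bool> &a, const std::vector<bool> &b) {
  std::size_t both = 0;
  std::size_t either = 0;
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    if (a[i] && b[i]) ++both;
    if (a[i] || b[i]) ++either;
  }
  return either == 0 ? 0.0
                     : static_cast<double>(both) / static_cast<double>(either);
}
```

For two molecules with counts {1: 3, 2: 1} and {1: 1, 2: 1}, the count-based Tanimoto (sum of minima over sum of maxima) is 2/4 = 0.5. Under the assumptions above, a 16-bit vector with four bits per entry also gives 0.5, while one bit per entry gives 1.0 because the repeated feature is no longer distinguishable. That is the kind of gap the release note describes closing.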
|
From: Uwe H. <che...@uw...> - 2012-01-10 08:51:23
|
Hi,

you can find a hg repos at http://sourceforge.net/p/rdk-py3-branch/code
(direct hg url: http://hg.code.sf.net/p/rdk-py3-branch/code)

This can be a starting point for further python3 support of rdkit.

Definitely not working yet: old pickles which bundled binary data within a string, that means almost any old pickles are not readable anymore (they are read as PyUnicode). I haven't looked at possible memory leaks (and many other things) yet.

I used PIL-1.1.7-py3-source.zip from http://www.lfd.uci.edu/~gohlke/pythonlibs/

PyCairo support can be achieved with the git repos at git://git.cairographics.org/git/pycairo
You must additionally patch src/surface.c with surface.c.diff in the rdkit root directory. cpango/pangocairo) is now achieved via pygobject.

test run with python 3.2.2 (linux 64bit):

93% tests passed, 5 tests failed out of 76
Total Test time (real) = 115.67 sec
The following tests FAILED:
   4 - pyDiscreteValueVect (Failed)
  67 - pyRanker (Failed)
  70 - pythonTestDbCLI (Failed)
  71 - pythonTestDirML (Failed)
  76 - pythonTestDirChem (Failed)
Errors while running CTest
make: *** [test] Error 8

these failures are mostly due to old pickles.

test run with python 2.7.2 (linux 64bit):

97% tests passed, 2 tests failed out of 76
Total Test time (real) = 262.41 sec
The following tests FAILED:
   4 - pyDiscreteValueVect (Failed)
  61 - pyGraphMolWrap (OTHER_FAULT)
Errors while running CTest
make: *** [test] Error 8

(HINT: test 75 had to be killed manually, haven't looked why. Therefore three tests failed and the total test time is not representative.)

regards
Uwe
|
From: Greg L. <gre...@gm...> - 2012-01-07 08:12:34
|
Dear all,

This morning I tagged the beta for the 2011.12 (Q4 2011 in the old numbering) release in svn:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/tags/Release_2011_12_1beta1/
and uploaded a source distribution to the google code site:
http://code.google.com/p/rdkit/downloads/detail?name=RDKit_2011_12_1beta1.tgz

If there's demand for it, I will also put up a windows binary.

As usual: if no show-stopper bugs appear, I will do the release itself in about a week.

Excerpts from the release notes are below.

Best Regards,
-greg

****** Release_2011.12.1 *******
(Changes relative to Release_2011.09.1)

!!!!!! IMPORTANT !!!!!!
 - The functions for creating bit vector fingerprints using atom pairs
   and topological torsions have been changed. The new default behavior
   will return different fingerprints than previous RDKit versions. This
   affects usage from c++, python, and within the postgresql cartridge.
   See the "Other" section below for more details.
 - Due to a bug fix in the parameter set, the MolLogP and MolMR
   descriptor calculators now return different values for some molecules.
   See the "Bug Fixes" section below for more details.

Acknowledgements:
Andrew Dalke, JP Ebejer, Roger Sayle, Adrian Schreyer, Gianluca Sforna, Riccardo Vianello, Toby Wright

Bug Fixes:
 - molecules with polymeric S group information are now rejected by the
   Mol file parser. (Issue 3432136)
 - A bad atom type definition and a bad smarts definition were fixed in
   $RDBASE/Data/Crippen.txt. This affects the values returned by the
   logp and MR calculators. (Issue 3433771)
 - Unused atom-map numbers in reaction products now produce warnings
   instead of errors. (Issue 3434271)
 - rdMolDescriptors.GetHashedAtomPairFingerprint() now works.
   (Issue 3441641)
 - ReplaceSubstructs() now copies input molecule conformations to the
   output molecule. (Issue 3453144)
 - three-coordinate S and Se are now stereogenic (i.e. the
   stereochemistry of O=[S@](C)F is no longer ignored). (Issue 3453172)

New Features:
 - Integration with the new IPython graphical canvas has been added. For
   details see this wiki page:
   http://code.google.com/p/rdkit/wiki/IPythonIntegration
 - Input and output from Andrew Dalke's FPS format
   (http://code.google.com/p/chem-fingerprints/wiki/FPS) for fingerprints.
 - The descriptor CalcNumAmideBonds() was added.

New Database Cartridge Features:
 - Support for PostgreSQL v9.1
 - Integration with PostgreSQL's KNN-GIST functionality. (Thanks to
   Adrian Schreyer)
 - the functions all_values_gt(sfp,N) and all_values_lt(sfp,N) were added.

New Java Wrapper Features:
 - A function for doing diversity picking using fingerprint similarity.
 - support for the Avalon Toolkit (see below)

Deprecated modules (to be removed in next release):
 - rdkit.Excel
 - rdkit.ML.Descriptors.DescriptorsCOM
 - rdkit.ML.Composite.CompositeCOM

Removed modules:
 - rdkit.WebUtils
 - rdkit.Reports
 - rdkit.mixins

Other:
 - Improvements to the SMARTS parser (Roger Sayle)
 - The atom-pair and topological-torsion fingerprinting functions that
   return bit vectors now simulate counts by setting multiple bits in
   the fingerprint per atom-pair/torsion. The number of bits used is
   controlled by the nBitsPerEntry argument, which now defaults to 4.
   The new default behavior does a much better job of reproducing the
   similarities calculated using count-based fingerprints: 95% of
   calculated similarities are within 0.09 of the count-based value
   compared with 0.22 or 0.17 for torsions and atom-pairs previously.
   To get the old behavior, set nBitsPerEntry to 1.
 - Optional support has been added for the Avalon Toolkit
   (https://sourceforge.net/projects/avalontoolkit/) to provide an
   alternate smiles canonicalization, fingerprint, and 2D coordination
   generation algorithm.
 - The SLN support can now be switched off using the cmake variable
   RDK_BUILD_SLN_SUPPORT.
 - There are now instructions for building the RDKit and the SWIG
   wrappers in 64bit mode on windows.
|
From: Greg L. <gre...@gm...> - 2011-12-03 03:57:26
|
Hi Gianluca,

On Sat, Dec 3, 2011 at 12:35 AM, Gianluca Sforna <gi...@gm...> wrote:
> The attached patch adds a "STATIC_LINK" flag to the PostgreSQL
> cartridge build so it is possible to easily choose between
> static/dynamic linking with no code changes.
>
> The current default (static linking) is unchanged; building with
> dynamic link is achieved with "make STATIC_LINK=0"

Thanks! I just checked in (a slightly modified version of) the patch.

-greg
|
From: Gianluca S. <gi...@gm...> - 2011-12-02 23:36:01
|
The attached patch adds a "STATIC_LINK" flag to the PostgreSQL cartridge build so it is possible to easily choose between static/dynamic linking with no code changes. The current default (static linking) is unchanged; building with dynamic link is achieved with "make STATIC_LINK=0" -- Gianluca Sforna http://morefedora.blogspot.com http://identi.ca/giallu - http://twitter.com/giallu |
|
From: Greg L. <gre...@gm...> - 2011-10-26 04:00:13
|
Adrian,

On Tue, Oct 25, 2011 at 10:41 AM, Adrian Schreyer <am...@ca...> wrote:
>
> Great - did you do some benchmarks? I only tested it quickly on a
> virtual machine.

I ran some benchmarks last night and the results aren't uniformly encouraging.

The attached file has my benchmarking script. The database I'm querying is 3.9 million vendor compounds that have been filtered to have reasonable properties. To ensure that the query always returns something, I pick a random set of molecules from the database to use as queries.

Here's the output for the first queries on my machine. The "base" number is the timing for the standard "order by" query, the "nn" query uses the knn index, and "all" returns all the results (timing should be more or less the same as the first one). The number in parens is the number of rows returned.

[21:37:50] INFO: base: 6.07 (5) nn: 3.39 (5) all: 6.08 (594)
[21:38:09] INFO: base: 6.09 (5) nn: 6.20 (5) all: 6.08 (32)
[21:38:27] INFO: base: 6.10 (5) nn: 6.07 (5) all: 6.15 (310)
[21:38:45] INFO: base: 6.07 (5) nn: 6.11 (5) all: 6.08 (57)
[21:39:02] INFO: base: 6.05 (5) nn: 4.83 (5) all: 6.08 (162)
[21:39:17] INFO: base: 6.06 (5) nn: 2.61 (5) all: 6.09 (330)
[21:39:35] INFO: base: 6.06 (5) nn: 6.16 (5) all: 6.06 (16)
[21:39:54] INFO: base: 6.08 (5) nn: 6.17 (5) all: 6.11 (68)
[21:40:12] INFO: base: 6.08 (5) nn: 5.71 (5) all: 6.09 (162)
[21:40:30] INFO: base: 6.08 (5) nn: 6.17 (5) all: 6.18 (587)
[21:40:48] INFO: base: 6.09 (5) nn: 6.21 (5) all: 6.08 (166)
[21:41:07] INFO: base: 6.08 (5) nn: 6.16 (5) all: 6.10 (272)
[21:41:25] INFO: base: 6.09 (5) nn: 6.06 (5) all: 6.07 (49)
[21:41:43] INFO: base: 6.05 (5) nn: 6.17 (5) all: 6.11 (78)
[21:42:02] INFO: base: 6.16 (5) nn: 6.12 (5) all: 6.25 (55)
[21:42:17] INFO: base: 6.09 (5) nn: 2.48 (5) all: 6.12 (444)

There are some cases where the knn index makes a big difference, but in general it's not huge.

-greg
|
From: Greg L. <gre...@gm...> - 2011-10-25 10:06:25
|
On Tue, Oct 25, 2011 at 10:41 AM, Adrian Schreyer <am...@ca...> wrote:
>
> On Tue, Oct 25, 2011 at 05:35, Greg Landrum <gre...@gm...> wrote:
>> On Mon, Oct 24, 2011 at 6:12 PM, Adrian Schreyer <am...@ca...> wrote:
>>>
>>> Here is the changeset:
>>> https://bitbucket.org/aschreyer/rdkit/changeset/8ae28b173a8a .
>>> Unfortunately, the IDE I use stripped trailing white space therefore
>>> the diffs might be a bit confusing.
>>
>> Those changes are in.
>
> Great - did you do some benchmarks? I only tested it quickly on a
> virtual machine.

Not yet. I added a quick test to the regression suite, but I haven't done any benchmarking yet.

-greg |
|
From: Greg L. <gre...@gm...> - 2011-10-25 10:05:29
|
On Tue, Oct 25, 2011 at 11:11 AM, Adrian Schreyer <am...@ca...> wrote:
> I pulled the changes from SVN and it builds and works fine - I had to
> remove the following lines from the Makefile though in order to make
> it work with 9.1:
>
> $(EXTENSION)--$(EXTVERSION).sql: $(EXTENSION).sql91.in
>     cp $< $@
>
> EXTRA_CLEAN = $(EXTENSION)--$(EXTVERSION).sql

Looks like I forgot to check in a file. I'll fix that tonight/tomorrow morning.

-greg |
|
From: Adrian S. <am...@ca...> - 2011-10-25 09:11:40
|
I pulled the changes from SVN and it builds and works fine - I had to remove the following lines from the Makefile though in order to make it work with 9.1:

$(EXTENSION)--$(EXTVERSION).sql: $(EXTENSION).sql91.in
	cp $< $@

EXTRA_CLEAN = $(EXTENSION)--$(EXTVERSION).sql

Adrian

On Tue, Oct 25, 2011 at 09:41, Adrian Schreyer <am...@ca...> wrote:
> Hi Greg,
>
> On Tue, Oct 25, 2011 at 05:35, Greg Landrum <gre...@gm...> wrote:
>> On Mon, Oct 24, 2011 at 6:12 PM, Adrian Schreyer <am...@ca...> wrote:
>>>
>>> Here is the changeset:
>>> https://bitbucket.org/aschreyer/rdkit/changeset/8ae28b173a8a .
>>> Unfortunately, the IDE I use stripped trailing white space therefore
>>> the diffs might be a bit confusing.
>>
>> Those changes are in.
>
> Great - did you do some benchmarks? I only tested it quickly on a
> virtual machine.
>
>>> There is still some work to be done ;) - I think some refactoring is
>>> necessary because the bfp_distance function shares a lot of code with
>>> the bfp_consistent one (I was a bit lazy and simply copied the code).
>>> Also the whole thing has to be extended to the sfp and mol data types.
>>
>> I agree with the sfp data type, but I don't think I do with mol. How
>> would you define distance there?
>
> Yes you are right it does not make sense to have KNN-GIST operators
> for the mol data type since its operators return bools, and there is
> no query scenario of course where the query planner would use
> KNN-GIST.
>
> Adrian
>
>> -greg |
|
From: Adrian S. <am...@ca...> - 2011-10-25 08:42:01
|
Hi Greg,

On Tue, Oct 25, 2011 at 05:35, Greg Landrum <gre...@gm...> wrote:
> On Mon, Oct 24, 2011 at 6:12 PM, Adrian Schreyer <am...@ca...> wrote:
>>
>> Here is the changeset:
>> https://bitbucket.org/aschreyer/rdkit/changeset/8ae28b173a8a .
>> Unfortunately, the IDE I use stripped trailing white space therefore
>> the diffs might be a bit confusing.
>
> Those changes are in.

Great - did you do some benchmarks? I only tested it quickly on a virtual machine.

>> There is still some work to be done ;) - I think some refactoring is
>> necessary because the bfp_distance function shares a lot of code with
>> the bfp_consistent one (I was a bit lazy and simply copied the code).
>> Also the whole thing has to be extended to the sfp and mol data types.
>
> I agree with the sfp data type, but I don't think I do with mol. How
> would you define distance there?

Yes, you are right: it does not make sense to have KNN-GIST operators for the mol data type, since its operators return bools and there is no query scenario where the query planner would use KNN-GIST.

Adrian

> -greg |
|
From: Greg L. <gre...@gm...> - 2011-10-25 04:35:51
|
On Mon, Oct 24, 2011 at 6:12 PM, Adrian Schreyer <am...@ca...> wrote:
>
> Here is the changeset:
> https://bitbucket.org/aschreyer/rdkit/changeset/8ae28b173a8a .
> Unfortunately, the IDE I use stripped trailing white space therefore
> the diffs might be a bit confusing.

Those changes are in.

> There is still some work to be done ;) - I think some refactoring is
> necessary because the bfp_distance function shares a lot of code with
> the bfp_consistent one (I was a bit lazy and simply copied the code).
> Also the whole thing has to be extended to the sfp and mol data types.

I agree with the sfp data type, but I don't think I do with mol. How would you define distance there?

-greg |
|
From: Adrian S. <am...@ca...> - 2011-10-24 16:12:57
|
Hi Greg,

Here is the changeset: https://bitbucket.org/aschreyer/rdkit/changeset/8ae28b173a8a . Unfortunately, the IDE I use stripped trailing white space, so the diffs might be a bit confusing.

There is still some work to be done ;) - I think some refactoring is necessary because the bfp_distance function shares a lot of code with the bfp_consistent one (I was a bit lazy and simply copied the code). Also the whole thing has to be extended to the sfp and mol data types.

Adrian

On Mon, Oct 24, 2011 at 16:58, Greg Landrum <gre...@gm...> wrote:
> On Mon, Oct 24, 2011 at 5:28 PM, Adrian Schreyer <am...@ca...> wrote:
>> Hi Greg,
>>
>> I just added KNN-GIST
>> (http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/)
>> ORDER BY operators for the bfp data type, <%> for Tanimoto-based
>> queries and <#> for Dice. An example benchmark is here:
>> https://gist.github.com/1309271 (with query plans).
>
> I'm both sad and thrilled that you beat me to this one. ;-)
>
>> Currently the performance increase is around 30-40% based on the
>> operator thresholds. I'm sure there is much more possible but I don't
>> fully understand the RDKit GIST implementation so I simply used parts
>> of the consistency functions to make it work. Extending this to the
>> other data types should be fairly simple as well.
>>
>> I am going to push the changes to my RDKit bitbucket repository later.
>
> Let me know when it's there and I'll go through it and then merge it.
>
> -greg |
|
From: Greg L. <gre...@gm...> - 2011-10-24 15:59:08
|
On Mon, Oct 24, 2011 at 5:28 PM, Adrian Schreyer <am...@ca...> wrote:
> Hi Greg,
>
> I just added KNN-GIST
> (http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/)
> ORDER BY operators for the bfp data type, <%> for Tanimoto-based
> queries and <#> for Dice. An example benchmark is here:
> https://gist.github.com/1309271 (with query plans).

I'm both sad and thrilled that you beat me to this one. ;-)

> Currently the performance increase is around 30-40% based on the
> operator thresholds. I'm sure there is much more possible but I don't
> fully understand the RDKit GIST implementation so I simply used parts
> of the consistency functions to make it work. Extending this to the
> other data types should be fairly simple as well.
>
> I am going to push the changes to my RDKit bitbucket repository later.

Let me know when it's there and I'll go through it and then merge it.

-greg |
|
From: Adrian S. <am...@ca...> - 2011-10-24 15:29:11
|
Hi Greg,

I just added KNN-GIST (http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/) ORDER BY operators for the bfp data type, <%> for Tanimoto-based queries and <#> for Dice. An example benchmark is here: https://gist.github.com/1309271 (with query plans).

Currently the performance increase is around 30-40% based on the operator thresholds. I'm sure there is much more possible but I don't fully understand the RDKit GIST implementation so I simply used parts of the consistency functions to make it work. Extending this to the other data types should be fairly simple as well.

I am going to push the changes to my RDKit bitbucket repository later.

Best Regards,

Adrian |
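For readers less familiar with the two measures behind these operators, here is a minimal Python sketch of Tanimoto and Dice similarity on bit fingerprints, and of the "k nearest neighbours" ordering that an ORDER BY on a distance operator asks for. It is an illustration only and does not reflect how the cartridge implements <%> or <#>: fingerprints are modelled as Python sets of on-bit positions, distance is assumed to be 1 - similarity, and the names (tanimoto, dice, nearest) and the toy data are made up for the example.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by on-bits in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice similarity: twice the shared on-bits divided by the summed sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def nearest(query, fingerprints, k, similarity=tanimoto):
    """Return the k (name, distance) pairs closest to query.

    Distance is assumed to be 1 - similarity for this illustration.
    """
    scored = (
        (1.0 - similarity(query, fp), name) for name, fp in fingerprints.items()
    )
    return [(name, dist) for dist, name in heapq.nsmallest(k, scored)]


# Hypothetical toy "database": each fingerprint is a set of on-bit positions.
db = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 8},
    "mol_c": {5, 6, 7},
    "mol_d": {1, 2},
}
query = {1, 2, 3, 4}

print(nearest(query, db, k=2))                   # ordered by Tanimoto distance
print(nearest(query, db, k=2, similarity=dice))  # ordered by Dice distance
```

This brute-force version scores every row before picking the closest k. The attraction of a KNN-GiST index is that it can return rows in distance order without first scoring the whole table.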
|
From: Greg L. <gre...@gm...> - 2011-10-24 05:26:50
|
On Sun, Oct 23, 2011 at 7:43 PM, Adrian Schreyer <am...@ca...> wrote:
>
> SET search_path = 'public' is not required in 9.1 and will throw an
> error if I remember correctly - the schema can be set at any time with
> ALTER EXTENSION <name> SET SCHEMA <schema> if the relocatable = true
> flag is set in the extension control file.

Fixed.

> It is possible to have only one Makefile for 9.1 and <9.1. The only
> thing that is necessary is to have a conditional inside the Makefile
> to change the DATA variable to the rdkit--3.1.sql file. Here is an
> example from the pair extension Makefile
> (http://api.pgxn.org/src/pair/pair-0.1.3/Makefile):
>
> PG91 = $(shell $(PG_CONFIG) --version | grep -qE " 8\.| 9\.0" && echo no || echo yes)
>
> ifeq ($(PG91),yes)
> all: sql/$(EXTENSION)--$(EXTVERSION).sql
>
> sql/$(EXTENSION)--$(EXTVERSION).sql: sql/$(EXTENSION).sql
>     cp $< $@
>
> DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql
> EXTRA_CLEAN = sql/$(EXTENSION)--$(EXTVERSION).sql
> endif

I'll take a look at this over the next day or so.

> 9.1 also introduces the ability to upgrade extensions in place with
> ALTER EXTENSION UPDATE (using update scripts) without dropping all
> data types and functions. This is especially useful I think if the
> data types haven't changed and only new functions were introduced,
> avoiding the need to recreate all molecules, fingerprints and indexes
> again. This might be something for a future version.

That's a good one for the wiki/docs.

> I have added the -march=native flag to my Makefile, I'm not sure if
> this would cause problems for people creating packages for RDKit. Have
> you tried this flag for compiling RDKit itself? I would be curious
> about the performance gains, if any.

Using the timings script ($RDBASE/Regress/Scripts/timings.py) I don't see a measurable difference on my standard build/test machine. That's a crude test, since the run times are too short, but it's at least an indicator that the differences aren't large.

-greg |
|
From: Adrian S. <am...@ca...> - 2011-10-23 17:44:04
|
Hi Greg,

Just a couple of comments:

SET search_path = 'public' is not required in 9.1 and will throw an error if I remember correctly - the schema can be set at any time with ALTER EXTENSION <name> SET SCHEMA <schema> if the relocatable = true flag is set in the extension control file.

It is possible to have only one Makefile for 9.1 and <9.1. The only thing that is necessary is to have a conditional inside the Makefile to change the DATA variable to the rdkit--3.1.sql file. Here is an example from the pair extension Makefile (http://api.pgxn.org/src/pair/pair-0.1.3/Makefile):

PG91 = $(shell $(PG_CONFIG) --version | grep -qE " 8\.| 9\.0" && echo no || echo yes)

ifeq ($(PG91),yes)
all: sql/$(EXTENSION)--$(EXTVERSION).sql

sql/$(EXTENSION)--$(EXTVERSION).sql: sql/$(EXTENSION).sql
	cp $< $@

DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql
EXTRA_CLEAN = sql/$(EXTENSION)--$(EXTVERSION).sql
endif

9.1 also introduces the ability to upgrade extensions in place with ALTER EXTENSION UPDATE (using update scripts) without dropping all data types and functions. This is especially useful, I think, if the data types haven't changed and only new functions were introduced, since it avoids the need to recreate all molecules, fingerprints and indexes. This might be something for a future version.

I have added the -march=native flag to my Makefile, but I'm not sure if this would cause problems for people creating packages for RDKit. Have you tried this flag for compiling RDKit itself? I would be curious about the performance gains, if any.

Best Regards,

Adrian

On Sun, Oct 23, 2011 at 15:28, Greg Landrum <gre...@gm...> wrote:
> Adrian,
>
> On Wed, Oct 12, 2011 at 2:30 PM, Adrian Schreyer <am...@ca...> wrote:
>>
>> I made some small changes to the database cartridge code in order to
>> compile it under 9.1 and also to use the new extension infrastructure.
>> The exact changes are probably the easiest to see in my bitbucket
>> repository changesets
>> (https://bitbucket.org/aschreyer/rdkit/changesets). It's currently
>> running on our 9.1 server without any problems.
>
> I just merged your changes in along with some tweaks to allow the
> cartridge to still build under older versions of postgresql. Thanks
> for figuring out how to do this!
>
> The build instructions have changed, the new version is here:
> http://code.google.com/p/rdkit/wiki/BuildingTheCartridge
>
> Best Regards,
> -greg |
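The version test in the Makefile fragment in this message is compact enough to be easy to misread, so here is the same logic as a small Python sketch. The regular expression is the one passed to grep above; the function name and the sample version strings are assumptions chosen for illustration, modelled on the "PostgreSQL x.y.z" form that pg_config --version prints.

```python
import re

# Same pattern as the grep in the Makefile fragment: a match means the
# server is an 8.x or 9.0 release, i.e. older than 9.1.
OLD_VERSION_RE = re.compile(r" 8\.| 9\.0")


def has_extension_support(version_output):
    """Mirror the shell logic: grep matches -> "no", otherwise -> "yes"."""
    return OLD_VERSION_RE.search(version_output) is None


# Sample strings are illustrative, not captured from a real installation.
for line in ("PostgreSQL 8.4.9", "PostgreSQL 9.0.5", "PostgreSQL 9.1.1"):
    answer = "yes" if has_extension_support(line) else "no"
    print(f"{line} -> PG91 = {answer}")
```

Note that the check is a blacklist of old releases rather than a comparison against 9.1, so any version string that does not contain " 8." or " 9.0" is treated as supporting extensions.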