The current algorithm for enumerating Bemis-Murcko fragments in GenerateFragment.java works well when the ring fragments of a molecule are linearly arranged, but misses some fragments when a molecule is 'branched'. For example, the list of murcko fragments for the 'four-rings.mol' (in the attached gzipped-tar archive) does not include one of all four rings. A possible alternative is to enumerate the fragments recursively ('GenerateFragments.java.patch' in the attached archive). Instead of directly generating murcko fragments, the double loop f and g in the original code (beginning at lines 157 and 162 respectively) are used to generate relevant data on linkers that are composed of path atoms only. The information on no-ring linkers are then passed to the newly added method recursiveFragmentAdd().
a test case and a possible patch for recursive murcko fragments
Christian, can you please have a look at the recent bug report (with unit test) and this patch?
I took a rather high level look and the patch seems to make sense. However I can't apply the patch (via patch) so can't run it. Also a unit test should be provided
Hi Rajarshi - Just re-checked the patch file: it was generated against GenerateFragement.java in 1.2.1, but ran against that in 1.2.3 (with the one extra line properly offset by patch). I am using patch 2.5.9 (on a Debian-based GNU/Linux). Could you describe how patch fails for you, so I could perhaps re-make a '.patch' file that will work? Or should I just attach a patched java file? I will also try to work up a unit test. Thanks, Boryeu
patch 2.5.8 (OS X Leopard)
I~/src/java/cdk (uit)$ patch -p3 < ~/Downloads/GenerateFragments.java.patch
missing header for context diff at line 3 of patch
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
I see - 'patch -p3 < (patchfile)' tries to deduce the name of file to patch from the info in the patchfile, and the file from which I made the patchfile (and its name encoded into the patchfile) is in a non-std location , thus the failure. SO this is likely not due to the version of the patch command. The best way would be to apply the patch explicitly:
$ (cd to the directory where GenerateFragments.java is located)
$ patch GenerateFragments.java < ~/Downloads/GenerateFragments.java.patch
patching file GenerateFragments.java
Hunk #1 succeeded at 125 (offset 1 line).
Hunk #2 succeeded at 202 (offset 1 line).
Hunk #3 succeeded at 265 (offset 1 line).
Hunk #4 succeeded at 298 (offset 1 line).
Hunk #5 succeeded at 420 (offset 1 line).
Hunk #6 succeeded at 434 (offset 1 line).
$
(The above output came from patch command applying the patchfile to GenerateFragments.java in version 1.2.3) (For version 1.2.1, there would be no offsets)
unit tests for 'GenerateFragments.java-patch'
Unit test Test18 for GenerateFragements.java is enclosed in the tgz file 'GenerateFragments-recursive-unitTest.tgz', intended for the recursive enumeration routine. A second test (Test19) in the tgz file is intended for the fix of Tracker:Bugs ID 2871449; the fix is a part of 'GenerateFragments.java.patch'.
Attached a git patch against CDK master.
I am not very eager to say if this patch is good or not. Firstly, with the current code base I have several failing unit tests; the patch increases this number.
Some comments for improvement:
CLosing this since a new Murcko fragment class is being written which should be cleaner and more maintanable