From: gilleain t. <gil...@gm...> - 2011-06-04 16:14:43
Hi,

So in terms of the Java, here are some suggestions:

1) Make a class that is responsible for the whole ring-finding activity
   (BarkerRingFinder ? :) and put it in a .java file with the same name.
2) Test this class from other classes, e.g. JUnit tests, or just other
   classes with main methods.
3) A lot of the methods have comments just under the signature; if they
   are method comments, they would be clearer above the signature.
4) reduceMolecule and stripTerminalAtoms seem to be the public methods, so
   make these public and the rest private. Can they be combined into one
   method that calls both?
5) Move the debug methods to a separate class, or maybe use logging.

That's all minor (boring!) stuff, so I should also say that I ran the code
on the first 1000 NCIOPEN_SMI structures and looked at a few that 'failed',
that is, where the number of cycles did not equal the Cauchy number. Three
of them are:

c14c([s]c(n1)c2ccccc2)cc3ccccc3c4 (fused system connected to a benzene)
CC23C(C1=C(CCCC1)CC2)CCCC3 (three fused cyclohexanes)
CC12C(C)(C)C(C(=O)C1=O)CC2 (bridged cyclohexane)

The third one is similar to the structure Andrew linked to in the OEChem
document, except with less symmetry.

It might be a good idea to turn some of the failing structures into test
cases, so that if you make changes to the algorithm, they are 'on hand' to
be checked.

gilleain

On 6/4/11, Ed Barker <mre...@ya...> wrote:
> OK
>
> The code and a set of notes describing the idea are on GitHub.
>
> https://github.com/edbarker/cdk-reduced-edge-graph-sssr
>
> The code is not pretty, I apologise - I'm not a coder. I had to learn Java
> and CDK in order to write this, so I am not fully aware of what I can do
> with either. The key thing is that it works and I have tested it quite
> extensively on a large dataset. The dataset I use is a set of SMILES from
> the NCI - 250251 compounds in all. I had to change this by hand to stop it
> from crashing when it came across heavy elements like Th and Sm - there is
> probably an easy solution to this. So it will crash unless you doctor your
> dataset as I had to do.
>
> Any queries then please don't hesitate to email me. Any ideas or suggestions
> welcome.
>
> Thanks
> Ed.
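
As a rough illustration of the 'failing structures as test cases' idea, here is a minimal sketch of the check that the number of cycles equals the Cauchy number. It assumes that "Cauchy number" means bonds - atoms + connected components, and it deliberately avoids the CDK API. The class name CycleCountCheck, the method expectedRingCount and the bond list are hypothetical; the bond list was transcribed by hand from the third SMILES above (heavy atoms only, numbered from 0 in SMILES order). A real test would parse the SMILES with CDK and compare this value against the size of the ring set the finder returns.

```java
public class CycleCountCheck {

    /** Find the representative of x's component (union-find with path halving). */
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /**
     * Expected number of rings in a molecular graph, assuming the usual
     * formula: bonds - atoms + connected components.
     */
    public static int expectedRingCount(int atomCount, int[][] bonds) {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        int components = atomCount;
        for (int[] bond : bonds) {
            int a = find(parent, bond[0]);
            int b = find(parent, bond[1]);
            if (a != b) {
                parent[a] = b;   // merge the two components
                components--;
            }
        }
        return bonds.length - atomCount + components;
    }

    public static void main(String[] args) {
        // Hand-transcribed heavy-atom bonds for CC12C(C)(C)C(C(=O)C1=O)CC2
        // (12 atoms, 13 bonds; the last two entries are the ring closures).
        int[][] bridged = {
            {0, 1}, {1, 2}, {2, 3}, {2, 4}, {2, 5}, {5, 6}, {6, 7},
            {6, 8}, {8, 9}, {5, 10}, {10, 11}, {8, 1}, {11, 1}
        };
        System.out.println(expectedRingCount(12, bridged)); // prints 2
    }
}
```

Keeping the expected count independent of the ring finder means a regression in the algorithm shows up as a mismatch rather than being hidden by shared code.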