Patch that introduced signature calculation for the CDK. The patch is developed by Gilliean, with some patched here and there from me to address CDK stability issues. Aimed at CDK master. The patch is mostly wrapping around the code from in the signature.jar from Gilleain's signature project:
http://github.com/gilleain/signatures
With some discussion here:
http://gilleain.blogspot.com/2009/06/more-chemical-signature-example.html
Reviewing most welcome. The patch is available from the 155-signatures branch at my account:
http://github.com/egonw/cdk/tree/155-signatures
You need commit d01f1e8bab6c91daa6e9.
The Nightly reports are here:
http://pele.farmbio.uu.se/nightly-signatures/
All is clean (fully unit tests, OK JavaDoc, mostly OK on PMD), except for some false positives from PDM.
Reviewing of the API is most important, but any comment is, as always, welcome.
Looks good. A few issues.
1) Is there a reason for the signature code to be in a separate jar file? What is in that file (the signature-SNAPSHOT)?
2) For AtomSignature, it'd be useful to have a constructor that took an IAtom as a first arhument
3) Does getConnected() In AtomSignature have to be public? If so, this is a rather general function and could be made prt of IAtomContainer. I'd rather see it private or protected
4) In MolecularSignature is it useful to have getOrbits as public? Is an Orbit object useful on its own? Could it be made protected? Similarly for the Orbit class itself. Can that be made protected?
(My aim in the last two points is to keep the API simple and avoid extraneous mehtods/classes
1) The signature implementation tries to be toolkit-independant, and is a separate project.
2) Agreed. Will add.
3) It doesn't, and is a good idea not to show it, as it is only intended to be used by the base class.
4) Yes, it is definitely useful. It's a bit of a nasty design that an 'Orbit' is essentially the same as a 'SymmetryClass' as both are just a list of atom (indices) and a string label. Roughly speaking, all atoms in the same orbit are equivalent - they have the same signature.
One annoying aspect of the design is that there are useful methods in the abstract base classes - especially AbstractGraphSignature - that may not be clear to CDK users. I don't know how the javadocs or other documentation for the signature library code can be smoothly included in the CDK.
Changes committed and pushed to the 155-signatures branch on my github-cdk.
Pushed to my 155-signatures branch, and added two commits extra:
Rajarshi, are your questions sufficiently answered?
I pulled the latest code (using Egon repo as the remote) and I don't see the AtomSignature constructor that takes an IAtom.
Regarding docs for the AbstractGraphSignature - would the @inherits Javadoc tag be of use here?
Ah, the reason you didn't see them is that I forgot to implement them, sorry about that. Now commited/pushed to my branch.
The inherits javadoc tag indeed works - but only so long as the javadoc tool is provided a source directory (not jarfile) for the classes that the inheriting is being done from.
This can be done by unpacking the source of the signature project into a temp directory, but then they appear in the final javadocs. Apart from the unpalatable hack of cutting them back out of the generated HTML, I don't see how to do this cleanly...
Rajarshi, can you have a look at Gilleain's latest replies, and let me know if there are things left to be done?
Egon, I think its all OK now. I think that patch should not be held back on the Jaavdoc issue - I think Gillean is right in that its a little hacky, but is really part of the Javadoc build process, so a separate feature request could be put in for that.
Otherwise good to go
OK, thanx. I will file a separate bug report about @inheritDoc and JavaDoc in jars... I have already asked on #classpath with the open source Java wizards, but might try StackExchange...
I will apply the patches to master then...
Applied to master.
Gilleain, thanx for this really interesint patch! Rajarshi, thanx for your speedy reviewing!