From: SourceForge.net <no...@so...> - 2011-02-24 15:12:21
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Settings changed) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x >Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |