From: SourceForge.net <no...@so...> - 2011-02-16 10:15:08
|
Patches item #3183552, was opened at 2011-02-16 10:15 Message generated for change (Tracker Item Submitted) made by mark_rynbeek You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Nobody/Anonymous (nobody) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-16 10:18:15
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Nobody/Anonymous (nobody) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: Mark R. <ma...@eb...> - 2011-02-16 10:41:06
|
> Initial Comment: > > This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) > The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. > > > > ---------------------------------------------------------------------- > >> Comment By: Egon Willighagen (egonw) > Date: 2011-02-16 11:18 > > Message: > Cool! I hope so 8-P I haven't assigned it to anyone, I'll check with Chris if he wants to have a look perhaps. It's an interesting paper, but I find it hard to assess if the output is chemically complete. There are also limitations to the class' interpretation of more exotic InChIs and their connection table. Anyway - I made it because I can use it for Orchem, as a search option (to include tautomers for a substructure search) M |
From: Egon W. <ego...@gm...> - 2011-02-16 10:45:01
|
On Wed, Feb 16, 2011 at 10:40 AM, Mark Rijnbeek <ma...@eb...> wrote: >> This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) >>> Comment By: Egon Willighagen (egonw) >> Cool! > I hope so 8-P Well, I have a use case! This is so going to be a chapter in the book :) > I haven't assigned it to anyone, I'll check with Chris if he wants to have a look perhaps. Just noting that he cannot be the sole reviewer, being in the same lab. But I will look at it asap, but not before this weekend. > It's an interesting paper, but I find it hard to assess if the output is > chemically complete. The InChI heuristics are not. Check warfarin: it should find some 40 tautomers. > There are also limitations to the class' > interpretation of more exotic InChIs and their connection table. > Anyway - I made it because I can use it for Orchem, as a search option > (to include tautomers for a substructure search) Now we only need to see how to get the CDK InChI wrapper to create InChIs with the mobile-hydrogen layer turned of, adding the "/f" fixed-hydrogen layer... Have you looked at that yet? (I haven't looked at your patch yet) Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: SourceForge.net <no...@so...> - 2011-02-19 14:23:32
|
Patches item #3183552, was opened at 2011-02-16 05:15 Message generated for change (Comment added) made by rajarshi You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Nobody/Anonymous (nobody) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 09:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 05:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 11:51:00
|
Patches item #3183552, was opened at 2011-02-16 10:15 Message generated for change (Comment added) made by mark_rynbeek You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Nobody/Anonymous (nobody) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 11:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 14:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 10:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 11:58:07
|
Patches item #3183552, was opened at 2011-02-16 10:15 Message generated for change (Comment added) made by gilleain You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Nobody/Anonymous (nobody) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 11:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 11:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 14:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 10:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 12:10:19
|
Patches item #3183552, was opened at 2011-02-16 10:15 Message generated for change (Comment added) made by mark_rynbeek You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Nobody/Anonymous (nobody) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 11:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 11:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 14:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 10:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 12:40:09
|
Patches item #3183552, was opened at 2011-02-16 10:15 Message generated for change (Comment added) made by gilleain You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Nobody/Anonymous (nobody) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 11:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 11:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 14:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 10:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 13:55:50
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. >Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) >Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 14:05:09
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 14:19:56
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 14:49:41
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 15:01:52
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 15:02:34
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 15:12:11
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 15:12:21
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Settings changed) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x >Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 15:14:29
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 15:15:20
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:15 Message: Ok, and as a last comment then, the class seems to have code to parse InChIs too, which is what Gilleain was referring to, I guess. I think it would indeed make sense to have that in a separate class. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 15:41:35
|
Patches item #3183552, was opened at 2011-02-16 10:15 Message generated for change (Comment added) made by gilleain You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 15:41 Message: Uhm. Just to be clear, I was not suggesting factoring out the inchi parsing code. Eventually, if (and when) the inchi parsing is done better by someone, then it would be worthwhile. Also, I don't see such a problem with long lines or short variable names in test cases, as tests are never very readable. Especially considering the nice factoring out (as done here) of a method to run common parts of multiple tests which actually reduces the amount of test code. Finally, SMSD is considered a better choice than UIT isn't it? I guess I would say that, since I'm working with Asad, but I swear it's an honest opinion :) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:15 Message: Ok, and as a last comment then, the class seems to have code to parse InChIs too, which is what Gilleain was referring to, I guess. I think it would indeed make sense to have that in a separate class. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 13:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 11:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 11:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 14:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 10:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 17:07:27
|
Patches item #3183552, was opened at 2011-02-16 10:15 Message generated for change (Comment added) made by mark_rynbeek You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 17:07 Message: Thanks, I have followed up on most comments. I removed the sent patches as I have incorporated them into one patch (squashed), now attached. I have kept SMSD as I can't see why UIT would be a better choice. Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers. Lastly I can't find anywhere how to run the ojdcheck.jar, could you point me to a web page Egon? ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 15:41 Message: Uhm. Just to be clear, I was not suggesting factoring out the inchi parsing code. Eventually, if (and when) the inchi parsing is done better by someone, then it would be worthwhile. Also, I don't see such a problem with long lines or short variable names in test cases, as tests are never very readable. Especially considering the nice factoring out (as done here) of a method to run common parts of multiple tests which actually reduces the amount of test code. Finally, SMSD is considered a better choice than UIT isn't it? I guess I would say that, since I'm working with Asad, but I swear it's an honest opinion :) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:15 Message: Ok, and as a last comment then, the class seems to have code to parse InChIs too, which is what Gilleain was referring to, I guess. I think it would indeed make sense to have that in a separate class. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 13:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 11:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 11:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 14:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 10:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 18:01:58
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 19:01 Message: "I removed the sent patches as I have incorporated them into one patch (squashed), now attached." Excellent. I'll look at it tonight! "I have kept SMSD as I can't see why UIT would be a better choice." It would reduce the dependencies. "Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers." Yeah, I should have added some context there. It's not important for this patch, but we need to go there at some point. I have a long standing wish to use information from the auxilary layer... "ojdcheck.jar" See the javadoc.xml. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 18:07 Message: Thanks, I have followed up on most comments. I removed the sent patches as I have incorporated them into one patch (squashed), now attached. I have kept SMSD as I can't see why UIT would be a better choice. Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers. Lastly I can't find anywhere how to run the ojdcheck.jar, could you point me to a web page Egon? ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 16:41 Message: Uhm. Just to be clear, I was not suggesting factoring out the inchi parsing code. Eventually, if (and when) the inchi parsing is done better by someone, then it would be worthwhile. Also, I don't see such a problem with long lines or short variable names in test cases, as tests are never very readable. Especially considering the nice factoring out (as done here) of a method to run common parts of multiple tests which actually reduces the amount of test code. Finally, SMSD is considered a better choice than UIT isn't it? I guess I would say that, since I'm working with Asad, but I swear it's an honest opinion :) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:15 Message: Ok, and as a last comment then, the class seems to have code to parse InChIs too, which is what Gilleain was referring to, I guess. I think it would indeed make sense to have that in a separate class. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 19:11:54
|
Patches item #3183552, was opened at 2011-02-16 05:15 Message generated for change (Comment added) made by rajarshi You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-24 14:11 Message: I agree about SMSD - I'd like to see all new code requiring isomorphism use SMSD over UIT. (And I need to get the SMARTS tool switched over) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 13:01 Message: "I removed the sent patches as I have incorporated them into one patch (squashed), now attached." Excellent. I'll look at it tonight! "I have kept SMSD as I can't see why UIT would be a better choice." It would reduce the dependencies. "Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers." Yeah, I should have added some context there. It's not important for this patch, but we need to go there at some point. I have a long standing wish to use information from the auxilary layer... "ojdcheck.jar" See the javadoc.xml. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:07 Message: Thanks, I have followed up on most comments. I removed the sent patches as I have incorporated them into one patch (squashed), now attached. I have kept SMSD as I can't see why UIT would be a better choice. Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers. Lastly I can't find anywhere how to run the ojdcheck.jar, could you point me to a web page Egon? ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 10:41 Message: Uhm. Just to be clear, I was not suggesting factoring out the inchi parsing code. Eventually, if (and when) the inchi parsing is done better by someone, then it would be worthwhile. Also, I don't see such a problem with long lines or short variable names in test cases, as tests are never very readable. Especially considering the nice factoring out (as done here) of a method to run common parts of multiple tests which actually reduces the amount of test code. Finally, SMSD is considered a better choice than UIT isn't it? I guess I would say that, since I'm working with Asad, but I swear it's an honest opinion :) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:15 Message: Ok, and as a last comment then, the class seems to have code to parse InChIs too, which is what Gilleain was referring to, I guess. I think it would indeed make sense to have that in a separate class. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 09:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 09:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 09:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 08:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 07:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 07:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 06:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 06:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 09:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 05:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-24 19:37:40
|
Patches item #3183552, was opened at 2011-02-16 11:15 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x >Group: Accepted Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-02-24 20:37 Message: I'm good. I'm fine with SMSD. I don't know how much faster it is. I have heard people comment on bugs in the SMSD code, and asked them to file bug reports, which I hope will show up soon. But obviously, until those have been filed and confirmed, it's absolutely no argument. I am still expecting that a quick count of double bonded oxygens, nitrogens, and possibly carbons and comparing two structures like that before UIT will speed up things, but will try to confirm that with statistics later :) ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-24 20:11 Message: I agree about SMSD - I'd like to see all new code requiring isomorphism use SMSD over UIT. (And I need to get the SMARTS tool switched over) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 19:01 Message: "I removed the sent patches as I have incorporated them into one patch (squashed), now attached." Excellent. I'll look at it tonight! "I have kept SMSD as I can't see why UIT would be a better choice." It would reduce the dependencies. "Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers." Yeah, I should have added some context there. It's not important for this patch, but we need to go there at some point. I have a long standing wish to use information from the auxilary layer... "ojdcheck.jar" See the javadoc.xml. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 18:07 Message: Thanks, I have followed up on most comments. I removed the sent patches as I have incorporated them into one patch (squashed), now attached. I have kept SMSD as I can't see why UIT would be a better choice. Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers. Lastly I can't find anywhere how to run the ojdcheck.jar, could you point me to a web page Egon? ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 16:41 Message: Uhm. Just to be clear, I was not suggesting factoring out the inchi parsing code. Eventually, if (and when) the inchi parsing is done better by someone, then it would be worthwhile. Also, I don't see such a problem with long lines or short variable names in test cases, as tests are never very readable. Especially considering the nice factoring out (as done here) of a method to run common parts of multiple tests which actually reduces the amount of test code. Finally, SMSD is considered a better choice than UIT isn't it? I guess I would say that, since I'm working with Asad, but I swear it's an honest opinion :) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:15 Message: Ok, and as a last comment then, the class seems to have code to parse InChIs too, which is what Gilleain was referring to, I guess. I think it would indeed make sense to have that in a separate class. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 16:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 15:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 13:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 13:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 12:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 15:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 11:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |
From: SourceForge.net <no...@so...> - 2011-02-25 01:59:06
|
Patches item #3183552, was opened at 2011-02-16 05:15 Message generated for change (Settings changed) made by rajarshi You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Accepted >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Mark Rijnbeek (mark_rynbeek) Assigned to: Egon Willighagen (egonw) Summary: Tautomer generation based on InChI Initial Comment: This patch is for a class to generate tautomers, based on a paper which describes an InChI based algorithm (see: http://pubs.acs.org/doi/abs/10.1021/ci1001179) The patch contains a class to generate tautomers, a test class and some minor changes to make the ant build/test work. I put the new class in its own module 'tautomer'. ---------------------------------------------------------------------- >Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-24 20:59 Message: Applied and pushed ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 14:37 Message: I'm good. I'm fine with SMSD. I don't know how much faster it is. I have heard people comment on bugs in the SMSD code, and asked them to file bug reports, which I hope will show up soon. But obviously, until those have been filed and confirmed, it's absolutely no argument. I am still expecting that a quick count of double bonded oxygens, nitrogens, and possibly carbons and comparing two structures like that before UIT will speed up things, but will try to confirm that with statistics later :) ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-24 14:11 Message: I agree about SMSD - I'd like to see all new code requiring isomorphism use SMSD over UIT. (And I need to get the SMARTS tool switched over) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 13:01 Message: "I removed the sent patches as I have incorporated them into one patch (squashed), now attached." Excellent. I'll look at it tonight! "I have kept SMSD as I can't see why UIT would be a better choice." It would reduce the dependencies. "Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers." Yeah, I should have added some context there. It's not important for this patch, but we need to go there at some point. I have a long standing wish to use information from the auxilary layer... "ojdcheck.jar" See the javadoc.xml. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 12:07 Message: Thanks, I have followed up on most comments. I removed the sent patches as I have incorporated them into one patch (squashed), now attached. I have kept SMSD as I can't see why UIT would be a better choice. Also, the InChI 'parsing' is not worth going into its own class - it's a very limited parse of particular aspects of the InChI that are relevant for the tautomers. Lastly I can't find anywhere how to run the ojdcheck.jar, could you point me to a web page Egon? ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 10:41 Message: Uhm. Just to be clear, I was not suggesting factoring out the inchi parsing code. Eventually, if (and when) the inchi parsing is done better by someone, then it would be worthwhile. Also, I don't see such a problem with long lines or short variable names in test cases, as tests are never very readable. Especially considering the nice factoring out (as done here) of a method to run common parts of multiple tests which actually reduces the amount of test code. Finally, SMSD is considered a better choice than UIT isn't it? I guess I would say that, since I'm working with Asad, but I swear it's an honest opinion :) ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:15 Message: Ok, and as a last comment then, the class seems to have code to parse InChIs too, which is what Gilleain was referring to, I guess. I think it would indeed make sense to have that in a separate class. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:14 Message: BTW, I am wondering if the SMSD dependency is really needed... what's wrong with using the UIT? If performance is an issue, I suggest to first scan the two molecules for double/triple bond patters, counting the number of carbons, oxygens, nitrogens with double bonds, which should remove most of the isomorphism checking. That is something that might be interesting anyway, also in combination with SMSD. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:12 Message: Mark, please review my patches, and have a look at the below comments. They're mostly small things, but the threading issue should be resolved. Please rewrite the code to not use global variables. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:02 Message: Otherwise, please run PMD and the JavaDoc checkers on your code to find more comments. ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 10:01 Message: Is there a need for removeDuplicates() to be public? If so, can you please add a unit test? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 09:49 Message: logger calls can best use concatenation with commas to not concatenate Strings by default. So, instead of: logger.debug("Mapped InChI "+inchiAtom.getSymbol()+" "+inchiAtom.getID()+" to "+molAtom.getSymbol()+" "+molAtom.getID()); you can better use: logger.debug( "Mapped InChI ", inchiAtom.getSymbol(), " ", inchiAtom.getID(), " to ", molAtom.getSymbol(), " ", molAtom.getID() ); which will make the code speed up when debugging is turned off. See: http://chem-bla-ics.blogspot.com/2011/02/cleaner-cdk-code-9-using-loggingtool.html ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 09:19 Message: Mark, do you think you can rewrite the code to be thread-safe? ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 09:05 Message: Some quick comments on InChITautomerGeneratorTest: * lines are longer then the CDK 120 chars standard * please use readable variable names (tautomerGenerator rather than tc) and at least of 3 chars long * the method writeToSDF is not used in the class * there is a @cdk.inchi tag which can be used to tie methods and unit tests to InChIs ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-24 08:55 Message: It has seen two reviews already... doing some checking too. The patches apply cleanly to cdk-1.4.x, so I suggest we aim for that branch. It's new functionality, so no changed API, and as such not unsuitable to the frozen cdk-1.4.x either. Mark, please check Gilleain's second patch (the speed up)... I will also ask you to review some patches I will cook up right now... ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 07:40 Message: Done. The second one is really just a suggestion, as I am not 100% sure that it speeds things up - but it does avoid running through the list of duplicates at the end. Otherwise, I think the class looks good. I guess it shows how nice it would be to have a full InChI parser, that provides all the stuff like mobile hydrogens, etc. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 07:10 Message: Thanks Gilleain. Could you update your patches so that they work in case they're outdated? By the way, Egon mentioned warfarin earlier. I have added it as a unit test, you get six tautomers. The mobile hydrogens for warfarin are set only if you set flag KET for the InChI generation (https://sourceforge.net/mailarchive/forum.php?forum_name=inchi-discuss) ---------------------------------------------------------------------- Comment By: gilleain maclean torrance (gilleain) Date: 2011-02-24 06:58 Message: Ah. I just made some patches for a couple of possible changes. They're probably outdated, but they are : a) Narrowing the thrown Exception to CDKException/CloneNotSupported b) Slightly faster duplicate removal code I'll attach the patches anyway, to show what I mean. ---------------------------------------------------------------------- Comment By: Mark Rijnbeek (mark_rynbeek) Date: 2011-02-24 06:51 Message: Thanks, I removed the IMoleculeSet (which seemed an approriate class, is it 'deprecated'?) and replaced it with List<IAtomContainer>. The IAtomContainer is more practical than IMolecule, as I pass them into SMSD which works on atom containers. I also added more test cases. ---------------------------------------------------------------------- Comment By: Rajarshi Guha (rajarshi) Date: 2011-02-19 09:23 Message: Looks good. My only comment is that I'd rather see List<IMolecule> used as the return type than IMoleculeSet in getTautomers() and elsewhere. Otherwise I think it can be applied ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-02-16 05:18 Message: Cool! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3183552&group_id=20024 |