From: SourceForge.net <no...@so...> - 2012-08-29 01:51:47
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Tracker Item Submitted) made by asadrahman You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-29 11:46:17
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by asadrahman You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- >Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the laste patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-29 11:46:51
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by asadrahman You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the laste patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 10:21:49
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by jwmay You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 10:31:33
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by jwmay You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 10:55:38
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by jwmay You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:55 Message: Okay I found the problem. The super class test bug706786 fails as the fingerprint of one molecule is a subset of the other. I've attached the images of the molecules to this tracker. Original Bug: https://sourceforge.net/tracker/?func=detail&aid=706786&group_id=20024&atid=120024 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 14:40:53
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by asadrahman You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- >Comment By: Asad (asadrahman) Date: 2012-08-30 07:40 Message: Hi John, Thanks for the detailed review. Here is the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/SP_fingerprinter_updated_patch_with_reviewers_recommendations.patch a) I have made changes as recommended by you (most of therm). b) Let me know if it works at your end. c)I I will check the failed case once you are done with this patch. Just to clarify. Should the fingerprint report that they are subsets? Many thanks Asad ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:55 Message: Okay I found the problem. The super class test bug706786 fails as the fingerprint of one molecule is a subset of the other. I've attached the images of the molecules to this tracker. Original Bug: https://sourceforge.net/tracker/?func=detail&aid=706786&group_id=20024&atid=120024 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 15:18:53
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by jwmay You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:18 Message: Just checking, would it be possible to remove netbeans project files in nbproject/ ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-30 07:40 Message: Hi John, Thanks for the detailed review. Here is the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/SP_fingerprinter_updated_patch_with_reviewers_recommendations.patch a) I have made changes as recommended by you (most of therm). b) Let me know if it works at your end. c)I I will check the failed case once you are done with this patch. Just to clarify. Should the fingerprint report that they are subsets? Many thanks Asad ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:55 Message: Okay I found the problem. The super class test bug706786 fails as the fingerprint of one molecule is a subset of the other. I've attached the images of the molecules to this tracker. Original Bug: https://sourceforge.net/tracker/?func=detail&aid=706786&group_id=20024&atid=120024 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 15:51:47
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by jwmay You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:51 Message: Okay looks good. As you've said the failed tests case seem to be due to the new fingerprinter accounting for stereo parity in which case the entire substructure doesn't match. Am happy to sign this off. Some niggles but these could be fixed later: I think the hashing would work just as good if you use the raw integer values. If you look at the hashCode() for an Integer it is.: public int hashCode() { return value; } As the hash isn't anything special you could convert: for (Iterator<IAtomContainer> it = sssr.atomContainers().iterator(); it.hasNext();) { IAtomContainer ring = it.next(); int toHashCode = String.valueOf(ring.getAtomCount()).hashCode(); paths.add(patternIndex, toHashCode); patternIndex++; } into (see below for iterator removal): for (IAtomContainer ring : sssr.atomContainers()) { paths.add(patternIndex++, ring.getAtomCount()); } Say the atom count is 10. The hash code of "10".hashCode() is not 10 but it will always be the same as "10.hashCode()". You can therefore just use the integer 10. This could be the same for all numeric/enumerations. What really matters with the hash coding is that you combined correctly. public int compare(IAtom o1, IAtom o2) { if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } As the method needs an IAtom (which is also an IChemObject) then the method will throw a class cast exception even without the if statement. Explicitly adding this means the internal byte code looks like this (not really but you ge the gist): if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } If this is just to check for nulls then != null is a lot faster. I myself use instanceof too much, see why it's bad here: http://www.javapractices.com/topic/TopicAction.do?Id=31 For iterator can be replaced with foreach for (Iterator<IAtom> it = container.atoms().iterator(); it.hasNext();) { IAtom atom = it.next(); } is exactly the same as (but more code) for(IAtom atom : container.atoms()){ .... } for(IAtomContainer atom : sssr.containers){ .... } ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:18 Message: Just checking, would it be possible to remove netbeans project files in nbproject/ ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-30 07:40 Message: Hi John, Thanks for the detailed review. Here is the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/SP_fingerprinter_updated_patch_with_reviewers_recommendations.patch a) I have made changes as recommended by you (most of therm). b) Let me know if it works at your end. c)I I will check the failed case once you are done with this patch. Just to clarify. Should the fingerprint report that they are subsets? Many thanks Asad ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:55 Message: Okay I found the problem. The super class test bug706786 fails as the fingerprint of one molecule is a subset of the other. I've attached the images of the molecules to this tracker. Original Bug: https://sourceforge.net/tracker/?func=detail&aid=706786&group_id=20024&atid=120024 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 16:32:49
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by jwmay You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 09:32 Message: Signed off. Pull Request: https://github.com/egonw/cdk/pull/18 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:51 Message: Okay looks good. As you've said the failed tests case seem to be due to the new fingerprinter accounting for stereo parity in which case the entire substructure doesn't match. Am happy to sign this off. Some niggles but these could be fixed later: I think the hashing would work just as good if you use the raw integer values. If you look at the hashCode() for an Integer it is.: public int hashCode() { return value; } As the hash isn't anything special you could convert: for (Iterator<IAtomContainer> it = sssr.atomContainers().iterator(); it.hasNext();) { IAtomContainer ring = it.next(); int toHashCode = String.valueOf(ring.getAtomCount()).hashCode(); paths.add(patternIndex, toHashCode); patternIndex++; } into (see below for iterator removal): for (IAtomContainer ring : sssr.atomContainers()) { paths.add(patternIndex++, ring.getAtomCount()); } Say the atom count is 10. The hash code of "10".hashCode() is not 10 but it will always be the same as "10.hashCode()". You can therefore just use the integer 10. This could be the same for all numeric/enumerations. What really matters with the hash coding is that you combined correctly. public int compare(IAtom o1, IAtom o2) { if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } As the method needs an IAtom (which is also an IChemObject) then the method will throw a class cast exception even without the if statement. Explicitly adding this means the internal byte code looks like this (not really but you ge the gist): if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } If this is just to check for nulls then != null is a lot faster. I myself use instanceof too much, see why it's bad here: http://www.javapractices.com/topic/TopicAction.do?Id=31 For iterator can be replaced with foreach for (Iterator<IAtom> it = container.atoms().iterator(); it.hasNext();) { IAtom atom = it.next(); } is exactly the same as (but more code) for(IAtom atom : container.atoms()){ .... } for(IAtomContainer atom : sssr.containers){ .... } ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:18 Message: Just checking, would it be possible to remove netbeans project files in nbproject/ ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-30 07:40 Message: Hi John, Thanks for the detailed review. Here is the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/SP_fingerprinter_updated_patch_with_reviewers_recommendations.patch a) I have made changes as recommended by you (most of therm). b) Let me know if it works at your end. c)I I will check the failed case once you are done with this patch. Just to clarify. Should the fingerprint report that they are subsets? Many thanks Asad ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:55 Message: Okay I found the problem. The super class test bug706786 fails as the fingerprint of one molecule is a subset of the other. I've attached the images of the molecules to this tracker. Original Bug: https://sourceforge.net/tracker/?func=detail&aid=706786&group_id=20024&atid=120024 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-08-30 16:33:21
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Settings changed) made by jwmay You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master Group: None Status: Open >Resolution: Accepted Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 09:32 Message: Signed off. Pull Request: https://github.com/egonw/cdk/pull/18 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:51 Message: Okay looks good. As you've said the failed tests case seem to be due to the new fingerprinter accounting for stereo parity in which case the entire substructure doesn't match. Am happy to sign this off. Some niggles but these could be fixed later: I think the hashing would work just as good if you use the raw integer values. If you look at the hashCode() for an Integer it is.: public int hashCode() { return value; } As the hash isn't anything special you could convert: for (Iterator<IAtomContainer> it = sssr.atomContainers().iterator(); it.hasNext();) { IAtomContainer ring = it.next(); int toHashCode = String.valueOf(ring.getAtomCount()).hashCode(); paths.add(patternIndex, toHashCode); patternIndex++; } into (see below for iterator removal): for (IAtomContainer ring : sssr.atomContainers()) { paths.add(patternIndex++, ring.getAtomCount()); } Say the atom count is 10. The hash code of "10".hashCode() is not 10 but it will always be the same as "10.hashCode()". You can therefore just use the integer 10. This could be the same for all numeric/enumerations. What really matters with the hash coding is that you combined correctly. public int compare(IAtom o1, IAtom o2) { if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } As the method needs an IAtom (which is also an IChemObject) then the method will throw a class cast exception even without the if statement. Explicitly adding this means the internal byte code looks like this (not really but you ge the gist): if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } If this is just to check for nulls then != null is a lot faster. I myself use instanceof too much, see why it's bad here: http://www.javapractices.com/topic/TopicAction.do?Id=31 For iterator can be replaced with foreach for (Iterator<IAtom> it = container.atoms().iterator(); it.hasNext();) { IAtom atom = it.next(); } is exactly the same as (but more code) for(IAtom atom : container.atoms()){ .... } for(IAtomContainer atom : sssr.containers){ .... } ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:18 Message: Just checking, would it be possible to remove netbeans project files in nbproject/ ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-30 07:40 Message: Hi John, Thanks for the detailed review. Here is the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/SP_fingerprinter_updated_patch_with_reviewers_recommendations.patch a) I have made changes as recommended by you (most of therm). b) Let me know if it works at your end. c)I I will check the failed case once you are done with this patch. Just to clarify. Should the fingerprint report that they are subsets? Many thanks Asad ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:55 Message: Okay I found the problem. The super class test bug706786 fails as the fingerprint of one molecule is a subset of the other. I've attached the images of the molecules to this tracker. Original Bug: https://sourceforge.net/tracker/?func=detail&aid=706786&group_id=20024&atid=120024 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |
From: SourceForge.net <no...@so...> - 2012-10-01 14:16:23
|
Patches item #3562570, was opened at 2012-08-28 18:51 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: master >Group: Accepted >Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Asad (asadrahman) Assigned to: Egon Willighagen (egonw) Summary: SP Fingerprint Initial Comment: Attached is the Shortest path based fingerprinter. more info: http://chembioinfo.com/ http://wp.me/JsHe Patch is here https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/shortest_path_fingerprint.patch ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2012-10-01 07:16 Message: OK, matching the requirements for master: one independent approval. I have additional comments, though (Do you prefer me to file them as bug reports too?): ---------------------------------------------------- * missing JavaDoc for some public classes and methods * missing test classes for SimpleAtomCanonicalisation and RandomNumber * SimpleAtomCanonicalisation and RandomNumber should be part of the fingerprint module and not standard, where they are used * add the missing commons-math-**.jar.meta file the content has version, name, copyright/license info for the library, like this one for xpp3-1.1.4c.jar: [xpp3-1.1.4c.jar] Library=XML Pull Parser Version=1.1.4c Copyright=(c) 2001 Extreme! Lab, Indiana University. All rights reserved. License=Indiana University Extreme! Lab Software License LicenseURL=http://www.extreme.indiana.edu/viewcvs/~checkout~/XPP3/LICENSE.txt Download=http://www.extreme.indiana.edu/xgws/xsoap/xpp/download/old_versions/ SourceCode=http://www.extreme.indiana.edu/xgws/xsoap/xpp/download/old_versions/ Homepage=http://www.extreme.indiana.edu/xgws/xsoap/xpp/ ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 09:32 Message: Signed off. Pull Request: https://github.com/egonw/cdk/pull/18 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:51 Message: Okay looks good. As you've said the failed tests case seem to be due to the new fingerprinter accounting for stereo parity in which case the entire substructure doesn't match. Am happy to sign this off. Some niggles but these could be fixed later: I think the hashing would work just as good if you use the raw integer values. If you look at the hashCode() for an Integer it is.: public int hashCode() { return value; } As the hash isn't anything special you could convert: for (Iterator<IAtomContainer> it = sssr.atomContainers().iterator(); it.hasNext();) { IAtomContainer ring = it.next(); int toHashCode = String.valueOf(ring.getAtomCount()).hashCode(); paths.add(patternIndex, toHashCode); patternIndex++; } into (see below for iterator removal): for (IAtomContainer ring : sssr.atomContainers()) { paths.add(patternIndex++, ring.getAtomCount()); } Say the atom count is 10. The hash code of "10".hashCode() is not 10 but it will always be the same as "10.hashCode()". You can therefore just use the integer 10. This could be the same for all numeric/enumerations. What really matters with the hash coding is that you combined correctly. public int compare(IAtom o1, IAtom o2) { if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } As the method needs an IAtom (which is also an IChemObject) then the method will throw a class cast exception even without the if statement. Explicitly adding this means the internal byte code looks like this (not really but you ge the gist): if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) { throw new ClassCastException(); } If this is just to check for nulls then != null is a lot faster. I myself use instanceof too much, see why it's bad here: http://www.javapractices.com/topic/TopicAction.do?Id=31 For iterator can be replaced with foreach for (Iterator<IAtom> it = container.atoms().iterator(); it.hasNext();) { IAtom atom = it.next(); } is exactly the same as (but more code) for(IAtom atom : container.atoms()){ .... } for(IAtomContainer atom : sssr.containers){ .... } ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 08:18 Message: Just checking, would it be possible to remove netbeans project files in nbproject/ ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-30 07:40 Message: Hi John, Thanks for the detailed review. Here is the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/SP_fingerprinter_updated_patch_with_reviewers_recommendations.patch a) I have made changes as recommended by you (most of therm). b) Let me know if it works at your end. c)I I will check the failed case once you are done with this patch. Just to clarify. Should the fingerprint report that they are subsets? Many thanks Asad ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:55 Message: Okay I found the problem. The super class test bug706786 fails as the fingerprint of one molecule is a subset of the other. I've attached the images of the molecules to this tracker. Original Bug: https://sourceforge.net/tracker/?func=detail&aid=706786&group_id=20024&atid=120024 ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:31 Message: Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p. The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class. Anyways the all new tests pass as expected. AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why.... J ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-08-30 03:21 Message: Hi Asad, It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments). Questions/Comments: StringBuffer/Synchronised Collections Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation. SimpleAtomCanonicalisation Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used. AtomTyping/Aromaticity Detection I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity. I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion. - ShortestPathFingerprinterTest should be in module test-fingerprint - commons-math*.jar needs to be added to test-fingerprint.libdepends - AtomComparator needs a cdk-module I think... Minor Coding Suggestions: AtomComparator: - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase() RandomNumber: - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead. - generateMersenneTwisterRandomNumber(int maximum) is never used ShortestPathFingerprinter: - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date. Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read. StringBuilder can take primitives in it's append method: - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount()); MFingerprintTests - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests Thanks, J ---------------------------------------------------------------------- Comment By: Asad (asadrahman) Date: 2012-08-29 04:46 Message: Kindly ignore the previous patch as its not complete. Please use the latest patch. https://github.com/downloads/asad/ShortestPathMoleculeFingerprinter/correct_code_for_shortest_path_fingerprint.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3562570&group_id=20024 |