From: Ralf S. <ra...@ar...> - 2012-08-08 07:29:42
|
Hello, draw a propane molecule: H3C-CH2-CH3 and replace the middle C with a Cl or an H atom. The resulting molecule with Cl seems to be an allowed and connected molecule (also in MSketch), while the same with H triggers an exception on cleanup in JCP, probably because it's not recognized as connected by the ConnectivityChecker. My question: before I file a bug on this, I would like to know how H3C-H-CH3 should be treated: is it an allowed molecule, and connected? Because if it's not allowed as a connected entity (H bridges, DNA etc?) then the H placement should have separated the two halves already, contrary to H3C-Cl-CH3. Is there a consensus on this? Is H a special case? |
From: Egon W. <ego...@gm...> - 2012-08-08 07:59:22
|
On Wed, Aug 8, 2012 at 9:23 AM, Ralf Stephan <ra...@ar...> wrote:
> draw a propane molecule: H3C-CH2-CH3 and replace the middle C
> with a Cl or an H atom. The resulting molecule with Cl seems
> to be an allowed and connected molecule (also in MSketch), while
> the same with H triggers an exception on cleanup in JCP,

That is a bug. Bridged hydrogens are rare, and may be marked with red (which a 6-coordinate carbon should be too), but should not throw an exception...

> probably because it's not recognized as connected by the
> ConnectivityChecker.

What is the full stacktrace?

> My question: before I file a bug on this, I would like to know
> how H3C-H-CH3 should be treated: is it an allowed molecule,

Yes, chemically unreasonable though.

> and connected?

Yes. Changing the element type should normally not break bonds.

> Because if it's not allowed as a connected entity
> (H bridges, DNA etc?) then the H placement should have separated
> the two halves already, contrary to H3C-Cl-CH3.

I am not sure about the state of drawing of 2-electron-3-atom bonds, but from a data model perspective it is possible.

> Is there a consensus on this? Is H a special case?

The atom typer does not have a bridging hydrogen atom type, mostly because I have no clue what atom type properties to give it, and whether it requires fixing of algorithms... But JCP should be flexible about it...

These are just initial comments... let's continue to explore what is going on... I think JCP should not throw stacktraces with that molecule, and I would like to learn about the cause...

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: <ra...@ar...> - 2012-08-08 08:06:27
|
On Wed, Aug 08, 2012 at 09:58:55AM +0200, Egon Willighagen wrote:
> But, JCP should be flexible about it...

So, it should not separate the molecule?

> These are just initial comments... let's continue to explore what is
> going on... I think JCP should not throw stacktraces with that
> molecule, and I like to learn about the cause...

It's not so difficult: your code in StructureDiagramGenerator.java:258 explicitly removes all Hs before starting a layout. So, a cleanup unexpectedly leaves a separated molecule. |
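To make the failure mode concrete, here is a minimal sketch of why stripping every hydrogen splits H3C-H-CH3 but not H3C-Cl-CH3. It uses a toy graph (symbol array plus bond index pairs), not the CDK data model; the class and method names are hypothetical, and the methyl hydrogens are left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy molecular graph for illustration only; this is NOT the CDK API. */
final class HydrogenStripDemo {

    /** Counts connected fragments among the atoms still marked as kept. */
    static int countFragments(int atomCount, int[][] bonds, boolean[] kept) {
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) adjacency.add(new ArrayList<>());
        for (int[] bond : bonds) {
            // A bond only survives if both of its atoms survive.
            if (kept[bond[0]] && kept[bond[1]]) {
                adjacency.get(bond[0]).add(bond[1]);
                adjacency.get(bond[1]).add(bond[0]);
            }
        }
        boolean[] seen = new boolean[atomCount];
        int fragments = 0;
        for (int start = 0; start < atomCount; start++) {
            if (!kept[start] || seen[start]) continue;
            fragments++;
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            seen[start] = true;
            while (!stack.isEmpty()) {
                int atom = stack.pop();
                for (int neighbour : adjacency.get(atom)) {
                    if (!seen[neighbour]) {
                        seen[neighbour] = true;
                        stack.push(neighbour);
                    }
                }
            }
        }
        return fragments;
    }

    /** Mimics "remove all Hs before layout": every hydrogen is dropped. */
    static boolean[] stripAllHydrogens(String[] symbols) {
        boolean[] kept = new boolean[symbols.length];
        for (int i = 0; i < symbols.length; i++) kept[i] = !symbols[i].equals("H");
        return kept;
    }

    static boolean[] keepAll(int atomCount) {
        boolean[] kept = new boolean[atomCount];
        Arrays.fill(kept, true);
        return kept;
    }

    public static void main(String[] args) {
        int[][] bonds = {{0, 1}, {1, 2}};          // X in the middle: C-X-C
        String[] bridgedH = {"C", "H", "C"};
        String[] bridgedCl = {"C", "Cl", "C"};
        System.out.println(countFragments(3, bonds, keepAll(3)));                    // 1
        System.out.println(countFragments(3, bonds, stripAllHydrogens(bridgedH)));   // 2
        System.out.println(countFragments(3, bonds, stripAllHydrogens(bridgedCl)));  // 1
    }
}
```

The bridging hydrogen is the only path between the two carbons, so a layout step that assumes a single connected fragment would see two after the strip.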
From: Nina J. <jel...@gm...> - 2012-08-08 08:20:29
|
On 8 August 2012 11:00, <ra...@ar...> wrote:
> On Wed, Aug 08, 2012 at 09:58:55AM +0200, Egon Willighagen wrote:
> > But, JCP should be flexible about it...
>
> So, it should not separate the molecule?
>
> > These are just initial comments... let's continue to explore what is
> > going on... I think JCP should not throw stacktraces with that
> > molecule, and I like to learn about the cause...
>
> It's not so difficult: your code in StructureDiagramGenerator.java:258
> explicitly removes all Hs before starting a layout. So, a cleanup
> unexpectedly leaves a separated molecule.

Do we have means in the data model to distinguish H bridges and usual covalent bonds? I am guessing this is also related to my recent report on InChIs and fingerprints performance.

Nina |
From: Egon W. <ego...@gm...> - 2012-08-08 08:22:53
|
On Wed, Aug 8, 2012 at 10:20 AM, Nina Jeliazkova <jel...@gm...> wrote:
> Do we have means in the data model to distinguish H bridges and usual
> covalent bonds? I am guessing this is also related to my recent report on
> InChIs and fingerprints performance.

No, not at this moment, but I am more than happy to mentally support the development of a patch against master for this! There is at least one failing unit test related to the lack of it.

Egon |
From: Egon W. <ego...@gm...> - 2012-08-08 08:21:59
|
On Wed, Aug 8, 2012 at 10:00 AM, <ra...@ar...> wrote:
> On Wed, Aug 08, 2012 at 09:58:55AM +0200, Egon Willighagen wrote:
>> But, JCP should be flexible about it...
>
> So, it should not separate the molecule?

I am not entirely sure, but which of the two bonds should it break then? And what would be the cause of the bond breaking? A charge on the carbon, or a radical? If the edit action was just to change the element, I would personally not touch the bonding...

>> These are just initial comments... let's continue to explore what is
>> going on... I think JCP should not throw stacktraces with that
>> molecule, and I like to learn about the cause...
>
> It's not so difficult: your code in StructureDiagramGenerator.java:258

That class is not my code, or not originally anyway, though I may have made changes to it, to update for API changes, etc. :)

> explicitly removes all Hs before starting a layout. So, a cleanup
> unexpectedly leaves a separated molecule.

Ah! The culprit... yeah, obviously it should not remove bridged hydrogens... I remember somewhere, once, I wrote code to remove hydrogens except for such hydrogens...

Have a look at AtomContainerManipulator.removeHydrogensPreserveMultiplyBonded()

Egon |
From: <ra...@ar...> - 2012-08-08 08:35:22
|
On Wed, Aug 08, 2012 at 10:21:12AM +0200, Egon Willighagen wrote:
> Ah! The culprit... yeah, obviously it should not remove bridged
> hydrogens... I remember somewhere, once, I wrote code to remove
> hydrogens except for such hydrogens...
>
> Have a look at AtomContainerManipulator.removeHydrogensPreserveMultiplyBonded()

The patch should be easy then. Thanks! |
From: John M. <joh...@gm...> - 2012-08-08 16:02:07
|
Maybe I'm being dumb, but what's a hydrogen bridge/bridging hydrogen? Google's not helping; are you referring to hydrogen bonds?

From Ralf's patch the problem seems to occur only when you clean the structure? It would be good not to crash, but introducing specific exceptions to the rules about removing hydrogens is probably going to cause more headaches/confusion later on. I'd take the pragmatic approach: two covalent bonds on a hydrogen = you're doing something very wrong = boundary case = report error to user.

On a side note - Chris was trying to convince an intern to look into improving the structure diagram generator last year but unfortunately they looked at it and decided to wrestle with NW Chem (fortran) instead.

Out of interest, does JChemPaint use the same SDG implementation? I tend to avoid the current CDK implementation for "cleaning" as it was clobbering the stereo centres.

J |
From: <ra...@ar...> - 2012-08-08 17:23:43
|
On Wed, Aug 08, 2012 at 05:01:50PM +0100, John May wrote:
> From Ralf's patch the problem seems to occur only when you clean the structure? It would be good not to crash but when you start introducing specific exceptions to rules about removing hydrogens is probably going to cause more headaches/confusion later on.

Yes, only with cleanup.

> I'd take the pragmatic approach: two covalent bonds on a hydrogen = you're doing something very wrong = boundary case = report error to user.

John, that's why I asked. But you should be aware that MSketch has no problem with such molecules either, so, pragmatically... what about other programs?

> On a side note - Chris was trying to convince an intern to look into improving the structure diagram generator last year but unfortunately they looked at it and decided to wrestle with NW Chem (fortran) instead.

So that it would be faster than Java, duh.

> Out of interest does the JChemPaint use the same SDG implementation? I tend to avoid the current CDK implementation for "cleaning" as it was clobbering the stereo centres.

Is there an alternative? |
From: John M. <joh...@gm...> - 2012-08-08 18:17:00
|
Firstly I'd just like to add that I'm very glad you're patching up JChemPaint :-).

On 8 Aug 2012, at 18:17, ra...@ar... wrote:
> On Wed, Aug 08, 2012 at 05:01:50PM +0100, John May wrote:
>> From Ralf's patch the problem seems to occur only when you clean the structure? It would be good not to crash but when you start introducing specific exceptions to rules about removing hydrogens is probably going to cause more headaches/confusion later on.
>
> Yes only with cleanup.
>
>> I'd take the pragmatic approach: two covalent bonds on a hydrogen = you're doing something very wrong = boundary case = report error to user.
>
> John, that's why I asked. But you should be aware that MSketch
> has neither a problem with such molecules, so, pragmatically...
> what about other programs?

It's difficult for sure but I wouldn't always look to other programs to see what's correct. I'd still say there are two separate issues: the real crux is that an exception is being thrown when the code fails to clean an invalid molecule. The second (less important) issue is handling this specific case.

How would you handle a case like this? If you introduce this new "condition", what happens to "C-H-H-H-H"? It will now remove 1 to 4 of the hydrogens depending on which one you start at. Okay, so another conditional is added to account for this, then another… ad infinitum. The more cases you account for, the more bloated the algorithm becomes, the slower it becomes and the harder it becomes to maintain.

>> On a side note - Chris was trying to convince an intern to look into improving the structure diagram generator last year but unfortunately they looked at it and decided to wrestle with NW Chem (fortran) instead.
>
> So that it would be faster than Java, duh.

Not always… but that was more a comment that the intern didn't work on the diagram generation - not really relevant, sorry.

>> Out of interest does the JChemPaint use the same SDG implementation? I tend to avoid the current CDK implementation for "cleaning" as it was clobbering the stereo centres.
>
> Is there an alternative?

Not that I know of. I tend to work on mol/cml compounds so don't normally need to generate structures. Any suggestions are welcome. I guess it's possible to calculate the rotations before cleaning then flip any wedge/hatch bonds after cleaning if the rotation has changed. I will also add this for reference: http://www.steinbeck-molecular.de/steinblog/index.php/2007/08/14/structure-diagram-generation-sdg-2d-layout-in-the-chemistry-development-kit-part-1/ |
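The "1 to 4 hydrogens" observation for C-H-H-H-H can be reproduced with a small sketch. It assumes a removal rule of the form "drop a hydrogen unless it currently has two or more bonds", evaluated against the graph as it is being edited; that rule, the toy graph representation and all names are illustrative assumptions, not the patch under discussion or CDK code. The second method shows one way to make the result independent of visiting order: decide from a snapshot of the original bond counts.

```java
/** Illustrative only: toy graph, hypothetical rule, not CDK code. */
final class RemovalOrderDemo {

    /** Number of bonds on an atom, ignoring bonds to already removed atoms. */
    static int liveDegree(int atom, int[][] bonds, boolean[] removed) {
        int degree = 0;
        for (int[] bond : bonds) {
            if (removed[bond[0]] || removed[bond[1]]) continue;
            if (bond[0] == atom || bond[1] == atom) degree++;
        }
        return degree;
    }

    /** Greedy variant: earlier deletions change the degree seen by later atoms. */
    static int greedyRemove(String[] symbols, int[][] bonds, int[] visitOrder) {
        boolean[] removed = new boolean[symbols.length];
        int count = 0;
        for (int atom : visitOrder) {
            if (!symbols[atom].equals("H")) continue;
            if (liveDegree(atom, bonds, removed) <= 1) {
                removed[atom] = true;
                count++;
            }
        }
        return count;
    }

    /** Snapshot variant: decide from the original degrees, so order is irrelevant. */
    static int snapshotRemove(String[] symbols, int[][] bonds, int[] visitOrder) {
        int[] degree = new int[symbols.length];
        for (int[] bond : bonds) {
            degree[bond[0]]++;
            degree[bond[1]]++;
        }
        int count = 0;
        for (int atom : visitOrder) {
            if (symbols[atom].equals("H") && degree[atom] <= 1) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] symbols = {"C", "H", "H", "H", "H"};            // C-H-H-H-H
        int[][] bonds = {{0, 1}, {1, 2}, {2, 3}, {3, 4}};
        int[] forward = {1, 2, 3, 4};
        int[] backward = {4, 3, 2, 1};
        System.out.println(greedyRemove(symbols, bonds, forward));    // 1
        System.out.println(greedyRemove(symbols, bonds, backward));   // 4
        System.out.println(snapshotRemove(symbols, bonds, forward));  // 1
        System.out.println(snapshotRemove(symbols, bonds, backward)); // 1
    }
}
```

Deciding on a snapshot costs one extra pass over the bonds but avoids adding a new special case for each chain length.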
From: Egon W. <ego...@gm...> - 2012-08-09 06:05:40
|
On Wed, Aug 8, 2012 at 6:01 PM, John May <joh...@gm...> wrote:
> Maybe I'm being dumb, but what's a hydrogen bridge/bridging hydrogen?
> Google's not helping are you referring to hydrogen bonds?

E.g. http://en.wikipedia.org/wiki/Diborane

2-electron-3-atom bonds ...

It doesn't typically happen in carbon chemistry, but it is known chemistry.

> I'd take the pragmatic approach: two covalent bonds on a hydrogen = you're
> doing something very wrong = boundary case = report error to user.

Yes, but not a stacktrace... for JChemPaint there is a visual indication that the atom type is not known... it will show a red line under the element symbol, to indicate this error. A stacktrace is not useful.

> On a side note - Chris was trying to convince an intern to look into
> improving the structure diagram generator last year but unfortunately they
> looked at it and decided to wrestle with NW Chem (fortran) instead.
> Out of interest does the JChemPaint use the same SDG implementation? I tend
> to avoid the current CDK implementation for "cleaning" as it was clobbering
> the stereo centres.

Yes... unless Mark and Stefan forked that too...

Egon |
From: <ra...@ar...> - 2012-08-09 06:33:08
|
On Wed, Aug 08, 2012 at 07:16:49PM +0100, John May wrote:
> It's difficult for sure but I wouldn't always look to other programs to see what's correct.

Looking to other programs may give a hint as to why the case is accepted (if we had a half bond, CH3...H...H3C would be valid, so there is no point disallowing CH3-H-H3C, as that would be unexpected for the user).

> How would you handle a case like this. If you introduce this new "condition" what happens…?
>
> "C-H-H-H-H"

Heh, I could now state in the same vein, C-H-H-H-H does not happen, so no need to complicate the implementation. But I have no problems accepting that my algorithm is wrong, so please expect a further patch in the pull request. |
From: Egon W. <ego...@gm...> - 2012-08-09 06:46:05
|
On Thu, Aug 9, 2012 at 8:27 AM, <ra...@ar...> wrote:
> On Wed, Aug 08, 2012 at 07:16:49PM +0100, John May wrote:
>> It's difficult for sure but I wouldn't always look to other programs to see what's correct.
>
> Looking to other programs may give a hint as to why the case is
> accepted (if we had a half bond CH3...H...H3C would be valid, so
> there is no point disallowing CH3-H-H3C, as that would be unexpected
> for the user).
>
>> How would you handle a case like this. If you introduce this new "condition" what happens…?
>>
>> "C-H-H-H-H"
>
> Heh, I could now state in the same vein, C-H-H-H-H does not happen,

Neither does this happen: C-O-O-O-O. Yet, from an atom type perspective, it is perfectly fine, even with the current CDK... At this moment, JChemPaint does not have means to identify unlikely substructures, only unknown atom types...

Egon |
From: John M. <joh...@gm...> - 2012-08-09 07:55:03
|
On 9 Aug 2012, at 07:05, Egon Willighagen <ego...@gm...> wrote:
> E.g. http://en.wikipedia.org/wiki/Diborane
>
> 2-electron-3-atom bonds ...
>
> It doesn't typically happen for carbon chemistry, but known chemistry.

Crazy!

> Yes... unless Mark and Stefan forked that too…

Don't think so...

> so please expect a further patch in the pull request.

Does cdk-sdg have a dependency on cdk-standard? If so you should patch it to use Egon's suggestion:

> Have a look at AtomContainerManipulator.removeHydrogensPreserveMultiplyBonded()
> http://pele.farmbio.uu.se/nightly/api/org/openscience/cdk/tools/manipulator/AtomContainerManipulator.html#removeHydrogensPreserveMultiplyBonded(org.openscience.cdk.interfaces.IAtomContainer)

It might also be possible to remove the shallow copy, as the manipulator method will shallow copy anyway (you may need an IMolecule though). The removeHydrogensPreserveMultiplyBonded method might need some small changes though. It doesn't suffer from the order problem of H-H-H, but it's doing a linear search for each atom.

J |
From: <ra...@ar...> - 2012-08-09 10:17:17
|
On Thu, Aug 09, 2012 at 08:54:52AM +0100, John May wrote:
> It might also be possible to remove the shallow copy as the manipulator method will shallow copy anyways (you may need an IMolecule though). The removeHydrogensPreserveMultiplyBonded method might need some small changes though. I doesn't suffer from the order problem of H-H-H but it's doing a linear search for each atom.

Um, what's bothering me is what becomes of C-H-H? Will it then be C-H2? We can't just let the atom disappear, can we?

If so, then the algorithm should only remove Hs that become implicit later. |
From: John M. <joh...@gm...> - 2012-08-09 10:37:27
|
Tricky...

> Um, what's bothering me is what becomes of C-H-H? will it then be C-H2?
> We can't just let the atom disappear, can we?

Yeah, as you say, I don't think you can let stuff disappear - in which case I would think C-H-H should stay as C-H-H. Interestingly '-C-H-H' doesn't break on the JChemPaint version I have and actually lays it out. I guess this is because the end hydrogen is removed and then the next one.

> If so, then the algorithm should only remove Hs that become implicit
> later.

Yep, from what I understand the H removal is to make the layout of the backbone easier. How about only removing hydrogens that have a single bond to a non-H atom? Would that work?

J |
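As a rough sketch of how the two candidate rules differ, the following compares "keep only bridging hydrogens" (every hydrogen with fewer than two bonds is removable) with the rule proposed above, read here as "exactly one bond, and that bond goes to a non-H atom". That reading is an assumption, as are the toy graph representation and all names; neither method is the CDK implementation of removeHydrogensPreserveMultiplyBonded.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative comparison of two hydrogen-removal rules on a toy graph (not CDK code). */
final class HydrogenRuleDemo {

    /** Rule A: a hydrogen is removable unless it has two or more bonds. */
    static List<Integer> removableUnlessBridging(String[] symbols, int[][] bonds) {
        int[] degree = new int[symbols.length];
        for (int[] bond : bonds) {
            degree[bond[0]]++;
            degree[bond[1]]++;
        }
        List<Integer> removable = new ArrayList<>();
        for (int i = 0; i < symbols.length; i++) {
            if (symbols[i].equals("H") && degree[i] < 2) removable.add(i);
        }
        return removable;
    }

    /** Rule B: removable only with exactly one bond, and that bond to a non-H atom. */
    static List<Integer> removableSingleBondToHeavyAtom(String[] symbols, int[][] bonds) {
        int[] degree = new int[symbols.length];
        boolean[] bondedToHydrogen = new boolean[symbols.length];
        // One pass over the bonds collects both the degree and the neighbour kind.
        for (int[] bond : bonds) {
            degree[bond[0]]++;
            degree[bond[1]]++;
            if (symbols[bond[1]].equals("H")) bondedToHydrogen[bond[0]] = true;
            if (symbols[bond[0]].equals("H")) bondedToHydrogen[bond[1]] = true;
        }
        List<Integer> removable = new ArrayList<>();
        for (int i = 0; i < symbols.length; i++) {
            if (symbols[i].equals("H") && degree[i] == 1 && !bondedToHydrogen[i]) {
                removable.add(i);
            }
        }
        return removable;
    }

    public static void main(String[] args) {
        String[] chain = {"C", "H", "H"};                 // C-H-H
        int[][] chainBonds = {{0, 1}, {1, 2}};
        System.out.println(removableUnlessBridging(chain, chainBonds));        // [2]
        System.out.println(removableSingleBondToHeavyAtom(chain, chainBonds)); // []

        String[] dihydrogen = {"H", "H"};                 // H-H
        int[][] dihydrogenBonds = {{0, 1}};
        System.out.println(removableUnlessBridging(dihydrogen, dihydrogenBonds));        // [0, 1]
        System.out.println(removableSingleBondToHeavyAtom(dihydrogen, dihydrogenBonds)); // []
    }
}
```

Under rule A, C-H-H loses its terminal hydrogen and H-H vanishes entirely; under rule B both stay as drawn, while an ordinary C-H hydrogen is still removed by either rule.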
From: Egon W. <ego...@gm...> - 2012-08-09 11:38:08
|
On Thu, Aug 9, 2012 at 12:37 PM, John May <joh...@gm...> wrote:
> Yep, from what I understand the H removal is to make the layout of the backbone easier.

IIRC, the SDG is an algorithm that works on a hydrogen-depleted graph... but ask Chris for a more definite answer...

> How about only remove hydrogens that have a single bond to a non-H atom? Would that work?

Probably, and that is the reason why that other removeXXX() method exists...

Egon |