From: Rajarshi G. <raj...@gm...> - 2011-02-12 17:04:03
Hi,

if a file format indicates that a set of atoms is aromatic but says nothing about the corresponding bonds, does this mean that a reader should not go ahead and mark the bonds as aromatic (even though both atoms in a bond may be explicitly noted as aromatic)?

My feeling is that this is the correct behavior, based on parsing only what is explicitly provided in a file format.

--
Rajarshi Guha
NIH Chemical Genomics Center
From: Egon W. <ego...@gm...> - 2011-02-12 17:55:42
On Sat, Feb 12, 2011 at 5:03 PM, Rajarshi Guha <raj...@gm...> wrote:
> Hi, if a file format indicates that a set of atoms are aromatic, but
> says nothing about corresponding bonds, does this mean that a reader
> should not go ahead and mark bonds as aromatic (even though both atoms
> in a bond may be explicitly noted as aromatic)? My feeling is that
> this is the correct behavior based on only parsing what is explicitly
> provided in a file format

This is one of those examples where it is handy to have that Appendix D around :) It could just be that the format description actually says that, for example, ring bonds between two aromatic atoms are aromatic too... who knows...

In general I try to stick as closely as possible to what the file provides, unless I know for sure how the format documentation expects readers to get more information (like the SMILES specification expecting readers to perceive aromaticity...)

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
From: gilleain t. <gil...@gm...> - 2011-02-12 18:08:31
Hi,

From a graph perspective, a set of vertices induces a subgraph; in other words, there is a set of bonds that could be marked aromatic based only on the aromatic atom information. Perhaps there could be an IO option to force this?

gilleain
From: Egon W. <ego...@gm...> - 2011-02-12 18:27:19
On Sat, Feb 12, 2011 at 6:08 PM, gilleain torrance <gil...@gm...> wrote:
> From a graph perspective, a set of vertices induces a subgraph - in
> other words, there is a set of bonds that could be marked aromatic
> based only on the aromatic atom information. Perhaps there could be an
> IO option to force this?

Not every bond between two aromatic bonds is aromatic itself... think biphenyl.

Egon
From: Egon W. <ego...@gm...> - 2011-02-12 18:27:51
On Sat, Feb 12, 2011 at 6:26 PM, Egon Willighagen <ego...@gm...> wrote:
> Not every bond between two aromatic bonds is aromatic itself... think biphenyl.

Ummm: Not every bond between two aromatic atoms is aromatic itself... think biphenyl. :)

Egon
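To make the biphenyl counterexample concrete, here is a small Python sketch of the IO option gilleain suggests: take the subgraph induced by the aromatic atoms and mark every bond in it. The toy representation (a list of per-atom aromatic flags plus bond tuples) and the function names are hypothetical, not CDK API. On biphenyl, where all twelve atoms are aromatic, the induced subgraph also picks up the bond joining the two rings, which is exactly the bond that should not be marked aromatic.

```python
def induced_bonds(aromatic_atoms, bonds):
    """Return the bonds whose two endpoints are both flagged aromatic,
    i.e. the edge set of the subgraph induced by the aromatic atoms."""
    return [(a, b) for (a, b) in bonds if aromatic_atoms[a] and aromatic_atoms[b]]


def six_ring(start):
    """Bonds of a six-membered ring over atoms start .. start+5."""
    return [(start + i, start + (i + 1) % 6) for i in range(6)]


# Biphenyl: two benzene rings (atoms 0-5 and 6-11) joined by the 5-6 bond.
aromatic = [True] * 12
bonds = six_ring(0) + six_ring(6) + [(5, 6)]
marked = induced_bonds(aromatic, bonds)
print(len(marked), (5, 6) in marked)  # 13 True: the inter-ring bond is marked too
```

The atom flags alone cannot tell the ring bonds from the bond between the rings, so an induced-subgraph option would need at least one more piece of information per bond.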
From: Rajarshi G. <raj...@gm...> - 2011-02-12 18:29:04
On Sat, Feb 12, 2011 at 1:26 PM, Egon Willighagen <ego...@gm...> wrote:
> Not every bond between two aromatic bonds is aromatic itself... think biphenyl.

Good point. I think it is best to stick with just marking atoms and let aromaticity perception deal with the bonds when required.

Egon, could you take a look at the HIN patch when you get a chance?
From: Egon W. <ego...@gm...> - 2011-02-12 18:34:44
On Sat, Feb 12, 2011 at 6:28 PM, Rajarshi Guha <raj...@gm...> wrote:
> Egon, could you take a look at the HIN patch when you get a chance?

I invite others to look at it. I'm in the process of updating Bioclipse to CDK 1.3.8 with CDK-JChemPaint 17. It takes more effort than I hoped, as a lot has changed, mostly on the Bioclipse side... but also the InChI stuff... and a new Signatures module, and I have not even started getting SMSD into Bioclipse... :(

Egon
From: Andrew D. <da...@da...> - 2011-02-13 15:26:12
On Feb 12, 2011, at 6:55 PM, Egon Willighagen wrote:
> In general I try to stick as close to what the file has to give,
> unless I know for sure how the format documentation expects reader to
> get more information (like the SMILES specification expecting readers
> to perceive aromaticity...)

Really? I pushed hard to make sure the OpenSMILES specification does not require aromaticity perception, but I haven't reviewed the entire spec for a while.

In my opinion the rule for reading a SMILES should be:

- if a bond is of unspecified type (e.g., the bond in 'cs')
- and both atoms are aromatic
- and the bond is in a ring
=> then the bond is aromatic; otherwise it's a single bond.

If both atoms are aromatic and the bond is a ring bond but the bond should be a single bond, then it is the responsibility of the writer to put an explicit "-" in that place, rather than to depend on aromaticity perception in the reader.

I believe this is sufficient to allow people to write simplified SMILES readers which only preserve input aromaticity. This would be useful in a database system where it can be assumed that the SMILES is valid, where fast conversion from SMILES to molecule is important, and where the aromaticity model may be defined by some external program.

Andrew
da...@da...
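A minimal Python sketch of the reading rule above, assuming a toy graph representation (per-atom aromatic flags and (atom, atom, order) bond tuples, with order None when no bond symbol was written) rather than any CDK or OpenSMILES API; all names are hypothetical. Ring membership is tested the simple way: a bond is a ring bond if its two atoms stay connected after the bond is removed (i.e. it is not a bridge). A real reader would more likely compute this once for all bonds with a linear-time bridge-finding algorithm instead of one BFS per bond.

```python
from collections import deque

AROMATIC, SINGLE = "aromatic", "single"


def reachable_without(n_atoms, bonds, skip, start, goal):
    """BFS: can `goal` be reached from `start` if bond number `skip` is removed?"""
    adj = [[] for _ in range(n_atoms)]
    for i, (a, b, _) in enumerate(bonds):
        if i != skip:
            adj[a].append(b)
            adj[b].append(a)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def assign_bond_types(aromatic_atoms, bonds):
    """Apply the rule: an unspecified bond is aromatic only if both atoms are
    aromatic and the bond is in a ring; otherwise it is single. Bonds with an
    explicit order (e.g. a written '-') are kept as given."""
    n = len(aromatic_atoms)
    types = []
    for i, (a, b, order) in enumerate(bonds):
        if order is not None:
            types.append(order)  # the writer spelled it out
            continue
        in_ring = reachable_without(n, bonds, i, a, b)
        if aromatic_atoms[a] and aromatic_atoms[b] and in_ring:
            types.append(AROMATIC)
        else:
            types.append(SINGLE)
    return types


def six_ring(start):
    """Unspecified bonds of a six-membered ring over atoms start .. start+5."""
    return [(start + i, start + (i + 1) % 6, None) for i in range(6)]


# Biphenyl written as c1ccccc1c1ccccc1: all atoms aromatic, no bond symbols.
aromatic = [True] * 12
bonds = six_ring(0) + six_ring(6) + [(5, 6, None)]
types = assign_bond_types(aromatic, bonds)
print(types.count(AROMATIC), types[-1])  # 12 single
```

Unlike gilleain's induced-subgraph idea, the ring-bond condition leaves biphenyl's inter-ring bond single without any "-" in the input; the explicit "-" is only needed for ring bonds between aromatic atoms that should be single.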
From: Egon W. <ego...@gm...> - 2011-02-13 15:29:46
On Sun, Feb 13, 2011 at 3:25 PM, Andrew Dalke <da...@da...> wrote:
> On Feb 12, 2011, at 6:55 PM, Egon Willighagen wrote:
>> In general I try to stick as close to what the file has to give,
>> unless I know for sure how the format documentation expects reader to
>> get more information (like the SMILES specification expecting readers
>> to perceive aromaticity...)
>
> Really? I pushed hard to make sure the OpenSMILES specification does
> not require aromaticity perception, but I haven't reviewed the entire
> spec for a while.

I said SMILES, not OpenSMILES :)

> I believe this is sufficient to allow people to write simplified
> SMILES readers which only preserve input aromaticity.

Indeed. It works until people start complaining... :/

BTW, the CDK SMILES parser allows this now, I think...

Egon
From: Andrew D. <da...@da...> - 2011-02-13 15:40:05
On Feb 13, 2011, at 4:29 PM, Egon Willighagen wrote:
> I said SMILES, not OpenSMILES :)

True. But then again, CDK doesn't exactly implement either SMILES or OpenSMILES. :)

I was curious, so I looked up the Daylight documentation. It says

  single and aromatic bonds may always be omitted

and

  Aromaticity must be deduced in a system such as SMILES

so what I said definitely is not part of SMILES.

But it also says:

  it is not necessary to enter any structure as aromatic if the
  user prefers to enter an aliphatic (Kekulé-like) structure

I take from that the implication that people enter structures, while my use case is for when structures are entered by other software.

Andrew
From: Egon W. <ego...@gm...> - 2011-02-13 15:45:40
On Sun, Feb 13, 2011 at 3:39 PM, Andrew Dalke <da...@da...> wrote:
> On Feb 13, 2011, at 4:29 PM, Egon Willighagen wrote:
>> I said SMILES, not OpenSMILES :)
>
> True. But then again, CDK doesn't exactly implement SMILES nor OpenSMILES. :)

It tries the first... bug reports welcome... I've seen so many for SMILES... we solved many known bugs, but I'd be the last to say there are none left.

(CDK community: if someone is interested in working on the parser... it doesn't have a maintainer.)

> I was curious so I looked up the Daylight documentation. It says
>
>   single and aromatic bonds may always be omitted
>
> and
>
>   Aromaticity must be deduced in a system such as SMILES
>
> so what I said definitely is not part of SMILES.
>
> But it also says:
>
>   it is not necessary to enter any structure as aromatic if the
>   user prefers to enter an aliphatic (Kekulé-like) structure
>
> I take from that the implication that people enter structures, while
> my use case is for when structures entered by other software.

Yeah, this kind of ambiguity is why I like OpenSMILES, as that project tries to iron out those missing details...

Then again, these things are exactly why I personally favor explicit formats. Just say what you mean. CML comes closest.

Egon
From: Andrew D. <da...@da...> - 2011-02-13 16:30:53
On Feb 13, 2011, at 4:45 PM, Egon Willighagen wrote:
>> True. But then again, CDK doesn't exactly implement SMILES nor OpenSMILES. :)
>
> It tries the first... bug reports welcome...

Oh, quite the contrary! I can see how what I wrote might imply there was a problem with the CDK, but I was thinking about how the CDK handles a wider set of aromatic atoms than Daylight, and how that's probably the correct thing to do.

Andrew