From: Egon W. <ego...@gm...> - 2010-04-17 11:40:44
|
Hi Vincent,

On Wed, Mar 24, 2010 at 1:34 PM, Vincent Le Guilloux <vin...@un...> wrote:
> - When a Group Abbreviation is used for a molecule, well it seems the
> CDK set this abbreviation as atomic symbol when writing back the
> molecule to MDL format, eg:

Did someone reply to this yet? Like the author of the group support?

CDK developers: who wrote the functionality? Can he or she please comment on the bug report?

Egon

--
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
From: Egon W. <ego...@gm...> - 2010-04-19 10:54:42
|
On Mon, Apr 19, 2010 at 11:42 AM, Vincent Le Guilloux <vin...@gm...> wrote:
> When the CDK loads a molecule with the aromatic bond flag set to 4, the bond
> order is set to 1. For example, if I load benzene with the aromatic flag set
> to 4 and write the molecule back, I get cyclohexane. I think this is
> quite an important issue that should be fixed asap.

That situation was recently discussed on this mailing list. The molfile specification does not have a bond order 4, just 1-3. Many programs, however, write molfile query structures, where 4 indicates an aromatic bond, but it is not a bond order.

Now, the CDK currently does not have an unknown bond order (we need a patch for that), so upon reading it (incorrectly?) defaults to SINGLE... causing the cyclohexane... Now, please let us know what software created the incorrect MDL molfile, and please file a bug report against that project...

Otherwise, use the DeduceBondOrder (or so) class to assign proper bond orders...

Egon
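To make the distinction between bond orders and query bond types concrete, here is a minimal sketch of how the bond type field of a V2000 bond line could be classified. It is not CDK code: the class and method names are hypothetical, and it assumes the fixed-width CTfile layout in which the bond type occupies columns 7-9 of the bond line, with codes 1-3 for real orders and 4-8 for query types.

```java
/** Hypothetical helper, not part of the CDK: classifies the bond type field of a V2000 bond line. */
final class BondLineSketch {

    /** Reads the three-column bond type field (columns 7-9) of a bond line. */
    static int bondType(String bondLine) {
        return Integer.parseInt(bondLine.substring(6, 9).trim());
    }

    /** True only for types that are real bond orders (1-3). */
    static boolean isRealOrder(int type) {
        return type >= 1 && type <= 3;
    }

    /** Names the bond type codes; 4-8 are meant for query structures only. */
    static String describe(int type) {
        switch (type) {
            case 1: return "SINGLE";
            case 2: return "DOUBLE";
            case 3: return "TRIPLE";
            case 4: return "AROMATIC (query)";
            case 5: return "SINGLE_OR_DOUBLE (query)";
            case 6: return "SINGLE_OR_AROMATIC (query)";
            case 7: return "DOUBLE_OR_AROMATIC (query)";
            case 8: return "ANY (query)";
            default: return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        // A bond between atoms 1 and 2 written with bond type 4.
        String aromaticLine = "  1  2  4  0";
        int type = bondType(aromaticLine);
        System.out.println(describe(type) + ", real order: " + isRealOrder(type));
    }
}
```

A reader built along these lines would at least know that type 4 carries no bond order, instead of silently mapping it to SINGLE.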
From: Vincent Le G. <vin...@gm...> - 2010-04-19 12:54:34
|
2010/4/19 Egon Willighagen <ego...@gm...>
> That situation was recently discussed on this mailing list.

Yes indeed, my fault... I forgot (yet I was the topic starter as far as I remember) :)

> The molfile specification does not have a bond order 4, just 1-3. Many
> programs, however, write molfile query structures, where 4 indicates an
> aromatic bond, but it is not a bond order.

Yep.

> Now, the CDK currently does not have an unknown bond order (we need a
> patch for that), so upon reading it (incorrectly?) defaults to
> SINGLE... causing the cyclohexane... Now, please let us know what
> software created the incorrect MDL molfile, and please file a bug
> report against that project...

It's not really a software-specific problem (at least not in my experience). I've encountered various situations: few people know each chemical format in detail, and even fewer know how the various programs handle it. In my (short and modest) experience, I've already seen modelers / chemists who aromatize molecules (using Marvin for example: if you aromatize a molecule and save it as SDF, the bond flag will be set to 4), and SDF files sent by chemical providers which contain aromatized molecules...

In such cases, even if structures should not be defined using this flag, setting the bond order to 1 is clearly dangerous, and all the more dangerous because it is done silently. I guess that, until an appropriate flag is defined, DeduceBondOrder should be used by default (along with a proper warning) in the reader when these specific cases are encountered.

> Otherwise, use the DeduceBondOrder (or so) class to assign proper bond
> orders...
From: Nina J. <ni...@ac...> - 2010-04-20 10:56:14
|
Egon, All,

Egon Willighagen wrote:
> That situation was recently discussed on this mailing list.

Yes indeed.

> The molfile specification does not have a bond order 4, just 1-3. Many
> programs, however, write molfile query structures, where 4 indicates an
> aromatic bond, but it is not a bond order.

This is correct, and still - wouldn't it be a better approach (more user friendly) to interpret bond order 4 as aromatic - or at least to have this as optional behaviour in the MolReader code?

> Now, the CDK currently does not have an unknown bond order (we need a
> patch for that), so upon reading it (incorrectly?) defaults to
> SINGLE... causing the cyclohexane... Now, please let us know what
> software created the incorrect MDL molfile, and please file a bug
> report against that project...
>
> Otherwise, use the DeduceBondOrder (or so) class to assign proper bond orders...

Unfortunately no code will be able to guess benzene from cyclohexane ...

Best regards,
Nina
From: Egon W. <ego...@gm...> - 2010-04-20 11:00:45
|
On Tue, Apr 20, 2010 at 12:56 PM, Nina Jeliazkova <ni...@ac...> wrote:
>> The molfile specification does not have a bond order 4, just 1-3. Many
>> programs, however, write molfile query structures, where 4 indicates an
>> aromatic bond, but it is not a bond order.
>
> This is correct, and still - wouldn't it be a better approach (more user
> friendly) to interpret bond order 4 as aromatic - or at least to have this as
> optional behaviour in the MolReader code?

If I'm not mistaken, the RELAXED mode is doing this... but I would need to check...

>> Otherwise, use the DeduceBondOrder (or so) class to assign proper bond
>> orders...
>
> Unfortunately no code will be able to guess benzene from cyclohexane ...

True... but the data model in the file is not really cyclohexane... the lack of explicit hydrogens ensures that this works...

Egon
From: Nina J. <ni...@ac...> - 2010-04-20 11:15:15
|
Egon Willighagen wrote:
> On Tue, Apr 20, 2010 at 12:56 PM, Nina Jeliazkova <ni...@ac...> wrote:
>> This is correct, and still - wouldn't it be a better approach (more user
>> friendly) to interpret bond order 4 as aromatic - or at least to have this as
>> optional behaviour in the MolReader code?
>
> If I'm not mistaken, the RELAXED mode is doing this... but I would need to check...

Ah, good to know!

Nina
From: Vincent Le G. <vin...@gm...> - 2010-04-20 11:22:12
|
2010/4/20 Egon Willighagen <ego...@gm...>
> On Tue, Apr 20, 2010 at 12:56 PM, Nina Jeliazkova <ni...@ac...> wrote:
> > This is correct, and still - wouldn't it be a better approach (more user
> > friendly) to interpret bond order 4 as aromatic - or at least to have this
> > as optional behaviour in the MolReader code?
>
> If I'm not mistaken, the RELAXED mode is doing this... but I would need to
> check...

I didn't know the RELAXED mode.

But I'm thinking: if there is a RELAXED mode allowing such bonds, then the STRICT one (which is the one we are talking about?) should raise an error when it meets the aromatic flag, shouldn't it? Also, even if it reads the aromatic flag and sets the corresponding CDK aromatic property, what about bond orders? In particular, what would the output SDF be in such a case? I'm going to run some tests on that today if I have time :)

> > Unfortunately no code will be able to guess benzene from cyclohexane ...
>
> True... but the data model in the file is not really cyclohexane...

Once the molecule has been loaded by the current reader, it becomes cyclohexane (all bond orders are set to 1). There are only two possibilities, I think:

1. Allow such a bond representation (aromatic = no explicit order defined) in the CDK molecular model, along with a proper bond order definition.
2. Use a DeduceBondOrder-like algorithm by default (during the loading process...) when such a flag is encountered.

> the lack of explicit hydrogens ensures that this works...

This means that I can get the benzene back if I use DeduceBondOrder after reading and before adding hydrogens (if I understand well). But then the problems would be:

1. One doesn't know when to use DeduceBondOrder, as the aromatic flag is silently read (and converted) by the reader.
2. It's still quite dangerous: after reading, the molecule is transformed. So we clearly have to use Hadder in the right order to get the molecule back.

Anyway, I will try out the RELAXED mode just to see how it handles these issues :)
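The second possibility above (deducing bond orders while loading) can be sketched as a small backtracking search. This is not the CDK's DeduceBondOrder tool, and the class and method names are hypothetical. It also makes a strong simplifying assumption: every aromatic atom needs exactly one double bond, which holds for the carbons of benzene but not for, say, a pyrrole-type nitrogen.

```java
import java.util.Arrays;

/** Hypothetical sketch, not the CDK's DeduceBondOrder tool: Kekule assignment by backtracking. */
final class KekuleSketch {

    /**
     * @param atomCount number of aromatic atoms, indexed 0..atomCount-1
     * @param bonds     aromatic bonds as {atomA, atomB} pairs
     * @return per-bond flags (true = double), or null if no alternating assignment exists
     */
    static boolean[] assignDoubleBonds(int atomCount, int[][] bonds) {
        boolean[] isDouble = new boolean[bonds.length];
        boolean[] matched = new boolean[atomCount];
        return solve(0, atomCount, bonds, isDouble, matched) ? isDouble : null;
    }

    private static boolean solve(int from, int atomCount, int[][] bonds,
                                 boolean[] isDouble, boolean[] matched) {
        // Find the first atom that still lacks a double bond.
        int atom = from;
        while (atom < atomCount && matched[atom]) {
            atom++;
        }
        if (atom == atomCount) {
            return true; // every atom has exactly one double bond
        }
        // Try each bond from this atom to another atom that is still unmatched.
        for (int i = 0; i < bonds.length; i++) {
            int a = bonds[i][0];
            int b = bonds[i][1];
            if (a != atom && b != atom) {
                continue;
            }
            int other = (a == atom) ? b : a;
            if (matched[other]) {
                continue;
            }
            isDouble[i] = matched[atom] = matched[other] = true;
            if (solve(atom + 1, atomCount, bonds, isDouble, matched)) {
                return true;
            }
            isDouble[i] = matched[atom] = matched[other] = false; // backtrack
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] benzene = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}};
        System.out.println(Arrays.toString(assignDoubleBonds(6, benzene)));
    }
}
```

Returning null when no assignment exists gives the caller a natural place to emit the warning asked for earlier in the thread, rather than writing out an all-single-bond ring.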
From: Egon W. <ego...@gm...> - 2010-04-20 11:28:44
|
On Tue, Apr 20, 2010 at 1:22 PM, Vincent Le Guilloux <vin...@gm...> wrote:
>> If I'm not mistaken, the RELAXED mode is doing this... but I would need to
>> check...
>
> I didn't know the RELAXED mode.
>
> But I'm thinking: if there is a RELAXED mode allowing such bonds, then the
> STRICT one (which is the one we are talking about?) should raise an error
> when it meets the aromatic flag, shouldn't it?

Indeed.

Just checked, but it is not doing this yet in MDLV2000Reader:

    } else if (order == 4) {
        // aromatic bond
        newBond = molecule.getBuilder().newInstance(IBond.class, a1, a2,
            IBond.Order.SINGLE, stereo);
        // mark both atoms and the bond as aromatic
        newBond.setFlag(CDKConstants.ISAROMATIC, true);
        a1.setFlag(CDKConstants.ISAROMATIC, true);
        a2.setFlag(CDKConstants.ISAROMATIC, true);
    } else {
        throw new CDKException(
            "Detected 'query bond type ' (value=" + order + ")." +
            " Could not create regular molecule.");
    }

So such bonds are flagged as aromatic by default... but also note the explicit setting of IBond.Order.SINGLE... need to fix that... Will post an RFC...

> Also, even if it reads the aromatic flag and sets the corresponding CDK
> aromatic property, what about bond orders? In particular, what would the
> output SDF be in such a case? I'm going to run some tests on that today if
> I have time :)

Cool!

Egon
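For comparison, a minimal sketch of the behaviour the thread is asking for: STRICT rejects bond type 4, while RELAXED keeps the aromatic flag without claiming the order is SINGLE. This is not the reader's actual code. All types here (Mode, Order, Bond) are hypothetical stand-ins, and UNSET represents the "unknown bond order" that the CDK did not yet have at the time.

```java
/** Hypothetical sketch, not CDK code: one way a reader could treat bond type 4 per mode. */
final class AromaticBondPolicy {

    enum Mode { STRICT, RELAXED }

    /** UNSET stands in for an "unknown bond order" value. */
    enum Order { SINGLE, DOUBLE, TRIPLE, UNSET }

    /** Minimal stand-in for a bond: an order plus an aromaticity flag. */
    static final class Bond {
        final Order order;
        final boolean aromatic;

        Bond(Order order, boolean aromatic) {
            this.order = order;
            this.aromatic = aromatic;
        }
    }

    /** Maps a molfile bond type code to a bond, depending on the reading mode. */
    static Bond fromMolfileType(int type, Mode mode) {
        switch (type) {
            case 1: return new Bond(Order.SINGLE, false);
            case 2: return new Bond(Order.DOUBLE, false);
            case 3: return new Bond(Order.TRIPLE, false);
            case 4:
                if (mode == Mode.STRICT) {
                    throw new IllegalArgumentException(
                        "Bond type 4 (aromatic) is a query type, not a bond order");
                }
                // RELAXED: keep the aromatic flag, but do not pretend the order is SINGLE.
                return new Bond(Order.UNSET, true);
            default:
                throw new IllegalArgumentException("Unsupported bond type: " + type);
        }
    }
}
```

With an explicit UNSET order, a writer or a bond-order deduction step can tell "aromatic, order not yet known" apart from a genuine single bond.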
From: Vincent Le G. <vin...@gm...> - 2010-04-20 14:13:31
|
2010/4/20 Egon Willighagen <ego...@gm...>
> Just checked, but it is not doing this yet in MDLV2000Reader:
> [...]
> So such bonds are flagged as aromatic by default... but also note the
> explicit setting of IBond.Order.SINGLE... need to fix that...

Yep, that's the problem.

I've done some basic tests and the conclusion is as expected: there is no difference (at least for the aromaticity flag issue) between RELAXED and STRICT mode. They both set the bond order to single on reading, the STRICT mode does not throw an exception when it encounters the aromatic flag, and both produce the same output molecule using MDLWriter. These issues are all visible in the snippet you pointed to :)

I also noticed something else: if a negative charge is present in the molecule, the resulting output SDF file will contain the charge as an MDL property:

M CHG 1 9 -1

but the corresponding charge field in the atom block is not set at all (always 0). I don't think it's a real problem: in the MDL specification, the charge in the atom block is kept for compatibility reasons, and ChemAxon's tools and MOE do not complain. But I get slightly different results with some JOELib descriptors depending on the input molecule (generated with CDK, Marvin or MOE)... I have to investigate that to see why :)
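The two places a molfile can carry a charge can be illustrated with a short sketch. The names are hypothetical and the parsing is simplified: the property line is split on whitespace rather than read by fixed columns, which is an assumption that happens to work for lines like the one quoted above. The atom block code mapping (1 = +3, 2 = +2, 3 = +1, 5 = -1, 6 = -2, 7 = -3) follows the CTfile format as far as I know it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not CDK code: reads charges from an "M CHG" line and an atom block code. */
final class ChargeSketch {

    /** Converts the atom block charge code into a formal charge. */
    static int chargeFromAtomBlockCode(int code) {
        switch (code) {
            case 1: return 3;
            case 2: return 2;
            case 3: return 1;
            case 5: return -1;
            case 6: return -2;
            case 7: return -3;
            default: return 0; // 0 = uncharged, 4 = doublet radical (no formal charge)
        }
    }

    /** Parses a charge property line into a map of 1-based atom index to formal charge. */
    static Map<Integer, Integer> parseChgLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 3 || !tokens[0].equals("M") || !tokens[1].equals("CHG")) {
            throw new IllegalArgumentException("Not a charge property line: " + line);
        }
        int count = Integer.parseInt(tokens[2]);
        if (tokens.length != 3 + 2 * count) {
            throw new IllegalArgumentException("Entry count does not match: " + line);
        }
        Map<Integer, Integer> charges = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int atomIndex = Integer.parseInt(tokens[3 + 2 * i]);
            int charge = Integer.parseInt(tokens[4 + 2 * i]);
            charges.put(atomIndex, charge);
        }
        return charges;
    }
}
```

A consumer that only looks at the atom block code would see 0 for atom 9 in the file described above and miss the charge, which is one possible source of differing descriptor values between toolkits; that is a guess, not something the thread confirms.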
From: Egon W. <ego...@gm...> - 2010-04-23 09:04:18
|
Hej Vincent,

On Mon, Apr 19, 2010 at 2:56 PM, Vincent Le Guilloux <vin...@gm...> wrote:
>> Please send an email to cdk-devel announcing yourself as picking up
>> the maintainer task of the command line utilities, and let me know how
>> I can assist you in getting started (which can also go via the
>> cdk-devel mailing list).
>
> I've a lot of question indeed... Lets start with basic ones:
>
> - Are these tools intended to be bundled with the CDK library itself, or
> should they be part of a distinct project with an external JAR file which
> only includes these tools? What about subversion / git (which I don't know)
> access ?

Within the CDK project, there are a few projects, the CDK library being one. The CLI tools would be an add-on project. I guess it depends on the distribution whether it will have one jar or many, but I can imagine that to bootstrap a community, a single jar might be a good starting point.

In the past I used SH scripts wrapping around the jar(s), so that it would integrate nicely on a POSIX system... they should be around somewhere... I think in the SVN of Debian actually... on Alioth... we could dig them up...

BTW, do you want git or subversion?

> - As I'm a bit confused with all existing CDK version, tags, truncs... which
> version of the CDK should be used build these tools?

Depends... how do you want to distribute? Will you release a single package with everything? Do you want to make Debian/Ubuntu packages (that would require CDK 1.0.2)?

> - What if I want to use 2D depiction ? I'm usually working with nightly
> builds (http://pele.farmbio.uu.se/nightly/),

Then I'd go for CDK 1.3.x + the CDK-JChemPaint patches. The latter you can download as two single jars, and just put on your classpath.

> and I know that depiction code is not in the official CDK release...

Yeah, yeah... working on it, working on it... :) If people would like to help, please let me know... there are many smaller, not too difficult things to do... quite a few junior-job level things (writing some unit tests, JavaDoc, porting patches from the JChemPaint 3.x applet...).

Egon
From: Vincent Le G. <vin...@gm...> - 2010-04-23 09:24:13
|
OK, so basically I'm totally free about the form and the content of these tools :)

2010/4/23 Egon Willighagen <ego...@gm...>
> BTW, do you want git or subversion?

Subversion would be nice: I'm using it for all my projects, and I don't know Git, so it would save me time. So I guess the next step would be for you to allocate me some space and access in the Subversion repository?

> > - As I'm a bit confused with all existing CDK version, tags, truncs...
> > which version of the CDK should be used build these tools?
>
> Depends... how do you want to distribute? Will you release a single
> package with everything? Do you want to make Debian/Ubuntu packages (that
> would require CDK 1.0.2)?

Don't know yet :) Maybe two basic forms: a single jar containing all dependencies, and one lightweight jar containing only the tools... Nothing is fixed yet. I'll send an email about my plans for these tools asap...
From: Egon W. <ego...@gm...> - 2010-04-23 09:35:12
|
Hi Vincent,

On Fri, Apr 23, 2010 at 11:24 AM, Vincent Le Guilloux <vin...@gm...> wrote:
> OK, so basically I'm totally free about the form and the content of these
> tools :)

Indeed :)

>> BTW, do you want git or subversion?
>
> Subversion would be nice: I'm using it for all my projects, and I don't know
> Git, so it would save me time. So I guess the next step would be for you to
> allocate me some space and access in the Subversion repository?

I suggest you continue from this location:

http://cdk.svn.sf.net/viewvc/cdk/cdk-clapps/trunk/

Feel free to move the current trunk aside into a branch/ for historical reasons, if you would rather start from scratch.

What's your SF account again? I can give you SVN write access then...

> Don't know yet :) Maybe two basic forms: a single jar containing all
> dependencies, and one lightweight jar containing only the tools...
> Nothing is fixed yet.

And no need to. Best to let your user base express their preference first...

> I'll send an email about my plans for these tools asap...

Great, welcome!

Egon
From: Vincent Le G. <vin...@gm...> - 2010-04-19 09:42:32
|
Hi Egon,

Thanks for your answer. To my knowledge no one has answered this. Just to sum up several issues (quite important in my opinion) that I've encountered using the CDK:

** SDF reader / writer - Group Abbreviation error **

When a Group Abbreviation is used for a molecule, it seems the CDK sets this abbreviation as the atomic symbol when writing the molecule back to MDL format, e.g.:

2.1012 -2.6351 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
(...)
G 15 10 Ph

becomes the following when using the SDF writer:

2.1012 -2.6351 0.0000 Ph 0 0 0 0 0 0 0 0 0 0 0 0

** SDF Reader / writer - Bond stereo **

When the bond stereochemistry is set to 4 (either up or down), a CDK exception occurs when writing the molecule back. The location of this bug is identified: as the UNDEFINED stereo CDK constant is set to null, using the bond's stereo value in a switch statement raises an NPE, for example at line 297 of the MDL writer:

    switch(bond.getStereo()){

** SDF Reader - Aromatic bond flag **

When the CDK loads a molecule with the aromatic bond flag set to 4, the bond order is set to 1. For example, if I load benzene with the aromatic flag set to 4 and write the molecule back, I get cyclohexane. I think this is quite an important issue that should be fixed asap. Possible solutions:

- Throw an error if the aromatic flag is found
- Guess bond orders if the aromatic flag is found (I would choose this one).
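The bond stereo problem described above comes from a general Java rule: a switch on an enum reference throws a NullPointerException when that reference is null. Below is a minimal sketch of a guarded version. The Stereo enum and the method name are hypothetical and not the CDK API; the codes follow the molfile convention for single bonds (0 = not stereo, 1 = up, 4 = either, 6 = down).

```java
/** Hypothetical sketch, not the CDK writer: a null-safe switch over a bond stereo value. */
final class StereoSwitchSketch {

    enum Stereo { NONE, UP, DOWN, UP_OR_DOWN }

    /** Returns the molfile bond stereo code for a single bond. */
    static int molfileStereoCode(Stereo stereo) {
        if (stereo == null) {
            // Guard first: switching on a null enum reference would throw an NPE.
            return 0;
        }
        switch (stereo) {
            case UP: return 1;
            case UP_OR_DOWN: return 4;
            case DOWN: return 6;
            default: return 0;
        }
    }
}
```

Mapping null to 0 avoids the exception but still loses the "either" information from the input file; representing that state with a dedicated constant such as UP_OR_DOWN, instead of null, would let the writer emit 4 again.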