From: E.L. W. <eg...@sc...> - 2002-11-23 16:09:20
On Saturday 23 November 2002 14:45, Peter Murray-Rust wrote:
> >The current reader only parses CML1. The 1,S,2,D,3,T is part
> >of CML2. (right?)
>
> CML V1.0 implicitly defined the convention 1,S,2,D,3,T but did not
> deliberately mandate it. The paper encouraged the use of the above
> convention but made it clear that other conventions were allowable (e.g.
> some systems have bond orders of 4, -5, etc.)

It has been some time since I actually read the article. Most of the time I
check the DTD, i.e. the explicit definition, but I understand that a DTD is
too limited to allow 1,S,etc. OK, bug accepted ;)

> All the CML writers that I
> have encountered other than CDK have adopted the 1,S,2,D,3,T convention.
> This means that all CML files except those from CDK are interoperable.

OK. CDK will be no different then.

> If CDK wishes to have its own convention for bond orders it is welcome to
> do so, but they should be labelled as such. So you could write
> <string builtin="order" convention="CDK">1.5</string>
> and this would be acceptable. However you would need a CMLReader that
> understood CDK bond order conventions to read this and you would have to
> convince the other software writers that this was useful.
>
> The CML convention was designed to be extensible so that if you wished to
> have a bond order of (say) 2.5 you could write:
> <string builtin="order" >A</string>
> <...>
> <string builtin="order" convention="CDK">2.5</string>

The CDK output will preferably not use any non-default convention. That's
the idea. If it does not, then it actually is a bug in the CML writer.

> Most CMLReaders would understand the first. I don't know what CDK would do
> with it. Note that order is a <string> and not a <float>. This is
> deliberate. To understand the second they would be required to implement a
> CDK convention reader. I suspect most would default to "unknown bond order".
>
> > This will be fixed when a CML2 reader is
> >written.
>
> CML2 does not mandate a controlled vocabulary for bond orders.

Ah, then it won't.

> It is important that CML Readers and Writers produce valid CML and it is
> important to work for interoperability.

Agreed. That's the whole point of CML.

> CMLWriters are easier to create
> than CMLReaders especially if they have to deal with multiple conventions.

Ah, yes, kind of you to mention that ;) The strength of my CML reader is its
flexibility in reading conventions. Programs can very easily write and add a
handler for parsing a specific convention.

Egon
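A note on the "handler per convention" idea in the message above: the following is only a minimal sketch in Python, not CDK's actual reader code. The names `read_bond_order`, `HANDLERS`, `read_default` and `read_cdk` are hypothetical, and mapping "A" to 1.5 is an assumption made for illustration; the thread itself only says that "A" is one of the standard strings.

```python
from typing import Callable, Dict, Optional

# Standard CML strings named in the thread: 1,S,2,D,3,T and A.
# ASSUMPTION: "A" (aromatic) is represented here as 1.5 for illustration only.
STANDARD_ORDERS: Dict[str, float] = {
    "1": 1.0, "S": 1.0,
    "2": 2.0, "D": 2.0,
    "3": 3.0, "T": 3.0,
    "A": 1.5,
}


def read_default(text: str) -> Optional[float]:
    """Handler for elements without a convention attribute."""
    return STANDARD_ORDERS.get(text.strip())


def read_cdk(text: str) -> Optional[float]:
    """Hypothetical handler for convention="CDK": plain numeric orders."""
    try:
        return float(text)
    except ValueError:
        return None


# Registry keyed by the value of the convention attribute (None = no attribute).
HANDLERS: Dict[Optional[str], Callable[[str], Optional[float]]] = {
    None: read_default,
    "CDK": read_cdk,
}


def read_bond_order(text: str, convention: Optional[str] = None) -> Optional[float]:
    """Dispatch to the handler for this convention; None means 'unknown bond order'."""
    handler = HANDLERS.get(convention)
    return handler(text) if handler is not None else None
```

With this layout, supporting another convention means adding one entry to `HANDLERS`. A reader that lacks the handler falls back to "unknown", which is the behaviour Peter expects from most readers.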
From: Peter Murray-R. <pm...@ca...> - 2002-11-23 13:52:28
At 11:59 22/11/2002 -0800, no...@so... wrote:
>Bugs item #642456, was opened at 2002-11-22 19:46
>You can respond by visiting:
>https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024
>
>Category: org.openscience.cdk.io
>Group: None
>Status: Open
>Resolution: None
>Priority: 5
>Submitted By: Peter Murray-Rust (petermr)
>Assigned to: Egon Willighagen (egonw)
>Summary: IChI reader does not map atoms
>
>Initial Comment:
>The IChI reader (V1.3) does not appear to read the
><atom.orig-nbr>
>element so that atoms in the final structure are garbled. This
>means that IChI does not output correct connection tables.
>
>----------------------------------------------------------------------
>
>Comment By: Egon Willighagen (egonw)
>Date: 2002-11-22 20:59
>
>Message:
>Logged In: YES
>user_id=25678

First, I am using IChI V0.9beta. Is this what you have used?

>The documentation states that this information is auxiliary.
>As such it should not be necessary for the identifier
>itself.

agreed

> Description of the <basic> element describes how
>atoms are numbered. The output by the IChI program even says
>"Auxiliary info is not a part of the identifier, it is not
>unique".

agreed

>In any case, v1.3 does indeed not read
><atom.orig-nbr>. Could you elaborate why this is a bug, i.e.
>explain why the garbling is actually caused by the missing
><atom.orig-nbr>?

We created a number of files using IChI where the dbonds did not map onto
the given bonds and appeared to require the atom.orig-nbr. I agree that this
shouldn't be necessary.

Here is a file distributed with IChI which shows a dbond (8-7-) not
originally in the bond list:

--------------------8<--------------
<IChI version="0.9Beta">
  <structure number="1" id.name="" id.value="">
    <identifier version="0.9Beta" tautomeric="0">
      <basic>C*6C1*16CC, 2-1 4-3 6-5 7-1 8-2 9-3 10-4 11-7 12-8 13-9 14-10 15-11 16-12 17-13 18-14 19-15 20-16 21-17 22-18 23-5-19-21 24-6-20-22</basic>
      <charge></charge>
      <stereo>
        <dbond>8-7- 13-9- 14-10- 15-11+ 16-12+ 21-17+ 22-18+ 23-19+ 24-20+</dbond>
        <sp3></sp3>
      </stereo>
    </identifier>
    <identifier.auxiliary-info version="0.9Beta" tautomeric="0">
      <!-- Auxiliary info is not a part of the identifier, it is not unique -->
      <atom.orig-nbr>13 14 23 24 10 11 9 15 1 22 8 18 2 21 7 17 3 20 6 16 4 19 5 12</atom.orig-nbr>
      <atom.equivalence>(1 2 3 4)(5 6)(7 8 9 10)(11 12 13 14)(15 16 17 18)(19 20 21 22)(23 24)</atom.equivalence>
    </identifier.auxiliary-info>
  </structure>
</IChI>
--------------------8<--------------

I have a suspicion that this is a problem with IChI V0.9beta. It suggests
that there needs to be more error checking in an IChIReader.

I do not have the problem files with me but should be able to send some
samples on Monday. I suspect this may also be responsible for one of the
other bugs I reported. It also emphasises the need to have test sets that
are available so that we can agree on what are bugs and whose responsibility
it is.

P.

>----------------------------------------------------------------------
>
>You can respond by visiting:
>https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024
>
>-------------------------------------------------------
>This sf.net email is sponsored by:ThinkGeek
>Welcome to geek heaven.
>http://thinkgeek.com/sf
>_______________________________________________
>Cdk-devel mailing list
>Cdk...@li...
>https://lists.sourceforge.net/lists/listinfo/cdk-devel
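The kind of error check suggested in the message above could look roughly like the following Python sketch. It is not taken from CDK. It assumes that each token of the `<basic>` connection string lists an atom followed by its neighbours (so `23-5-19-21` means atom 23 is bonded to 5, 19 and 21); that reading is inferred from the sample file, not from the IChI documentation. All function names are hypothetical.

```python
import re
from typing import FrozenSet, List, Set, Tuple

# Strings copied from the sample file quoted in the message.
BASIC = ("C*6C1*16CC, 2-1 4-3 6-5 7-1 8-2 9-3 10-4 11-7 12-8 13-9 14-10 "
         "15-11 16-12 17-13 18-14 19-15 20-16 21-17 22-18 23-5-19-21 24-6-20-22")
DBOND = "8-7- 13-9- 14-10- 15-11+ 16-12+ 21-17+ 22-18+ 23-19+ 24-20+"


def parse_connections(connections: str) -> Set[FrozenSet[int]]:
    """Turn 'a-b-c' tokens into bonds a-b and a-c (ASSUMED neighbour-list reading)."""
    bonds: Set[FrozenSet[int]] = set()
    for token in connections.split():
        atoms = [int(n) for n in token.split("-")]
        first = atoms[0]
        for neighbour in atoms[1:]:
            bonds.add(frozenset((first, neighbour)))
    return bonds


def parse_dbonds(dbond: str) -> List[Tuple[int, int, str]]:
    """Extract (atom, atom, parity sign) triples from the <dbond> text."""
    return [(int(a), int(b), sign)
            for a, b, sign in re.findall(r"(\d+)-(\d+)([+-])", dbond)]


def unmatched_dbonds(basic: str, dbond: str) -> List[Tuple[int, int]]:
    """Return the dbond atom pairs that do not occur in the bond list."""
    _formula, _sep, connections = basic.partition(",")
    bonds = parse_connections(connections)
    return [(a, b) for a, b, _sign in parse_dbonds(dbond)
            if frozenset((a, b)) not in bonds]


missing = unmatched_dbonds(BASIC, DBOND)
print(missing)
```

Under that assumption the only pair reported for the sample file is 8-7, the same dbond Peter points out. A reader could log or reject such entries instead of silently building a garbled structure.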
From: Peter Murray-R. <pm...@ca...> - 2002-11-23 13:51:20
At 11:27 22/11/2002 +0100, Jörg K. Wegner wrote:
>Hello, <pmr>
>>I don't intend to write a SMARTS parser. Instead we are developing a
>>generic CML query language which encompasses the concepts in "most" of
>>the current chemical systems and generic structure representation.
>
>Until now I've often found SMARTS patterns in the literature for
>coding substructure patterns. I think there should be at least an
>additional SMARTS -> CML query converter. For sure there could be some
>different opinions about standards, but many people use SMARTS ...
>I think that's a similar problem to:
>LaTeX equation <-> MathML

We are committed to producing legacy converters to CML wherever possible so
SMARTS2CQL would be a natural. But the problem with all the proprietary
methods is that they have opaque semantics. For example the concept of
aromatic atoms will differ between Daylight and CDK. That means that a
SMARTS concept will run differently on the two systems.

P.

>>Best
>>
>>P.
>
>Regards, Joerg
>
>>-------------------------------------------------------
>>This sf.net email is sponsored by: Battle your brains against the best
>>in the Thawte Crypto Challenge. Be the first to crack the code -
>>register now: http://www.gothawte.com/rd521.html
>>_______________________________________________
>>Cdk-devel mailing list
>>Cdk...@li...
>>https://lists.sourceforge.net/lists/listinfo/cdk-devel
>
>--
>Dipl. Chem. Joerg K. Wegner
>Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany
>Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091
>E-Mail: mailto:we...@in...
>WWW: http://www-ra.informatik.uni-tuebingen.de
From: Peter Murray-R. <pm...@ca...> - 2002-11-23 13:50:57
At 12:03 22/11/2002 -0800, no...@so... wrote:
>Bugs item #639455, was opened at 2002-11-16 22:29
>You can respond by visiting:
>https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024
>
>Category: org.openscience.cdk.io
>Group: None
>Status: Open
>Resolution: Postponed
>Priority: 5
>Submitted By: Peter Murray-Rust (petermr)
>Assigned to: Egon Willighagen (egonw)
>Summary: CML bonder orders have wrong format
>
>Initial Comment:
>The bond orders emitted as CML are floats (1.0, 1.5). This
>is inconsistent with the CML specification which uses
>strings 1,S,2,D,3,T and A. For anything else a convention
>attribute is required.
>
>----------------------------------------------------------------------
>
>Comment By: Egon Willighagen (egonw)
>Date: 2002-11-22 21:03
>
>Message:
>Logged In: YES
>user_id=25678
>
>The current reader only parses CML1. The 1,S,2,D,3,T is part
>of CML2. (right?)

CML V1.0 implicitly defined the convention 1,S,2,D,3,T but did not
deliberately mandate it. The paper encouraged the use of the above
convention but made it clear that other conventions were allowable (e.g.
some systems have bond orders of 4, -5, etc.) All the CML writers that I
have encountered other than CDK have adopted the 1,S,2,D,3,T convention.
This means that all CML files except those from CDK are interoperable.

If CDK wishes to have its own convention for bond orders it is welcome to
do so, but they should be labelled as such. So you could write
<string builtin="order" convention="CDK">1.5</string>
and this would be acceptable. However you would need a CMLReader that
understood CDK bond order conventions to read this and you would have to
convince the other software writers that this was useful.

The CML convention was designed to be extensible so that if you wished to
have a bond order of (say) 2.5 you could write:
<string builtin="order" >A</string>
<...>
<string builtin="order" convention="CDK">2.5</string>

Most CMLReaders would understand the first. I don't know what CDK would do
with it. Note that order is a <string> and not a <float>. This is
deliberate. To understand the second they would be required to implement a
CDK convention reader. I suspect most would default to "unknown bond order".

> This will be fixed when a CML2 reader is
>written.

CML2 does not mandate a controlled vocabulary for bond orders.

It is important that CML Readers and Writers produce valid CML and it is
important to work for interoperability. CMLWriters are easier to create
than CMLReaders especially if they have to deal with multiple conventions.

P.
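On the writer side, the labelling rule described in the message above might be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the CDK CMLWriter: the function name `cml_order_element` and the `aromatic` flag are hypothetical, and the element syntax simply follows the `<string builtin="order">` examples quoted in the thread.

```python
# Integer-valued orders that have a standard CML string, per the thread.
STANDARD_STRINGS = {1.0: "1", 2.0: "2", 3.0: "3"}


def cml_order_element(order: float, aromatic: bool = False) -> str:
    """Emit a bond-order element; non-standard values carry convention="CDK"."""
    if aromatic:
        # "A" is the standard string for aromatic bonds.
        return '<string builtin="order">A</string>'
    standard = STANDARD_STRINGS.get(order)
    if standard is not None:
        return f'<string builtin="order">{standard}</string>'
    # Anything else is only meaningful to a reader that knows the CDK convention.
    return f'<string builtin="order" convention="CDK">{order}</string>'


print(cml_order_element(2.0))
print(cml_order_element(1.5))
```

The point of the split is that files using only the first two branches stay readable by any CML reader, while the third branch makes the non-standard value explicit instead of silently emitting a float.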
From: <no...@so...> - 2002-11-22 20:54:43
|
Bugs item #642365, was opened at 2002-11-22 17:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024 Category: org.openscience.cdk.io Group: None >Status: Closed >Resolution: Duplicate Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: MDLReader fails on certain files Initial Comment: In reading the NCI diversity set (as SDF, converted to MOL), certain files return a null molecule from MDLReader. In org.openscience.cdk.applications.Viewer no molecule is displayed. The molecules all appear to contain Pt or Hg but I haven't found the cause. I enclose a typical MOL file - there are ca 10 others I can send. ---------------------------------8<----------------- 20410 8 49915 33D 20410 50 53 1 1 V2000 0.0021 -0.0041 0.0020 Hg 0 0 0 0 0 0 0 0 0 -0.0264 2.0957 0.0136 C 0 0 0 0 0 0 0 0 0 0.0297 -2.0438 -0.0092 N 0 0 0 0 0 0 0 0 0 1.4090 2.6253 0.0027 C 0 0 1 0 0 0 0 0 0 -1.0583 -2.8742 -0.0033 C 0 0 0 0 0 0 0 0 0 1.1256 -2.8340 -0.0298 C 0 0 0 0 0 0 0 0 0 1.3882 4.1551 0.0111 C 0 0 0 0 0 0 0 0 0 2.1001 2.1519 1.1604 O 0 0 0 0 0 0 0 0 0 -0.5575 -4.1807 -0.0156 C 0 0 0 0 0 0 0 0 0 -2.4506 -2.6291 0.0118 C 0 0 0 0 0 0 0 0 0 0.7634 -4.0958 -0.0316 N 0 0 0 0 0 0 0 0 0 2.7626 4.6622 0.0007 N 0 0 0 0 0 0 0 0 0 1.3868 2.6433 2.2970 C 0 0 0 0 0 0 0 0 0 -1.4205 -5.2676 -0.0134 N 0 0 0 0 0 0 0 0 0 -2.9031 -1.4984 0.0222 O 0 0 0 0 0 0 0 0 0 -3.2712 -3.7426 0.0134 N 0 0 0 0 0 0 0 0 0 3.9098 3.9326 -0.0146 C 0 0 0 0 0 0 0 0 0 3.2529 6.2616 -0.0022 S 0 0 0 0 0 0 0 0 0 -0.9122 -6.6416 -0.0257 C 0 0 0 0 0 0 0 0 0 -2.7650 -5.0077 0.0012 C 0 0 0 0 0 0 0 0 0 -4.7243 -3.5567 0.0287 C 0 0 0 0 0 0 0 0 0 3.8817 2.7174 -0.0211 O 0 0 0 0 0 0 0 0 0 5.1848 4.6630 -0.0236 C 0 0 0 0 0 0 0 0 0 2.8070 6.9314 -1.1734 O 0 0 0 0 0 0 0 0 0 2.8258 6.9299 1.1769 O 0 0 0 0 0 0 0 0 0 5.0125 6.0532 -0.0171 C 0 0 0 0 0 0 0 0 0 -3.5395 -5.9451 0.0034 O 0 0 0 0 0 0 0 0 0 6.5025 4.1741 -0.0389 C 0 0 0 0 0 0 0 0 
0 6.0727 6.9151 -0.0257 C 0 0 0 0 0 0 0 0 0 7.5678 5.0456 -0.0446 C 0 0 0 0 0 0 0 0 0 7.3666 6.4143 -0.0373 C 0 0 0 0 0 0 0 0 0 -0.5536 2.4568 -0.8695 H 0 0 0 0 0 0 0 0 0 -0.5366 2.4472 0.9104 H 0 0 0 0 0 0 0 0 0 1.9192 2.2738 -0.8941 H 0 0 0 0 0 0 0 0 0 2.1460 -2.4806 -0.0433 H 0 0 0 0 0 0 0 0 0 0.8610 4.5162 -0.8719 H 0 0 0 0 0 0 0 0 0 0.8780 4.5066 0.9079 H 0 0 0 0 0 0 0 0 0 1.8795 2.3059 3.2089 H 0 0 0 0 0 0 0 0 0 1.3723 3.7329 2.2736 H 0 0 0 0 0 0 0 0 0 0.3645 2.2657 2.2754 H 0 0 0 0 0 0 0 0 0 0.1776 -6.6257 -0.0359 H 0 0 0 0 0 0 0 0 0 -1.2597 -7.1655 0.8647 H 0 0 0 0 0 0 0 0 0 -1.2765 -7.1557 -0.9151 H 0 0 0 0 0 0 0 0 0 -4.9543 -2.4913 0.0371 H 0 0 0 0 0 0 0 0 0 -5.1586 -4.0153 -0.8597 H 0 0 0 0 0 0 0 0 0 -5.1412 -4.0254 0.9202 H 0 0 0 0 0 0 0 0 0 6.6801 3.1088 -0.0459 H 0 0 0 0 0 0 0 0 0 5.9042 7.9819 -0.0238 H 0 0 0 0 0 0 0 0 0 8.5748 4.6555 -0.0560 H 0 0 0 0 0 0 0 0 0 8.2106 7.0881 -0.0445 H 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 2 4 1 0 0 0 0 2 32 1 0 0 0 0 2 33 1 0 0 0 0 3 5 1 0 0 0 0 3 6 1 0 0 0 0 4 8 1 1 0 0 0 4 7 1 0 0 0 0 4 34 1 6 0 0 0 5 9 2 0 0 0 0 5 10 1 0 0 0 0 6 11 2 0 0 0 0 6 35 1 0 0 0 0 7 12 1 0 0 0 0 7 36 1 0 0 0 0 7 37 1 0 0 0 0 8 13 1 0 0 0 0 9 14 1 0 0 0 0 9 11 1 0 0 0 0 10 15 2 0 0 0 0 10 16 1 0 0 0 0 12 17 1 0 0 0 0 12 18 1 0 0 0 0 13 38 1 0 0 0 0 13 39 1 0 0 0 0 13 40 1 0 0 0 0 14 19 1 0 0 0 0 14 20 1 0 0 0 0 16 21 1 0 0 0 0 16 20 1 0 0 0 0 17 22 2 0 0 0 0 17 23 1 0 0 0 0 18 24 2 0 0 0 0 18 25 2 0 0 0 0 18 26 1 0 0 0 0 19 41 1 0 0 0 0 19 42 1 0 0 0 0 19 43 1 0 0 0 0 20 27 2 0 0 0 0 21 44 1 0 0 0 0 21 45 1 0 0 0 0 21 46 1 0 0 0 0 23 28 2 0 0 0 0 23 26 1 0 0 0 0 26 29 2 0 0 0 0 28 30 1 0 0 0 0 28 47 1 0 0 0 0 29 31 1 0 0 0 0 29 48 1 0 0 0 0 30 31 2 0 0 0 0 30 49 1 0 0 0 0 31 50 1 0 0 0 0 M END ---------------------------------8<----------------- ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024 |
From: <no...@so...> - 2002-11-22 20:53:13
Bugs item #639455, was opened at 2002-11-16 22:29
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024

Category: org.openscience.cdk.io
Group: None
>Status: Pending
Resolution: Postponed
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: CML bonder orders have wrong format

Initial Comment:
The bond orders emitted as CML are floats (1.0, 1.5). This is inconsistent
with the CML specification which uses strings 1,S,2,D,3,T and A. For
anything else a convention attribute is required.

----------------------------------------------------------------------

Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 21:03

Message:
Logged In: YES
user_id=25678

The current reader only parses CML1. The 1,S,2,D,3,T is part of CML2.
(right?) This will be fixed when a CML2 reader is written.
From: <no...@so...> - 2002-11-22 20:52:24
Bugs item #642456, was opened at 2002-11-22 19:46
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

Category: org.openscience.cdk.io
Group: None
>Status: Pending
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: IChI reader does not map atoms

Initial Comment:
The IChI reader (V1.3) does not appear to read the <atom.orig-nbr> element
so that atoms in the final structure are garbled. This means that IChI does
not output correct connection tables.

----------------------------------------------------------------------

Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 20:59

Message:
Logged In: YES
user_id=25678

The documentation states that this information is auxiliary. As such it
should not be necessary for the identifier itself. Description of the
<basic> element describes how atoms are numbered. The output by the IChI
program even says "Auxiliary info is not a part of the identifier, it is
not unique".

In any case, v1.3 does indeed not read <atom.orig-nbr>. Could you elaborate
why this is a bug, i.e. explain why the garbling is actually caused by the
missing <atom.orig-nbr>?
From: <no...@so...> - 2002-11-22 20:51:40
Bugs item #642429, was opened at 2002-11-22 18:58
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

Category: org.openscience.cdk.io
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: IChIReader fails with multiple molecules

Initial Comment:
IChIReader fails on strings such as

<identifier version="0.9Beta" tautomeric="0">
  <basic>N2*4SSCCN1N1Cu, 7-5 8-6 9-7 10-8 11-1-2-3-4- 9-10;C1*7NCC, 2-1 4-3 5-1 6-3 7-2 8-4 9-5-6 10-7-8- 9;NS1C, 3-1-2</basic>
  <charge>+2;;</charge>
</identifier>

where ";" is used to separate discrete molecules.

It also fails with ArrayIndexOutOfBoundsException (20) on large molecules:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
  at org.openscience.cdk.AtomContainer.getAtomAt(AtomContainer.java:241)
  at org.openscience.cdk.io.ichi.IChIHandler.analyseBondsEncoding(IChIHandler.java:269)
  at org.openscience.cdk.io.ichi.IChIHandler.endElement(IChIHandler.java:124)
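For illustration, the ";" separation described in this report could be handled along the following lines. This is a hedged Python sketch rather than the fix that was actually applied in CDK; `Component` and `split_components` are hypothetical names, and treating an empty `<charge>` field as charge 0 is an assumption based on the "+2;;" example.

```python
from typing import List, NamedTuple


class Component(NamedTuple):
    formula: str       # text before the first comma, e.g. "NS1C"
    connections: str   # text after the first comma, left unparsed here
    charge: int        # ASSUMPTION: an empty charge field means 0


def split_components(basic: str, charge: str) -> List[Component]:
    """Split <basic> and <charge> on ';' and pair them up by position."""
    parts = basic.split(";")
    charges = charge.split(";") if charge else []
    # Pad missing charge fields, then insist on a one-to-one pairing.
    charges += [""] * (len(parts) - len(charges))
    if len(charges) != len(parts):
        raise ValueError("more charge fields than molecules")
    components = []
    for part, chg in zip(parts, charges):
        formula, _sep, connections = part.partition(",")
        components.append(
            Component(formula.strip(), connections.strip(),
                      int(chg) if chg.strip() else 0))
    return components


# Strings copied from the bug report.
BASIC = ("N2*4SSCCN1N1Cu, 7-5 8-6 9-7 10-8 11-1-2-3-4- 9-10;"
         "C1*7NCC, 2-1 4-3 5-1 6-3 7-2 8-4 9-5-6 10-7-8- 9;NS1C, 3-1-2")
components = split_components(BASIC, "+2;;")
for component in components:
    print(component.formula, component.charge)
```

Each component can then be handed to the single-molecule parsing code separately, which also keeps atom indices local to one molecule.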
From: <no...@so...> - 2002-11-22 20:03:32
Bugs item #642429, was opened at 2002-11-22 18:58
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
>Assigned to: Egon Willighagen (egonw)
Summary: IChIReader fails with multiple molecules

Initial Comment:
IChIReader fails on strings such as

<identifier version="0.9Beta" tautomeric="0">
  <basic>N2*4SSCCN1N1Cu, 7-5 8-6 9-7 10-8 11-1-2-3-4- 9-10;C1*7NCC, 2-1 4-3 5-1 6-3 7-2 8-4 9-5-6 10-7-8- 9;NS1C, 3-1-2</basic>
  <charge>+2;;</charge>
</identifier>

where ";" is used to separate discrete molecules.

It also fails with ArrayIndexOutOfBoundsException (20) on large molecules:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
  at org.openscience.cdk.AtomContainer.getAtomAt(AtomContainer.java:241)
  at org.openscience.cdk.io.ichi.IChIHandler.analyseBondsEncoding(IChIHandler.java:269)
  at org.openscience.cdk.io.ichi.IChIHandler.endElement(IChIHandler.java:124)
From: <no...@so...> - 2002-11-22 20:03:11
Bugs item #639455, was opened at 2002-11-16 22:29
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
>Resolution: Postponed
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: CML bonder orders have wrong format

Initial Comment:
The bond orders emitted as CML are floats (1.0, 1.5). This is inconsistent
with the CML specification which uses strings 1,S,2,D,3,T and A. For
anything else a convention attribute is required.

----------------------------------------------------------------------

>Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 21:03

Message:
Logged In: YES
user_id=25678

The current reader only parses CML1. The 1,S,2,D,3,T is part of CML2.
(right?) This will be fixed when a CML2 reader is written.
From: <no...@so...> - 2002-11-22 19:59:35
Bugs item #642456, was opened at 2002-11-22 19:46
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

>Category: org.openscience.cdk.io
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
>Assigned to: Egon Willighagen (egonw)
Summary: IChI reader does not map atoms

Initial Comment:
The IChI reader (V1.3) does not appear to read the <atom.orig-nbr> element
so that atoms in the final structure are garbled. This means that IChI does
not output correct connection tables.

----------------------------------------------------------------------

>Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 20:59

Message:
Logged In: YES
user_id=25678

The documentation states that this information is auxiliary. As such it
should not be necessary for the identifier itself. Description of the
<basic> element describes how atoms are numbered. The output by the IChI
program even says "Auxiliary info is not a part of the identifier, it is
not unique".

In any case, v1.3 does indeed not read <atom.orig-nbr>. Could you elaborate
why this is a bug, i.e. explain why the garbling is actually caused by the
missing <atom.orig-nbr>?
From: <no...@so...> - 2002-11-22 18:46:14
Bugs item #642456, was opened at 2002-11-22 18:46
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Nobody/Anonymous (nobody)
Summary: IChI reader does not map atoms

Initial Comment:
The IChI reader (V1.3) does not appear to read the <atom.orig-nbr> element
so that atoms in the final structure are garbled. This means that IChI does
not output correct connection tables.
From: <no...@so...> - 2002-11-22 17:58:31
Bugs item #642429, was opened at 2002-11-22 17:58
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Nobody/Anonymous (nobody)
Summary: IChIReader fails with multiple molecules

Initial Comment:
IChIReader fails on strings such as

<identifier version="0.9Beta" tautomeric="0">
  <basic>N2*4SSCCN1N1Cu, 7-5 8-6 9-7 10-8 11-1-2-3-4- 9-10;C1*7NCC, 2-1 4-3 5-1 6-3 7-2 8-4 9-5-6 10-7-8- 9;NS1C, 3-1-2</basic>
  <charge>+2;;</charge>
</identifier>

where ";" is used to separate discrete molecules.

It also fails with ArrayIndexOutOfBoundsException (20) on large molecules:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
  at org.openscience.cdk.AtomContainer.getAtomAt(AtomContainer.java:241)
  at org.openscience.cdk.io.ichi.IChIHandler.analyseBondsEncoding(IChIHandler.java:269)
  at org.openscience.cdk.io.ichi.IChIHandler.endElement(IChIHandler.java:124)
From: <no...@so...> - 2002-11-22 17:55:33
|
Bugs item #642426, was opened at 2002-11-22 17:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642426&group_id=20024 Category: org.openscience.cdk.io Group: None Status: Open Resolution: None Priority: 5 Submitted By: Peter Murray-Rust (petermr) Assigned to: Nobody/Anonymous (nobody) Summary: MDLReader fails on certain files Initial Comment: In reading the NCI diversity set (as SDF, converted to MOL), certain files return a null molecule from MDLReader. In org.openscience.cdk.applications.Viewer no molecule is displayed. The molecules all appear to contain Pt or Hg but I haven't found the cause. I enclose a typical MOL file - there are ca 10 others I can send. ---------------------------------8<----------------- 20410 8 49915 33D 20410 50 53 1 1 V2000 0.0021 -0.0041 0.0020 Hg 0 0 0 0 0 0 0 0 0 -0.0264 2.0957 0.0136 C 0 0 0 0 0 0 0 0 0 0.0297 -2.0438 -0.0092 N 0 0 0 0 0 0 0 0 0 1.4090 2.6253 0.0027 C 0 0 1 0 0 0 0 0 0 -1.0583 -2.8742 -0.0033 C 0 0 0 0 0 0 0 0 0 1.1256 -2.8340 -0.0298 C 0 0 0 0 0 0 0 0 0 1.3882 4.1551 0.0111 C 0 0 0 0 0 0 0 0 0 2.1001 2.1519 1.1604 O 0 0 0 0 0 0 0 0 0 -0.5575 -4.1807 -0.0156 C 0 0 0 0 0 0 0 0 0 -2.4506 -2.6291 0.0118 C 0 0 0 0 0 0 0 0 0 0.7634 -4.0958 -0.0316 N 0 0 0 0 0 0 0 0 0 2.7626 4.6622 0.0007 N 0 0 0 0 0 0 0 0 0 1.3868 2.6433 2.2970 C 0 0 0 0 0 0 0 0 0 -1.4205 -5.2676 -0.0134 N 0 0 0 0 0 0 0 0 0 -2.9031 -1.4984 0.0222 O 0 0 0 0 0 0 0 0 0 -3.2712 -3.7426 0.0134 N 0 0 0 0 0 0 0 0 0 3.9098 3.9326 -0.0146 C 0 0 0 0 0 0 0 0 0 3.2529 6.2616 -0.0022 S 0 0 0 0 0 0 0 0 0 -0.9122 -6.6416 -0.0257 C 0 0 0 0 0 0 0 0 0 -2.7650 -5.0077 0.0012 C 0 0 0 0 0 0 0 0 0 -4.7243 -3.5567 0.0287 C 0 0 0 0 0 0 0 0 0 3.8817 2.7174 -0.0211 O 0 0 0 0 0 0 0 0 0 5.1848 4.6630 -0.0236 C 0 0 0 0 0 0 0 0 0 2.8070 6.9314 -1.1734 O 0 0 0 0 0 0 0 0 0 2.8258 6.9299 1.1769 O 0 0 0 0 0 0 0 0 0 5.0125 6.0532 -0.0171 C 0 0 0 0 0 0 0 0 0 -3.5395 -5.9451 0.0034 O 0 0 0 0 0 0 0 0 0 6.5025 4.1741 -0.0389 C 0 0 0 0 0 0 0 0 0 
6.0727 6.9151 -0.0257 C 0 0 0 0 0 0 0 0 0 7.5678 5.0456 -0.0446 C 0 0 0 0 0 0 0 0 0 7.3666 6.4143 -0.0373 C 0 0 0 0 0 0 0 0 0 -0.5536 2.4568 -0.8695 H 0 0 0 0 0 0 0 0 0 -0.5366 2.4472 0.9104 H 0 0 0 0 0 0 0 0 0 1.9192 2.2738 -0.8941 H 0 0 0 0 0 0 0 0 0 2.1460 -2.4806 -0.0433 H 0 0 0 0 0 0 0 0 0 0.8610 4.5162 -0.8719 H 0 0 0 0 0 0 0 0 0 0.8780 4.5066 0.9079 H 0 0 0 0 0 0 0 0 0 1.8795 2.3059 3.2089 H 0 0 0 0 0 0 0 0 0 1.3723 3.7329 2.2736 H 0 0 0 0 0 0 0 0 0 0.3645 2.2657 2.2754 H 0 0 0 0 0 0 0 0 0 0.1776 -6.6257 -0.0359 H 0 0 0 0 0 0 0 0 0 -1.2597 -7.1655 0.8647 H 0 0 0 0 0 0 0 0 0 -1.2765 -7.1557 -0.9151 H 0 0 0 0 0 0 0 0 0 -4.9543 -2.4913 0.0371 H 0 0 0 0 0 0 0 0 0 -5.1586 -4.0153 -0.8597 H 0 0 0 0 0 0 0 0 0 -5.1412 -4.0254 0.9202 H 0 0 0 0 0 0 0 0 0 6.6801 3.1088 -0.0459 H 0 0 0 0 0 0 0 0 0 5.9042 7.9819 -0.0238 H 0 0 0 0 0 0 0 0 0 8.5748 4.6555 -0.0560 H 0 0 0 0 0 0 0 0 0 8.2106 7.0881 -0.0445 H 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 2 4 1 0 0 0 0 2 32 1 0 0 0 0 2 33 1 0 0 0 0 3 5 1 0 0 0 0 3 6 1 0 0 0 0 4 8 1 1 0 0 0 4 7 1 0 0 0 0 4 34 1 6 0 0 0 5 9 2 0 0 0 0 5 10 1 0 0 0 0 6 11 2 0 0 0 0 6 35 1 0 0 0 0 7 12 1 0 0 0 0 7 36 1 0 0 0 0 7 37 1 0 0 0 0 8 13 1 0 0 0 0 9 14 1 0 0 0 0 9 11 1 0 0 0 0 10 15 2 0 0 0 0 10 16 1 0 0 0 0 12 17 1 0 0 0 0 12 18 1 0 0 0 0 13 38 1 0 0 0 0 13 39 1 0 0 0 0 13 40 1 0 0 0 0 14 19 1 0 0 0 0 14 20 1 0 0 0 0 16 21 1 0 0 0 0 16 20 1 0 0 0 0 17 22 2 0 0 0 0 17 23 1 0 0 0 0 18 24 2 0 0 0 0 18 25 2 0 0 0 0 18 26 1 0 0 0 0 19 41 1 0 0 0 0 19 42 1 0 0 0 0 19 43 1 0 0 0 0 20 27 2 0 0 0 0 21 44 1 0 0 0 0 21 45 1 0 0 0 0 21 46 1 0 0 0 0 23 28 2 0 0 0 0 23 26 1 0 0 0 0 26 29 2 0 0 0 0 28 30 1 0 0 0 0 28 47 1 0 0 0 0 29 31 1 0 0 0 0 29 48 1 0 0 0 0 30 31 2 0 0 0 0 30 49 1 0 0 0 0 31 50 1 0 0 0 0 M END ---------------------------------8<----------------- ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642426&group_id=20024 |
From: <no...@so...> - 2002-11-22 16:24:29
|
Bugs item #642365, was opened at 2002-11-22 08:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024 Category: org.openscience.cdk.io Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: MDLReader fails on certain files Initial Comment: In reading the NCI diversity set (as SDF, converted to MOL), certain files return a null molecule from MDLReader. In org.openscience.cdk.applications.Viewer no molecule is displayed. The molecules all appear to contain Pt or Hg but I haven't found the cause. I enclose a typical MOL file - there are ca 10 others I can send. ---------------------------------8<----------------- 20410 8 49915 33D 20410 50 53 1 1 V2000 0.0021 -0.0041 0.0020 Hg 0 0 0 0 0 0 0 0 0 -0.0264 2.0957 0.0136 C 0 0 0 0 0 0 0 0 0 0.0297 -2.0438 -0.0092 N 0 0 0 0 0 0 0 0 0 1.4090 2.6253 0.0027 C 0 0 1 0 0 0 0 0 0 -1.0583 -2.8742 -0.0033 C 0 0 0 0 0 0 0 0 0 1.1256 -2.8340 -0.0298 C 0 0 0 0 0 0 0 0 0 1.3882 4.1551 0.0111 C 0 0 0 0 0 0 0 0 0 2.1001 2.1519 1.1604 O 0 0 0 0 0 0 0 0 0 -0.5575 -4.1807 -0.0156 C 0 0 0 0 0 0 0 0 0 -2.4506 -2.6291 0.0118 C 0 0 0 0 0 0 0 0 0 0.7634 -4.0958 -0.0316 N 0 0 0 0 0 0 0 0 0 2.7626 4.6622 0.0007 N 0 0 0 0 0 0 0 0 0 1.3868 2.6433 2.2970 C 0 0 0 0 0 0 0 0 0 -1.4205 -5.2676 -0.0134 N 0 0 0 0 0 0 0 0 0 -2.9031 -1.4984 0.0222 O 0 0 0 0 0 0 0 0 0 -3.2712 -3.7426 0.0134 N 0 0 0 0 0 0 0 0 0 3.9098 3.9326 -0.0146 C 0 0 0 0 0 0 0 0 0 3.2529 6.2616 -0.0022 S 0 0 0 0 0 0 0 0 0 -0.9122 -6.6416 -0.0257 C 0 0 0 0 0 0 0 0 0 -2.7650 -5.0077 0.0012 C 0 0 0 0 0 0 0 0 0 -4.7243 -3.5567 0.0287 C 0 0 0 0 0 0 0 0 0 3.8817 2.7174 -0.0211 O 0 0 0 0 0 0 0 0 0 5.1848 4.6630 -0.0236 C 0 0 0 0 0 0 0 0 0 2.8070 6.9314 -1.1734 O 0 0 0 0 0 0 0 0 0 2.8258 6.9299 1.1769 O 0 0 0 0 0 0 0 0 0 5.0125 6.0532 -0.0171 C 0 0 0 0 0 0 0 0 0 -3.5395 -5.9451 0.0034 O 0 0 0 0 0 0 0 0 0 6.5025 4.1741 -0.0389 C 0 0 0 0 0 0 0 0 0 6.0727 
6.9151 -0.0257 C 0 0 0 0 0 0 0 0 0 7.5678 5.0456 -0.0446 C 0 0 0 0 0 0 0 0 0 7.3666 6.4143 -0.0373 C 0 0 0 0 0 0 0 0 0 -0.5536 2.4568 -0.8695 H 0 0 0 0 0 0 0 0 0 -0.5366 2.4472 0.9104 H 0 0 0 0 0 0 0 0 0 1.9192 2.2738 -0.8941 H 0 0 0 0 0 0 0 0 0 2.1460 -2.4806 -0.0433 H 0 0 0 0 0 0 0 0 0 0.8610 4.5162 -0.8719 H 0 0 0 0 0 0 0 0 0 0.8780 4.5066 0.9079 H 0 0 0 0 0 0 0 0 0 1.8795 2.3059 3.2089 H 0 0 0 0 0 0 0 0 0 1.3723 3.7329 2.2736 H 0 0 0 0 0 0 0 0 0 0.3645 2.2657 2.2754 H 0 0 0 0 0 0 0 0 0 0.1776 -6.6257 -0.0359 H 0 0 0 0 0 0 0 0 0 -1.2597 -7.1655 0.8647 H 0 0 0 0 0 0 0 0 0 -1.2765 -7.1557 -0.9151 H 0 0 0 0 0 0 0 0 0 -4.9543 -2.4913 0.0371 H 0 0 0 0 0 0 0 0 0 -5.1586 -4.0153 -0.8597 H 0 0 0 0 0 0 0 0 0 -5.1412 -4.0254 0.9202 H 0 0 0 0 0 0 0 0 0 6.6801 3.1088 -0.0459 H 0 0 0 0 0 0 0 0 0 5.9042 7.9819 -0.0238 H 0 0 0 0 0 0 0 0 0 8.5748 4.6555 -0.0560 H 0 0 0 0 0 0 0 0 0 8.2106 7.0881 -0.0445 H 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 2 4 1 0 0 0 0 2 32 1 0 0 0 0 2 33 1 0 0 0 0 3 5 1 0 0 0 0 3 6 1 0 0 0 0 4 8 1 1 0 0 0 4 7 1 0 0 0 0 4 34 1 6 0 0 0 5 9 2 0 0 0 0 5 10 1 0 0 0 0 6 11 2 0 0 0 0 6 35 1 0 0 0 0 7 12 1 0 0 0 0 7 36 1 0 0 0 0 7 37 1 0 0 0 0 8 13 1 0 0 0 0 9 14 1 0 0 0 0 9 11 1 0 0 0 0 10 15 2 0 0 0 0 10 16 1 0 0 0 0 12 17 1 0 0 0 0 12 18 1 0 0 0 0 13 38 1 0 0 0 0 13 39 1 0 0 0 0 13 40 1 0 0 0 0 14 19 1 0 0 0 0 14 20 1 0 0 0 0 16 21 1 0 0 0 0 16 20 1 0 0 0 0 17 22 2 0 0 0 0 17 23 1 0 0 0 0 18 24 2 0 0 0 0 18 25 2 0 0 0 0 18 26 1 0 0 0 0 19 41 1 0 0 0 0 19 42 1 0 0 0 0 19 43 1 0 0 0 0 20 27 2 0 0 0 0 21 44 1 0 0 0 0 21 45 1 0 0 0 0 21 46 1 0 0 0 0 23 28 2 0 0 0 0 23 26 1 0 0 0 0 26 29 2 0 0 0 0 28 30 1 0 0 0 0 28 47 1 0 0 0 0 29 31 1 0 0 0 0 29 48 1 0 0 0 0 30 31 2 0 0 0 0 30 49 1 0 0 0 0 31 50 1 0 0 0 0 M END ---------------------------------8<----------------- ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024 |
From: <we...@in...> - 2002-11-22 10:30:24
|
Hello, > I am a supporter of SAX - see last message to this list. And see > http://www.megginson.com But it is best suited for individual > applications rather than libraries. SAX is designed to discard > unnecessary elements and structures and I don't think a generic library > can easily make those decisions for all users. Is that link correct? (Or do I have network problems?) > > If you plan to develop a SMARTS parser for this task i would recommend > > techniques like JavaCC or other JavaCompilerCompiler tools. But here a > > good computer scientist will be needed for coding BNF norm ... > > or somebody with much time !;-) > > > I don't intend to write a SMARTS parser. Instead we are developing a > generic CML query language which encompasses the concepts in "most" of > the current chemical systems and generic structure representation. Until now I have often found SMARTS patterns in the literature for coding substructure patterns. I think there should be at least an additional SMARTS -> CML query converter. Of course there may be different opinions about standards, but many people use SMARTS ... I think it is a problem similar to: LaTeX equation <-> MathML > > Best > > P. Regards, Joerg > > > > ------------------------------------------------------- > This sf.net email is sponsored by: Battle your brains against the best > in the Thawte Crypto Challenge. Be the first to crack the code - > register now: http://www.gothawte.com/rd521.html > _______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel > -- Dipl. Chem. Joerg K. Wegner Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091 E-Mail: mailto:we...@in... WWW: http://www-ra.informatik.uni-tuebingen.de |
From: Peter Murray-R. <pm...@ca...> - 2002-11-21 20:57:24
|
The following uses of "PseudoAtoms" may be relevant: 1. a "sparkle" in a molecular orbital calculation 2. a list of atoms (Hl = halogen). This could be because the experimental evidence wasn't clear 3. a dummy atom (usually a point in space, e.g. a centroid) 4. an abbreviation for a precise connection table fragment (Et = CH2CH3) 5. a locant in a Markush structure (R) There may be others. My guess is that you wish to use 4 and 5. For 4 you need to think carefully about how the R group is defined - IMO it must be a table external to the program. You would need a language to define the fragment. Do you cover multivalent PseudoAtoms like (say) P for -OP(=O)(-O)O- ? 5 needs careful thinking. How will it be used in CDK? [It will form part of CML Query] P. |
From: Christoph S. <ste...@ic...> - 2002-11-21 13:44:07
|
Hi there, an interesting question regarding our current RFC #8 (pseudo atoms) was brought up by L. Shymal: ---start of quote--- Was just looking at RFC 8 on the CDK site. Does your definition of PseudoAtom mean that it is bonded to other entities via only one bond or can they connect via multiple bonds(can valency violations be detected ?) and can they be realized into a real fragment and collapsed back into PseudoAtom form ? Can one have overlapping PseudoAtoms in a molecule ? The concept is definitely useful, my kludge deals with this using just a collection of atoms. I was wondering if i could do anything useful with this concept. ---end of quote--- These are certainly valuable and valid questions which we need to discuss. From my re-reading of Egon's proposal in RFC #8 I would say that pseudo atoms are a simpler construct than the concept that L. Shymal implies in his questions. Fragments like those addressed in the questions are of high interest in my own area of research dealing with deterministic and stochastic structure generators. There, you can frequently detect a number of fragments from, say, spectroscopic sources, which may of course overlap. These fragments are then used in order to build chemically valid structures by putting them together in a combinatorial way. So, I would say that this aspect should be dealt with by using the regular fragment class and by having specialized classes handle the complicated issues of detecting overlap and synthesizing molecules based on this knowledge. This also complies with our policy to keep the core classes as simple as possible and have factories or engines, or whatever you'd like to call them, for the higher functionalities. Of course, there could and should be a mechanism of converting fragments into pseudo atoms and vice versa. Any further comments? Cheers, Chris -- Dr. Christoph Steinbeck (http://www.ice.mpg.de/departments/ChemInf) MPI of Chemical Ecology, Winzerlaer Str. 
10, Beutenberg Campus, 07745 Jena, Germany Tel: +49(0)3641 571263 - Fax: +49(0)3641 571202 What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3.. |
From: Peter Murray-R. <pm...@ca...> - 2002-11-20 22:44:11
|
At 14:11 19/11/2002 +0100, E.L. Willighagen wrote: >Hi Peter, HI - I have snipped some of this to keep it shorter > > My suggestions for library development are: > > - keep it as small as is necessary. If a library is too large it loses > > coherence, and becomes unnavigable. I have found this in parts of CDK > > lib. I discussed this with Christoph and he agreed that there were modules > > which weren't mainstream to the purpose of CDK. The library would benefit > > from being smaller. > >Agreed. The CDK project is solving this by defining modules. At this moment >only a few modules are defined. Most importantly, the core module which >consists of the CDK storage classes. By use of modularization, the library >benefits from consisting of small and easier to grasp modules. Good. I would find it useful to have a core module (which had been used by people other than the author OR in a test harness) and a development area. Otherwise people do not know what they can use in their applications safely. A library has to be something which is integrated and then forgotten - if you keep worrying about whether the library does what you want it inhibits program design. >Modules that I planned to add are: >- file IO >- structure generation >- rendering >etc ><snip/> > > - do not introduce anything until there is a test harness and it has been > > tested. The natural assumption of a library is that it is fit for use. This > > is probably more important than for an application, since the application > > author is used to using libraries and does not expect to have to debug > > them. Of course there are and will be bugs, but a test harness will be an > > enormous help. > >I tend to disagree with this... i.e. considering this: > >1. It has not happened yet, but I've been working on a stable release of the >core module. The biggest issue at this moment is the lack of documentation. >See the archive for more info on this. >2. 
I like to distinguish between the stable release and the development >release. The latter is unstable: things like API changes can and will happen. >The development release is kind of a prototype release. I would agree on this distinction. My point is that the core is clearly identified as such and that it is not only stable but relatively fit for purpose. Otherwise a potential user (my main interest in CDK at the moment) will not find it worthwhile to spend the time to find what works and what doesn't. >As in many open source software projects whose developer communities have >different backgrounds, prototyping and proof-of-concept implementations speed >up the development of the project. I would not like to lose this >great feature of such development. I agree with this. However there is a difference between applications and toolkits/libraries. Libraries have to have releases which are more stable IMO. >As such, I welcome unharnessed/new/exciting implementations in the >development release. However, and that's what is important in your comment, the >stable release is different. In that release only proven concepts should be >used, that are well tested and have proper documentation. > > > - try to avoid "partial implementations". These can also be very > > frustrating for the user since they often don't show up until the library > > has been well integrated into the project. Typical examples are fileReaders > > which include "just the bits that the original author needed" (I have > > mentioned examples) and important items missing from the data structure. > > For example CDK seems to have very little support for charged molecules > > (see below) > >More or less the same as above. But with the note that I agree that a full >implementation is favored. But, as you know, the chemical software >development community is not that large, and for those who do development, it >is often not even their core business... Fundamentally we agree. 
We all get tired at 0200 when we know we have to add one more error trapping routine, etc. But the user doesn't know what we have left out. I spent many frustrating nights with the early Swing versions. I couldn't get some of the routines (e.g. text editing) to work. I then talked to a SUN developer who said that several of the routines were basically no-ops and the (licensed) developers were expected to implement them. But it destroyed my faith in Swing and I have only now felt that I can use it with any reliability. Same for Java3D, Java2D, etc. We don't want CDK (or any other library) to get a reputation for bugginess. >Thus, if a partial implementation already solves an often encountered problem, >this is more than interesting to add... As such, the MDLReader only has >limited features yet. Until now, what was mostly used from MDL files were atoms, bonds >and coordinates... The problem is that it becomes non-standard wrt MDL's description. We already see hundreds of partial MDL readers and PDB readers. Many are seriously broken. I don't think a library should produce partial implementations which is why I have worked hard on these. (There are parts of MDL which cannot be interpreted outside MDL software but apart from these I think they should be honoured). It is because CDK does not read the 2D/3D flag (which is part of MDL's spec) that it can garble 2D and 3D coordinates badly and the CML it produces is unusable. >Moreover, if a user needs additional features he can request them (they are >often not difficult to add), or implement them themselves... And this is also >a common and proven mechanism in open source development. Yes, but it depends on being confident that there is a willing and responsive development community which can respond in reasonable time. In Linux there were hundreds of developers - CDK has 10, many other projects have fewer. >Missing features, or a list of supported features, should, however, always be >mentioned in the documentation. 
And this is currently, unfortunately, not >really the case. But that is a different problem, IMO. It is related. Partial implementations should always be documented and in some cases that is acceptable. > > - make non-exposed classes "private" to avoid them appearing in the > Javadoc > > where they confuse the user <snip/> > > - make sure that all code is used by someone other than the author. This > > isn't easy in a small community, but I suggest it for new routines. Or put > > routines "on probation" until someone else uses them and adds some > > comments. Like Amazon book reviews. > >This is a very interesting comment. Any idea on how to practically do this? I >do not feel much for manually checking such things... there might be programs >to do such statistics... I imagine the sourceforge tools could be used for this - specific modules could be assigned to particular developers who could add comments. But I still argue that there should be at least one call to each module. > > > I think it would be useful to choose those areas which CDK does well and > > > which are used by someone other than the author. My own selection would > > be: > > - maths, geometry > > (This is independent of the molecular data structure) > > - molecular representation and data structure, molecular perception > > (aromaticity...), > > - topological analysis, fingerprints, graphs, substructure searching > > - layout > > - rendering, interfaces with graphical systems, events > > (This is dependent on the molecular data structure) > > > > At present I don't see how the data structure supports: > > - formal charge (although it deals with partial charges - which are fuzzy > > because their origin is not defined). This is a serious limitation - we > > can't read the NCI data set into CDK as it has charges. > >See CDK RFC #6: >http://cdk.sourceforge.net/rfc6.html Thanks. Many readers absolutely rely on formal charges for atoms and some (e.g. IChI) also have formal charges on molecules. 
IMO this is more important than partial charge, which is difficult to define without a dictionary. > > It is agreed that it doesn't yet support > > - tautomerism > > - stereochemistry > > > - extract all the information and offer it through a reader-specific > > interface. This is much more work but is the formally correct way of > > solving the problem. Thus a PDBReader should incorporate the BIB, CRYST > > SEQR, HET, CONECT etc as well as just the ATOM records. > >Yes, but this might also be something the user does not want... If only >interested in the coordinates, I do not want to read the other 5Mb of data on >that molecule... (in the extreme case ;) A filter mechanism has to be developed. If CDK declares that it is definitely not aimed at crystals, sequences, molecular formulae, connection tables and that only atom coordinates are important I would accept that this is consistent. But it isn't consistent to read crystal parameters from one file and not from another. >I've been thinking about a similar problem for Jmol (*)... in Jaguar output >files, the first frame contains the structure that was taken as input by the >Jaguar program... in some cases the user wants to read that, in other cases he >does not (e.g. when the other frames define an animation)... > >Therefore, I want to add a customization layer to the file IO, where the user >can define which information he wants to have read, and which not... The most general architecture is the DOM. Another is XSLT. SAX is fine IF you know exactly what data structure you are supporting; otherwise you lose structure and information. > > The problem with each program or toolkit writing its own file readers is > > that it multiplies the pieces of code that have to be maintained. The > point <snip/> >However, I do not feel much for dropping SAX support whatsoever.... >at least one person, Joerg (of JOELib), needs a SAX based interface to CML >files... he has very, very large files which cannot be read with DOM... 
I am a supporter of SAX - after all I got it started on XML-DEV! And my CMLDOM is based on SAX (at present). But SAX is designed to make it easy to discard information and we have to consider very carefully what information we want to keep in CDK. >Egon > >(*) The proper casing of chars for Jmol is: upper case J, lower case m-o-l. Noted P. >------------------------------------------------------- >This sf.net email is sponsored by: To learn the basics of securing >your web site with SSL, click here to get a FREE TRIAL of a Thawte >Server Certificate: http://www.gothawte.com/rd524.html >_______________________________________________ >Cdk-devel mailing list >Cdk...@li... >https://lists.sourceforge.net/lists/listinfo/cdk-devel |
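The filter mechanism and the "customization layer" discussed in this message could look roughly like the sketch below. It is only an illustration under stated assumptions: the class RecordFilter is hypothetical and is not part of CDK or Jmol, and the sample lines in main are made up. It relies only on the PDB convention that the record name occupies the first six columns of each line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: lets the caller choose which record types are read. */
final class RecordFilter {
    private final Set<String> wanted;

    /** @param recordNames record names to keep, e.g. "ATOM", "CRYST1" */
    RecordFilter(String... recordNames) {
        this.wanted = new HashSet<>(Arrays.asList(recordNames));
    }

    /** Returns only the lines whose record name (columns 1-6) was requested. */
    List<String> select(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String name = line.substring(0, Math.min(6, line.length())).trim();
            if (wanted.contains(name)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample lines, only the record names matter here.
        List<String> lines = Arrays.asList(
                "CRYST1   52.000   58.600   61.900",
                "ATOM      1  N   MET A   1",
                "HETATM    2 HG    HG A 101",
                "CONECT    1    2");
        RecordFilter coordinatesOnly = new RecordFilter("ATOM", "HETATM");
        for (String line : coordinatesOnly.select(lines)) {
            System.out.println(line);
        }
    }
}
```

Making the selection an explicit argument addresses the consistency point raised above: what is skipped is the caller's documented choice rather than an accident of a particular reader.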
From: Peter Murray-R. <pm...@ca...> - 2002-11-20 21:51:30
|
At 14:52 19/11/2002 +0100, Jörg K. Wegner wrote: >Hi all, > >> >>However, I do not feel much for dropping SAX support whatsoever.... >>at least one person, Joerg (of JOELib), needs a SAX based interface to CML >>files... he has very, very large files which cannot be read with DOM... > >I think some other persons with BIG files (every company in our SOL >project and the Gasteiger group) will need SAX, too ... if they plan to >use CML ... >also it's necessary to access molecules one after another and not the >complete stream at once. We talk about files with 200.000 up to 2 millions >molecules. I am a supporter of SAX - see last message to this list. And see http://www.megginson.com But it is best suited for individual applications rather than libraries. SAX is designed to discard unnecessary elements and structures and I don't think a generic library can easily make those decisions for all users. >> >>>- the atomType seems fragile. I have tried to use >> >>>SaturationChecker.saturateWithHydrogen() and this throws a number of >> >>>nullPEs which appear to be because the AtomTypeFactory doesn't return > >That's my opinion, too. > >If you plan to develop a SMARTS parser for this task i would recommend >techniques like JavaCC or other JavaCompilerCompiler tools. But here a >good computer scientist will be needed for coding BNF norm ... >or somebody with much time !;-) I don't intend to write a SMARTS parser. Instead we are developing a generic CML query language which encompasses the concepts in "most" of the current chemical systems and generic structure representation. Best P. |
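The "one molecule after another" access described in this exchange can be sketched with the standard Java SAX API. This is a minimal illustration, not CDK's or JOELib's reader: the class MoleculeCounter is hypothetical, the element names "molecule" and "atom" are assumed to appear unprefixed, and nested molecules are not handled. It also shows the trade-off Peter describes, since everything the handler does not record is discarded.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: visits molecules one at a time without building a DOM. */
final class MoleculeCounter extends DefaultHandler {
    int molecules;       // number of completed molecule elements
    int maxAtoms;        // largest atom count seen in any molecule
    private int atomsInCurrent;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("molecule".equals(qName)) {
            atomsInCurrent = 0;           // a new molecule begins
        } else if ("atom".equals(qName)) {
            atomsInCurrent++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("molecule".equals(qName)) {
            // The molecule is complete; only these two numbers are kept.
            molecules++;
            maxAtoms = Math.max(maxAtoms, atomsInCurrent);
        }
    }

    /** Parses the given XML text and returns the filled-in handler. */
    static MoleculeCounter count(String xml) throws Exception {
        MoleculeCounter handler = new MoleculeCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><molecule><atom/><atom/></molecule>"
                + "<molecule><atom/></molecule></list>";
        MoleculeCounter result = count(xml);
        System.out.println(result.molecules + " molecules, largest has "
                + result.maxAtoms + " atoms");
    }
}
```

Memory use depends on what the handler stores per molecule, not on the size of the file, which is the property needed for files with very many molecules.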
From: <no...@so...> - 2002-11-19 15:46:29
|
Bugs item #640750, was opened at 2002-11-19 16:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640750&group_id=20024 Category: org.openscience.cdk.ringsearch Group: None Status: Open Resolution: None Priority: 5 Submitted By: Christoph Steinbeck (steinbeck) Assigned to: Christoph Steinbeck (steinbeck) Summary: Inefficient ring search? Initial Comment: Luo Cao wrote: As to the SSSR algorithm (function findSSSR() in SSSRFinder.java), I think it can be more efficient. In the function, once a ring is found, its bonds are broken. After that, I think you should compute the minimum number of rings again. If a ring still exists, then continue; if not, break. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640750&group_id=20024 |
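The "minimum number of rings" mentioned in this report can be computed without searching for rings at all: for a molecular graph it equals bonds - atoms + connected components. The sketch below is not CDK's SSSRFinder; the class RingCount and the plain int-array bond list are assumptions made for illustration. A search loop could call it after each bond removal and stop as soon as it returns 0.

```java
/** Illustrative only: minimum ring count of a graph given as a bond list. */
final class RingCount {

    /** Union-find lookup with path halving. */
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /**
     * @param atomCount number of atoms, indexed 0 .. atomCount - 1
     * @param bonds     each entry holds the two atom indices of one bond
     * @return bonds - atoms + connected components
     */
    static int ringCount(int atomCount, int[][] bonds) {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        int components = atomCount;
        for (int[] bond : bonds) {
            int a = find(parent, bond[0]);
            int b = find(parent, bond[1]);
            if (a != b) {
                parent[a] = b;   // the bond joins two components
                components--;
            }
        }
        return bonds.length - atomCount + components;
    }

    public static void main(String[] args) {
        int[][] sixRing = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}};
        int[][] opened = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}};
        System.out.println("closed: " + ringCount(6, sixRing));
        System.out.println("after breaking one bond: " + ringCount(6, opened));
    }
}
```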
From: <no...@so...> - 2002-11-19 15:44:00
|
Bugs item #640748, was opened at 2002-11-19 16:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640748&group_id=20024 Category: org.openscience.cdk.layout Group: None Status: Open Resolution: None Priority: 5 Submitted By: Christoph Steinbeck (steinbeck) Assigned to: Christoph Steinbeck (steinbeck) Summary: SDG: Inconsistencies in placeRing Initial Comment: Luo Cao wrote: When I read the CDK source, I found something I could not understand. It is located in StructureDiagramGenerator.java, in the function layoutRingSet(). The code is: .... int thisRing; Ring ring = rs.getMostComplexRing(); sharedAtoms = placeFirstBond(ring.getBondAt(0),firstBondVector); ... In the function placeFirstBond(Bond bond, Vector2d bondVector): ... sharedAtoms = new AtomContainer(); sharedAtoms.addBond(bond); sharedAtoms.addAtom(bond.getAtomAt(0)); sharedAtoms.addAtom(bond.getAtomAt(1)); ... Then the function placeFirstBond() returns sharedAtoms, so the variable sharedAtoms in the function layoutRingSet() has just 2 atoms and 1 bond. Just 4 lines below in the function layoutRingSet(), the function placeRing() uses the variable sharedAtoms. Some code of placeRing() is: int sharedAtomCount = sharedAtoms.getAtomCount(); if (sharedAtomCount > 2) { placeBridgedRing(ring, sharedAtoms, sharedAtomsCenter, ringCenterVector, bondLength); } else if (sharedAtomCount == 2) { placeFusedRing(ring, sharedAtoms, sharedAtomsCenter, ringCenterVector, bondLength); } else if (sharedAtomCount == 1) { placeSpiroRing(ring, sharedAtoms, sharedAtomsCenter, ringCenterVector, bondLength); } But sharedAtomCount will always be 2, and placeFusedRing() will always be executed. I think this may be an error. The sharedAtoms cannot be obtained from the function placeFirstBond(). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640748&group_id=20024 |
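The three branches quoted in this report distinguish rings by how many atoms they share with what has already been placed. The sketch below illustrates only that classification, computed from two rings given as arrays of atom indices. It is not CDK's StructureDiagramGenerator and takes no position on whether the reported behaviour is a bug; the names RingFusion, Kind, sharedAtoms and classify are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: classifies how a new ring attaches to a placed ring. */
final class RingFusion {

    enum Kind { NONE, SPIRO, FUSED, BRIDGED }

    /** Returns the atom indices that occur in both rings. */
    static Set<Integer> sharedAtoms(int[] placedRing, int[] newRing) {
        Set<Integer> placed = new LinkedHashSet<>();
        for (int atom : placedRing) {
            placed.add(atom);
        }
        Set<Integer> shared = new LinkedHashSet<>();
        for (int atom : newRing) {
            if (placed.contains(atom)) {
                shared.add(atom);
            }
        }
        return shared;
    }

    /** Mirrors the quoted branch: more than 2 bridged, 2 fused, 1 spiro. */
    static Kind classify(int[] placedRing, int[] newRing) {
        int count = sharedAtoms(placedRing, newRing).size();
        if (count > 2) {
            return Kind.BRIDGED;
        } else if (count == 2) {
            return Kind.FUSED;
        } else if (count == 1) {
            return Kind.SPIRO;
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        // Two six-membered rings sharing atoms 4 and 5 (naphthalene-like).
        System.out.println(classify(new int[] {0, 1, 2, 3, 4, 5},
                                    new int[] {4, 5, 6, 7, 8, 9}));
        // Two five-membered rings sharing atoms 1, 4 and 7 (norbornane-like).
        System.out.println(classify(new int[] {1, 2, 3, 4, 7},
                                    new int[] {1, 6, 5, 4, 7}));
    }
}
```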
From: <we...@in...> - 2002-11-19 13:51:14
|
Hi all, > >I am > >currently working on CMLDOM for CML2 - about to be released in the > next few > >days. I would rather that we developed interfaces/adapters to CMLDOM2 > >rather than multiple implementations. > > > Yes, that has been on my wishlist for quite some time now... Once > CMLDOM2 is > out, I'll start to work on a convertor for CDK <-> CMLDOM2... > > However, I do not feel much for dropping SAX support whatsoever.... > at least one person, Joerg (of JOELib), needs a SAX based interface to > CML > files... he has very, very large files which cannot be read with DOM... I think some other persons with BIG files (every company in our SOL project and the Gasteiger group) will need SAX, too ... if they plan to use CML ... also it's necessary to access molecules one after another and not the complete stream at once. We talk about files with 200.000 up to 2 millions molecules. > >>>- the atomType seems fragile. I have tried to use > >>>SaturationChecker.saturateWithHydrogen() and this throws a number of > >>>nullPEs which appear to be because the AtomTypeFactory doesn't return That's my opinion, too. If you plan to develop a SMARTS parser for this task i would recommend techniques like JavaCC or other JavaCompilerCompiler tools. But here a good computer scientist will be needed for coding BNF norm ... or somebody with much time !;-) > Taken as such ;) > > There is so much to do on the CDK library... We basically lack the man > hours > to do many important stuff like unit testing and documentation... I would not overestimate unit testing. It's pretty good for extended refactorings in algorithms and data structures and could be a mercy for cheminformatics algorithms if there would be public available datasets for testing protonation models, SMARTS, SMILES, file types etc. ... but there are no test sets with a standard available (Let me know if you know some ...). 
> > Egon Regards, Joerg > > > > > ------------------------------------------------------- > This sf.net email is sponsored by: To learn the basics of securing > your web site with SSL, click here to get a FREE TRIAL of a Thawte > Server Certificate: http://www.gothawte.com/rd524.html > _______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel > -- Dipl. Chem. Joerg K. Wegner Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091 E-Mail: mailto:we...@in... WWW: http://www-ra.informatik.uni-tuebingen.de |
From: E.L. W. <eg...@sc...> - 2002-11-19 13:12:02
|
Hi Peter, here is my more elaborate response to your valuable email. Peter Murray-Rust wrote: > This is a critical point and we need to explore it as a community. Creating > toolkits is very hard work and in many cases very boring. It is rather like > building a gearbox. Applications usually do something interesting and > provide more motivation. Many of the toolkits have really emerged from > applications that demanded them. > > There is a slightly cyclic problem here. Without good toolkits application > developers will hack their own and we shall see incomplete and incompatible > approaches. JUMBOLib has emerged from the need to read complex XML > documents and has a library of similar fragmentation and quality to CDK. > Some good bits, some partially finished, partially tested etc. > > Please treat the following sympathetically. It comes from ca 15 years of > writing chemical software libraries. My own code does not measure well > against these either! > My suggestions for library development are: > - keep it as small as is necessary. If a library is too large it loses > coherence, and becomes unnavigable. I have found this in parts of CDK > lib. I discussed this with Christoph and he agreed that there were modules > which weren't mainstream to the purpose of CDK. The library would benefit > from being smaller. Agreed. The CDK project is solving this by defining modules. At this moment only a few modules are defined. Most importantly, the core module which consists of the CDK storage classes. By use of modularization, the library benefits from consisting of small and easier to grasp modules. Modules that I planned to add are: - file IO - structure generation - rendering etc > - keep the parts simple. We discussed the ChemSequence/ChemModel approach > for example. It is complex and confused me considerably, especially since > it wasn't documented. If it *is* being actively used then give some > examples. If not, then consider removing it until it is needed. 
And give > some convenience methods like ChemFile.getMolecule() I agree on that too. It confuses me often too... As suggested, documentation is the problem here... > - do not introduce anything until there is a test harness and it has been > tested. The natural assumption of a library is that it is fit for use. This > is probably more important than for an application, since the application > author is used to using libraries and does not expect to have to debug > them. Of course there are and will be bugs, but a test harness will be an > enormous help. I tend to disagree with this... i.e. considering this: 1. It has not happened yet, but I've been working on a stable release of the core module. The biggest issue at this moment is the lack of documentation. See the archive for more info on this. 2. I like to distinguish between the stable release and the development release. The latter is unstable: things like API changes can and will happen. The development release is kind of a prototype release. As in many open source software projects whose developer communities have different backgrounds, prototyping and proof-of-concept implementations speed up the development of the project. I would not like to lose this great feature of such development. As such, I welcome unharnessed/new/exciting implementations in the development release. However, and that's what is important in your comment, the stable release is different. In that release only proven concepts should be used, that are well tested and have proper documentation. > - try to avoid "partial implementations". These can also be very > frustrating for the user since they often don't show up until the library > has been well integrated into the project. Typical examples are fileReaders > which include "just the bits that the original author needed" (I have > mentioned examples) and important items missing from the data structure. 
> For example CDK seems to have very little support for charged molecules > (see below) More or less the same as above. But with the note that I agree that a full implementation is favored. But, as you know, the chemical software development community is not that large, and for those who do development, it is often not even their core business... Thus, if a partial implementation already solves an often encountered problem, this is more than interesting to add... As such, the MDLReader only has limited features yet. Until now, what was mostly used from MDL files were atoms, bonds and coordinates... Moreover, if a user needs additional features he can request them (they are often not difficult to add), or implement them themselves... And this is also a common and proven mechanism in open source development. Missing features, or a list of supported features, should, however, always be mentioned in the documentation. And this is currently, unfortunately, not really the case. But that is a different problem, IMO. > - make non-exposed classes "private" to avoid them appearing in the Javadoc > where they confuse the user Agreed. And to my knowledge most developers do this... But this may not be the case... > - choose names carefully. There are some misspelt classes and modules in > CDK (e.g. isArromatic). Abstract words such as Model, Sequence, Property > etc are not self-explanatory. Names such as "saturateWithHydrogen" are > misleading (this means hydrogenating double bonds) - > "addHydrogensToSatisfyValency" would be clearer. Names like this are > sometimes a useful way of documenting a system. I totally agree. I've just renamed the mentioned method. In my programming education improperly named methods were a reason to fail an exam. > - document, document. Do not assume that the purpose of any routine and its > arguments is obvious. True. 
I want to note in addition that the documentation should not just
describe what *is* done by the method but, more importantly, what the
method *should* do!

> - give examples. The test examples are useful here and I would have
> struggled without them.

Yes, in the past months I started adding more examples, but we need many
more included in the JavaDoc documentation...

> - make sure that all code is used by someone other than the author. This
> isn't easy in a small community, but I suggest it for new routines. Or put
> routines "on probation" until someone else uses them and adds some
> comments. Like Amazon book reviews.

This is a very interesting comment. Any idea on how to do this in
practice? I do not feel much for manually checking such things... there
might be programs to do such statistics...

>
> I think it would be useful to choose those areas which CDK does well and
> which are used by someone other than the author. My own selection would
> be:
> - maths, geometry
> (This is independent of the molecular data structure)
> - molecular representation and data structure, molecular perception
> (aromaticity...),
> - topological analysis, fingerprints, graphs, substructure searching
> - layout
> - rendering, interfaces with graphical systems, events
> (This is dependent on the molecular data structure)
>
> At present I don't see how the data structure supports:
> - formal charge (although it deals with partial charges - which are fuzzy
> because their origin is not defined). This is a serious limitation - we
> can't read the NCI data set into CDK as it has charges.

See CDK RFC #6: http://cdk.sourceforge.net/rfc6.html

> It is agreed that it doesn't yet support
> - tautomerism
> - stereochemistry

There are some holders for stereochemistry... Aren't they used at this
moment?

> >I've noted the remarks about the CMLReader and will fix that soon (I
> >consider not reading some info from file a bug...) About the MDLReader
> >I've got bigger plans...
> >Recently, a newer version has been "published" (on their website),
> >being V3000 (indeed, I also do not know why they did not just use V2002
> >;)...
> >Anyway, I'll update the MDLReader soon and include reading of many more
> >fields...
>
> There is a problem with generic file readers. There are two approaches:
> - read those parts that are interesting. This is what most authors do
> including CDK and OpenBabel. For example most non-molecular concepts are
> discarded and several atom properties are ignored. It works within the
> library or application but may be frustrating if users want other
> information

See comment above about users missing features...

> - extract all the information and offer it through a reader-specific
> interface. This is much more work but is the formally correct way of
> solving the problem. Thus a PDBReader should incorporate the BIB, CRYST,
> SEQR, HET, CONECT etc as well as just the ATOM records.

Yes, but this might also be something the user does not want... If I am
only interested in the coordinates, I do not want to read the other 5Mb
of data on that molecule... (in the extreme case ;)

I've been thinking about a similar problem for Jmol (*)... In Jaguar
output files, the first frame contains the structure that was taken as
input by the Jaguar program... In some cases the user wants to read
that, in other cases he does not (e.g. when the other frames define an
animation)...

Therefore, I want to add a customization layer to the file IO, where the
user can define which information should be read and which should not...

> The problem with each program or toolkit writing its own file readers is
> that it multiplies the pieces of code that have to be maintained. The point
> of CML is that it defines a semantic interface which is loss-free. Not only
> must all the information be transmitted but also the meaning and syntax of
> all the components should be identical.
> Please don't take this personally
> but the CMLReader in CDK is currently too limited for what I need.
> It
> doesn't extract formalCharge, atomParity or bondStereo. The bondorders are
> not consistent with the CML (which does not use fractional numbers).

Not consistent with CML2 that is, right?

> I am
> currently working on CMLDOM for CML2 - about to be released in the next few
> days. I would rather that we developed interfaces/adapters to CMLDOM2
> rather than multiple implementations.

Yes, that has been on my wishlist for quite some time now... Once CMLDOM2
is out, I'll start to work on a converter for CDK <-> CMLDOM2...

However, I do not feel much for dropping SAX support... At least one
person, Joerg (of JOELib), needs a SAX-based interface to CML files... He
has very, very large files which cannot be read with DOM...

Also, one design goal for CDK is its function in teaching
computational/informational programming in chemistry... As such, multiple
implementations using different designs are *very* valuable...

> > > - the atomType seems fragile. I have tried to use
> > > SaturationChecker.saturateWithHydrogen() and this throws a number of
> > > nullPEs which appear to be because the AtomTypeFactory doesn't return
>
> Hope this gives some ideas to discuss - it is intended to be constructive

Taken as such ;) There is so much to do on the CDK library... We
basically lack the man hours to do many important things like unit
testing and documentation...

Egon

(*) The proper casing of chars for Jmol is: upper case J, lower case m-o-l.
|
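The SAX-versus-DOM point in the message above, together with the idea of
letting the user choose which information is read, can be sketched as
follows. This is only an illustration in Python, not CDK code and not the
CMLReader: the element and attribute names (atom, elementType,
formalCharge, x2, y2) and the helper names SelectiveAtomHandler and
read_atoms are assumptions made for the example. The handler sees one
element at a time, so no tree of the whole file is ever built, and it
keeps only the attributes the caller asked for.

```python
import xml.sax


class SelectiveAtomHandler(xml.sax.ContentHandler):
    """Collect only the requested attributes of each <atom> element."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)   # attribute names the caller asked for
        self.atoms = []             # one dict per atom, requested keys only

    def startElement(self, name, attrs):
        if name != "atom":
            return                  # all other elements are skipped, never stored
        self.atoms.append(
            {key: attrs.getValue(key)
             for key in attrs.getNames() if key in self.wanted}
        )


def read_atoms(xml_text, wanted):
    """Stream through xml_text and return the selected atom attributes."""
    handler = SelectiveAtomHandler(wanted)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.atoms


# Hypothetical CML-like input, made up for this sketch.
SAMPLE = """<molecule id="m1">
  <atom id="a1" elementType="N" formalCharge="1" x2="0.0" y2="0.0"/>
  <atom id="a2" elementType="O" formalCharge="-1" x2="1.2" y2="0.0"/>
  <bond atomRefs2="a1 a2" order="1"/>
</molecule>"""

# The caller wants charges only; coordinates and bonds are never kept.
charges = read_atoms(SAMPLE, ["id", "formalCharge"])
```

For a file on disk, xml.sax.parse(filename, handler) would play the same
role as parseString here. Whether such a filter belongs in the reader or in
a layer above it is exactly the design question raised in the message.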
From: Peter Murray-R. <pm...@ca...> - 2002-11-18 14:53:46
|
At 14:35 18/11/2002 +0100, Christoph Steinbeck wrote:
>Although my subject implies a long and important email,
>I just wanted to welcome Peter's lengthy message, which contains so many
>important issues for the future development of the CDK that it should be
>converted into a strategy paper.

Thanks very much - I was nervous that it might be seen as merely
critical. The issues apply generally to the OpenSource chemistry
community. We don't want to stifle innovation and diversity of approach
but we do want to use common semantics if possible.

How far do we all agree on common semantics (independently of the details
of implementation)? For example we all seem to have settled on molecule
contains atoms and bonds. Many of these contain common concepts
(x2Coordinate, elementType) but systems differ in how they represent
partialCharge, formalCharge, atomTypes, etc. All bonds support two-atom
links but some extend the concept (in CML2 we allow that a bond may be
between not only atoms but also bonds or even electrons to support
organometallics).

These semantics are independent of the language used and so apply to all
opensource projects in chemistry. How many are there? We should probably
start with those that expose their data structure in a systematic manner.
(Editors and renderers like JChempaint, JMOL, Rasmol, BKChem, XDrawChem
primarily expose graphical interfaces and data structures may be
difficult to locate. Ghemical, GROMACS and abinit are applications or
interfaces to them.) The following list is NOT exhaustive so please don't
feel offended if you aren't here. The main Open toolkits I am familiar
with are:

JUMBOLib   Java  DOM for molecules, crystals, documents and spectra
CDK        Java  molecular representation, perception, support for
                 editors and renderers
JOELib     Java  molecular representation and perception, etc.
                 Descendant of OELib
OpenBabel  C++   molecular representation and perception, etc.
                 Descendant of OELib
IChI       C++   canonicalization of molecular structure

A lot of work has gone into all of these. How should they develop? We are
too small to offer a diversity of semantics and will confuse the
community. Unlike Linux we do not have zillions of developers, but our
task is harder in many ways - we have to develop a global semantics for
chemistry.

A useful start could be to systematize the semantics and functionality of
these and any other libraries not included. (We are also aware of several
non-Open toolkits which expose some of their data structure but we do not
have the whole picture. They may inform us, but cannot be included).

I am particularly keen that the difficult areas are not fragmented. These
include:
- atomTypes
- aromaticity
- tautomerism
- stereochemistry

The best way forward is to represent these independently of the
implementation. Thus OpenBabel has external documents describing the
semantics of these. It seems reasonable that we should share this
approach. If the documents were rewritten in XML both groups could use
the same reference. The same applies to elementTypes, valencies, radii,
etc. I suspect that OB and CDK differ in their aromaticity methodology
and are not easily reconciled.

It could be a useful time to bring the various groups closer together.

P.

>Let's try to work this out in greater detail in the near future.
>
>Cheers,
>
>Chris
>
>--
>Dr. Christoph Steinbeck (http://www.ice.mpg.de/departments/ChemInf)
>MPI of Chemical Ecology, Winzerlaer Str. 10, Beutenberg Campus, 07745
>Jena, Germany
>Tel: +49(0)3641 571263 - Fax: +49(0)3641 571202
>
>What is man but that lofty spirit - that sense of enterprise.
>... Kirk, "I, Mudd," stardate 4513.3..
|
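The suggestion in the message above, that reference data such as
elementTypes and valencies could live in one XML document shared by
several toolkits, might look roughly like this. It is a sketch under
stated assumptions: the document layout, the attribute names (symbol,
atomicNumber, commonValencies), the valency values and the function
load_element_types are all invented for illustration and do not describe
any file shipped by OpenBabel, CDK or CML. Python is used only to keep the
example short; the same document could be read from Java or C++.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared reference document; layout and values are illustrative.
REFERENCE_XML = """<elementTypeList>
  <elementType symbol="C" atomicNumber="6" commonValencies="4"/>
  <elementType symbol="N" atomicNumber="7" commonValencies="3 5"/>
  <elementType symbol="O" atomicNumber="8" commonValencies="2"/>
</elementTypeList>"""


def load_element_types(xml_text):
    """Return {symbol: {"atomic_number": int, "valencies": [int, ...]}}."""
    table = {}
    for node in ET.fromstring(xml_text).iter("elementType"):
        table[node.get("symbol")] = {
            "atomic_number": int(node.get("atomicNumber")),
            # valencies are stored as a space-separated list in this sketch
            "valencies": [int(v) for v in node.get("commonValencies").split()],
        }
    return table


# Every toolkit that loads the same document gets the same table.
ELEMENTS = load_element_types(REFERENCE_XML)
```

The design point is that the data sits outside any one implementation, so
two libraries can disagree about algorithms while still agreeing on the
reference values they start from.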