cdk-devel Mailing List for The Chemistry Development Kit (Page 480)

cdk-devel — Developers forum for discussion on the Chemistry Development Kit


Re: [Cdk-devel] [ cdk-Bugs-639455 ] CML bonder orders have wrong format

From: E.L. W. <eg...@sc...> - 2002-11-23 16:09:20

On Saturday 23 November 2002 14:45, Peter Murray-Rust wrote:
> >The current reader only parses CML1. The 1,S,2,D,3,T is part
> >of CML2. (right?)
>
> CML V1.0 implicitly defined the convention 1,S,2,D,3,T  but did not
> deliberately mandate it. The paper encouraged the use of the above
> convention but made it clear that other conventions were allowable (e.g.
> some systems have bond orders of 4, -5, etc.) 

It has been some time since I actually read the article... most of the time I 
check the DTD, i.e. the explicit definition... but I understand that a DTD 
is too limited to allow 1,S,etc... OK, bug accepted ;)

> All the CML writers that I
> have encountered other than CDK have adopted the 1,S,2,D,3,T convention.
> This means that all CML files except those from CDK are interoperable.

Ok. CDK will be no different then... 

> If CDK wishes to have its own convention for bond orders it is welcome to
> do so, but they should be labelled as such. So you could write
> <string builtin="order" convention="CDK">1.5</string>
> and this would be acceptable. However you would need a CMLReader that
> understood CDK bond order conventions to read this and you would have to
> convince the other software writers that this was useful.
>
> The CML convention was designed to be extensible so that if you wished to
> have a bond order of (say) 2.5 you could write:
> <string builtin="order" >A</string>
> <...>
> <string builtin="order" convention="CDK">2.5</string>

The CDK output will preferably not use any non-default convention. That's the 
idea... if it does not, then it actually is a bug in the CML writer...

> Most CMLReaders would understand the first. I don't know what CDK would do
> with it. Note that order is a <string> and not a <float>. This is
> deliberate. To understand the second they would require to implement a CDK
> convention reader. I suspect most would default to "unknown bond order".
>
> >  This will be fixed when a CML2 reader is
> >written.
>
> CML2 does not mandate a controlled vocabulary for bond orders.

Ah, then it won't .... 

> It is important that CML Readers and Writers produce valid CML and it is
> important to work for interoperability. 

Agreed. That's the whole point of CML...

> CMLWriters are easier to create
> than CMLReaders especially if they have to deal with multiple conventions.

Ah, yes... kind of you to mention that... ;) the strength of my CML reader is 
the flexibility in reading conventions... Programs can very easily write and 
add a handler for parsing a specific convention.
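
The handler idea described here can be sketched roughly as follows. This is
only an illustration and not the actual CDK CML reader: the class
BondOrderConventions, its methods, and the mapping of "A" to 1.5 are all
assumptions made for the example. Values with no registered handler come back
empty, which corresponds to the "unknown bond order" case from the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.function.Function;

/** Hypothetical registry of bond order parsers, keyed by convention name. */
final class BondOrderConventions {
    /** Key used when a <string builtin="order"> has no convention attribute. */
    static final String DEFAULT = "";

    private final Map<String, Function<String, OptionalDouble>> handlers = new HashMap<>();

    BondOrderConventions() {
        handlers.put(DEFAULT, BondOrderConventions::parseDefault);
    }

    /** Adds (or replaces) the handler for one named convention. */
    void register(String convention, Function<String, OptionalDouble> handler) {
        handlers.put(convention, handler);
    }

    /** Returns the numeric order, or empty if the convention or value is unknown. */
    OptionalDouble parse(String convention, String value) {
        String key = convention == null ? DEFAULT : convention;
        Function<String, OptionalDouble> handler = handlers.get(key);
        return handler == null ? OptionalDouble.empty() : handler.apply(value.trim());
    }

    /** Default convention from the thread: 1,S,2,D,3,T and A. */
    private static OptionalDouble parseDefault(String value) {
        switch (value) {
            case "1": case "S": return OptionalDouble.of(1.0);
            case "2": case "D": return OptionalDouble.of(2.0);
            case "3": case "T": return OptionalDouble.of(3.0);
            case "A": return OptionalDouble.of(1.5); // assumption: aromatic stored as 1.5
            default: return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        BondOrderConventions conventions = new BondOrderConventions();
        // a program plugs in its own convention with one call
        conventions.register("CDK", v -> OptionalDouble.of(Double.parseDouble(v)));
        System.out.println(conventions.parse(null, "D"));
        System.out.println(conventions.parse("CDK", "2.5"));
    }
}
```

A map of handlers keeps the default vocabulary and any program-specific
convention separate, so supporting a new convention does not touch the
default parsing code.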

Egon

Re: [Cdk-devel] [ cdk-Bugs-642456 ] IChI reader does not map atoms

From: Peter Murray-R. <pm...@ca...> - 2002-11-23 13:52:28

At 11:59 22/11/2002 -0800, no...@so... wrote:
>Bugs item #642456, was opened at 2002-11-22 19:46
>You can respond by visiting:
>https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024
>
> >Category: org.openscience.cdk.io
>Group: None
>Status: Open
>Resolution: None
>Priority: 5
>Submitted By: Peter Murray-Rust (petermr)
> >Assigned to: Egon Willighagen (egonw)
>Summary: IChI reader does not map atoms
>
>Initial Comment:
>The IChI reader (V1.3) does not appear to read the
><atom.orig-nbr>
>element so that atoms in the final structure are garbled. This
>means that IChI does not output correct connection tables.
>
>
>
>----------------------------------------------------------------------
>
> >Comment By: Egon Willighagen (egonw)
>Date: 2002-11-22 20:59
>
>Message:
>Logged In: YES
>user_id=25678

First, I am using IChI V0.9beta. Is this what you have used?

>The documentation states that this information is auxiliary.
>As such it should not be necessary for the identifier
>itself.

agreed

>  Description of the <basic> element describes how
>atoms are numbered. The output by the IChI program even says
>"Auxiliary info is not a part of the identifier, it is not
>unique".

agreed

>In any case, v1.3 does indeed not read
><atom.orig-nbr>. Could you elaborate why this is a bug, i.e.
>explain why the garbling is actually caused by the missing
><atom.orig-nbr>?
We created a number of files using IChI where the dbonds did not map onto 
the given bonds and appeared to require the atom.orig-nbr. I agree that 
this shouldn't be necessary. Here is a file distributed with IChI which 
shows a dbond (8-7-) not originally in the bond list:
--------------------8<--------------
<IChI version="0.9Beta">

  <structure number="1" id.name="" id.value="">

   <identifier version="0.9Beta" tautomeric="0">
    <basic>C*6C1*16CC, 2-1 4-3 6-5 7-1 8-2 9-3 10-4 11-7 12-8 13-9 14-10 
15-11 16-12 17-13 18-14 19-15 20-16 21-17 22-18 23-5-19-21 24-6-20-22</basic>
    <charge></charge>
    <stereo>
     <dbond>8-7- 13-9- 14-10- 15-11+ 16-12+ 21-17+ 22-18+ 23-19+ 24-20+</dbond>
     <sp3></sp3>
    </stereo>
   </identifier>

   <identifier.auxiliary-info version="0.9Beta" tautomeric="0">
    <!-- Auxiliary info is not a part of the identifier, it is not unique -->
    <atom.orig-nbr>13 14 23 24 10 11 9 15 1 22 8 18 2 21 7 17 3 20 6 16 4 
19 5 12</atom.orig-nbr>
    <atom.equivalence>(1 2 3 4)(5 6)(7 8 9 10)(11 12 13 14)(15 16 17 18)(19 
20 21 22)(23 24)</atom.equivalence>
   </identifier.auxiliary-info>
  </structure>
</IChI>
--------------------8<--------------

I have a suspicion that this is a problem with IChIV0.9beta . It suggests 
that there needs to be more error checking in an IChIReader.
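
As a rough idea of the kind of error checking meant here, the sketch below
lists every dbond entry whose atom pair does not occur in the bond list of
the <basic> element. It is not CDK code: the class DbondCheck and its methods
are invented for the example, and it assumes that a token such as
"23-5-19-21" means atom 23 is bonded to atoms 5, 19 and 21, which is an
interpretation and not something taken from the IChI documentation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical consistency check between the bond list and the dbond list. */
final class DbondCheck {

    /** Order-independent key for a bond between two atom numbers. */
    private static String key(int a, int b) {
        return Math.min(a, b) + "-" + Math.max(a, b);
    }

    /** Collects bonds; assumes the first number is bonded to each following one. */
    static Set<String> bonds(String connections) {
        Set<String> result = new HashSet<>();
        for (String token : connections.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            String[] atoms = token.split("-");
            int first = Integer.parseInt(atoms[0]);
            for (int i = 1; i < atoms.length; i++) {
                result.add(key(first, Integer.parseInt(atoms[i])));
            }
        }
        return result;
    }

    /** Returns the dbond entries (such as "8-7-") that match no known bond. */
    static List<String> unmatched(String connections, String dbonds) {
        Set<String> known = bonds(connections);
        List<String> missing = new ArrayList<>();
        for (String entry : dbonds.trim().split("\\s+")) {
            if (entry.isEmpty()) continue;
            // drop the trailing parity sign (+ or -) before splitting the pair
            String[] atoms = entry.substring(0, entry.length() - 1).split("-");
            int a = Integer.parseInt(atoms[0]);
            int b = Integer.parseInt(atoms[1]);
            if (!known.contains(key(a, b))) missing.add(entry);
        }
        return missing;
    }

    public static void main(String[] args) {
        String connections = "2-1 4-3 7-1 8-2 13-9";
        System.out.println(unmatched(connections, "8-7- 13-9-"));
    }
}
```

A reader could log or reject the entries returned by unmatched() instead of
silently building a structure from them.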

I do not have the problem files with me but should be able to send some 
samples on Monday. I suspect this may also be responsible for one of the 
other bugs I reported. It also emphasises the need to have test sets that 
are available so that we can agree on  what are bugs and whose 
responsibility it

P.


>----------------------------------------------------------------------
>
>You can respond by visiting:
>https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024
>
>
>-------------------------------------------------------
>This sf.net email is sponsored by:ThinkGeek
>Welcome to geek heaven.
>http://thinkgeek.com/sf
>_______________________________________________
>Cdk-devel mailing list
>Cdk...@li...
>https://lists.sourceforge.net/lists/listinfo/cdk-devel

Re: [Cdk-devel] CDK and JUMBO (Meeting between Christoph and PMR)

From: Peter Murray-R. <pm...@ca...> - 2002-11-23 13:51:20

At 11:27 22/11/2002 +0100, Jörg K. Wegner wrote:
>Hello,

<pmr>

>>I don't intend to write a SMARTS parser. Instead we are developing a
>>generic CML query language which encompasses the concepts in "most" of
>>the current chemical systems and generic structure representation.
>
>Until now I've often found SMARTS patterns in the literature for
>coding substructure patterns. I think there should be at least an
>additional SMARTS -> CML query converter. For sure there could be some
>different opinions about standards, but many people use SMARTS ...
>I think that's a similar problem to:
>LaTeX equation <-> MathML

We are committed to producing legacy converters to CML wherever possible so 
SMARTS2CQL would be a natural. But the problem with all the proprietary 
methods is that they have opaque semantics. For example the concept of 
aromatic atoms will differ between Daylight and CDK. That means that a 
SMARTS concept will run differently on the two systems.

P.


>>Best
>>
>>P.
>
>Regards, Joerg
>
>>
>>
>>
>
>
>--
>Dipl. Chem. Joerg K. Wegner
>Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany
>Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091
>E-Mail: mailto:we...@in...
>WWW:    http://www-ra.informatik.uni-tuebingen.de
>
>
>
>

Re: [Cdk-devel] [ cdk-Bugs-639455 ] CML bonder orders have wrong format

From: Peter Murray-R. <pm...@ca...> - 2002-11-23 13:50:57

At 12:03 22/11/2002 -0800, no...@so... wrote:
>Bugs item #639455, was opened at 2002-11-16 22:29
>You can respond by visiting:
>https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024
>
>Category: org.openscience.cdk.io
>Group: None
>Status: Open
> >Resolution: Postponed
>Priority: 5
>Submitted By: Peter Murray-Rust (petermr)
>Assigned to: Egon Willighagen (egonw)
>Summary: CML bonder orders have wrong format
>
>Initial Comment:
>The bond orders emitted as CML are floats (1.0, 1.5). This
>is inconsistent with the CML specification which uses
>strings 1,S,2,D,3,T and A. For anything else a convention
>attribute is required.
>
>
>
>----------------------------------------------------------------------
>
> >Comment By: Egon Willighagen (egonw)
>Date: 2002-11-22 21:03
>
>Message:
>Logged In: YES
>user_id=25678
>
>The current reader only parses CML1. The 1,S,2,D,3,T is part
>of CML2. (right?)

CML V1.0 implicitly defined the convention 1,S,2,D,3,T  but did not 
deliberately mandate it. The paper encouraged the use of the above 
convention but made it clear that other conventions were allowable (e.g. 
some systems have bond orders of 4, -5, etc.) All the CML writers that I 
have encountered other than CDK have adopted the 1,S,2,D,3,T convention. 
This means that all CML files except those from CDK are interoperable.

If CDK wishes to have its own convention for bond orders it is welcome to 
do so, but they should be labelled as such. So you could write
<string builtin="order" convention="CDK">1.5</string>
and this would be acceptable. However you would need a CMLReader that 
understood CDK bond order conventions to read this and you would have to 
convince the other software writers that this was useful.

The CML convention was designed to be extensible so that if you wished to 
have a bond order of (say) 2.5 you could write:
<string builtin="order" >A</string>
<...>
<string builtin="order" convention="CDK">2.5</string>

Most CMLReaders would understand the first. I don't know what CDK would do 
with it. Note that order is a <string> and not a <float>. This is deliberate.
To understand the second they would require to implement a CDK convention 
reader. I suspect most would default to "unknown bond order".
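
On the writer side, the labelling described above could look roughly like
this. It is a sketch under assumptions, not the CDK CML writer: the class
CmlBondOrderWriter and its method are invented here, and treating 1.5 as the
aromatic order "A" is an assumption about how the floats are meant. Any value
without a default code is emitted with an explicit convention="CDK" attribute
so that other readers can tell it is non-default.

```java
/** Hypothetical helper that turns a float bond order into a CML1 order string. */
final class CmlBondOrderWriter {

    /** Tolerance for comparing float bond orders (an arbitrary choice). */
    private static final double EPS = 1e-6;

    private static boolean near(double a, double b) {
        return Math.abs(a - b) < EPS;
    }

    /** Returns a <string builtin="order"> element for the given order. */
    static String toCmlOrder(double order) {
        String code = null;
        if (near(order, 1.0)) code = "1";
        else if (near(order, 2.0)) code = "2";
        else if (near(order, 3.0)) code = "3";
        else if (near(order, 1.5)) code = "A"; // assumption: 1.5 means aromatic
        if (code != null) {
            return "<string builtin=\"order\">" + code + "</string>";
        }
        // no default code exists: label the raw value with its convention
        return "<string builtin=\"order\" convention=\"CDK\">" + order + "</string>";
    }

    public static void main(String[] args) {
        System.out.println(toCmlOrder(2.0));
        System.out.println(toCmlOrder(2.5));
    }
}
```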

>  This will be fixed when a CML2 reader is
>written.

CML2 does not mandate a controlled vocabulary for bond orders.

It is important that CML Readers and Writers produce valid CML and it is 
important to work for interoperability. CMLWriters are easier to create 
than CMLReaders especially if they have to deal with multiple conventions.

P.

>----------------------------------------------------------------------
>
>You can respond by visiting:
>https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024
>
>

[Cdk-devel] [ cdk-Bugs-642365 ] MDLReader fails on certain files

From: <no...@so...> - 2002-11-22 20:54:43

Bugs item #642365, was opened at 2002-11-22 17:24
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024

Category: org.openscience.cdk.io
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: MDLReader fails on certain files

Initial Comment:
In reading the NCI diversity set (as SDF, converted to 
MOL), certain files return a null molecule from MDLReader. 
In org.openscience.cdk.applications.Viewer no molecule is 
displayed. The molecules all appear to contain Pt or Hg but 
I haven't found the cause. I enclose a typical MOL file - 
there are ca 10 others I can send.

---------------------------------8<-----------------
20410                                                                           
           8 49915 33D                        20410       
                                                                        
 50 53        1                 1 V2000
    0.0021   -0.0041    0.0020 Hg  0  0  0  0  0  0  0  0  0
   -0.0264    2.0957    0.0136 C   0  0  0  0  0  0  0  0  0
    0.0297   -2.0438   -0.0092 N   0  0  0  0  0  0  0  0  0
    1.4090    2.6253    0.0027 C   0  0  1  0  0  0  0  0  0
   -1.0583   -2.8742   -0.0033 C   0  0  0  0  0  0  0  0  0
    1.1256   -2.8340   -0.0298 C   0  0  0  0  0  0  0  0  0
    1.3882    4.1551    0.0111 C   0  0  0  0  0  0  0  0  0
    2.1001    2.1519    1.1604 O   0  0  0  0  0  0  0  0  0
   -0.5575   -4.1807   -0.0156 C   0  0  0  0  0  0  0  0  0
   -2.4506   -2.6291    0.0118 C   0  0  0  0  0  0  0  0  0
    0.7634   -4.0958   -0.0316 N   0  0  0  0  0  0  0  0  0
    2.7626    4.6622    0.0007 N   0  0  0  0  0  0  0  0  0
    1.3868    2.6433    2.2970 C   0  0  0  0  0  0  0  0  0
   -1.4205   -5.2676   -0.0134 N   0  0  0  0  0  0  0  0  0
   -2.9031   -1.4984    0.0222 O   0  0  0  0  0  0  0  0  0
   -3.2712   -3.7426    0.0134 N   0  0  0  0  0  0  0  0  0
    3.9098    3.9326   -0.0146 C   0  0  0  0  0  0  0  0  0
    3.2529    6.2616   -0.0022 S   0  0  0  0  0  0  0  0  0
   -0.9122   -6.6416   -0.0257 C   0  0  0  0  0  0  0  0  0
   -2.7650   -5.0077    0.0012 C   0  0  0  0  0  0  0  0  0
   -4.7243   -3.5567    0.0287 C   0  0  0  0  0  0  0  0  0
    3.8817    2.7174   -0.0211 O   0  0  0  0  0  0  0  0  0
    5.1848    4.6630   -0.0236 C   0  0  0  0  0  0  0  0  0
    2.8070    6.9314   -1.1734 O   0  0  0  0  0  0  0  0  0
    2.8258    6.9299    1.1769 O   0  0  0  0  0  0  0  0  0
    5.0125    6.0532   -0.0171 C   0  0  0  0  0  0  0  0  0
   -3.5395   -5.9451    0.0034 O   0  0  0  0  0  0  0  0  0
    6.5025    4.1741   -0.0389 C   0  0  0  0  0  0  0  0  0
    6.0727    6.9151   -0.0257 C   0  0  0  0  0  0  0  0  0
    7.5678    5.0456   -0.0446 C   0  0  0  0  0  0  0  0  0
    7.3666    6.4143   -0.0373 C   0  0  0  0  0  0  0  0  0
   -0.5536    2.4568   -0.8695 H   0  0  0  0  0  0  0  0  0
   -0.5366    2.4472    0.9104 H   0  0  0  0  0  0  0  0  0
    1.9192    2.2738   -0.8941 H   0  0  0  0  0  0  0  0  0
    2.1460   -2.4806   -0.0433 H   0  0  0  0  0  0  0  0  0
    0.8610    4.5162   -0.8719 H   0  0  0  0  0  0  0  0  0
    0.8780    4.5066    0.9079 H   0  0  0  0  0  0  0  0  0
    1.8795    2.3059    3.2089 H   0  0  0  0  0  0  0  0  0
    1.3723    3.7329    2.2736 H   0  0  0  0  0  0  0  0  0
    0.3645    2.2657    2.2754 H   0  0  0  0  0  0  0  0  0
    0.1776   -6.6257   -0.0359 H   0  0  0  0  0  0  0  0  0
   -1.2597   -7.1655    0.8647 H   0  0  0  0  0  0  0  0  0
   -1.2765   -7.1557   -0.9151 H   0  0  0  0  0  0  0  0  0
   -4.9543   -2.4913    0.0371 H   0  0  0  0  0  0  0  0  0
   -5.1586   -4.0153   -0.8597 H   0  0  0  0  0  0  0  0  0
   -5.1412   -4.0254    0.9202 H   0  0  0  0  0  0  0  0  0
    6.6801    3.1088   -0.0459 H   0  0  0  0  0  0  0  0  0
    5.9042    7.9819   -0.0238 H   0  0  0  0  0  0  0  0  0
    8.5748    4.6555   -0.0560 H   0  0  0  0  0  0  0  0  0
    8.2106    7.0881   -0.0445 H   0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  2  4  1  0  0  0  0
  2 32  1  0  0  0  0
  2 33  1  0  0  0  0
  3  5  1  0  0  0  0
  3  6  1  0  0  0  0
  4  8  1  1  0  0  0
  4  7  1  0  0  0  0
  4 34  1  6  0  0  0
  5  9  2  0  0  0  0
  5 10  1  0  0  0  0
  6 11  2  0  0  0  0
  6 35  1  0  0  0  0
  7 12  1  0  0  0  0
  7 36  1  0  0  0  0
  7 37  1  0  0  0  0
  8 13  1  0  0  0  0
  9 14  1  0  0  0  0
  9 11  1  0  0  0  0
 10 15  2  0  0  0  0
 10 16  1  0  0  0  0
 12 17  1  0  0  0  0
 12 18  1  0  0  0  0
 13 38  1  0  0  0  0
 13 39  1  0  0  0  0
 13 40  1  0  0  0  0
 14 19  1  0  0  0  0
 14 20  1  0  0  0  0
 16 21  1  0  0  0  0
 16 20  1  0  0  0  0
 17 22  2  0  0  0  0
 17 23  1  0  0  0  0
 18 24  2  0  0  0  0
 18 25  2  0  0  0  0
 18 26  1  0  0  0  0
 19 41  1  0  0  0  0
 19 42  1  0  0  0  0
 19 43  1  0  0  0  0
 20 27  2  0  0  0  0
 21 44  1  0  0  0  0
 21 45  1  0  0  0  0
 21 46  1  0  0  0  0
 23 28  2  0  0  0  0
 23 26  1  0  0  0  0
 26 29  2  0  0  0  0
 28 30  1  0  0  0  0
 28 47  1  0  0  0  0
 29 31  1  0  0  0  0
 29 48  1  0  0  0  0
 30 31  2  0  0  0  0
 30 49  1  0  0  0  0
 31 50  1  0  0  0  0
M  END
---------------------------------8<-----------------
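
To sort the failing files by element without going through MDLReader, a small
standalone scan of the atom block may help. The sketch below is not part of
CDK and does not diagnose the bug; the class MolfileElements is invented for
the example. It relies only on the fixed-column V2000 layout visible in the
file above: the atom count in the first three columns of the fourth line, and
the element symbol in columns 32-34 of each atom line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical diagnostic that lists the element symbols in a V2000 molfile. */
final class MolfileElements {

    /** Returns the element symbols of the atom block, in file order. */
    static List<String> symbols(List<String> lines) {
        // the fourth line is the counts line; its first three columns hold the atom count
        int atomCount = Integer.parseInt(lines.get(3).substring(0, 3).trim());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            String line = lines.get(4 + i);
            // columns 32-34 (1-based) hold the element symbol
            result.add(line.substring(31, Math.min(34, line.length())).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "20410",
            "",
            "",
            "  2  1        1                 1 V2000",
            "    0.0021   -0.0041    0.0020 Hg  0  0  0  0  0  0  0  0  0",
            "   -0.0264    2.0957    0.0136 C   0  0  0  0  0  0  0  0  0",
            "  1  2  1  0  0  0  0",
            "M  END");
        System.out.println(symbols(lines));
    }
}
```

Running this over the whole set would show whether every file that yields a
null molecule really contains Pt or Hg.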

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024

[Cdk-devel] [ cdk-Bugs-639455 ] CML bonder orders have wrong format

From: <no...@so...> - 2002-11-22 20:53:13

Bugs item #639455, was opened at 2002-11-16 22:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024

Category: org.openscience.cdk.io
Group: None
>Status: Pending
Resolution: Postponed
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: CML bonder orders have wrong format

Initial Comment:
The bond orders emitted as CML are floats (1.0, 1.5). This 
is inconsistent with the CML specification which uses 
strings 1,S,2,D,3,T and A. For anything else a convention 
attribute is required.



----------------------------------------------------------------------

Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 21:03

Message:
Logged In: YES 
user_id=25678

The current reader only parses CML1. The 1,S,2,D,3,T is part
of CML2. (right?) This will be fixed when a CML2 reader is
written.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024

[Cdk-devel] [ cdk-Bugs-642456 ] IChI reader does not map atoms

From: <no...@so...> - 2002-11-22 20:52:24

Bugs item #642456, was opened at 2002-11-22 19:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

Category: org.openscience.cdk.io
Group: None
>Status: Pending
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: IChI reader does not map atoms

Initial Comment:
The IChI reader (V1.3) does not appear to read the
<atom.orig-nbr>
element so that atoms in the final structure are garbled. This 
means that IChI does not output correct connection tables.



----------------------------------------------------------------------

Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 20:59

Message:
Logged In: YES 
user_id=25678

The documentation states that this information is auxiliary.
As such it should not be necessary for the identifier
itself. Description of the <basic> element describes how
atoms are numbered. The output by the IChI program even says
"Auxiliary info is not a part of the identifier, it is not
unique". In any case, v1.3 does indeed not read
<atom.orig-nbr>. Could you elaborate why this is a bug, i.e.
explain why the garbling is actually caused by the missing
<atom.orig-nbr>?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

[Cdk-devel] [ cdk-Bugs-642429 ] IChIReader fails with multiple molecules

From: <no...@so...> - 2002-11-22 20:51:40

Bugs item #642429, was opened at 2002-11-22 18:58
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

Category: org.openscience.cdk.io
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: IChIReader fails with multiple molecules

Initial Comment:
IChIReader fails on strings such as 
<identifier version="0.9Beta" tautomeric="0">
   <basic>N2*4SSCCN1N1Cu, 7-5 8-6 9-7 10-8 11-1-2-3-4-
9-10;C1*7NCC, 2-1 4-3 5-1 6-3 7-2 8-4 9-5-6 10-7-8-
9;NS1C, 3-1-2</basic>
   <charge>+2;;</charge>
   
  </identifier>
where ";" is used to separate discrete molecules.
It also fails with ArrayIndexOutOfBoundsException (20) on 
large molecules:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
        at org.openscience.cdk.AtomContainer.getAtomAt(AtomContainer.java:241)
        at org.openscience.cdk.io.ichi.IChIHandler.analyseBondsEncoding(IChIHandler.java:269)
        at org.openscience.cdk.io.ichi.IChIHandler.endElement(IChIHandler.java:124)
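
The ";" separation described in this report can be handled before any bond
parsing starts. The following is a minimal sketch, not the fix that went into
IChIReader: the class IChIComponents and its Component type are invented for
the example, and it assumes each entry has the form "formula, connections"
as in the sample above. Note the limit of -1 passed to split: without it,
Java drops trailing empty fields, so "+2;;" would give one charge entry
instead of three.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter for <basic> and <charge> content with several molecules. */
final class IChIComponents {

    /** One discrete molecule from a multi-molecule identifier. */
    static final class Component {
        final String formula;
        final String connections;
        final String charge;

        Component(String formula, String connections, String charge) {
            this.formula = formula;
            this.connections = connections;
            this.charge = charge;
        }
    }

    /** Splits the text of <basic> and <charge> into one Component per molecule. */
    static List<Component> split(String basic, String charge) {
        // limit -1 keeps trailing empty fields such as the two after "+2"
        String[] basics = basic.trim().split(";", -1);
        String[] charges = charge.trim().split(";", -1);
        List<Component> result = new ArrayList<>();
        for (int i = 0; i < basics.length; i++) {
            // the first comma separates the formula from the connection list
            String[] parts = basics[i].split(",", 2);
            String formula = parts[0].trim();
            String connections = parts.length > 1 ? parts[1].trim() : "";
            String componentCharge = i < charges.length ? charges[i].trim() : "";
            result.add(new Component(formula, connections, componentCharge));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Component c : split("C1*7NCC, 2-1 4-3;NS1C, 3-1-2", ";")) {
            System.out.println(c.formula + " | " + c.connections + " | " + c.charge);
        }
    }
}
```

Each Component can then be parsed into its own atom container, so atom
numbers from one molecule are never looked up in another.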

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

[Cdk-devel] [ cdk-Bugs-642429 ] IChIReader fails with multiple molecules

From: <no...@so...> - 2002-11-22 20:03:32

Bugs item #642429, was opened at 2002-11-22 18:58
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
>Assigned to: Egon Willighagen (egonw)
Summary: IChIReader fails with multiple molecules

Initial Comment:
IChIReader fails on strings such as 
<identifier version="0.9Beta" tautomeric="0">
   <basic>N2*4SSCCN1N1Cu, 7-5 8-6 9-7 10-8 11-1-2-3-4-
9-10;C1*7NCC, 2-1 4-3 5-1 6-3 7-2 8-4 9-5-6 10-7-8-
9;NS1C, 3-1-2</basic>
   <charge>+2;;</charge>
   
  </identifier>
where ";" is used to separate discrete molecules.
It also fails with ArrayIndexOutOfBoundsException (20) on 
large molecules:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
        at org.openscience.cdk.AtomContainer.getAtomAt(AtomContainer.java:241)
        at org.openscience.cdk.io.ichi.IChIHandler.analyseBondsEncoding(IChIHandler.java:269)
        at org.openscience.cdk.io.ichi.IChIHandler.endElement(IChIHandler.java:124)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

[Cdk-devel] [ cdk-Bugs-639455 ] CML bonder orders have wrong format

From: <no...@so...> - 2002-11-22 20:03:11

Bugs item #639455, was opened at 2002-11-16 22:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
>Resolution: Postponed
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Egon Willighagen (egonw)
Summary: CML bonder orders have wrong format

Initial Comment:
The bond orders emitted as CML are floats (1.0, 1.5). This 
is inconsistent with the CML specification which uses 
strings 1,S,2,D,3,T and A. For anything else a convention 
attribute is required.



----------------------------------------------------------------------

>Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 21:03

Message:
Logged In: YES 
user_id=25678

The current reader only parses CML1. The 1,S,2,D,3,T is part
of CML2. (right?) This will be fixed when a CML2 reader is
written.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=639455&group_id=20024

[Cdk-devel] [ cdk-Bugs-642456 ] IChI reader does not map atoms

From: <no...@so...> - 2002-11-22 19:59:35

Bugs item #642456, was opened at 2002-11-22 19:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

>Category: org.openscience.cdk.io
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
>Assigned to: Egon Willighagen (egonw)
Summary: IChI reader does not map atoms

Initial Comment:
The IChI reader (V1.3) does not appear to read the
<atom.orig-nbr>
element so that atoms in the final structure are garbled. This 
means that IChI does not output correct connection tables.



----------------------------------------------------------------------

>Comment By: Egon Willighagen (egonw)
Date: 2002-11-22 20:59

Message:
Logged In: YES 
user_id=25678

The documentation states that this information is auxiliary.
As such it should not be necessary for the identifier
itself. Description of the <basic> element describes how
atoms are numbered. The output by the IChI program even says
"Auxiliary info is not a part of the identifier, it is not
unique". In any case, v1.3 does indeed not read
<atom.orig-nbr>. Could you elaborate why this is a bug, i.e.
explain why the garbling is actually caused by the missing
<atom.orig-nbr>?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

[Cdk-devel] [ cdk-Bugs-642456 ] IChI reader does not map atoms

From: <no...@so...> - 2002-11-22 18:46:14

Bugs item #642456, was opened at 2002-11-22 18:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Nobody/Anonymous (nobody)
Summary: IChI reader does not map atoms

Initial Comment:
The IChI reader (V1.3) does not appear to read the
<atom.orig-nbr>
element so that atoms in the final structure are garbled. This 
means that IChI does not output correct connection tables.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642456&group_id=20024

[Cdk-devel] [ cdk-Bugs-642429 ] IChIReader fails with multiple molecules

From: <no...@so...> - 2002-11-22 17:58:31

Bugs item #642429, was opened at 2002-11-22 17:58
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Nobody/Anonymous (nobody)
Summary: IChIReader fails with multiple molecules

Initial Comment:
IChIReader fails on strings such as 
<identifier version="0.9Beta" tautomeric="0">
   <basic>N2*4SSCCN1N1Cu, 7-5 8-6 9-7 10-8 11-1-2-3-4-
9-10;C1*7NCC, 2-1 4-3 5-1 6-3 7-2 8-4 9-5-6 10-7-8-
9;NS1C, 3-1-2</basic>
   <charge>+2;;</charge>
   
  </identifier>
where ";" is used to separate discrete molecules.
It also fails with ArrayIndexOutOfBoundsException (20) on 
large molecules:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
        at org.openscience.cdk.AtomContainer.getAtomAt(AtomContainer.java:241)
        at org.openscience.cdk.io.ichi.IChIHandler.analyseBondsEncoding(IChIHandler.java:269)
        at org.openscience.cdk.io.ichi.IChIHandler.endElement(IChIHandler.java:124)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642429&group_id=20024

[Cdk-devel] [ cdk-Bugs-642426 ] MDLReader fails on certain files

From: <no...@so...> - 2002-11-22 17:55:33

Bugs item #642426, was opened at 2002-11-22 17:55
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642426&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Peter Murray-Rust (petermr)
Assigned to: Nobody/Anonymous (nobody)
Summary: MDLReader fails on certain files

Initial Comment:
In reading the NCI diversity set (as SDF, converted to 
MOL), certain files return a null molecule from MDLReader. 
In org.openscience.cdk.applications.Viewer no molecule is 
displayed. The molecules all appear to contain Pt or Hg but 
I haven't found the cause. I enclose a typical MOL file - 
there are ca 10 others I can send.

---------------------------------8<-----------------
20410                                                                           
           8 49915 33D                        20410       
                                                                        
 50 53        1                 1 V2000
    0.0021   -0.0041    0.0020 Hg  0  0  0  0  0  0  0  0  0
   -0.0264    2.0957    0.0136 C   0  0  0  0  0  0  0  0  0
    0.0297   -2.0438   -0.0092 N   0  0  0  0  0  0  0  0  0
    1.4090    2.6253    0.0027 C   0  0  1  0  0  0  0  0  0
   -1.0583   -2.8742   -0.0033 C   0  0  0  0  0  0  0  0  0
    1.1256   -2.8340   -0.0298 C   0  0  0  0  0  0  0  0  0
    1.3882    4.1551    0.0111 C   0  0  0  0  0  0  0  0  0
    2.1001    2.1519    1.1604 O   0  0  0  0  0  0  0  0  0
   -0.5575   -4.1807   -0.0156 C   0  0  0  0  0  0  0  0  0
   -2.4506   -2.6291    0.0118 C   0  0  0  0  0  0  0  0  0
    0.7634   -4.0958   -0.0316 N   0  0  0  0  0  0  0  0  0
    2.7626    4.6622    0.0007 N   0  0  0  0  0  0  0  0  0
    1.3868    2.6433    2.2970 C   0  0  0  0  0  0  0  0  0
   -1.4205   -5.2676   -0.0134 N   0  0  0  0  0  0  0  0  0
   -2.9031   -1.4984    0.0222 O   0  0  0  0  0  0  0  0  0
   -3.2712   -3.7426    0.0134 N   0  0  0  0  0  0  0  0  0
    3.9098    3.9326   -0.0146 C   0  0  0  0  0  0  0  0  0
    3.2529    6.2616   -0.0022 S   0  0  0  0  0  0  0  0  0
   -0.9122   -6.6416   -0.0257 C   0  0  0  0  0  0  0  0  0
   -2.7650   -5.0077    0.0012 C   0  0  0  0  0  0  0  0  0
   -4.7243   -3.5567    0.0287 C   0  0  0  0  0  0  0  0  0
    3.8817    2.7174   -0.0211 O   0  0  0  0  0  0  0  0  0
    5.1848    4.6630   -0.0236 C   0  0  0  0  0  0  0  0  0
    2.8070    6.9314   -1.1734 O   0  0  0  0  0  0  0  0  0
    2.8258    6.9299    1.1769 O   0  0  0  0  0  0  0  0  0
    5.0125    6.0532   -0.0171 C   0  0  0  0  0  0  0  0  0
   -3.5395   -5.9451    0.0034 O   0  0  0  0  0  0  0  0  0
    6.5025    4.1741   -0.0389 C   0  0  0  0  0  0  0  0  0
    6.0727    6.9151   -0.0257 C   0  0  0  0  0  0  0  0  0
    7.5678    5.0456   -0.0446 C   0  0  0  0  0  0  0  0  0
    7.3666    6.4143   -0.0373 C   0  0  0  0  0  0  0  0  0
   -0.5536    2.4568   -0.8695 H   0  0  0  0  0  0  0  0  0
   -0.5366    2.4472    0.9104 H   0  0  0  0  0  0  0  0  0
    1.9192    2.2738   -0.8941 H   0  0  0  0  0  0  0  0  0
    2.1460   -2.4806   -0.0433 H   0  0  0  0  0  0  0  0  0
    0.8610    4.5162   -0.8719 H   0  0  0  0  0  0  0  0  0
    0.8780    4.5066    0.9079 H   0  0  0  0  0  0  0  0  0
    1.8795    2.3059    3.2089 H   0  0  0  0  0  0  0  0  0
    1.3723    3.7329    2.2736 H   0  0  0  0  0  0  0  0  0
    0.3645    2.2657    2.2754 H   0  0  0  0  0  0  0  0  0
    0.1776   -6.6257   -0.0359 H   0  0  0  0  0  0  0  0  0
   -1.2597   -7.1655    0.8647 H   0  0  0  0  0  0  0  0  0
   -1.2765   -7.1557   -0.9151 H   0  0  0  0  0  0  0  0  0
   -4.9543   -2.4913    0.0371 H   0  0  0  0  0  0  0  0  0
   -5.1586   -4.0153   -0.8597 H   0  0  0  0  0  0  0  0  0
   -5.1412   -4.0254    0.9202 H   0  0  0  0  0  0  0  0  0
    6.6801    3.1088   -0.0459 H   0  0  0  0  0  0  0  0  0
    5.9042    7.9819   -0.0238 H   0  0  0  0  0  0  0  0  0
    8.5748    4.6555   -0.0560 H   0  0  0  0  0  0  0  0  0
    8.2106    7.0881   -0.0445 H   0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  2  4  1  0  0  0  0
  2 32  1  0  0  0  0
  2 33  1  0  0  0  0
  3  5  1  0  0  0  0
  3  6  1  0  0  0  0
  4  8  1  1  0  0  0
  4  7  1  0  0  0  0
  4 34  1  6  0  0  0
  5  9  2  0  0  0  0
  5 10  1  0  0  0  0
  6 11  2  0  0  0  0
  6 35  1  0  0  0  0
  7 12  1  0  0  0  0
  7 36  1  0  0  0  0
  7 37  1  0  0  0  0
  8 13  1  0  0  0  0
  9 14  1  0  0  0  0
  9 11  1  0  0  0  0
 10 15  2  0  0  0  0
 10 16  1  0  0  0  0
 12 17  1  0  0  0  0
 12 18  1  0  0  0  0
 13 38  1  0  0  0  0
 13 39  1  0  0  0  0
 13 40  1  0  0  0  0
 14 19  1  0  0  0  0
 14 20  1  0  0  0  0
 16 21  1  0  0  0  0
 16 20  1  0  0  0  0
 17 22  2  0  0  0  0
 17 23  1  0  0  0  0
 18 24  2  0  0  0  0
 18 25  2  0  0  0  0
 18 26  1  0  0  0  0
 19 41  1  0  0  0  0
 19 42  1  0  0  0  0
 19 43  1  0  0  0  0
 20 27  2  0  0  0  0
 21 44  1  0  0  0  0
 21 45  1  0  0  0  0
 21 46  1  0  0  0  0
 23 28  2  0  0  0  0
 23 26  1  0  0  0  0
 26 29  2  0  0  0  0
 28 30  1  0  0  0  0
 28 47  1  0  0  0  0
 29 31  1  0  0  0  0
 29 48  1  0  0  0  0
 30 31  2  0  0  0  0
 30 49  1  0  0  0  0
 31 50  1  0  0  0  0
M  END
---------------------------------8<-----------------
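
For reference while debugging, here is a minimal sketch of how the counts line and an atom line of a V2000 molfile can be read by fixed columns. The class MolfileColumns and its method names are made up for illustration and are not CDK's MDLReader; the column positions follow the published V2000 layout (counts in columns 1-3 and 4-6, coordinates in three fields of width 10, element symbol in columns 32-34). It only shows that a two-letter symbol such as Hg fits the same fixed field as C or N; it does not identify the cause of this bug.

```java
/** Illustrative fixed-column reader for two kinds of V2000 molfile lines. Not CDK code. */
final class MolfileColumns {

    /** Atom and bond counts from the counts line (columns 1-3 and 4-6). */
    static int[] parseCounts(String countsLine) {
        int atoms = Integer.parseInt(countsLine.substring(0, 3).trim());
        int bonds = Integer.parseInt(countsLine.substring(3, 6).trim());
        return new int[] {atoms, bonds};
    }

    /** Element symbol from an atom line (columns 32-34), e.g. "Hg" or "C". */
    static String parseSymbol(String atomLine) {
        return atomLine.substring(31, Math.min(34, atomLine.length())).trim();
    }

    /** x, y, z from an atom line: three consecutive fields of width 10. */
    static double[] parseCoordinates(String atomLine) {
        double[] xyz = new double[3];
        for (int i = 0; i < 3; i++) {
            xyz[i] = Double.parseDouble(atomLine.substring(10 * i, 10 * (i + 1)).trim());
        }
        return xyz;
    }

    public static void main(String[] args) {
        String counts = " 50 53        1                 1 V2000";
        String atom = "    0.0021   -0.0041    0.0020 Hg  0  0  0  0  0  0  0  0  0";
        int[] c = parseCounts(counts);
        double[] xyz = parseCoordinates(atom);
        System.out.println(c[0] + " atoms, " + c[1] + " bonds");
        System.out.println(parseSymbol(atom) + " at " + xyz[0] + ", " + xyz[1] + ", " + xyz[2]);
    }
}
```

A reader that splits atom lines on whitespace instead of columns would behave differently on files where fields run together, which is one reason to prefer the fixed-column approach.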

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642426&group_id=20024

[Cdk-devel] [ cdk-Bugs-642365 ] MDLReader fails on certain files

From: <no...@so...> - 2002-11-22 16:24:29

Bugs item #642365, was opened at 2002-11-22 08:24
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024

Category: org.openscience.cdk.io
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: MDLReader fails on certain files

Initial Comment:
In reading the NCI diversity set (as SDF, converted to 
MOL), certain files return a null molecule from MDLReader. 
In org.openscience.cdk.applications.Viewer no molecule is 
displayed. The molecules all appear to contain Pt or Hg but 
I haven't found the cause. I enclose a typical MOL file - 
there are ca 10 others I can send.

---------------------------------8<-----------------
20410                                                                           
           8 49915 33D                        20410       
                                                                        
 50 53        1                 1 V2000
    0.0021   -0.0041    0.0020 Hg  0  0  0  0  0  0  0  0  0
   -0.0264    2.0957    0.0136 C   0  0  0  0  0  0  0  0  0
    0.0297   -2.0438   -0.0092 N   0  0  0  0  0  0  0  0  0
    1.4090    2.6253    0.0027 C   0  0  1  0  0  0  0  0  0
   -1.0583   -2.8742   -0.0033 C   0  0  0  0  0  0  0  0  0
    1.1256   -2.8340   -0.0298 C   0  0  0  0  0  0  0  0  0
    1.3882    4.1551    0.0111 C   0  0  0  0  0  0  0  0  0
    2.1001    2.1519    1.1604 O   0  0  0  0  0  0  0  0  0
   -0.5575   -4.1807   -0.0156 C   0  0  0  0  0  0  0  0  0
   -2.4506   -2.6291    0.0118 C   0  0  0  0  0  0  0  0  0
    0.7634   -4.0958   -0.0316 N   0  0  0  0  0  0  0  0  0
    2.7626    4.6622    0.0007 N   0  0  0  0  0  0  0  0  0
    1.3868    2.6433    2.2970 C   0  0  0  0  0  0  0  0  0
   -1.4205   -5.2676   -0.0134 N   0  0  0  0  0  0  0  0  0
   -2.9031   -1.4984    0.0222 O   0  0  0  0  0  0  0  0  0
   -3.2712   -3.7426    0.0134 N   0  0  0  0  0  0  0  0  0
    3.9098    3.9326   -0.0146 C   0  0  0  0  0  0  0  0  0
    3.2529    6.2616   -0.0022 S   0  0  0  0  0  0  0  0  0
   -0.9122   -6.6416   -0.0257 C   0  0  0  0  0  0  0  0  0
   -2.7650   -5.0077    0.0012 C   0  0  0  0  0  0  0  0  0
   -4.7243   -3.5567    0.0287 C   0  0  0  0  0  0  0  0  0
    3.8817    2.7174   -0.0211 O   0  0  0  0  0  0  0  0  0
    5.1848    4.6630   -0.0236 C   0  0  0  0  0  0  0  0  0
    2.8070    6.9314   -1.1734 O   0  0  0  0  0  0  0  0  0
    2.8258    6.9299    1.1769 O   0  0  0  0  0  0  0  0  0
    5.0125    6.0532   -0.0171 C   0  0  0  0  0  0  0  0  0
   -3.5395   -5.9451    0.0034 O   0  0  0  0  0  0  0  0  0
    6.5025    4.1741   -0.0389 C   0  0  0  0  0  0  0  0  0
    6.0727    6.9151   -0.0257 C   0  0  0  0  0  0  0  0  0
    7.5678    5.0456   -0.0446 C   0  0  0  0  0  0  0  0  0
    7.3666    6.4143   -0.0373 C   0  0  0  0  0  0  0  0  0
   -0.5536    2.4568   -0.8695 H   0  0  0  0  0  0  0  0  0
   -0.5366    2.4472    0.9104 H   0  0  0  0  0  0  0  0  0
    1.9192    2.2738   -0.8941 H   0  0  0  0  0  0  0  0  0
    2.1460   -2.4806   -0.0433 H   0  0  0  0  0  0  0  0  0
    0.8610    4.5162   -0.8719 H   0  0  0  0  0  0  0  0  0
    0.8780    4.5066    0.9079 H   0  0  0  0  0  0  0  0  0
    1.8795    2.3059    3.2089 H   0  0  0  0  0  0  0  0  0
    1.3723    3.7329    2.2736 H   0  0  0  0  0  0  0  0  0
    0.3645    2.2657    2.2754 H   0  0  0  0  0  0  0  0  0
    0.1776   -6.6257   -0.0359 H   0  0  0  0  0  0  0  0  0
   -1.2597   -7.1655    0.8647 H   0  0  0  0  0  0  0  0  0
   -1.2765   -7.1557   -0.9151 H   0  0  0  0  0  0  0  0  0
   -4.9543   -2.4913    0.0371 H   0  0  0  0  0  0  0  0  0
   -5.1586   -4.0153   -0.8597 H   0  0  0  0  0  0  0  0  0
   -5.1412   -4.0254    0.9202 H   0  0  0  0  0  0  0  0  0
    6.6801    3.1088   -0.0459 H   0  0  0  0  0  0  0  0  0
    5.9042    7.9819   -0.0238 H   0  0  0  0  0  0  0  0  0
    8.5748    4.6555   -0.0560 H   0  0  0  0  0  0  0  0  0
    8.2106    7.0881   -0.0445 H   0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  2  4  1  0  0  0  0
  2 32  1  0  0  0  0
  2 33  1  0  0  0  0
  3  5  1  0  0  0  0
  3  6  1  0  0  0  0
  4  8  1  1  0  0  0
  4  7  1  0  0  0  0
  4 34  1  6  0  0  0
  5  9  2  0  0  0  0
  5 10  1  0  0  0  0
  6 11  2  0  0  0  0
  6 35  1  0  0  0  0
  7 12  1  0  0  0  0
  7 36  1  0  0  0  0
  7 37  1  0  0  0  0
  8 13  1  0  0  0  0
  9 14  1  0  0  0  0
  9 11  1  0  0  0  0
 10 15  2  0  0  0  0
 10 16  1  0  0  0  0
 12 17  1  0  0  0  0
 12 18  1  0  0  0  0
 13 38  1  0  0  0  0
 13 39  1  0  0  0  0
 13 40  1  0  0  0  0
 14 19  1  0  0  0  0
 14 20  1  0  0  0  0
 16 21  1  0  0  0  0
 16 20  1  0  0  0  0
 17 22  2  0  0  0  0
 17 23  1  0  0  0  0
 18 24  2  0  0  0  0
 18 25  2  0  0  0  0
 18 26  1  0  0  0  0
 19 41  1  0  0  0  0
 19 42  1  0  0  0  0
 19 43  1  0  0  0  0
 20 27  2  0  0  0  0
 21 44  1  0  0  0  0
 21 45  1  0  0  0  0
 21 46  1  0  0  0  0
 23 28  2  0  0  0  0
 23 26  1  0  0  0  0
 26 29  2  0  0  0  0
 28 30  1  0  0  0  0
 28 47  1  0  0  0  0
 29 31  1  0  0  0  0
 29 48  1  0  0  0  0
 30 31  2  0  0  0  0
 30 49  1  0  0  0  0
 31 50  1  0  0  0  0
M  END
---------------------------------8<-----------------

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=642365&group_id=20024

Re: [Cdk-devel] CDK and JUMBO (Meeting between Christoph and PMR)

From: <we...@in...> - 2002-11-22 10:30:24

Hello,

> I am a supporter of SAX - see last message to this list. And see
> http://www.megginson.com  But it is best suited for individual
> applications rather than libraries. SAX is designed to discard
> unnecessary elements and structures and I don't think a generic library
> can easily make those decisions for all users.

Is that link correct? (Or do I have net problems?)

> > If you plan to develop a SMARTS parser for this task i would recommend
> > techniques like JavaCC or other JavaCompilerCompiler tools. But here a
> > good computer scientist will be needed for coding BNF norm ...
> > or somebody with much time !;-)
>
>
> I don't intend to write a SMARTS parser. Instead we are developing a
> generic CML query language which encompasses the concepts in "most" of
> the current chemical systems and generic structure representation.

Until now I've often found SMARTS patterns in the literature for
coding substructure patterns. I think there should at least be an
additional SMARTS -> CML query converter. For sure there could be some
different opinions about standards, but many people use SMARTS ...
I think that's a similar problem to:
LaTeX equation <-> MathML

>
> Best
>
> P.

Regards, Joerg

>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by: Battle your brains against the best
> in the Thawte Crypto Challenge. Be the first to crack the code -
> register now: http://www.gothawte.com/rd521.html
> _______________________________________________
> Cdk-devel mailing list
> Cdk...@li...
> https://lists.sourceforge.net/lists/listinfo/cdk-devel
>


-- 
Dipl. Chem. Joerg K. Wegner
Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany
Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de

Re: [Cdk-devel] Pseudo atom concept

From: Peter Murray-R. <pm...@ca...> - 2002-11-21 20:57:24

The following uses of "PseudoAtoms" may be relevant:

1. a "sparkle" in a molecular orbitals calculation

2. a list of atoms (Hl = halogen). This could be because the experimental 
evidence wasn't clear

3. a dummy atom (usually a point in space, e.g. a centroid)

4. an abbreviation for a precise connection table fragment (Et = CH2CH3)

5. a locant in a Markush structure (R)

There may be others.

My guess is that you wish to use 4 and 5.  For 4 you need to think 
carefully about how the R group is defined - IMO it must be a table 
external to the program. You would need a language to define the fragment. 
Do you cover multivalent PseudoAtoms like (say) P for -OP(=O)(-O)O- ?
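
To make use 4 concrete, here is a minimal sketch of an external abbreviation table, assuming a made-up text format of "label = fragment" lines in which '*' marks an attachment point. AbbreviationTable, the file format and the '*' notation are all hypothetical and not part of CDK or RFC #8; the fragment strings are stored as opaque text, so the sketch says nothing about which fragment language should be used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical external table of pseudo-atom abbreviations; not part of CDK. */
final class AbbreviationTable {
    private final Map<String, String> fragments = new LinkedHashMap<>();

    /** Reads lines of the form "label = fragment"; blank lines and '#' comments are skipped. */
    static AbbreviationTable parse(String text) {
        AbbreviationTable table = new AbbreviationTable();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('='); // first '=' separates label from fragment
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=': " + line);
            }
            table.fragments.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return table;
    }

    /** Fragment definition for a label, or null if the label is not in the table. */
    String expand(String label) {
        return fragments.get(label);
    }

    /** Number of attachment points, i.e. the number of '*' marks in the fragment. */
    int attachmentPoints(String label) {
        String fragment = fragments.get(label);
        if (fragment == null) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return (int) fragment.chars().filter(ch -> ch == '*').count();
    }

    public static void main(String[] args) {
        AbbreviationTable table = parse("# label = fragment\nEt = *CC\nP = *OP(=O)(O)O*\n");
        System.out.println("Et -> " + table.expand("Et") + " (" + table.attachmentPoints("Et") + ")");
        System.out.println("P  -> " + table.expand("P") + " (" + table.attachmentPoints("P") + ")");
    }
}
```

Counting the attachment marks is one way to answer the multivalency question per entry: Et has one attachment point, the phosphate example has two.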

5 needs careful thinking. How will it be used in CDK? [It will form part of 
CML Query]

P.

[Cdk-devel] Pseudo atom concept

From: Christoph S. <ste...@ic...> - 2002-11-21 13:44:07

Hi there,

an interesting question regarding our current RFC #8 (pseudo atoms), was 
brought up by L. Shymal:

---start of quote---
Was just looking at RFC 8 on the CDK site. Does your definition of
PseudoAtom mean that it is bonded to other entities via only one bond or 
can they connect via multiple bonds(can valency violations be detected 
?) and can they be realized into a real fragment and collapsed back into 
PseudoAtom form ?
Can one have overlapping PseudoAtoms in a molecule ?

The concept is definitely useful, my kludge deals with this using just a
collection of atoms. I was wondering if i could do anything useful with 
this concept.
---end of quote---

These are certainly valuable and valid questions which we need to discuss.

From my re-reading of Egon's proposal in RFC #8 I would say that pseudo
atoms are a simpler construct than the concept that L. Shymal implies in
his questions.

Fragments like those addressed in the questions are of high interest in
my own area of research dealing with deterministic and stochastic 
structure generators. There, you can frequently detect a number of 
fragments from, say, spectroscopic sources, which may of course overlap.
These fragments are then used in order to build chemically valid 
structures by putting them together in a combinatorial way.

So, I would say that this aspect should be dealt with by using the 
regular fragment class and by having specialized classes handle the 
complicated issues of detecting overlap and synthesizing molecules based 
on this knowledge.
This also complies with our policy to keep the core classes as simple as 
possible and have factories or engines, or what ever you'd like to call 
them, for the higher functionalities.
Of course, there could and should be a mechanism of converting fragments 
into pseudo atoms and vice versa.

Any further comments?

Cheers,

Chris
-- 
Dr. Christoph Steinbeck (http://www.ice.mpg.de/departments/ChemInf)
MPI of Chemical Ecology, Winzerlaer Str. 10, Beutenberg Campus, 07745 
Jena, Germany
Tel: +49(0)3641 571263 - Fax: +49(0)3641 571202

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..

Re: [Cdk-devel] CDK and JUMBO (Meeting between Christoph and PMR)

From: Peter Murray-R. <pm...@ca...> - 2002-11-20 22:44:11

At 14:11 19/11/2002 +0100, E.L. Willighagen wrote:
>Hi Peter,

Hi - I have snipped some of this to keep it shorter

> > My suggestions for library development are:
> > - keep it as small as is necessary. If a library is too large it loses
> > coherence, and becomes unnavigable. I have found this in parts of CDK
> > lib.  I discussed this with Christoph and he agreed that there were modules
> > which weren't mainstream to the purpose of CDK. The library would benefit
> > from being smaller.
>
>Agreed. The CDK project is solving this by defining modules. At this moment
>only a few modules are defined. Most importantly, the core module which
>consists of the CDK storage classes. By use of modularization, the library
>benefits from consisting of small and easier to grasp modules.

Good. I would find it useful to have a core  module (which had been used by 
people other than the author OR in a test harness) and a development area. 
Otherwise people do not know what they can use in their applications 
safely. A library has to be something which is integrated and then 
forgotten - if you keep worrying about whether the library does what you 
want it inhibits program design.

>Modules that I planned to add are:
>- file IO
>- structure generation
>- rendering
>etc

><snip/>

> > - do not introduce anything until there is a test harness and it has been
> > tested. The natural assumption of a library is that it is fit for use. This
> > is probably more important than for an application, since the application
> > author is used to using libraries and does not expect to have to debug
> > them. Of course there are and will be bugs, but a test harness will be an
> > enormous help.
>
>I tend to disagree with this... i.e. considering this:
>
>1. It has not happened yet, but I've been working on a stable release of the
>core module. The biggest issue at this moment is the lack of documentation.
>See the archive for more info on this.
>2. I like to distinguish between the stable release and the development
>release. The latter is unstable: things like API changes can and will happen.
>The development release is kind of a prototype release.

I would agree on this distinction. My point is that the core is clearly 
identified as such and that it is not only stable but relatively fit for 
purpose. Otherwise a potential user (my main interest in CDK at the moment) 
will not find it worthwhile to spend the time to find what works and what 
doesn't

>Like many open source software projects with developer communities with
>different backgrounds, prototyping and proof-of-concept implementations speed
>up the development of the project. I would not feel much for losing this
>great feature of such development.

I agree with this. However there is a difference between applications and 
toolkits/libraries. Libraries have to have releases which are more stable IMO.

>As such, I welcome unharnessed/new/exciting new implementations in the
>development release. However, and that's what important in your comment, the
>stable release is different. In that release only proven concepts should be
>used, that are well tested and have proper documentation.
>
> > - try to avoid "partial implementations". These can also be very
> > frustrating for the user since they often don't show up until the library
> > has been well integrated into the project. Typical examples are fileReaders
> > which include "just the bits that the original author needed" (I have
> > mentioned examples) and important items missing from the data structure.
> > For example CDK seems to have very little support for charged molecules
> > (see below)
>
>More or less the same as above. But with the note that I agree that a full
>implementation is favored. But, as you know, the chemical software
>development community is not that large, and for those who do development, it
>is often not even their core business...

Fundamentally we agree. We all get tired at 0200 when we know we have to 
add one more error trapping routine, etc. But the user doesn't know what we 
have left out.

I spent many frustrating nights with the early Swing versions. I couldn't 
get some of the routines (e.g. text editing) to work. I then talked to a 
SUN developer who said that several of the routines were basically no-ops
and the (licensed) developers were expected to implement them. But it
destroyed my faith in Swing and I have only now felt that I can use it with
any reliability. Same for Java3D, Java2D, etc. We don't want CDK (or any
other library) to get a reputation for bugginess.

>Thus, if a partial implementation already solves an often encountered problem,
>this is more than interesting to add... As such, the MDLReader only has
>limited features yet. Up till now, mostly used from MDL files were atoms, bonds
>and coordinates...

The problem is that it becomes non-standard wrt MDL's description. We 
already see hundreds of partial MDL readers and PDB readers. Many are 
seriously broken. I don't think a library should produce partial 
implementations which is why I have worked hard on these. (There are parts 
of MDL which cannot be interpreted outside MDL software but apart from 
these I think they should be honoured). It is because CDK does not read the 
2D/3D flag (which is part of MDL's spec) that it can garble 2D and 3D 
coordinates badly and the CML it produces is unusable.
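
As a minimal sketch of what honouring that flag could look like: the second header line of a molfile carries a two-character dimensional code in columns 21-22, after the user initials (2), program name (8) and date/time (10). The class MolfileDimension is a made-up name and the header string in main() is synthetic; this is not the CDK reader.

```java
/** Reads the dimensional code from line 2 of a molfile header. Illustrative only. */
final class MolfileDimension {

    /** Returns 2 or 3 when columns 21-22 hold "2D" or "3D", otherwise 0 (unknown). */
    static int dimension(String headerLine2) {
        if (headerLine2.length() < 22) {
            return 0;
        }
        String code = headerLine2.substring(20, 22);
        if (code.equals("2D")) {
            return 2;
        }
        if (code.equals("3D")) {
            return 3;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Synthetic header: 2 blanks, 8-char program name, MMDDYYHHmm, then the code.
        String header = "  PROGNAME11220217553D";
        System.out.println("dimension = " + dimension(header));
    }
}
```

A reader could use the result to decide whether coordinates go into the 2D or the 3D fields of its atom objects, and fall back to inspecting the z values when the code is missing.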

>Moreover, if a user needs additional features he can request them (they are
>often not difficult to add), or implement them themselves... And this is also
>a common and proven mechanism in open source development.

Yes, but it depends on being confident that there is a willing and 
responsive development community which can respond in reasonable time. In 
Linux there were hundreds of developers - CDK has 10, many other projects 
have fewer.

>Missing features, or a list of supported features, should, however, always be
>mentioned in the documentation. And this is currently, unfortunately, not
>really the case. But that is a different problem, IMO.

It is related. Partial implementations should always be documented and in 
some cases that is acceptable.

> > - make non-exposed classes "private" to avoid them appearing in the Javadoc
> > where they confuse the user

<snip/>

> > - make sure that all code is used by someone other than the author. This
> > isn't easy in a small community, but I suggest it for new routines. Or put
> > routines "on probation" until someone else uses them and adds some
> > comments. Like Amazon book reviews.
>
>This is a very interesting comment. Any idea on how to practically do this? I
>do not feel much for manually checking such things... there might be programs
>to do such statistics...

I imagine the sourceforge tools could be used for this - specific modules 
could be assigned to particular developers who could add comments. But I 
still argue that there should be at least one call to each module.

> > I think it would be useful to choose those areas which CDK does well and
> > which are used by someone other than the author. My own selection would be:
> > - maths, geometry
> >   (This is independent of the molecular data structure)
> > - molecular representation and data structure, molecular perception
> > (aromaticity...),
> > - topological analysis, fingerprints, graphs, substructure searching
> > - layout
> > - rendering, interfaces with graphical systems, events
> > (This is dependent on the molecular data structure)
> >
> > At present I don't see how the data structure supports:
> > - formal charge (although it deals with partial charges - which are fuzzy
> > because their origin is not defined). This is a serious limitation - we
> > can't read the NCI data set into CDK as it has charges.
>
>See CDK RFC #6:
>http://cdk.sourceforge.net/rfc6.html

Thanks. Many readers absolutely rely on formal charges for atoms and some
(e.g. IChI) also have formal charges on molecules. IMO this is more
important than partial charge which is difficult to define without a dictionary.
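
For the MDL case, the formal charge sits in the atom block as a code in columns 37-39 (0 = uncharged, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3). A minimal sketch of the decoding follows; MolfileCharge is a hypothetical helper, not CDK code, and the atom line in main() has been edited by hand to carry a charge code. Files may also give charges in "M  CHG" property lines, which this sketch ignores.

```java
/** Decodes the V2000 atom-block charge code. Illustrative helper, not CDK code. */
final class MolfileCharge {

    /** Charge code from columns 37-39 of an atom line. */
    static int chargeCode(String atomLine) {
        return Integer.parseInt(atomLine.substring(36, 39).trim());
    }

    /** Maps a charge code (0-7) to a formal charge; code 4 is a doublet radical with charge 0. */
    static int formalCharge(int chargeCode) {
        switch (chargeCode) {
            case 0: return 0;
            case 1: return 3;
            case 2: return 2;
            case 3: return 1;
            case 4: return 0; // doublet radical, no net charge
            case 5: return -1;
            case 6: return -2;
            case 7: return -3;
            default:
                throw new IllegalArgumentException("Invalid charge code: " + chargeCode);
        }
    }

    public static void main(String[] args) {
        // Hand-edited example line: charge code 5 in columns 37-39.
        String atom = "    0.0297   -2.0438   -0.0092 N   0  5  0  0  0  0  0  0  0";
        System.out.println("formal charge = " + formalCharge(chargeCode(atom)));
    }
}
```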

> > It is agreed that it doesn't yet support
> > - tautomerism
> > - stereochemistry
>
> > - extract all the information and offer it through a reader-specific
> > interface. This is much more work but is the formally correct way of
> > solving the problem. Thus a PDBReader should incorporate the BIB, CRYST
> > SEQR, HET, CONECT etc as well as just the ATOM records.
>
>Yes, but this might also be something the user does not want... If only
>interested in the coordinates, I do not want to read the other 5Mb of data on
>that molecule... (in the extreme case ;)

A filter mechanism has to be developed. If CDK declares that it is 
definitely not aimed at crystals, sequences, molecular formulae, connection 
tables and that only atom coordinates are important I would accept that 
this is consistent. But it isn't consistent to read crystal parameters from 
one file and not from another.
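
One possible shape for such a filter, sketched for PDB-style input where the record name occupies the first six columns: the caller names the record types it wants and everything else is skipped. RecordFilter and its methods are hypothetical names, not an existing CDK or JUMBO API, and the sample records in main() are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Keeps only the record types the caller asked for. Illustrative sketch. */
final class RecordFilter {
    private final Set<String> wanted;

    RecordFilter(String... recordNames) {
        this.wanted = new HashSet<>(Arrays.asList(recordNames));
    }

    /** Record name: the first six columns of the line, trimmed. */
    static String recordName(String line) {
        return line.substring(0, Math.min(6, line.length())).trim();
    }

    /** Returns the lines whose record name was requested, in input order. */
    List<String> select(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (wanted.contains(recordName(line))) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "HEADER    EXAMPLE",
                "CRYST1   10.000   10.000   10.000  90.00  90.00  90.00 P 1",
                "ATOM      1  C   LIG A   1       0.000   0.000   0.000",
                "CONECT    1    2");
        RecordFilter coordinatesOnly = new RecordFilter("ATOM", "HETATM");
        System.out.println(coordinatesOnly.select(lines));
    }
}
```

The same filter object could be handed to every reader, so that the decision about crystal parameters, for example, is made once and applied to all formats alike.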

>I've been thinking about a similar problem for Jmol (*)... in Jaguar output
>files, the first frame contains the structure that was taken as input by the
>Jaguar program... in some cases the user wants to read that, in other cases he
>does not (e.g. when the other frames define an animation)...
>
>Therefore, I want to add a customization layer to the file IO, where the user
>can define which information they want to have read, and which not...

The most general architecture is the DOM. Another is XSLT. SAX is fine IF 
you know exactly what data structure you are supporting, or you lose 
structure and information.

> > The problem with each program or toolkit writing its own file readers is
> > that it multiplies the pieces of code that have to be maintained. The point

<snip/>

>However, I do not feel much for dropping SAX support whatsoever....
>at least one person, Joerg (of JOELib), needs a SAX based interface to CML
>files... he has very, very large files which cannot be read with DOM...

I am a supporter of SAX - after all I got it started on XML-DEV! And my 
CMLDOM is based on SAX (at present). But SAX is designed to make it easy to 
discard information and we have to consider very carefully what information 
we want to keep in CDK.

>Egon
>
>(*) The proper casing of chars for Jmol is: upper case J, lower case m-o-l.

Noted

P.


Re: [Cdk-devel] CDK and JUMBO (Meeting between Christoph and PMR)

From: Peter Murray-R. <pm...@ca...> - 2002-11-20 21:51:30

At 14:52 19/11/2002 +0100, Jörg K. Wegner wrote:
>Hi all,
>
>>
>>However, I do not feel much for dropping SAX support whatsoever....
>>at least one person, Joerg (of JOELib), needs a SAX based interface to CML
>>files... he has very, very large files which cannot be read with DOM...
>
>I think some other persons with BIG files (every company in our SOL
>project and the Gasteiger group) will need SAX, too ... if they plan to
>use CML ...
>also it's necessary to access molecules one after another and not the
>complete stream at once. We talk about files with 200.000 up to 2 millions
>molecules.

I am a supporter of SAX - see last message to this list. And see
http://www.megginson.com  But it is best suited for individual applications
rather than libraries. SAX is designed to discard unnecessary elements and
structures and I don't think a generic library can easily make those
decisions for all users.

>> >>>- the atomType seems fragile. I have tried to use
>> >>>SaturationChecker.saturateWithHydrogen() and this throws a number of
>> >>>nullPEs which appear to be because the AtomTypeFactory doesn't return
>
>That's my opinion, too.
>
>If you plan to develop a SMARTS parser for this task i would recommend
>techniques like JavaCC or other JavaCompilerCompiler tools. But here a
>good computer scientist will be needed for coding BNF norm ...
>or somebody with much time !;-)

I don't intend to write a SMARTS parser. Instead we are developing a
generic CML query language which encompasses the concepts in "most" of the
current chemical systems and generic structure representation.

Best

P.

[Cdk-devel] [ cdk-Bugs-640750 ] Inefficient ring search?

From: <no...@so...> - 2002-11-19 15:46:29

Bugs item #640750, was opened at 2002-11-19 16:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640750&group_id=20024

Category: org.openscience.cdk.ringsearch
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Christoph Steinbeck (steinbeck)
Assigned to: Christoph Steinbeck (steinbeck)
Summary: Inefficient ring search?

Initial Comment:
Luo Cao wrote:
As to the SSSR algorithm (function findSSSR() in
SSSRFinder.java), I think it can be more efficient. In
the function, when one ring is found, the bonds are broken.
After that, I think you should compute the minimum
number of rings again. If a ring still exists, then continue; if
not, break.
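
The "minimum number of rings" referred to here is, for a molecular graph, bonds - atoms + connected components. A minimal sketch of that count follows, assuming atoms are numbered from 0 and bonds are given as index pairs; RingCount is a hypothetical helper and not SSSRFinder code. A ring search could recompute it after removing bonds and stop once it reaches zero.

```java
/** Counts the rings in a smallest set of smallest rings via E - V + C. Illustrative sketch. */
final class RingCount {

    /** bonds.length - atomCount + number of connected components. */
    static int ringCount(int atomCount, int[][] bonds) {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        int components = atomCount;
        for (int[] bond : bonds) {
            int a = find(parent, bond[0]);
            int b = find(parent, bond[1]);
            if (a != b) {
                parent[a] = b; // merge two components
                components--;
            }
        }
        return bonds.length - atomCount + components;
    }

    /** Union-find root lookup with path halving. */
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    public static void main(String[] args) {
        int[][] sixRing = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}};
        int[][] chain = {{0, 1}, {1, 2}, {2, 3}};
        System.out.println("six-membered ring: " + ringCount(6, sixRing));
        System.out.println("open chain: " + ringCount(4, chain));
    }
}
```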


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640750&group_id=20024

[Cdk-devel] [ cdk-Bugs-640748 ] SDG: Inconsistencies in placeRing

From: <no...@so...> - 2002-11-19 15:44:00

Bugs item #640748, was opened at 2002-11-19 16:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640748&group_id=20024

Category: org.openscience.cdk.layout
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Christoph Steinbeck (steinbeck)
Assigned to: Christoph Steinbeck (steinbeck)
Summary: SDG: Inconsistencies in placeRing

Initial Comment:
Luo Cao wrote:
When I read the CDK source, I found something I could
not understand. It is located in
StructureDiagramGenerator.java, in the function
layoutRingSet().
The code is:
    ...
    int thisRing;
    Ring ring = rs.getMostComplexRing();
    sharedAtoms = placeFirstBond(ring.getBondAt(0), firstBondVector);
    ...

In the function placeFirstBond(Bond bond, Vector2d bondVector):
    ...
    sharedAtoms = new AtomContainer();
    sharedAtoms.addBond(bond);
    sharedAtoms.addAtom(bond.getAtomAt(0));
    sharedAtoms.addAtom(bond.getAtomAt(1));
    ...
The function placeFirstBond() then returns sharedAtoms,
so the variable sharedAtoms in layoutRingSet() holds
just 2 atoms and 1 bond.
Just 4 lines below in layoutRingSet(), the function
placeRing() uses the variable sharedAtoms.
Some of the code of placeRing() is:

    int sharedAtomCount = sharedAtoms.getAtomCount();
    if (sharedAtomCount > 2)
    {
        placeBridgedRing(ring, sharedAtoms, sharedAtomsCenter,
                         ringCenterVector, bondLength);
    }
    else if (sharedAtomCount == 2)
    {
        placeFusedRing(ring, sharedAtoms, sharedAtomsCenter,
                       ringCenterVector, bondLength);
    }
    else if (sharedAtomCount == 1)
    {
        placeSpiroRing(ring, sharedAtoms, sharedAtomsCenter,
                       ringCenterVector, bondLength);
    }

But sharedAtomCount will always be 2, and
placeFusedRing() will always be executed. I think this
may be an error. The sharedAtoms cannot be obtained from
the function placeFirstBond().
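
One way all three branches could become reachable is to derive the shared atoms from the ring being placed and the atoms that are already laid out. A minimal sketch under that assumption follows, using plain integer atom ids; RingAttachment, sharedAtoms() and classify() are hypothetical names and this is not the StructureDiagramGenerator code, nor a claim about how it was meant to work.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Classifies how a ring attaches to the part of the molecule that is already placed. */
final class RingAttachment {

    enum Kind { UNPLACED, SPIRO, FUSED, BRIDGED }

    /** Atoms that occur both in the ring and among the already placed atoms. */
    static Set<Integer> sharedAtoms(Set<Integer> ringAtoms, Set<Integer> placedAtoms) {
        Set<Integer> shared = new HashSet<>(ringAtoms);
        shared.retainAll(placedAtoms);
        return shared;
    }

    /** Same thresholds as the branch in placeRing(): more than 2, exactly 2, exactly 1. */
    static Kind classify(int sharedAtomCount) {
        if (sharedAtomCount > 2) {
            return Kind.BRIDGED;
        }
        if (sharedAtomCount == 2) {
            return Kind.FUSED;
        }
        if (sharedAtomCount == 1) {
            return Kind.SPIRO;
        }
        return Kind.UNPLACED;
    }

    public static void main(String[] args) {
        Set<Integer> placed = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        Set<Integer> fusedRing = new HashSet<>(Arrays.asList(5, 6, 7, 8, 9, 10));
        Set<Integer> spiroRing = new HashSet<>(Arrays.asList(1, 11, 12, 13, 14));
        System.out.println(classify(sharedAtoms(fusedRing, placed).size()));
        System.out.println(classify(sharedAtoms(spiroRing, placed).size()));
    }
}
```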


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=120024&aid=640748&group_id=20024

Re: [Cdk-devel] CDK and JUMBO (Meeting between Christoph and PMR)

From: <we...@in...> - 2002-11-19 13:51:14

Hi all,

> >I am
> >currently working on CMLDOM for CML2 - about to be released in the next few
> >days. I would rather that we developed interfaces/adapters to CMLDOM2
> >rather than multiple implementations.
>
>
> Yes, that has been on my wishlist for quite some time now... Once CMLDOM2 is
> out, I'll start to work on a convertor for CDK <-> CMLDOM2...
>
> However, I do not feel much for dropping SAX support whatsoever....
> at least one person, Joerg (of JOELib), needs a SAX based interface to CML
> files... he has very, very large files which cannot be read with DOM...

I think some other people with BIG files (every company in our SOL
project and the Gasteiger group) will need SAX, too ... if they plan to
use CML ...
Also, it's necessary to access molecules one after another and not the
complete stream at once. We are talking about files with 200,000 up to 2
million molecules.
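
To illustrate the one-molecule-at-a-time access, here is a minimal sketch using the SAX parser that ships with Java. It assumes CML-like input with non-nested molecule elements that contain atom elements; MoleculeStream and MoleculeListener are hypothetical names and not an existing CDK or JOELib interface. Only a counter is kept per molecule, so memory use does not grow with the number of molecules in the file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Streams molecules out of a CML-like document, one callback per molecule. */
final class MoleculeStream {

    /** Called once for every closing molecule tag. */
    interface MoleculeListener {
        void molecule(String id, int atomCount);
    }

    static void stream(InputSource source, MoleculeListener listener) {
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;
            private int atoms;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("molecule")) {
                    currentId = attrs.getValue("id");
                    atoms = 0; // start counting for the new molecule
                } else if (qName.equals("atom")) {
                    atoms++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("molecule")) {
                    listener.molecule(currentId, atoms); // hand over, then forget
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cml>"
                + "<molecule id=\"m1\"><atomArray><atom id=\"a1\"/><atom id=\"a2\"/></atomArray></molecule>"
                + "<molecule id=\"m2\"><atomArray><atom id=\"a1\"/></atomArray></molecule>"
                + "</cml>";
        stream(new InputSource(new StringReader(xml)),
                (id, n) -> System.out.println(id + ": " + n + " atoms"));
    }
}
```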

> >>>- the atomType seems fragile. I have tried to use
> >>>SaturationChecker.saturateWithHydrogen() and this throws a number of
> >>>nullPEs which appear to be because the AtomTypeFactory doesn't return

That's my opinion, too.

If you plan to develop a SMARTS parser for this task I would recommend
techniques like JavaCC or other JavaCompilerCompiler tools. But here a
good computer scientist will be needed for coding the BNF grammar ...
or somebody with much time !;-)

> Taken as such ;)
>
> There is so much to do on the CDK library... We basically lack the man hours
> to do many important things like unit testing and documentation...

I would not overestimate unit testing. It's pretty good for extensive
refactorings of algorithms and data structures and could be a blessing
for cheminformatics algorithms if there were publicly available
datasets for testing protonation models, SMARTS, SMILES, file types
etc. ... but no standard test sets are available (let
me know if you know of some ...).

>
> Egon

Regards, Joerg

>
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by: To learn the basics of securing
> your web site with SSL, click here to get a FREE TRIAL of a Thawte
> Server Certificate: http://www.gothawte.com/rd524.html
> _______________________________________________
> Cdk-devel mailing list
> Cdk...@li...
> https://lists.sourceforge.net/lists/listinfo/cdk-devel
>


-- 
Dipl. Chem. Joerg K. Wegner
Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany
Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de

Re: [Cdk-devel] CDK and JUMBO (Meeting between Christoph and PMR)

From: E.L. W. <eg...@sc...> - 2002-11-19 13:12:02

Hi Peter,

here is my more elaborate response to your valuable email.

Peter Murray-Rust wrote:
> This is a critical point and we need to explore it as a community. Creating 
> toolkits is very hard work and in many cases very boring. It is rather like 
> building a gearbox. Applications usually do something interesting and 
> provide more motivation. Many of the toolkits have really emerged from 
> applications that demanded them.
> 
> There is a slightly cyclic problem here. Without good toolkits application
> developers will hack their own and we shall see incomplete and incompatible
> approaches. JUMBOLib has emerged from the need to read complex XML 
> documents and has a library of similar fragmentation and quality to CDK.
> Some good bits, some partially finished, partially tested etc.
> 
> Please treat the following sympathetically. It comes from ca 15 years of
> writing chemical software libraries. My own code does not measure well
> against these either! 
> My suggestions for library development are:
> - keep it as small as is necessary. If a library is too large it loses
> coherence, and becomes unnavigable. I have found this in parts of CDK
> lib.  I discussed this with Christoph and he agreed that there were modules
> which weren't mainstream to the purpose of CDK. The library would benefit
> from being smaller.

Agreed. The CDK project is solving this by defining modules. At this moment
only a few modules are defined, most importantly the core module, which
consists of the CDK storage classes. Through modularization, the library
benefits from consisting of small, easier-to-grasp modules.

Modules that I planned to add are:
- file IO
- structure generation
- rendering
etc

> - keep the parts simple. We discussed the ChemSequence/ChemModel approach
> for example. It is complex and confused me considerably, especially since
> it wasn't documented. If it *is* being actively used then give some
> examples. If not, then consider removing it until it is needed. And give
> some convenience methods like ChemFile.getMolecule()

I agree with that too. It often confuses me as well... As suggested, documentation
is the problem here...
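
To make the suggested convenience method concrete, here is a minimal sketch.
The classes below are simplified, hypothetical stand-ins for the
file/sequence/model nesting discussed above and are not the real CDK classes;
in particular getMolecule() and getMolecules() only show what such shortcuts
could look like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the container hierarchy;
// these are not the real CDK classes.
class SketchMolecule {
    final String name;
    SketchMolecule(String name) { this.name = name; }
}

class SketchChemModel {
    final List<SketchMolecule> molecules = new ArrayList<>();
}

class SketchChemSequence {
    final List<SketchChemModel> models = new ArrayList<>();
}

class SketchChemFile {
    final List<SketchChemSequence> sequences = new ArrayList<>();

    // Convenience method: flattens file -> sequence -> model -> molecule
    // so that simple callers do not have to walk the hierarchy.
    List<SketchMolecule> getMolecules() {
        List<SketchMolecule> all = new ArrayList<>();
        for (SketchChemSequence sequence : sequences) {
            for (SketchChemModel model : sequence.models) {
                all.addAll(model.molecules);
            }
        }
        return all;
    }

    // Convenience method: the first molecule in the file, or null if
    // the file contains none.
    SketchMolecule getMolecule() {
        List<SketchMolecule> all = getMolecules();
        return all.isEmpty() ? null : all.get(0);
    }
}
```

Callers that only read single-molecule files can then ignore sequences and
models entirely, while the full hierarchy remains available for animations
or multi-frame output.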

> - do not introduce anything until there is a test harness and it has been
> tested. The natural assumption of a library is that it is fit for use. This
> is probably more important than for an application, since the application
> author is used to using libraries and does not expect to have to debug
> them. Of course there are and will be bugs, but a test harness will be an
> enormous help.

I tend to disagree with this, considering the following:

1. It has not happened yet, but I've been working on a stable release of the 
core module. The biggest issue at this moment is the lack of documentation.
See the archive for more info on this.
2. I like to distinguish between the stable release and the development 
release. The latter is unstable: things like API changes can and will happen.
The development release is kind of a prototype release.

As in many open source software projects whose developer communities have
different backgrounds, prototyping and proof-of-concept implementations speed
up the development of the project. I would not like to lose this
great feature of such development.

As such, I welcome unharnessed, new and exciting implementations in the
development release. However, and that is what is important in your comment, the
stable release is different. In that release only proven concepts that are
well tested and properly documented should be used.

> - try to avoid "partial implementations". These can also be very
> frustrating for the user since they often don't show up until the library 
> has been well integrated into the project. Typical examples are fileReaders
> which include "just the bits that the original author needed" (I have 
> mentioned examples) and important items missing from the data structure.
> For example CDK seems to have very little support for charged molecules 
> (see below)

More or less the same as above, with the note that I agree that a full
implementation is preferable. But, as you know, the chemical software
development community is not that large, and for those who do development, it
is often not even their core business...

Thus, if a partial implementation already solves an often-encountered problem,
it is more than worth adding... As such, the MDLReader has only
limited features so far. Up till now, what was mostly used from MDL files were
atoms, bonds and coordinates...

Moreover, if users need additional features they can request them (they are
often not difficult to add), or implement them themselves... And this is also
a common and proven mechanism in open source development.

Missing features, or a list of supported features, should, however, always be 
mentioned in the documentation. And this is currently, unfortunately, not 
really the case. But that is a different problem, IMO.

> - make non-exposed classes "private" to avoid them appearing in the Javadoc 
> where they confuse the user

Agreed. And to my knowledge most developers do this... But this may not be the 
case...

> - choose names carefully. There are some misspelt classes and modules in 
> CDK (e.g. isArromatic). Abstract words such as Model, Sequence, Property
> etc are not self-explanatory.  Names such as "saturateWithHydrogen" are
> misleading (this means hydrogenating double bonds) -
> "addHydrogensToSatisfyValency" would be clearer. Names like this are
> sometimes a useful way of documenting a system.

I totally agree. I've just renamed the mentioned method. In my programming
education, improperly named methods were a reason to fail an exam.

> - document, document. Do not assume that the purpose of any routine and its
> arguments is obvious.

True. I want to note in addition that the documentation should not just
describe what *is* done by the method but, more importantly, what the
method *should* do!
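
As an example of documentation that states what a method *should* do, here is
a minimal sketch. The class HydrogenCountSketch and its method are
hypothetical and not part of CDK; the valency model (hydrogens needed =
valency minus bond order sum) is a deliberate simplification.

```java
// Hypothetical helper, not CDK code: the Javadoc states the contract,
// so a reader can tell a bug from intended behaviour.
class HydrogenCountSketch {

    /**
     * Returns the number of hydrogens that must be added to an atom so
     * that its valency is satisfied.
     *
     * <p>Contract: the result is {@code valency - bondOrderSum} when that
     * difference is positive, and 0 when the atom is already saturated or
     * over-saturated. The result is never negative.
     *
     * @param valency      the valency the atom should reach, not negative
     * @param bondOrderSum the sum of the orders of the existing bonds,
     *                     not negative
     * @return the number of hydrogens to add, never negative
     * @throws IllegalArgumentException if either argument is negative
     */
    static int hydrogensToSatisfyValency(int valency, int bondOrderSum) {
        if (valency < 0 || bondOrderSum < 0) {
            throw new IllegalArgumentException("arguments must not be negative");
        }
        return Math.max(0, valency - bondOrderSum);
    }
}
```

With the contract written down, a unit test can be derived from the comment
alone, without reading the implementation.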

> - give examples. The test examples are useful here and I would have
> struggled without them.

Yes, in the past months I started adding more examples, but we need many more 
included in the JavaDoc documentation...

> - make sure that all code is used by someone other than the author. This
> isn't easy in a small community, but I suggest it for new routines. Or put
> routines "on probation" until someone else uses them and adds some
> comments. Like Amazon book reviews.

This is a very interesting comment. Any idea how to do this in practice? I
am not keen on checking such things manually... there might be programs
that gather such statistics...

> > I think it would be useful to choose those areas which CDK does well and
> > which are used by someone other than the author. My own selection would be:
> - maths, geometry
>   (This is independent of the molecular data structure)
> - molecular representation and data structure, molecular perception
> (aromaticity...),
> - topological analysis, fingerprints, graphs, substructure searching
> - layout
> - rendering, interfaces with graphical systems, events
> (This is dependent on the molecular data structure)
> 
> At present I don't see how the data structure supports:
> - formal charge (although it deals with partial charges - which are fuzzy 
> because their origin is not defined). This is a serious limitation - we
> can't read the NCI data set into CDK as it has charges.

See CDK RFC #6: 
http://cdk.sourceforge.net/rfc6.html

> It is agreed that it doesn't yet support
> - tautomerism
> - stereochemistry

There are some holders for stereochemistry... Aren't they used at the moment?

> >I've noted the remarks about the CMLReader and will fix that soon (I consider
> >not reading some info from file a bug...) About the MDLReader I've got bigger
> >plans... Recently, a newer version has been "published" (on their website),
> >being V3000 (indeed, I also do not know why they did not just use V2002 ;)...
> >Anyway, I'll update the MDLReader soon and include reading of much more
> >fields...
> 
> There is a problem with generic file readers. There are two approaches:
> - read those parts that are interesting. This is what most authors do 
> including CDK and OpenBabel. For example most non-molecular concepts are 
> discarded and several atom properties are ignored. It works within the
> library or application but may be frustrating if users want other
> information

See comment above about users missing features...

> - extract all the information and offer it through a reader-specific
> interface. This is much more work but is the formally correct way of
> solving the problem. Thus a PDBReader should incorporate the BIB, CRYST,
> SEQR, HET, CONECT etc as well as just the ATOM records.

Yes, but this might also be something the user does not want... If I am only
interested in the coordinates, I do not want to read the other 5 MB of data on
that molecule... (in the extreme case ;)

I've been thinking about a similar problem for Jmol (*)... in Jaguar output
files, the first frame contains the structure that was taken as input by the
Jaguar program... in some cases the user wants to read that, in other cases he
does not (e.g. when the other frames define an animation)...

Therefore, I want to add a customization layer to the file IO, where the user
can define which information should be read and which should not...
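
One way such a customization layer could look is sketched below. This is only
an illustration under assumptions: ReaderOptionsSketch and the Field names
are hypothetical and do not exist in CDK or Jmol.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a customization layer for file readers: the
// caller states which kinds of information should be read at all.
class ReaderOptionsSketch {

    // Kinds of information a reader might offer (illustrative only).
    enum Field { ATOMS, BONDS, COORDINATES, CHARGES, INPUT_STRUCTURE }

    private final Set<Field> wanted = EnumSet.noneOf(Field.class);

    ReaderOptionsSketch(Set<Field> fields) {
        wanted.addAll(fields);   // defensive copy of the caller's choice
    }

    // A reader would call this before parsing the corresponding section
    // and skip the section when the answer is false.
    boolean shouldRead(Field field) {
        return wanted.contains(field);
    }
}
```

A Jaguar reader could, for instance, ask shouldRead(Field.INPUT_STRUCTURE)
before parsing the first frame, so that users building an animation can leave
the input structure out.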

> The problem with each program or toolkit writing its own file readers is 
> that it multiplies the pieces of code that have to be maintained. The point 
> of CML is that it defines a semantic interface which is loss-free. Not only
> must all the information be transmitted but also the meaning and syntax of
> all the components should be identical. Please don't take this personally
> but the CMLReader in CDK is currently too limited for what I need.

> It doesn't extract formalCharge, atomParity or bondStereo. The bondorders are
> not consistent with the CML (which does not use fractional numbers).

Not consistent with CML2 that is, right?

> I am
> currently working on CMLDOM for CML2 - about to be released in the next few 
> days. I would rather that we developed interfaces/adapters to CMLDOM2
> rather than multiple implementations.

Yes, that has been on my wishlist for quite some time now... Once CMLDOM2 is
out, I'll start to work on a converter for CDK <-> CMLDOM2...

However, I am not at all in favor of dropping SAX support...
at least one person, Joerg (of JOELib), needs a SAX based interface to CML
files... he has very, very large files which cannot be read with DOM...

Also, one design goal for CDK is its function in teaching
computational/informational programming in chemistry... as such, multiple
implementations using different designs are *very* valuable...

> > > - the atomType seems fragile. I have tried to use
> > > SaturationChecker.saturateWithHydrogen() and this throws a number of
> > > nullPEs which appear to be because the AtomTypeFactory doesn't return
>
> Hope this gives some ideas to discuss - it is intended to be constructive

Taken as such ;)

There is so much to do on the CDK library... We basically lack the man hours
to do many important things like unit testing and documentation...

Egon

(*) The proper casing of chars for Jmol is: upper case J, lower case m-o-l.

Re: [Cdk-devel] CDK design, shortcomings, future developments

From: Peter Murray-R. <pm...@ca...> - 2002-11-18 14:53:46

At 14:35 18/11/2002 +0100, Christoph Steinbeck wrote:
>Although my subject implies a long and important email,
>I just wanted to welcome Peter's lengthy message, which contains so many 
>important issues for the future development of the CDK that it should be 
>converted into a strategy paper.

Thanks very much - I was nervous that it might be seen as merely critical.

The issues apply generally to the OpenSource chemistry community. We don't 
want to stifle innovation and diversity of approach but we do want to use 
common semantics if possible.

How far do we all agree on common semantics (independently of the details
of implementation)? For example we all seem to have settled on: a molecule
contains atoms and bonds. Many of these contain common concepts
(x2Coordinate, elementType) but systems differ in how they represent
partialCharge, formalCharge, atomTypes, etc. All bonds support two-atom
links but some extend the concept (in CML2 we allow that a bond may be
between not only atoms but also bonds or even electrons to support
organometallics).

These semantics are independent of the language used and so apply to all
open source projects in chemistry. How many are there? We should probably
start with those that expose their data structure in a systematic manner.
(Editors and renderers like JChemPaint, Jmol, RasMol, BKChem and XDrawChem
primarily expose graphical interfaces, and data structures may be difficult
to locate. Ghemical, GROMACS and abinit are applications or interfaces to
them.) The following list is NOT exhaustive so please don't feel offended if
you aren't here. The main Open toolkits I am familiar with are:

JUMBOLib   Java  DOM for molecules, crystals, documents and spectra
CDK        Java  molecular representation, perception, support for editors and renderers
JOELib     Java  molecular representation and perception, etc. Descendant of OELib
OpenBabel  C++   molecular representation and perception, etc. Descendant of OELib
IChI       C++   canonicalization of molecular structure

A lot of work has gone into all of these. How should they develop? We are 
too small to offer a diversity of semantics and will confuse the community. 
Unlike Linux we do not have zillions of developers, but our task is harder 
in many ways - we have to develop a global semantics for chemistry.

A useful start could be to systematize the semantics and functionality of 
these and any other libraries not included.  (We are also aware of several 
non-Open toolkits which expose some of their data structure but we do not
have the whole picture. They may inform us, but cannot be included).

I am particularly keen that the difficult areas are not fragmented. These 
include:
- atomTypes.
- aromaticity
- tautomerism
- stereochemistry

The best way forward is to represent these independently of the 
implementation. Thus OpenBabel has external documents describing the 
semantics of these. It seems reasonable that we should share this approach. 
If the documents were rewritten in XML both groups could use the same 
reference. The same applies to elementTypes, valencies, radii, etc. I 
suspect that OB and CDK differ in their aromaticity methodology and are not 
easily reconciled.

It could be a useful time to bring the various groups closer together.

P.

>Let's try to work this out in greater detail in the near future.
>
>Cheers,
>
>Chris
>
>--
>Dr. Christoph Steinbeck (http://www.ice.mpg.de/departments/ChemInf)
>MPI of Chemical Ecology, Winzerlaer Str. 10, Beutenberg Campus, 07745 
>Jena, Germany
>Tel: +49(0)3641 571263 - Fax: +49(0)3641 571202
>
>What is man but that lofty spirit - that sense of enterprise.
>... Kirk, "I, Mudd," stardate 4513.3..
>
>
>

252 messages has been excluded from this view by a project administrator.
