From: E.L. W. <eg...@sc...> - 2004-06-30 08:04:15
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 30 June 2004 04:55, rich apodaca wrote:
> Egon, I realized that CDKMolecule would not work with MDLReader because I
> failed to override addBond(Bond). So I fixed that and now it works.
>
> The changes are found in CDKTools 0.2.1, which can be downloaded at:
> http://sourceforge.net/project/showfiles.php?group_id=96108
> Javadoc is at:
> http://octet.sourceforge.net/api.html

Ok, I'll have a look at it tomorrow...

> In addition, I added a unit test (MolfileTest.java) that loads a series of
> molfiles using one of three approaches: (1) a CDKMolecule using MDLReader;
> (2) an org.openscience.cdk.Molecule using MDLReader; and (3) a CDKMolecule
> using Octet's MolfileReader and CDKMoleculeBuilder.
>
> The isomorphism of molecule pairs (1)-(3) and (2)-(3) is verified with
> CDK's UniversalIsomorphismTester. Everything passes. I took the molfiles
> from CDK's "data" directory.

Nice.

> I think this little unit test is also a good demonstration of how to
> actually get Octet and CDK to work together. I like your idea of either
> posting this code or code like it on the QSAR site.

Yes, I'll see how that can be most easily accomplished...

> BTW, did you update all CDK io classes analogously to MDLReader?

No, not yet...

> If not,
> would it help to put this new behavior into DefaultChemObjectReader?

The read() methods are often too specific... so I would need to clean up
much of the code anyway...

> Or maybe add an explicit method like read(Molecule molecule)?

Those are mostly private... to simplify the API...

> Regarding where we go from here....
>
> I was interested in your thoughts on the dict idea. But I must confess, I
> still don't know what a dict is. It looks like it has something to do with
> defining molecular descriptors. But I'm struggling to understand how one
> would use it in QSAR.

A dictionary is a lookup table that lets a program or a user be sure
exactly what descriptorX is.
It serves as a major documentation tool, but also
provides unique identifiers that each point to exactly one description of
a descriptor. This should ensure that there is little ambiguity about what
the descriptor is and how it is calculated...

An example. Say 'partialAtomicCharge'... It is roughly possible to understand
what this (atomic) descriptor is, but the name does not say how it is
calculated (Gasteiger charges or Gaussian03 charges?) or which parameters
are used to calculate it (which electronegativity table or which basis
set?)... A dictionary should clarify these things.

So given a certain descriptor name, each program implementing this descriptor
from that dictionary should always give exactly the same outcome, to ensure
reproducibility of model building.

> Nevertheless, I think going in the direction of specifying the components
> and behaviors that go into descriptors is the next logical direction. I'm
> especially interested in drafting some kind of specification outlining the
> requirements for such a system - maybe using RFE. And of course, I'm keenly
> interested in reducing this spec. into a set of Java interfaces defined in
> terms of QSAR model-level objects. Octet functionality will probably need
> to be developed to fill in the gaps, which I'm happy to do.

I'm not sure I understand what you're trying to say here...

> If it is at all possible to take a composite approach where descriptors or
> descriptor components could easily be combined to make new descriptors, I
> think this could be useful.
>
> But my expertise in the area of molecular descriptors is about the same as
> Matt Foley's expertise as a motivational speaker. So - I'd like to get some
> perspectives on this from developers, potential developers, or amused
> onlookers. :)

Egon

- --
eg...@sc...
PhD on Molecular Representation in Chemometrics
Nijmegen University
http://www.cac.sci.kun.nl/people/egonw/
GPG: 1024D/D6336BA6

"Again a chemist did something useful with a computer"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (SunOS)

iD8DBQFA4nPyd9R8I9Yza6YRAplUAJ9XehQNR+zO3hapfkn/utBeklYrxwCdE01n
DhRbP7pf2iIQasttVmhKAjY=
=dHRU
-----END PGP SIGNATURE-----
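
The descriptor dictionary described in the message above can be sketched in Java as a lookup table keyed by a unique identifier. This is only a minimal illustration, not CDK or Octet code: the class names DescriptorEntry and DescriptorDictionary, the identifier "partialAtomicCharge.gasteiger", and the parameter keys are all hypothetical and assume nothing about how the QSAR project would actually store its dictionary.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical dictionary entry: one identifier, one unambiguous definition. */
final class DescriptorEntry {
    final String id;                      // unique identifier, e.g. "partialAtomicCharge.gasteiger"
    final String definition;              // human-readable documentation
    final String algorithm;               // how the value is calculated
    final Map<String, String> parameters; // settings needed to reproduce the value

    DescriptorEntry(String id, String definition, String algorithm,
                    Map<String, String> parameters) {
        this.id = id;
        this.definition = definition;
        this.algorithm = algorithm;
        // Defensive copy so an entry cannot change after it is registered.
        this.parameters = Collections.unmodifiableMap(new LinkedHashMap<>(parameters));
    }
}

/** Hypothetical lookup table mapping each identifier to exactly one entry. */
final class DescriptorDictionary {
    private final Map<String, DescriptorEntry> entries = new LinkedHashMap<>();

    /** Adds an entry; a second definition for the same id is rejected. */
    void register(DescriptorEntry entry) {
        if (entries.containsKey(entry.id)) {
            throw new IllegalArgumentException("Duplicate descriptor id: " + entry.id);
        }
        entries.put(entry.id, entry);
    }

    /** Returns the single entry for this id, or fails if the id is unknown. */
    DescriptorEntry lookup(String id) {
        DescriptorEntry entry = entries.get(id);
        if (entry == null) {
            throw new IllegalArgumentException("Unknown descriptor id: " + id);
        }
        return entry;
    }
}
```

Rejecting duplicate identifiers is the design choice that matters here: a bare name such as 'partialAtomicCharge' would need separate entries for Gasteiger and Gaussian03 charges, each recording its own algorithm and parameters, so that two programs looking up the same identifier are working from the same definition.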