octet-devel Mailing List for Octet (Page 2)
Status: Alpha
Brought to you by:
r_apodaca
From: rich a. <che...@ya...> - 2004-11-27 18:56:22

I would like to propose some additional changes to the Octet API (http://octet.sf.net).

These methods would be removed from the Atom interface:

  iterateNeighbors()
  countNeighbors()
  isConnectedTo(Atom atom)
  countElectrons()
  iterateBondingSystems()
  countBondingSystems()
  countReservedElectrons()
  toNeighborArray()
  toBondingSystemArray()
  getConfiguration()

... and their equivalents would be placed into either the Molecule or AtomGraph interface. For example, Atom.getConfiguration() would become Molecule.getConfiguration(Atom atom). AtomGraph already has iterateNeighbors(Atom atom) and countNeighbors(Atom atom).

I believe these changes will result in a more consistent, robust system. The methods in question report state that is only meaningful within a Molecule or AtomGraph context. It is confusing, for example, for an AtomGraph to contain Atoms that can report on their electronic configuration, because that is a Molecule-specific property. It is equally confusing to have an AtomGraph whose Atoms can report different connectivity than the hosting AtomGraph itself. In addition, this approach makes it convenient for MoleculeDecorator to override nearly all Molecule functionality without the need for an AtomDecorator.

The following methods would remain in the Atom interface because they do not depend on a Molecule or AtomGraph context:

  getNucleus()
  getLabel()

These changes would mean that both an Atom and its enclosing AtomGraph or Molecule would need to be passed as parameters to most methods operating on Atoms. Most code has already moved in this direction, so the changes to Octet itself will be minimal.

For consistency, the countElectrons() method of BondingSystem would be moved into Molecule.countElectrons(BondingSystem system). This would result in the BondingSystem interface supporting no methods beyond those inherited from AtomGraph. As a result, BondingSystem, BondingSystemCollection, and BondingSystemIterator could actually all be deleted and replaced by comparable AtomGraph counterparts. But I'm not sure I want to go that far, yet.

If there are no objections in the next few days, I will go ahead and make these changes.

best,
r
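The idea behind the proposal above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the interfaces are trimmed down, SimpleAtom, SimpleAtomGraph and isConnected(Atom, Atom) are hypothetical names, and only getLabel() and countNeighbors(Atom) come from the message itself. The point it shows is that connectivity is asked of the graph, so the same Atom can sit in two graphs without ever contradicting either of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down interfaces; not the actual Octet declarations.
interface Atom {
    String getLabel(); // context-free, so it stays on Atom
}

interface AtomGraph {
    int countNeighbors(Atom atom);       // named in the proposal
    boolean isConnected(Atom a, Atom b); // assumed stand-in for Atom.isConnectedTo(Atom)
}

final class SimpleAtom implements Atom {
    private final String label;
    SimpleAtom(String label) { this.label = label; }
    @Override public String getLabel() { return label; }
}

final class SimpleAtomGraph implements AtomGraph {
    // Connectivity lives in the graph, keyed by atom identity.
    private final Map<Atom, List<Atom>> adjacency = new HashMap<>();

    void connect(Atom a, Atom b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    @Override public int countNeighbors(Atom atom) {
        return adjacency.getOrDefault(atom, Collections.<Atom>emptyList()).size();
    }

    @Override public boolean isConnected(Atom a, Atom b) {
        return adjacency.getOrDefault(a, Collections.<Atom>emptyList()).contains(b);
    }
}

final class ContextDemo {
    public static void main(String[] args) {
        Atom c = new SimpleAtom("C");
        Atom o = new SimpleAtom("O");
        SimpleAtomGraph bonded = new SimpleAtomGraph();
        SimpleAtomGraph empty = new SimpleAtomGraph();
        bonded.connect(c, o);
        // The same Atom gives a different answer depending on the graph asked.
        System.out.println(bonded.countNeighbors(c));
        System.out.println(empty.countNeighbors(c));
    }
}
```

The cost is the one the message names: callers have to pass both the Atom and its enclosing graph.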
From: rich a. <che...@ya...> - 2004-11-26 23:12:24

I have just committed changes to the Octet CVS that enable the specification and comparison of arbitrary molecular conformations (http://octet.sf.net). Example code can be found in net.sourceforge.octet.junit.ConformationTest.

As with atomic Configuration, molecular Conformation is specified using a system originally outlined by Dietz (J. Chem. Inf. Comput. Sci. 1995, 35, 787). This paper has served as the blueprint for many of Octet's bonding and stereochemical concepts.

As an example of the use of this system, consider (E)-2-pentene. Its structure is produced by the following code (taken from net.sourceforge.octet.util.TestMolecules):

  public static void buildTrans2Pentene(MoleculeBuilder builder)
  {
    String c = "C";

    // The five carbon atoms of the chain.
    AtomHandle c0 = builder.addAtom(c);
    AtomHandle c1 = builder.addAtom(c);
    AtomHandle c2 = builder.addAtom(c);
    AtomHandle c3 = builder.addAtom(c);
    AtomHandle c4 = builder.addAtom(c);

    // C0-C1 single, C1=C2 double, C2-C3 single, C3-C4 single.
    builder.connect(c0, c1, 1);
    builder.connect(c1, c2, 2);
    builder.connect(c2, c3, 1);
    builder.connect(c3, c4, 1);

    // Stereochemical specification for the C1=C2 double bond.
    GammaSequenceHandle gamma = builder.addGammaSequence(c1);

    builder.connect(c2, gamma);
    builder.configure(gamma, c0, StereoKit.getTrigonalAngle(), 0);
    builder.configure(gamma, c3, StereoKit.getTrigonalAngle() / 2, Math.PI);
  }

The method for specifying the (E)/(Z) stereochemistry of double bonds, as above, is identical to that for specifying the axial chirality of biaryls (see net.sourceforge.octet.util.TestMolecules.buildRBinaphthyl()) and allenes. This method, in turn, closely resembles the method for specifying atomic Configuration.

With the addition of Conformation, it is now possible to use Octet to do the following:

(1) Specify and query any conceivable molecular bonding arrangement.
(2) Specify and query any conceivable atomic Configuration.
(3) Specify and query any conceivable molecular Conformation.

It is furthermore possible to do (1)-(3) without resorting to special handling or ad hoc rules. In fact, this system enables the consistent and faithful representation of bonding arrangements and/or stereochemical features that are simply not representable by nearly any other toolkit or file format. This means that clients need to know very little about the intent of the programmer (or file format) that built a particular Molecule, making client code easier to develop, interpret, and debug.

This system does not currently support fully specifying topologically "exotic" molecules such as knots, moebius strips, or rotaxanes, although these may be supported in Octet 2.0 (only half-joking).

The underlying default implementation will still require some refactoring/debugging in the weeks ahead. But I think the system in its current state gives a good flavor of the generality, accuracy, and consistency that is possible.

The next steps from here will consist of:

(1) a round of refactoring that will include API changes previously proposed on these lists, and likely other changes (to be announced) as well;
(2) a concerted effort to extensively test and debug all subsystems; and
(3) an API freeze in preparation for the release of Octet 1.0.0.

We're almost there!

Of course, the reason I'm cross-posting all of this to the qsar-devel list is the earlier interest in using Octet as the starting point for a molecular abstraction layer for the QSAR project. What core functionality does Octet still need to serve in this capacity?

best,
r
From: rich a. <che...@ya...> - 2004-11-21 20:49:40

Octet-0.4.0 has been released (http://octet.sf.net).

This release is the first version to support the specification and comparison of atomic stereochemical configuration. Several new interfaces have been defined, and the MoleculeBuilder and Atom interfaces have been updated. BasicQueryBuilder now returns a MoleculeQuery that compares atomic Configuration in addition to constitution. A bug in the UllmanIsomorphismTraverser that caused the same model Atoms to be traversed multiple times in certain situations was fixed.

Unit tests that demonstrate the functionality of the stereochemistry subsystem are included (net.sourceforge.octet.junit.StereoTest). These unit tests differentiate (R)- and (S)-isobutanol as well as (R,R)-, (S,S)-, and meso-2,3-butanediol. In addition, both cisplatin and transplatin are distinguishable. The cisplatin example demonstrates how Octet enables non-tetrahedral configurational stereochemistry to be compared using the same flexible formalism as tetrahedral configurational stereochemistry. No ad hoc rules or special treatments are necessary.

Some code remains stereochemically unaware. For example, MolfileReader and AdapterMolecule do not recognise stereochemical configuration. Also, this release of Octet breaks compatibility with CDKTools 0.3.0.

The next release of Octet should complete the stereochemical subsystem by enabling the specification of molecular conformation. The mechanism for doing so will be analogous to that for atomic configuration.

If you'd like to help, there is plenty to do. For example, we really need to develop more tests of the stereochemistry subsystem. Any code fragments that create a Molecule with a configuration would be helpful. A usability layer that can derive Cahn-Ingold-Prelog stereodescriptors (or maybe, conversely, use such a stereodescriptor to configure a Molecule) is within reach but will still require some effort. Documentation can always be used ;-).

best,
r
From: rich a. <che...@ya...> - 2004-11-20 22:42:55

I would like to propose removing getMolecule() from the Atom and BondingSystem interfaces.

The rationale for providing these methods originally was convenience. A method using an Atom would not need the Molecule passed as a parameter as well. So, methods that might require two parameters, doFoo(Atom, Molecule), only required one, doFoo(Atom).

However, this convenience comes at the price of extensibility and consistency. For example, it's increasingly clear that the Decorator Pattern will play a big role in extending Octet. What if we want to develop a canonicalization scheme? We could extend MoleculeDecorator:

  public class CanonicalizationMolecule extends MoleculeDecorator
  {
    public CanonicalizationMolecule(Molecule molecule, Map atomMap)
    {
      // implementation
    }

    public int getAtomIndex(Atom atom)
    {
      // return canonicalized index
    }

    // .. other overrides
  }

The problem arises when methods that are ignorant of this canonicalization try to use Atom.getMolecule() and end up getting the wrong (undecorated) Molecule, which has the uncanonicalized numbering scheme.

There are other situations. For example, I've been batting around the idea of using Atoms and possibly other Molecule components as Flyweights to enable the efficient manipulation of extremely large Molecule sets. Such a system is very difficult to build if each Atom needs to return a unique Molecule.

I think the resulting system would encourage greater consistency. Interface methods that require an Atom's Molecule context will need to be designated as such, rather than leaving implementations to their own devices.

Practically speaking, I've always found that Atom.getMolecule() could be replaced one way or another with minimal fuss. This proposal would only require small changes to Octet itself, specifically in MoleculePrinter.

If there are no objections in the next few days, I will go ahead and make the changes.

best,
r
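To make the decorator problem above concrete, here is a minimal, self-contained sketch. It assumes heavily simplified Atom and Molecule interfaces; BasicMolecule, RenumberedMolecule and IndexPrinter are hypothetical names invented for illustration, and only MoleculeDecorator and getAtomIndex(Atom) are taken from the message. The helper receives its Molecule context explicitly, which is what removing Atom.getMolecule() would force callers to do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified interfaces; not the actual Octet declarations.
interface Atom {
    String getLabel();
}

interface Molecule {
    int getAtomIndex(Atom atom);
}

final class LabeledAtom implements Atom {
    private final String label;
    LabeledAtom(String label) { this.label = label; }
    @Override public String getLabel() { return label; }
}

final class BasicMolecule implements Molecule {
    private final List<Atom> atoms;
    BasicMolecule(Atom... atoms) { this.atoms = Arrays.asList(atoms); }
    @Override public int getAtomIndex(Atom atom) { return atoms.indexOf(atom); }
}

// Forwards every call to the wrapped Molecule unless a subclass overrides it.
class MoleculeDecorator implements Molecule {
    private final Molecule inner;
    MoleculeDecorator(Molecule inner) { this.inner = inner; }
    @Override public int getAtomIndex(Atom atom) { return inner.getAtomIndex(atom); }
}

// Hypothetical decorator that applies an externally supplied renumbering.
final class RenumberedMolecule extends MoleculeDecorator {
    private final Map<Atom, Integer> atomMap;
    RenumberedMolecule(Molecule inner, Map<Atom, Integer> atomMap) {
        super(inner);
        this.atomMap = atomMap;
    }
    @Override public int getAtomIndex(Atom atom) {
        Integer index = atomMap.get(atom);
        return index != null ? index : super.getAtomIndex(atom);
    }
}

final class IndexPrinter {
    // The Molecule context is a parameter, so a decorator passed in here
    // cannot be bypassed by the atom handing back its undecorated owner.
    static String describe(Atom atom, Molecule context) {
        return atom.getLabel() + context.getAtomIndex(atom);
    }
}
```

Had describe() asked the atom for its Molecule instead, it would always have seen the numbering of the undecorated BasicMolecule, whichever decorator the caller was working with.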
From: rich a. <che...@ya...> - 2004-11-20 21:46:06

I would like to propose removing the releaseMolecule() method from the MoleculeBuilder interface.

The releaseMolecule() method is inappropriate for certain MoleculeBuilder implementations. For example, QueryBuilder should never release a Molecule. Currently, BasicQueryBuilder throws an UnsupportedOperationException when releaseMolecule() is called. But this approach lacks the clarity and sturdiness of simply removing the method from the interface.

This change would require clients to use an implementation-specific releaseFoo() method to release an item Foo from a MoleculeBuilder. So, for example, BasicQueryBuilder would define releaseSubstructureQuery() and releaseExactStructureQuery(). Similarly, CDKMoleculeBuilder (cdktools package) would define only releaseCDKMolecule(), not releaseMolecule(); defining both methods would be confusing and slightly redundant.

This change would result in only minor modifications to Octet itself. In particular, TestMolecules would require buildFoo(MoleculeBuilder) methods, and all createFoo(MoleculeBuilder) methods would be deleted.

I will go ahead and make these changes if there are no objections in the next few days.

best,
r
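A rough sketch of the shape this proposal leads to, under loud assumptions: the MoleculeBuilder below has a single simplified addAtom(String) method (the real one returns a handle), and FormulaBuilder, AtomCountQueryBuilder, AtomCountQuery and TestMoleculesSketch are hypothetical names invented for illustration. What it shows is that construction code depends only on the shared interface, while each implementation exposes its own releaseFoo() method and nothing ever has to throw UnsupportedOperationException.

```java
// Hypothetical, simplified builder interface: note there is no releaseMolecule().
interface MoleculeBuilder {
    void addAtom(String label);
}

// Hypothetical builder whose product is a plain string of atom labels.
final class FormulaBuilder implements MoleculeBuilder {
    private final StringBuilder labels = new StringBuilder();

    @Override public void addAtom(String label) { labels.append(label); }

    // Implementation-specific release method.
    String releaseFormula() {
        String result = labels.toString();
        labels.setLength(0); // reset so the builder can be reused
        return result;
    }
}

// Hypothetical query product: matches anything with the same atom count.
final class AtomCountQuery {
    private final int atomCount;
    AtomCountQuery(int atomCount) { this.atomCount = atomCount; }
    boolean matches(int otherAtomCount) { return atomCount == otherAtomCount; }
}

// Hypothetical builder that releases a query instead of a molecule.
final class AtomCountQueryBuilder implements MoleculeBuilder {
    private int count;

    @Override public void addAtom(String label) { count++; }

    AtomCountQuery releaseAtomCountQuery() {
        AtomCountQuery query = new AtomCountQuery(count);
        count = 0;
        return query;
    }
}

final class TestMoleculesSketch {
    // buildFoo(MoleculeBuilder): drives any builder, releases nothing.
    static void buildEthanol(MoleculeBuilder builder) {
        builder.addAtom("C");
        builder.addAtom("C");
        builder.addAtom("O");
    }
}
```

A caller picks the concrete builder, passes it to buildEthanol(), and then calls the release method that builder actually supports.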
From: Joerg K. W. <we...@in...> - 2004-11-17 16:03:50

Excellent, I will have a look.

1. BTW, I've started to refactor JOELib to separate the interfaces from the implementation. I'm still not at the discussion level, but I'm on the way. I'm still working on externalizing the assignment of all those nasty properties in Atoms, Bonds and the Molecule. If I succeed I will be happy to discuss how to define the interface. The newly opened branch in the CVS is called 'joelib2-redesign'.

2. I've heard from another person that the Beilstein institute likes Rich's approach very much, because it's the most general one, in contrast to JOELib and CDK. So I found this a convincing argument for starting a complete refactoring.

Kind regards, Joerg

> I have committed the first set of changes to Octet
> (http://octet.sf.net) that enable the specification
> and comparison of atomic stereochemical configuration.
>
> As I've written previously, this code is based on a
> specification provided by Andreas Dietz (J. Chem. Inf.
> Comput. Sci. 1995, 35, 787).
>
> I view this code as a first step and very rough. I
> don't expect the interface definitions to change much,
> but the implementation is quite inefficient and
> probably buggy. In particular, the spherical polar
> coordinate manipulations in BasicMoleculeBuilder are
> probably more complicated than necessary.
>
> Nevertheless, I have included a unit test
> (net.sourceforge.octet.junit.StereoTest) that
> demonstrates that two enantiomers of isobutanol can be
> identified as having opposite configurations by this
> system. This unit test also demonstrates how clients
> will use the updated MoleculeBuilder interface to
> define atomic configuration using spherical polar
> coordinates.
>
> I am unaware of any other implementation of this
> stereochemistry specification in any language, so it
> will be interesting to see how it evolves. The system
> is a significant departure from every other method,
> but I believe the payoff is well worth the steep
> learning curve.
>
> After I have cleaned up the implementation a bit, I
> plan to tackle molecular conformation next. Once that
> phase is complete, it should be possible to define
> practically any stereochemical arrangement using a
> single flexible formalism. An abstraction layer may be
> helpful to simplify the use of this system for
> standard cases (i.e., tetrahedral carbon). It is at that
> point that the API will be frozen in preparation for
> the release of Octet 1.0 (and hopefully progress on
> the QSAR project).
>
> As always, comments and feedback are welcome. I did
> my best with the documentation of this code, but one
> really needs to read Dietz' paper carefully (and
> repeatedly :-)) to understand how the system works.
> I'm also more than willing to try to explain how I
> understand it.
>
> best,
> r

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004)
From: rich a. <che...@ya...> - 2004-11-15 03:52:31

I have committed the first set of changes to Octet (http://octet.sf.net) that enable the specification and comparison of atomic stereochemical configuration.

As I've written previously, this code is based on a specification provided by Andreas Dietz (J. Chem. Inf. Comput. Sci. 1995, 35, 787).

I view this code as a first step and very rough. I don't expect the interface definitions to change much, but the implementation is quite inefficient and probably buggy. In particular, the spherical polar coordinate manipulations in BasicMoleculeBuilder are probably more complicated than necessary.

Nevertheless, I have included a unit test (net.sourceforge.octet.junit.StereoTest) that demonstrates that two enantiomers of isobutanol can be identified as having opposite configurations by this system. This unit test also demonstrates how clients will use the updated MoleculeBuilder interface to define atomic configuration using spherical polar coordinates.

I am unaware of any other implementation of this stereochemistry specification in any language, so it will be interesting to see how it evolves. The system is a significant departure from every other method, but I believe the payoff is well worth the steep learning curve.

After I have cleaned up the implementation a bit, I plan to tackle molecular conformation next. Once that phase is complete, it should be possible to define practically any stereochemical arrangement using a single flexible formalism. An abstraction layer may be helpful to simplify the use of this system for standard cases (i.e., tetrahedral carbon). It is at that point that the API will be frozen in preparation for the release of Octet 1.0 (and hopefully progress on the QSAR project).

As always, comments and feedback are welcome. I did my best with the documentation of this code, but one really needs to read Dietz' paper carefully (and repeatedly :-)) to understand how the system works. I'm also more than willing to try to explain how I understand it.

best,
r
From: rich a. <che...@ya...> - 2004-11-02 15:21:24

In the next week I propose modifying the Octet Molecule, MoleculeBuilder, and Atom interfaces to support molecular stereochemical configuration and conformation (http://octet.sourceforge.net).

The specification for these changes can be found in the article by Dietz: J. Chem. Inf. Comput. Sci. 1995, 35, 787. The philosophy of this approach can be summed up in a quote from the paper:

"Note that a molecular structure representation cannot free the user from the task to decide how a chemical structure should be represented. However, a lack of versatility might force the user to represent a chemical structure in a certain manner, even if he would prefer to represent it differently."

At its core, this system will use the object-oriented equivalent of Dietz's "pencil of planes" idea. This system should enable the unambiguous assignment of stereochemical configuration to any atom. For example, the mechanism for assigning the configuration of a tetrahedral carbon is identical to that for assigning the configuration of an octahedral or trigonal bipyramidal metal center or transition state.

It should also be possible to unambiguously specify all forms of conformational stereochemistry, such as that found in allenes and biaryls, and E/Z isomerism, using a mechanism analogous to that for configuration. The boat and chair forms of cyclohexane could also be distinguished through this mechanism, if required by clients.

Because of its generality, this system represents something of a departure from other methods for handling conformation and configuration. To help flatten the learning curve, I propose one or more helper classes that can do such things as report a Cahn-Ingold-Prelog stereodescriptor for a tetrahedral carbon and classify arbitrary atomic configurations as identical, enantiomeric, or completely different.

MoleculeBuilder will be updated to enable this specification. Here, a spherical polar coordinate system with an atom as its origin will be used. This allows each atomic configuration to be set independently of the others in a molecule. I was actually surprised how straightforward it is to specify configurations and conformations using spherical polar coordinates.

These changes should enable just about any type of stereochemistry to be consistently represented and queried. One limitation is that stereochemistry resulting from topological chirality, such as that in helicenes or knots, will still be undefinable. Then again, no system I'm aware of does this, and the demand for it will be just about nil for the foreseeable future.

For now these changes will be limited to the Octet CVS. I don't think they will begin to appear in releases for about a month or so.

Any feedback on this proposal would be welcomed.

cheers,
rich
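As a small illustration of why spherical polar coordinates centred on an atom are enough to tell enantiomeric arrangements apart, the sketch below converts ligand directions to unit vectors and takes the sign of their scalar triple product. This is not Dietz's "pencil of planes" formalism and not Octet's implementation; the class and method names are hypothetical, and the angle convention (theta measured from the +z axis, phi the azimuth in the xy plane) is an assumption made here.

```java
// Hypothetical helper, unrelated to the real Octet classes.
final class SphericalSketch {

    // Unit vector for polar angle theta (from +z) and azimuth phi (in the xy plane).
    static double[] toCartesian(double theta, double phi) {
        return new double[] {
            Math.sin(theta) * Math.cos(phi),
            Math.sin(theta) * Math.sin(phi),
            Math.cos(theta)
        };
    }

    // Scalar triple product a . (b x c): positive or negative depending on handedness.
    static double signedVolume(double[] a, double[] b, double[] c) {
        return a[0] * (b[1] * c[2] - b[2] * c[1])
             + a[1] * (b[2] * c[0] - b[0] * c[2])
             + a[2] * (b[0] * c[1] - b[1] * c[0]);
    }

    // Handedness (+1, -1, or 0 for coplanar) of three ligands given as {theta, phi} pairs.
    static int handedness(double[][] ligands) {
        double[] a = toCartesian(ligands[0][0], ligands[0][1]);
        double[] b = toCartesian(ligands[1][0], ligands[1][1]);
        double[] c = toCartesian(ligands[2][0], ligands[2][1]);
        return (int) Math.signum(signedVolume(a, b, c));
    }

    public static void main(String[] args) {
        // Tetrahedral centre: a fourth ligand is taken to lie along +z, the other
        // three sit at the tetrahedral angle, 120 degrees apart in azimuth.
        double theta = Math.acos(-1.0 / 3.0);
        double third = 2.0 * Math.PI / 3.0;
        double[][] original = { {theta, 0.0}, {theta, third}, {theta, 2.0 * third} };
        // Mirror image: same ligands with every azimuth negated.
        double[][] mirrored = { {theta, 0.0}, {theta, -third}, {theta, -2.0 * third} };
        System.out.println(handedness(original) == -handedness(mirrored));
    }
}
```

Because every angle is measured relative to the central atom, each centre can be described independently of the rest of the molecule, which is the property the message highlights.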
From: Dr P. Murray-R. <pm...@ca...> - 2004-10-18 07:22:05

On Oct 18 2004, rich apodaca wrote:

> I'm in the process of trying to introduce
> stereochemistry into Octet. In particular, I want to
> enhance the Molecule (or Atom) interface to enable
> clients to specify and query molecular stereochemical
> information.

I think atom and bond stereochemistry is tractable. I think most of the rest is problematic.

> So my simple question is: What is the most useful way
> to represent stereochemistry in a cheminformatics
> framework?
>
> Ideally, any system that gets implemented should allow
> the following:
>
> (1) Unambiguous representation of any chiral
> configuration:
> alkenes,

yes

> allenes,

possible if the central atom is given tetrahedral status

> biaryls,

possible and messy if a dummy atom is placed at the centre of the bond

> metallocenes,

I know of no system and I would argue against developing one

> tetrahedral carbon,

yes

> etc.

helicenes, etc.: no current method
6-coordinate complexes: no method in common use

> (2) A uniform method for querying molecules to obtain
> stereochemical information. The method used for biaryl
> chirality, for example, should be identical to that
> for tetrahedral carbon chirality.

It can't be identical as there isn't a central atom, but it could be similar.

> (3) Stereochemistry should be specified without
> reference to a 3-D coordinate system.

Stereochemistry can be deduced from 3D coords. JUMBO already does this.

> (4) The solution should be as "intuitive" as possible.

The only useful approach is common usage. I am on an IUPAC committee on this topic. We have now agreed, in most cases, what wedge and hatch bonds mean and how to use them.

> (5) The solution should be flexible enough to never
> require special handling for unusual types of
> stereochemistry.

I don't think this is possible.

> For concrete stereochemistry implementations, I have
> looked mainly at CDK, OpenBabel, and JOELib, all of
> which appear to have some level of support for
> stereochemistry. All three appear to use a system of
> chiral flags on Atoms, Bonds, or both. Unfortunately,
> I haven't been able to find detailed documentation on
> many aspects of these approaches. In addition, it
> appears to me that the chiral flag approach is
> fundamentally not general enough to enable point (1).
>
> I have been quite interested in a model for
> stereochemistry outlined by Akutsu:
>
> J. Chem. Inf. Comput. Sci. 1991, 31, 414-417
>
> The idea behind this paper is to transform a molecular
> graph with an ordered adjacency list representation
> for atomic neighbors into another unique graph
> representation in which the stereochemical topology is
> automatically encoded in the graph. The major drawback
> I see with this approach is the production of some
> potentially very large graphs. In addition, I'm not
> sure how to apply this approach to chirality with no
> stereocenter, as with biaryls.

I doubt it can be done.

> I have also been looking at another approach outlined
> by Dietz:
>
> J. Chem. Inf. Comput. Sci. 1995, 35, 787
>
> Unfortunately, this approach seems to require as a
> starting point at least partial knowledge of 3-D
> coordinates in order to specify chirality, which is
> not consistent with point (3) above. I believe that
> dealing with 3-D coordinates in any form greatly
> increases the complexity of specifying and using
> chirality. On the other hand, this system allows for
> the complete specification and differentiation of all
> chiral configurations of any molecule. And it may be
> possible to provide some kind of developer tool that
> makes it easier to use this approach. Another
> potential drawback of this approach may be the need to
> use only non-hydrogen-suppressed graphs.

There are some molecules where the only realistic method of describing the structures themselves is to give the 3D coordinates. Examples are metal clusters. And what would you do with fluxional molecules?

> Any info to help me move forward would be helpful.

Moreover, there are few systems that can author anything other than atom and bond stereo. I would stick with atom-centered and bond-based. JUMBO does all the required conversions between 2D and 3D.

P.
From: rich a. <che...@ya...> - 2004-10-17 23:25:42

I'm in the process of trying to introduce stereochemistry into Octet. In particular, I want to enhance the Molecule (or Atom) interface to enable clients to specify and query molecular stereochemical information.

So my simple question is: What is the most useful way to represent stereochemistry in a cheminformatics framework?

Ideally, any system that gets implemented should allow the following:

(1) Unambiguous representation of any chiral configuration: alkenes, allenes, biaryls, metallocenes, tetrahedral carbon, etc.

(2) A uniform method for querying molecules to obtain stereochemical information. The method used for biaryl chirality, for example, should be identical to that for tetrahedral carbon chirality.

(3) Stereochemistry should be specified without reference to a 3-D coordinate system.

(4) The solution should be as "intuitive" as possible.

(5) The solution should be flexible enough to never require special handling for unusual types of stereochemistry.

For concrete stereochemistry implementations, I have looked mainly at CDK, OpenBabel, and JOELib, all of which appear to have some level of support for stereochemistry. All three appear to use a system of chiral flags on Atoms, Bonds, or both. Unfortunately, I haven't been able to find detailed documentation on many aspects of these approaches. In addition, it appears to me that the chiral flag approach is fundamentally not general enough to enable point (1).

I have been quite interested in a model for stereochemistry outlined by Akutsu:

J. Chem. Inf. Comput. Sci. 1991, 31, 414-417

The idea behind this paper is to transform a molecular graph with an ordered adjacency list representation for atomic neighbors into another unique graph representation in which the stereochemical topology is automatically encoded in the graph. The major drawback I see with this approach is the production of some potentially very large graphs. In addition, I'm not sure how to apply this approach to chirality with no stereocenter, as with biaryls.

I have also been looking at another approach outlined by Dietz:

J. Chem. Inf. Comput. Sci. 1995, 35, 787

Unfortunately, this approach seems to require as a starting point at least partial knowledge of 3-D coordinates in order to specify chirality, which is not consistent with point (3) above. I believe that dealing with 3-D coordinates in any form greatly increases the complexity of specifying and using chirality. On the other hand, this system allows for the complete specification and differentiation of all chiral configurations of any molecule. And it may be possible to provide some kind of developer tool that makes it easier to use this approach. Another potential drawback of this approach may be the need to use only non-hydrogen-suppressed graphs.

Any info to help me move forward would be helpful.
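For readers unfamiliar with how an ordered neighbor list can carry stereochemical information at all, the following sketch may help. It is not Akutsu's graph transformation; it only illustrates the standard parity convention for a tetrahedral centre: two orderings of the same four neighbors describe the same arrangement if one is an even permutation of the other, and mirror-image arrangements if the permutation is odd. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper; illustrates permutation parity for ordered neighbor lists.
final class NeighborOrderParity {

    // True if 'other' is an even permutation of 'reference'.
    // Both lists must contain the same distinct labels.
    static boolean sameHandedness(List<String> reference, List<String> other) {
        int n = reference.size();
        if (other.size() != n) {
            throw new IllegalArgumentException("lists differ in length");
        }
        int[] position = new int[n];
        boolean[] seen = new boolean[n];
        for (int i = 0; i < n; i++) {
            int p = reference.indexOf(other.get(i));
            if (p < 0 || seen[p]) {
                throw new IllegalArgumentException("lists are not permutations of each other");
            }
            seen[p] = true;
            position[i] = p;
        }
        // Count inversions; an even count means an even permutation.
        int inversions = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (position[i] > position[j]) {
                    inversions++;
                }
            }
        }
        return inversions % 2 == 0;
    }
}
```

For a centre with neighbors F, Cl, Br, H, the ordering Cl, Br, F, H is a cyclic shift of the first three (even), while Cl, F, Br, H swaps two neighbors (odd). As the message notes, this kind of encoding presupposes a stereocentre, which is exactly what biaryls lack.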
From: rich a. <che...@ya...> - 2004-08-13 02:43:59
|
Hello Ola,

I can see how the similarities and differences among the half dozen or so Java cheminformatics frameworks and applications (CDK, JChemPaint, JOELib, Octet, Structure, JUMBO, Marvin, soon QSAR, and others) might be difficult to pick out. The good news is that most of them are open source. I'd like to offer some thoughts on what we're trying to do with Octet (http://octet.sourceforge.net) and Structure (http://structure.sourceforge.net) because I'm most involved with those.

Learning a new API is difficult, especially with something as multifaceted as cheminformatics. In part to flatten the learning curve, Octet is designed to deliver the minimal functionality that will be needed in all cheminformatics contexts. This is actually a lot harder than it might sound - the temptation to add just a little bit of extra functionality here and there is very powerful. As a result, Octet's major functionality consists of:

(1) Representation of Molecules with any bonding arrangement, from nonclassical carbocations to coordination complexes to inorganics to simple organic compounds. All Molecules are queried using a unified interface that requires no exceptional handling for "weird" molecules.

(2) All model-level objects (Atom, Molecule, etc.) are defined in terms of Java interfaces. The ways that concrete Molecules are implemented can vary drastically, but as long as the interface methods give consistent results, Octet can handle them all. This enables Octet users to fine-tune the Molecule implementation to their particular needs. For example, when dealing with large numbers of Molecules, low memory usage may be a high priority. When working with a limited number of very large Molecules such as proteins, the ability to speedily address and manipulate Atoms and bonding arrangements may be critical. An implementation that works in one case may fail miserably in the other. So the flexibility to choose is essential for a robust framework.

(3) Simplified SMILES, Molfile, and SD file format readers and writers.

(4) An API for traversal of Molecules as graph objects. Breadth-first, depth-first, cycle, and isomorphism traversal are all possible via a consistent API.

(5) An API for substructure, exact-structure, and query atom queries.

(6) Identification of essential Molecule properties such as hydrogen atom count, formal bond order, and electron count.

(7) Definition and manipulation of molecular stereochemistry (to be implemented in the near future).

And that's it for the functionality itself. Of course, this narrow focus leaves many specialized areas untouched, but the features above will be essential for most cheminformatics problems.

Recently, support for the use of CDK Molecules within Octet and the use of Octet Molecules within CDK has been developed. This package is called CDKTools. A copy with source code and unit tests can be downloaded here:
https://sourceforge.net/project/showfiles.php?group_id=96108

By keeping the API small and simple, we hope to increase the probability that Octet will become a stable framework that is easy to learn, use, and especially extend.

Structure extends Octet's capabilities by enabling 2D structure drawing of Molecules. Not much progress has been made on this project recently - due mainly to efforts to move Octet closer to an API freeze and eventual 1.0 release - but it is indeed still alive. The overall approach to Structure is similar to the approach taken with Octet: to deliver the minimal functionality that will be needed in the majority of 2D molecular rendering contexts. 2D coordinate generation falls into that category, and so it is a goal for the project. CDK has done a wonderful job with 2D structure layout. But there is clearly room for a variety of new approaches in this largely neglected area, especially given the complementary functionality that Octet provides.

Regarding JChemPaint and Structure, both are aimed at 2D molecular rendering. However, they address the problem from different perspectives (feel free to correct me if I'm misstating, Egon). JChemPaint is a client-side application/applet that enables both rendering and editing of molecules, and has features that can be used as a library. Structure is solely a framework for 2D Molecule rendering that will provide the functionality on which rendering applications can be built. This may sound like a minor distinction at first, but it results in a very different set of design decisions that need to be made, bugs that need to be fixed, and resources that need to be committed.

Well, that's a long-winded attempt to answer your questions. Let me know if I can give you any further info.

best,
rich |
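Point (2) above, interchangeable implementations behind one interface, can be made concrete with a small sketch. Everything here is an assumption made for illustration: `ConnectionTable`, `CompactTable` and `IndexedTable` are hypothetical names and are not part of Octet's API. One variant trades query speed for memory, the other does the opposite, and the client method works with either.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface for illustration only; NOT Octet's actual Molecule API.
interface ConnectionTable {
    int countAtoms();
    boolean isConnected(int a, int b);
    int countNeighbors(int atom);
}

// Memory-lean variant: one flat int array of bonded pairs, linear-time queries.
final class CompactTable implements ConnectionTable {
    private final int atomCount;
    private final int[] pairs; // laid out as [a0, b0, a1, b1, ...]

    CompactTable(int atomCount, int[] pairs) {
        this.atomCount = atomCount;
        this.pairs = pairs.clone();
    }

    @Override public int countAtoms() { return atomCount; }

    @Override public boolean isConnected(int a, int b) {
        for (int i = 0; i < pairs.length; i += 2) {
            boolean forward = pairs[i] == a && pairs[i + 1] == b;
            boolean backward = pairs[i] == b && pairs[i + 1] == a;
            if (forward || backward) return true;
        }
        return false;
    }

    @Override public int countNeighbors(int atom) {
        int count = 0;
        for (int i = 0; i < pairs.length; i += 2) {
            if (pairs[i] == atom || pairs[i + 1] == atom) count++;
        }
        return count;
    }
}

// Speed-oriented variant: per-atom neighbor sets, hash-based queries, more memory.
final class IndexedTable implements ConnectionTable {
    private final List<Set<Integer>> neighbors = new ArrayList<>();

    IndexedTable(int atomCount, int[] pairs) {
        for (int i = 0; i < atomCount; i++) neighbors.add(new HashSet<Integer>());
        for (int i = 0; i < pairs.length; i += 2) {
            neighbors.get(pairs[i]).add(pairs[i + 1]);
            neighbors.get(pairs[i + 1]).add(pairs[i]);
        }
    }

    @Override public int countAtoms() { return neighbors.size(); }
    @Override public boolean isConnected(int a, int b) { return neighbors.get(a).contains(b); }
    @Override public int countNeighbors(int atom) { return neighbors.get(atom).size(); }
}

final class ConnectionTableDemo {
    // Client code depends only on the interface, never on a concrete class.
    static int countTerminalAtoms(ConnectionTable table) {
        int terminals = 0;
        for (int i = 0; i < table.countAtoms(); i++) {
            if (table.countNeighbors(i) == 1) terminals++;
        }
        return terminals;
    }

    public static void main(String[] args) {
        int[] propane = {0, 1, 1, 2}; // heavy-atom skeleton C-C-C
        System.out.println(countTerminalAtoms(new CompactTable(3, propane)));
        System.out.println(countTerminalAtoms(new IndexedTable(3, propane)));
    }
}
```

Both calls report the same count because the client never sees which representation sits behind the interface; choosing between them becomes a deployment decision rather than a code change.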
From: Joerg K. W. <we...@in...> - 2004-08-12 15:14:49
|
Hi Rich,

that is good news indeed.

I still have the long-term idea of creating a refactored JOELib2. My plan is to use Octet, JGraphT and JOElib as the base. This cannot be done in a short time; my plan is to provide a primitive implementation by the middle of next year.

I can understand that you do not want to focus on JGraphT, but from my point of view we must settle on a graph implementation soon or things will get too wild, especially because I will need a default implementation to guarantee a stable basis for descriptor calculation algorithms.

A framework is fine, but a working application also has its benefits :-) Both will give high maintainability with, hopefully, many, many users ...

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004) |
From: rich a. <che...@ya...> - 2004-08-12 14:50:51
|
Hello Joerg,

I have good news for you. You *can* use JGraphT, any other existing graph framework, or any new graph framework, as a base for implementing the Molecule interface. Clients are free to choose the most optimized graph implementation they can find.

This is the big advantage of having business objects inherit an interface definition rather than a concrete class.

The way I would do this is to first implement the MoleculeBuilder interface, say as JGraphTMoleculeBuilder. The implementation would build a concrete JGraphT Graph behind the scenes (i.e. as a private instance) as MoleculeBuilder methods are invoked. When releaseMolecule() is invoked, JGraphTMoleculeBuilder would then wrap its JGraphT Graph in a Molecule implementation defined as a private inner class of JGraphTMoleculeBuilder. Hopefully I've explained this clearly; if not, let me know and I can supply some skeletal source code.

You could even add a special method in JGraphTMoleculeBuilder that would release the JGraphT Graph for further manipulation. So then you could use Octet's SimpleSmilesReader, StuctureDataReader, or MolfileReader to build a JGraphT Graph instead of an Octet Molecule. This graph could then be used with all of the rich functionality in the JGraphT package, including traversers.

And all of this can happen without Octet (or QSAR) needing to know about JGraphT directly and without changing Octet in any way.

I actually considered using JGraphT (and other graph frameworks as well) for the default Molecule implementation. I decided against it mainly for simplicity. I didn't want Octet to require a lot of external dependencies. Not only that, but JGraphT comes with much more functionality, and in some cases not the correct functionality, for what I wanted to do with Octet.

Now, if the idea was to have a Molecule interface definition that extends the JGraphT Graph interface, I'm not in favor of that. The main reason is immutability. JGraphT's Graph interface is loaded with public mutator methods - meaning that any client can change a Graph representation at any time. To get around the inconsistencies this can lead to, JGraphT introduces GraphListener. But this means that every class that wants to be informed of a change to a graph needs to add itself as a listener - something that is easy to forget to do. It's also easy to forget to remove a class as a listener - preventing the garbage collector from deleting it, a form of "memory leak".

Since I could think of almost no situation that would require a Molecule to be modified once it was created, I decided to make Molecule and all of the interfaces it depends on (AtomPair, BondingSystem, Atom) immutable. This leads to simplification of the interface, more streamlined client code, and no need for copy constructors or clone() methods, and it also makes it harder to create bugs deriving from inconsistent Molecule state.

In summary, Octet supports Molecules with any underlying graph representation. But I would leave this kind of optimization up to users and wouldn't want to make it part of Octet. I would not favor Molecule inheriting JGraphT's Graph interface.

best,
rich |
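The builder approach described in this message could look roughly like the sketch below. It is only an illustration under stated assumptions: the interfaces are simplified stand-ins rather than Octet's real Molecule and MoleculeBuilder, `addAtom` and `connect` are invented method names (only `releaseMolecule()` is taken from the message), and a plain adjacency map stands in for the JGraphT Graph so that the example runs without external libraries.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for illustration; these are NOT Octet's real interfaces.
interface SketchMolecule {
    int countAtoms();
    Set<String> neighborsOf(String atomLabel);
}

interface SketchMoleculeBuilder {
    void addAtom(String label);        // assumed method name
    void connect(String a, String b);  // assumed method name
    SketchMolecule releaseMolecule();  // method name taken from the message
}

final class GraphBackedBuilder implements SketchMoleculeBuilder {
    // Private graph under construction. A plain adjacency map stands in for
    // the JGraphT Graph that the message describes.
    private Map<String, Set<String>> graph = new LinkedHashMap<>();

    @Override
    public void addAtom(String label) {
        graph.putIfAbsent(label, new LinkedHashSet<String>());
    }

    @Override
    public void connect(String a, String b) {
        addAtom(a);
        addAtom(b);
        graph.get(a).add(b);
        graph.get(b).add(a);
    }

    @Override
    public SketchMolecule releaseMolecule() {
        SketchMolecule molecule = new GraphMolecule(graph);
        graph = new LinkedHashMap<>(); // drop our reference so later builder calls cannot alter the molecule
        return molecule;
    }

    // Private inner class: a read-only view over the graph built above.
    private static final class GraphMolecule implements SketchMolecule {
        private final Map<String, Set<String>> graph;

        GraphMolecule(Map<String, Set<String>> graph) {
            this.graph = graph;
        }

        @Override
        public int countAtoms() {
            return graph.size();
        }

        @Override
        public Set<String> neighborsOf(String atomLabel) {
            Set<String> found = graph.get(atomLabel);
            if (found == null) {
                return Collections.emptySet();
            }
            return Collections.unmodifiableSet(found);
        }
    }
}

final class GraphBackedBuilderDemo {
    public static void main(String[] args) {
        SketchMoleculeBuilder builder = new GraphBackedBuilder();
        builder.connect("O", "H1");
        builder.connect("O", "H2");
        SketchMolecule water = builder.releaseMolecule();
        System.out.println(water.countAtoms() + " atoms; O is bonded to " + water.neighborsOf("O"));
    }
}
```

Because the molecule exposes no mutators and the builder gives up its reference on release, no listener mechanism is needed to keep clients consistent, which is the immutability argument made above.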
From: Joerg K. W. <we...@in...> - 2004-08-12 11:42:46
|
Hi all,

the developers are right, so we should implement our own EdgeFactory.
https://sourceforge.net/tracker/?func=detail&atid=579690&aid=1007815&group_id=86459

Kind regards, Joerg |
From: Joerg K. W. <we...@in...> - 2004-08-12 09:57:23
|
Hi All,

> (3) The ability to generate a set of all Molecules that satisfy the model (i.e., solving the inverse QSAR problem).
> ... has hardly been considered, and would be, in my opinion, a significant advance over existing software with immediate payoff for the bench chemist.

Mmmh, this was already considered, but as far as I know this is an NP-complete combinatorial optimization problem, so it is really hard, especially when the molecules should also be synthesizable. So 'de novo' design is still not uncritical, but it is more popular for such things. In that special case we in fact work on a graph which holds molecular fragments at its nodes.

> The key point is that Signature is not a number - it is a behavior.
> Signatures have many useful properties. A Signature can be used to construct a set of Molecules described by it. This contrasts with the vast majority of descriptors, in which too much information is lost, and is a key element in solving the inverse-QSAR problem. Signature can be used to construct many of the commonly-used topological descriptors. Signatures are also less degenerate than many other descriptors.

A BFS is not the solution for all problems (No-Free-Lunch theorem for optimization), but I agree that this is more general than basic hard-coded graph-traversing descriptors. At the moment, though, I can't see that this will help with atom-pair descriptors or with matrix descriptors. Furthermore, I'm not sure whether things like RDF are possible with this approach.

> Before a Signature implementation can be built, a mechanism for Breadth First Traversal must be in place. This is the current focus of Octet. BreadthFirstTraverser will use the same Handler/Controller architecture as other Traversers, and so will be a good starting point for a Signature builder, among other uses.

A set of possible traversers is more general, so I would prefer a traverser factory here to pick the traverser. Or, more exactly, I would prefer parameters for the Signature object.

> I'm not sure if this was the original intent of the QSAR Project, which seemed more oriented toward building a QSAR gui. However, I believe that the system I'm proposing would be a critical component of that goal.

If we are only interested in a descriptor GUI, we can write a wrapper for joelib.test.DescriptorCalculation. For the data mining step I would still prefer converting to a Weka data structure and then applying the Weka GUI directly, at least for the primitive descriptor types. For the more complex ones we must first modify the data mining methods themselves to work on graph, subgraph and other metrics, because this is not a standard data mining task; I know of no project which allows such things directly. That's why we introduced the joelib.algo.datamining.weka package in JOElib.

I think it is more important to have a good design for future scientific work that is as general as possible and allows us to mix current chemistry knowledge (chemo) with data mining methods and algorithms (informatics).

Kind regards, Joerg |
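The traverser factory suggested in this message might be sketched as follows. All names (`SketchTraverser`, `TraversalKind`, `TraverserFactory`) are hypothetical and chosen for this illustration; neither Octet nor JOELib is assumed to expose them. The graph is a plain adjacency map keyed by atom index.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical traverser abstraction; returns atoms in the order they are visited.
interface SketchTraverser {
    List<Integer> visitOrder(Map<Integer, List<Integer>> graph, int start);
}

enum TraversalKind { BREADTH_FIRST, DEPTH_FIRST }

final class TraverserFactory {
    private TraverserFactory() {}

    // Callers name the strategy they want; concrete classes stay hidden.
    static SketchTraverser create(TraversalKind kind) {
        switch (kind) {
            case BREADTH_FIRST: return new BreadthFirst();
            case DEPTH_FIRST:   return new DepthFirst();
            default: throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }

    private static final class BreadthFirst implements SketchTraverser {
        @Override
        public List<Integer> visitOrder(Map<Integer, List<Integer>> graph, int start) {
            List<Integer> order = new ArrayList<>();
            Set<Integer> seen = new HashSet<>();
            Deque<Integer> queue = new ArrayDeque<>();
            seen.add(start);
            queue.add(start);
            while (!queue.isEmpty()) {
                int atom = queue.remove();
                order.add(atom);
                for (int next : graph.getOrDefault(atom, Collections.<Integer>emptyList())) {
                    if (seen.add(next)) queue.add(next);
                }
            }
            return order;
        }
    }

    private static final class DepthFirst implements SketchTraverser {
        @Override
        public List<Integer> visitOrder(Map<Integer, List<Integer>> graph, int start) {
            List<Integer> order = new ArrayList<>();
            visit(graph, start, new HashSet<Integer>(), order);
            return order;
        }

        private static void visit(Map<Integer, List<Integer>> graph, int atom,
                                  Set<Integer> seen, List<Integer> order) {
            if (!seen.add(atom)) return; // already visited
            order.add(atom);
            for (int next : graph.getOrDefault(atom, Collections.<Integer>emptyList())) {
                visit(graph, next, seen, order);
            }
        }
    }
}

final class TraverserFactoryDemo {
    // Branched skeleton with bonds 0-1, 0-2 and 1-3.
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(0, Arrays.asList(1, 2));
        graph.put(1, Arrays.asList(0, 3));
        graph.put(2, Arrays.asList(0));
        graph.put(3, Arrays.asList(1));
        return graph;
    }

    public static void main(String[] args) {
        for (TraversalKind kind : TraversalKind.values()) {
            System.out.println(kind + " " + TraverserFactory.create(kind).visitOrder(sampleGraph(), 0));
        }
    }
}
```

A descriptor such as Signature could then take a `TraversalKind` (or other parameters) at construction time instead of hard-coding one traversal, which is one reading of the "parameters for the Signature object" remark.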
From: Joerg K. W. <we...@in...> - 2004-08-12 09:28:21
|
Hi Ola,

everything said here is my personal opinion, so please be patient when reading it.

> CDK          LGPL
> JOElib       GPL
> Octet        LGPL
> Jmol         LGPL
> Jchempaint   GPL
> Structure    LGPL
>
> 1) Have I understood the licenses above correctly? On some SF pages
> (joelib & jchempaint) it says GPL or LGPL. What does that mean? May I
> choose?

Not at all. The GPL is stricter than the LGPL. In JOELib this means that the kernel (i.e., the chemical expert systems) is GPL and contains some LGPL parts, but you cannot change the GPL license. The GPL license comes from the stalled OELib project, so commercial users can buy a license for OEChem, the official commercial successor of OELib, from OpenEye.

> 2) How much do CDK and JOElib overlap? I know you can use them together,
> what are the benefits of this? Descriptors? Will descriptors not be
> implemented in CDK?

They do not really overlap, because they use different data structures for molecules. There is a primitive converter class, but nothing more. Each has a different focus in what it provides; see the documentation and tutorials for details. JOELib also contains LGPL code from Egon (CML) and modified 2D rendering classes from Christoph (no 2D layout, only rendering, no event model), which also allow showing SMARTS matches and exporting images and PDF. Descriptors? That depends on the kind of descriptor, I would say, but JOELib is much more advanced here (though I may not be objective).

> 3) What does Octet add to this mix (except that it's LGPL and JOElib is
> not)? Can it be used with CDK? Overlap? Are the projects competing
> against each other?

No, competition is the last thing we are interested in, because we are too few developers to be really competitive. We are trying to combine the different data structures in a general way in the Octet project. But this is still under discussion and far from a concrete implementation. So in the long term this might provide a common interface. Hopefully this will facilitate the use of chemoinformatics tools and ease project maintenance; we will see ...

> 4) What does the Structure project add to all this (except that it's
> built on Octet and LGPL)? The homepage says they are working on SDG,
> isn't that already present in CDK? Doesn't JchemPaint do the same thing
> as Structure?

Rich, is this project stalled or in progress?

> I am posting this question in the CDK, Octet and JOElib mailinglists in
> order to get more extensive information.

Crossposting always causes many e-mails for users subscribed to all of the lists. If you always bear that in mind, it is o.k.

CU, Jörg |
From: Ola S. <ola...@lc...> - 2004-08-12 08:53:22
|
Hello,

I am a little confused and don't know how these projects overlap, or what their licenses are.

CDK          LGPL
JOElib       GPL
Octet        LGPL
Jmol         LGPL
Jchempaint   GPL
Structure    LGPL

1) Have I understood the licenses above correctly? On some SF pages (joelib & jchempaint) it says GPL or LGPL. What does that mean? May I choose?

2) How much do CDK and JOElib overlap? I know you can use them together, what are the benefits of this? Descriptors? Will descriptors not be implemented in CDK?

3) What does Octet add to this mix (except that it's LGPL and JOElib is not)? Can it be used with CDK? Overlap? Are the projects competing against each other?

4) What does the Structure project add to all this (except that it's built on Octet and LGPL)? The homepage says they are working on SDG, isn't that already present in CDK? Doesn't JchemPaint do the same thing as Structure?

I am posting this question to the CDK, Octet and JOElib mailing lists in order to get more extensive information.

Best regards,
.../Ola Spjuth

--
Ola Spjuth, PhD student
Dept of Pharmacology & Linnaeus Centre for Bioinformatics
Uppsala University, Sweden |
From: Joerg K. W. <we...@in...> - 2004-08-12 08:26:38
|
Hi Rich,

I know that my idea might be unpopular, but I think we should also use jgrapht (LGPL) as a base for Octet, because it already provides some graph algorithms and traversers.

The 'simple graph' can be the default base for a molecule:
org._3pq.jgrapht.graph.SimpleGraph

The implementation looks fine; the only thing I'm missing is labeling functionality for edges and vertexes. I've added a feature request to their tracking system:
http://sourceforge.net/tracker/index.php?func=detail&aid=1007815&group_id=86459&atid=579690

1. Vertexes are no problem, because they are handled as Objects, and storing and removing them is O(1) when they are accessed by their hashCode and equals methods (unique identifier, e.g. pointer or index number). A vertex interface with labels could be helpful:

    public void put(VertexKey key, Object value) {
        keys.put(key, value);
    }

with

    public class VertexKey extends java.lang.Object
    {
    }

2. Edges carry no labels accessible via a label key, so here we must contact the jgrapht team or modify their edge interface:

    public void put(EdgeKey key, Object value) {
        keys.put(key, value);
    }

    /**
     * Return value associated with <CODE>key</CODE> in this edge
     */
    public Object get(EdgeKey key) {
        return keys.get(key);
    }

    public void release(EdgeKey key) {
        keys.remove(key);
    }

with

    public class EdgeKey extends java.lang.Object
    {
    }

As far as I have seen, this functionality is missing in Octet. Adding atoms or atom pairs is O(1), and removing is missing completely. Furthermore, following the current implementation, removal would be O(N) instead of O(1), because you are using a List instead of a map.

Kind regards, Joerg |
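The O(N) versus O(1) removal point can be checked with a small experiment. The sketch below is an illustration only and says nothing about how Octet actually stores atoms: `CountingAtom` is a hypothetical class that counts `equals()` calls, and a hash-based `LinkedHashSet` (which is backed by a hash map) stands in for the "map" mentioned in the message.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RemovalCostDemo {
    // Hypothetical atom stand-in that counts how often equals() is invoked.
    static final class CountingAtom {
        static int comparisons = 0;
        final int id;

        CountingAtom(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            comparisons++;
            return other instanceof CountingAtom && ((CountingAtom) other).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Returns {equals calls for List removal, equals calls for hash-set removal}
    // when removing the element that was added last.
    static int[] removalComparisons(int n) {
        List<CountingAtom> list = new ArrayList<>();
        Set<CountingAtom> set = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            list.add(new CountingAtom(i));
            set.add(new CountingAtom(i));
        }
        CountingAtom target = new CountingAtom(n - 1);

        CountingAtom.comparisons = 0;
        list.remove(target); // linear scan from the front of the list
        int listCost = CountingAtom.comparisons;

        CountingAtom.comparisons = 0;
        set.remove(target); // jumps to the bucket chosen by hashCode()
        int setCost = CountingAtom.comparisons;

        return new int[] {listCost, setCost};
    }

    public static void main(String[] args) {
        int[] cost = removalComparisons(1000);
        System.out.println("List removal equals() calls:     " + cost[0]);
        System.out.println("Hash-set removal equals() calls: " + cost[1]);
    }
}
```

The list has to compare against every element in front of the target, so its cost grows with the number of atoms, while the hash-based lookup only inspects the bucket selected by `hashCode()`.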
From: rich a. <che...@ya...> - 2004-08-12 06:16:15
|
Hello All,

Octet-0.3.2 has been released (http://octet.sourceforge.net). This version contains support for breadth-first traversal (via the BreadthTraverser interface) as part of a refactored net.sourceforge.octet.graph package. Although the implementation of BasicBreadthTraverser may still have a hidden kink or two to work out, the API, which borrows from SAX in its style (http://www.saxproject.org), is relatively stable.

This was the last major set of functionality that seemed necessary for the development of a Java implementation of the Signature molecular descriptor itself. Of course, the framework for using the descriptor in building and using QSAR models will require a good deal more infrastructure. A preliminary draft of the Signature interface and a skeletal implementation (BasicSignature) will be appearing soon in the net.sourceforge.octet.qsar package CVS.

If you'd like to help, there's plenty to do! Feedback regarding the design/usability of the traversal API and especially bugs in its implementations would be helpful. Ideas on the proper implementation of stereochemistry, which will be the last major addition to Octet, would also be helpful. If you'd like to see any changes made to anything, now is the time - because the Octet API will be frozen some time in the next few months in preparation for the release of version 1.0.

In the next week or so, CDKTools - the CDK "bindings" for Octet's core interfaces - will be updated and released to reflect the recent changes made in Octet.

cheers,
rich |
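For readers unfamiliar with the SAX style mentioned above, the idea is that the traverser drives the walk and reports events to a handler supplied by the client. The sketch below shows that pattern for a breadth-first walk. It is not the BreadthTraverser API from Octet 0.3.2: `BreadthHandlerSketch`, `BreadthWalkerSketch` and their method names are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback interface, by analogy with SAX's ContentHandler.
interface BreadthHandlerSketch {
    void startTraversal(String root);
    void atomFound(String atom, int depth);
    void endTraversal();
}

// Drives the breadth-first walk and pushes events to the handler.
final class BreadthWalkerSketch {
    private final Map<String, List<String>> neighbors;

    BreadthWalkerSketch(Map<String, List<String>> neighbors) {
        this.neighbors = neighbors;
    }

    void walk(String root, BreadthHandlerSketch handler) {
        Map<String, Integer> depthOf = new HashMap<>(); // doubles as the visited set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(root, 0);
        queue.add(root);
        handler.startTraversal(root);
        while (!queue.isEmpty()) {
            String atom = queue.remove();
            int depth = depthOf.get(atom);
            handler.atomFound(atom, depth);
            for (String next : neighbors.getOrDefault(atom, Collections.<String>emptyList())) {
                if (!depthOf.containsKey(next)) {
                    depthOf.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        handler.endTraversal();
    }
}

final class BreadthWalkerDemo {
    // Handler that simply records every event it receives.
    static List<String> recordEvents(Map<String, List<String>> graph, String root) {
        final List<String> events = new ArrayList<>();
        new BreadthWalkerSketch(graph).walk(root, new BreadthHandlerSketch() {
            @Override public void startTraversal(String r) { events.add("start:" + r); }
            @Override public void atomFound(String atom, int depth) { events.add(atom + "@" + depth); }
            @Override public void endTraversal() { events.add("end"); }
        });
        return events;
    }

    // Heavy-atom skeleton of ethanol: C1-C2-O.
    static Map<String, List<String>> ethanolSkeleton() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("C1", Arrays.asList("C2"));
        graph.put("C2", Arrays.asList("C1", "O"));
        graph.put("O", Arrays.asList("C2"));
        return graph;
    }

    public static void main(String[] args) {
        // prints [start:C2, C2@0, C1@1, O@1, end]
        System.out.println(recordEvents(ethanolSkeleton(), "C2"));
    }
}
```

The traversal logic is written once, and each client (a descriptor builder, a printer, a search) only supplies a handler, which is what makes this style a convenient starting point for something like a Signature builder.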
From: Joerg K. W. <we...@in...> - 2004-08-09 07:50:55
|
Hi All,

only a short comment for the moment.

> (3) The ability to generate a set of all Molecules that satisfy the model (i.e., solving the inverse QSAR problem).

So this is a (combinatorial) optimization problem. Last week our group published its internally developed JavaEVA library (at the moment only as a binary, because the license model is still under discussion): http://www-ra.informatik.uni-tuebingen.de/software/JavaEvA/index.html

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi,2004) |
From: rich a. <che...@ya...> - 2004-07-30 15:05:39
|
Hello All,

After thinking about Joerg's comments and the discussion regarding object-oriented descriptors, I've concluded that I've been approaching the entire concept from the wrong angle.

Descriptor, in the sense that I've been thinking about it, is really two completely different ideas: (1) the calculation of a numerical property for a particular Molecule (clogP, TPSA, etc.); and (2) the use of an algorithm for comparing a set of Molecules and their experimentally determined properties, with the ultimate goal of building a predictive model. I believe that (2) is a far more important problem to work on.

The following is a proposal for a predictive QSAR system based on Octet (http://octet.sourceforge.net). Its key features will include:

(1) The use of a training set consisting of Molecules and data (IC50s, logPs, boiling points, Rf values, NMR shifts, etc.) for the generation of a QSAR Model.
(2) The ability to predict, based on the Model, the activity/property of any new Molecule.
(3) The ability to generate a set of all Molecules that satisfy the Model (i.e., solving the inverse QSAR problem).

My proposal is to make the development of a QSAR system that satisfies (1)-(3) the immediate focus of the QSAR project. Many systems address (1)-(2), but point (3) has hardly been considered and would be, in my opinion, a significant advance over existing software with immediate payoff for the bench chemist.

I propose building this system with a single flavor of molecular descriptor called "Signature".

Briefly, the Signature of a Molecule is composed of the individual Signatures of its Atoms. An atomic Signature is composed of the Atom itself and the set of Atoms surrounding it at a particular distance (think Breadth-First Search). The distance, or "height", is user-definable. The key point is that Signature is not a number - it is a behavior.

Signatures have many useful properties. A Signature can be used to construct a set of Molecules described by it. This is in contrast to the vast majority of descriptors, in which too much information is lost, and is a key element in solving the inverse QSAR problem. Signature can be used to construct many of the commonly used topological descriptors. Signatures are also less degenerate than many other descriptors.

A series of four papers has been published on Signature. The third in the series is available online here: (http://www.genomes2life.org/publications/Signature-3.pdf). This article clearly outlines how a system using Signature builds an SAR model and solves the inverse QSAR problem. The first article in the series clearly specifies what a Signature is, with an excellent review of descriptor development and use. It is available here: (http://pubs3.acs.org/acs/journals/doilookup?in_doi=10.1021/ci020345w).

Will building this system require many intermediate subsystems to be built? Of course. However, the blueprint is already in place. It's just a matter of constructing the software that meets the component specifications.

The foundation for this system will be Signature itself. I would propose that Signature should be an interface that concrete Signatures implement. I won't go into the interface specification here, but it should be straightforward to develop.

Before a Signature implementation can be built, a mechanism for Breadth First Traversal must be in place. This is the current focus of Octet. BreadthFirstTraverser will use the same Handler/Controller architecture as other Traversers, and so will be a good starting point for a Signature builder, among other uses.

Looking further out, an object-oriented architecture that encapsulates the stages of QSAR analysis needs to be developed: building equations; solving equations; and producing Molecules that match the solutions to the equations. As I mentioned, the blueprint is available - the challenge will be to build components that meet the specification.

I'm not sure if this was the original intent of the QSAR Project, which seemed more oriented toward building a QSAR GUI. However, I believe that the system I'm proposing would be a critical component of that goal.

As a concrete next step, I would propose developing a Signature interface based on Octet. Simultaneously, a default implementation, BasicSignature, could be developed as a reality check for the design. The construction of simple unit tests will give the effort a context. I'm not sure where this prototype should be hosted, but due to the still-fluid nature of the Octet API, I think it would be most convenient to host it in a net.sourceforge.octet.qsar package for the time being. When we're all confident that the low-level features to make this system happen are in place, it can then be moved into a QSAR Project package.

This is one direction to take, and I'm open to any suggestions or comments.

cheers,
rich |
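As a rough illustration of the "Atom plus its surroundings out to a given height" idea, the sketch below collects the element symbols found in each breadth-first layer around an atom and counts how often each atomic description occurs in a molecule. This is a deliberately simplified stand-in, not the canonical Signature defined in the cited papers, and SignatureSketch, atomSignature and moleculeSignature are hypothetical names rather than the planned Octet interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical sketch; not the published Signature algorithm. */
final class SignatureSketch {

    /** Describes one atom by the sorted element symbols in each BFS layer up to the given height. */
    static String atomSignature(String[] elements, int[][] neighbors, int root, int height) {
        boolean[] seen = new boolean[elements.length];
        seen[root] = true;
        List<Integer> layer = new ArrayList<>();
        layer.add(root);
        StringBuilder signature = new StringBuilder(elements[root]);
        for (int d = 1; d <= height; d++) {
            List<Integer> next = new ArrayList<>();
            List<String> symbols = new ArrayList<>();
            for (int atom : layer) {
                for (int neighbor : neighbors[atom]) {
                    if (!seen[neighbor]) {
                        seen[neighbor] = true;
                        next.add(neighbor);
                        symbols.add(elements[neighbor]);
                    }
                }
            }
            if (next.isEmpty()) {
                break;                       // nothing left within reach of this atom
            }
            Collections.sort(symbols);       // make the layer independent of atom numbering
            signature.append('(').append(String.join(",", symbols)).append(')');
            layer = next;
        }
        return signature.toString();
    }

    /** Counts how many atoms share each atomic description. */
    static Map<String, Integer> moleculeSignature(String[] elements, int[][] neighbors, int height) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int atom = 0; atom < elements.length; atom++) {
            counts.merge(atomSignature(elements, neighbors, atom, height), 1, Integer::sum);
        }
        return counts;
    }
}
```

For the heavy atoms of ethanol (C-C-O) at height 1, the three atoms receive three different descriptions, which is the sense in which a Signature behaves more like a structured object than a single number.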
From: rich a. <che...@ya...> - 2004-07-29 03:03:51
|
Hello All,

A change to the Octet (http://octet.sourceforge.net) Molecule API has been made and committed to CVS.

BondingSystem now extends AtomGraph. This means, among other things, that BondingSystems can now be traversed with Traversers (such as DepthFirstTraverser and CycleTraverser) and compared to other AtomGraphs with AtomGraphComparators (such as UllmanComparator). Several redundant BondingSystem methods were replaced as a result.

An AromaticityTool is now available. It's fairly crude at this stage, simply applying the 4n + 2 rule to the electron count of a cyclic, multi-atom BondingSystem. But it does, for example, detect the seven-membered aromatic ring in the homotropylium cation.

I would also like to propose that the method "iterateBondingSystems(Atom neighbor)" be removed from the Atom interface. This method has never been implemented and is largely redundant anyway. Any objections?

rich |
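The 4n + 2 test described above reduces to a small arithmetic check. The sketch below is only an illustration of that rule. HueckelSketch and its method names are made up here, and the real AromaticityTool presumably reads the ring size and electron count from a BondingSystem rather than taking them as plain integers.

```java
/** Hypothetical sketch of the 4n + 2 electron-count test; not Octet's AromaticityTool. */
final class HueckelSketch {

    /** True when electronCount equals 4n + 2 for some integer n >= 0. */
    static boolean satisfiesHueckel(int electronCount) {
        return electronCount >= 2 && (electronCount - 2) % 4 == 0;
    }

    /** Applies the rule only to cyclic systems spanning more than one atom. */
    static boolean looksAromatic(boolean cyclic, int atomCount, int electronCount) {
        return cyclic && atomCount > 1 && satisfiesHueckel(electronCount);
    }
}
```

With these assumptions, a cyclic seven-atom system holding six delocalized electrons passes, while a four-atom, four-electron ring does not.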
From: Joerg K. W. <we...@in...> - 2004-07-27 15:51:29
|
Hi Rich,

I've changed the subject to be more precise.

I agree that things are getting complex, but primitive native numeric/nominal descriptors are only a really small subset of all possible codings for molecular structures (descriptor results).

descriptor (parameters, molecule): algorithm to get values
descriptor result: storage object for the abstract molecule: numeric value, nominal value, binary nominal value, atom-pair, MCS, ...
query (parameters): a search method returning a list of valid matchings, e.g. SMARTS, AP, shape, whatever, ...
metric (parameters, descRes1, descRes2): gets the similarity of two possible codings

> But one thing that is not clear to me is how a generic Metric (or Comparator) does its job (without violating encapsulation) of comparing two Descriptor calculations given that the way in which each Descriptor represents itself is unique. For example, a Tanimoto comparison of two fingerprints will be done one way, but a Tanimoto comparison of two TPSAs will be done very differently. A Euclidean distance comparison of Topological Torsion is straightforward, but the same comparison of clogP - that's done very differently, I imagine.

Generic would not be the correct term. The basic problem we always have is that 'similarity' cannot, and definitely should not, be separated from the metric, because a metric can only interpret the features it is given. I've tried to find a structure for my private literature collection, and I am now of the opinion that coding and similarity are two sides of a coin. We can have different images on either of the two sides, but we cannot split the coin.

So, eventually every descriptorResult should have something like:

    List=descriptorResult.getPossibleMetrics();

I am also of the opinion that we should be really general here, because most model-building algorithms (classification, regression, clustering) usually need only a kind of similarity and a meanValue for a set of molecules. And the primitive Euclidean distance over descriptor (sub)sets is only the plain data-mining approach, which loses all topological information (the inverse QSAR problem).

> And then there's the problem that a generic Metric will need a much wider Descriptor interface to do a comparison than a generic DescriptorResult or Descriptor will have.

Hmm, I think the result holds the coding, and the metric addresses similarity on that coding.

> How does JOELib handle these issues?

Not well, and in really diverse ways. For general descriptor results I've recently introduced:

    joelib.math.similarity.DistanceMetric

This is for basic values (numeric, nominal, or binary nominal). Furthermore, there are some hot topics that work directly on molecular structures. I will not discuss these on the public mailing list, but I'm definitely willing to cooperate here if the plan is to write a paper using one of the new methods. For all methods we have the atom labelling (set) problem!

EUCLIDEAN, TANIMOTO:
    joelib.util.ComparisonHelper
    The Euclidean or Tanimoto metric is chosen based on the kind of descriptor given to setComparisonDescriptor(String) or setComparisonDescriptor(String[]).

ATOM-PAIR (unpublished work of Nikolas Fechner is also available, still in development):
    joelib.desc.types.atompair.BasicAPDistanceMetric

MCS (not public, still in development, paper submitted; eventually I will publish it after the paper has been accepted, but I'm not sure I'm willing to share the implementation advantages so early)

It may sound weird, but I would prefer the most abstract object-oriented way you can provide: in fact, two results (codings) and a metric based on these results. But there are tons of ways you can code the MCS (parameters for MCS generation) and apply the metric (parameters for the metric).

> It almost seems like the "Descriptor" category itself is overly general and needs to be broken down further. Otherwise any Descriptor framework will have to know too much about particular Descriptor implementations with the result being a decidedly non-object-oriented framework that is difficult to extend and maintain. How can we address this?

In JOELib every descriptor knows its result, so if you call

    result=descriptor.calculate(molecule)

you will get the correct result. Because this is done using Java reflection, it is not the most efficient way, but if we use

    result=descriptor.calculate(molecule, result)

this will be efficient. Hence, standard users will have to pay a runtime penalty, because object creation in Java is expensive (see also joelib.desc.ResultFactory).

I suggest that every result should know its possible metrics.

I've also introduced a joelib.desc.DescriptorInfo object. Additionally, there is the DescDescription object, which holds information for each descriptor. If you try

    joelib/ant> ant JOELibTestGUI

and switch to the Info-->Descriptors panel, all information is generated and loaded on the fly by using:

1. DescriptorFactory (get all descriptors JOELib can calculate, so we know the details for them; BTW, unavailable documentation will cause annoying warnings, so developers are forced to provide documentation files from the beginning)
2. Get descriptor infos for each descriptor
3. Load single HTML documentation (also generated from DocBook-XML) for each descriptor
4. Show information.

Kind regards, Joerg |
From: rich a. <che...@ya...> - 2004-07-27 14:55:32
|
Hello Joerg,

CycleTraverser is an interface and HanserCycleTraverser is a concrete implementation (one of many possible). So a method (say, an aromaticity detector) can take a CycleTraverser as an argument and not have to worry about how the cycle perception is done. This is the approach I plan to take with all Traversers.

Your question about a node in a search molecule being a group of Atoms ("pharmacophore" search) is one I've been thinking about for a while now. No, I don't think QueryBuilder would be able to do this elegantly. But, yes, I think it can be done by using Reduced Graphs. A Reduced Graph is a graph in which the nodes are structure fragments and the edges are connections between the fragments. It would let clients do interesting things like ask "does this molecule have a six-membered ring with a carboxylate and an amine at any relative positions?"

I like your ideas on the Descriptor API. I've also played around with the idea that a Query is a special case of Descriptor. I think it's worthwhile moving in that direction.

But one thing that is not clear to me is how a generic Metric (or Comparator) does its job of comparing two Descriptor calculations (without violating encapsulation), given that the way in which each Descriptor represents itself is unique. For example, a Tanimoto comparison of two fingerprints will be done one way, but a Tanimoto comparison of two TPSAs will be done very differently. A Euclidean distance comparison of Topological Torsion is straightforward, but the same comparison of clogP would be done very differently, I imagine.

And then there's the problem that a generic Metric will need a much wider Descriptor interface to do a comparison than a generic DescriptorResult or Descriptor will have. How does JOELib handle these issues?

It almost seems like the "Descriptor" category itself is overly general and needs to be broken down further. Otherwise any Descriptor framework will have to know too much about particular Descriptor implementations, with the result being a decidedly non-object-oriented framework that is difficult to extend and maintain. How can we address this?

rich |
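One possible reading of the encapsulation question is that a metric does not need to be generic over all descriptors: each kind of result can expose only the metrics that understand its own coding, along the lines of the getPossibleMetrics() idea raised in this thread. The sketch below is an assumption-laden illustration, not JOELib or Octet code. MetricSketch, Metric, DescriptorResult, FingerprintResult and ScalarResult are invented names, and the inverse-distance similarity for scalars is just one convenient choice.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: each result type advertises the metrics that fit its coding. */
final class MetricSketch {

    /** A similarity measure defined for one kind of result value. */
    interface Metric<R> {
        double similarity(R first, R second);
    }

    /** A descriptor result that knows which metrics can interpret it. */
    interface DescriptorResult<R> {
        R value();
        List<Metric<R>> getPossibleMetrics();
    }

    /** Tanimoto coefficient on bit fingerprints: shared bits divided by bits set in either. */
    static double tanimoto(BitSet first, BitSet second) {
        BitSet both = (BitSet) first.clone();
        both.and(second);
        BitSet either = (BitSet) first.clone();
        either.or(second);
        int union = either.cardinality();
        return union == 0 ? 1.0 : (double) both.cardinality() / union;  // two empty sets count as identical
    }

    /** Illustrative scalar similarity: 1 for equal values, falling toward 0 as they diverge. */
    static double inverseDistance(Double first, Double second) {
        return 1.0 / (1.0 + Math.abs(first - second));
    }

    static final class FingerprintResult implements DescriptorResult<BitSet> {
        private final BitSet bits = new BitSet();
        FingerprintResult(int... setBits) {
            for (int bit : setBits) {
                bits.set(bit);
            }
        }
        public BitSet value() { return bits; }
        public List<Metric<BitSet>> getPossibleMetrics() {
            Metric<BitSet> metric = MetricSketch::tanimoto;
            return Collections.singletonList(metric);
        }
    }

    static final class ScalarResult implements DescriptorResult<Double> {
        private final double number;
        ScalarResult(double number) { this.number = number; }
        public Double value() { return number; }
        public List<Metric<Double>> getPossibleMetrics() {
            Metric<Double> metric = MetricSketch::inverseDistance;
            return Collections.singletonList(metric);
        }
    }
}
```

Because the metric is parameterized by the value type, the compiler rejects an attempt to apply the fingerprint metric to a scalar result, so a framework built this way would not need to know about particular Descriptor implementations.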
From: Joerg K. W. <we...@in...> - 2004-07-26 13:56:10
|
Hi again, we should for performace issues not use (as in JOElib): molecule.calculate("XYZ") we should use: keyXYZ=KeyFactory.getKey("XYZ"); // and use internal caching for this descriptor molecule.calculate(keyXYZ); Kind regards, Joerg > Hi Rich, > >> * Molecule implements AtomGraph. In the near future, BondingSystem >> should also implement AtomGraph to enable traversal/query with the >> same tools used for Molecules (any objections to this?) > > Good. > >> * Traversers traverse the graph structure of any AtomGraph. Traversers >> are low-level components that are helpful for building higher-level >> functionality. Currently two types of Traverser are available: >> DepthFirstTraverser and CycleTraverser. Both use a system of Handlers >> and Controllers - Handlers for handling events generated at various >> stages of a traversal algorithm and Controllers for exercising limited >> control over the algorithm itself. This system borrows from SAX's >> ContentHandler idea. HanserCycleTraverser is an implementation of >> CycleTraverser that uses Hanser's algorithm for finding the set of all >> cycles of an AtomGraph using collapsing Path-Graphs. > > CycleTraverser should use an interface, so that we can switch the > traverser. > If nothing is said a default traverser should be used. > The traverser should also have an ID and version number analogue to > descriptors. > > >> * MoleculeComparator compares two AtomGraphs for isomorphism, but >> without comparing atom/bonding properties. UllmanComparator implements >> MoleculeComparator by using Ullman's subgraph isomorphism algorithm. >> Like Traverser, MoleculeComparator uses a system of Handlers and >> Controllers for fine-grained control. It should be possible to use >> this sytem to create additional isomorphism algorithms implementing >> MoleculeComparator. > > Isn't this only a formulation problem ? > Can't we use a boolean method compareNode(LabelSet) which uses a set of > labels to check isomorphism ? 
> >> * QueryBuilder enables clients to build a molecular query using the >> same process that is used for building a Molecule with >> MoleculeBuilder. In fact, QueryBuilder extends MoleculeBuilder and can >> be used in many contexts calling for a MoleculeBuilder. QueryBuilder >> is designed for building queries that are based on a template molecule >> with constraints placed on individual Atoms with AtomQuery. > > Can 'pharmacophores' treated also with this approach. So are combined > features, e.g. carbon acid group combined to a single feature and a > distance to all other features allowed ? > > >> * SmartsQueryFactory is in the early stages, but is intended to >> simplify the process of using QueryBuilder by enabling clients to use >> SMARTS Atomic Primitive strings as keys to obtain a fully functional >> AtomQuery. Although this isn't exactly a SMARTS parser, it isn't that >> far from being one given Octet's SmilesReader. Currenly only the >> wildcard Atomic Primitive ("*") is supported, but other should be >> appearing soon. The approach here has some elements in common with >> that of CDK's growing SMARTS support, but there are also some >> interesting differences. > > Same as above, so atom based (not feature based) compareNode(LabelSet) > method, where the LabelSet is what i would call the chemical kernel atom > labelling set. > >> Looking a little further down the road for QSAR, what are people's >> thoughts on a framework for molecular descriptors? Of course, there >> hundreds of descriptors, and of course we all have our ideas on what a >> particular descriptor means or doesn't mean. What I'm actually >> wondering about is what a descriptor facility in QSAR would look and >> feel like. I've been looking at JOELib's descriptor framework, which >> has some reasonable concepts. From what I can tell, there are two >> basic kinds of descriptor: a "holistic" descriptor that is a single >> value (i.e. 
>> TPSA) and which is primitive-like, and everything else,
>> which tends to be higher-resolution in nature (e.g. Topological
>> Torsion) and more object-like. Are there any other ideas?
>
> With respect to queries I would prefer the object approach, so we can use:
> result=molecule.calculate("XYZ")
> or as in JOELib
> result1=calculator.calculate(mol1,"XYZ", Properties)
> result2=calculator.calculate(mol2,"XYZ", Properties)
>
> for matching or similarity we can then use
> // inherited from Comparator in Java API
> // applicable for Euclidean, Tanimoto, atom-pairs
> similarity=metricThatILike(result1,result2, Properties);
>
> For simple single-value descriptors it would also be interesting to have:
> similarity=metricThatILike(ResultSet1,ResultSet2, Properties);
> This would also cover a pharmacophore outlook or multiple graph
> isomorphism, and not only pair-wise matching.
>
> So a query is, from my standpoint, a kind of similarity metric which can
> only return 0 or 1. Sometimes, as in SMARTS matching, we are only
> interested in subgraph isomorphism.
> result1=calculator.calculate(mol1,"XYZ", LabelSet)
> result2=calculator.calculate(mol2,"XYZ", LabelSet)
> // only applicable for this specific calculator
> // can be used for maximum common substructure search (MCS)
> matchings=matchingsThatILike(result1,result2, Properties);
>
> So, for SMARTS matching we also need:
> matchings=matchingsThatILike(query1,result2, Properties);
>
> For pharmacophores 2D/3D/Shape we can also use this approach, because
> the representation for the similarity/matching is the relevant point.
> matchings=matchingsThatILike(query1,result2, Properties);
> or
> similarity=metricThatILike(result1,result2, Properties);
>
> Kind regards, Joerg
>
>

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
(E. Hemingway)
Never mistake action for meaningful action.
(Hugo Kubinyi, 2004)
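
The key-lookup suggestion at the top of this message (resolve the descriptor name once, then pass the key) could look roughly like the Java sketch below. Everything here is an assumption made for illustration: Key, KeyFactory, the static descriptor registry and the per-molecule cache are hypothetical stand-ins and are not taken from JOELib or Octet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToDoubleFunction;

public class DescriptorKeyDemo {

    /** Hypothetical interned key: exactly one instance exists per descriptor name. */
    static final class Key {
        private final String name;
        private Key(String name) { this.name = name; }
        String name() { return name; }
    }

    /** Hypothetical factory: the string lookup happens here, once per name. */
    static final class KeyFactory {
        private static final Map<String, Key> KEYS = new ConcurrentHashMap<>();
        private KeyFactory() {}
        static Key getKey(String name) {
            // Returns the existing Key for this name, or creates and stores one.
            return KEYS.computeIfAbsent(name, Key::new);
        }
    }

    /** Stand-in for a molecule; it only carries an atom count for this demo. */
    static final class Molecule {
        // Hypothetical registry mapping keys to descriptor implementations.
        private static final Map<Key, ToDoubleFunction<Molecule>> DESCRIPTORS = new HashMap<>();
        private final int atomCount;
        // Per-molecule cache of already computed descriptor values.
        private final Map<Key, Double> cache = new HashMap<>();
        int evaluations = 0; // counts real computations, to make caching visible

        Molecule(int atomCount) { this.atomCount = atomCount; }

        static void register(Key key, ToDoubleFunction<Molecule> descriptor) {
            DESCRIPTORS.put(key, descriptor);
        }

        double calculate(Key key) {
            Double cached = cache.get(key);
            if (cached != null) {
                return cached; // cache hit: no recomputation
            }
            ToDoubleFunction<Molecule> descriptor = DESCRIPTORS.get(key);
            if (descriptor == null) {
                throw new IllegalArgumentException("Unknown descriptor: " + key.name());
            }
            evaluations++;
            double value = descriptor.applyAsDouble(this);
            cache.put(key, value);
            return value;
        }
    }

    public static void main(String[] args) {
        Key keyXYZ = KeyFactory.getKey("XYZ");
        // Toy descriptor: twice the atom count (purely illustrative).
        Molecule.register(keyXYZ, m -> m.atomCount * 2.0);

        Molecule molecule = new Molecule(6);
        System.out.println(molecule.calculate(keyXYZ)); // computed
        System.out.println(molecule.calculate(keyXYZ)); // served from the cache
        System.out.println("evaluations: " + molecule.evaluations);
    }
}
```

Because keys are interned, the maps can rely on the default identity-based equals/hashCode of Key, so repeated calls avoid string hashing and comparison. Whether that saving matters in practice would need measuring against the cost of the descriptor calculation itself.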