octet-devel Mailing List for Octet (Page 2)

Status: Alpha

Brought to you by: r_apodaca

octet-devel — Octet developer list.


[octet-devel] Proposal for Molecule, Atom, BondingSystem API changes

From: rich a. <che...@ya...> - 2004-11-27 18:56:22

I would like to propose some additional changes to the
Octet API (http://octet.sf.net).

These methods would be removed from the Atom
interface:

iterateNeighbors()
countNeighbors()
isConnectedTo(Atom atom)
countElectrons()
iterateBondingSystems()
countBondingSystems()
countReservedElectrons()
toNeighborArray()
toBondingSystemArray()
getConfiguration()

... and their equivalents would be placed into either
the Molecule or AtomGraph interface. For example,
Atom.getConfiguration() would become
Molecule.getConfiguration(Atom atom).  AtomGraph
already has iterateNeighbors(Atom atom) and
countNeighbors(Atom atom).

I believe these changes will result in a more
consistent, robust system. The methods in question
report state that is only meaningful within a Molecule
or AtomGraph context. It is confusing, for example,
for an AtomGraph to contain Atoms that can report on
their electronic configuration because that is a
Molecule-specific property. Or to have an AtomGraph
whose Atoms can report different connectivity than the
hosting AtomGraph itself. In addition, this approach
makes it convenient for MoleculeDecorator to override
nearly all Molecule functionality without the need for
an AtomDecorator.

The following methods would remain in the Atom
interface because they do not depend on a Molecule or
AtomGraph context:

getNucleus()
getLabel()

These changes would mean that both an Atom and its
enclosing AtomGraph or Molecule would need to be
passed as parameters to most methods operating on
Atoms. Most code has already moved in this direction,
so the changes to Octet itself will be minimized.
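
A minimal sketch of the idea, using simplified stand-in types rather than the real Octet interfaces. SketchAtom, SketchAtomGraph, LabeledAtom, MapAtomGraph, and the adjacency-map storage are all hypothetical; only the method names getLabel() and countNeighbors(Atom) echo the proposal. It shows the same atom reporting different connectivity depending on which graph is asked:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration; not the real Octet interfaces.
interface SketchAtom {
  String getLabel(); // context-free, so it can stay on the atom
}

interface SketchAtomGraph {
  int countNeighbors(SketchAtom atom);             // context-dependent
  boolean isConnected(SketchAtom a, SketchAtom b); // context-dependent
}

final class LabeledAtom implements SketchAtom {
  private final String label;
  LabeledAtom(String label) { this.label = label; }
  public String getLabel() { return label; }
}

final class MapAtomGraph implements SketchAtomGraph {
  private final Map<SketchAtom, Set<SketchAtom>> adjacency = new HashMap<>();

  void connect(SketchAtom a, SketchAtom b) {
    adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
  }

  public int countNeighbors(SketchAtom atom) {
    Set<SketchAtom> neighbors = adjacency.get(atom);
    return neighbors == null ? 0 : neighbors.size();
  }

  public boolean isConnected(SketchAtom a, SketchAtom b) {
    Set<SketchAtom> neighbors = adjacency.get(a);
    return neighbors != null && neighbors.contains(b);
  }
}

class ContextDemo {
  // Returns the neighbor count of the same atom as seen by two different graphs.
  static int[] middleCarbonCounts() {
    SketchAtom c0 = new LabeledAtom("C0");
    SketchAtom c1 = new LabeledAtom("C1");
    SketchAtom c2 = new LabeledAtom("C2");

    MapAtomGraph propane = new MapAtomGraph();
    propane.connect(c0, c1);
    propane.connect(c1, c2);

    MapAtomGraph fragment = new MapAtomGraph(); // holds only the C0-C1 bond
    fragment.connect(c0, c1);

    return new int[] { propane.countNeighbors(c1), fragment.countNeighbors(c1) };
  }

  public static void main(String[] args) {
    int[] counts = middleCarbonCounts();
    System.out.println("propane: " + counts[0] + ", fragment: " + counts[1]);
  }
}
```

Because the atom carries no connectivity of its own, there is no second source of truth that could disagree with the hosting graph.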

For consistency, the countElectrons() method of
BondingSystem would be moved into
Molecule.countElectrons(BondingSystem system). This
would result in the BondingSystem interface supporting
no methods beyond those inherited from AtomGraph. As a
result, BondingSystem, BondingSystemCollection, and
BondingSystemIterator could actually all be deleted
and replaced by comparable AtomGraph counterparts. But
I'm not sure I want to go that far, yet.

If there are no objections in the next few days, I
will go ahead and make these changes.

best,
r


		

[octet-devel] Octet - full stereochemistry support now implemented

From: rich a. <che...@ya...> - 2004-11-26 23:12:24

I have just committed changes to the Octet CVS that
enable the specification and comparison of arbitrary
molecular conformations (http://octet.sf.net). Example
code can be found in
net.sourceforge.octet.junit.ConformationTest.

As with atomic Configuration, molecular Conformation
is specified using a system originally outlined by
Dietz (J. Chem. Inf. Comput. Sci. 1995, 35, 787). This
paper has served as the blueprint for many of Octet's
bonding and stereochemical concepts.

As an example of the use of this system, consider
(E)-2-pentene. Its structure is produced from the
following code (taken from
net.sourceforge.octet.util.TestMolecules):

  public static void buildTrans2Pentene(MoleculeBuilder builder)
  {
    String c = "C";

    AtomHandle c0 = builder.addAtom(c);
    AtomHandle c1 = builder.addAtom(c);
    AtomHandle c2 = builder.addAtom(c);
    AtomHandle c3 = builder.addAtom(c);
    AtomHandle c4 = builder.addAtom(c);

    builder.connect(c0, c1, 1);
    builder.connect(c1, c2, 2);
    builder.connect(c2, c3, 1);
    builder.connect(c3, c4, 1);

    GammaSequenceHandle gamma = builder.addGammaSequence(c1);

    builder.connect(c2, gamma);

    builder.configure(gamma, c0, StereoKit.getTrigonalAngle(), 0);
    builder.configure(gamma, c3, StereoKit.getTrigonalAngle() / 2, Math.PI);
  }

The method for specifying the (E)/(Z) stereochemistry
of double bonds, as above, is identical to that for
specifying the axial chirality of biaryls (see
net.sourceforge.octet.util.TestMolecules.buildRBinaphthyl())
 and allenes. This method, in turn, closely resembles
the method for specifying atomic Configuration.
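
As a rough geometric illustration only: if the last argument to configure() in the listing above is read as an azimuthal angle around the bond axis (an assumption, not something confirmed here), then two substituents whose azimuths differ by about pi lie on opposite sides, and a difference near zero puts them on the same side. The class and method names below are hypothetical and are not part of Octet:

```java
class AzimuthDemo {
  // Smallest absolute difference between two azimuthal angles, in [0, pi].
  static double azimuthGap(double phiA, double phiB) {
    double d = Math.abs(phiA - phiB) % (2 * Math.PI);
    return d > Math.PI ? 2 * Math.PI - d : d;
  }

  // True when the two substituents point to opposite sides when viewed along the axis.
  static boolean isAnti(double phiA, double phiB) {
    return azimuthGap(phiA, phiB) > Math.PI / 2;
  }

  public static void main(String[] args) {
    System.out.println(isAnti(0, Math.PI)); // azimuths used for c0 and c3 in the listing
    System.out.println(isAnti(0, 0));       // both substituents at the same azimuth
  }
}
```

The same comparison would apply unchanged to a biaryl or an allene, which is one way to read the claim that a single mechanism covers all three cases.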

With the addition of Conformation, it is now possible
to use Octet to do the following:

(1) Specify and query any conceivable molecular
bonding arrangement.

(2) Specify and query any conceivable atomic
Configuration.

(3) Specify and query any conceivable molecular
Conformation.

It is furthermore possible to do (1)-(3) without
resorting to special handling or ad hoc rules. In
fact, this system enables the consistent and faithful
representation of bonding arrangements and/or
stereochemical features that are simply not
representable in nearly all other toolkits or file
formats. This means that clients need to know very
little about the intent of the programmer (or file
format) that built a particular Molecule, making
client code easier to develop, interpret, and debug.

This system does not currently support fully specifying
topologically "exotic" molecules such as knots,
Möbius strips, or rotaxanes, although these may be
supported in Octet 2.0 (only half-joking).

The underlying default implementation will still
require some refactoring/debugging in the weeks ahead.
But I think the system in its current state gives a
good flavor for the generality, accuracy, and
consistency that is possible.

The next steps from here will consist of: (1) a round
of refactoring that will include API changes
previously proposed on these lists, and likely other
changes (to be announced) as well; (2) a concerted
effort to extensively test and debug all subsystems;
and (3) an API freeze in preparation for the release
of Octet 1.0.0. We're almost there!

Of course, the reason I'm cross-posting all of this to
the qsar-devel list is because of earlier interest in
using Octet as the starting point for a molecular
abstraction layer for the QSAR project. What core
functionality does Octet still need to serve in this
capacity?

best,
r


		

[octet-devel] Octet-0.4.0 released, includes stereochem. updates

From: rich a. <che...@ya...> - 2004-11-21 20:49:40

Octet-0.4.0 has been released (http://octet.sf.net).
This release marks the first version to support the
specification and comparison of atomic stereochemical
configuration. Several new interfaces have been
defined, and the MoleculeBuilder and Atom interfaces
have been updated. BasicQueryBuilder now returns a
MoleculeQuery that compares atomic Configuration in
addition to constitution. A bug in the
UllmanIsomorphismTraverser that caused the same model
Atoms to be multiply traversed in certain situations
was fixed.

Unit tests that demonstrate the functionality of the
stereochemistry subsystem are included
(net.sourceforge.octet.junit.StereoTest). These unit
tests differentiate (R)- and (S)-isobutanol as well as
(R,R)-, (S,S)-, and meso-2,3-butanediol. In addition,
both cisplatin and transplatin are distinguishable.

The cisplatin example demonstrates how Octet enables
non-tetrahedral configurational stereochemistry to be
compared using the same flexible formalism as
tetrahedral configurational stereochemistry. No ad-hoc
rules or special treatments are necessary.

Some code remains stereochemically unaware. For
example, MolfileReader and AdapterMolecule do not
recognise stereochemical configuration. Also, this
release of Octet breaks compatibility with CDKTools
0.3.0.

The next release of Octet should complete the
stereochemical subsystem by enabling the specification
of molecular conformation. The mechanism for doing so
will be analogous to that for atomic configuration.

If you'd like to help, there is plenty to do. For
example, we really need to develop more tests of the
stereochemistry subsystem. Any code fragments that
create a Molecule with a configuration would be
helpful. A usability layer that can derive
Cahn-Ingold-Prelog stereodescriptors (or maybe,
conversely, use such a stereodescriptor to configure a
Molecule) is within reach but will still require some
effort. Documentation can always be used ;-).

best,
r


		

[octet-devel] Removing getMolecule() from Atom and BondingSystem interfaces

From: rich a. <che...@ya...> - 2004-11-20 22:42:55

I would like to propose removing getMolecule() from
the Atom and BondingSystem interfaces.

The rationale for providing these methods originally
was convenience. A method using an Atom would not need
the Molecule passed as a parameter as well. So,
methods that might require two parameters doFoo(Atom,
Molecule) only required one doFoo(Atom).

However, this convenience comes at the price of
extensibility and consistency.

For example, it's increasingly clear that the
Decorator Pattern will play a big role in extending
Octet. What if we want to develop a canonicalization
scheme? We could extend MoleculeDecorator:

public class CanonicalizationMolecule extends MoleculeDecorator
{
  public CanonicalizationMolecule(Molecule molecule, Map atomMap)
  {
    // implementation
  }

  public int getAtomIndex(Atom atom)
  {
    // return canonicalized index
  }

  // ... other overrides
}

The problem arises when methods that are ignorant of
this canonicalization try to use Atom.getMolecule()
and end up getting the wrong (undecorated) Molecule
that has the uncanonicalized numbering scheme.
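
The failure mode can be reproduced in a few lines. This is a sketch with hypothetical stand-in types (IndexedMolecule, BackRefAtom, BasicIndexedMolecule, ReversedIndexMolecule), not Octet code, and simple index reversal stands in for a real canonical ordering:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; only getMolecule() and getAtomIndex() echo names from this thread.
interface IndexedMolecule {
  int getAtomIndex(BackRefAtom atom);
  int countAtoms();
}

final class BackRefAtom {
  private final IndexedMolecule owner;
  BackRefAtom(IndexedMolecule owner) { this.owner = owner; }
  IndexedMolecule getMolecule() { return owner; } // always the undecorated owner
}

final class BasicIndexedMolecule implements IndexedMolecule {
  private final List<BackRefAtom> atoms = new ArrayList<>();

  BackRefAtom addAtom() {
    BackRefAtom atom = new BackRefAtom(this);
    atoms.add(atom);
    return atom;
  }

  public int getAtomIndex(BackRefAtom atom) { return atoms.indexOf(atom); }
  public int countAtoms() { return atoms.size(); }
}

// Decorator that renumbers atoms; reversal stands in for a canonical ordering.
final class ReversedIndexMolecule implements IndexedMolecule {
  private final IndexedMolecule inner;
  ReversedIndexMolecule(IndexedMolecule inner) { this.inner = inner; }

  public int getAtomIndex(BackRefAtom atom) {
    return inner.countAtoms() - 1 - inner.getAtomIndex(atom);
  }

  public int countAtoms() { return inner.countAtoms(); }
}

class DecoratorDemo {
  // Uses the back-reference, so it silently bypasses any decorator.
  static int indexViaAtom(BackRefAtom atom) {
    return atom.getMolecule().getAtomIndex(atom);
  }

  // Takes the context explicitly, so it honors whichever molecule the caller passes.
  static int indexViaContext(IndexedMolecule molecule, BackRefAtom atom) {
    return molecule.getAtomIndex(atom);
  }

  public static void main(String[] args) {
    BasicIndexedMolecule base = new BasicIndexedMolecule();
    BackRefAtom first = base.addAtom();
    base.addAtom();
    base.addAtom();
    IndexedMolecule decorated = new ReversedIndexMolecule(base);

    System.out.println(indexViaAtom(first));               // index in the undecorated molecule
    System.out.println(indexViaContext(decorated, first)); // index in the decorated molecule
  }
}
```

The two calls disagree for the same atom, which is exactly the inconsistency that removing the back-reference would rule out.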

There are other situations. For example, I've been
batting around the idea of using Atoms and possibly
other Molecule components as Flyweights to enable the
efficient manipulation of extremely large Molecule
sets. Such a system is very difficult if each Atom
needs to return a unique Molecule.
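
A minimal sketch of what such a Flyweight arrangement might look like, assuming atoms that carry only context-free state. SharedAtom, SharedAtomPool, and the list-based "molecule" are hypothetical and much simpler than anything Octet would need:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flyweight: immutable and context-free, so instances can be shared.
final class SharedAtom {
  private final String symbol;
  SharedAtom(String symbol) { this.symbol = symbol; }
  String getSymbol() { return symbol; }
}

// Hands out one SharedAtom per element symbol.
final class SharedAtomPool {
  private final Map<String, SharedAtom> cache = new HashMap<>();

  SharedAtom get(String symbol) {
    return cache.computeIfAbsent(symbol, SharedAtom::new);
  }

  int size() { return cache.size(); }
}

class FlyweightDemo {
  // A "molecule" here is just an ordered list of shared atoms; the position in
  // the list is the extrinsic state that the flyweight itself does not carry.
  static List<SharedAtom> atomsOf(SharedAtomPool pool, String... symbols) {
    List<SharedAtom> atoms = new ArrayList<>();
    for (String symbol : symbols) {
      atoms.add(pool.get(symbol));
    }
    return atoms;
  }

  public static void main(String[] args) {
    SharedAtomPool pool = new SharedAtomPool();
    List<SharedAtom> methane = atomsOf(pool, "C", "H", "H", "H", "H");
    List<SharedAtom> water = atomsOf(pool, "O", "H", "H");

    System.out.println(pool.size());                    // number of distinct atom objects
    System.out.println(methane.get(1) == water.get(1)); // whether both share one hydrogen
  }
}
```

A shared hydrogen instance belongs to many molecules at once, so it has no single Molecule it could return from getMolecule().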

I think the resulting system would encourage greater
consistency. Interface methods that require an Atom's
Molecule context will need to be designated as such,
rather than leaving implementations to their own
devices.

Practically speaking, I've always found that
Atom.getMolecule() could be replaced one way or
another with minimal fuss.

This proposal would only require small changes to
Octet itself, specifically in MoleculePrinter.

If there are no objections in the next few days, I
will go ahead and make the changes.

best,
r


		

[octet-devel] Removing releaseMolecule() from MoleculeBuilder interface

From: rich a. <che...@ya...> - 2004-11-20 21:46:06

I would like to propose removing the releaseMolecule()
method from the MoleculeBuilder interface.

The releaseMolecule() method is inappropriate for
certain MoleculeBuilder implementations. For example,
QueryBuilder should never release a Molecule.
Currently, BasicQueryBuilder throws an
UnsupportedOperationException when releaseMolecule()
is called. But this approach lacks the clarity and
sturdiness of simply removing the method from the
interface.

This change would require clients to use an
implementation-specific releaseFoo() method to release
an item Foo from a MoleculeBuilder. So, for example,
BasicQueryBuilder would define
releaseSubstructureQuery() and
releaseExactStructureQuery(). Similarly,
CDKMoleculeBuilder (cdktools package) would only
define releaseCDKMolecule(), but not releaseMolecule()
- defining both methods would only be confusing and
slightly redundant.
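
The shape of the proposed design can be sketched as follows. All types here (SketchBuilder, SymbolListBuilder, CountQueryBuilder) are hypothetical simplifications and not the Octet MoleculeBuilder API; the point is only that shared construction code depends on the common interface while each implementation exposes its own release method:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical common interface: construction only, no release method.
interface SketchBuilder {
  int addAtom(String symbol); // returns a handle for the new atom
}

// Builds a concrete product and releases it through its own method.
final class SymbolListBuilder implements SketchBuilder {
  private final List<String> symbols = new ArrayList<>();

  public int addAtom(String symbol) {
    symbols.add(symbol);
    return symbols.size() - 1;
  }

  List<String> releaseSymbolList() {
    List<String> released = new ArrayList<>(symbols);
    symbols.clear();
    return released;
  }
}

// Builds a query instead of a molecule, so a molecule-returning release would make no sense.
final class CountQueryBuilder implements SketchBuilder {
  private int count;

  public int addAtom(String symbol) {
    return count++;
  }

  IntPredicate releaseCountQuery() {
    final int expected = count;
    count = 0;
    return atomCount -> atomCount == expected;
  }
}

class BuilderDemo {
  // Shared construction code: works with any builder, releases nothing.
  static void buildEthane(SketchBuilder builder) {
    builder.addAtom("C");
    builder.addAtom("C");
  }

  public static void main(String[] args) {
    SymbolListBuilder symbols = new SymbolListBuilder();
    buildEthane(symbols);
    System.out.println(symbols.releaseSymbolList());

    CountQueryBuilder query = new CountQueryBuilder();
    buildEthane(query);
    System.out.println(query.releaseCountQuery().test(2));
  }
}
```

With this split, calling the wrong release method becomes a compile-time error rather than an UnsupportedOperationException at run time.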

This change would result in only minor modifications
to Octet itself. In particular, TestMolecules would
require buildFoo(MoleculeBuilder) methods and all
createFoo(MoleculeBuilder) methods would be deleted.

I will go ahead and make these changes if there are no
objections in the next few days.

best,
r


		

[octet-devel] Re: [QSAR-devel] Stereochemistry implementation - first pass complete

From: Joerg K. W. <we...@in...> - 2004-11-17 16:03:50

Excellent, I will have a look.

1. BTW, I've started to refactor JOELib to separate the coding and the 
implementation. I'm still not at the discussion level, but I'm on the 
way. I'm still working on externalizing the assignment of all those 
nasty properties in Atoms, Bonds and the Molecule. If I succeed, I 
will be happy to discuss how to define the interface. The newly opened 
branch in the CVS is called 'joelib2-redesign'.

2. I've heard from another person that the Beilstein Institute likes 
Rich's approach very much, because it's the most general one in contrast 
to JOELib and CDK. So I found this a convincing argument for starting a 
complete refactoring.

Kind regards, Joerg



-- 
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
                                     (E. Hemingway)

Never mistake action for meaningful action.
                                (Hugo Kubinyi,2004)

[octet-devel] Stereochemistry implementation - first pass complete

From: rich a. <che...@ya...> - 2004-11-15 03:52:31

I have committed the first set of changes to Octet
(http://octet.sf.net) that enable the specification
and comparison of atomic stereochemical configuration.

As I've written previously, this code is based on a
specification provided by Andreas Dietz (J. Chem. Inf.
Comput. Sci. 1995, 35, 787).

I view this code as a first step and very rough. I
don't expect the interface definitions to change much,
but the implementation is quite inefficient and
probably buggy. In particular, the spherical polar
coordinate manipulations in BasicMoleculeBuilder are
probably more complicated than necessary.

Nevertheless, I have included a unit test
(net.sourceforge.octet.junit.StereoTest) that
demonstrates that two enantiomers of isobutanol can be
identified as having opposite configurations by this
system. This unit test also demonstrates how clients
will use the updated MoleculeBuilder interface to
define atomic configuration using spherical polar
coordinates.

I am unaware of any other implementation of this
stereochemistry specification in any language, so it
will be interesting to see how it evolves. The system
is a significant departure from every other method,
but I believe the payoff is well worth the steep
learning curve.

After I have cleaned up the implementation a bit, I
plan to tackle molecular conformation next. Once that
phase is complete, it should be possible to define
practically any stereochemical arrangement using a
single flexible formalism. An abstraction layer may be
helpful to simplify the use of this system for
standard cases (e.g., tetrahedral carbon). It is at that
point that the API will be frozen in preparation for
the release of Octet 1.0 (and hopefully progress on
the QSAR project).

As always, comments and feedback are welcome.  I did
my best with the documentation of this code, but one
really needs to read Dietz's paper carefully (and
repeatedly :-)) to understand how the system works.
I'm also more than willing to try to explain how I
understand it.

best,
r


		

[octet-devel] Stereochemical representation and manipulation

From: rich a. <che...@ya...> - 2004-11-02 15:21:24

In the next week I propose modifying the Octet
Molecule, MoleculeBuilder, and Atom interfaces to
support molecular stereochemical configuration and
conformation (http://octet.sourceforge.net).

The specification for these changes can be found in
the article by Dietz: J. Chem. Inf. Comput. Sci. 1995,
35, 787.

The philosophy of this approach can be summed up in a
quote from the paper:

"Note that a molecular structure representation cannot
free the user from the task to decide how a chemical
structure should be represented. However, a lack of
versatility might force the user to represent a
chemical structure in a certain manner, even if he
would prefer to represent it differently."

At its core, this system will use the object-oriented
equivalent of Dietz's "pencil of planes" idea.

This system should enable the unambiguous assignment
of stereochemical configuration to any atom. For
example, the mechanism of assigning the configuration
of a tetrahedral carbon is identical to assigning the
configuration of an octahedral or trigonal bipyramidal
metal center or transition state.

It should also be possible to unambiguously specify all
forms of conformational stereochemistry such as that
found in allenes and biaryls, and E/Z isomerism using
a mechanism analogous to that for configuration. The
boat and chair forms of cyclohexane could also be
distinguishable through this mechanism, if required by
clients.

Because of its generality, this system represents
something of a departure from other methods for
handling conformation and configuration. To help
flatten the learning curve, I propose one or more
helper classes that can do such things as report a
Cahn-Ingold-Prelog stereodescriptor for a tetrahedral
carbon and determine arbitrary atomic configurations
as being identical, enantiomeric, or completely
different.

MoleculeBuilder will be updated to enable this
specification. Here, a spherical polar coordinate
system with an atom as its origin will be used. This
allows each atomic configuration to be set
independently from the others in a molecule. I was
actually surprised how straightforward it is to
specify configurations and conformations using
spherical polar coordinates.
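
To make the atom-centered coordinate idea concrete, here is a generic geometric sketch. It is not Dietz's procedure and not the planned MoleculeBuilder API; HandednessDemo and its methods are hypothetical. Each neighbor is given as a polar angle and an azimuth around the central atom, and the sign of the scalar triple product of three neighbor directions distinguishes one arrangement from its mirror image:

```java
class HandednessDemo {
  // Unit vector for polar angle theta (from +z) and azimuth phi (from +x), in radians.
  static double[] toCartesian(double theta, double phi) {
    return new double[] {
      Math.sin(theta) * Math.cos(phi),
      Math.sin(theta) * Math.sin(phi),
      Math.cos(theta)
    };
  }

  // Sign of a . (b x c) for three neighbors, each given as {theta, phi}.
  // Swapping any two neighbors flips the sign.
  static int handedness(double[] a, double[] b, double[] c) {
    double[] u = toCartesian(a[0], a[1]);
    double[] v = toCartesian(b[0], b[1]);
    double[] w = toCartesian(c[0], c[1]);
    double triple =
        u[0] * (v[1] * w[2] - v[2] * w[1])
      + u[1] * (v[2] * w[0] - v[0] * w[2])
      + u[2] * (v[0] * w[1] - v[1] * w[0]);
    return (int) Math.signum(triple);
  }

  public static void main(String[] args) {
    // Tetrahedral center: a fourth neighbor is implied along +z, and the other
    // three sit at the tetrahedral angle from it, 120 degrees apart in azimuth.
    double tetrahedral = Math.acos(-1.0 / 3.0);
    double[] a = { tetrahedral, 0.0 };
    double[] b = { tetrahedral, 2.0 * Math.PI / 3.0 };
    double[] c = { tetrahedral, 4.0 * Math.PI / 3.0 };

    System.out.println(handedness(a, b, c)); // one enantiomer
    System.out.println(handedness(a, c, b)); // b and c exchanged: the mirror image
  }
}
```

Because the angles are relative to the central atom only, each center can be described this way without reference to the rest of the molecule.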

These changes should enable just about any type of
stereochemistry to be consistently represented and
queried. One limitation is that stereochemistry
resulting from topological chirality will still be
undefinable, such as that in helicenes or knots. Then
again, no system I'm aware of does this and the demand
for this will be just about nil for the foreseeable
future.

For now these changes will be limited to the Octet
CVS. I don't think they will begin to appear in
releases for about a month or so.

Any feedback on this proposal would be welcomed.

cheers,
rich


		

[octet-devel] Re: [QSAR-devel] Stereochemistry Specification

From: Dr P. Murray-R. <pm...@ca...> - 2004-10-18 07:22:05

On Oct 18 2004, rich apodaca wrote:

> I'm in the process of trying to introduce
> stereochemistry into Octet. In particular, I want to
> enhance the Molecule (or Atom) interface to enable
> clients to specify and query molecular stereochemical
> information.

I think atom and bond stereochemistry is tractable. I think most of the 
rest is problematic.

> 
> So my simple question is: What is the most useful way
> to represent stereochemistry in a cheminformatics
> framework?
> 
> Ideally, any system that gets implemented should allow
> the following:
> 
> (1) Unambiguous representation of any chiral
> configuration: alkenes,

yes

> allenes,

possible if central atom is given tetrahedral status

> biaryls,

possible and messy if a dummy atom is placed at centre of bond

> metallocenes,

I know of no system and I would argue against developing one

> tetrahedral carbon,

yes

> etc.

helicenes, etc.: no current method

6-coordinate complexes: no method in common use

> 
> (2) A uniform method for querying molecules to obtain
> stereochemical information. The method used for biaryl
> chirality, for example, should be identical to that
> for tetrahedral carbon chirality.

It can't be identical as there isn't a central atom, but it could be 
similar.
> 
> (3) Stereochemistry should be specified without
> reference to a 3-D coordinate system.

Stereochemistry can be deduced from 3D coords. JUMBO already does this

> 
> (4) The solution should be as "intuitive" as possible.

The only useful approach is common usage. I am on an IUPAC committee on this 
topic. We have now agreed what a wedge and hatch bond means and how to use 
them. In most cases
> 
> (5) The solution should be flexible enough to never
> require special handling for unusual types of
> stereochemistry.

I don't think this is possible

> 
> For concrete stereochemistry implementations, I have
> looked mainly at CDK, OpenBabel, and JOELib, all of
> which appear to have some level of support for
> stereochemistry. All three appear to use a system of
> chiral flags on Atoms, Bonds, or both. Unfortunately,
> I haven't been able to find detailed documentation on
> many aspects of these approaches. In addition, it
> appears to me that the chiral flag approach is
> fundamentally not general enough to enable point (1).
> 
> I have been quite interested in a model for
> stereochemistry outlined by Akutsu:
> 
> J. Chem. Inf. Comput. Sci. 1991, 31, 414-417
> 
> The idea behind this paper is to transform a molecular
> graph with an ordered adjacency list representation
> for atomic neighbors into another unique graph
> representation in which the stereochemical topology is
> automatically encoded in the graph. The major drawback
> I see with this approach is the production of some
> potentially very large graphs. In addition, I'm not
> sure how to apply this approach to chirality with no
> stereocenter, as with biaryls.

I doubt it can be done
> 
> I have also been looking at another approach outlined
> by Dietz:
> 
> J. Chem. Inf. Comput. Sci. 1995, 35, 787
> 
> Unfortunately, this approach seems to require as a
> starting point at least partial knowledge of 3-D
> coordinates in order to specify chirality, which is
> not consistent with point (3) above. I believe that
> dealing with 3-D coordinates in any form greatly
> increases the complexity of specifying and using
> chirality. On the other hand, this system allows for
> the complete specification and differentiation of all
> chiral configurations of any molecule. And it may be
> possible to provide some kind of developer tool that
> makes it easier to use this approach. Another
> potential drawback of this approach may be the need to
> use only non-hydrogen-suppressed graphs.

There are some molecules where the only realistic method of describing the 
structures themselves is to give the 3D coordinates. Examples are metal 
clusters. And what would you do with fluxional molecules?
> 
> Any info to help me move forward would be helpful.

Moreover there are few systems that can author anything other than atom and 
bond stereo

I would stick with atom-centered and bond-based.

JUMBO does all the required conversions between 2D and 3D

P.


[octet-devel] Stereochemistry Specification

From: rich a. <che...@ya...> - 2004-10-17 23:25:42

I'm in the process of trying to introduce
stereochemistry into Octet. In particular, I want to
enhance the Molecule (or Atom) interface to enable
clients to specify and query molecular stereochemical
information.

So my simple question is: What is the most useful way
to represent stereochemistry in a cheminformatics
framework?

Ideally, any system that gets implemented should allow
the following:

(1) Unambiguous representation of any chiral
configuration: alkenes, allenes, biaryls,
metallocenes, tetrahedral carbon, etc.

(2) A uniform method for querying molecules to obtain
stereochemical information. The method used for biaryl
chirality, for example, should be identical to that
for tetrahedral carbon chirality.

(3) Stereochemistry should be specified without
reference to a 3-D coordinate system.

(4) The solution should be as "intuitive" as possible.

(5) The solution should be flexible enough to never
require special handling for unusual types of
stereochemistry.

For concrete stereochemistry implementations, I have
looked mainly at CDK, OpenBabel, and JOELib, all of
which appear to have some level of support for
stereochemistry. All three appear to use a system of
chiral flags on Atoms, Bonds, or both. Unfortunately,
I haven't been able to find detailed documentation on
many aspects of these approaches. In addition, it
appears to me that the chiral flag approach is
fundamentally not general enough to enable point (1).

I have been quite interested in a model for
stereochemistry outlined by Akutsu:

J. Chem. Inf. Comput. Sci. 1991, 31, 414-417

The idea behind this paper is to transform a molecular
graph with an ordered adjacency list representation
for atomic neighbors into another unique graph
representation in which the stereochemical topology is
automatically encoded in the graph. The major drawback
I see with this approach is the production of some
potentially very large graphs. In addition, I'm not
sure how to apply this approach to chirality with no
stereocenter, as with biaryls.

I have also been looking at another approach outlined
by Dietz:

J. Chem. Inf. Comput. Sci. 1995, 35, 787

Unfortunately, this approach seems to require as a
starting point at least partial knowledge of 3-D
coordinates in order to specify chirality, which is
not consistent with point (3) above. I believe that
dealing with 3-D coordinates in any form greatly
increases the complexity of specifying and using
chirality. On the other hand, this system allows for
the complete specification and differentiation of all
chiral configurations of any molecule. And it may be
possible to provide some kind of developer tool that
makes it easier to use this approach. Another
potential drawback of this approach may be the need to
use only non-hydrogen-suppressed graphs.

Any info to help me move forward would be helpful.


Re: [octet-devel] Differences/similarities between CDK, JOElib, Octet, Jmol, Jchempaint, Structure

From: rich a. <che...@ya...> - 2004-08-13 02:43:59

Hello Ola,

I can see how the similarities and differences among the half dozen or so Java cheminformatics frameworks/applications might be difficult to pick out (CDK, JChemPaint, JOELib, Octet, Structure, JUMBO, Marvin, soon QSAR, others). The good news is most of them are open source. I'd like to offer some thoughts on what we're trying to do with Octet (http://octet.sourceforge.net) and Structure (http://structure.sourceforge.net) because I'm most involved with those.

Learning a new API is difficult, especially with something as multifaceted as cheminformatics. In part to flatten the learning curve, Octet is designed to deliver the minimal functionality that will be needed in all cheminformatics contexts. This is actually a lot harder than it might sound - the temptation to add just a little bit of extra functionality here and there is very powerful. As a result, Octet's major functionality consists of:

(1) Representation of Molecules with any bonding arrangement, from nonclassical carbocations to coordination complexes to inorganics to simple organic compounds. All Molecules are queried using a unified interface that requires no exceptional handling for "weird" molecules.

(2) All model-level objects (Atom, Molecule, etc.) are defined in terms of Java interfaces. The ways that concrete Molecules are implemented can vary drastically, but as long as the interface methods give consistent results, Octet can handle them all. This enables Octet users to fine-tune the Molecule implementation to their particular needs. For example, when dealing with large numbers of Molecules, low memory usage may be a high priority. When working with a limited number of very large Molecules such as proteins, the ability to speedily address and manipulate Atoms and bonding arrangements may be critical. An implementation that works in one case may fail miserably for the other case. So the flexibility to choose is essential for a robust framework.

(3) Simplified SMILES, Molfile, and SD file format readers and writers.

(4) An API for traversal of Molecules as graph objects. Breadth-first, depth-first, cycle, and isomorphism traversal are all possible via a consistent API.

(5) An API for substructure, exact-structure, and query atom queries.

(6) Identification of essential Molecule properties such as hydrogen atom count, formal bond order, and electron count.

(7) Definition and manipulation of molecular stereochemistry (to be implemented in the near future).

And that's it for the functionality itself. Of course, this narrow focus leaves many specialized areas untouched, but the features above will be essential for most cheminformatics problems.

Recently, support for the use of CDK Molecules within Octet and the use of Octet Molecules within CDK has been developed. This package is called CDKTools. A copy with source code and unit tests can be downloaded here: https://sourceforge.net/project/showfiles.php?group_id=96108

By keeping the API small and simple, we hope to increase the probability that Octet will become a stable framework that is easy to learn, use, and especially extend.

Structure extends Octet's capabilities by enabling 2D structure drawing of Molecules. Not much progress has been made on this project recently - due mainly to efforts to move Octet closer to an API freeze and eventual 1.0 release, but it is indeed still alive.

The overall approach to Structure is similar to the approach taken with Octet: to deliver the minimal functionality that will be needed in the majority of 2D molecular rendering contexts. 2D coordinate generation falls into that category, and so it is a goal for the project. CDK has done a wonderful job with 2D structure layout. But there is clearly room for a variety of new approaches in this largely neglected area, especially given the complementary functionality that Octet provides.

Regarding JChemPaint and Structure, both are aimed at 2D molecular rendering. However, they address the problem from different perspectives (feel free to correct me if I'm misstating, Egon). JChemPaint is a client-side application/applet that enables both rendering and editing of molecules, and has features that can be used as a library. Structure is solely a framework for 2D Molecule rendering that will provide the functionality on which rendering applications can be built. This may sound like a minor distinction at first, but it results in a very different set of design decisions that need to be made, bugs that need to be fixed, and resource commitments.

Well, that's a long-winded attempt to try to answer your questions. Let me know if I can give you any further info.

best,
rich

[octet-devel] JGraphT as base for Octet, QSAR

From: Joerg K. W. <we...@in...> - 2004-08-12 15:14:49

Hi Rich,

that is good news, indeed.

I still have the long-term idea of creating a refactored JOELib2. My
plan is to use Octet, JGraphT and JOELib as the base.

This cannot be done in a short time, so my plan is to provide a
primitive implementation by the middle of next year.

I can understand that you do not want to focus on JGraphT, but from my
point of view we must settle on a graph implementation soon or
things will get too wild.
In particular, I will need a default implementation to guarantee a
stable basis for descriptor calculation algorithms.
A framework is fine, but a working application also has its benefits :-)
Both will provide high maintainability with, hopefully, many, many
users ...

Kind regards, Joerg


-- 
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
                                     (E. Hemingway)

Never mistake action for meaningful action.
                                (Hugo Kubinyi,2004)

[octet-devel] JGraphT as base for Octet, QSAR

From: rich a. <che...@ya...> - 2004-08-12 14:50:51

Hello Joerg,

I have good news for you. You *can* use JGraphT, any other existing graph framework, or any new graph framework, as a base for implementing the Molecule interface. Clients are free to choose the most optimized graph implementation they can find.

This is the big advantage of having business objects inherit an interface definition rather than a concrete class.

The way I would do this is to first implement the MoleculeBuilder interface, say JGraphTMoleculeBuilder. The implementation would build a JGraphT concrete Graph implementation behind the scenes (i.e. as a private instance) as MoleculeBuilder methods are invoked. When releaseMolecule() is invoked, JGraphTMoleculeBuilder would then wrap its JGraphT Graph in a Molecule implementation defined as a private inner class of JGraphTMoleculeBuilder. Hopefully I've explained this clearly, if not let me know and I can supply some skeletal source code.

You could even add a special method in JGraphTMoleculeBuilder that would release the JGraphT Graph for further manipulation. So then you could use Octet's SimpleSmilesReader, StuctureDataReader, or MolfileReader to build a JGraphT Graph instead of an Octet Molecule. This graph could then be used with all of the rich functionality in the JGraphT package, including traversers.

And all of this can happen without Octet (or QSAR) needing to know about JGraphT directly and without changing Octet in any way.
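
A skeletal sketch of the builder approach described above might look like the following. Everything in it is an assumption made for illustration: the reduced Molecule and MoleculeBuilder interfaces, the addAtom/connect/releaseGraph method names, and the use of a plain adjacency map where a real version would hold a JGraphT Graph. None of it is taken from Octet's or JGraphT's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Skeletal sketch only: every type here is a hypothetical stand-in. */
final class GraphBackedBuilderSketch {

    /** Hypothetical, heavily reduced read-only view of a molecule. */
    interface Molecule {
        int countAtoms();
        int countNeighbors(String atomLabel);
    }

    /** Hypothetical, heavily reduced builder contract. */
    interface MoleculeBuilder {
        void addAtom(String atomLabel);
        void connect(String sourceLabel, String targetLabel);
        Molecule releaseMolecule();
    }

    /** Keeps a private graph; the adjacency map stands in for a JGraphT Graph. */
    static final class GraphMoleculeBuilder implements MoleculeBuilder {
        private Map<String, List<String>> graph = new LinkedHashMap<>();

        @Override
        public void addAtom(String atomLabel) {
            graph.putIfAbsent(atomLabel, new ArrayList<>());
        }

        @Override
        public void connect(String sourceLabel, String targetLabel) {
            addAtom(sourceLabel);
            addAtom(targetLabel);
            graph.get(sourceLabel).add(targetLabel);
            graph.get(targetLabel).add(sourceLabel);
        }

        @Override
        public Molecule releaseMolecule() {
            Molecule released = new GraphMolecule(graph);
            graph = new LinkedHashMap<>(); // start fresh so the released Molecule cannot change
            return released;
        }

        /** The optional extra method: hand out the underlying graph instead of a Molecule. */
        Map<String, List<String>> releaseGraph() {
            Map<String, List<String>> released = graph;
            graph = new LinkedHashMap<>();
            return released;
        }

        /** Private wrapper: clients only ever see the Molecule interface. */
        private static final class GraphMolecule implements Molecule {
            private final Map<String, List<String>> graph;

            private GraphMolecule(Map<String, List<String>> graph) {
                this.graph = graph;
            }

            @Override
            public int countAtoms() {
                return graph.size();
            }

            @Override
            public int countNeighbors(String atomLabel) {
                return graph.getOrDefault(atomLabel, Collections.emptyList()).size();
            }
        }
    }

    public static void main(String[] args) {
        GraphMoleculeBuilder builder = new GraphMoleculeBuilder();
        builder.connect("C1", "C2");
        builder.connect("C2", "O3");
        Molecule molecule = builder.releaseMolecule();
        System.out.println(molecule.countAtoms() + " atoms, C2 has "
                + molecule.countNeighbors("C2") + " neighbors");
    }
}
```

Client code depends only on the two interfaces, so swapping the adjacency map for another graph library would not be visible outside the builder.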

I actually considered using JGraphT (and other graph frameworks as well) for the default Molecule implementation. I decided against it mainly for simplicity. I didn't want Octet to require a lot of external dependencies. Not only that, but JGraphT comes with much more functionality, and in some cases not the correct functionality, for what I wanted to do with Octet.

Now, if the idea was to have a Molecule interface definition that extends the JGraphT Graph interface, I'm not in favor of that. The main reason is immutability. JGraphT's Graph interface is loaded with public mutator methods - meaning that any client can change a Graph representation at any time. To get around the inconsistencies this can lead to, JGraphT introduces GraphListener. But this means that every class that wants to be informed of a change to a graph needs to add itself as a listener - something that is easy to forget to do. It's also easy to forget to remove a class as a listener - preventing the garbage collector from deleting it, a form of "memory leak".

Since I could think of almost no situation that would require a Molecule to be modified once it was created, I decided to make Molecule and all of the interfaces it depends on (AtomPair, BondingSystem, Atom) immutable. This leads to simplification of the interface, more streamlined client code, no need for copy constructors or clone() methods, and also makes it harder to create bugs deriving from inconsistent Molecule state.
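
To make the immutability argument concrete, here is a minimal sketch. The FrozenMolecule class and its methods are hypothetical and are not Octet's Molecule. It shows the usual Java recipe: copy the state once, store it in final fields, and expose only unmodifiable views, so that no listener mechanism is needed to keep clients consistent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ImmutableMoleculeSketch {

    /** Hypothetical immutable molecule: state is copied once and never exposed for writing. */
    static final class FrozenMolecule {
        private final List<String> atomLabels;

        FrozenMolecule(List<String> atomLabels) {
            // Defensive copy, so later changes to the caller's list cannot leak in.
            this.atomLabels = Collections.unmodifiableList(new ArrayList<>(atomLabels));
        }

        List<String> atomLabels() {
            return atomLabels; // safe to share: the view rejects all mutation
        }

        int countAtoms() {
            return atomLabels.size();
        }
    }

    public static void main(String[] args) {
        List<String> labels = new ArrayList<>(Arrays.asList("C", "C", "O"));
        FrozenMolecule molecule = new FrozenMolecule(labels);

        labels.add("H"); // changing the source list does not affect the molecule
        System.out.println("atoms: " + molecule.countAtoms());

        try {
            molecule.atomLabels().add("H");
        } catch (UnsupportedOperationException expected) {
            System.out.println("mutation rejected");
        }
    }
}
```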

In summary, Octet supports Molecules with any underlying graph representation. But I would leave this kind of optimization up to users and wouldn't want to make it part of Octet. I would not favor Molecule inheriting JGraphT's Graph interface.

best,
rich

[octet-devel] JGraphT and labeled edges

From: Joerg K. W. <we...@in...> - 2004-08-12 11:42:46

Hi all,

the developers are right, so we should implement our own EdgeFactory.

https://sourceforge.net/tracker/?func=detail&atid=579690&aid=1007815&group_id=86459

Kind regards, Joerg

[octet-devel] Re: [QSAR-devel] A Proposal for QSAR

From: Joerg K. W. <we...@in...> - 2004-08-12 09:57:23

Hi All,

> (3) The ability to generate a set of all Molecules that satisfy the model (i.e., solving the inverse QSAR problem).
> ...  has hardly been considered, and would be, in my opinion, a significant advance over existing software with immediate payoff for the bench chemist.
Mmmh, this was already considered, but as far as I know this is an
NP-complete combinatorial optimization problem, so it is really hard.

This is especially true when the molecules should be synthesizable.
So de novo design, while still not unproblematic, is more popular for
such things.
In that special case we in fact work on a graph which holds molecular
fragments at its nodes.

> The key point is that Signature is not a number - it is a behavior.
> Signatures have many useful properties. A Signature can be used to construct a set of Molecules described by it. This contrasts with the vast majority of descriptors, in which too much information is lost, and is a key element in solving the inverse-QSAR problem. Signature can be used to construct many of the commonly-used topological descriptors. Signatures are also less degenerate than many other descriptors.
A BFS is not the solution for all problems (No-Free-Lunch theorem for
optimization), but I agree that this is more general than basic
hard-coded graph-traversing descriptors.

But at the moment I can't see that this will help with Atom-Pair
descriptors or with matrix descriptors. Furthermore, I'm not sure if
things like RDF are possible with this approach.

> Before a Signature implementation can be built, a mechanism for Breadth First Traversal must be in place. This is the current focus of Octet. BreadthFirstTraverser will use the same Handler/Controller architecture as other Traversers, and so will be a good starting point for a Signature builder, among other uses.
A set of possible traversers is more general, so I would prefer a
traverser factory here to pick the traverser. Or, more exactly, I would
prefer parameters for the Signature object.

> I'm not sure if this was the original intent of the QSAR Project, which seemed more oriented toward building a QSAR gui. However, I believe that the system I'm proposing would be a critical component of that goal.
If we are only interested in a descriptor GUI, we can write a wrapper
for joelib.test.DescriptorCalculation
For the data mining step I would still prefer converting to a Weka
data structure and then applying the Weka GUI directly, at least for the
primitive descriptor types.
For the more complex ones we must first modify the data mining methods
themselves to work on graph, subgraph and other metrics, because
this is not a standard data mining task, and I know of no project
which allows such things directly. That's why we've introduced the
joelib.algo.datamining.weka package in JOElib.

I think it is more important to have a good design for future scientific
work, one that is as general as possible and allows mixing current:
chemistry knowledge (chemo)
with
data mining methods and algorithms (informatics)

Kind regards, Joerg

[octet-devel] Re: Differences/similarities between CDK, JOElib, Octet, Jmol, Jchempaint, Structure

From: Joerg K. W. <we...@in...> - 2004-08-12 09:28:21

Hi Ola,

everything said here is my personal opinion, so please be patient when
reading it.

> CDK		LGPL
> JOElib	GPL
> Octet		LGPL
> Jmol		LGPL
> Jchempaint	GPL
> Structure	LGPL
>
> 1) Have I understood the licenses above correctly? On some SF pages
> (joelib & jchempaint) it says GPL or LGPL. What does that mean? May I
> choose?
Not at all. The GPL is stricter than the LGPL.
In JOELib this means that the kernel (the chemical expert
systems) is GPL and contains some LGPL parts. But you cannot change the
GPL license. The GPL license comes from the stalled OELib project, so
commercial users can buy an OEChem license from OpenEye; OEChem is the
official commercial successor of OELib.

> 2) How much do CDK and JOElib overlap? I know you can use them together,
> what are the benefits of this? Descriptors? Will descriptors not be
> implemented in CDK?
They do not really overlap, because they have different data structures
for molecules. There is a primitive converter class, but not more. So
the two have a different focus in what they provide. See the
documentation and tutorials for details.
JOELib also contains LGPL code from Egon (CML) and modified 2D rendering
classes from Christoph (no 2D layout, only rendering, no event model),
which also allow showing SMARTS matches and exporting images and PDF.
Descriptors? That depends on the kind of descriptors, I would say, but
JOELib is much more advanced here (though I might not be objective).

> 3) What does Octet add to this mix (except that it's LGPL and JOElib is
> not)? Can it be used with CDK? Overlap? Are the projects competing
> against each other?
No, competition is the last thing we are interested in, because we are
too few developers to be really competitive. We are trying to combine
the different data structures in a general way in the Octet project. But
this is still under discussion and far away from a concrete implementation.
So, in the long term this might provide a common interface.
Hopefully this will facilitate the usage of chemoinformatics tools and
facilitate project maintenance, we will see ...

> 4) What does the Structure project add to all this (except that it's
> built on Octet and LGPL)? The homepage says they are working on SDG,
> isn't that already present in CDK? Doesn't JchemPaint do the same thing
> as Structure?
Rich, is this project stalled or in progress ?

> I am posting this question in the CDK, Octet and JOElib mailinglists in
> order to get more extensive information.
Crossposting always causes many e-mails for users subscribed to all
lists. If you always bear that in mind, this is o.k.


CU, Jörg


[octet-devel] Differences/similarities between CDK, JOElib, Octet, Jmol, Jchempaint, Structure

From: Ola S. <ola...@lc...> - 2004-08-12 08:53:22

Hello,

I am a little confused and don't know how these projects overlap and
their licenses.

CDK		LGPL
JOElib		GPL
Octet		LGPL
Jmol		LGPL
Jchempaint	GPL
Structure	LGPL

1) Have I understood the licenses above correctly? On some SF pages
(joelib & jchempaint) it says GPL or LGPL. What does that mean? May I
choose?

2) How much do CDK and JOElib overlap? I know you can use them together,
what are the benefits of this? Descriptors? Will descriptors not be
implemented in CDK?

3) What does Octet add to this mix (except that it's LGPL and JOElib is
not)? Can it be used with CDK? Overlap? Are the projects competing
against each other?

4) What does the Structure project add to all this (except that it's
built on Octet and LGPL)? The homepage says they are working on SDG,
isn't that already present in CDK? Doesn't JchemPaint do the same thing
as Structure?

I am posting this question in the CDK, Octet and JOElib mailinglists in
order to get more extensive information.

Best regards,

   .../Ola Spjuth

-- 
---
Ola Spjuth, PhD student
Dept of Pharmacology & Linnaeus Centre for Bioinformatics
Uppsala University, Sweden

[octet-devel] Re: [QSAR-devel] Octet-0.3.2 and Signature Descriptor

From: Joerg K. W. <we...@in...> - 2004-08-12 08:26:38

Hi Rich,

I know that my idea might be unpopular, but I think we should also use
JGraphT (LGPL) as the base for Octet, because it already provides some
graph algorithms and traversers.

The 'simple graph' can be the default base for a molecule:
org._3pq.jgrapht.graph.SimpleGraph

The implementation looks fine, the only thing I'm missing is the
labeling functionality for edges and vertexes. I've added a feature
request to their tracking system:
http://sourceforge.net/tracker/index.php?func=detail&aid=1007815&group_id=86459&atid=579690

1. Vertexes are no problem, because they are handled as Objects, and
storing and removing them is O(1) because they are accessed through their
hashCode and equals methods (unique identifier, e.g. pointer or index
number).
A vertex interface with labels could be helpful.
    public void put(VertexKey key, Object value) {
        keys.put(key, value);
    }

with
public class VertexKey extends java.lang.Object
{
}

2. Edges contain no labels accessible via a label key, so here we must
contact the JGraphT team or modify their edge interface.
    public void put(EdgeKey key, Object value) {
        keys.put(key, value);
    }

    /**
     * Return value associated with <CODE>key</CODE> in this edge
     */
    public Object get(EdgeKey key) {
        return keys.get(key);
    }

    public void release(EdgeKey key) {
        keys.remove(key);
    }
with
public class EdgeKey extends java.lang.Object
{
}
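
Put together, the fragments above could be sketched as one small self-contained class. This is only an illustration under assumptions: LabeledEdge, its constructor, and the name field on EdgeKey are hypothetical additions, and nothing here is part of JGraphT. The keys map is assumed to be a java.util.HashMap, which the fragments do not state.

```java
import java.util.HashMap;
import java.util.Map;

final class LabeledEdgeSketch {

    /** Key type for edge labels; relies on Object identity for equals and hashCode. */
    static final class EdgeKey {
        private final String name; // hypothetical, only used for readable output

        EdgeKey(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /** Hypothetical edge that carries arbitrary labels. */
    static final class LabeledEdge {
        private final Object source;
        private final Object target;
        private final Map<EdgeKey, Object> keys = new HashMap<>();

        LabeledEdge(Object source, Object target) {
            this.source = source;
            this.target = target;
        }

        Object getSource() {
            return source;
        }

        Object getTarget() {
            return target;
        }

        /** Associate a value with the given key on this edge. */
        void put(EdgeKey key, Object value) {
            keys.put(key, value);
        }

        /** Return the value associated with key on this edge, or null if absent. */
        Object get(EdgeKey key) {
            return keys.get(key);
        }

        /** Remove the label stored under key, if any. */
        void release(EdgeKey key) {
            keys.remove(key);
        }
    }

    public static void main(String[] args) {
        EdgeKey bondOrder = new EdgeKey("bondOrder");
        LabeledEdge edge = new LabeledEdge("C1", "O2");
        edge.put(bondOrder, 2);
        System.out.println(bondOrder + " = " + edge.get(bondOrder));
        edge.release(bondOrder);
        System.out.println("after release: " + edge.get(bondOrder));
    }
}
```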

As far as I have seen, this functionality is missing in Octet.
Adding atoms or atom pairs is O(1), and removing is missing completely.
Furthermore, if we follow the current implementation, removing will be
O(N) instead of O(1), because you are using a List instead of a
map.
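
The complexity point can be illustrated with the standard Java collections alone. The sketch below is not Octet code, and the atomLabels helper is hypothetical. It removes the same element from an ArrayList, which scans linearly, and from a LinkedHashSet, which uses a hash lookup with expected constant time. The timings it prints depend on the machine and JVM, so no particular numbers are claimed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RemovalCostSketch {

    /** Hypothetical helper: labels "a0" .. "a(count-1)" standing in for atoms. */
    static List<String> atomLabels(int count) {
        List<String> labels = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            labels.add("a" + i);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String> asList = atomLabels(200_000);
        Set<String> asSet = new LinkedHashSet<>(asList);
        String last = "a" + (asList.size() - 1);

        long t0 = System.nanoTime();
        asList.remove(last); // linear scan with equals() until the element is found
        long t1 = System.nanoTime();
        asSet.remove(last);  // one hash lookup, expected constant time
        long t2 = System.nanoTime();

        System.out.println("list removal ns: " + (t1 - t0));
        System.out.println("set removal ns:  " + (t2 - t1));
    }
}
```

A LinkedHashSet also keeps insertion order, so iteration order would stay predictable if a List were replaced by it.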

Kind regards, Joerg

-- 
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
                                     (E. Hemingway)

Never mistake action for meaningful action.
                                (Hugo Kubinyi,2004)

[octet-devel] Octet-0.3.2 and Signature Descriptor

From: rich a. <che...@ya...> - 2004-08-12 06:16:15

Hello All,

Octet-0.3.2 has been released (http://octet.sourceforge.net). This version contains support for breadth-first traversal (via the BreadthTraverser interface) as part of a refactored net.sourceforge.octet.graph package. Although the implementation of BasicBreadthTraverser may still have a hidden kink or two to work out, the API, which borrows from SAX in its style (http://www.saxproject.org), is relatively stable.

This was the last major set of functionality that seemed necessary for the development of a Java implementation of the Signature molecular descriptor itself. Of course, the framework for using the descriptor in building and using QSAR models will require a good deal more infrastructure. A preliminary draft of the Signature interface and a skeletal implementation (BasicSignature) will be appearing soon in the net.sourceforge.octet.qsar package CVS.

If you'd like to help, there's plenty to do! Feedback regarding the design/usability of the traversal API and especially bugs in its implementations would be helpful. Ideas on the proper implementation of stereochemistry, which will be the last major addition to Octet, would also be helpful. If you'd like to see any changes made to anything, now is the time - because the Octet API will be frozen some time in the next few months in preparation for the release of version 1.0.

In the next week or so, CDKTools - the CDK "bindings" for Octet's core interfaces - will be updated and released to reflect the recent changes made in Octet.

cheers,
rich


[octet-devel] Re: [QSAR-devel] A Proposal for QSAR

From: Joerg K. W. <we...@in...> - 2004-08-09 07:50:55

Hi All,

only a short comment for the moment.

> (3) The ability to generate a set of all Molecules that satisfy the model (i.e., solving the inverse QSAR problem).

So this is a (combinatorial) optimization problem. Last week our group
released its internally developed JavaEVA library (at the
moment only as a binary, because the license model is still under discussion):
http://www-ra.informatik.uni-tuebingen.de/software/JavaEvA/index.html

Kind regards, Joerg



[octet-devel] A Proposal for QSAR

From: rich a. <che...@ya...> - 2004-07-30 15:05:39

Hello All,

After thinking about Joerg's comments and the discussion regarding object oriented descriptors, I've concluded that I've been approaching the entire concept from the wrong angle. Descriptor, in the sense that I've been thinking about it, is really two completely different ideas: (1) the calculation of a numerical property for a particular Molecule (clogp, TPSA, etc.); and (2) the use of an algorithm for comparing a set of Molecules and their experimentally determined properties with the ultimate goal of building a predictive model. I believe that (2) is a far more important problem to work on.

The following is a proposal for a predictive QSAR system based on Octet (http://octet.sourceforge.net). Its key features will include:

(1) The use of a training set consisting of Molecules and data (IC50's, logP's, boiling points, Rf values, nmr shifts, etc.) for the generation of a QSAR Model.
(2) The ability to predict, based on the Model, the activity/property of any new Molecule.
(3) The ability to generate a set of all Molecules that satisfy the model (i.e., solving the inverse QSAR problem).

My proposal is to make the development of a QSAR system that satisfies (1)-(3) the immediate focus of the QSAR project. Many systems address (1)-(2), but point (3) has hardly been considered, and would be, in my opinion, a significant advance over existing software with immediate payoff for the bench chemist.

I propose building this system with a single flavor of molecular descriptor called "Signature".

Briefly, the Signature of a Molecule is composed of the individual Signatures of its Atoms. An atomic Signature is composed of the Atom itself, and the set of Atoms surrounding it at a particular distance (think Breadth-First Search). The distance, or "height", is user-definable. The key point is that Signature is not a number - it is a behavior.
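
As a rough illustration of the "Atoms surrounding it at a particular distance" idea, the sketch below collects atoms layer by layer with a breadth-first search, stopping at a given height. It is only the neighborhood-gathering step, under assumed conventions: atoms are integer indices, the graph is a plain adjacency map, and the class `HeightNeighborhood` is hypothetical. A full Signature would also need to record connectivity and a canonical ordering, which this sketch omits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of Octet or of the published Signature code. */
final class HeightNeighborhood {
    /** Returns atoms grouped by breadth-first distance 0..height from root. */
    static List<List<Integer>> layers(Map<Integer, List<Integer>> adjacency,
                                      int root, int height) {
        List<List<Integer>> result = new ArrayList<>();
        Map<Integer, Integer> distance = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        distance.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            int atom = queue.remove();
            int d = distance.get(atom);
            while (result.size() <= d) {
                result.add(new ArrayList<>());
            }
            result.get(d).add(atom);
            if (d == height) {
                continue; // do not expand past the requested height
            }
            List<Integer> next =
                adjacency.getOrDefault(atom, Collections.<Integer>emptyList());
            for (int n : next) {
                if (!distance.containsKey(n)) {
                    distance.put(n, d + 1);
                    queue.add(n);
                }
            }
        }
        return result;
    }
}
```

For a four-atom chain 0-1-2-3, calling `layers(adjacency, 0, 2)` groups atom 0 at distance 0, atom 1 at distance 1 and atom 2 at distance 2; atom 3 lies beyond the requested height and is not visited.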

Signatures have many useful properties. A Signature can be used to construct a set of Molecules described by it. This contrasts with the vast majority of descriptors, in which too much information is lost, and is a key element in solving the inverse-QSAR problem. Signature can be used to construct many of the commonly-used topological descriptors. Signatures are also less degenerate than many other descriptors.

A series of four papers has been published on Signature. The third in the series is available online here: (http://www.genomes2life.org/publications/Signature-3.pdf). This article clearly outlines how a system using Signature builds an SAR model and solves the inverse QSAR problem. The first article in the series clearly specifies what a Signature is, with an excellent review of descriptor development and use. It is available here: (http://pubs3.acs.org/acs/journals/doilookup?in_doi=10.1021/ci020345w).
Will building this system require many intermediate subsystems to be built? Of course. However, the blueprint is already in place. It's just a matter of constructing the software that meets the component specifications.

The foundation for this system will be Signature itself. I would propose that Signature should be an interface that concrete Signatures implement. I won't go into the interface specification here, but it should be straightforward to develop.

Before a Signature implementation can be built, a mechanism for Breadth First Traversal must be in place. This is the current focus of Octet. BreadthFirstTraverser will use the same Handler/Controller architecture as other Traversers, and so will be a good starting point for a Signature builder, among other uses.

Looking further out, an object-oriented architecture that encapsulates the stages of QSAR analysis needs to be developed: building equations; solving equations; and producing Molecules that match the solutions to the equations. As I mentioned, the blueprint is available - the challenge will be to build components that meet the specification.

I'm not sure if this was the original intent of the QSAR Project, which seemed more oriented toward building a QSAR gui. However, I believe that the system I'm proposing would be a critical component of that goal.

As a concrete next step, I would propose developing a Signature interface based on Octet. Simultaneously, a default implementation, BasicSignature, could be developed as a reality check for the design. The construction of simple unit tests will give the effort a context. I'm not sure where this prototype should be hosted, but due to the still-fluid nature of the Octet API, I think it would be most convenient to host it in a net.sourceforge.octet.qsar package for the time being. When we're all confident that the low-level features to make this system happen are in place, it can then be moved into a QSAR Project package.

This is one direction to take, and I'm open to any suggestions or comments.

cheers,
rich


[octet-devel] Recent API changes, proposed changes.

From: rich a. <che...@ya...> - 2004-07-29 03:03:51

Hello All,
 
A change to the Octet (http://octet.sourceforge.net) Molecule API has been made and committed to CVS. BondingSystem now extends AtomGraph. This means, among other things, that BondingSystems can now be traversed with Traversers (such as DepthFirstTraverser and CycleTraverser) and compared to other AtomGraphs with AtomGraphComparators (such as UllmanComparator). Several redundant BondingSystem methods were replaced as a result.
 
An AromaticityTool is now available. It's fairly crude at this stage, simply applying the 4n + 2 rule to the electron count of a cyclic, multi-atom BondingSystem. But it does, for example, detect the seven-membered aromatic ring in the homotropylium cation.
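
For readers unfamiliar with the rule, the electron-count test itself can be sketched in a few lines. This is not the AromaticityTool source; the class and method names are hypothetical, and the sketch assumes the electron count of the cyclic BondingSystem has already been obtained.

```java
/** Hypothetical helper illustrating the 4n + 2 test; not Octet's AromaticityTool. */
final class HueckelRule {
    /** True if electronCount equals 4n + 2 for some integer n >= 0. */
    static boolean satisfiesHueckel(int electronCount) {
        return electronCount >= 2 && (electronCount - 2) % 4 == 0;
    }
}
```

A count of 6 (n = 1) passes, which is the case for a six-electron system spread over seven ring atoms, while counts of 4 or 8 fail.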
 
I would also like to propose that the method "iterateBondingSystems(Atom neighbor)" be removed from the Atom interface. This method has never been implemented and is largely redundant anyway. Any objections?
 
rich

		

[octet-devel] Coding and similarity ?

From: Joerg K. W. <we...@in...> - 2004-07-27 15:51:29

Hi Rich,

I've changed the subject to be more precise.
I agree that things are getting complex, but primitive native 
numeric/nominal descriptors are only a really small subset of all 
possible codings for molecular structures (descriptor results).

descriptor (parameters, molecule): algorithm to get values
descriptor result:       storing object for the abstract molecule
                          numeric,nominal value, binary nominal value,
                          atom-pair, mcs, ...
query (parameters):  a search method getting a list of valid matchings
                      e.g. SMARTS, AP, shape, whatever, ...
metric (parameters, descRes1, descRes2): Getting similarity for two
                                          possible codings

> But one thing that is not clear to me is how a generic Metric (or Comparator) does its job (without violating encapsulation) of comparing two Descriptor calculations given that the way in which each Descriptor represents itself is unique. For example, a Tanimoto comparison of two fingerprints will be done one way, but a Tanimoto comparison of two TPSA's will be done very differently. A Euclidean distance comparison of Topological Torsion is straightforward, but the same comparison of clogP - that's done very differently, I imagine.
Generic would not be the correct term.
The basic problem we always have is that 'similarity' cannot, and definitely
should not, be separated from the metric, because a metric can only
interpret the features it is given.
I've tried to find a structure for my private literature collection, and I am
now of the opinion that coding and similarity are two sides of one coin.
So we can have different images on either of the two sides, but we cannot
split the coin.

So, eventually every descriptorResult should have something like:
List metrics = descriptorResult.getPossibleMetrics();
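
One way to read this suggestion is sketched below. Only the method name `getPossibleMetrics` comes from the line above; the `Metric` and `DescriptorResult` interfaces, the `NumericResult` class and the absolute-difference metric are assumptions made for illustration, not JOELib or Octet types.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical: a comparison that understands exactly one kind of coding. */
interface Metric<R> {
    double compare(R a, R b);
}

/** Hypothetical: a coding that advertises the metrics able to interpret it. */
interface DescriptorResult<R> {
    List<Metric<R>> getPossibleMetrics();
}

/** Hypothetical single-value result, e.g. for a TPSA-like descriptor. */
final class NumericResult implements DescriptorResult<NumericResult> {
    final double value;

    NumericResult(double value) {
        this.value = value;
    }

    @Override
    public List<Metric<NumericResult>> getPossibleMetrics() {
        // The result knows its own representation, so the metric can read
        // the value without a wider public interface on the descriptor.
        Metric<NumericResult> absoluteDifference =
            (a, b) -> Math.abs(a.value - b.value);
        return Collections.singletonList(absoluteDifference);
    }
}
```

Because each metric is typed to the result it came from, a client can ask a result for its metrics and apply one without knowing how the coding is stored.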

I am also of the opinion that we should be really general here, because
most model-building algorithms (classification, regression, clustering)
usually need only some kind of similarity and a mean value for a set of
molecules.
The primitive Euclidean distance over descriptor (sub)sets is only the
plain data-mining approach, which loses all topological information
(the inverse QSAR problem).

> And then there's the problem that a generic Metric will need a much wider Descriptor interface to do a comparison than a generic DescriptorResult or Descriptor will have.
Hmm, i think the result holds the: coding
and the metric addresses: similarity on coding

> How does JOELib handle these issues?
Not good and really diverse.
For general descriptor results i've recently introduced:
joelib.math.similarity.DistanceMetric

This is for basic values (numeric, nominal or binary nominal). Furthermore,
there are some hot topics that work directly on molecular structures. I
will not discuss these things on the public mailing list, but I'm
definitely willing to cooperate here if the plan is to write a paper
using one of the new methods. For all methods we have the atom labelling
(set) problem!

EUCLIDEAN, TANIMOTO:
joelib.util.ComparisonHelper
The Euclidean or Tanimoto metric is chosen according to the kind of
descriptor passed to
setComparisonDescriptor(String)
setComparisonDescriptor(String[])
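
For reference, the two measures themselves can be sketched as follows. This is not JOELib's ComparisonHelper code: the class `SimpleMetrics` is hypothetical, fingerprints are assumed to be `java.util.BitSet` values, numeric descriptor sets are assumed to be `double[]` vectors, and returning 1.0 for two empty fingerprints is a convention chosen here.

```java
import java.util.BitSet;

/** Hypothetical helper; shows the two measures, not JOELib's implementation. */
final class SimpleMetrics {
    /** Tanimoto coefficient: bits set in both, divided by bits set in either. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        if (union.isEmpty()) {
            return 1.0; // assumed convention for two empty fingerprints
        }
        return (double) common.cardinality() / union.cardinality();
    }

    /** Euclidean distance between two equal-length descriptor vectors. */
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

The two methods take different argument types, which is the point made in this thread: the measure cannot be separated from the coding it operates on.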

ATOM-PAIR (unpublished work by Nikolas Fechner is also available, still in
development)
joelib.desc.types.atompair.BasicAPDistanceMetric

MCS (not public, still in development, paper submitted; I will eventually
publish it after the paper is accepted, but I'm not sure I'm willing
to share the implementation advantages so early)
It's really weird, but I would prefer the most abstract object-oriented
approach you can provide.
In fact: two results (codings) and a metric based on these results.
But there are tons of ways to code the MCS (parameters for MCS generation)
and to apply the metric (parameters for the metric).

> It almost seems like the "Descriptor" category itself is overly general and needs to be broken down further. Otherwise any Descriptor framework will have to know too much about particular Descriptor implementations with the result being a decidedly non-object-oriented framework that is difficult to extend and maintain. How can we address this?
In JOELib every descriptor knows its result, so if you call
result=descriptor.calculate(molecule)
you will get the correct result. Because this is done using
Java reflection it is not the most efficient way, but if we use
result=descriptor.calculate(molecule, result) it will be efficient.
Hence, standard users have to pay a runtime penalty, because object
creation in Java is expensive (see also joelib.desc.ResultFactory).
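
The difference between the two call styles can be sketched like this. Everything here is hypothetical (`CountResult`, `HeavyAtomCount`, and a molecule reduced to an array of element symbols), and the convenience form uses a plain `new` rather than the reflection-based factory described above.

```java
/** Hypothetical result holder; mutable so that one instance can be reused. */
final class CountResult {
    int value;
}

/** Hypothetical descriptor offering both call styles. */
final class HeavyAtomCount {
    /** Convenience form: allocates a fresh result on every call. */
    CountResult calculate(String[] atomSymbols) {
        return calculate(atomSymbols, new CountResult());
    }

    /** Reuse form: fills the caller's result, so a loop allocates nothing. */
    CountResult calculate(String[] atomSymbols, CountResult result) {
        int count = 0;
        for (String symbol : atomSymbols) {
            if (!"H".equals(symbol)) {
                count++;
            }
        }
        result.value = count;
        return result;
    }
}
```

A client processing many molecules can create one `CountResult` up front and pass it to every call, at the cost of having to copy the value out before the next call overwrites it.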

I suggest that every result should know possible metrics.

I've also introduced a joelib.desc.DescriptorInfo object.
Additionally there is the DescDescription object, which holds
information for each descriptor. If you try:
joelib/ant> ant JOELibTestGUI

and switch to the Info-->Descriptors panel, all the information is
generated and loaded on the fly by:
1. Using DescriptorFactory to get all descriptors JOELib can calculate, so we
know the details for them. (BTW, unavailable documentation will cause
annoying warnings, so developers are forced to provide documentation
files from the beginning.)

2. Getting the descriptor info for each descriptor.

3. Loading the single HTML documentation file (also generated from
DocBook XML) for each descriptor.

4. Showing the information.

Kind regards, Joerg


[octet-devel] Re: [QSAR-devel] Beginnings of query support, descriptor framework

From: rich a. <che...@ya...> - 2004-07-27 14:55:32

Hello Joerg,
 
CycleTraverser is an interface and HanserCycleTraverser is a concrete implementation (one of many possible). So a method (say, an aromaticity detector) can take CycleTraverser as an argument and not have to worry about how the cycle perception is done. This is the approach I plan to take with all Traversers.
 
Your question about a node in a search molecule being a group of Atoms ("pharmacophore" search) is one I've been thinking about for a while now. No, I don't think QueryBuilder would be able to do this elegantly. But, yes, I think it can be done by using Reduced Graphs. A Reduced Graph is a graph in which the nodes are structure fragments and the edges are connections between the fragments. It would let clients do interesting things like ask "does this molecule have a six-membered ring with a carboxylate and an amine at any relative positions?"
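
A minimal sketch of the reduction step, assuming the fragment assignment is already known: atoms are integer indices, `fragmentOf[i]` gives the fragment label of atom i, and bonds are index pairs. The class `ReducedGraph` and this representation are hypothetical and are not part of Octet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; collapses an atom-level graph to fragment level. */
final class ReducedGraph {
    /**
     * Returns, for each fragment label, the labels of the fragments it is
     * connected to. Bonds inside a single fragment disappear.
     */
    static Map<String, Set<String>> reduce(int[][] bonds, String[] fragmentOf) {
        Map<String, Set<String>> edges = new HashMap<>();
        for (String fragment : fragmentOf) {
            edges.computeIfAbsent(fragment, k -> new TreeSet<>());
        }
        for (int[] bond : bonds) {
            String a = fragmentOf[bond[0]];
            String b = fragmentOf[bond[1]];
            if (!a.equals(b)) {
                edges.get(a).add(b);
                edges.get(b).add(a);
            }
        }
        return edges;
    }
}
```

For a six-membered ring carrying one carboxylate and one amine, the result has three nodes, with the ring node connected to both substituent nodes. The question quoted above then becomes a query on this small graph rather than on the full atom graph.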
 
I like your ideas on the Descriptor API. I've also played around with the idea that a Query is a special case of Descriptor. I think it's worthwhile moving in that direction.
 
But one thing that is not clear to me is how a generic Metric (or Comparator) does its job (without violating encapsulation) of comparing two Descriptor calculations given that the way in which each Descriptor represents itself is unique. For example, a Tanimoto comparison of two fingerprints will be done one way, but a Tanimoto comparison of two TPSA's will be done very differently. A Euclidean distance comparison of Topological Torsion is straightforward, but the same comparison of clogP - that's done very differently, I imagine.
 
And then there's the problem that a generic Metric will need a much wider Descriptor interface to do a comparison than a generic DescriptorResult or Descriptor will have.
 
How does JOELib handle these issues?
 
It almost seems like the "Descriptor" category itself is overly general and needs to be broken down further. Otherwise any Descriptor framework will have to know too much about particular Descriptor implementations with the result being a decidedly non-object-oriented framework that is difficult to extend and maintain. How can we address this?
 
rich

"Joerg K. Wegner" <we...@in...> wrote:
Hi again,

for performance reasons we should not use (as in JOELib):
molecule.calculate("XYZ")

we should use:
keyXYZ=KeyFactory.getKey("XYZ");

// and use internal caching for this descriptor
molecule.calculate(keyXYZ);
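
A sketch of how such a key scheme might look is given below. The names `KeyFactory` and `getKey` come from the lines above; `DescriptorKey`, `CachingMolecule`, the identity-based cache and the placeholder result are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical key type: one shared instance per descriptor name. */
final class DescriptorKey {
    final String name;

    DescriptorKey(String name) {
        this.name = name;
    }
}

final class KeyFactory {
    private static final Map<String, DescriptorKey> KEYS = new HashMap<>();

    /** Returns the same key instance every time for the same name. */
    static synchronized DescriptorKey getKey(String name) {
        return KEYS.computeIfAbsent(name, DescriptorKey::new);
    }
}

/** Hypothetical molecule that caches one result per key. */
final class CachingMolecule {
    // Keys are unique instances, so identity lookup avoids string hashing.
    private final Map<DescriptorKey, Object> cache = new IdentityHashMap<>();
    int computations = 0;

    Object calculate(DescriptorKey key) {
        Object cached = cache.get(key);
        if (cached == null) {
            computations++;
            cached = "result of " + key.name; // placeholder for real work
            cache.put(key, cached);
        }
        return cached;
    }
}
```

The string lookup happens once, in `getKey`; every later `calculate` call compares object references only.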

Kind regards, Joerg

> Hi Rich,
> 
>> * Molecule implements AtomGraph. In the near future, BondingSystem 
>> should also implement AtomGraph to enable traversal/query with the 
>> same tools used for Molecules (any objections to this?)
> 
> Good.
> 
>> * Traversers traverse the graph structure of any AtomGraph. Traversers 
>> are low-level components that are helpful for building higher-level 
>> functionality. Currently two types of Traverser are available: 
>> DepthFirstTraverser and CycleTraverser. Both use a system of Handlers 
>> and Controllers - Handlers for handling events generated at various 
>> stages of a traversal algorithm and Controllers for exercising limited 
>> control over the algorithm itself. This system borrows from SAX's 
>> ContentHandler idea. HanserCycleTraverser is an implementation of 
>> CycleTraverser that uses Hanser's algorithm for finding the set of all 
>> cycles of an AtomGraph using collapsing Path-Graphs.
> 
> CycleTraverser should use an interface, so that we can switch the 
> traverser.
> If nothing is specified, a default traverser should be used.
> The traverser should also have an ID and version number, analogous to 
> descriptors.
> 
> 
>> * MoleculeComparator compares two AtomGraphs for isomorphism, but 
>> without comparing atom/bonding properties. UllmanComparator implements 
>> MoleculeComparator by using Ullman's subgraph isomorphism algorithm. 
>> Like Traverser, MoleculeComparator uses a system of Handlers and 
>> Controllers for fine-grained control. It should be possible to use 
>> this system to create additional isomorphism algorithms implementing 
>> MoleculeComparator.
> 
> Isn't this only a formulation problem ?
> Can't we use a boolean method compareNode(LabelSet) which uses a set of 
> labels to check isomorphism ?
> 
>> * QueryBuilder enables clients to build a molecular query using the 
>> same process that is used for building a Molecule with 
>> MoleculeBuilder. In fact, QueryBuilder extends MoleculeBuilder and can 
>> be used in many contexts calling for a MoleculeBuilder. QueryBuilder 
>> is designed for building queries that are based on a template molecule 
>> with constraints placed on individual Atoms with AtomQuery.
> 
> Can 'pharmacophores' also be treated with this approach? That is, are 
> combined features allowed, e.g. a carboxylic acid group combined into a 
> single feature with a distance to all other features?
> 
> 
>> * SmartsQueryFactory is in the early stages, but is intended to 
>> simplify the process of using QueryBuilder by enabling clients to use 
>> SMARTS Atomic Primitive strings as keys to obtain a fully functional 
>> AtomQuery. Although this isn't exactly a SMARTS parser, it isn't that 
>> far from being one given Octet's SmilesReader. Currently only the 
>> wildcard Atomic Primitive ("*") is supported, but others should be 
>> appearing soon. The approach here has some elements in common with 
>> that of CDK's growing SMARTS support, but there are also some 
>> interesting differences.
> 
> Same as above, so atom based (not feature based) compareNode(LabelSet) 
> method, where the LabelSet is what i would call the chemical kernel atom 
> labelling set.
> 
>> Looking a little further down the road for QSAR, what are people's 
>> thoughts on a framework for molecular descriptors? Of course, there 
>> are hundreds of descriptors, and of course we all have our ideas on what a 
>> particular descriptor means or doesn't mean. What I'm actually 
>> wondering about is what a descriptor facility in QSAR would look and 
>> feel like. I've been looking at JOELib's descriptor framework, which 
>> has some reasonable concepts. From what I can tell, there are two 
>> basic kinds of descriptor: a "holistic" descriptor that is a single 
>> value (e.g. TPSA) and which is primitive-like, and everything else, 
>> which tends to be higher-resolution in nature (e.g. Topological 
>> Torsion) and more object-like. Are there any other ideas? 
> 
> With respect to query i would prefer the object approach, so we can use:
> result=molecule.calculate("XYZ")
> or as in JOELib
> result1=calculator.calculate(mol1,"XYZ", Properties)
> result2=calculator.calculate(mol2,"XYZ", Properties)
> 
> for matching or similarity we can then use
> // inherited from Comparator in Java API
> // applicable to Euclidean, Tanimoto, atom-pairs
> similarity=metricThatILike(result1,result2, Properties);
> 
> For simple single value descriptors it would be also interesting to have:
> similarity=metricThatILike(ResultSet1,ResultSet2, Properties);
> Also with pharmacophore outlook or multiple graph isomorphism and not 
> only pair-wise matching.
> 
> So a query is from my standpoint a kind of similarity-metric which can 
> only return 0 and 1. Sometimes, as in SMARTS matching we are only 
> interested in subgraph isomorphism.
> result1=calculator.calculate(mol1,"XYZ", LabelSet)
> result2=calculator.calculate(mol2,"XYZ", LabelSet)
> // only applicable for this specific calculator
> // can be used for maximum common substructure search (MCS)
> matchings=matchingsThatILike(result1,result2, Properties);
> 
> So, for SMARTS matching we need also:
> matchings=matchingsThatILike(query1,result2, Properties);
> 
> For pharmacophores 2D/3D/Shape we can also use this approach, because 
> the representation for the similarity/matching is the relevant point.
> matchings=matchingsThatILike(query1,result2, Properties);
> or
> similarity=metricThatILike(result1,result2, Properties);
> 
> Kind regards, Joerg
> 
> 


-- 
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
(E. Hemingway)

Never mistake action for meaningful action.
(Hugo Kubinyi,2004)




[octet-devel] Re: [QSAR-devel] Beginnings of query support, descriptor framework

From: Joerg K. W. <we...@in...> - 2004-07-26 13:56:10

Hi again,

for performance reasons we should not use (as in JOELib):
molecule.calculate("XYZ")

we should use:
keyXYZ=KeyFactory.getKey("XYZ");

// and use internal caching for this descriptor
molecule.calculate(keyXYZ);
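
To make the idea a bit more concrete, here is a minimal sketch of what I mean. Only KeyFactory.getKey(...) and calculate(key) come from the lines above; DescriptorKey, CachingMolecule and the placeholder descriptor are made-up names for illustration, not existing Octet or JOELib classes. The point is that the string lookup happens once, and afterwards the key is compared by identity and the result is served from a per-molecule cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interned key: one instance per descriptor name. */
final class DescriptorKey {
    private final String name;

    DescriptorKey(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Hands out exactly one DescriptorKey object per name (assumed design). */
final class KeyFactory {
    private static final Map<String, DescriptorKey> KEYS = new ConcurrentHashMap<>();

    private KeyFactory() {
    }

    static DescriptorKey getKey(String name) {
        // The string is hashed here once; callers keep the key afterwards.
        return KEYS.computeIfAbsent(name, DescriptorKey::new);
    }
}

/** Stand-in for a molecule that caches descriptor values per key. */
final class CachingMolecule {
    private final String smiles;
    private final Map<DescriptorKey, Double> cache = new HashMap<>();
    int computations = 0; // exposed only to show that caching works

    CachingMolecule(String smiles) {
        this.smiles = smiles;
    }

    double calculate(DescriptorKey key) {
        Double cached = cache.get(key);
        if (cached == null) {
            computations++;
            cached = compute(key);
            cache.put(key, cached);
        }
        return cached;
    }

    // Placeholder descriptor: the length of the SMILES string.
    private double compute(DescriptorKey key) {
        return smiles.length();
    }
}

final class KeyCachingDemo {
    public static void main(String[] args) {
        DescriptorKey keyXYZ = KeyFactory.getKey("XYZ");
        CachingMolecule molecule = new CachingMolecule("CCO");
        molecule.calculate(keyXYZ);
        molecule.calculate(keyXYZ); // second call is served from the cache
        System.out.println("computations: " + molecule.computations);
    }
}
```

A real cache would of course have to be invalidated whenever the molecule is modified; that is left out here.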

Kind regards, Joerg

> Hi Rich,
> 
>> * Molecule implements AtomGraph. In the near future, BondingSystem 
>> should also implement AtomGraph to enable traversal/query with the 
>> same tools used for Molecules (any objections to this?)
> 
> Good.
> 
>> * Traversers traverse the graph structure of any AtomGraph. Traversers 
>> are low-level components that are helpful for building higher-level 
>> functionality. Currently two types of Traverser are available: 
>> DepthFirstTraverser and CycleTraverser. Both use a system of Handlers 
>> and Controllers - Handlers for handling events generated at various 
>> stages of a traversal algorithm and Controllers for exercising limited 
>> control over the algorithm itself. This system borrows from SAX's 
>> ContentHandler idea. HanserCycleTraverser is an implementation of 
>> CycleTraverser that uses Hanser's algorithm for finding the set of all 
>> cycles of an AtomGraph using collapsing Path-Graphs.
> 
> CycleTraverser should use an interface, so that we can switch the 
> traverser.
> If none is specified, a default traverser should be used.
> The traverser should also have an ID and version number, analogous to 
> descriptors.
> 
> 
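
As a rough sketch of the switchable traverser with a default, an ID and a version number: the interface below is an assumption for illustration only, not Octet's actual CycleTraverser API. The default implementation is deliberately NOT Hanser's algorithm; it only counts independent cycles (bonds - atoms + connected components), which is enough to show how a client would swap implementations.

```java
/** Hypothetical traverser contract carrying an ID and a version. */
interface CycleTraverser {
    String getId();

    String getVersion();

    /** bonds[i] = {atomIndexA, atomIndexB}; returns the number of independent cycles. */
    int countIndependentCycles(int atomCount, int[][] bonds);
}

/** Simple stand-in default: cycle rank computed with union-find. */
final class CycleRankTraverser implements CycleTraverser {
    public String getId() {
        return "cycle-rank";
    }

    public String getVersion() {
        return "0.1";
    }

    public int countIndependentCycles(int atomCount, int[][] bonds) {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        int components = atomCount;
        for (int[] bond : bonds) {
            int a = find(parent, bond[0]);
            int b = find(parent, bond[1]);
            if (a != b) {
                parent[a] = b; // merge the two components
                components--;
            }
        }
        return bonds.length - atomCount + components;
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    }
}

/** Client code: uses the default unless another traverser is supplied. */
final class CycleAnalysis {
    private final CycleTraverser traverser;

    CycleAnalysis() {
        this(new CycleRankTraverser());
    }

    CycleAnalysis(CycleTraverser traverser) {
        this.traverser = traverser;
    }

    String describe() {
        return traverser.getId() + " v" + traverser.getVersion();
    }

    int independentCycles(int atomCount, int[][] bonds) {
        return traverser.countIndependentCycles(atomCount, bonds);
    }
}
```

With the ID and version stored next to each result, values calculated with different traversers (or different versions of one traverser) can be told apart later.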
>> * MoleculeComparator compares two AtomGraphs for isomorphism, but 
>> without comparing atom/bonding properties. UllmanComparator implements 
>> MoleculeComparator by using Ullmann's subgraph isomorphism algorithm. 
>> Like Traverser, MoleculeComparator uses a system of Handlers and 
>> Controllers for fine-grained control. It should be possible to use 
>> this system to create additional isomorphism algorithms implementing 
>> MoleculeComparator.
> 
> Isn't this only a formulation problem ?
> Can't we use a boolean method compareNode(LabelSet) which uses a set of 
> labels to check isomorphism ?
> 
>> * QueryBuilder enables clients to build a molecular query using the 
>> same process that is used for building a Molecule with 
>> MoleculeBuilder. In fact, QueryBuilder extends MoleculeBuilder and can 
>> be used in many contexts calling for a MoleculeBuilder. QueryBuilder 
>> is designed for building queries that are based on a template molecule 
>> with constraints placed on individual Atoms with AtomQuery.
> 
> Can 'pharmacophores' also be treated with this approach? That is, are 
> combined features allowed, e.g. a carboxylic acid group combined into a 
> single feature with a distance to all other features?
> 
> 
>> * SmartsQueryFactory is in the early stages, but is intended to 
>> simplify the process of using QueryBuilder by enabling clients to use 
>> SMARTS Atomic Primitive strings as keys to obtain a fully functional 
>> AtomQuery. Although this isn't exactly a SMARTS parser, it isn't that 
>> far from being one given Octet's SmilesReader. Currently only the 
>> wildcard Atomic Primitive ("*") is supported, but others should be 
>> appearing soon. The approach here has some elements in common with 
>> that of CDK's growing SMARTS support, but there are also some 
>> interesting differences.
> 
> Same as above, so atom based (not feature based) compareNode(LabelSet) 
> method, where the LabelSet is what i would call the chemical kernel atom 
> labelling set.
> 
>> Looking a little further down the road for QSAR, what are people's 
>> thoughts on a framework for molecular descriptors? Of course, there 
>> are hundreds of descriptors, and of course we all have our ideas on what a 
>> particular descriptor means or doesn't mean. What I'm actually 
>> wondering about is what a descriptor facility in QSAR would look and 
>> feel like. I've been looking at JOELib's descriptor framework, which 
>> has some reasonable concepts. From what I can tell, there are two 
>> basic kinds of descriptor: a "holistic" descriptor that is a single 
>> value (e.g. TPSA) and which is primitive-like, and everything else, 
>> which tends to be higher-resolution in nature (e.g. Topological 
>> Torsion) and more object-like. Are there any other ideas? 
> 
> With respect to query i would prefer the object approach, so we can use:
> result=molecule.calculate("XYZ")
> or as in JOELib
> result1=calculator.calculate(mol1,"XYZ", Properties)
> result2=calculator.calculate(mol2,"XYZ", Properties)
> 
> for matching or similarity we can then use
> // inherited from Comparator in Java API
> // applicable to Euclidean, Tanimoto, atom-pairs
> similarity=metricThatILike(result1,result2, Properties);
> 
> For simple single value descriptors it would be also interesting to have:
> similarity=metricThatILike(ResultSet1,ResultSet2, Properties);
> Also with pharmacophore outlook or multiple graph isomorphism and not 
> only pair-wise matching.
> 
> So a query is from my standpoint a kind of similarity-metric which can 
> only return 0 and 1. Sometimes, as in SMARTS matching we are only 
> interested in subgraph isomorphism.
> result1=calculator.calculate(mol1,"XYZ", LabelSet)
> result2=calculator.calculate(mol2,"XYZ", LabelSet)
> // only applicable for this specific calculator
> // can be used for maximum common substructure search (MCS)
> matchings=matchingsThatILike(result1,result2, Properties);
> 
> So, for SMARTS matching we need also:
> matchings=matchingsThatILike(query1,result2, Properties);
> 
> For pharmacophores 2D/3D/Shape we can also use this approach, because 
> the representation for the similarity/matching is the relevant point.
> matchings=matchingsThatILike(query1,result2, Properties);
> or
> similarity=metricThatILike(result1,result2, Properties);
> 
> Kind regards, Joerg
> 
> 
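
The metricThatILike(result1, result2, Properties) idea from the quoted text could be sketched as follows. All names here (SimilarityMetric, TanimotoMetric, SubsetQueryMetric) are assumptions for illustration, and the descriptor results are taken to be simple bit-string fingerprints. The second class shows a query as a metric that only returns 0 or 1; note that the bit-subset test is just a screening stand-in and not a real subgraph isomorphism check.

```java
import java.util.BitSet;

/** Hypothetical common contract for similarity metrics and queries. */
interface SimilarityMetric<T> {
    double similarity(T first, T second);
}

/** Tanimoto coefficient on fingerprints: |A and B| / |A or B|. */
final class TanimotoMetric implements SimilarityMetric<BitSet> {
    public double similarity(BitSet first, BitSet second) {
        BitSet common = (BitSet) first.clone();
        common.and(second);
        BitSet union = (BitSet) first.clone();
        union.or(second);
        int unionCount = union.cardinality();
        // Two empty fingerprints are treated as identical (a convention chosen here).
        return unionCount == 0 ? 1.0 : (double) common.cardinality() / unionCount;
    }
}

/** A query viewed as a metric that can only return 0 or 1. */
final class SubsetQueryMetric implements SimilarityMetric<BitSet> {
    public double similarity(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // query bits that the target lacks
        return missing.isEmpty() ? 1.0 : 0.0;
    }
}

final class MetricDemo {
    public static void main(String[] args) {
        BitSet result1 = new BitSet();
        result1.set(0, 3); // bits 0, 1, 2
        BitSet result2 = new BitSet();
        result2.set(1, 4); // bits 1, 2, 3
        System.out.println(new TanimotoMetric().similarity(result1, result2));
        System.out.println(new SubsetQueryMetric().similarity(result1, result2));
    }
}
```

Because both classes share one interface, client code that ranks or filters molecules does not need to know whether it is handed a graded similarity or a yes/no query.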


-- 
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
                                     (E. Hemingway)

Never mistake action for meaningful action.
                                (Hugo Kubinyi,2004)

52 messages have been excluded from this view by a project administrator.
