
## Re: [Cdk-devel] Re: CDK and Bonds

Re: [Cdk-devel] Re: CDK and Bonds From: Brian Kelley - 2002-02-20 15:00:44

```
Christoph Steinbeck wrote:
>
> With respect to the substructure search based on fingerprints. This is
> something that we could use right now :-) in our database project and
> we would be able and willing to contribute manpower and time immediately.
> Do you have a literature reference?

The Python package generates molecule fingerprints as well. Actually, it
is incredibly easy. Here is the basic algorithm:

Perform a depth-first search in a molecule up to depth D and generate
strings for each path traversed. Daylight uses D = 7.

I use SMILES-like strings such as "C-C-C-c:c", where ":" means aromatic
bond and lower case means aromatic atom.

For each generated path:

  seed = hash(path)
  Initialize a random number generator with the seed from the hash of
  the path. (Hash here returns an integer value for each string.)
  Pull out the first random integer between 0 and N, where N is the
  size of the bit string.
  Set this bit in the fingerprint.

Done!

Daylight has a good reference on this topic, look under fingerprints:
http://www.daylight.com/dayhtml/doc/theory/theory.toc.html#Table of Contents

A trivial string hashing function is here, although eventually you'll
probably want to capitalize on the smaller alphabet used in the paths
generated:
http://www.dcc.uchile.cl/~rbaeza/handbook/algs/3/331.hash.c.html

Here is the Python code for the above:

import random

fingerprint = 0
N = 1024

# paths is a list of string values
paths = ["C-C-C-c:c"]

for path in paths:
    # random can be seeded directly with the path string,
    # so the same path always selects the same bit
    random.seed(path)

    # generate a bit from 0 to N-1
    bit = int(random.random() * N)

    # set that bit in the fingerprint
    fingerprint = fingerprint | (1 << bit)

>
> Cheers,
>
> Chris
>

--
Brian Kelley bkelley@...
Whitehead Institute for Biomedical Research 617 258-6191
```
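The algorithm described in the message above can be sketched end to end in Python 3. This is a minimal illustration under stated assumptions, not Brian Kelley's package and not Daylight's implementation. The molecule format (`atoms`, `bonds`), the function names `enumerate_paths` and `path_fingerprint`, and the choice to count the depth D in bonds are all hypothetical; the message does not say whether D counts atoms or bonds.

```python
import random


def enumerate_paths(atoms, bonds, max_depth=7):
    """Return the set of path strings found by depth-first search.

    atoms: list of atom symbols (lower case = aromatic); assumed format.
    bonds: dict mapping an atom index to a list of
           (neighbour index, bond symbol) pairs; assumed format.
    max_depth: maximum number of bonds in a path (an assumption).
    """
    paths = set()

    def dfs(atom, visited, text, depth):
        paths.add(text)
        if depth == max_depth:
            return
        for neighbour, bond in bonds[atom]:
            if neighbour not in visited:  # never revisit an atom within one path
                dfs(neighbour, visited | {neighbour},
                    text + bond + atoms[neighbour], depth + 1)

    for start in range(len(atoms)):
        dfs(start, {start}, atoms[start], 0)
    return paths


def path_fingerprint(paths, n_bits=1024):
    """Set one pseudo-random bit per path, seeded by the path string."""
    fingerprint = 0
    for path in paths:
        rng = random.Random(path)  # a str seed is reproducible across runs
        fingerprint |= 1 << rng.randrange(n_bits)
    return fingerprint


# Hypothetical toy molecules: ethanol (C-C-O) and ethane (C-C).
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-")]}
ethane_atoms = ["C", "C"]
ethane_bonds = {0: [(1, "-")], 1: [(0, "-")]}

ethanol_fp = path_fingerprint(enumerate_paths(ethanol_atoms, ethanol_bonds))
ethane_fp = path_fingerprint(enumerate_paths(ethane_atoms, ethane_bonds))

# Screening: every bit of the query must also be set in the target.
is_candidate = (ethane_fp & ethanol_fp) == ethane_fp
print(is_candidate)
```

Every path of ethane also occurs in ethanol, so every bit set for ethane is also set for ethanol and the screen passes. A passing screen only marks a candidate: different paths can collide on the same bit, so a real substructure match still has to be confirmed by a full comparison. Note that this sketch records each path in both directions ("C-O" and "O-C"), which costs bits but keeps the code short.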

[Cdk-devel] Re: CDK and Bonds From: Christoph Steinbeck - 2002-02-20 08:43:13

```
Hi Oliver,

you are raising a very important point here. While the early code in
the base classes still tried to cover chemistry as broadly as possible,
most of the stuff that was added later just assumed standard organic
chemistry with a "one-bond-connects-two-atoms" concept. This is -
needless to say - very short-sighted and should be changed.
As you imply, the number of implications of this change would be
enormous, so you should have a week or so of free time, a pool, a good
pool-barkeeper and laptop when you attempt to make this concept more
general :-) But anyway, it would be worthwhile.

With respect to your last question: Of course we are interested. The
CDK is in urgent need of canonicalization. Are you aware that Brian
Kelley just recently posted some interesting Python code on the list?

With respect to the substructure search based on fingerprints. This is
something that we could use right now :-) in our database project and
we would be able and willing to contribute manpower and time
immediately. Do you have a literature reference?

Cheers,

Chris

--
Dr. Christoph Steinbeck (http://www.ice.mpg.de/departments/ChemInf)
MPI of Chemical Ecology, Winzerlaer Str. 10, Beutenberg Campus,
07745 Jena, Germany
Tel: +49(0)3641 571263 - Fax: +49(0)3641 571202

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..

Oliver Horlacher wrote:
> Hi
>
> I am working with the chemical development kit and I have
> some questions. The first is about the Bond class. It
> says that it implements the concept of a bond, i.e. a
> number of electrons connecting a number of atoms. However,
> some of the methods assume 2 atoms, i.e. contains and
> getConnectedAtom. I have changed contains to step through
> the atoms[], but getConnectedAtom can only return one atom
> so I have not altered it. Also it is used very often so I
> don’t know what impact changing the signature to
> getConnectedAtoms(Atom atom, Atom[] found) or
> getConnectedAtom(Atom atom, Collection found) would have.
> How are the hydrogen bridges in a molecule such as B2H6
> represented? Also the atoms[] is only ever initialized to
> contain two elements and it is not grown if
> setAtom(a, 3) is called.
> The short-term goals for our use of the CDK are to be able
> to produce canonical SMILES, and do similarity and
> substructure searches based on fingerprints. These are also
> on your agenda, so I was wondering if you would be
> interested in collaborating.
> If you are interested in who we are and what we do, our
> homepage is http://www.therastrat.com.
>
> Cheers
>
> Oliver Horlacher
>
>
>
```
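The generalization Oliver asks about, a bond that connects any number of atoms, can be illustrated with a short sketch. It is written in Python to match the code posted earlier in the thread and is not the CDK's Java API. The `Atom` and `Bond` classes and the method names `get_connected_atoms` and `set_atom` are hypothetical stand-ins for the methods named in his message. The example models one B-H-B bridge of B2H6 as a single three-centre, two-electron bond, which is only one possible representation.

```python
class Atom:
    """Minimal stand-in for an atom (hypothetical, not the CDK class)."""

    def __init__(self, symbol):
        self.symbol = symbol


class Bond:
    """A number of electrons shared by a number of atoms (hypothetical sketch)."""

    def __init__(self, atoms, electron_count=2):
        self.atoms = list(atoms)  # any number of atoms, not just two
        self.electron_count = electron_count

    def contains(self, atom):
        # Step through every atom instead of checking two fixed slots.
        return any(a is atom for a in self.atoms)

    def get_connected_atoms(self, atom):
        # Return all partners of `atom`, so multi-centre bonds work too.
        if not self.contains(atom):
            raise ValueError("atom is not part of this bond")
        return [a for a in self.atoms if a is not atom]

    def set_atom(self, atom, position):
        # Grow the list when the position lies past its current end.
        while len(self.atoms) <= position:
            self.atoms.append(None)
        self.atoms[position] = atom


# One hydrogen bridge of B2H6 as a three-centre, two-electron bond.
b1, b2, h = Atom("B"), Atom("B"), Atom("H")
bridge = Bond([b1, h, b2], electron_count=2)
partners = [a.symbol for a in bridge.get_connected_atoms(h)]
print(partners)
```

Returning a list from `get_connected_atoms` covers the ordinary two-atom case as a list of length one, so callers that expect a single partner would need adjusting. That is the wide impact on existing code that both messages anticipate.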