## [Cdk-devel] Canonicalization Routines

[Cdk-devel] Canonicalization Routines
From: Brian Kelley - 2002-01-17 21:15:37

```
I have been experimenting with implementing canonicalization routines in
CDK.

After an hour or so getting reacquainted with Java, I expect the process
to be relatively straightforward.

Here is what's missing, though.

Each atom and bond needs to be labeled with an integer equivalence class.
For bonds this is easy: we can just use the bond type or bond order.

Daylight uses 1, 2, and 3 for the bond order, and 4 if the bond is
aromatic. Aromaticity takes precedence over whether the bond is single
or double.

For atoms this is a slightly harder challenge: you need to take into
account everything that you would like to make an atom distinct. For
example, if C+ and C are considered different, then charge must be taken
into account.

Here is what I have been using in Python:

equiv_class = atom.number + \
              1000*(atom.charge+10) + \
              100000*(atom.hcount) + \
              1000000*(atom.weight)

(atom.weight is an offset from the atom's mass, so it is usually zero.)

The problem here is that using numbers this large can lead to integer
overflow during math computations. How does Java deal with integer
overflow? Python nicely promotes the integer to a long integer
automagically.

The last hurdle for me is some Java sorting issues.

Is there an easy way to do an ordered sort? Imagine I have a list that
looks like:

[(2,1), (4,2), (2,3)]

And I would like the sorted list to look like:

[(2,1), (2,3), (4,2)]

I imagine I would have to make a container list and supply an
equivalence function. Is there something more elegant?

If anyone is interested in seeing the Python code, just ask and I'll
post it. I can also describe the algorithm, which is pretty cool if you
ask me.

--
Brian Kelley bkelley@...
Whitehead Institute for Biomedical Research 617 258-6191
```
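
For readers following the thread, here is a minimal Java sketch of the atom equivalence class from the message above, assuming the same weights as the Python expression. The class name `AtomEquivalence`, the method `equivClass`, and its parameter names are hypothetical and are not part of CDK. The `L` suffixes force the arithmetic into 64-bit `long`; the `main` method also shows that plain `int` arithmetic in Java wraps around silently instead of widening.

```java
class AtomEquivalence {
    // Hypothetical helper (not a CDK API). Parameters mirror the Python
    // attributes atom.number, atom.charge, atom.hcount and atom.weight.
    static long equivClass(int atomicNumber, int charge, int hCount, int massOffset) {
        // The L suffix makes each product a long, so the sum is computed in 64 bits.
        return atomicNumber
                + 1000L * (charge + 10)
                + 100000L * hCount
                + 1000000L * massOffset;
    }

    public static void main(String[] args) {
        System.out.println(equivClass(6, 0, 0, 0)); // neutral carbon: 10006
        System.out.println(equivClass(6, 1, 0, 0)); // C+: 11006

        // int overflow is silent: the value wraps to the most negative int.
        int wrapped = Integer.MAX_VALUE + 1;
        System.out.println(wrapped == Integer.MIN_VALUE); // true

        // Widening to long before adding avoids the wrap.
        long widened = (long) Integer.MAX_VALUE + 1;
        System.out.println(widened); // 2147483648
    }
}
```

With these weights, C and C+ receive different classes (10006 versus 11006), which is the distinction the message asks for.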

Re: [Cdk-devel] Canonicalization Routines
From: Christoph Steinbeck - 2002-01-18 15:54:18

```
Hi Brian,

thanks for these interesting notes.

I think the sorting issue can be easily overcome. Why not use the method
that you describe for sorting? I think, however, that it is limited to
Java 2, because the sort mechanism that you sketch relies on the
collection classes. One could use a SortedSet and a Comparator object.

With respect to the integer: Java does not widen automatically, but we
can use "long" instead, which should be more than large enough.

How many lines is your Python code? Too many to post in an email?

Cheers,

Chris

--
Dr. Christoph Steinbeck (http://www.ice.mpg.de/departments/ChemInf)
MPI of Chemical Ecology, Winzerlaer Str. 10, Beutenberg Campus,
07745 Jena, Germany
Tel: +49(0)3641 571263 - Fax: +49(0)3641 571202

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..

Brian Kelley wrote:

> I have been experimenting with implementing canonicalization routines in
> CDK.
>
> After an hour or so getting reacquainted with Java, I expect the process
> to be relatively straightforward.
>
> Here is what's missing, though.
>
> Each atom and bond needs to be labeled with an integer equivalence class.
> For bonds this is easy: we can just use the bond type or bond order.
>
> Daylight uses 1, 2, and 3 for the bond order, and 4 if the bond is
> aromatic. Aromaticity takes precedence over whether the bond is single
> or double.
>
> For atoms this is a slightly harder challenge: you need to take into
> account everything that you would like to make an atom distinct. For
> example, if C+ and C are considered different, then charge must be taken
> into account.
>
> Here is what I have been using in Python:
>
> equiv_class = atom.number + \
>               1000*(atom.charge+10) + \
>               100000*(atom.hcount) + \
>               1000000*(atom.weight)
>
> (atom.weight is an offset from the atom's mass, so it is usually zero.)
>
> The problem here is that using numbers this large can lead to integer
> overflow during math computations. How does Java deal with integer
> overflow? Python nicely promotes the integer to a long integer
> automagically.
>
> The last hurdle for me is some Java sorting issues.
>
> Is there an easy way to do an ordered sort? Imagine I have a list that
> looks like:
>
> [(2,1), (4,2), (2,3)]
>
> And I would like the sorted list to look like:
>
> [(2,1), (2,3), (4,2)]
>
> I imagine I would have to make a container list and supply an
> equivalence function. Is there something more elegant?
>
> If anyone is interested in seeing the Python code, just ask and I'll
> post it. I can also describe the algorithm, which is pretty cool if you
> ask me.
>
```
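
As an illustration of the `SortedSet` plus `Comparator` suggestion in the reply, the following is a rough sketch rather than code from the thread. It assumes Java 8 or later (for lambdas and `Comparator.comparingInt`) and represents each pair as a two-element `int[]`; the class name `PairSort` and its methods are hypothetical. The comparator orders pairs by the first element and breaks ties with the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PairSort {
    // Order by the first element, then by the second to break ties.
    static final Comparator<int[]> BY_FIRST_THEN_SECOND =
            Comparator.<int[]>comparingInt(p -> p[0]).thenComparingInt(p -> p[1]);

    // Sort a copy of the list, leaving the input untouched; duplicates are kept.
    static List<int[]> sortedCopy(List<int[]> pairs) {
        List<int[]> copy = new ArrayList<>(pairs);
        copy.sort(BY_FIRST_THEN_SECOND);
        return copy;
    }

    // A TreeSet keeps its elements ordered by the comparator, but it drops
    // pairs that compare as equal, so it only suits lists without duplicates.
    static SortedSet<int[]> asSortedSet(List<int[]> pairs) {
        SortedSet<int[]> set = new TreeSet<>(BY_FIRST_THEN_SECOND);
        set.addAll(pairs);
        return set;
    }

    // Render pairs in the notation used in the thread, e.g. [(2,1), (2,3)].
    static String render(Iterable<int[]> pairs) {
        StringBuilder sb = new StringBuilder("[");
        for (int[] p : pairs) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append('(').append(p[0]).append(',').append(p[1]).append(')');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<int[]> pairs = Arrays.asList(
                new int[] {2, 1}, new int[] {4, 2}, new int[] {2, 3});
        System.out.println(render(sortedCopy(pairs)));  // [(2,1), (2,3), (4,2)]
        System.out.println(render(asSortedSet(pairs))); // [(2,1), (2,3), (4,2)]
    }
}
```

Sorting a list is probably the safer default when the same pair can occur more than once, since a `TreeSet` treats pairs that compare as equal as a single element.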