Re: [OpenBabel-scripting] Graph Kernels


Hi all,

> > Openbabel is more of a molecule parsing program than a graph kernel
> > program.
Well, I would define that differently.
1. OpenBabel is a chemical expert system and has everything in place to
transform plain element graphs into chemically meaningful molecules.
2. A graph kernel is just a specific similarity measure on graphs with
specific mathematical properties (it must be symmetric and positive
semi-definite).
In fact there is no strict separation between the two, because a direct
graph transformation without chemical information is certainly the last
thing we want!
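To make point 2 concrete, here is a minimal Python sketch of one of the simplest graph kernels, a label-histogram kernel: each graph is mapped to counts of atom labels and labeled bonds, and the kernel is the dot product of those count vectors, which makes it symmetric and positive semi-definite by construction. The toy graph representation and all function names are hypothetical, not OpenBabel's API. In practice the atom and bond labels would come from a chemical expert system after perception (aromaticity, bond orders), which is exactly why the two cannot be cleanly separated.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy representation: atom labels plus bonds as (i, j, bond_label).
Graph = Tuple[List[str], List[Tuple[int, int, str]]]


def features(g: Graph) -> Counter:
    """Explicit feature map phi(G): counts of atom labels and of labeled bonds."""
    atoms, bonds = g
    phi = Counter(("atom", a) for a in atoms)
    for i, j, order in bonds:
        # Sort endpoint labels so a bond feature does not depend on direction.
        a, b = sorted((atoms[i], atoms[j]))
        phi[("bond", a, order, b)] += 1
    return phi


def label_histogram_kernel(g1: Graph, g2: Graph) -> int:
    """k(G1, G2) = <phi(G1), phi(G2)>; an inner product, hence a valid (PSD) kernel."""
    f1, f2 = features(g1), features(g2)
    # Missing keys in a Counter count as 0, so only shared features contribute.
    return sum(count * f2[key] for key, count in f1.items())


# Heavy-atom graphs of ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol = (["C", "C", "O"], [(0, 1, "1"), (1, 2, "1")])
dimethyl_ether = (["C", "O", "C"], [(0, 1, "1"), (1, 2, "1")])
print(label_histogram_kernel(ethanol, dimethyl_ether))  # 7
```

Both molecules have identical atom counts, so only the bond features separate them (k = 7 for ethanol with itself, 9 for the ether with itself). Richer labels from chemical perception would change these features, and with them the kernel.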

> >> Regarding DFS specifically, I believe it is present but I am not sure
> >> where. I'm sure that Geoff can comment on this.
> There are both DFS and BFS iterators in the 2.1 development code,
> which incidentally is the only version with PyOpenBabel.
A DFS is really simple to implement; the only thing you need is a stack (a
BFS needs a queue instead), e.g. from the STL. Besides that, there might be
more efficient solutions using graph algorithm libraries or machine learning
algorithms.
One thing is for sure ... nothing comes for free ... there is no free lunch,
not even for mining.
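For illustration, a minimal Python sketch of both traversals on a plain adjacency list. This is not OpenBabel's iterator API, and the graph is a made-up example (the heavy-atom chain of n-pentane, started from the central carbon, atom 0). The only real difference between the two is the container: a LIFO stack for DFS, a FIFO queue for BFS.

```python
from collections import deque
from typing import Dict, List

Adjacency = Dict[int, List[int]]


def dfs_order(adj: Adjacency, start: int) -> List[int]:
    """Depth-first traversal using an explicit stack (LIFO)."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so the first-listed one is visited first.
        for nb in reversed(adj[node]):
            if nb not in visited:
                stack.append(nb)
    return order


def bfs_order(adj: Adjacency, start: int) -> List[int]:
    """Breadth-first traversal using a FIFO queue."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in adj[node]:
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return order


# n-pentane skeleton 3-1-0-2-4, numbered so that atom 0 is the central carbon.
pentane = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
print(dfs_order(pentane, 0))  # [0, 1, 3, 2, 4]
print(bfs_order(pentane, 0))  # [0, 1, 2, 3, 4]
```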

> I'm not particularly versed on "tanimoto" vs. "hybrid". There is
> already support for Tanimoto based on various fingerprint methods:
> http://openbabel.sourceforge.net/wiki/Tutorial:Fingerprints
Tanimoto similarity *ARRGH*. Most people think that if they use "Tanimoto
similarity" all their problems will be solved, that it cures any illness and
saves the world. To be very clear here, Tanimoto is just one metric among
tons of others. The similarity you get with it depends on the encoding used:
for Tanimoto, e.g., a vector encoding based on any kind of features like
complexity, substructures, etc., or on maximum common substructures (MCS).
Yes, there is a Tanimoto metric for MCS; check out one of the Willett/Gillet
publications.
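A minimal Python sketch of the point that the number depends on the encoding, not on the coefficient. The same Tanimoto formula, |A ∩ B| / (|A| + |B| - |A ∩ B|), applied to two hypothetical feature encodings of the same pair of molecules gives quite different values. The MCS variant uses the same set-style form with sizes counted in bonds. That is an assumption here, one common formulation and not necessarily the one in the Willett/Gillet papers. Computing the MCS itself (NP-hard in general) is not shown.

```python
from typing import Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient on binary fingerprints given as sets of on-features."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def mcs_tanimoto(size_a: int, size_b: int, size_mcs: int) -> float:
    """Tanimoto-style score from precomputed sizes (assumed: bond counts).

    Hypothetical helper: the MCS size must come from a separate MCS search.
    """
    return size_mcs / (size_a + size_b - size_mcs)


# Same two molecules (ethanol, dimethyl ether), two hypothetical encodings.
print(tanimoto({"C", "O"}, {"C", "O"}))       # 1.0  (element types only)
print(tanimoto({"C-C", "C-O"}, {"C-O"}))      # 0.5  (bond types)
```

With element types only, the two molecules look identical. With bond types they are only half similar. The coefficient did not change, only the encoding did.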

If you are interested in publishing, I strongly recommend comparing against
existing kernels and graph mining methods. Since most algorithms in this
area are freely available anyway, this should not be a problem. This will
also be the tough part, though, since you are really comparing algorithms.
That is a big advantage over a huge number of QSAR papers, where it is often
not really clear what is being compared: the chemical expert system, or the
transformation rule followed by mining? Please ensure that your comparison
is fair. You cannot compare black-box algorithms with OpenBabel/JOELib,
since it is not clear whether differences in the mining results come from
the chemical expert system or from the subsequent mining step. Differences
in error of up to ten percent are just within the standard deviation (a
rough guess from my experience).
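One way to keep such a comparison fair is to run every method on identical cross-validation folds (same seed, same preprocessing) and look at the per-fold differences rather than at single error values. The following Python sketch assumes you already have one error value per fold for each method. The function names, the choice of paired differences and the "within one standard deviation" threshold are illustrative assumptions: a rough heuristic, not a substitute for a proper significance test.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Split range(n) into k folds with a fixed seed, so every method sees identical splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def paired_difference(err_a: Sequence[float], err_b: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-fold error differences (A - B)."""
    if len(err_a) != len(err_b) or len(err_a) < 2:
        raise ValueError("need at least two paired folds of equal length")
    diffs = [a - b for a, b in zip(err_a, err_b)]
    return statistics.mean(diffs), statistics.stdev(diffs)


def within_noise(err_a: Sequence[float], err_b: Sequence[float]) -> bool:
    """Rough heuristic: methods are indistinguishable if |mean diff| <= one std of the diffs."""
    mean_diff, sd = paired_difference(err_a, err_b)
    return abs(mean_diff) <= sd
```

Only the fold split is shared between methods here; keeping the chemical preprocessing identical for all methods is the part the sketch cannot enforce for you.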

This is exactly why you should contribute your code to OpenBabel, including
the algorithms you test against. Either way, you have to dig into the code,
since you need "raw" access to the chemical graphs; only there can you
create new, innovative, better and faster algorithms. If you just use the
usual stuff, you will get what all the others already have ... the precious
Tanimoto similarity.

Joerg