From: joerg.wegner <joe...@we...> - 2006-08-08 19:34:27
Hi all,

> > Openbabel is more of a molecule parsing program then a graph kernel
> > program.

Well, I would define that differently.
1. OpenBabel is a chemical expert system and has everything in place to transform naive element graphs into chemically meaningful molecules.
2. A graph kernel is just a specific similarity measure on graphs that has specific mathematical properties.
In fact there is no strict separation, because a direct graph transformation without chemical information is surely the last thing we want!

> >> Regarding DFS specifically, I believe it is present but I am not sure
> >> where. I'm sure that Geoff can comment on this.
> There are both DFS and BFS iterators in the 2.1 development code,
> which incidentally is the only version with PyOpenBabel.

A DFS is really primitive to implement; the only thing you need is a stack (a queue gives you a BFS instead), e.g. from the STL. Besides that, there might be more efficient solutions using graph algorithm libraries or machine learning algorithms. One thing is for sure: nothing comes for free. There is no free lunch, and that holds for mining too.

> I'm not particularly versed on "tanimoto" vs. "hybrid". There is
> already support for Tanimoto based on various fingerprint methods:
> http://openbabel.sourceforge.net/wiki/Tutorial:Fingerprints

Tanimoto similarity *ARRGH* ... Most people seem to think that once they use "Tanimoto similarity" all their problems will be solved, and that it can cure any illness and save the world. To be very clear here, Tanimoto is just one similarity measure among tons of others. The similarity you get with it depends on the coding used: for Tanimoto that is, e.g., a vector coding based on any kind of features (complexity, substructures, etc.) or on maximum common substructures (MCS). Yes, there is a Tanimoto measure for MCS; check out one of the Willett/Gillet publications.

If you are interested in publishing, I heavily recommend comparing against existing kernels and graph mining methods.
Since most algorithms in this area are freely available anyway, this should not be a problem. Well, this will also be the tough part, since you are really comparing algorithms. This is a big advantage over a huge number of QSAR papers where it is often not really clear what is being compared: the chemical expert system, or the transformation rule followed by mining?

Please ensure that your comparison is fair. You cannot compare black-box algorithms with OpenBabel/JOELib, since it is not clear whether differences in the mining results come from the chemical expert system or from the subsequent mining step, so any difference of up to ten percent is just within the standard deviation (a rough guess from my experience). This is exactly why you should contribute any code to OpenBabel, including the algorithms you test against.

Anyway, you have to dig into the code somehow, since you need "raw" access to the chemical graphs; only there can you create new, innovative, better and faster algorithms. If you just use the usual stuff you will get what all the others already have ... the precious Tanimoto similarity.

Joerg
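
P.S. To make the point about "the similarity depends on the coding" concrete, here is a minimal sketch in plain Python. It is not OpenBabel or JOELib code: the function names (`tanimoto`, `element_features`, `path_features`), the toy heavy-atom graphs and the choice to ignore bond orders and hydrogens are all assumptions made up for illustration. The path coding is enumerated with a DFS on an explicit stack, loosely in the spirit of path-based fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def element_features(atoms, bonds):
    """Coding 1 (hypothetical): just the set of elements present."""
    return set(atoms.values())


def path_features(atoms, bonds, max_len=3):
    """Coding 2 (hypothetical): linear paths of up to max_len atoms.

    Paths are enumerated by a depth-first search with an explicit stack.
    Bond orders are ignored to keep the sketch short.
    """
    adjacency = {i: [] for i in atoms}
    for u, v in bonds:
        adjacency[u].append(v)
        adjacency[v].append(u)

    features = set()
    for start in atoms:
        stack = [(start,)]            # each entry is a path of atom indices
        while stack:
            path = stack.pop()        # LIFO pop makes this a DFS
            labels = tuple(atoms[i] for i in path)
            # A path and its reverse are the same feature.
            features.add(min(labels, labels[::-1]))
            if len(path) < max_len:
                for neighbour in adjacency[path[-1]]:
                    if neighbour not in path:
                        stack.append(path + (neighbour,))
    return features


# Toy heavy-atom graphs: ethanol (C-C-O) and dimethyl ether (C-O-C).
ETHANOL = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
ETHER = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])

by_elements = tanimoto(element_features(*ETHANOL), element_features(*ETHER))
by_paths = tanimoto(path_features(*ETHANOL), path_features(*ETHER))

print("element coding:", by_elements)  # 1.0
print("path coding:   ", by_paths)     # 0.5
```

Same two graphs, same Tanimoto formula, and the answer moves from 1.0 to 0.5 just by changing the coding. Swapping `stack.pop()` for a pop from the front of a queue would visit the same paths in breadth-first order.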