
#755 inconsistent fingerprint generation

Status: open
Owner: nobody
Priority: 5
Updated: 2012-10-23
Created: 2011-10-13
Creator: Chris
Private: No

I have run into a problem with certain SMILES strings when building an fs (fastsearch) index and then probing against that index.

Consider the file test.smi, which contains a single molecule:
c1cccc(c1[N+](=O)[O-])Sc1ccc(cc1)C chem001

Build an fs index:
$ babel -ismi test.smi -ofs test.fs
The index builds correctly (1 molecule converted).

Probing the fs index with exactly the same SMILES:
$ babel test.fs hitlist.smi -s'c1cccc(c1[N+](=O)[O-])Sc1ccc(cc1)C' -at0.85
reports 0 molecules converted.

Rerunning the same probe with a lower Tanimoto threshold gives a hit:
$ babel test.fs hitlist.smi -s'c1cccc(c1[N+](=O)[O-])Sc1ccc(cc1)C' -at0.80
reports 1 molecule converted.
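For reference, the Tanimoto coefficient of two bit fingerprints is the number of bits set in both divided by the number of bits set in either, so a molecule compared with itself should score 1.0 and pass any threshold up to 1. The Python sketch below is not Open Babel code: holding fingerprints as Python ints, the 10-bit example values and the `>=` threshold test are all assumptions. It shows how a stored fingerprint that differed from the query's by two of ten bits would score 0.8, which matches the pattern above (hit at 0.80, no hit at 0.85). It illustrates one possible explanation, not a confirmed diagnosis.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints held as Python int bitsets."""
    both = bin(fp_a & fp_b).count("1")    # bits set in both fingerprints
    either = bin(fp_a | fp_b).count("1")  # bits set in at least one
    # Two all-zero fingerprints are treated as 0.0 (a convention of this sketch).
    return both / either if either else 0.0


def similarity_hits(query: int, database: dict, threshold: float) -> list:
    """Titles of entries scoring at or above the threshold (>= is an assumption)."""
    return [title for title, fp in database.items() if tanimoto(query, fp) >= threshold]


if __name__ == "__main__":
    query_fp = 0b1111111111   # hypothetical 10-bit fingerprint of the probe
    stored_fp = 0b1111111100  # hypothetical stored copy missing two of those bits
    db = {"chem001": stored_fp}
    print(tanimoto(query_fp, query_fp))         # 1.0
    print(tanimoto(query_fp, stored_fp))        # 0.8
    print(similarity_hits(query_fp, db, 0.85))  # []
    print(similarity_hits(query_fp, db, 0.80))  # ['chem001']
```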

Comparing the two molecules directly works:
$ obabel -:'c1cccc(c1[N+](=O)[O-])Sc1ccc(cc1)C' -:'c1cccc(c1[N+](=O)[O-])Sc1ccc(cc1)C' -ofpt
gives a Tanimoto coefficient of 1.

Probing the original .smi file directly also works:
$ babel test.smi hitlist.smi -s'c1cccc(c1[N+](=O)[O-])Sc1ccc(cc1)C' -at0.85
reports 1 molecule converted.

However, in my tests with ~85 probe molecules against a 160,000-entry database, this last method (searching the .smi file directly) does not always find the probe either. Moreover, the molecules that fail when probing the fs index are not the same molecules that fail when probing the .smi file directly.
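The ~85-probe experiment is essentially a self-retrieval test run against two search methods. A minimal Python sketch of that bookkeeping follows. `self_retrieval_failures`, `compare_failures` and the `search(smiles, threshold)` callable are hypothetical names; in practice `search` would wrap one of the babel commands above and return the set of hit titles. The sketch also assumes each probe's id equals the title of its database entry (as `chem001` does in test.smi).

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical signature: search(smiles, threshold) -> set of hit titles.
SearchFn = Callable[[str, float], Set[str]]


def self_retrieval_failures(
    probes: Dict[str, str], search: SearchFn, threshold: float = 0.85
) -> List[str]:
    """Return ids of probes whose own database entry is not among the hits.

    Assumes each probe id is identical to the title of its database entry.
    """
    failures = []
    for probe_id, smiles in probes.items():
        if probe_id not in search(smiles, threshold):
            failures.append(probe_id)
    return failures


def compare_failures(first: Iterable[str], second: Iterable[str]) -> Dict[str, List[str]]:
    """Split two failure lists into probes failing in only one method or in both."""
    a, b = set(first), set(second)
    return {
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        "both": sorted(a & b),
    }


if __name__ == "__main__":
    # Toy stand-ins for the two search methods (made-up data, not real results).
    probes = {"a": "C", "b": "CC", "c": "CCC"}
    fs_hits = {"C": {"a"}, "CC": set(), "CCC": {"c"}}
    smi_hits = {"C": {"a"}, "CC": {"b"}, "CCC": set()}
    fs_fail = self_retrieval_failures(probes, lambda s, t: fs_hits[s])
    smi_fail = self_retrieval_failures(probes, lambda s, t: smi_hits[s])
    print(compare_failures(fs_fail, smi_fail))
```

Listing the probes that fail under only one method makes it easy to confirm whether the two failure sets overlap at all.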

My OS is Mac OS X 10.6.8.
The problem reproduces with both Open Babel 2.3.0 and 2.3.9 (development version).

Christopher Mayne
cgmayne@gmail.com

Discussion