From: Fabien F. <ffo...@im...> - 2004-10-04 16:02:57
There is a small piece of documentation in the file fingerprint.cpp. Here is the doc, pasted:

/**
\class fingerprint
\brief 2D Fingerprint class

To compute the fingerprint fpt for a molecule mol:
\code
fingerprint fpt(mol.GetTitle());
fpt.HashMol(mol);
\endcode

The fingerprint computation requires the execution of the following steps:
- Obtain the fragments
- Remove duplicate fragments and get a hash number for each fragment
- Obtain the fingerprint of the fragments

The fragments are obtained by means of a recursive algorithm. The algorithm finds all linear fragments up to a size of seven atoms. Cyclic fragments are also identified by checking whether there are ring closures in the linear fragment. However, the algorithm does not identify branched fragments if they are not part of a ring. For example, CC(C)C will have the fragments C, CC, CCC, and C1C(C)C1 will have the fragments C, CC, CCC, C1CC1, CC1CC1.

Duplicate fragments, i.e. fragments generated by the same atoms, are removed from the fragment list. Only the fragments that produce the lowest hash numbers are kept.

The hash number for a fragment is an integer generated from the atoms and bonds of the fragment. Its value depends on the position, the atomic number and the type (aromatic or not) of the atoms of the fragment, on the position and the type of the bonds of the fragment, and on the position and the type of the ring closures.

The fingerprint is a bitstring of 1021 usable bits. There are clearly many more possible fragments than 1021, so the hash number is often higher than 1021. A number lower than 1021 is easily obtained from the hash number by dividing it by 1021 and keeping the remainder; this remainder is then used to set the corresponding bit of the fingerprint to one. Actually, the bitstring is 1024 bits long, but we divide by 1021 because 1021 is a prime number, which produces better hashing than an even number such as 1024.
Consequently, the last three bits of the fingerprint are always set to zero.

To print the fingerprint in ASCII format, use the function printFingerprint.
*/

Peter Murray-Rust wrote:
> At 12:18 04/10/2004 +0200, Fabien Fontaine wrote:
>
>> Hi,
>>
>> I have implemented a new type of output for openbabel. It is -ofpt, which
>> stands for 2D fingerprint. For each molecule the output is two ASCII
>> lines. The first line contains the molecule name, e.g.:
>> >molname1
>> The second line contains the fingerprint, e.g.:
>> 000010001000....00000
>> The size of the fingerprint is 1024 bits; each bit set to one corresponds
>> to the presence of a 2D fragment. The fragments are linear or with
>> ring closures but not branched. The maximum fragment size is 7 atoms.
>>
>> I have also added a little tool called obtanimoto, which computes the
>> Tanimoto coefficient between a query molecule and a database of
>> fingerprints.
>>
>> I haven't applied these fingerprints yet, so any testing and comparison
>> with already existing fingerprints is welcome.
>
>
> This looks like being very useful for some of the things we want to do.
> Is there documentation as to how the fingerprints are generated? Are the
> 1024 fragments handcoded? or are there more and are they folded into a
> fixed length?
>
> P.
>
>
> Peter Murray-Rust
> Unilever Centre for Molecular Informatics
> Chemistry Department, Cambridge University
> Lensfield Road, CAMBRIDGE, CB2 1EW, UK
> Tel: +44-1223-763069
>
> _______________________________________________
> OpenBabel-discuss mailing list
> Ope...@li...
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

-- 
Fabien Fontaine
Engineer in Biotechnology
Computer-Assisted Drug Design Laboratory
Research Group on Biomedical Informatics (GRIB) - IMIM/UPF
Passeig Maritim de la Barceloneta, 37-49    Tel: +34 93 224 08 94
E-08003 Barcelona (Spain)                   Fax: +34 93 224 08 75
e-mail: ffo...@im...
http://www.imim.es/grib
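The fragment enumeration described in the pasted documentation (linear paths of up to seven atoms, ring closures detected inside the path, duplicates defined as fragments generated by the same atoms) can be sketched roughly as follows. This is a minimal illustration, not the fingerprint.cpp code: the Graph and Fragment types and the linearFragments function are hypothetical, atoms are plain adjacency lists, and element types, aromaticity and bond orders are ignored, so no hash numbers are computed here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical minimal molecular graph: g[i] lists the neighbours of atom i.
// Element types, aromaticity and bond orders are deliberately ignored.
using Graph = std::vector<std::vector<int>>;

// One fragment, identified (as in the pasted doc) by the atoms that generate it.
struct Fragment {
    std::set<int> atoms;  // atoms covered by the linear path
    int ringClosures;     // bonds between fragment atoms that the path does not use
};

// Recursive depth-first extension of a simple path; every prefix is a fragment.
static void extendPath(const Graph& g, std::vector<int>& path, std::size_t maxAtoms,
                       std::set<std::set<int>>& seen) {
    seen.insert(std::set<int>(path.begin(), path.end()));  // same atoms => duplicate
    if (path.size() == maxAtoms) return;
    for (int next : g[path.back()]) {
        bool onPath = false;
        for (int a : path)
            if (a == next) { onPath = true; break; }
        if (onPath) continue;  // revisiting an atom would be a ring closure, not a path step
        path.push_back(next);
        extendPath(g, path, maxAtoms, seen);
        path.pop_back();
    }
}

// Enumerate linear fragments of up to maxAtoms atoms (seven in the pasted doc).
std::vector<Fragment> linearFragments(const Graph& g, std::size_t maxAtoms = 7) {
    assert(maxAtoms >= 1);
    std::set<std::set<int>> seen;
    for (int start = 0; start < static_cast<int>(g.size()); ++start) {
        std::vector<int> path{start};
        extendPath(g, path, maxAtoms, seen);
    }
    std::vector<Fragment> out;
    for (const auto& atoms : seen) {
        // A path over n atoms uses n - 1 bonds; any further bond among
        // the same atoms is a ring closure.
        int bonds = 0;
        for (int a : atoms)
            for (int b : g[a])
                if (a < b && atoms.count(b)) ++bonds;
        out.push_back({atoms, bonds - static_cast<int>(atoms.size() - 1)});
    }
    return out;
}
```

On the two examples from the doc, isobutane (CC(C)C, adjacency {{1}, {0, 2, 3}, {1}, {1}}) yields 10 distinct atom sets of at most three atoms, none with a ring closure, which collapse to the labels C, CC and CCC. Methylcyclopropane (C1C(C)C1, adjacency {{1, 3}, {0, 2, 3}, {1}, {1, 0}}) yields 12, exactly two of which carry a ring closure, matching C1CC1 and CC1CC1.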
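The duplicate removal and the folding of hash numbers into the bitstring described in the doc can be illustrated the same way. This sketch assumes each candidate fragment already comes with a hash number; the real hash (built from atom positions, atomic numbers, aromaticity, bonds and ring closures) is not reproduced, and foldFragments and Candidate are hypothetical names. Because a remainder modulo 1021 lies between 0 and 1020, bits 1021 to 1023 of the 1024-bit std::bitset can never be set.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// 1024-bit fingerprint; only bits 0..1020 are reachable through the modulus.
using Fingerprint = std::bitset<1024>;
constexpr std::uint32_t kModulus = 1021;  // prime, as the doc recommends over 1024

// Hypothetical input: (atoms that generate the fragment, its hash number).
using Candidate = std::pair<std::set<int>, std::uint32_t>;

Fingerprint foldFragments(const std::vector<Candidate>& candidates) {
    // Duplicates are fragments generated by the same atoms; keep the lowest hash.
    std::map<std::set<int>, std::uint32_t> lowest;
    for (const auto& c : candidates) {
        auto it = lowest.find(c.first);
        if (it == lowest.end())
            lowest.emplace(c.first, c.second);
        else if (c.second < it->second)
            it->second = c.second;
    }
    // Fold each surviving hash into the bitstring via its remainder modulo 1021.
    Fingerprint fp;
    for (const auto& entry : lowest)
        fp.set(entry.second % kModulus);
    return fp;
}
```

For example, two candidates on the same atoms with hashes 2047 and 1021 keep only 1021, which sets bit 0 (2047 would have set bit 5), and a hash of 1020 sets the highest reachable bit.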
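The two-line ASCII output described in the quoted announcement (a line with ">" and the molecule name, then one character per bit) could be written and read back roughly like this. writeRecord and readRecord are hypothetical helpers, not printFingerprint, and the bit order (bit 0 written first) is an assumption: the thread does not say which end of the bitstring comes first.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

using Fingerprint = std::bitset<1024>;

// Write one record: ">" + molecule name, then one '0'/'1' character per bit.
// Assumption: bit 0 comes first (std::bitset::to_string would put bit 1023 first).
std::string writeRecord(const std::string& name, const Fingerprint& fp) {
    std::string bits(fp.size(), '0');
    for (std::size_t i = 0; i < fp.size(); ++i)
        if (fp.test(i)) bits[i] = '1';
    return ">" + name + "\n" + bits + "\n";
}

// Read one record back, e.g. before comparing a query against a database.
Fingerprint readRecord(std::istream& in, std::string& name) {
    std::string header, bits;
    if (!std::getline(in, header) || !std::getline(in, bits) ||
        header.empty() || header[0] != '>' || bits.size() != Fingerprint().size())
        throw std::runtime_error("malformed fingerprint record");
    name = header.substr(1);
    Fingerprint fp;
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i] == '1') fp.set(i);
    return fp;
}
```

A fingerprint with bits 0 and 4 set would be written as ">molname1" followed by a 1024-character line starting "10001", and reading that record back returns the same name and bitset.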
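For the comparison step that obtanimoto performs, the usual Tanimoto coefficient for bitstrings is c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both. The function below is a minimal sketch of that formula, not the obtanimoto source; returning 0.0 for two empty fingerprints is an arbitrary choice made here.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Tanimoto coefficient c / (a + b - c) for two N-bit fingerprints.
template <std::size_t N>
double tanimoto(const std::bitset<N>& x, const std::bitset<N>& y) {
    const std::size_t common = (x & y).count();  // c: bits set in both
    const std::size_t either = (x | y).count();  // equals a + b - c
    // Assumption: two empty fingerprints are treated as dissimilar (0.0).
    return either == 0 ? 0.0 : static_cast<double>(common) / static_cast<double>(either);
}
```

For fingerprints with bits {0, 1, 2} and {1, 2, 3} set, c = 2 and a + b - c = 4, giving 0.5. Counting the bitwise OR gives the denominator directly, since the size of the union is exactly a + b - c.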