From: Noel O'Boyle <noel.oboyle2@ma...> 2005-04-18 11:54:46

In fact, I was also using a subset of the NCI dataset, and I see that
this is a general problem with the NCI dataset (cf. Chemoinformatics:
Concepts, Methods, and Tools for Drug Discovery, Bajorath, Jürgen,
2004) - it contains a number of molecules with a large number of cycles.

Presumably, the number of cycles correlates with the M.W., so if I am
interested in drug-like molecules (which I was), I could just apply a
cutoff, maybe twice the rule-of-five value, 1000, which would be pretty
safe (just guessing here).

On the other hand, it is only necessary to calculate the fingerprint for
a given molecule once. Perhaps an SQL database of fingerprints for the
NCI dataset would be very useful. Better still (for me :), an n x n
matrix of Tanimoto values. Anybody interested in making this publicly
available somehow?

On Mon, 2005-04-18 at 12:33, Nina Jeliazkova wrote:
> Noel, all,
>
> I ran into the problem of slow fingerprints (and SMILES as well) some
> months ago, while playing with the NCI dataset. There are some
> molecules in this dataset which can run for two days.
>
> In fact the slow part is the AllRingsFinder class, and although the
> algorithm implemented for finding all rings is published, it is not
> very efficient in some cases. I could provide timing statistics for
> almost all of the NCI dataset if anybody is interested.
>
> A test I have developed is as follows:
>
> 1) calculate the spanning tree of the molecule (I would be glad to
>    contribute the code to CDK; I couldn't find spanning tree
>    functionality some months ago, haven't checked recently).
>    This is a classic and fast algorithm, so no problems with timing.
>
> 2) identify the number of cyclic bonds (this is straightforward from a
>    spanning tree)
>
> 3) identify the maximum bonds per atom
>
> 4) calling AllRingsFinder is safe for compounds with the number of
>    cyclic bonds less than about 37 (this is a heuristic!) and maximum
>    bonds per atom <= 4 (yes, there are some exotic structures within
>    the NCI dataset with more than 4 bonds per atom)
>
> This makes things safe (btw, some structures which could possibly go
> fast will be missed), but nevertheless it is just a workaround.
>
> The better solution is to have a flag inside the AllRingsFinder, so
> that if it is called in a thread, one just kills the thread if the
> allowed time is exhausted. Haven't tried this.
>
> If anybody is interested in code / statistics, please let me know.
>
> Regards,
> Nina
>
> Assoc. Prof. Dr. Nina Nikolova-Jeliazkova
>
> Institute for Parallel Processing
> Bulgarian Academy of Sciences
> 25a "acad. G.Bonchev" str.
> Sofia 1113
> Bulgaria
>
> Phone : +359 2 979 6616
> Mobile: +359 088 6802011
> Fax : +359 2 8707273
> http://luna.acad.bg/nina
>
> "SourceForge.net" <noreply@...> wrote:
>
> > Feature Requests item #1181323, was opened at 2005-04-12 09:26
> > Message generated for change (Tracker Item Submitted) made by Item Submitter
> > You can respond by visiting:
> > https://sourceforge.net/tracker/?func=detail&atid=370024&aid=1181323&group_id=20024
> >
> > Category: cdk.fingerprint
> > Group: None
> > Status: Open
> > Priority: 5
> > Submitted By: Noel O'Boyle (baoilleach)
> > Assigned to: Christoph Steinbeck (steinbeck)
> > Summary: Test for very slow fingerprints
> >
> > Initial Comment:
> > I have been calculating fingerprints for 3000
> > 'real-life' molecules, using the default settings for
> > the FingerPrinter class (which are not described in the
> > API JavaDoc - I think they probably should be). Most
> > molecules took a fraction of a second to calculate.
> > However, a couple of them took up to 8 hours.
> > This was due to a large number of subgraphs (I think).
> > Is there any way to guesstimate whether a particular
> > molecule will be very slow to FingerPrint, so that it
> > can be left out of a screen if desired? In the end, it
> > took around 4 days to calculate fingerprints for the
> > 3000 molecules. To be fair to FingerPrinter, the slow
> > molecules did not look very drug-like, but I would have
> > preferred to leave 6 molecules out and complete the
> > calculation in one hour, rather than include them, and
> > take 4 days.
> > If you are interested, I have attached one of the slow
> > molecules.
> >
> > Noel
> >
> > _______________________________________________
> > Cdk-devel mailing list
> > Cdk-devel@...
> > https://lists.sourceforge.net/lists/listinfo/cdk-devel
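
Nina's four-step pre-screen from the quoted message could be sketched
roughly as below. This is only an illustration in plain Java on a bare
atom/bond index list, not CDK code: the class name RingScreen and its
methods are hypothetical, and it does not call AllRingsFinder. It assumes
that a "cyclic bond" means any bond lying on at least one ring (i.e. not a
bridge), and it finds those with low-link values on a depth-first spanning
tree rather than with whatever spanning tree code Nina used. The two
thresholds (37 and 4) are the ones quoted in her message, where the first
is described as a heuristic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-screen to run before an expensive all-rings search. */
final class RingScreen {
    // Thresholds taken from the quoted message; 37 is described there as a heuristic.
    static final int MAX_CYCLIC_BONDS = 37;
    static final int MAX_BONDS_PER_ATOM = 4;

    private final List<List<Integer>> adj = new ArrayList<>();
    private final int bondCount;
    private int[] disc;   // DFS discovery time per atom (0 = not visited yet)
    private int[] low;    // lowest discovery time reachable from the atom's subtree
    private int clock;
    private int bridges;  // acyclic bonds found so far

    /** Atoms are numbered 0..atomCount-1; each bond is a pair {a, b}, listed once. */
    RingScreen(int atomCount, int[][] bonds) {
        for (int i = 0; i < atomCount; i++) adj.add(new ArrayList<>());
        for (int[] b : bonds) {
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
        bondCount = bonds.length;
    }

    /** Number of bonds that lie on at least one ring (all bonds minus bridges). */
    int cyclicBondCount() {
        int n = adj.size();
        disc = new int[n];
        low = new int[n];
        clock = 0;
        bridges = 0;
        for (int v = 0; v < n; v++) {
            if (disc[v] == 0) dfs(v, -1); // one spanning tree per fragment
        }
        return bondCount - bridges;
    }

    private void dfs(int v, int parent) {
        disc[v] = low[v] = ++clock;
        boolean parentSkipped = false;
        for (int w : adj.get(v)) {
            if (w == parent && !parentSkipped) { // do not walk back along the tree bond
                parentSkipped = true;
                continue;
            }
            if (disc[w] == 0) {
                dfs(w, v);
                low[v] = Math.min(low[v], low[w]);
                if (low[w] > disc[v]) bridges++; // nothing below w reaches v or above
            } else {
                low[v] = Math.min(low[v], disc[w]); // ring-closure bond
            }
        }
    }

    /** Largest number of bonds on any single atom. */
    int maxBondsPerAtom() {
        int max = 0;
        for (List<Integer> neighbours : adj) max = Math.max(max, neighbours.size());
        return max;
    }

    /** True if the molecule passes both thresholds from the message. */
    boolean safeForAllRings() {
        return cyclicBondCount() < MAX_CYCLIC_BONDS
                && maxBondsPerAtom() <= MAX_BONDS_PER_ATOM;
    }

    public static void main(String[] args) {
        // Toluene heavy atoms: a six-membered ring (0..5) plus a methyl carbon (6).
        int[][] bonds = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}, {0, 6}};
        RingScreen toluene = new RingScreen(7, bonds);
        System.out.println(toluene.cyclicBondCount()); // 6
        System.out.println(toluene.maxBondsPerAtom()); // 3
        System.out.println(toluene.safeForAllRings()); // true
    }
}
```

Molecules that fail safeForAllRings() would simply be left out of the
fingerprint run (or handled separately). As noted in the message, this
also rejects some structures that would in fact have finished quickly.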