From: Nina Jeliazkova <nina@ac...> - 2005-04-18 14:44:53

Hi Chris,

Christoph Steinbeck <c.steinbeck@...> wrote:
> cdk.AllRingsFinder has two problems:
> 1. The algorithm might be slow by design in some cases.

Yes, it is, but finding all possible rings IS a hard problem.

> 2. My implementation might be suboptimal.

Haven't checked :)

> We can ignore point 2 for now :), because clearly the combinatorial
> problem underlying point 1 will catch us anyway above a certain number
> of rings in the molecule.

Exactly.

> In the case of the cdk.Fingerprinter, AllRingsFinder is used as a
> preprocessing step to aromaticity detection.
> Marking bonds as aromatic clearly is a crucial step in fingerprint
> calculation. Since we are thus not interested in all rings but only
> those that are likely to be aromatic, we might be able to come up with a
> better solution, which does it without AllRingsFinder.

By the way, some algorithms look for the "relevant" cycles in the graph. I found this one particularly appealing in theory (and it is used in a chemistry context):
http://www.combinatorics.org/Volume_4/PostScriptfiles/v4i1r9.ps

> > If anybody interested in code / statistics, please let me know.
>
> Oh, yes, please. We would be very interested in that.
> It could go into a theory manual, which we urgently need to start.

Yes, I've already agreed to write a summary to be included in the next CDK News.

Best,
Nina
From: SourceForge.net <noreply@so...> - 2005-04-12 08:26:16

Feature Requests item #1181323, was opened at 2005-04-12 09:26
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=370024&aid=1181323&group_id=20024

Category: cdk.fingerprint
Group: None
Status: Open
Priority: 5
Submitted By: Noel O'Boyle (baoilleach)
Assigned to: Christoph Steinbeck (steinbeck)
Summary: Test for very slow fingerprints

Initial Comment:
I have been calculating fingerprints for 3000 'real-life' molecules, using the default settings for the FingerPrinter class (which are not described in the API JavaDoc - I think they probably should be). Most molecules took a fraction of a second to calculate. However, a couple of them took up to 8 hours. This was due to a large number of subgraphs (I think).

Is there any way to guesstimate whether a particular molecule will be very slow to FingerPrint, so that it can be left out of a screen if desired? In the end, it took around 4 days to calculate fingerprints for the 3000 molecules. To be fair to FingerPrinter, the slow molecules did not look very drug-like, but I would have preferred to leave 6 molecules out and complete the calculation in one hour, rather than include them and take 4 days.

If you are interested, I have attached one of the slow molecules.

Noel
From: Nina Jeliazkova <nina@ac...> - 2005-04-18 11:33:51

Noel, all,

I ran into the problem of slow fingerprints (and SMILES as well) some months ago, while playing with the NCI dataset. Some molecules in this dataset can run for two days.

In fact the slow part is the AllRingsFinder class. Although the algorithm implemented for finding all rings is published, it is not very efficient in some cases. I could provide timing statistics for almost all of the NCI dataset if anybody is interested.

A test I had developed is as follows:

1) Calculate the spanning tree of the molecule (I would be glad to contribute the code to CDK; I couldn't find spanning tree functionality some months ago, haven't checked recently). This is a classic and fast algorithm, so no problems with timing.

2) Identify the number of cyclic bonds (this is straightforward from a spanning tree).

3) Identify the maximum number of bonds per atom.

4) Calling AllRingsFinder is safe for compounds with fewer than about 37 cyclic bonds (this is a heuristic!) and a maximum of <= 4 bonds per atom (yes, there are some exotic structures within the NCI dataset with more than 4 bonds per atom).

This makes things safe (by the way, some structures which could possibly go fast will be missed), but nevertheless it is just a workaround.

The better solution is to have a flag inside AllRingsFinder, so that if it is called in a thread, one just kills the thread when the allowed time is exhausted. Haven't tried this.

If anybody is interested in code / statistics, please let me know.

Regards,
Nina

Assoc. Prof. Dr. Nina Nikolova-Jeliazkova
Institute for Parallel Processing
Bulgarian Academy of Sciences
25a "acad. G.Bonchev" str.
Sofia 1113
Bulgaria
Phone : +359 2 979 6616
Mobile: +359 088 6802011
Fax : +359 2 8707273
http://luna.acad.bg/nina
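The four-step screening test described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Nina's code and not a CDK API: the class `RingScreen`, its method names, and the representation of a molecule as an atom count plus an array of bonded atom-index pairs are all hypothetical. The thresholds (fewer than 37 cyclic bonds, at most 4 bonds per atom) are the heuristic values quoted in the message. A bond is counted as cyclic when it is a non-tree edge or lies on the tree path closed by one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical pre-screen (not part of CDK). Atoms are 0..n-1, bonds are index pairs. */
final class RingScreen {
    static final int MAX_CYCLIC_BONDS = 37;  // heuristic value quoted in the thread
    static final int MAX_BONDS_PER_ATOM = 4; // heuristic value quoted in the thread

    /** Steps 1 and 2: build a BFS spanning forest, then count bonds lying on a cycle. */
    static int cyclicBondCount(int atomCount, int[][] bonds) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) adj.add(new ArrayList<>());
        for (int b = 0; b < bonds.length; b++) {
            adj.get(bonds[b][0]).add(new int[] {bonds[b][1], b});
            adj.get(bonds[b][1]).add(new int[] {bonds[b][0], b});
        }
        int[] parent = new int[atomCount];
        int[] parentBond = new int[atomCount];
        int[] depth = new int[atomCount];
        boolean[] seen = new boolean[atomCount];
        boolean[] treeBond = new boolean[bonds.length];
        for (int root = 0; root < atomCount; root++) {
            if (seen[root]) continue;
            seen[root] = true;
            parent[root] = -1;
            parentBond[root] = -1;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(root);
            while (!queue.isEmpty()) {
                int a = queue.poll();
                for (int[] nb : adj.get(a)) {
                    if (seen[nb[0]]) continue;
                    seen[nb[0]] = true;
                    parent[nb[0]] = a;
                    parentBond[nb[0]] = nb[1];
                    depth[nb[0]] = depth[a] + 1;
                    treeBond[nb[1]] = true;
                    queue.add(nb[0]);
                }
            }
        }
        // Every non-tree bond closes one cycle: mark it and the tree path between its ends.
        boolean[] cyclic = new boolean[bonds.length];
        for (int b = 0; b < bonds.length; b++) {
            if (treeBond[b]) continue;
            cyclic[b] = true;
            int u = bonds[b][0], v = bonds[b][1];
            while (u != v) {
                if (depth[u] >= depth[v]) { cyclic[parentBond[u]] = true; u = parent[u]; }
                else { cyclic[parentBond[v]] = true; v = parent[v]; }
            }
        }
        int count = 0;
        for (boolean c : cyclic) if (c) count++;
        return count;
    }

    /** Step 3: the largest number of bonds attached to any single atom. */
    static int maxBondsPerAtom(int atomCount, int[][] bonds) {
        int[] degree = new int[atomCount];
        int max = 0;
        for (int[] bond : bonds) {
            max = Math.max(max, ++degree[bond[0]]);
            max = Math.max(max, ++degree[bond[1]]);
        }
        return max;
    }

    /** Step 4: the heuristic decision whether an exhaustive ring search looks safe. */
    static boolean isSafeForAllRings(int atomCount, int[][] bonds) {
        return cyclicBondCount(atomCount, bonds) < MAX_CYCLIC_BONDS
                && maxBondsPerAtom(atomCount, bonds) <= MAX_BONDS_PER_ATOM;
    }
}
```

For a toluene-like graph (a six-membered ring with one substituent) the sketch reports 6 cyclic bonds and a maximum of 3 bonds per atom, so the exhaustive search would be allowed. As the message notes, such a screen is conservative: some molecules it rejects might still have run quickly.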
From: Noel O'Boyle <noel.oboyle2@ma...> - 2005-04-18 11:54:46

In fact, I was also using a subset of the NCI dataset, and I see that this is a general problem with the NCI dataset (cf. Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery, Bajorath, Jürgen, 2004) - it contains a number of molecules with a large number of cycles.

Presumably, the number of cycles correlates with the M.W., so if I am interested in drug-like molecules (which I was), I could just apply a cutoff, maybe twice the rule-of-five value, i.e. 1000, which would be pretty safe (just guessing here).

On the other hand, it is only necessary to calculate the fingerprint for a given molecule once. Perhaps an SQL database of fingerprints for the NCI dataset would be very useful. Better still (for me :), an n x n matrix of Tanimoto values. Anybody interested in making this publicly available somehow?
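The n x n matrix of Tanimoto values suggested above could be computed along these lines once the fingerprints have been calculated and stored. This is a minimal sketch that assumes each fingerprint is available as a `java.util.BitSet`; the class and method names are hypothetical, and treating two empty fingerprints as identical (similarity 1.0) is a convention chosen here, not something stated in the thread.

```java
import java.util.BitSet;

/** Hypothetical helper: Tanimoto similarity over bit-string fingerprints. */
final class TanimotoSketch {

    /** Tanimoto coefficient: bits set in both, divided by bits set in either. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Assumption: two all-zero fingerprints count as identical.
        return union == 0 ? 1.0 : (double) both.cardinality() / union;
    }

    /** Symmetric n x n similarity matrix; each pair is computed only once. */
    static double[][] matrix(BitSet[] fingerprints) {
        int n = fingerprints.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            m[i][i] = 1.0;
            for (int j = i + 1; j < n; j++) {
                m[i][j] = tanimoto(fingerprints[i], fingerprints[j]);
                m[j][i] = m[i][j];
            }
        }
        return m;
    }
}
```

The matrix grows quadratically with the number of molecules, so for a dataset the size of NCI it may be more practical to store the fingerprints once and compute similarities on demand.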
From: Egon Willighagen <e.willighagen@sc...> - 2005-04-18 12:37:07

On Monday 18 April 2005 01:33 pm, Nina Jeliazkova wrote:
> 1) calculate the spanning tree of the molecule (I would be glad to
> contribute the code to CDK, I couldn't find spanning tree functionality
> some months ago, haven't checked recently).
> This is a classic and fast algorithm, so no problems with timing.

Nina,

there seems to be a spanning tree built in cdk.ringsearch.cyclebasis.SimpleCycleBasis.createMinimumCycleBasis().

Ulrich, is that true? If so, the minimum cycle basis is not the same (correct?). Would it be possible to split out the calculation of the spanning tree from the above method?

Egon
From: Ulrich Bauer <baueru@cs...> - 2005-04-18 18:26:52

On 18.04.2005, at 14:33, Egon Willighagen wrote:
> On Monday 18 April 2005 01:33 pm, Nina Jeliazkova wrote:
>> 1) calculate the spanning tree of the molecule (I would be glad to
>> contribute the code to CDK, I couldn't find spanning tree functionality
>> some months ago, haven't checked recently).
>> This is a classic and fast algorithm, so no problems with timing.
>
> Nina,
>
> there seems to be a spanning tree built in
> cdk.ringsearch.cyclebasis.SimpleCycleBasis.createMinimumCycleBasis().
>
> Ulrich, is that true?
> If so, the minimum cycle basis is not the same (correct?). Would it be
> possible to split out the calculation of the spanning tree from the
> above method?
>
> Egon

Yes, this is true; there is a calculation of a spanning tree, and this is not the same as a minimum cycle basis. The calculation of the spanning tree is very simple, but I am using the graph data structures from the JGraphT library. In which representation do you need/use the spanning tree?

Best regards,
Ulrich Bauer
From: Christoph Steinbeck <c.steinbeck@un...> - 2005-04-18 14:25:10

Hi Nina, and others,

thanks a lot for this nice analysis.

cdk.AllRingsFinder has two problems:
1. The algorithm might be slow by design in some cases.
2. My implementation might be suboptimal.

We can ignore point 2 for now :), because clearly the combinatorial problem underlying point 1 will catch us anyway above a certain number of rings in the molecule.

In the case of the cdk.Fingerprinter, AllRingsFinder is used as a preprocessing step to aromaticity detection. Marking bonds as aromatic clearly is a crucial step in fingerprint calculation. Since we are thus not interested in all rings but only those that are likely to be aromatic, we might be able to come up with a better solution, which does it without AllRingsFinder.

One idea that we recently had was to compute the SmallestSetOfSmallestRings (SSSR), which is fast even for C60, and to produce the SetOfAllRings by a linear combination of the SSSR rings. Clearly, this will not help in all cases, especially when the whole ring system is aromatic. Then again, doing the aromaticity detection just based on the SSSR will work.

Ultimately, I think, we cannot do without a decent heuristic, where we evaluate some statistics on the number of expected aromatic rings vs. the number of all rings, and then apply the best-suited ring detection algorithm.

Some of you might have noticed that virtually no literature is available on the aromaticity perception problem. I just found this:

(1) Roos-Kozel, B. L.; Jorgensen, W. L. Computer-Assisted Mechanistic Evaluation of Organic Reactions. 2. Perception of Rings, Aromaticity, and Tautomers. Journal of Chemical Information and Computer Sciences 1981, 21, 101-111.

and I will try to get a copy and see what they did.

> If anybody interested in code / statistics, please let me know.

Oh, yes, please. We would be very interested in that. It could go into a theory manual, which we urgently need to start.

Cheers,
Chris

Priv. Doz. Dr. Christoph Steinbeck (c.steinbeck@...)
Head of the Research Group for Molecular Informatics
Cologne University BioInformatics Center (http://www.cubic.uni-koeln.de)
Zülpicher Str. 47, 50674 Cologne
Tel: +49(0)2214707426
Fax: +49 (0) 2214707786

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..
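The idea above of producing further rings by "linear combination" of SSSR rings can be illustrated with a small sketch. It assumes each ring is stored as a `java.util.BitSet` over bond indices, so that combining rings is a bitwise XOR (addition over GF(2)); the class and method names are hypothetical and this is not how CDK implements it. Note that an XOR of basis rings is not always a single ring (for rings that share no bond it is simply both rings together), so a real implementation would need to filter the results.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: rings as bond-index bit sets, combined by XOR. */
final class RingCombination {

    /** Builds a ring from the indices of the bonds it contains. */
    static BitSet ringOf(int... bondIndices) {
        BitSet ring = new BitSet();
        for (int b : bondIndices) ring.set(b);
        return ring;
    }

    /**
     * XORs every non-empty subset of the basis rings. A basis of k rings
     * yields up to 2^k - 1 bond sets, which is the combinatorial growth
     * discussed in the thread; the size guard keeps the demo bounded.
     */
    static Set<BitSet> allCombinations(List<BitSet> basis) {
        if (basis.size() > 20) throw new IllegalArgumentException("basis too large for this sketch");
        Set<BitSet> result = new LinkedHashSet<>();
        for (int mask = 1; mask < (1 << basis.size()); mask++) {
            BitSet combined = new BitSet();
            for (int i = 0; i < basis.size(); i++) {
                if ((mask & (1 << i)) != 0) combined.xor(basis.get(i));
            }
            if (!combined.isEmpty()) result.add(combined);
        }
        return result;
    }
}
```

For a naphthalene-like system with bonds numbered 0 to 10, where the two six-membered rings are bonds 0-5 and bonds 5-10 and share bond 5, the three combinations are the two basis rings and their XOR, the ten-bond outer perimeter.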
From: Peter Murray-Rust <pm286@ca...> - 2005-04-18 19:00:49

At 15:44 18/04/2005, Nina Jeliazkova wrote:
>Hi Chris,
>
>Christoph Steinbeck <c.steinbeck@...> wrote:
>
> > Hi Nina, and others,
> >
> > thanks a lot for this nice analysis.
> >
> > cdk.AllRingsFinder has two problems:
> > 1. The algorithm might be slow by design in some cases.
>
>yes, it is, but finding all possible rings IS a hard problem

I'm sure you're already aware of my concern about generating structure diagrams ("2D layout"). It's critical to have an abort from this after a few seconds at most. It is (I think) fairly quick to find a "FairlySmallSetOfFairlySmallRings" and this would be quite satisfactory for layout. Alternatively we should have a timeout after a given number of cycles.

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +441223763069
Fax: +44 1223 763076
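The abort-after-a-few-seconds behaviour asked for here, and the "call it in a thread and stop it when time runs out" suggestion made earlier in the thread, might look roughly like this using the standard `java.util.concurrent` classes. It is a sketch only: the class `TimeLimited` and its method are hypothetical, and nothing here is part of CDK. One caveat is that Java cannot forcibly kill a running thread, so cancellation only interrupts it. A long ring search would have to check `Thread.interrupted()` periodically, which corresponds to the "flag inside AllRingsFinder" mentioned above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper: run a task, give up and return a fallback after a time limit. */
final class TimeLimited {

    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit, T fallback) {
        // Daemon thread, so a task that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "time-limited-task");
            thread.setDaemon(true);
            return thread;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the task must cooperate
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller could wrap the exhaustive ring search in such a call and fall back to a cheaper ring set, or skip the molecule, when the limit is hit.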