From: Egon W. <e.w...@sc...> - 2006-01-19 18:56:33
|
Hi Uli/Rajarshi,

I'm glad to see that the test set has been added to CVS, and that new results get uploaded.

I saw that Rajarshi removed a few molecules that could not be converted into 3D models with Corina. So, I guess these (how many are there?) should be replaced by new molecules. Uli, can you create a list of, say, 100 backups?

BTW, I promised to report the list of ZINC ids to the ZINC-developers, so that they know which molecules we used.

Egon

--
e.w...@sc...
PhD student on Molecular Representation in Chemometrics
Radboud University Nijmegen
Blog: http://chem-bla-ics.blogspot.com/
http://www.cac.science.ru.nl/people/egonw/
GPG: 1024D/D6336BA6
|
From: Rajarshi G. <rx...@ps...> - 2006-01-19 19:07:52
|
On Thu, 2006-01-19 at 19:55 +0100, Egon Willighagen wrote:
> Hi Uli/Rajarshi,
>
> I'm glad to see that the test set has been added to CVS, and that new results
> get uploaded.
>
> I saw that Rajarshi removed a few molecules, that could not be converted into
> 3D models with Corina. So, I guess these (how many are there?)

4 got dropped (serials 9, 253, 879, 963 from the original set that Uli sent me).

> should be
> replaced by new molecules. Uli, can you create a list of say, 100 back ups?
>
> BTW, I promised to report the list of ZINC ids to the ZINC-developers, so that
> they know which molecules we used.

projects/050501-0001/zinc_ids.txt in CVS contains the IDs of the 996 molecules that remain.

BTW, it looks like I can't (reliably) use ADAPT to generate comparison data for some descriptors because it won't handle charged species. I recall that DRAGON calculated a number of ADAPT descriptors, but I don't have access to it. I've placed the CDK-generated data for 3 descriptors in CVS.

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
A sine curve goes off to infinity, or at least the end of the blackboard.
-- Prof. Steiner
|
From: Egon W. <e.w...@sc...> - 2006-01-19 19:12:16
|
On Thursday 19 January 2006 20:08, Rajarshi Guha wrote:
> On Thu, 2006-01-19 at 19:55 +0100, Egon Willighagen wrote:
> > I'm glad to see that the test set has been added to CVS, and that new
> > results get uploaded.
> >
> > I saw that Rajarshi removed a few molecules, that could not be converted
> > into 3D models with Corina. So, I guess these (how many are there?)
>
> 4 got dropped (serials 9, 253, 879, 963 from the original set that Uli
> sent me)
>
> > should be
> > replaced by new molecules. Uli, can you create a list of say, 100 back
> > ups?
> >
> > BTW, I promised to report the list of ZINC ids to the ZINC-developers, so
> > that they know which molecules we used.
>
> projects/050501-0001/zinc_ids.txt in CVS contains the ID's of the 996
> molecules that remain
>
> BTW, looks like I can't (reliably) use ADAPT to generate comparison data
> for some descriptors because it won't handle charged species. I recall
> that DRAGON calculated a number of ADAPT descriptors but I don't have
> access to it. I've placed the CDK generated data for 3 descriptors in
> the CVS

Are there many charged species? Or should we make a third alternative, one with hydrogens but without charges?

BTW, two weeks ago I learned that Dragon has trouble with molecules with more than 300 atoms :) How does this work with our test data set?

Egon

--
e.w...@sc...
PhD student on Molecular Representation in Chemometrics
Radboud University Nijmegen
Blog: http://chem-bla-ics.blogspot.com/
http://www.cac.science.ru.nl/people/egonw/
GPG: 1024D/D6336BA6
|
From: Rajarshi G. <rx...@ps...> - 2006-01-19 19:20:28
|
On Thu, 2006-01-19 at 20:12 +0100, Egon Willighagen wrote:
> On Thursday 19 January 2006 20:08, Rajarshi Guha wrote:
> > On Thu, 2006-01-19 at 19:55 +0100, Egon Willighagen wrote:
> > > I'm glad to see that the test set has been added to CVS, and that new
> > > results get uploaded.
> > >
> > > I saw that Rajarshi removed a few molecules, that could not be converted
> > > into 3D models with Corina. So, I guess these (how many are there?)
> >
> > 4 got dropped (serials 9, 253, 879, 963 from the original set that Uli
> > sent me)
> >
> > > should be
> > > replaced by new molecules. Uli, can you create a list of say, 100 back
> > > ups?
> > >
> > > BTW, I promised to report the list of ZINC ids to the ZINC-developers, so
> > > that they know which molecules we used.
> >
> > projects/050501-0001/zinc_ids.txt in CVS contains the ID's of the 996
> > molecules that remain
> >
> > BTW, looks like I can't (reliably) use ADAPT to generate comparison data
> > for some descriptors because it won't handle charged species. I recall
> > that DRAGON calculated a number of ADAPT descriptors but I don't have
> > access to it. I've placed the CDK generated data for 3 descriptors in
> > the CVS
>
> Are there many charged species?

607 have an M CHG entry.

> Or we should make a third alternative, one
> with hydrogens, but without charges.

Might be a good idea, though if DRAGON can handle charged species I don't think we need to bother. I only faced this problem because ADAPT is *old* and nobody has fiddled with the internal data structures for 10-15 years.

> BTW, two weeks ago a learned that Dragon has trouble with molecules with more
> than 300 atoms :) How does this work with our test data set?

We're good on atom count: the max value is 81. I've attached histograms of MW and atom count.

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
All theoretical chemistry is really physics; and all theoretical chemists know it.
-- Richard P. Feynman
|
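The two figures quoted above (how many records carry an M CHG entry, and the largest atom count) could be reproduced along the following lines. This is only a sketch: it assumes the test set is stored as a V2000 SD file, and the function name `sdf_stats` and the two sample records are made up for illustration, not taken from the CVS project.

```python
def sdf_stats(sdf_text):
    """Return (n_records, n_charged, max_atoms) for V2000 SD file text.

    Assumes each record ends with a '$$$$' line and that the counts line
    is the 4th line of every record (V2000 molfile layout).
    """
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # last record without a terminator
        records.append(current)

    n_records = n_charged = max_atoms = 0
    for rec in records:
        if len(rec) < 4:
            continue  # too short to hold the header and counts line
        n_records += 1
        # The first three columns of the counts line hold the atom count.
        max_atoms = max(max_atoms, int(rec[3][:3]))
        # Charges are listed on property lines starting with 'M  CHG'.
        if any(l.startswith("M  CHG") for l in rec):
            n_charged += 1
    return n_records, n_charged, max_atoms


# Two made-up records: a charged two-atom fragment and a neutral three-atom one.
SAMPLE = "\n".join([
    "charged_example", "  sketch", "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  CHG  1   1   1",
    "M  END",
    "$$$$",
    "neutral_example", "  sketch", "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
    "$$$$",
])

print(sdf_stats(SAMPLE))  # (2, 1, 3)
```

Note that the atom count read this way only includes atoms written in the file, so it depends on whether hydrogens are explicit in the data set.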
From: Uli F. <u.f...@ch...> - 2006-01-19 20:02:53
|
I will provide four datasets that each contain 1000 structures (with hydrogens and with charges, w/o hydrogens and w/ charges, w/ hydrogens and w/o charges, w/o hydrogens and w/o charges). Unfortunately, this will take some days, because I am away from Friday to Monday. This will delay the writing of the two CDKNews articles that rely on the descriptor QA dataset. I hope that I can finish by the end of next week.

Rajarshi, could you please tell me the ZINC identifiers of the molecules Corina was not able to process?

Uli

Rajarshi Guha wrote:
> On Thu, 2006-01-19 at 20:12 +0100, Egon Willighagen wrote:
>> On Thursday 19 January 2006 20:08, Rajarshi Guha wrote:
>>> On Thu, 2006-01-19 at 19:55 +0100, Egon Willighagen wrote:
>>>> I'm glad to see that the test set has been added to CVS, and that new
>>>> results get uploaded.
>>>>
>>>> I saw that Rajarshi removed a few molecules, that could not be converted
>>>> into 3D models with Corina. So, I guess these (how many are there?)
>>> 4 got dropped (serials 9, 253, 879, 963 from the original set that Uli
>>> sent me)
>>>
>>>> should be
>>>> replaced by new molecules. Uli, can you create a list of say, 100 back
>>>> ups?
>>>>
>>>> BTW, I promised to report the list of ZINC ids to the ZINC-developers, so
>>>> that they know which molecules we used.
>>> projects/050501-0001/zinc_ids.txt in CVS contains the ID's of the 996
>>> molecules that remain
>>>
>>> BTW, looks like I can't (reliably) use ADAPT to generate comparison data
>>> for some descriptors because it won't handle charged species. I recall
>>> that DRAGON calculated a number of ADAPT descriptors but I don't have
>>> access to it. I've placed the CDK generated data for 3 descriptors in
>>> the CVS
>> Are there many charged species?
>
> 607 have a M CHG entry
>
>> Or we should make a third alternative, one
>> with hydrogens, but without charges.
>
> Might be a good idea - though if DRAGON can handle charged species I
> don't think we need bother. I only faced this problem because ADAPT is
> *old* and nobody has fiddled with the internal data structures for 10-15
> years
>
>> BTW, two weeks ago a learned that Dragon has trouble with molecules with more
>> than 300 atoms :) How does this work with our test data set?
>
> We're good on atom count, max value is 81. I've attached histograms of
> MW and atom count
>
> -------------------------------------------------------------------
> Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
> -------------------------------------------------------------------
> All theoretical chemistry is really physics; and all theoretical
> chemists know it.
> -- Richard P. Feynman
|
From: Rajarshi G. <rx...@ps...> - 2006-01-19 21:46:54
|
On Thu, 2006-01-19 at 21:05 +0100, Uli Fechner wrote:
> Rajarshi, could you please tell me the ZINC identifiers of the molecules
> corina was not able to process?

ZINC00033713
ZINC00644893
ZINC03138819
ZINC03618623

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Q: Why did the mathematician name his dog "Cauchy"?
A: Because he left a residue at every pole.
|
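For anyone rebuilding the ID list, dropping the four identifiers above from the original 1000 leaves the 996 mentioned earlier in the thread. A minimal sketch of that filtering step follows; the helper name `drop_failed` is hypothetical, and the example input uses placeholder IDs (`ZINC00000001`, `ZINC00000002`) that are not from the test set.

```python
# The four ZINC IDs reported in this thread as not processable by Corina.
FAILED_IDS = frozenset({
    "ZINC00033713",
    "ZINC00644893",
    "ZINC03138819",
    "ZINC03618623",
})


def drop_failed(zinc_ids, failed=FAILED_IDS):
    """Return zinc_ids in their original order, without the failed entries."""
    return [z for z in zinc_ids if z not in failed]


# Placeholder input: only the middle ID is one of the real failed ones.
ids = ["ZINC00000001", "ZINC00033713", "ZINC00000002"]
print(drop_failed(ids))  # ['ZINC00000001', 'ZINC00000002']
```

Keeping the original order makes it easier to line the result up with serial numbers in the original set.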