From: Christoph S. <er...@do...> - 2007-09-29 12:23:04
Rajarshi, this is *extremely* helpful for future development. Thanks a lot!

What I would be interested in is: how long did it take to run the test, and on what machine? It would be great to get this running as a "nightly" and to see the atom types that fail flagged in color in 2D diagrams.

Cheers,

Chris

Rajarshi Guha wrote:
> Hi, I've put in a QA project that will be running the new atom types
> against PubChem.
>
> As part of the testing, I ran 50,000 molecules through it. Here's
> a summary:
>
> No. of molecules exhibiting type errors = 5060
> No. of elements involved in type errors = 100
> No. of molecules with bad C types = 2699
> No. of molecules with bad N types = 0
> No. of molecules with bad O types = 1889
> No. of molecules with bad P types = 396
> No. of molecules with bad S types = 186
>
> The molecule and element counts are counts of unique entries.
>
> Also, the numbers for individual elements are overestimates. This is
> because a number of molecules, such as 139358 [1], 14694528 [2]
> and 485468 [3], are inorganic complexes; in these cases all the
> C's or O's are flagged as untyped. Not surprising!
>
> So the fact that only about 10% of this sample (5060 of 50,000)
> flags missing atom types is not too bad.
>
> One way to work on this is to avoid using the whole of PubChem and
> work on something like a drug-like subset instead. That way we'd avoid
> inorganic complexes. However, the question then becomes where to
> store it. I'd suggest storing a compressed version in the cdk-qa repo.
>
> [1] http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?db=pccompound&term=139358
> [2] http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=14694528
> [3] http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?db=pccompound&term=485468
>
> -------------------------------------------------------------------
> Rajarshi Guha <rg...@in...>
> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
> -------------------------------------------------------------------
> I saw Elvis. He sat between me and Bigfoot on the UFO.

--
PD Dr. Christoph Steinbeck
Lecturer in Chemoinformatics
Univ. Tuebingen, WSI-RA, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071-29-78978 Fax: (+49/0) 7071-29-5091

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3
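The tallies in the summary quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cdk-qa project's actual code: the input format (a dict mapping a molecule ID to the element symbols of atoms the atom-type matcher left untyped) and the function name `summarize` are hypothetical. It shows why the per-element counts can add up to more than the number of failing molecules: a molecule with both an untyped C and an untyped O is counted once under each element.

```python
from collections import Counter


def summarize(untyped_by_mol, total_molecules):
    """Tally molecule- and element-level atom-typing failures.

    Hypothetical input format (an assumption for illustration): a dict
    mapping a molecule ID to a list of element symbols for the atoms
    that could not be assigned an atom type.
    """
    # Keep only molecules with at least one untyped atom; use a set so a
    # molecule with ten untyped C atoms still counts once for carbon.
    failing = {mol: set(els) for mol, els in untyped_by_mol.items() if els}

    per_element = Counter()
    for elements in failing.values():
        per_element.update(elements)  # +1 per element per failing molecule

    return {
        "molecules_with_errors": len(failing),   # unique molecules
        "elements_involved": len(per_element),   # unique elements
        "per_element": dict(per_element),
        "fraction": len(failing) / total_molecules,
    }


# Toy data with made-up IDs, not real PubChem results.
report = summarize(
    {"mol-a": ["C", "C", "O"], "mol-b": ["O", "P"], "mol-c": []},
    total_molecules=4,
)
print(report)
```

With the totals reported in the thread, the failure fraction is 5060 / 50000 = 0.1012, i.e. just over 10% of the sample.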