[XMLPipeDB-developer] TallyEngine bugs
From: Kam D. <kda...@lm...> - 2009-08-21 01:54:21
Hi,

I'm starting this as a new thread (hopefully) so we can keep the threads separate by task. There are issues with the TallyEngine for each species. They may be related, but I don't know. I'm linking to the testing report pages, where you can look at the results screenshot and the OriginalRowCounts data directly.

Kam

New species:

Mycobacterium tuberculosis
https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_M._tuberculosis_20090817
and P. aeruginosa
https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_P._aerugenosa_20090817
* OrderedLocusNames is not being tallied
* UniGene is not being tallied (UniGene doesn't exist for these two, but in other TallyEngine results for other species, it is actually listed with counts of zero. We should be consistent about counting it.)

Old species:

E. coli K12
https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_E._coli_K12_20090819
* There are discrepancies between the TallyEngine results and the gdb
** EchoBASE has 4022 records in gdb, not 4028
** EcoGene has 4190 records in gdb, not 4199
** Blattner has 4328 records in gdb, not 8352
* W3110 is not being counted (or is being conflated with Blattner)
* EchoBASE, EcoGene, and Blattner appear twice in the results, with the second instance giving -1 for the XML Count.
* UniGene is not being tallied (UniGene doesn't exist for these two, but in other TallyEngine results for other species, it is actually listed with counts of zero. We should be consistent about counting it.)

Arabidopsis
https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_Arabidopsis_20090819
* Discrepancy between TallyEngine and OriginalRowCounts for the following systems:
** GeneId
** RefSeq
** TAIR
** UniGene
** In these cases, gdb has fewer records.
* TAIR and UniGene are repeated in the TallyEngine results, showing -1 for the XML Count in the second result.

Plasmodium
https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_P._falciparum_20090819
* Discrepancies in TallyEngine results versus OriginalRowCounts
** GeneId: TallyEngine has 5264, but OriginalRowCounts has 5260
** OrderedLocusNames is 5336 in gdb and much lower in TallyEngine; also, XML Count and Database Count are off by 1.
* OrderedLocusNames and UniGene appear twice in the results, with -1 given for the XML Count the second time.

Vibrio
https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_V._cholerae_20090820
* OrderedLocusNames counts are off by one between the XML and the database (slash issue?); also, because of the underscore issue, remember that all of these IDs got duplicated in the gdb with and without the underscore.
* OrderedLocusNames and UniGene appear twice in the results, the second time with -1 in the XML Count.
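
For anyone who wants to re-check these by script rather than by eye, here is a minimal sketch of the comparison, assuming the counts have been copied by hand from the testing report pages into two dictionaries. The function name `compare_counts` and the dictionary layout are hypothetical and are not part of TallyEngine; the numbers are the E. coli K12 counts quoted above.

```python
def compare_counts(tally, gdb):
    """Return {system: (tally_count, gdb_count)} for systems whose counts differ.

    A system that is missing from one side is reported with None for that side,
    which also flags systems that were not tallied at all.
    """
    diffs = {}
    for system in sorted(set(tally) | set(gdb)):
        t = tally.get(system)
        g = gdb.get(system)
        if t != g:
            diffs[system] = (t, g)
    return diffs


# E. coli K12 counts from the report above (hand-copied, hypothetical layout).
tally_counts = {"EchoBASE": 4028, "EcoGene": 4199, "Blattner": 8352}
gdb_counts = {"EchoBASE": 4022, "EcoGene": 4190, "Blattner": 4328}

for system, (t, g) in compare_counts(tally_counts, gdb_counts).items():
    print(f"{system}: TallyEngine={t}, gdb={g}, difference={t - g}")
```

The same function could be pointed at the other species by swapping in their counts; a system present in the gdb but absent from the TallyEngine results would show up with `None` on the TallyEngine side (the print loop above assumes both sides are present).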