Re: [XMLPipeDB-developer] Plasmodium bug/task list
From: Richard B. <rbr...@gm...> - 2011-03-14 15:09:33
Thanks Dondi,

Will review this after our call today. I have been a little worried as the DEBUG export has been going for 2.5 days with progress at 65% and 6.5 GB of log files so far... /yikes

Btw I have a work lunch meeting in Beverly Hills today so will be working from home afterwards instead of in the bio lab.

Richard

On Sun, Mar 13, 2011 at 9:55 PM, John David N. Dionisio <do...@lm...> wrote:

> Thanks for the updates, Rich.
>
> I gave things a once-over and may have a lead. Here is what I found:
>
> - First, the TallyEngine customization for P. falciparum states the
> following:
>
>     # Plasmodium falciparum
>     plasmodiumfalciparum_level_amount=2
>
>     plasmodiumfalciparum_element_level0=uniprot/entry/gene/name&type&ORF
>     plasmodiumfalciparum_element_level1=uniprot/entry/gene/name&type&UniGene
>
>     plasmodiumfalciparum_query_level0=select count(*) from genenametype where type = 'ORF';
>     plasmodiumfalciparum_query_level1=select count(*) from genenametype where type = 'UniGene';
>
>     plasmodiumfalciparum_table_name_level0=Ordered Locus
>     plasmodiumfalciparum_table_name_level1=UniGene
>
> Thus, what is being counted by TallyEngine as "Ordered Locus" are the gene
> names whose type is 'ORF' ("level0" properties).
>
> - Now, this is what the P. falciparum species profile does when harvesting
> IDs
> (PlasmodiumFalciparumUniProtSpeciesProfile.getSystemTableManagerCustomizations):
>
>     String sqlQuery = "select d.entrytype_gene_hjid as hjid, c.value " +
>         "from genenametype c inner join entrytype_genetype d " +
>         "on (c.entrytype_genetype_name_hjid = d.hjid) " +
>         "where (c.value similar to ? " +
>         "or c.value similar to ? " +
>         "or c.value similar to ?) " +
>         "and type <> 'ordered locus names' " +
>         "and type <> 'ORF' " +
>         "group by d.entrytype_gene_hjid, c.value";
>
> Note the condition on the second-to-last line --- the query actually
> *omits* gene names whose type is 'ORF'! So the question is...which is
> right? (I'm inclined to believe the TallyEngine here, since the export
> puts only one record in OrderedLocusNames.)
>
> Still, comparing these two queries directly against the PostgreSQL database
> would be educational, I think. Then, knowing which criteria are correct,
> the appropriate action can be taken.
>
> Hope this helps...
>
> John David N. Dionisio, PhD
> Associate Professor, Computer Science
> Loyola Marymount University
>
>
> On Mar 12, 2011, at 9:48 AM, Richard Brous wrote:
>
> > Debug export is still going... 2.5GB of log files so far with progress at
> > 65%...
> >
> > I posted the link of the WARN log on the plasmodium page here:
> > https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
> >
> > Richard
> >
> > On Fri, Mar 11, 2011 at 1:06 PM, Richard Brous <rbr...@gm...> wrote:
> > Hi all,
> >
> > Have been working through several Plasmodium gdb exports in an attempt to
> > source why only one gene ID makes it into the Ordered Locus table.
> >
> > I have reviewed the logger file while set to "WARN" and wasn't able to
> > determine anything which would suggest an error. I will post this log file
> > to the wiki later today when I get home.
> >
> > I then upped the logger verbosity to "DEBUG" and file size to 100MB with
> > hopes that more detail will surface the issue, but my export is on hour 20
> > and still going (although it's nearly complete). What I didn't expect was
> > the size of the log files and that it seems only the last 3 are kept, with
> > earlier logs being overwritten =( I fear that the information I need is in
> > one of the earlier files, which are now lost.
> >
> > Unless a better suggestion is offered I'm going to rerun an export
> > with "DEBUG" verbosity and up the file sizes to near 1 GB each and hope
> > that 3 GB total will be enough to hold the complete export log.
> >
> > More info as it comes...
> >
> > Richard
> >
> >
> > On Fri, Mar 4, 2011 at 3:17 PM, Kam Dahlquist <kda...@lm...> wrote:
> > Hi,
> >
> > I've completed testing the Plasmodium gdb I exported last November and
> > updated the SourceForge wiki.
> >
> > Plasmodium has its own task list page, which I've updated here:
> > https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Plasmodium_falciparum_Task_List
> >
> > The testing report can be found here:
> > https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_P._falciparum_20101115
> >
> > The source files and gdb are on a new Plasmodium falciparum page on the
> > Fall 2010 BiolDB wiki:
> > https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
> >
> > Here is the list of bugs/action items that I've listed:
> >
> > 1. The OrderedLocusNames table in the gdb only has 1 ID out of 5345
> > reported by the TallyEngine. This also affects all other tables related to
> > OrderedLocusNames.
> >
> > 2. The GeneId table in the database has 6 fewer IDs than reported by the
> > TallyEngine (Mycobacterium smegmatis and Mycobacterium tuberculosis also
> > have mysterious GeneId issues with the TallyEngine).
> >
> > 3. The count for EMBL IDs in the gdb also seems low; it is lower than in
> > the 2009 version of the gdb. There's no way to tell at this point whether
> > this is due to a change in annotation by UniProt or is a bug with GenMAPP
> > Builder.
> >
> > Thanks,
> > Kam
> >
> > _______________________________________________
> > xmlpipedb-developer mailing list
> > xml...@li...
> > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
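
The mismatch Dondi points out can be illustrated without a database. The following is only a minimal sketch, not GenMAPP Builder code: the class `TypeFilterComparison`, its helper methods, and the sample rows are all hypothetical. It models just the `type` conditions of the two queries quoted in the thread and leaves out the `similar to ?` patterns, whose parameter values are not shown there.

```java
import java.util.Arrays;
import java.util.List;

public class TypeFilterComparison {

    /** Hypothetical stand-in for one genenametype row (only the two columns quoted in the thread). */
    static final class GeneName {
        final String value;
        final String type;

        GeneName(String value, String type) {
            this.value = value;
            this.type = type;
        }
    }

    /** Mirrors the TallyEngine level0 query: count rows where type = 'ORF'. */
    static long tallyCount(List<GeneName> rows) {
        return rows.stream().filter(r -> "ORF".equals(r.type)).count();
    }

    /** True when a row passes the two type conditions of the species profile query. */
    static boolean passesHarvestTypeFilter(GeneName r) {
        // In SQL, "type <> 'x'" is not true for a NULL type, so null types are rejected here too.
        return r.type != null
                && !r.type.equals("ordered locus names")
                && !r.type.equals("ORF");
    }

    /** Counts rows the harvesting query could keep, judged by type alone. */
    static long harvestCount(List<GeneName> rows) {
        return rows.stream().filter(TypeFilterComparison::passesHarvestTypeFilter).count();
    }

    /** Counts rows that are both tallied as "Ordered Locus" and eligible for harvesting. */
    static long overlapCount(List<GeneName> rows) {
        return rows.stream()
                .filter(r -> "ORF".equals(r.type))
                .filter(TypeFilterComparison::passesHarvestTypeFilter)
                .count();
    }

    /** Made-up rows for illustration only; they are not taken from the real database. */
    static List<GeneName> sampleRows() {
        return Arrays.asList(
                new GeneName("example-1", "ORF"),
                new GeneName("example-2", "ORF"),
                new GeneName("example-3", "primary"),
                new GeneName("example-4", "ordered locus names"),
                new GeneName("example-5", null));
    }

    public static void main(String[] args) {
        List<GeneName> rows = sampleRows();
        System.out.println("tallied as Ordered Locus (type = 'ORF'): " + tallyCount(rows));
        System.out.println("eligible for harvest (by type only):     " + harvestCount(rows));
        System.out.println("rows in both sets:                       " + overlapCount(rows));
    }
}
```

Because one filter requires `type = 'ORF'` and the other requires `type <> 'ORF'`, the overlap is empty for any input, which fits the large gap between the tally and the exported OrderedLocusNames table. It does not say which criterion is correct, and it does not account for the single ID that did get exported; running both queries against the actual PostgreSQL database, as suggested in the thread, is still needed for that.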