Re: [XMLPipeDB-developer] Plasmodium bug/task list
From: Kam D. <kda...@lm...> - 2011-03-14 17:33:06
Hi,

I looked up an assortment of IDs in UniProt, and I can confirm that they appear in the ORF tag, not the OrderedLocus tag (except for the one that got captured in the export).

Best,
Kam

At 08:09 AM 3/14/2011, you wrote:
>Thanks Dondi,
>
>Will review this after our call today. I have been a little worried, as the DEBUG export has been running for 2.5 days with progress at 65% and 6.5 GB of log files so far... /yikes
>
>Btw, I have a work lunch meeting in Beverly Hills today, so I will be working from home afterwards instead of in the bio lab.
>
>Richard
>
>On Sun, Mar 13, 2011 at 9:55 PM, John David N. Dionisio <do...@lm...> wrote:
>Thanks for the updates, Rich.
>
>I gave things a once-over and may have a lead. Here is what I found:
>
>- First, the TallyEngine customization for P. falciparum states the following:
>
># Plasmodium falciparum
>plasmodiumfalciparum_level_amount=2
>
>plasmodiumfalciparum_element_level0=uniprot/entry/gene/name&type&ORF
>plasmodiumfalciparum_element_level1=uniprot/entry/gene/name&type&UniGene
>
>plasmodiumfalciparum_query_level0=select count(*) from genenametype where type = 'ORF';
>plasmodiumfalciparum_query_level1=select count(*) from genenametype where type = 'UniGene';
>
>plasmodiumfalciparum_table_name_level0=Ordered Locus
>plasmodiumfalciparum_table_name_level1=UniGene
>
>Thus, what TallyEngine counts as "Ordered Locus" are the gene names whose type is 'ORF' (the "level0" properties).
>
>- Now, this is what the P. falciparum species profile does when harvesting IDs (PlasmodiumFalciparumUniProtSpeciesProfile.getSystemTableManagerCustomizations):
>
>    String sqlQuery = "select d.entrytype_gene_hjid as hjid, c.value " +
>        "from genenametype c inner join entrytype_genetype d " +
>        "on (c.entrytype_genetype_name_hjid = d.hjid) " +
>        "where (c.value similar to ? " +
>        "or c.value similar to ? " +
>        "or c.value similar to ?) " +
>        "and type <> 'ordered locus names' " +
>        "and type <> 'ORF' " +
>        "group by d.entrytype_gene_hjid, c.value";
>
>Note the condition on the second-to-last line: the query actually *omits* gene names whose type is 'ORF'! So the question is... which is right? (I'm inclined to believe the TallyEngine here, since the export puts only one record in OrderedLocusNames.)
>
>Still, comparing these two queries directly against the PostgreSQL database would be educational, I think. Then, knowing which criteria are correct, we can take the appropriate action.
>
>Hope this helps...
>
>John David N. Dionisio, PhD
>Associate Professor, Computer Science
>Loyola Marymount University
>
>On Mar 12, 2011, at 9:48 AM, Richard Brous wrote:
>
> > Debug export is still going... 2.5 GB of log files so far with progress at 65%...
> >
> > I posted the link to the WARN log on the Plasmodium page here: https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
> >
> > Richard
> >
> > On Fri, Mar 11, 2011 at 1:06 PM, Richard Brous <rbr...@gm...> wrote:
> > Hi all,
> >
> > I have been working through several Plasmodium gdb exports in an attempt to find out why only one gene ID makes it into the Ordered Locus table.
> >
> > I reviewed the log file with the logger set to "WARN" and wasn't able to find anything that would suggest an error. I will post this log file to the wiki later today when I get home.
> >
> > I then raised the logger verbosity to "DEBUG" and the file size to 100 MB in the hope that more detail would surface the issue, but my export is in hour 20 and still going (although it's nearly complete). What I didn't expect was the size of the log files, or that only the last 3 seem to be kept, with earlier logs being overwritten =( I fear that the information I need is in one of the earlier files, which are now lost.
> >
> > Unless a better suggestion is offered, I'm going to rerun the export with "DEBUG" verbosity and increase the file sizes to nearly 1 GB each, in the hope that 3 GB total will be enough to hold the complete export log.
> >
> > More info as it comes...
> >
> > Richard
> >
> > On Fri, Mar 4, 2011 at 3:17 PM, Kam Dahlquist <kda...@lm...> wrote:
> > Hi,
> >
> > I've completed testing the Plasmodium gdb I exported last November and updated the SourceForge wiki.
> >
> > Plasmodium has its own task list page, which I've updated here: https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Plasmodium_falciparum_Task_List
> >
> > The testing report can be found here: https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_P._falciparum_20101115
> >
> > The source files and gdb are on a new Plasmodium falciparum page on the Fall 2010 BioDB wiki: https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
> >
> > Here is the list of bugs/action items that I've identified:
> >
> > 1. The OrderedLocusNames table in the gdb has only 1 ID out of the 5345 reported by the TallyEngine. This also affects all other tables related to OrderedLocusNames.
> >
> > 2. The GeneId table in the database has 6 fewer IDs than reported by the TallyEngine (Mycobacterium smegmatis and Mycobacterium tuberculosis also have mysterious GeneId issues with the TallyEngine).
> >
> > 3. The count for EMBL IDs in the gdb also seems low; it's lower than in the 2009 version of the gdb. There's no way to tell at this point whether this is due to a change in annotation by UniProt or to a bug in GenMAPP Builder.
> >
> > Thanks,
> > Kam
> >
> > _______________________________________________
> > xmlpipedb-developer mailing list
> > xml...@li...
> > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
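To make the mismatch Dondi describes concrete, here is a minimal, self-contained Java sketch that applies the two type filters from the thread to a few made-up gene-name rows in memory. It does not touch PostgreSQL. The `GeneNameFilters` class, the `GeneName` holder, and the sample values are hypothetical, and the three `similar to ?` value patterns from the real query are left out because the thread does not show what they are. The sketch only contrasts the TallyEngine level0 criterion (`type = 'ORF'`) with the species profile's exclusions (`type <> 'ordered locus names'` and `type <> 'ORF'`).

```java
import java.util.List;
import java.util.stream.Collectors;

public class GeneNameFilters {

    // Hypothetical stand-in for one genenametype row: only the "type" and
    // "value" columns that the two queries in the thread look at.
    static final class GeneName {
        final String type;
        final String value;

        GeneName(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // Mirrors the TallyEngine level0 query: select count(*) ... where type = 'ORF'.
    static long tallyOrderedLocusCount(List<GeneName> rows) {
        return rows.stream().filter(r -> r.type.equals("ORF")).count();
    }

    // Mirrors only the type conditions of the species-profile query:
    // type <> 'ordered locus names' and type <> 'ORF'.
    // The "similar to ?" value patterns are deliberately omitted here.
    static List<String> profileHarvestedValues(List<GeneName> rows) {
        return rows.stream()
                .filter(r -> !r.type.equals("ordered locus names"))
                .filter(r -> !r.type.equals("ORF"))
                .map(r -> r.value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows; not real UniProt data.
        List<GeneName> rows = List.of(
                new GeneName("ORF", "ID_A"),
                new GeneName("ORF", "ID_B"),
                new GeneName("ordered locus names", "ID_C"),
                new GeneName("primary", "NAME_D"));

        System.out.println("TallyEngine 'Ordered Locus' count: " + tallyOrderedLocusCount(rows));
        System.out.println("Species profile keeps: " + profileHarvestedValues(rows));
    }
}
```

By construction the two filters never agree on an 'ORF' row: everything the TallyEngine counts as "Ordered Locus" is something the species profile skips. Running the real queries against the PostgreSQL database, as Dondi suggests, would confirm whether this is what leaves OrderedLocusNames with a single record.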
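Richard's concern about losing early DEBUG output comes down to simple arithmetic. The sketch below assumes a size-based rolling log that keeps at most `maxFiles` files, starts a new file whenever the current one reaches `maxFileBytes`, and deletes the oldest file when the limit is exceeded. The class and method names are hypothetical, and the thread does not say which logging library or rollover policy GenMAPP Builder actually uses. The figures in `main` (three 1 GB files, 6.5 GB of output) come from the thread but are only illustrative, since the exact settings behind the 6.5 GB run are not stated.

```java
public class LogRetention {

    // Hypothetical model of a size-based rolling log. Returns how many of the
    // earliest bytes are discarded after totalWritten bytes have been logged.
    static long bytesLost(long totalWritten, int maxFiles, long maxFileBytes) {
        if (totalWritten <= 0) {
            return 0;
        }
        // Number of files ever created (ceiling division).
        long filesCreated = (totalWritten + maxFileBytes - 1) / maxFileBytes;
        if (filesCreated <= maxFiles) {
            return 0; // everything still fits
        }
        // The newest file may be only partly full; the others kept are full.
        long lastFileBytes = totalWritten - (filesCreated - 1) * maxFileBytes;
        long retained = (maxFiles - 1) * maxFileBytes + lastFileBytes;
        return totalWritten - retained;
    }

    public static void main(String[] args) {
        long mb = 1;              // work in megabytes to keep numbers readable
        long gb = 1024 * mb;
        long written = 6 * gb + 512 * mb; // 6.5 GB, the volume reported on Mar 14
        System.out.println("MB lost with 3 x 1 GB files: " + bytesLost(written, 3, gb));
    }
}
```

Under this model it is always the earliest output that disappears, which is exactly where Richard expected the clue to be; any total file budget smaller than the log volume cannot hold a complete DEBUG log.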