Re: [XMLPipeDB-developer] Plasmodium bug/task list
From: Kam D. <kda...@lm...> - 2011-03-16 21:59:58
Hi,

I've taken a look at the list of IDs and did a quick comparison with both the older released gdb and a list I downloaded from the Broad Institute Plasmodium database. I think we can safely go with the query on the ORF tag for our export; all of those different ID forms are valid.

About 400 IDs differ between the older released gdb and the new query; I'm going to investigate those further. I suspect that the difference is mainly due to a +/- underscore issue that we might need to solve. However, we should go forward with capturing all the IDs from the ORF tag; I don't see a need to restrict to a particular pattern there.

Best,
Kam

At 09:48 PM 3/14/2011, you wrote:
>Hi all,
>
>So I went ahead and did raw SQL queries of the Postgres data and
>turned up the following:
>
>select * from genenametype where type = 'ordered locus'
>Returned zero gene ids
>
>select * from genenametype where type = 'ORF'
>Returned 5345 gene ids
>
>The type = 'ORF' query was exported into Excel and posted to the
>biodb wiki on the Spring 2011 Plasmodium page.
>
>There are many patterns among the gene ids; here are the
>prefixes from my cursory look:
>MAL
>PF##_
>PFA
>PFB
>PFC
>PFD
>PFE
>PFF
>PFI
>PFL
>
>Richard
>
>
>On Mon, Mar 14, 2011 at 10:32 AM, Kam Dahlquist
><kda...@lm...> wrote:
>Hi,
>
>I looked up an assortment of IDs in UniProt and I can confirm that
>it appears that the IDs are found in the ORF tag, not the
>OrderedLocus tag (except for the one that got captured in the export).
>
>Best,
>Kam
>
>
>At 08:09 AM 3/14/2011, you wrote:
>>Thanks Dondi,
>>
>>Will review this after our call today. I have been a little worried
>>as the DEBUG export has been going for 2.5 days with progress at
>>65% and 6.5 GB of log files so far... /yikes
>>
>>Btw I have a work lunch meeting in Beverly Hills today so will be
>>working from home afterwards instead of in the bio lab.
>>
>>Richard
>>
>>On Sun, Mar 13, 2011 at 9:55 PM, John David N.
Dionisio
>><do...@lm...> wrote:
>>Thanks for the updates, Rich.
>>I gave things a once-over and may have a lead. Here is what I found:
>>- First, the TallyEngine customization for P. falciparum states the
>>following:
>>
>># Plasmodium falciparum
>>plasmodiumfalciparum_level_amount=2
>>plasmodiumfalciparum_element_level0=uniprot/entry/gene/name&type&ORF
>>plasmodiumfalciparum_element_level1=uniprot/entry/gene/name&type&UniGene
>>plasmodiumfalciparum_query_level0=select count(*) from genenametype where type = 'ORF';
>>plasmodiumfalciparum_query_level1=select count(*) from genenametype where type = 'UniGene';
>>plasmodiumfalciparum_table_name_level0=Ordered Locus
>>plasmodiumfalciparum_table_name_level1=UniGene
>>
>>Thus, what is being counted by TallyEngine as "Ordered Locus" are
>>the gene names whose type is 'ORF' ("level0" properties).
>>- Now, this is what the P. falciparum species profile does when
>>harvesting IDs
>>(PlasmodiumFalciparumUniProtSpeciesProfile.getSystemTableManagerCustomizations):
>>
>>    String sqlQuery = "select d.entrytype_gene_hjid as hjid, c.value " +
>>        "from genenametype c inner join entrytype_genetype d " +
>>        "on (c.entrytype_genetype_name_hjid = d.hjid) " +
>>        "where (c.value similar to ? " +
>>        "or c.value similar to ? " +
>>        "or c.value similar to ?) " +
>>        "and type <> 'ordered locus names' " +
>>        "and type <> 'ORF' " +
>>        "group by d.entrytype_gene_hjid, c.value";
>>
>>Note the condition on the second-to-last line --- the query
>>actually *omits* gene names whose type is 'ORF'! So the question
>>is...which is right? (I'm inclined to believe the Tally Engine
>>here, since the export puts only one record in OrderedLocusNames.)
>>Still, comparing these two queries directly against the PostgreSQL
>>database would be educational, I think. Then, knowing which
>>criteria are correct, the appropriate action can be taken.
>>
>>Hope this helps...
>>John David N.
Dionisio, PhD
>>Associate Professor, Computer Science
>>Loyola Marymount University
>>
>>
>>On Mar 12, 2011, at 9:48 AM, Richard Brous wrote:
>> > Debug export is still going... 2.5 GB of log files so far with
>> > progress at 65%...
>> >
>> > I posted the link of the WARN log on the plasmodium page here:
>> > https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
>> >
>> > Richard
>> >
>> > On Fri, Mar 11, 2011 at 1:06 PM, Richard Brous
>> > <rbr...@gm...> wrote:
>> > Hi all,
>> >
>> > Have been working through several Plasmodium gdb exports in an
>> > attempt to find out why only one gene id makes it into the Ordered
>> > Locus table.
>> >
>> > I have reviewed the logger file while set to "WARN" and wasn't
>> > able to find anything that would suggest an error. I will
>> > post this log file to the wiki later today when I get home.
>> >
>> > I then upped the logger verbosity to "DEBUG" and file size to
>> > 100MB with hopes that more detail will surface the issue, but my
>> > export is on hour 20 and still going (although it's nearly
>> > complete). What I didn't expect was the size of the log files and
>> > that it seems only the last 3 are kept, with earlier logs being
>> > overwritten =( I fear that the information I need is in one of the
>> > earlier files, which are now lost.
>> >
>> > Unless a better suggestion is offered I'm going to rerun the
>> > export with "DEBUG" verbosity and up the file sizes to near
>> > 1 GB each and hope that 3 GB total will be enough to hold the
>> > complete export log.
>> >
>> > More info as it comes...
>> >
>> > Richard
>> >
>> >
>> > On Fri, Mar 4, 2011 at 3:17 PM, Kam Dahlquist
>> > <kda...@lm...> wrote:
>> > Hi,
>> >
>> > I've completed testing the Plasmodium gdb I exported last
>> > November and updated the SourceForge wiki.
>> >
>> > Plasmodium has its own task list page, which I've updated here:
>> > https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Plasmodium_falciparum_Task_List
>> >
>> > The testing report can be found here:
>> > https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_P._falciparum_20101115
>> >
>> > The source files and gdb are on a new Plasmodium falciparum page
>> > on the Fall 2010 BioDB wiki:
>> > https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
>> >
>> > Here is the list of bugs/action items that I've listed:
>> >
>> > 1. The OrderedLocusNames table in the gdb only has 1 ID out of
>> > 5345 reported by the TallyEngine. This also affects all other
>> > tables related to OrderedLocusNames.
>> >
>> > 2. The GeneId table in the database has 6 fewer IDs than
>> > reported by the TallyEngine (Mycobacterium smegmatis and
>> > Mycobacterium tuberculosis also have mysterious GeneId issues with
>> > the TallyEngine).
>> >
>> > 3. The count for EMBL IDs in the gdb also seems low; it is lower
>> > than in the 2009 version of the gdb. There's no way to tell at this
>> > point whether this is due to a change in annotation by UniProt or
>> > is a bug with GenMAPP Builder.
>> >
>> > Thanks,
>> > Kam
>> >
>> > _______________________________________________
>> > xmlpipedb-developer mailing list
>> > xml...@li...
>> > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
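
The most recent message in this thread suspects that most of the roughly 400 IDs that differ between the older released gdb and the new ORF query differ only by an underscore. One way to check that, sketched below in Python, is to compare the two ID lists after stripping underscores and then list whatever still does not match. This is only an illustration and not part of GenMAPP Builder or XMLPipeDB: the function names are hypothetical, the sample IDs are made up to resemble the prefixes listed in the thread, and removing underscores is an assumed normalization that may not cover every difference.

```python
def normalize(gene_id: str) -> str:
    """Assumed canonical form for comparison: trimmed, underscores removed."""
    return gene_id.strip().replace("_", "")


def compare_id_lists(old_ids, new_ids):
    """Split IDs found only in the old list into two groups.

    Returns (underscore_only, missing):
      underscore_only maps an old ID to the new ID that matches it
        once underscores are ignored;
      missing lists old IDs with no counterpart in the new list.
    """
    old_set, new_set = set(old_ids), set(new_ids)
    new_by_norm = {normalize(i): i for i in new_set}
    underscore_only = {}
    missing = []
    for old in sorted(old_set - new_set):
        match = new_by_norm.get(normalize(old))
        if match is not None:
            underscore_only[old] = match
        else:
            missing.append(old)
    return underscore_only, missing


# Made-up sample IDs for illustration only; not taken from either export.
old_gdb = ["PF10_0001", "PFA0005w", "MAL13P1.1", "PFB0010w"]
orf_query = ["PF100001", "PFA0005w", "MAL13P1.1"]

underscore_only, missing = compare_id_lists(old_gdb, orf_query)
print(underscore_only)  # {'PF10_0001': 'PF100001'}
print(missing)          # ['PFB0010w']
```

IDs that land in the second list would be the ones worth looking up by hand, since they cannot be explained by underscores alone.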