Re: [XMLPipeDB-developer] Plasmodium bug/task list
From: Kam D. <kda...@lm...> - 2011-03-16 21:59:58
Hi,
I've taken a look at the list of IDs and did a quick comparison with
both the older released gdb and also a list I downloaded from the
Broad Institute Plasmodium database. I think we can safely go with
the query on the ORF tag for our export--all of those different ID
forms are valid. There are about 400 IDs that are different in the
older released gdb than in the new query; I'm going to further
investigate those. I suspect that the difference is mainly due to a
+/- underscore issue that we might need to solve. However, we should
go forward with capturing all the IDs from the ORF tag; I don't see a
need to restrict to a particular pattern there.
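
One way to check the suspected +/- underscore issue is to compare the two ID lists after stripping underscores. Below is a minimal Python sketch, assuming both lists are already loaded into memory. The function names and the sample IDs are made up for illustration and are not taken from either gdb.

```python
def normalize(gene_id):
    """Drop underscores so that PF11_0001 and PF110001 compare as equal."""
    return gene_id.strip().replace("_", "")


def compare_id_lists(old_ids, new_ids):
    """Split IDs missing from new_ids into underscore-only and real differences."""
    new_exact = set(new_ids)
    new_normalized = {normalize(i) for i in new_ids}
    underscore_only, truly_missing = [], []
    for gene_id in sorted(set(old_ids) - new_exact):
        if normalize(gene_id) in new_normalized:
            underscore_only.append(gene_id)   # differs only by underscores
        else:
            truly_missing.append(gene_id)     # needs a closer look
    return underscore_only, truly_missing


# Hypothetical sample IDs, for illustration only.
old_gdb = ["PF11_0001", "PFA0005w", "MAL8P1.1"]
new_query = ["PF110001", "PFA0005w"]
underscore_only, truly_missing = compare_id_lists(old_gdb, new_query)
print(underscore_only)
print(truly_missing)
```

If most of the roughly 400 differing IDs land in the first list, the underscore explanation holds; anything in the second list would need a separate look.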
Best,
Kam
At 09:48 PM 3/14/2011, you wrote:
>Hi all,
>
>So I went ahead and did raw sql queries of the Postgres data and
>turned up the following:
>
>select * from genenametype where type = 'ordered locus'
>Returned zero gene ids
>
>select * from genenametype where type = 'ORF'
>Returned 5345 gene ids
>The type = 'ORF' query was exported into excel and posted to the
>biodb wiki on the Spring 2011 Plasmodium page.
>
>There are many patterns among the gene ids; here are the
>prefixes from my cursory look:
>MAL
>PF##_
>PFA
>PFB
>PFC
>PFD
>PFE
>PFF
>PFI
>PFL
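
A rough way to tally the prefixes listed above across the exported ID list, as a Python sketch. The regular expressions are guesses at those patterns (MAL, PF followed by two digits and an underscore, PF followed by a letter), not the patterns GenMAPP Builder uses, and the sample IDs are made up.

```python
import re
from collections import Counter

# Assumed patterns, checked in order; the first match wins.
PREFIX_PATTERNS = [
    ("MAL", re.compile(r"^MAL")),
    ("PF##_", re.compile(r"^PF\d{2}_")),
    ("PF<letter>", re.compile(r"^PF[A-Z]")),
]


def classify(gene_id):
    """Return the label of the first matching prefix pattern, or 'other'."""
    for label, pattern in PREFIX_PATTERNS:
        if pattern.match(gene_id):
            return label
    return "other"


def tally_prefixes(gene_ids):
    """Count how many IDs fall under each prefix label."""
    return Counter(classify(g) for g in gene_ids)


# Hypothetical sample IDs, for illustration only.
sample = ["MAL13P1.1", "PF11_0001", "PFA0005w", "PFL0010c", "XYZ123"]
counts = tally_prefixes(sample)
print(counts)
```

A non-zero "other" count on the real export would point to ID forms missing from the prefix list.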
>
>Richard
>
>
>On Mon, Mar 14, 2011 at 10:32 AM, Kam Dahlquist
><kda...@lm...> wrote:
>Hi,
>
>I looked up an assortment of IDs in UniProt and I can confirm that
>it appears that the IDs are found in the ORF tag, not the
>OrderedLocus tag (except for the one that got captured in the export).
>
>Best,
>Kam
>
>
>At 08:09 AM 3/14/2011, you wrote:
>>Thanks Dondi,
>>
>>Will review this after our call today. I have been a little worried
>>as the DEBUG export has been going for 2.5 days with progress at
>>65% and 6.5 Gb of log files so far... /yikes
>>
>>Btw I have a work lunch meeting in Beverly Hills today so will be
>>working from home afterwards instead of in the bio lab.
>>
>>Richard
>>
>>On Sun, Mar 13, 2011 at 9:55 PM, John David N. Dionisio
>><do...@lm...> wrote:
>>Thanks for the updates, Rich.
>>I gave things a once-over and may have a lead. Here is what I found:
>>- First, the TallyEngine customization for P. falciparum states the
>>following:
>>
>># Plasmodium falciparum
>>plasmodiumfalciparum_level_amount=2
>>plasmodiumfalciparum_element_level0=uniprot/entry/gene/name&type&ORF
>>plasmodiumfalciparum_element_level1=uniprot/entry/gene/name&type&UniGene
>>plasmodiumfalciparum_query_level0=select count(*) from genenametype
>>where type = 'ORF';
>>plasmodiumfalciparum_query_level1=select count(*) from genenametype
>>where type = 'UniGene';
>>plasmodiumfalciparum_table_name_level0=Ordered Locus
>>plasmodiumfalciparum_table_name_level1=UniGene
>>
>>Thus, what is being counted by TallyEngine as "Ordered Locus" are
>>the gene names whose type is 'ORF' ("level0" properties).
>>- Now, this is what the P. falciparum species profile does when
>>harvesting IDs
>>(PlasmodiumFalciparumUniProtSpeciesProfile.getSystemTableManagerCustomizations):
>>
>> String sqlQuery = "select d.entrytype_gene_hjid as hjid, c.value " +
>> "from genenametype c inner join entrytype_genetype d " +
>> "on (c.entrytype_genetype_name_hjid = d.hjid) " +
>> "where (c.value similar to ? " +
>> "or c.value similar to ? " +
>> "or c.value similar to ?) " +
>> "and type <> 'ordered locus names' " +
>> "and type <> 'ORF' " +
>> "group by d.entrytype_gene_hjid, c.value";
>>
>>Note the condition on the second-to-last line: the query
>>actually *omits* gene names whose type is 'ORF'! So the question
>>is...which is right? (I'm inclined to believe the Tally Engine
>>here, since the export puts only one record in OrderedLocusNames.)
>>Still, comparing these two queries directly against the PostgreSQL
>>database would be educational, I think. Then, knowing which
>>criteria are correct, the appropriate action can be taken.
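
The comparison of the two queries can be rehearsed on a toy database. The Python sketch below uses an in-memory SQLite database in place of PostgreSQL. SQLite has no `similar to`, so `like` with placeholder patterns stands in for it. The table layout is reduced to the columns the queries touch, and every row and pattern is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table entrytype_genetype (hjid integer, entrytype_gene_hjid integer);
    create table genenametype (
        value text, type text, entrytype_genetype_name_hjid integer);
""")
# Made-up rows: three names typed 'ORF' and one typed 'synonym'.
conn.executemany("insert into entrytype_genetype values (?, ?)",
                 [(1, 101), (2, 102), (3, 103), (4, 104)])
conn.executemany("insert into genenametype values (?, ?, ?)",
                 [("PF11_0001", "ORF", 1),
                  ("PFA0005w", "ORF", 2),
                  ("MAL13P1.1", "ORF", 3),
                  ("PFB0010w", "synonym", 4)])

# TallyEngine level0 query, as quoted in the properties file.
tally_sql = "select count(*) from genenametype where type = 'ORF'"

# Species-profile query, with 'like' substituted for 'similar to'.
harvest_sql = """
    select d.entrytype_gene_hjid as hjid, c.value
    from genenametype c inner join entrytype_genetype d
      on (c.entrytype_genetype_name_hjid = d.hjid)
    where (c.value like ? or c.value like ? or c.value like ?)
      and type <> 'ordered locus names'
      and type <> 'ORF'
    group by d.entrytype_gene_hjid, c.value
"""
patterns = ("MAL%", "PFA%", "PF%")  # stand-ins for the real patterns

tally_count = conn.execute(tally_sql).fetchone()[0]
harvested = conn.execute(harvest_sql, patterns).fetchall()
print("tallied as ORF:", tally_count)
print("harvested rows:", harvested)
```

On this toy data the tally counts all three ORF names, while the harvesting query returns only the non-ORF row. That matches the symptom of a large tally next to a nearly empty OrderedLocusNames table.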
>>
>>Hope this helps...
>>John David N. Dionisio, PhD
>>Associate Professor, Computer Science
>>Loyola Marymount University
>>
>>
>>On Mar 12, 2011, at 9:48 AM, Richard Brous wrote:
>> > Debug export is still going... 2.5GB of log files so far with
>> progress at 65%...
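
A back-of-the-envelope projection from these numbers, as a Python sketch. It assumes log volume grows linearly with export progress, which may not hold; the function name is made up for illustration.

```python
def projected_total_gb(size_so_far_gb, fraction_done):
    """Extrapolate the final log size, assuming linear growth with progress."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return size_so_far_gb / fraction_done


total = projected_total_gb(2.5, 0.65)
print(f"Projected DEBUG log size: {total:.2f} GB")
```

Under that assumption the full DEBUG log comes to a little under 4 GB, so three retained files of about 1 GB each would still lose the earliest part of the export.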
>> >
>> > I posted the link to the WARN log on the Plasmodium page here:
>> https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
>> > Richard
>> > On Fri, Mar 11, 2011 at 1:06 PM, Richard Brous
>> <rbr...@gm...> wrote:
>> > Hi all,
>> >
>> > Have been working through several Plasmodium gdb exports in an
>> attempt to source why only one gene id makes it into the Ordered Locus table.
>> >
>> > I have reviewed the logger file while set to "WARN" and wasn't
>> able to determine anything which would suggest an error. I will
>> post this log file to the wiki later today when I get home.
>> >
>> > I then upped the logger verbosity to "DEBUG" and file size to
>> 100MB with hopes that more detail will surface the issue, but my
>> export is on hour 20 and still going (although it's nearly
>> complete). What I didn't expect was the size of the log files and
>> that it seems only the last 3 are kept, with earlier logs being
>> overwritten =( I fear that the information I need is in one of the
>> earlier files, which are now lost.
>> >
>> > Unless a better suggestion is offered I'm going to rerun the
>> export with "DEBUG" verbosity and up the file sizes to near
>> 1 GB each and hope that 3 GB total will be enough to hold the
>> complete export log.
>> >
>> > More info as it comes...
>> >
>> > Richard
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Mar 4, 2011 at 3:17 PM, Kam Dahlquist
>> <kda...@lm...> wrote:
>> > Hi,
>> >
>> > I've completed testing the Plasmodium gdb I exported last
>> November and updated the SourceForge wiki.
>> >
>> > Plasmodium has its own task list page, which I've updated
>> here:
>> https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Plasmodium_falciparum_Task_List
>>
>> >
>> > The testing report can be found
>> here:
>> https://sourceforge.net/apps/mediawiki/xmlpipedb/index.php?title=Gene_Database_Testing_Report_P._falciparum_20101115
>>
>> >
>> > The source files and gdb are on a new Plasmodium falciparum page
>> on the Fall 2010 BioDB
>> wiki:
>> https://www.cs.lmu.edu/biodb/fall2010/index.php/Plasmodium_falciparum
>>
>> >
>> > Here is the list of bugs/action items that I've listed:
>> >
>> > 1. The OrderedLocusNames table in the gdb only has 1 ID out of
>> 5345 reported by the TallyEngine. This also affects all other
>> tables related to OrderedLocusNames.
>> >
>> > 2. The GeneId table in the database has 6 fewer IDs than
>> reported by the TallyEngine (Mycobacterium smegmatis and
>> Mycobacterium tuberculosis also have mysterious GeneId issues with
>> the TallyEngine).
>> >
>> > 3. The count for EMBL IDs in the gdb also seems low; it's lower
>> than the 2009 version of the gdb. There's no way to tell at this
>> point whether this is due to a change in annotation by UniProt or
>> is a bug with GenMAPP Builder.
>> >
>> > Thanks,
>> > Kam
>> >
>> >
>> >
>> > _______________________________________________
>> > xmlpipedb-developer mailing list
>> >
>> <mailto:xml...@li...>xml...@li...
>>
>> > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer