Re: [XMLPipeDB-developer] GenMAPP multitaxon support - CMSI 486T
From: Richard B. <rbr...@gm...> - 2011-08-11 00:51:44
Updated repository to include all Gene Ontology changes discussed during our
meeting yesterday.
Digging into TableManager next.
Richard
On Fri, Aug 5, 2011 at 10:06 AM, Richard Brous <rbr...@gm...> wrote:
> Whew... thanks for the detailed reply. I will digest this a bit and get
> back to you with further questions.
>
> rb
>
> On Thu, Aug 4, 2011 at 11:18 PM, John David N. Dionisio <do...@lm...> wrote:
>
>> Greetings,
>>
>> Sorry for the delay. I wasn't able to walk through the relevant code
>> until this evening.
>>
>> As Kam said, GOA serves as the link between the UniProt and GO IDs. The
>> export code determines which GO IDs get exported by using GOA to see which
>> GO IDs are associated with an exported UniProt ID. In its current form, the
>> populateUniprotGoTableFromSQL method extracts the GO association records
>> that match the given taxon ID, then exports the GO and UniProt IDs
>> referenced within each of those records as UniProt-GO pairs. All subsequent
>> processing is based on the GO IDs that were exported this way, which is how
>> the current code avoids exporting the entire list of GO terms.
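To make the trimming concrete, here is a minimal in-memory sketch (not the project's code) of the selection just described: keep only UniProt-sourced GOA rows for one taxon, emit UniProt-GO pairs, and export only the GO IDs those pairs reference. The GoaRecord and UniprotGoPair types and the sample rows are made up for illustration; contains("UniProt") approximates the SQL db like '%UniProt%' (case-sensitively here). Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GoaTrimmingSketch {
    // Hypothetical in-memory stand-in for one row of the goa table.
    record GoaRecord(String db, String dbObjectId, String goId, String taxon) {}

    // One exported UniProt-GO association.
    record UniprotGoPair(String uniprotId, String goId) {}

    // Keeps UniProt-sourced rows for one taxon and emits their UniProt-GO pairs.
    static List<UniprotGoPair> uniprotGoPairs(List<GoaRecord> goa, int taxon) {
        String wanted = "taxon:" + taxon;
        List<UniprotGoPair> pairs = new ArrayList<>();
        for (GoaRecord r : goa) {
            if (r.db().contains("UniProt") && r.taxon().equals(wanted)) {
                pairs.add(new UniprotGoPair(r.dbObjectId(), r.goId()));
            }
        }
        return pairs;
    }

    // Only GO IDs referenced by an exported pair are exported; the rest of GO is skipped.
    static Set<String> goIdsToExport(List<UniprotGoPair> pairs) {
        Set<String> ids = new LinkedHashSet<>();
        for (UniprotGoPair p : pairs) {
            ids.add(p.goId());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<GoaRecord> goa = List.of(
                new GoaRecord("UniProtKB", "P12345", "GO:0005515", "taxon:4932"),
                new GoaRecord("UniProtKB", "P12345", "GO:0005634", "taxon:4932"),
                new GoaRecord("UniProtKB", "Q99999", "GO:0008150", "taxon:9606"));
        System.out.println(goIdsToExport(uniprotGoPairs(goa, 4932)));
    }
}
```

Running main prints [GO:0005515, GO:0005634]: the row for the other taxon never contributes a GO ID, which is the species-specific trimming Kam refers to.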
>>
>> The operative query is on the second line of
>> populateUniprotGoTableFromSQL:
>>
>>     String uniProtAndGOIDSQL = "select db_object_id, go_id, evidence_code, with_or_from"
>>             + " from goa where db like '%UniProt%' and taxon = 'taxon:" + taxon + "'";
>>
>> In plain English, this selects the GOA records whose db field contains
>> "UniProt" and whose taxon ID is the given taxon. An additional condition is
>> added for the "aspect" (All, Component, Function, or Process) that is to be
>> exported. This is another reduction filter that further shrinks the number
>> of exported GO terms, keeping the result small enough for MAPPFinder to
>> handle later on.
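A sketch of how the single-taxon query and the aspect filter might fit together. The column name "aspect", its 'C'/'F'/'P' values, and the convention that a chosenAspect of 'A' means All (no extra condition) are assumptions for illustration, not taken from the actual code.

```java
// Sketch only. ASSUMPTIONS: the goa table has an "aspect" column holding
// 'C', 'F', or 'P', and chosenAspect 'A' means "All" (no aspect filter).
public class GoaQuerySketch {
    static String uniProtAndGoIdSql(char chosenAspect, int taxon) {
        String sql = "select db_object_id, go_id, evidence_code, with_or_from from goa"
                + " where db like '%UniProt%' and taxon = 'taxon:" + taxon + "'";
        if (chosenAspect != 'A') {
            // Hypothetical aspect predicate: a second reduction filter on GO terms.
            sql += " and aspect = '" + chosenAspect + "'";
        }
        return sql;
    }

    public static void main(String[] args) {
        System.out.println(uniProtAndGoIdSql('P', 4932));
    }
}
```

With 'P' the query ends in "and taxon = 'taxon:4932' and aspect = 'P'"; with 'A' only the UniProt and taxon conditions remain.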
>>
>> Given this, the natural extension is to replace the single taxon predicate
>> with a predicate over multiple taxa. That is, this method can be changed to
>> accept a collection or array of taxon IDs, and the base query should then be
>> changed so that it accepts any taxon from that collection. More or less,
>> you want:
>>
>>     private void populateUniprotGoTableFromSQL(char chosenAspect, int[] taxons) throws SQLException {
>>
>> ...then, instead of the single string, you want to iterate through the
>> taxon IDs:
>>
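>>     StringBuilder baseQueryBuilder = new StringBuilder(
>>             "select db_object_id, go_id, evidence_code, with_or_from from goa"
>>             + " where db like '%UniProt%'");
>>     boolean first = true;
>>     for (int taxon : taxons) {
>>         // Open the parenthesized group on the first taxon, then join with "or".
>>         baseQueryBuilder.append(first ? " and (" : " or ");
>>         baseQueryBuilder.append("taxon = 'taxon:").append(taxon).append("'");
>>         first = false;
>>     }
>>     if (!first) {
>>         // Close the group only if one was opened; with an empty taxons
>>         // array the query would have no taxon filter at all.
>>         baseQueryBuilder.append(")");
>>     }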
>>
>> ...and so on. I just sort of rattled this off so there may be little
>> glitches, but anyway this is just to give you an overall idea.
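A variation on the same idea, sketched under the assumption that the query runs through JDBC: a single "taxon in (?, ?, ...)" predicate with one placeholder per taxon, so the driver handles quoting. The class and method names are hypothetical; the "taxon:<id>" value format matches the original query.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.StringJoiner;

// Hypothetical helper: builds the multi-taxon query with JDBC placeholders.
public class MultiTaxonQuerySketch {
    static String uniProtAndGoIdSql(int taxonCount) {
        if (taxonCount < 1) {
            // Without a taxon predicate the query would match every species.
            throw new IllegalArgumentException("at least one taxon ID is required");
        }
        StringJoiner placeholders = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < taxonCount; i++) {
            placeholders.add("?");
        }
        return "select db_object_id, go_id, evidence_code, with_or_from from goa"
                + " where db like '%UniProt%' and taxon in " + placeholders;
    }

    // Binds each taxon as "taxon:<id>" (JDBC parameter indexes start at 1).
    static void bindTaxa(PreparedStatement statement, int[] taxons) throws SQLException {
        for (int i = 0; i < taxons.length; i++) {
            statement.setString(i + 1, "taxon:" + taxons[i]);
        }
    }

    public static void main(String[] args) {
        System.out.println(uniProtAndGoIdSql(2));
    }
}
```

The method is still called once: build the SQL from taxons.length, prepare it, call bindTaxa, and execute. Rejecting an empty array up front avoids silently exporting GO data for every species in the database.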
>>
>> Put another way, no, you do not need to call this method once per taxon
>> ID. You can still call it a single time; the multiple taxon IDs are handled
>> entirely by the condition that selects which GO terms to export (based on
>> the available GOA records, which, as you may recall, are loaded from .goa
>> files).
>>
>> As a side note, right here you have an opportunity for a little sanity
>> check regarding the content of the relational database: GO terms will only
>> be exported if GOA records for the desired taxon IDs have been imported into
>> the database. So, as a pre-flight check, one can see if there are any GOA
>> records at all for each chosen taxon ID. If there are none, then the .goa
>> file for that species needs to be imported into the relational database.
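The pre-flight check could look something like this minimal JDBC sketch: count GOA rows for each chosen taxon, then report the taxa that have none, whose .goa files still need importing. It assumes the goa table and "taxon:<id>" format from the query above; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GoaPreflightSketch {
    // Counts GOA rows for each chosen taxon, reusing one prepared statement.
    static Map<Integer, Long> goaCounts(Connection connection, int[] taxons) throws SQLException {
        Map<Integer, Long> counts = new HashMap<>();
        String sql = "select count(*) from goa where taxon = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (int taxon : taxons) {
                statement.setString(1, "taxon:" + taxon);
                try (ResultSet rs = statement.executeQuery()) {
                    counts.put(taxon, rs.next() ? rs.getLong(1) : 0L);
                }
            }
        }
        return counts;
    }

    // Taxa with no GOA records: their .goa files still need to be imported.
    static List<Integer> taxaWithoutGoa(int[] taxons, Map<Integer, Long> counts) {
        List<Integer> missing = new ArrayList<>();
        for (int taxon : taxons) {
            if (counts.getOrDefault(taxon, 0L) == 0L) {
                missing.add(taxon);
            }
        }
        return missing;
    }
}
```

Keeping the counting separate from the decision makes the check easy to run before export: for taxa 4932 and 9606 with counts {4932=120}, taxaWithoutGoa returns [9606].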
>>
>> Hope this helps...
>>
>> John David N. Dionisio, PhD
>> Associate Professor, Computer Science
>> Loyola Marymount University
>>
>>
>> On Aug 4, 2011, at 1:00 PM, Kam Dahlquist wrote:
>>
>> > Hi,
>> >
>> > Dondi will have to chime in on this, but I think this is where things
>> > are going to get tricky.
>> >
>> > The final GDB does not actually contain the entire GO; it gets trimmed
>> > somehow based on the GO associations for a particular species, because
>> > MAPPFinder cannot handle loading the entire GO. Since there is some type
>> > of species-specific trimming going on, it's quite possible that this
>> > will need to iterate.
>> >
>> > However, I don't have the foggiest idea of how this works, so Dondi will
>> > have to chime in.
>> >
>> > Best,
>> > Kam
>> >
>> > At 12:09 AM 8/4/2011, you wrote:
>> >> Wednesday 8/3/11 progress:
>> >>
>> >> 1. I started from the "ground zero" code in ExportPanel1.java:
>> >>    databaseProfile.setSelectedSpeciesProfile(selectedProfile);
>> >>
>> >> I found that method in DatabaseProfile.java, plus a getter method:
>> >> SpeciesProfile setSelectedSpeciesProfile(speciesProfile) and
>> >> SpeciesProfile getSelectedSpeciesProfile(speciesProfile).
>> >>
>> >> I created two new methods, setSelectedSpeciesProfiles and
>> >> getSelectedSpeciesProfiles, that work with a List<Object> of
>> >> SpeciesProfiles instead of a single SpeciesProfile.
>> >>
>> >> This enabled the ExportPanel1 ground zero code to become:
>> >>    databaseProfile.setSelectedSpeciesProfiles(selectedSpecies);
>> >>
>> >> 2. Next I looked at public static void export() on line 104 of
>> >> ExportToGenMAPP.java.
>> >>
>> >> On line 107, it instantiates ExportGoData (defined in ExportGoData.java)
>> >> and calls its method public void export(char chosenAspect, int taxon).
>> >>
>> >> Within export, the taxon ID is passed to another method: private void
>> >> populateGoTables(char chosenAspect, int taxon).
>> >>
>> >> Within populateGoTables, the taxon ID is passed in turn to: private void
>> >> populateUniprotGoTableFromSQL(char chosenAspect, int taxon).
>> >>
>> >> But if the export-to-GDB process starts by exporting GO data, doesn't
>> >> it only need to do that once, no matter how many species are selected?
>> >> As you probably realize, I'm leaning toward not having to iterate
>> >> through this for each taxon ID, if possible.
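One way the pieces in items 1 and 2 could connect, sketched with hypothetical stand-ins (SpeciesProfile, DatabaseProfileSketch, and ExportGoDataSketch are not the project's classes): the selected profiles yield a de-duplicated int[] of taxon IDs, which is threaded through export, populateGoTables, and populateUniprotGoTableFromSQL so the GO export runs once. The old single-taxon export is kept as a thin wrapper. The placeholder query step only records its input, and the real methods' SQLException declarations are omitted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the project's SpeciesProfile class.
class SpeciesProfile {
    private final String name;
    private final int taxon;

    SpeciesProfile(String name, int taxon) {
        this.name = name;
        this.taxon = taxon;
    }

    String getName() { return name; }
    int getTaxon() { return taxon; }
}

// Sketch of a multi-species profile holder using a typed list instead of List<Object>.
class DatabaseProfileSketch {
    private List<SpeciesProfile> selectedSpeciesProfiles = List.of();

    void setSelectedSpeciesProfiles(List<? extends SpeciesProfile> profiles) {
        selectedSpeciesProfiles = List.copyOf(profiles); // defensive, unmodifiable copy
    }

    List<SpeciesProfile> getSelectedSpeciesProfiles() {
        return selectedSpeciesProfiles;
    }

    // Distinct taxon IDs of all selected species, in selection order.
    int[] getSelectedTaxonIds() {
        return selectedSpeciesProfiles.stream()
                .mapToInt(SpeciesProfile::getTaxon)
                .distinct()
                .toArray();
    }
}

// Sketch of the export call chain with int[] taxons threaded through.
class ExportGoDataSketch {
    private int queryCount = 0;
    private int[] lastTaxons = new int[0];

    // Old single-taxon entry point kept as a wrapper for existing callers.
    public void export(char chosenAspect, int taxon) {
        export(chosenAspect, new int[] { taxon });
    }

    public void export(char chosenAspect, int[] taxons) {
        populateGoTables(chosenAspect, taxons);
    }

    private void populateGoTables(char chosenAspect, int[] taxons) {
        populateUniprotGoTableFromSQL(chosenAspect, taxons);
        // The remaining GO tables would be filled from the GO IDs exported above.
    }

    private void populateUniprotGoTableFromSQL(char chosenAspect, int[] taxons) {
        // Placeholder: the real method would build and run the multi-taxon query here.
        queryCount++;
        lastTaxons = taxons.clone();
    }

    int getQueryCount() { return queryCount; }
    int[] getLastTaxons() { return lastTaxons.clone(); }
}

public class MultiSpeciesExportSketch {
    public static void main(String[] args) {
        DatabaseProfileSketch profile = new DatabaseProfileSketch();
        profile.setSelectedSpeciesProfiles(List.of(
                new SpeciesProfile("Saccharomyces cerevisiae", 4932),
                new SpeciesProfile("Homo sapiens", 9606)));

        ExportGoDataSketch exporter = new ExportGoDataSketch();
        exporter.export('A', profile.getSelectedTaxonIds());
        System.out.println(exporter.getQueryCount() + " " + Arrays.toString(exporter.getLastTaxons()));
    }
}
```

Running main prints "1 [4932, 9606]": one query call covers both species. Using List<SpeciesProfile> in place of List<Object> also lets the compiler check what ExportPanel1 passes in.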
>> >>
>> >> Also, how does the export actually work? How are GO IDs and UniProt IDs
>> >> related within the table?
>> >>
>> >> Thanks!
>> >>
>> >> Richard
>> >>
>> >>
>>
>>
>>
>> _______________________________________________
>> xmlpipedb-developer mailing list
>> xml...@li...
>> https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>>
>
>