Re: [XMLPipeDB-developer] GenMAPP multitaxon support - CMSI 486T
From: Richard B. <rbr...@gm...> - 2011-08-05 17:06:28
whew... thanks for the detailed reply. I will digest this a bit and get back
to you with further questions.
rb
On Thu, Aug 4, 2011 at 11:18 PM, John David N. Dionisio <do...@lm...> wrote:
> Greetings,
>
> Sorry for the delay. I wasn't able to walk through the relevant code until
> this evening.
>
> As Kam said, GOA serves as the link between the UniProt and GO IDs. It
> essentially determines which GO IDs get exported by using GOA to see which
> GO IDs are associated with an exported UniProt ID. The
> populateUniprotGoTableFromSQL method, in its current form, extracts the GO
> association records that match the given taxon ID, then exports the GO and
> UniProt IDs referenced within each record as UniProt-GO pairs. The
> processing that follows is based on the GO IDs that got exported, and
> that's how the current code avoids exporting the entire list of GO terms.
>
> The operative query is on the second line of populateUniprotGoTableFromSQL:
>
>     String uniProtAndGOIDSQL = "select db_object_id, go_id, evidence_code, with_or_from"
>         + " from goa where db like '%UniProt%' and taxon = 'taxon:" + taxon + "'";
>
> In plain English, this selects the GOA records whose database is UniProt
> and whose taxon ID is the given taxon. An additional condition is added for
> the "aspect" (All, Component, Function, or Process) that is to be exported.
> This is another reduction filter, to further shrink the number of exported
> GO terms and thus avoid MAPPFinder issues later on.
>
> Given this, the proper expansion here is to change the taxon predicate to a
> multiple predicate. That is, this method can be changed to now accept a
> collection or array of taxon IDs, and the base query should then be changed
> so that it accepts any taxon from that collection. More or less, you want:
>
>     private void populateUniprotGoTableFromSQL(char chosenAspect, int[] taxons)
>             throws SQLException {
>
> ...then, instead of the single string, you want to iterate through the
> taxon IDs:
>
>     StringBuilder baseQueryBuilder = new StringBuilder(
>         "select db_object_id, go_id, evidence_code, with_or_from"
>         + " from goa where db like '%UniProt%'");
>     boolean first = true;
>     for (int taxon : taxons) {
>         // Open the group before the first taxon; join the rest with "or".
>         baseQueryBuilder.append(first ? " and (" : " or ");
>         baseQueryBuilder.append("taxon = 'taxon:").append(taxon).append("'");
>         first = false;
>     }
>     // Close the group only if at least one taxon opened it.
>     if (!first) {
>         baseQueryBuilder.append(")");
>     }
>
> ...and so on. I just sort of rattled this off so there may be little
> glitches, but anyway this is just to give you an overall idea.
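
For comparison, the same multi-taxon condition can also be written as a single `in (...)` list with JDBC placeholders instead of concatenated values. This is only a sketch under assumptions: the class and method names (GoaTaxonQuery, buildSql, bindValues) are hypothetical and not part of XMLPipeDB, and the table and column names are simply the ones that appear in the query above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of the XMLPipeDB code base. */
final class GoaTaxonQuery {
    // Same select list and UniProt filter as the query quoted in this thread.
    private static final String BASE =
        "select db_object_id, go_id, evidence_code, with_or_from"
        + " from goa where db like '%UniProt%'";

    /** Builds the query text with one '?' placeholder per taxon ID. */
    static String buildSql(int[] taxons) {
        if (taxons.length == 0) {
            // An empty list would otherwise produce invalid SQL: "in ()".
            throw new IllegalArgumentException("at least one taxon ID is required");
        }
        String placeholders =
            String.join(", ", Collections.nCopies(taxons.length, "?"));
        return BASE + " and taxon in (" + placeholders + ")";
    }

    /** Values to bind, in order, in the "taxon:<id>" form used by the goa table. */
    static List<String> bindValues(int[] taxons) {
        List<String> values = new ArrayList<>();
        for (int taxon : taxons) {
            values.add("taxon:" + taxon);
        }
        return values;
    }
}
```

With a java.sql.PreparedStatement created from buildSql(taxons), each entry of bindValues(taxons) would be bound in order via setString(i + 1, value). The aspect condition mentioned earlier would still need to be appended in whatever form the existing method already uses.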
>
> Put another way, no, you do not need to iterate this method for each taxon
> ID. Instead, you can still call this method once; the multiple taxon IDs
> only show up in the condition used for selecting the GO terms to be
> exported (based on the available GOA records, which as you may recall are
> loaded from .goa files).
>
> As a side note, right here you have an opportunity for a little sanity
> check regarding the content of the relational database: GO terms will only
> be exported if GOA records for the desired taxon IDs have been imported into
> the database. So, as a pre-flight check, one can see if there are any GOA
> records at all for each chosen taxon ID. If there are none, then the .goa
> file for that species needs to be imported into the relational database.
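
The pre-flight check described above could look roughly like the following. This is a sketch, not existing XMLPipeDB code: GoaPreflight, countGoaRecords, and taxonsWithoutGoa are hypothetical names, and the count query assumes the same goa table and "taxon:<id>" value format seen in the export query.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToLongFunction;

/** Hypothetical pre-flight helper; not part of the XMLPipeDB code base. */
final class GoaPreflight {
    // Assumes the goa table and taxon column from the export query.
    static final String COUNT_SQL = "select count(*) from goa where taxon = ?";

    /** Counts the GOA records stored for a single taxon ID. */
    static long countGoaRecords(Connection connection, int taxon) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(COUNT_SQL)) {
            ps.setString(1, "taxon:" + taxon);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;
            }
        }
    }

    /**
     * Returns the taxon IDs for which the supplied counter reports no GOA
     * records. The counter is passed in so the logic can be exercised
     * without a live database.
     */
    static List<Integer> taxonsWithoutGoa(int[] taxons, IntToLongFunction counter) {
        List<Integer> missing = new ArrayList<>();
        for (int taxon : taxons) {
            if (counter.applyAsLong(taxon) == 0) {
                missing.add(taxon);
            }
        }
        return missing;
    }
}
```

In real use the counter would wrap countGoaRecords and deal with its SQLException; any taxon ID that comes back in the returned list is one whose .goa file still needs to be imported before exporting.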
>
> Hope this helps...
>
> John David N. Dionisio, PhD
> Associate Professor, Computer Science
> Loyola Marymount University
>
>
> On Aug 4, 2011, at 1:00 PM, Kam Dahlquist wrote:
>
> > Hi,
> >
> > Dondi will have to chime in on this, but I think this is where things are
> going to get tricky.
> >
> > The final gdb does not actually contain the entire GO, it gets trimmed
> somehow based on the GO associations for a particular species. This is
> because MAPPFinder cannot handle loading the entire GO. Since there is some
> type of species-specific trimming going on, it's quite possible that this
> will need to iterate.
> >
> > However, I don't have the foggiest idea of how this works, so Dondi will
> have to chime in.
> >
> > Best,
> > Kam
> >
> > At 12:09 AM 8/4/2011, you wrote:
> >> Wednesday 8/3/11 progress:
> >>
> >> 1. After following the ExportPanel1.java ground zero code of:
> databaseProfile.setSelectedSpeciesProfile( selectedProfile );
> >>
> >> I found the method in DatabaseProfile.java plus a getter method;
> >> SpeciesProfile setSelectedSpeciesProfile( speciesProfile ) and
> SpeciesProfile getSelectedSpeciesProfile( speciesProfile )
> >>
> >> I created two new methods that each handle a List<Object> of
> SpeciesProfiles instead of a single SpeciesProfile:
> setSelectedSpeciesProfiles and getSelectedSpeciesProfiles.
> >>
> >> This enabled the ExportPanel1 ground zero code to become:
> databaseProfile.setSelectedSpeciesProfiles(selectedSpecies);
> >>
> >> 2. public static void export() on line 104 in ExportToGenMAPP.java
> >>
> >> On line 107 ExportGoData is instantiated which I found in
> ExportGoData.java and calls a method: public void export(char chosenAspect,
> int taxon).
> >>
> >> Within export, taxon id is required for another method: private void
> populateGoTables(char chosenAspect, int taxon).
> >>
> >> Within populateGoTables, taxon id is required for another method:
> private void populateUniprotGoTableFromSQL( char chosenAspect, int taxon).
> >>
> >> But, if the export to GDB process starts off with exporting GO data,
> doesn't it only need to do that once no matter how many species are
> selected? As you probably realize, I'm leading towards not having to iterate
> through this for each taxon id if possible.
> >>
> >> Also, how does the export actually work? How are GO ids and UniProt ids
> related within the table?
> >>
> >> Thanks!
> >>
> >> Richard
> >>
> >>
>
>
>
> _______________________________________________
> xmlpipedb-developer mailing list
> xml...@li...
> https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>