Re: [XMLPipeDB-developer] GenMAPP multitaxon support - CMSI 486T
From: Richard B. <rbr...@gm...> - 2011-08-05 17:06:28
whew... thanks for the detailed reply. I will digest this a bit and get back to you with further questions.

rb

On Thu, Aug 4, 2011 at 11:18 PM, John David N. Dionisio <do...@lm...> wrote:
> Greetings,
>
> Sorry for the delay. I wasn't able to walk through the relevant code until
> this evening.
>
> As Kam said, GOA serves as the link between the UniProt and GO IDs. It
> essentially determines which GO IDs get exported by using GOA to see which
> GO IDs are associated with an exported UniProt ID. The
> populateUniprotGoTableFromSQL, in its current form, extracts the GO
> association records that match the given taxon ID then exports, as
> UniProt-GO pairs, the GO and UniProt IDs referenced within that GO
> association record. Processing that follows this is then based on the GO
> IDs that got exported --- and that's how the current code avoids exporting
> the entire list of GO terms.
>
> The operative query is on the second line of populateUniprotGoTableFromSQL:
>
>     String uniProtAndGOIDSQL = "select db_object_id, go_id, evidence_code, with_or_from from goa where db like '%UniProt%' and taxon = 'taxon:" + taxon + "'";
>
> In plain English, this selects the GOA records whose database is UniProt
> and whose taxon ID is the given taxon. An additional condition is added for
> the "aspect" (All, Component, Function, or Process) that is to be exported.
> This is another reduction filter, to further shrink the number of exported
> GO terms and thus avoid MAPPFinder issues later on.
>
> Given this, the proper expansion here is to change the taxon predicate to a
> multiple predicate. That is, this method can be changed to now accept a
> collection or array of taxon IDs, and the base query should then be changed
> so that it accepts any taxon from that collection.
> More or less, you want:
>
>     private void populateUniprotGoTableFromSQL(char chosenAspect, int[] taxons) throws SQLException {
>
> ...then, instead of the single string, you want to iterate through the
> taxon IDs:
>
>     StringBuilder baseQueryBuilder = new StringBuilder("select db_object_id, go_id, evidence_code, with_or_from from goa where db like '%UniProt%'");
>     boolean first = true;
>     for (int taxon: taxons) {
>         baseQueryBuilder.append(first ? " and (" : " or ");
>         baseQueryBuilder
>             .append("taxon = 'taxon:")
>             .append(taxon).append("'");
>         first = false;
>     }
>     baseQueryBuilder.append(")");
>
> ...and so on. I just sort of rattled this off so there may be little
> glitches, but anyway this is just to give you an overall idea.
>
> Put another way, no, you do not need to iterate this method for each taxon
> ID. Instead, you can still call this method once, with the multiplicity of
> taxon IDs emerging in terms of the actual condition used for selecting the
> GO terms to be exported (based on the available GOA records, which as you
> may recall are loaded from .goa files).
>
> As a side note, right here you have an opportunity for a little sanity
> check regarding the content of the relational database: GO terms will only
> be exported if GOA records for the desired taxon IDs have been imported into
> the database. So, as a pre-flight check, one can see if there are any GOA
> records at all for each chosen taxon ID. If there are none, then the .goa
> file for that species needs to be imported into the relational database.
>
> Hope this helps...
>
> John David N. Dionisio, PhD
> Associate Professor, Computer Science
> Loyola Marymount University
>
>
> On Aug 4, 2011, at 1:00 PM, Kam Dahlquist wrote:
>
> > Hi,
> >
> > Dondi will have to chime in on this, but I think this is where things
> > are going to get tricky.
> >
> > The final gdb does not actually contain the entire GO, it gets trimmed
> > somehow based on the GO associations for a particular species.
> > This is because MAPPFinder cannot handle loading the entire GO. Since
> > there is some type of species-specific trimming going on, it's quite
> > possible that this will need to iterate.
> >
> > However, I don't have the foggiest idea of how this works, so Dondi will
> > have to chime in.
> >
> > Best,
> > Kam
> >
> > At 12:09 AM 8/4/2011, you wrote:
> >> Wednesday 8/3/11 progress:
> >>
> >> 1. After following the ExportPanel1.java ground zero code of:
> >>    databaseProfile.setSelectedSpeciesProfile( selectedProfile );
> >>
> >> I found the method in DatabaseProfile.java plus a getter method;
> >> SpeciesProfile setSelectedSpeciesProfile( speciesProfile ) and
> >> SpeciesProfile getSelectedSpeciesProfile( speciesProfile )
> >>
> >> I created two new methods that each handle List<Object> of
> >> SpeciesProfiles argument instead of a single SpeciesProfile;
> >> setSelectedSpeciesProfiles and getSelectedSpeciesProfiles.
> >>
> >> This enabled the ExportPanel1 ground zero code to become:
> >>    databaseProfile.setSelectedSpeciesProfiles(selectedSpecies);
> >>
> >> 2. public static void export() on line 104 in ExportToGenMAPP.java
> >>
> >> On line 107 ExportGoData is instantiated which I found in
> >> ExportGoData.java and calls a method:
> >>    public void export(char chosenAspect, int taxon).
> >>
> >> Within export, taxon id is required for another method:
> >>    private void populateGoTables(char chosenAspect, int taxon).
> >>
> >> Within populateGoTables, taxon id is required for another method:
> >>    private void populateUniprotGoTableFromSQL( char chosenAspect, int taxon).
> >>
> >> But, if the export to GDB process starts off with exporting GO data,
> >> doesn't it only need to do that once no matter how many species are
> >> selected? As you probably realize, I'm leading towards not having to
> >> iterate through this for each taxon id if possible.
> >>
> >> Also, how does the export actually work? How are GO ids and UniProt ids
> >> related within the table?
> >>
> >> Thanks!
> >>
> >> Richard
>
> _______________________________________________
> xmlpipedb-developer mailing list
> xml...@li...
> https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
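
For anyone picking this thread up later, here is a minimal sketch of the two ideas in the quoted reply: the multi-taxon predicate and the pre-flight check for missing GOA records. It is not the project's code. The class name MultiTaxonGoaQuery and the methods buildQuery, taxonValue, and taxonsWithoutGoaRecords are made up for illustration, and the only schema assumptions are the goa table and the 'taxon:<id>' value format that appear in the quoted query. Unlike the quoted snippet, this version uses JDBC "?" placeholders instead of string concatenation and rejects an empty taxon list, which would otherwise leave a dangling ")" in the query.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the XMLPipeDB code base.
final class MultiTaxonGoaQuery {

    // Same base query as in the quoted reply, minus the taxon predicate.
    private static final String BASE_QUERY =
        "select db_object_id, go_id, evidence_code, with_or_from"
        + " from goa where db like '%UniProt%'";

    /** Builds the base query with one "taxon = ?" placeholder per taxon ID. */
    static String buildQuery(int[] taxons) {
        if (taxons == null || taxons.length == 0) {
            // Without this guard the query would end in an unmatched ")".
            throw new IllegalArgumentException("at least one taxon ID is required");
        }
        StringBuilder sql = new StringBuilder(BASE_QUERY);
        for (int i = 0; i < taxons.length; i++) {
            sql.append(i == 0 ? " and (" : " or ").append("taxon = ?");
        }
        return sql.append(")").toString();
    }

    /** Formats a taxon ID as it appears in the quoted query, e.g. "taxon:9606". */
    static String taxonValue(int taxon) {
        return "taxon:" + taxon;
    }

    /** Pre-flight check: returns the taxon IDs that have no GOA records at all. */
    static List<Integer> taxonsWithoutGoaRecords(Connection connection, int[] taxons)
            throws SQLException {
        List<Integer> missing = new ArrayList<>();
        try (PreparedStatement ps =
                 connection.prepareStatement("select count(*) from goa where taxon = ?")) {
            for (int taxon : taxons) {
                ps.setString(1, taxonValue(taxon));
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next() && rs.getLong(1) == 0) {
                        missing.add(taxon);
                    }
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative taxon IDs only; any set of IDs works the same way.
        System.out.println(buildQuery(new int[] {9606, 4932}));
    }
}
```

To run the generated query, one would bind each placeholder in order with ps.setString(i + 1, taxonValue(taxons[i])). The aspect condition mentioned in the reply would still need to be appended separately. A non-empty result from taxonsWithoutGoaRecords would mean the .goa files for those species have not been imported yet.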