Re: [XMLPipeDB-developer] GenMAPP multitaxon support - CMSI 486T
From: Richard B. <rbr...@gm...> - 2011-08-11 00:51:44
Updated repository to include all Gene Ontology changes discussed during our meeting yesterday. Digging into TableManager next.

Richard

On Fri, Aug 5, 2011 at 10:06 AM, Richard Brous <rbr...@gm...> wrote:

> whew... thanks for the detailed reply. I will digest this a bit and get back to you with further questions.
>
> rb
>
> On Thu, Aug 4, 2011 at 11:18 PM, John David N. Dionisio <do...@lm...> wrote:
>
>> Greetings,
>>
>> Sorry for the delay. I wasn't able to walk through the relevant code until this evening.
>>
>> As Kam said, GOA serves as the link between the UniProt and GO IDs. It essentially determines which GO IDs get exported by using GOA to see which GO IDs are associated with an exported UniProt ID. The populateUniprotGoTableFromSQL, in its current form, extracts the GO association records that match the given taxon ID then exports, as UniProt-GO pairs, the GO and UniProt IDs referenced within that GO association record. Processing that follows this is then based on the GO IDs that got exported --- and that's how the current code avoids exporting the entire list of GO terms.
>>
>> The operative query is on the second line of populateUniprotGoTableFromSQL:
>>
>>     String uniProtAndGOIDSQL = "select db_object_id, go_id, evidence_code, with_or_from from goa where db like '%UniProt%' and taxon = 'taxon:" + taxon + "'";
>>
>> In plain English, this selects the GOA records whose database is UniProt and whose taxon ID is the given taxon. An additional condition is added for the "aspect" (All, Component, Function, or Process) that is to be exported. This is another reduction filter, to further shrink the number of exported GO terms and thus avoid MAPPFinder issues later on.
>>
>> Given this, the proper expansion here is to change the taxon predicate to a multiple predicate. That is, this method can be changed to now accept a collection or array of taxon IDs, and the base query should then be changed so that it accepts any taxon from that collection. More or less, you want:
>>
>>     private void populateUniprotGoTableFromSQL(char chosenAspect, int[] taxons) throws SQLException {
>>
>> ...then, instead of the single string, you want to iterate through the taxon IDs:
>>
>>     StringBuilder baseQueryBuilder = new StringBuilder("select db_object_id, go_id, evidence_code, with_or_from from goa where db like '%UniProt%'");
>>     boolean first = true;
>>     for (int taxon: taxons) {
>>         baseQueryBuilder.append(first ? " and (" : " or ");
>>         baseQueryBuilder
>>             .append("taxon = 'taxon:")
>>             .append(taxon).append("'");
>>         first = false;
>>     }
>>     baseQueryBuilder.append(")");
>>
>> ...and so on. I just sort of rattled this off so there may be little glitches, but anyway this is just to give you an overall idea.
>>
>> Put another way, no, you do not need to iterate this method for each taxon ID. Instead, you can still call this method once, with the multiplicity of taxon IDs emerging in terms of the actual condition used for selecting the GO terms to be exported (based on the available GOA records, which as you may recall are loaded from .goa files).
>>
>> As a side note, right here you have an opportunity for a little sanity check regarding the content of the relational database: GO terms will only be exported if GOA records for the desired taxon IDs have been imported into the database. So, as a pre-flight check, one can see if there are any GOA records at all for each chosen taxon ID. If there are none, then the .goa file for that species needs to be imported into the relational database.
>>
>> Hope this helps...
>>
>> John David N. Dionisio, PhD
>> Associate Professor, Computer Science
>> Loyola Marymount University
>>
>>
>> On Aug 4, 2011, at 1:00 PM, Kam Dahlquist wrote:
>>
>> > Hi,
>> >
>> > Dondi will have to chime in on this, but I think this is where things are going to get tricky.
>> >
>> > The final gdb does not actually contain the entire GO, it gets trimmed somehow based on the GO associations for a particular species. This is because MAPPFinder cannot handle loading the entire GO. Since there is some type of species-specific trimming going on, it's quite possible that this will need to iterate.
>> >
>> > However, I don't have the foggiest idea of how this works, so Dondi will have to chime in.
>> >
>> > Best,
>> > Kam
>> >
>> > At 12:09 AM 8/4/2011, you wrote:
>> >> Wednesday 8/3/11 progress:
>> >>
>> >> 1. After following the ExportPanel1.java ground zero code of: databaseProfile.setSelectedSpeciesProfile( selectedProfile );
>> >>
>> >> I found the method in DatabaseProfile.java plus a getter method; SpeciesProfile setSelectedSpeciesProfile( speciesProfile ) and SpeciesProfile getSelectedSpeciesProfile( speciesProfile )
>> >>
>> >> I created two new methods that each handle List<Object> of SpeciesProfiles argument instead of a single SpeciesProfile; setSelectedSpeciesProfiles and getSelectedSpeciesProfiles.
>> >>
>> >> This enabled the ExportPanel1 ground zero code to become: databaseProfile.setSelectedSpeciesProfiles(selectedSpecies);
>> >>
>> >> 2. public static void export() on line 104 in ExportToGenMAPP.java
>> >>
>> >> On line 107 ExportGoData is instantiated which I found in ExportGoData.java and calls a method: public void export(char chosenAspect, int taxon).
>> >>
>> >> Within export, taxon id is required for another method: private void populateGoTables(char chosenAspect, int taxon).
>> >>
>> >> Within populateGoTables, taxon id is required for another method: private void populateUniprotGoTableFromSQL( char chosenAspect, int taxon).
>> >>
>> >> But, if the export to GDB process starts off with exporting GO data, doesn't it only need to do that once no matter how many species are selected? As you probably realize, I'm leading towards not having to iterate through this for each taxon id if possible.
>> >>
>> >> Also, how does the export actually work? How are GO ids and UniProt ids related within the table?
>> >>
>> >> Thanks!
>> >>
>> >> Richard
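
The multi-taxon predicate and the pre-flight check described in the thread can also be sketched with bound parameters rather than string concatenation. This is only a rough sketch under stated assumptions, not the project's code: the class MultiTaxonGoaQuery and its three methods are hypothetical names, the taxon column is assumed to hold values of the form 'taxon:<id>' as in the query quoted in the thread, and the set of taxons that already have GOA records is assumed to come from a separate lookup against the goa table. An empty taxon list is rejected up front, so the query never ends up with a stray closing parenthesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper; none of these names exist in XMLPipeDB.
public class MultiTaxonGoaQuery {

    // Base query as quoted in the thread; the taxon condition is appended to it.
    public static final String BASE_QUERY =
        "select db_object_id, go_id, evidence_code, with_or_from from goa where db like '%UniProt%'";

    /** Builds the query text with one "?" placeholder per taxon ID. */
    public static String buildQuery(int[] taxons) {
        if (taxons.length == 0) {
            // Without this guard the query would silently match every taxon.
            throw new IllegalArgumentException("at least one taxon ID is required");
        }
        StringJoiner placeholders = new StringJoiner(", ", " and taxon in (", ")");
        for (int i = 0; i < taxons.length; i++) {
            placeholders.add("?");
        }
        return BASE_QUERY + placeholders;
    }

    /** Values to bind, in order, assuming the 'taxon:<id>' column format. */
    public static List<String> bindValues(int[] taxons) {
        List<String> values = new ArrayList<>();
        for (int taxon : taxons) {
            values.add("taxon:" + taxon);
        }
        return values;
    }

    /**
     * Pre-flight check: returns the requested taxon IDs that have no GOA
     * records. The set of taxons with records is supplied by the caller.
     */
    public static List<Integer> missingTaxons(int[] requested, Set<Integer> taxonsWithGoaRecords) {
        List<Integer> missing = new ArrayList<>();
        for (int taxon : requested) {
            if (!taxonsWithGoaRecords.contains(taxon)) {
                missing.add(taxon);
            }
        }
        return missing;
    }
}
```

The string from buildQuery would be handed to Connection.prepareStatement, with each entry of bindValues bound through PreparedStatement.setString at positions 1, 2, and so on. Any taxon returned by missingTaxons would need its .goa file imported before the export is run.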