[XMLPipeDB-developer] Info table

Hi,

I think we had better leave the Info table with only one record 
(option 2).  The species names can be separated by pipes " | ", as 
they are in the Systems table when there are multiple species.  To 
my knowledge, the only time GenMAPP needs to access the Info table is 
for the "DisplayOrder" field, and I don't know what would happen if 
there were multiple records there.  The spec for the table says that 
it should contain only one record, but I don't know whether GenMAPP 
would crash if there were more.  To be on the safe side, I think we 
should keep it to the one record.
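
For illustration, the single-record approach could be sketched as below. This is only a minimal sketch: the class SpeciesColumn and its join method are hypothetical names, not part of the existing code, and the exact delimiter (" | " with surrounding spaces) is an assumption that should be checked against the Systems table.

```java
import java.util.List;

// Hypothetical helper; not part of the XMLPipeDB code base.
final class SpeciesColumn {
    // Assumed delimiter: a pipe with a space on each side, as described
    // for the Systems table. Verify against a real .gdb before relying on it.
    static final String DELIMITER = " | ";

    // Builds the one value that would go into the Info table's "Species"
    // column when several species are exported into the same .gdb.
    static String join(List<String> speciesNames) {
        return String.join(DELIMITER, speciesNames);
    }
}
```

The result would then be passed once as the "Species" value in the existing submit call, so the Info table still receives exactly one record.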

Best,
Kam

At 09:45 PM 8/10/2011, John David N. Dionisio wrote:
>Greetings,
>
>I think we have to turn to Dr. Dahlquist's GenMAPP knowledge here to 
>get the definitive answer.  I see two choices:
>
>- The Info table should have one record for each species that the 
>.gdb holds, in which case the change you need is to wrap that single 
>submit call inside a loop, so that submit is called once for each 
>chosen species.
>
>- The Info table should always have one record, and if the .gdb 
>holds multiple species, the "Species" column should be some 
>concatenation of multiple species names.  In this case, you would 
>still call submit only once, but the value you send into the 
>"Species" column is some accumulation of all chosen species names.
>
>Admittedly I don't know which way is right (I assumed the former as 
>of our Tuesday meeting, but on further examination I'm no longer 
>quite so sure).
>
>For Kam --- what does GenMAPP expect to see in the Info table if the 
>opened .gdb contains multiple species?
>
>John David N. Dionisio, PhD
>Associate Professor, Computer Science
>Loyola Marymount University
>
>
>On Aug 10, 2011, at 9:36 PM, Richard Brous wrote:
>
> > OK, continued to review ExportToGenMAPP and dug into the creation
> > of the first TableManager, tmA, on line 118.
> >
> > In reading through the method, my understanding is that it
> > creates a new TableManager based on the selectedDatabaseProfile
> > (which is UniProt).
> >
> > This is performed by the method getInfoTableManager(), which then
> > calls the method submit(String tableName, QueryType queryType,
> > String[][] columnNamesToValues).
> >
> > the code is as follows:
> >
> > tableManager.submit("Info", QueryType.insert, new String[][] {
> >     { "Owner", owner },
> >     { "Version", new SimpleDateFormat("yyyyMMdd").format(version) },
> >     { "MODSystem", modSystem },
> >     { "Species", speciesProfile.getSpeciesName() },
> >     { "Modify", new SimpleDateFormat("yyyyMMdd").format(modify) },
> >     { "DisplayOrder", displayOrder },
> >     { "Notes", notes }
> > });
> >
> >
> > The modification of this line centers on { "Species",
> > speciesProfile.getSpeciesName() }, since it originally processed a
> > single species.
> >
> > So now I need to populate the arguments with the species
> > contained within selectedDatabaseprofile.selectedSpeciesProfiles.
> >
> > I think I'll start with the baseArgument up to MODSystem, then
> > append as many species as necessary, and then cap off the end with
> > the rest starting at Modify (similar to your approach in
> > ExportGoData, populateUniprotGoTableFromSQL(char chosenAspect,
> > List<Integer> taxonIds), line 513).
> >
> > Please let me know if this approach or analysis is off track.
> >
> > Thanks!
> >
> > Richard
> >
> > On Wed, Aug 10, 2011 at 5:50 PM, Richard Brous <rbr...@gm...> wrote:
> > Updated repository to include all Gene Ontology changes discussed
> > during our meeting yesterday.
> >
> > Digging into TableManager next.
> >
> > Richard
> >
> > On Fri, Aug 5, 2011 at 10:06 AM, Richard Brous <rbr...@gm...> wrote:
> > whew... thanks for the detailed reply. I will digest this a bit
> > and get back to you with further questions.
> >
> > rb
> >
> > On Thu, Aug 4, 2011 at 11:18 PM, John David N. Dionisio
> > <do...@lm...> wrote:
> > Greetings,
> >
> > Sorry for the delay.  I wasn't able to walk through the relevant
> > code until this evening.
> >
> > As Kam said, GOA serves as the link between the UniProt and GO
> > IDs.  It essentially determines which GO IDs get exported by using
> > GOA to see which GO IDs are associated with an exported UniProt
> > ID.  The populateUniprotGoTableFromSQL method, in its current form,
> > extracts the GO association records that match the given taxon ID,
> > then exports, as UniProt-GO pairs, the GO and UniProt IDs
> > referenced within each GO association record.  The processing that
> > follows is then based on the GO IDs that got exported, and that's
> > how the current code avoids exporting the entire list of GO terms.
> >
> > The operative query is on the second line of populateUniprotGoTableFromSQL:
> >
> >        String uniProtAndGOIDSQL = "select db_object_id, go_id, "
> >            + "evidence_code, with_or_from from goa "
> >            + "where db like '%UniProt%' "
> >            + "and taxon = 'taxon:" + taxon + "'";
> >
> > In plain English, this selects the GOA records whose database is
> > UniProt and whose taxon ID is the given taxon.  An additional
> > condition is added for the "aspect" (All, Component, Function, or
> > Process) that is to be exported.  This is another reduction filter,
> > to further shrink the number of exported GO terms and thus avoid
> > MAPPFinder issues later on.
> >
> > Given this, the proper expansion here is to change the taxon
> > predicate to a multiple predicate.  That is, this method can be
> > changed to accept a collection or array of taxon IDs, and the
> > base query should then be changed so that it accepts any taxon from
> > that collection.  More or less, you want:
> >
> >    private void populateUniprotGoTableFromSQL(char chosenAspect,
> >            int[] taxons) throws SQLException {
> >
> > ...then, instead of the single string, you want to iterate
> > through the taxon IDs:
> >
> >    StringBuilder baseQueryBuilder = new StringBuilder(
> >        "select db_object_id, go_id, evidence_code, with_or_from "
> >        + "from goa where db like '%UniProt%'");
> >    boolean first = true;
> >    for (int taxon: taxons) {
> >        baseQueryBuilder.append(first ? " and (" : " or ");
> >        baseQueryBuilder
> >            .append("taxon = 'taxon:")
> >            .append(taxon).append("'");
> >        first = false;
> >    }
> >    // Close the group only if at least one taxon opened it.
> >    if (!first) {
> >        baseQueryBuilder.append(")");
> >    }
> >
> > ...and so on.  I just sort of rattled this off so there may be
> > little glitches, but anyway this is just to give you an overall idea.
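
As a self-contained variant of the idea above, the multi-taxon predicate could also be assembled with a stream. This is a sketch under assumptions: the class GoaQueries and the method uniProtGoQuery are hypothetical names, and only the table and column names are taken from the query quoted in this message. The aspect condition mentioned earlier is not included.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; names are illustrative, not from the code base.
final class GoaQueries {
    private static final String BASE_QUERY =
        "select db_object_id, go_id, evidence_code, with_or_from "
        + "from goa where db like '%UniProt%'";

    // Returns the base query restricted to any of the given taxon IDs.
    // An empty list is rejected so that the query never silently falls
    // back to selecting every taxon.
    static String uniProtGoQuery(List<Integer> taxonIds) {
        if (taxonIds.isEmpty()) {
            throw new IllegalArgumentException("at least one taxon ID is required");
        }
        // Taxon IDs are integers, so plain concatenation cannot inject SQL.
        String taxonPredicate = taxonIds.stream()
            .map(id -> "taxon = 'taxon:" + id + "'")
            .collect(Collectors.joining(" or ", " and (", ")"));
        return BASE_QUERY + taxonPredicate;
    }
}
```

Rejecting an empty list is a design choice made for this sketch; the existing code may prefer to handle that case differently.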
> >
> > Put another way, no, you do not need to iterate this method for
> > each taxon ID.  Instead, you can still call this method once, with
> > the multiplicity of taxon IDs emerging in terms of the actual
> > condition used for selecting the GO terms to be exported (based on
> > the available GOA records, which as you may recall are loaded from
> > .goa files).
> >
> > As a side note, right here you have an opportunity for a little
> > sanity check regarding the content of the relational database: GO
> > terms will only be exported if GOA records for the desired taxon
> > IDs have been imported into the database.  So, as a pre-flight
> > check, one can see if there are any GOA records at all for each
> > chosen taxon ID.  If there are none, then the .goa file for that
> > species needs to be imported into the relational database.
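
The pre-flight check described above might look roughly like the following sketch. Everything named here (GoaPreflight, COUNT_SQL, taxaWithoutGoa) is hypothetical. It assumes the per-taxon record counts have already been read from the database into a map keyed by the stored taxon string (for example "taxon:9606"); the JDBC code that runs the query is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight helper; not part of the existing code.
final class GoaPreflight {
    // A query that could produce the counts; table and column names are
    // taken from the goa query quoted earlier in this thread.
    static final String COUNT_SQL =
        "select taxon, count(*) from goa where db like '%UniProt%' group by taxon";

    // Returns the chosen taxon IDs that have no GOA records at all, i.e.
    // the species whose .goa file still needs to be imported.
    static List<Integer> taxaWithoutGoa(List<Integer> chosenTaxonIds,
                                        Map<String, Integer> countsByTaxon) {
        List<Integer> missing = new ArrayList<>();
        for (int taxonId : chosenTaxonIds) {
            int count = countsByTaxon.getOrDefault("taxon:" + taxonId, 0);
            if (count == 0) {
                missing.add(taxonId);
            }
        }
        return missing;
    }
}
```

If the returned list is non-empty, the export could stop and report which species are missing GOA data before any GO tables are populated.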
> >
> > Hope this helps...
> >
> > John David N. Dionisio, PhD
> > Associate Professor, Computer Science
> > Loyola Marymount University
> >
> >
> > On Aug 4, 2011, at 1:00 PM, Kam Dahlquist wrote:
> >
> > > Hi,
> > >
> > > Dondi will have to chime in on this, but I think this is where
> > > things are going to get tricky.
> > >
> > > The final gdb does not actually contain the entire GO; it gets
> > > trimmed somehow based on the GO associations for a particular
> > > species.  This is because MAPPFinder cannot handle loading the
> > > entire GO.  Since there is some type of species-specific trimming
> > > going on, it's quite possible that this will need to iterate.
> > >
> > > However, I don't have the foggiest idea of how this works, so
> > > Dondi will have to chime in.
> > >
> > > Best,
> > > Kam
> > >
> > > At 12:09 AM 8/4/2011, you wrote:
> > >> Wednesday 8/3/11 progress:
> > >>
> > >> 1. After following the ExportPanel1.java ground zero code of:
> > >> databaseProfile.setSelectedSpeciesProfile( selectedProfile );
> > >>
> > >> I found the method in DatabaseProfile.java plus a getter method:
> > >> SpeciesProfile setSelectedSpeciesProfile( speciesProfile ) and
> > >> SpeciesProfile getSelectedSpeciesProfile( speciesProfile )
> > >>
> > >> I created two new methods that each handle a List<Object> of
> > >> SpeciesProfiles instead of a single SpeciesProfile:
> > >> setSelectedSpeciesProfiles and getSelectedSpeciesProfiles.
> > >>
> > >> This enabled the ExportPanel1 ground zero code to become:
> > >> databaseProfile.setSelectedSpeciesProfiles(selectedSpecies);
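
A pair of plural accessors like the ones described above could be sketched as follows. SketchSpeciesProfile and SketchDatabaseProfile are simplified stand-ins invented for this example, not the real SpeciesProfile and DatabaseProfile classes; only the method names setSelectedSpeciesProfiles, getSelectedSpeciesProfiles, and getSpeciesName come from this thread. The sketch uses a typed list rather than List<Object>, which is an assumption about what the final code would want.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the real SpeciesProfile class (hypothetical).
final class SketchSpeciesProfile {
    private final String speciesName;

    SketchSpeciesProfile(String speciesName) {
        this.speciesName = speciesName;
    }

    String getSpeciesName() {
        return speciesName;
    }
}

// Simplified stand-in for the real DatabaseProfile class (hypothetical).
final class SketchDatabaseProfile {
    private final List<SketchSpeciesProfile> selectedSpeciesProfiles = new ArrayList<>();

    // Replaces the current selection with a copy of the given list, so
    // later changes to the caller's list do not affect the profile.
    void setSelectedSpeciesProfiles(List<SketchSpeciesProfile> profiles) {
        selectedSpeciesProfiles.clear();
        selectedSpeciesProfiles.addAll(profiles);
    }

    // Exposes the selection as a read-only view.
    List<SketchSpeciesProfile> getSelectedSpeciesProfiles() {
        return Collections.unmodifiableList(selectedSpeciesProfiles);
    }
}
```

With a typed list, callers such as the export code can read getSpeciesName() from each element without casting.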
> > >>
> > >> 2. public static void export() on line 104 in ExportToGenMAPP.java
> > >>
> > >> On line 107, ExportGoData (which I found in ExportGoData.java) is
> > >> instantiated and one of its methods is called: public void
> > >> export(char chosenAspect, int taxon).
> > >>
> > >> Within export, the taxon ID is required for another method:
> > >> private void populateGoTables(char chosenAspect, int taxon).
> > >>
> > >> Within populateGoTables, the taxon ID is required for another
> > >> method: private void populateUniprotGoTableFromSQL(char
> > >> chosenAspect, int taxon).
> > >>
> > >> But if the export to GDB process starts off with exporting GO
> > >> data, doesn't it only need to do that once no matter how many
> > >> species are selected? As you probably realize, I'm leaning towards
> > >> not having to iterate through this for each taxon ID if possible.
> > >>
> > >> Also, how does the export actually work? How are GO IDs and
> > >> UniProt IDs related within the table?
> > >>
> > >> Thanks!
> > >>
> > >> Richard
> > >>
> > >>
> >
> >
> > _______________________________________________
> > xmlpipedb-developer mailing list
> > xml...@li...
> > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer