Re: [XMLPipeDB-developer] Info table
From: Richard B. <rbr...@gm...> - 2011-08-17 04:12:56
Submitted more changes to SourceForge: changes to ExportToGenMAPP,
DatabaseProfile and UniProtDatabaseProfile:
- interim change to getRelationsManager() to see the first species in
  selectedSpeciesProfiles<SpeciesProfile>
- changed getSystemsTableManager() to iterate through each SpeciesProfile in
  selectedSpeciesProfiles<SpeciesProfile>
- interim change to getPrimarySystemTableManager to see the first species in
  selectedSpeciesProfiles<SpeciesProfile>
Compiles and runs. Will do a test export of Msmeg now and let it run while I
sleep.
Goodnight from the east coast!
Richard
On Tue, Aug 16, 2011 at 3:41 PM, John David N. Dionisio <do...@lm...> wrote:
> Yes, sounds right. Each species profile would then, in turn, call "submit"
> on the given table manager for each ID that should be exported. Presumably
> there is a taxon ID included with each record, so that when all is said and
> done you'll have the IDs of each chosen species, and you can tell which ID
> belongs to which species.
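A minimal sketch of the bookkeeping described above, assuming each submitted ID carries its taxon ID. RecordingTableManager and its three-argument submit are hypothetical stand-ins, not the real TableManager API; they only show how IDs can be attributed back to species once every selected profile has submitted its IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for TableManager: it only records what was submitted.
class RecordingTableManager {
    // One submitted row: system table name, exported ID, and the taxon it came from.
    static final class Row {
        final String table;
        final String id;
        final int taxonId;

        Row(String table, String id, int taxonId) {
            this.table = table;
            this.id = id;
            this.taxonId = taxonId;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    // Each species profile would call this once per ID it exports.
    void submit(String table, String id, int taxonId) {
        rows.add(new Row(table, id, taxonId));
    }

    // Because every row carries its taxon ID, the IDs can be split back out by species.
    Map<Integer, List<String>> idsByTaxon(String table) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Row row : rows) {
            if (row.table.equals(table)) {
                result.computeIfAbsent(row.taxonId, k -> new ArrayList<>()).add(row.id);
            }
        }
        return result;
    }
}
```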
>
> If this goes well, your export should now have the Info table with the
> concatenated species names, with the system tables (UniProt, GeneID,
> OrderedLocus, etc.) having the IDs of the multiple species. The
> relationship tables should have just the IDs of the first species in the
> collection, per our gradual refactoring strategy. Presumably you can guess
> how the relationship tables should look once you have tackled that part of
> the code :)
>
> The QA check here, I would say, is to then match the ID counts *per
> species* against the equivalent counts in PostgreSQL (total count will not
> work now, since the user need not necessarily have chosen all of the species
> loaded in the PostgreSQL database). TallyEngine tweaks may be necessary
> after this, but we will not worry about that now. In the meantime, manual
> counts (i.e., issuing the queries manually) should do.
>
> Formulating those SQL queries will require an understanding of grouping and
> aggregate functions --- a fine exercise that is relevant to the database
> content of the course, I would say :)
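A sketch of the per-species QA count. The table and column names passed to countQuery are placeholders, not the actual XMLPipeDB schema; countPerTaxon performs the same GROUP BY / COUNT aggregation in memory over hypothetical (taxon, ID) pairs, which is handy for checking the exported side against the PostgreSQL side.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PerSpeciesCounts {
    // Builds a per-species count query; table and column names are placeholders,
    // not the actual XMLPipeDB schema.
    static String countQuery(String table, String taxonColumn) {
        return "select " + taxonColumn + ", count(*) from " + table
                + " group by " + taxonColumn;
    }

    // The same GROUP BY / COUNT aggregation done in memory over (taxon, id) pairs,
    // with pair[0] holding the taxon and pair[1] the ID.
    static Map<String, Long> countPerTaxon(List<String[]> taxonIdPairs) {
        return taxonIdPairs.stream()
                .collect(Collectors.groupingBy(pair -> pair[0], TreeMap::new, Collectors.counting()));
    }
}
```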
>
> John David N. Dionisio, PhD
> Associate Professor, Computer Science
> Loyola Marymount University
>
>
>
> On Aug 16, 2011, at 3:32 PM, Richard Brous wrote:
>
> > OK, so it's pretty much the last line of getSystemsTableManager() that
> > needs to be modified to be aware of the collection of SpeciesProfiles:
> >
> >     tableManager = speciesProfile.getSystemsTableManagerCustomizations(tableManager, this);
> >     return tableManager;
> >
> > So I believe I need to iterate through each species of the collection
> > before returning the final tableManager.
> >
> > I would enclose it in a for loop:
> >
> >     for (SpeciesProfile speciesProfile : speciesProfiles) {
> >         tableManager = speciesProfile.getSystemsTableManagerCustomizations(tableManager, this);
> >     }
> >     return tableManager;
> >
> > This would seem to be the correct implementation to ensure that each
> > species' customizations are applied.
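A sketch of the loop proposed above, using hypothetical minimal TableManager and SpeciesProfile types (the real getSystemsTableManagerCustomizations also receives the DatabaseProfile, which is omitted here). The point is that each call's result is fed into the next one, so every selected species' customizations accumulate in the single manager that is returned.

```java
import java.util.ArrayList;
import java.util.List;

class SystemsTableSketch {
    // Hypothetical minimal TableManager: just a log of what was added.
    static class TableManager {
        final List<String> submissions = new ArrayList<>();
    }

    // Hypothetical minimal SpeciesProfile; the real method also receives the DatabaseProfile.
    interface SpeciesProfile {
        TableManager getSystemsTableManagerCustomizations(TableManager tableManager);
    }

    // Thread the manager through every selected species, so each species'
    // customizations are applied before the final manager is returned.
    static TableManager applyAll(TableManager tableManager, List<SpeciesProfile> speciesProfiles) {
        for (SpeciesProfile speciesProfile : speciesProfiles) {
            tableManager = speciesProfile.getSystemsTableManagerCustomizations(tableManager);
        }
        return tableManager;
    }
}
```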
> >
> >
> > Sound good??
> >
> > Richard
> >
> >
> >
> > On Tue, Aug 16, 2011 at 12:38 PM, Richard Brous <rbr...@gm...> wrote:
> > gotcha, heading to system tables.
> >
> > gotcha.
> >
> > gotcha.
> >
> > Thanks for the speedy reply =D
> >
> > Richard
> >
> > On Tue, Aug 16, 2011 at 12:21 PM, John David N. Dionisio <do...@lm...> wrote:
> > Hi Rich,
> >
> > Yeah, relations is pretty complicated. I suggest you skip that and go to
> > just the system tables for now (i.e., the tables that hold the IDs
> > themselves). The relationship tables hold pairs of IDs (i.e., which ID
> > from one system corresponds to which ID from another), and so add another
> > level of complexity, I think.
> >
> > Regarding your prior question about the Info table --- I'm not or'ing the
> > species names. I'm merely concatenating the names with pipes ("|") in
> > between, as Kam specified. While the final value may look like an "or," it
> > is ultimately just a string. The "|" could just as easily have been a comma,
> > semicolon, or other separator.
> >
> > Your prior guess as to how that would look, with the multiple
> > { "Species", speciesName } pairs, would actually be equivalent to multiple
> > values for the single Species column, which does not fit the relational
> > model. The Species column can have only one value, and in this case it is a
> > single string that is the concatenation of all selected species names,
> > separated by "|" characters.
> >
> > Hope that clears things up...
> >
> > John David N. Dionisio, PhD
> > Associate Professor, Computer Science
> > Loyola Marymount University
> >
> >
> >
> > On Aug 16, 2011, at 12:11 PM, Richard Brous wrote:
> >
> > > OK, have moved on to getRelationsTableManager() ... this one seems
> > > pretty complicated...
> > >
> > > Have reviewed the method and the submethods it calls, and have the
> > > following questions:
> > >
> > > 1. What are the RelationshipTables that are stored in relationshipTables?
> > > Aren't they species-dependent? I can't seem to find where they are
> > > created. If not, then:
> > >
> > > 2. How should the if/else conditional be handled?
> > >
> > >     if (speciesProfile.getSpeciesSpecificSystemTables().containsKey(stp.systemTable1)
> > >             | speciesProfile.getSpeciesSpecificSystemTables().containsKey(stp.systemTable2)) {
> > >         tableManager = speciesProfile.getRelationsTableManagerCustomizations(stp.systemTable1,
> > >                 stp.systemTable2, templateDefinedSystemToSystemCode, tableManager);
> > >     } else {
> > >         // (else branch not quoted)
> > >     }
> > > This obviously needs to be made aware of lists of SpeciesProfiles. To
> > > build the correct tableManager, do I need this conditional to run for
> > > every species, or should we enclose all the code (if and else) within a
> > > loop through each species?
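One way to look at the choice in code is to evaluate the quoted condition once per species. This hypothetical helper represents each species only by the set of its species-specific system table names (only the keys matter for containsKey) and returns the species for which the if-branch would fire. It uses || (the idiomatic short-circuit form) where the quoted code uses |; the result is the same here because containsKey has no side effects. What the else branch should do is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelationsConditionSketch {
    // Hypothetical: species name -> names of its species-specific system tables.
    // Returns the species for which the quoted if-condition would be true,
    // i.e. the condition is evaluated once per species rather than once overall.
    static List<String> speciesNeedingCustomization(Map<String, Set<String>> tablesBySpecies,
                                                    String systemTable1, String systemTable2) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : tablesBySpecies.entrySet()) {
            Set<String> tables = entry.getValue();
            if (tables.contains(systemTable1) || tables.contains(systemTable2)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```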
> > >
> > > Thanks.
> > >
> > > Richard
> > >
> > >
> > > On Tue, Aug 16, 2011 at 11:02 AM, Richard Brous <rbr...@gm...> wrote:
> > > Sorry I typed and sent that out too quickly before getting my thoughts
> completely together...
> > >
> > > First off, I thought that the array[][] portion of the submit needed to
> > > be in the following format: { "Species", speciesName1 }, { "Species",
> > > speciesName2 }, { "Species", speciesName3 }, ...
> > > but I see you specified it as: { "Species", speciesName1 | speciesName2
> > > | speciesName3 }, ... OR'ing each of the species names?
> > >
> > > I see what you did regarding the date object... yes, no reason to
> > > create a second object...
> > >
> > > As mentioned previously I will continue through the next methods,
> submitting (well thought out) changes and emailing out updates...
> > >
> > > Richard
> > > On Tue, Aug 16, 2011 at 10:36 AM, Richard Brous <rbr...@gm...> wrote:
> > > Took a look at your code and realized what I had done wrong. I should
> have just broken out the loop as the second argument of the 3 in the submit
> method. That was a rookie move on my part.
> > >
> > > I am moving on to the next method and will keep my mind on the
> syntactic solution that I'm not thinking through prior to moving forward
> with an implementation.
> > >
> > > Richard
> > >
> > > On Mon, Aug 15, 2011 at 1:30 AM, John David N. Dionisio <do...@lm...> wrote:
> > > OK, everything is committed. I got up to the tweaks on the second
> panel in the wizard (Save As/GO Aspects), as well as the query change for
> exporting any combination of C, F, or P.
> > >
> > > I still have to do the GO OBO format check, plus the UI work on the
> remaining two wizard panels. But meanwhile, hope these latest changes work
> out well.
> > >
> > > John David N. Dionisio, PhD
> > > Associate Professor, Computer Science
> > > Loyola Marymount University
> > >
> > >
> > >
> > > On Aug 14, 2011, at 7:31 PM, Richard Brous wrote:
> > >
> > > > OK, using option 2 I have made changes to DatabaseProfile.java to
> allow for all species names to be included in the submit argument.
> > > > But I'm stuck on how to change my StringBuilder object to a type that
> submit wants. Help please!!
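StringBuilder is not a String, so calling toString() on it is the usual conversion before the value goes into the String[][] that submit takes. A minimal sketch, assuming a plain "|" separator with no surrounding spaces; SpeciesColumnValue and speciesPair are hypothetical names.

```java
import java.util.List;

class SpeciesColumnValue {
    // Builds the single { "Species", ... } pair with all chosen names joined by "|".
    static String[] speciesPair(List<String> speciesNames) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < speciesNames.size(); i++) {
            if (i > 0) {
                builder.append('|');   // separator only between names
            }
            builder.append(speciesNames.get(i));
        }
        // toString() turns the builder into the String that the String[][] needs.
        return new String[] { "Species", builder.toString() };
    }
}
```

String.join("|", speciesNames) produces the same string without an explicit builder.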
> > > >
> > > > Submitted the above and some comment changes to SourceForge this
> evening.
> > > >
> > > > Richard
> > > >
> > > > On Fri, Aug 12, 2011 at 12:08 PM, Kam Dahlquist <kda...@lm...> wrote:
> > > > Hi,
> > > >
> > > > I think we had better leave the info table with only one record
> > > > (option 2). The species names can be separated by pipes " | " as
> > > > they are in the Systems table where there are multiple species. To
> > > > my knowledge, the only time GenMAPP needs to access the info table is
> > > > for the "DisplayOrder" field. I don't know what would happen if there
> > > > were multiple records there. I know that the spec for the table says
> > > > that it should only be one record, but I don't know if it would crash
> > > > if there were multiple records. To be on the safe side, I think we
> > > > should just keep it to the one record.
> > > >
> > > > Best,
> > > > Kam
> > > >
> > > > At 09:45 PM 8/10/2011, John David N. Dionisio wrote:
> > > > >Greetings,
> > > > >
> > > > >I think we have to turn to Dr. Dahlquist's GenMAPP knowledge here to
> > > > >get the definitive answer. I see two choices:
> > > > >
> > > > >- The Info table should have one record for each species that the
> > > > >.gdb holds, in which case the change you need is to wrap that single
> > > > >submit call inside a loop, so that submit is called once for each
> > > > >chosen species.
> > > > >
> > > > >- The Info table should always have one record, and if the .gdb
> > > > >holds multiple species, the "Species" column should be some
> > > > >concatenation of multiple species names. In this case, you would
> > > > >still call submit only once, but the value you send into the
> > > > >"Species" column is some accumulation of all chosen species names.
> > > > >
> > > > >Admittedly I don't know which way is right (I assumed the former as
> > > > >of our Tuesday meeting, but on further examination I'm no longer
> > > > >quite so sure).
> > > > >
> > > > >For Kam --- what does GenMAPP expect to see in the Info table if the
> > > > >opened .gdb contains multiple species?
> > > > >
> > > > >John David N. Dionisio, PhD
> > > > >Associate Professor, Computer Science
> > > > >Loyola Marymount University
> > > > >
> > > > >
> > > > >On Aug 10, 2011, at 9:36 PM, Richard Brous wrote:
> > > > >
> > > > > > OK, continued to review ExportToGenMAPP and dug into the creation
> > > > > of the first TableManager tmA on line 118.
> > > > > >
> > > > > > In reading through the method, my understanding is that it
> > > > > creates a new TableManager based on the selectedDatabaseProfile
> > > > > (which is UniProt).
> > > > > >
> > > > > > This is performed by the method getInfoTableManager() which then
> > > > > calls method submit(String tableName, QueryType queryType,
> > > > > String[][] columnNamesToValues);
> > > > > >
> > > > > > the code is as follows:
> > > > > >
> > > > > > tableManager.submit("Info", QueryType.insert, new String[][] {
> > > > > >         { "Owner", owner },
> > > > > >         { "Version", new SimpleDateFormat("yyyyMMdd").format(version) },
> > > > > >         { "MODSystem", modSystem },
> > > > > >         { "Species", speciesProfile.getSpeciesName() },
> > > > > >         { "Modify", new SimpleDateFormat("yyyyMMdd").format(modify) },
> > > > > >         { "DisplayOrder", displayOrder },
> > > > > >         { "Notes", notes } });
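A sketch of the single-record variant (option 2) of this call, assuming the same column names as the quoted code and a pipe-joined Species value; InfoRowSketch and its parameter list are hypothetical. It also reuses one SimpleDateFormat for both date columns instead of constructing it twice (a local instance, so SimpleDateFormat's lack of thread safety is not an issue).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

class InfoRowSketch {
    // Column/value pairs for the one Info record; the result would be passed as the
    // third argument of tableManager.submit("Info", QueryType.insert, ...).
    static String[][] infoRow(String owner, Date version, String modSystem, List<String> speciesNames,
                              Date modify, String displayOrder, String notes) {
        // One formatter serves both date columns.
        SimpleDateFormat yyyymmdd = new SimpleDateFormat("yyyyMMdd");
        return new String[][] {
            { "Owner", owner },
            { "Version", yyyymmdd.format(version) },
            { "MODSystem", modSystem },
            { "Species", String.join("|", speciesNames) },  // single value for all species
            { "Modify", yyyymmdd.format(modify) },
            { "DisplayOrder", displayOrder },
            { "Notes", notes }
        };
    }
}
```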
> > > > > >
> > > > > >
> > > > > > The modification of this line centers on { "Species",
> > > > > speciesProfile.getSpeciesName() }, since it originally processed a
> > > > > single species.
> > > > > >
> > > > > > So now I need to populate the arguments with the species
> > > > > > contained within selectedDatabaseProfile.selectedSpeciesProfiles.
> > > > > >
> > > > > > I think I'll start with the baseArgument up to MODSystem, then
> > > > > > append as many species as necessary, and then cap off the end with
> > > > > > the rest starting at Modify (similar to your approach in
> > > > > > ExportGoData, populateUniprotGoTableFromSQL(char chosenAspect,
> > > > > > List<Integer> taxonIds), line 513).
> > > > > >
> > > > > > Please let me know if this approach or analysis is off track.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Richard
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 10, 2011 at 5:50 PM, Richard Brous <rbr...@gm...> wrote:
> > > > > > Updated repository to include all Gene Ontology changes discussed
> > > > > during our meeting yesterday.
> > > > > >
> > > > > > Digging into TableManager next.
> > > > > >
> > > > > > Richard
> > > > > >
> > > > > > On Fri, Aug 5, 2011 at 10:06 AM, Richard Brous <rbr...@gm...> wrote:
> > > > > > whew... thanks for the detailed reply. I will digest this a bit
> > > > > and get back to you with further questions.
> > > > > >
> > > > > > rb
> > > > > >
> > > > > > On Thu, Aug 4, 2011 at 11:18 PM, John David N. Dionisio <do...@lm...> wrote:
> > > > > > Greetings,
> > > > > >
> > > > > > Sorry for the delay. I wasn't able to walk through the relevant
> > > > > code until this evening.
> > > > > >
> > > > > > As Kam said, GOA serves as the link between the UniProt and GO
> > > > > IDs. It essentially determines which GO IDs get exported by using
> > > > > GOA to see which GO IDs are associated with an exported UniProt
> > > > > ID. The populateUniprotGoTableFromSQL method, in its current form,
> > > > > extracts the GO association records that match the given taxon ID
> > > > > then exports, as UniProt-GO pairs, the GO and UniProt IDs
> > > > > referenced within that GO association record. Processing that
> > > > > follows this is then based on the GO IDs that got exported --- and
> > > > > that's how the current code avoids exporting the entire list of GO
> terms.
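A minimal in-memory sketch of that GOA-driven trimming, assuming only three fields of a GOA record matter here; GoaRecord, goIdsToExport, and uniProtGoPairs are hypothetical names. Only GO IDs referenced by GOA records of the chosen taxa end up in the export set, which is why the full GO is never exported.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class GoaLinkSketch {
    // Minimal stand-in for a GOA row: only the fields used for trimming.
    static final class GoaRecord {
        final String uniProtId;
        final String goId;
        final int taxon;

        GoaRecord(String uniProtId, String goId, int taxon) {
            this.uniProtId = uniProtId;
            this.goId = goId;
            this.taxon = taxon;
        }
    }

    // UniProt-GO pairs exported for the chosen taxa.
    static List<String[]> uniProtGoPairs(List<GoaRecord> goa, Set<Integer> chosenTaxa) {
        List<String[]> pairs = new ArrayList<>();
        for (GoaRecord record : goa) {
            if (chosenTaxa.contains(record.taxon)) {
                pairs.add(new String[] { record.uniProtId, record.goId });
            }
        }
        return pairs;
    }

    // GO IDs referenced by those pairs; GO terms never associated with an
    // exported UniProt ID are left out.
    static Set<String> goIdsToExport(List<GoaRecord> goa, Set<Integer> chosenTaxa) {
        Set<String> goIds = new LinkedHashSet<>();
        for (String[] pair : uniProtGoPairs(goa, chosenTaxa)) {
            goIds.add(pair[1]);
        }
        return goIds;
    }
}
```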
> > > > > >
> > > > > > The operative query is on the second line of populateUniprotGoTableFromSQL:
> > > > > >
> > > > > >     String uniProtAndGOIDSQL = "select db_object_id, go_id, evidence_code, with_or_from "
> > > > > >             + "from goa where db like '%UniProt%' and taxon = 'taxon:" + taxon + "'";
> > > > > >
> > > > > > In plain English, this selects the GOA records whose database is
> > > > > UniProt and whose taxon ID is the given taxon. An additional
> > > > > condition is added for the "aspect" (All, Component, Function, or
> > > > > Process) that is to be exported. This is another reduction filter,
> > > > > to further shrink the number of exported GO terms and thus avoid
> > > > > MAPPFinder issues later on.
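A sketch of such an aspect filter, assuming the GOA table stores the aspect as a single letter (C, F, or P) in a column named aspect; that column name, the helper name, and treating an empty selection as "All" are all assumptions. Only the three known letters are ever written into the SQL text, and choosing none or all three adds no extra condition.

```java
import java.util.StringJoiner;

class AspectFilterSketch {
    // chosenAspects holds any of the letters C (Component), F (Function), P (Process).
    // Returns an extra predicate for the GOA query; the column name "aspect" is assumed.
    static String aspectPredicate(String chosenAspects) {
        StringJoiner values = new StringJoiner(", ", " and aspect in (", ")");
        int count = 0;
        for (char aspect : new char[] { 'C', 'F', 'P' }) {
            if (chosenAspects.indexOf(aspect) >= 0) {
                // Only these three known letters ever reach the SQL text.
                values.add("'" + aspect + "'");
                count++;
            }
        }
        // None or all three chosen: treated as "All", so no extra filter (an assumption).
        return (count == 0 || count == 3) ? "" : values.toString();
    }
}
```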
> > > > > >
> > > > > > Given this, the proper expansion here is to change the taxon
> > > > > predicate to a multiple predicate. That is, this method can be
> > > > > changed to now accept a collection or array of taxon IDs, and the
> > > > > base query should then be changed so that it accepts any taxon from
> > > > > that collection. More or less, you want:
> > > > > >
> > > > > >     private void populateUniprotGoTableFromSQL(char chosenAspect, int[] taxons) throws SQLException {
> > > > > >
> > > > > > ...then, instead of the single string, you want to iterate
> > > > > > through the taxon IDs:
> > > > > >
> > > > > >     StringBuilder baseQueryBuilder = new StringBuilder("select db_object_id, go_id, "
> > > > > >             + "evidence_code, with_or_from from goa where db like '%UniProt%'");
> > > > > >     boolean first = true;
> > > > > >     for (int taxon : taxons) {
> > > > > >         // Open the parenthesized group before the first taxon; OR in the rest.
> > > > > >         baseQueryBuilder.append(first ? " and (" : " or ");
> > > > > >         baseQueryBuilder.append("taxon = 'taxon:").append(taxon).append("'");
> > > > > >         first = false;
> > > > > >     }
> > > > > >     // Close the group only if it was opened (i.e., at least one taxon was given).
> > > > > >     if (!first) {
> > > > > >         baseQueryBuilder.append(")");
> > > > > >     }
> > > > > > ...and so on. I just sort of rattled this off so there may be
> > > > > little glitches, but anyway this is just to give you an overall
> idea.
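For comparison, the same taxon filter can be written as a SQL IN list instead of an OR chain; this is a swapped-in formulation, not the code in ExportGoData, and TaxonQuerySketch is a hypothetical name. Concatenating the IDs directly is only safe because they are ints, and an empty list is rejected because "taxon in ()" is not valid SQL.

```java
import java.util.List;
import java.util.StringJoiner;

class TaxonQuerySketch {
    static final String BASE_QUERY = "select db_object_id, go_id, evidence_code, with_or_from "
            + "from goa where db like '%UniProt%'";

    // Equivalent to the OR chain, written as: ... and taxon in ('taxon:1', 'taxon:2').
    static String uniProtGoQuery(List<Integer> taxonIds) {
        if (taxonIds.isEmpty()) {
            // An empty IN list is not valid SQL, so reject it up front.
            throw new IllegalArgumentException("at least one taxon ID is required");
        }
        StringJoiner taxa = new StringJoiner(", ", " and taxon in (", ")");
        for (int taxon : taxonIds) {
            // Safe to concatenate only because the IDs are ints.
            taxa.add("'taxon:" + taxon + "'");
        }
        return BASE_QUERY + taxa;
    }
}
```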
> > > > > >
> > > > > > Put another way, no, you do not need to iterate this method for
> > > > > each taxon ID. Instead, you can still call this method once, with
> > > > > the multiplicity of taxon IDs emerging in terms of the actual
> > > > > condition used for selecting the GO terms to be exported (based on
> > > > > the available GOA records, which as you may recall are loaded from
> .goa files).
> > > > > >
> > > > > > As a side note, right here you have an opportunity for a little
> > > > > sanity check regarding the content of the relational database: GO
> > > > > terms will only be exported if GOA records for the desired taxon
> > > > > IDs have been imported into the database. So, as a pre-flight
> > > > > check, one can see if there are any GOA records at all for each
> > > > > chosen taxon ID. If there are none, then the .goa file for that
> > > > > species needs to be imported into the relational database.
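A sketch of that pre-flight check, assuming per-taxon GOA record counts have already been fetched (for example with a group-by query over the goa table); taxaMissingGoa is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class GoaPreflightSketch {
    // goaCounts: number of GOA records per taxon ID, e.g. from a group-by query.
    // Returns the chosen taxa with no GOA records, whose .goa file still needs importing.
    static List<Integer> taxaMissingGoa(List<Integer> chosenTaxa, Map<Integer, Long> goaCounts) {
        List<Integer> missing = new ArrayList<>();
        for (Integer taxon : chosenTaxa) {
            // A taxon absent from the counts has no GOA records at all.
            if (goaCounts.getOrDefault(taxon, 0L) == 0L) {
                missing.add(taxon);
            }
        }
        return missing;
    }
}
```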
> > > > > >
> > > > > > Hope this helps...
> > > > > >
> > > > > > John David N. Dionisio, PhD
> > > > > > Associate Professor, Computer Science
> > > > > > Loyola Marymount University
> > > > > >
> > > > > >
> > > > > > On Aug 4, 2011, at 1:00 PM, Kam Dahlquist wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Dondi will have to chime in on this, but I think this is where
> > > > > things are going to get tricky.
> > > > > > >
> > > > > > > The final gdb does not actually contain the entire GO, it gets
> > > > > trimmed somehow based on the GO associations for a particular
> > > > > species. This is because MAPPFinder cannot handle loading the
> > > > > entire GO. Since there is some type of species-specific trimming
> > > > > going on, it's quite possible that this will need to iterate.
> > > > > > >
> > > > > > > However, I don't have the foggiest idea of how this works, so
> > > > > Dondi will have to chime in.
> > > > > > >
> > > > > > > Best,
> > > > > > > Kam
> > > > > > >
> > > > > > > At 12:09 AM 8/4/2011, you wrote:
> > > > > > >> Wednesday 8/3/11 progress:
> > > > > > >>
> > > > > > >> 1. After following the ExportPanel1.java ground zero code of:
> > > > > databaseProfile.setSelectedSpeciesProfile( selectedProfile );
> > > > > > >>
> > > > > > >> I found the method in DatabaseProfile.java plus a getter method:
> > > > > > >> SpeciesProfile setSelectedSpeciesProfile( speciesProfile ) and
> > > > > > >> SpeciesProfile getSelectedSpeciesProfile( speciesProfile )
> > > > > > >>
> > > > > > >> I created two new methods, setSelectedSpeciesProfiles and
> > > > > > >> getSelectedSpeciesProfiles, that each handle a List<Object> of
> > > > > > >> SpeciesProfiles instead of a single SpeciesProfile.
> > > > > > >>
> > > > > > >> This enabled the ExportPanel1 ground zero code to become:
> > > > > databaseProfile.setSelectedSpeciesProfiles(selectedSpecies);
> > > > > > >>
> > > > > > >> 2. public static void export() on line 104 in
> ExportToGenMAPP.java
> > > > > > >>
> > > > > > >> On line 107 ExportGoData is instantiated which I found in
> > > > > ExportGoData.java and calls a method: public void export(char
> > > > > chosenAspect, int taxon).
> > > > > > >>
> > > > > > >> Within export, taxon id is required for another method:
> > > > > private void populateGoTables(char chosenAspect, int taxon).
> > > > > > >>
> > > > > > >> Within populateGoTables, taxon id is required for another
> > > > > method: private void populateUniprotGoTableFromSQL( char
> > > > > chosenAspect, int taxon).
> > > > > > >>
> > > > > > >> But, if the export to GDB process starts off with exporting GO
> > > > > data, doesn't it only need to do that once no matter how many
> > > > > species are selected? As you probably realize, I'm leading towards
> > > > > not having to iterate through this for each taxon id if possible.
> > > > > > >>
> > > > > > >> Also, how does the export actually work? How are GO ids and
> > > > > UniProt ids related within the table?
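A sketch of the list-based setter and getter described in item 1, using a typed List<SpeciesProfile> rather than List<Object>. SpeciesProfile here is a placeholder class, DatabaseProfileSketch is a hypothetical name, and getFirstSelectedSpeciesProfile mirrors the interim "first species" behavior mentioned at the top of the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DatabaseProfileSketch {
    // Placeholder for the real SpeciesProfile type.
    static class SpeciesProfile {
        final String speciesName;

        SpeciesProfile(String speciesName) {
            this.speciesName = speciesName;
        }
    }

    private List<SpeciesProfile> selectedSpeciesProfiles = new ArrayList<>();

    // Defensive copy so later changes to the caller's list do not leak in.
    void setSelectedSpeciesProfiles(List<SpeciesProfile> profiles) {
        selectedSpeciesProfiles = new ArrayList<>(profiles);
    }

    // Read-only view for callers that iterate over every selected species.
    List<SpeciesProfile> getSelectedSpeciesProfiles() {
        return Collections.unmodifiableList(selectedSpeciesProfiles);
    }

    // Interim helper for code not yet converted to multiple species;
    // throws IndexOutOfBoundsException if nothing has been selected.
    SpeciesProfile getFirstSelectedSpeciesProfile() {
        return selectedSpeciesProfiles.get(0);
    }
}
```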
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >>
> > > > > > >> Richard
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > > _______________________________________________
> > > > > > xmlpipedb-developer mailing list
> > > > > > xml...@li...
> > > > > > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
> > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
>
>
>