Re: [XMLPipeDB-developer] System table subquery
From: Richard B. <rbr...@gm...> - 2011-10-01 22:36:14
Reread my last line and of course was reminded to proof any quickly typed, edited and rewritten sentences before hitting send lol

OLD AND CONFUSING SENTENCE:
"But that said I have my head wrapped around the solution and will go ahead and code it out and complete getRelationshipTableManager() completely transitioned."

NEW IMPROVED SENTENCE:
"But that said, I have my head wrapped around the solution and will go ahead and complete the changes to getRelationshipTableManager(). These last changes will complete the transition from single species to a multiple species aware method."

rb

On Sat, Oct 1, 2011 at 3:29 PM, Richard Brous <rbr...@gm...> wrote:

> Wow thanks for the detailed reply and anticipating questions that might follow!
>
> Glad my understanding of what was going on was solid even though my nested "else if for" code didn't actually make semantic sense. I realized it was doing as you described but wanted to keep it in that form as a vehicle to clearly relay my thinking. And given my level of maturity with object looping, I wasn't sure if there was a slick way to fool the if else loop into thinking the inner for loop was boolean.
>
> That leads to the transition of somehow combining my "else if for" conditional and the existing else code into a combined else:
>
> I didn't believe I had the option of dumping the existing else block since it was doing something already (which also may be referenced by other code) but I should have thought about combining them in the else block. I'm sitting here thinking: Duh, it should have occurred to me. =/
>
> But that said I have my head wrapped around the solution and will go ahead and code it out and complete getRelationshipTableManager() completely transitioned.
>
> Richard
>
>
> On Sat, Oct 1, 2011 at 1:14 PM, John David N. Dionisio <do...@lm...> wrote:
>
>> Hi Rich,
>>
>> First, let's phrase out in English what gets done here overall.
>> Then, still at the conceptual level, I'll talk about what needs to be done. Finally, we'll look at what needs to change in the code. In that part, we will also look at why you're getting syntax errors in your current working version (assuming that what you included is exactly the code that you currently have). At all points, be conscious of whether what I'm writing matches your current understanding, or not. If something clears up for you, then great; if not, let me know what isn't clear.
>>
>> 1. What is being done
>>
>> The method goes through the full list of relationship tables then, depending on those tables, chooses different algorithms for producing their data.
>>
>> The if conditions generally categorize these tables --- GeneOntology relationships are handled one way; UniProt relationships are handled another way; relationships that use other ID systems are handled yet another way. The condition on which you're stuck involves relationships involving an ID system that is *specific* to a particular species profile. In that situation, the code essentially defers all work to the specific species profile --- which makes sense, because, as a species-specific ID system, it is fair to assume that only the species profile knows how to handle its relationships to other ID systems.
>>
>> 2. What needs to be done, conceptually
>>
>> The change you are performing involves handling multiple species profiles. In other words, for every species profile that you are exporting, you need to give each of them an opportunity to handle the export to the currently chosen relationship table. That involves first checking if the current relationship table (e.g., "Blattner-GeneID") even applies to the species profile. If the relationship table does not involve species-specific ID systems, then that species profile effectively skips that relationship table.
>> Otherwise, it then builds up the ID pairs to be exported, via the getSpeciesSpecificRelationshipTable method.
>>
>> 3. What needs to be done, in code
>>
>> So, given this view, let's look at the code:
>>
>> > else if ((selectedSpeciesProfiles.get(0).getSpeciesSpecificSystemTables().containsKey(stp.systemTable1) ||
>> >     selectedSpeciesProfiles.get(0).getSpeciesSpecificSystemTables().containsKey(stp.systemTable2)) &&
>> >     !stp.systemTable2.equals("GeneOntology")) {
>>
>>
>> Note that, as written, this code checks *only one* species profile, and that is after all prior cases (GeneOntology, UniProt, two different ones) have already been checked. Your proposed change now is this:
>>
>> > else if(
>> >     for(SpeciesProfile species : selectedSpeciesProfiles) {
>> >         (species.getSpeciesSpecificSystemTables().containsKey(stp.systemTable1) ||
>> >         species.getSpeciesSpecificSystemTables().containsKey(stp.systemTable2)) &&
>> >         !stp.systemTable2.equals("GeneOntology") }
>> > ) {
>>
>>
>> You got the loop here, as described in part 2. You are indeed supposed to iterate through the selected species profiles, "ask" them if they need to do any exports with the current relationship table, then have them do so if they say "yes."
>>
>> Now, as to the problem with the actual code, look at how the process is phrased in English --- you *iterate first*, and *then* check if the relationship table matches. The source of your syntax error is that you are putting the for statement inside the if condition. If you step back for a moment, you'll see that this does not make semantic sense --- the if condition expects an expression that evaluates to a boolean value. The preface to a for statement is *not* such an expression --- in fact, it doesn't evaluate to anything. That is the heart of your syntax error.
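
On the earlier question of whether a for loop can be made to look boolean to an if: not inline, but the loop can live in a method that returns a boolean, and a method call is an expression. The sketch below is only an illustration of that idea. SpeciesCheckSketch, anySpeciesHandles, the stand-in SpeciesProfile class and the Map<String, String> value type are all assumptions, not the project's code. It also only answers yes or no, whereas the export itself still needs each matching profile, so it would not replace a per-species loop.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class SpeciesCheckSketch {

    // Hypothetical stand-in for the project's SpeciesProfile; the real class
    // has more to it, and the map's value type here is an assumption.
    static class SpeciesProfile {
        private final Map<String, String> speciesSpecificSystemTables;

        SpeciesProfile(Map<String, String> speciesSpecificSystemTables) {
            this.speciesSpecificSystemTables = speciesSpecificSystemTables;
        }

        Map<String, String> getSpeciesSpecificSystemTables() {
            return speciesSpecificSystemTables;
        }
    }

    // A method call is an expression, so this can sit inside an if condition
    // even though the for statement itself cannot.
    static boolean anySpeciesHandles(List<SpeciesProfile> profiles,
                                     String systemTable1, String systemTable2) {
        if ("GeneOntology".equals(systemTable2)) {
            return false;
        }
        for (SpeciesProfile species : profiles) {
            Map<String, String> tables = species.getSpeciesSpecificSystemTables();
            if (tables.containsKey(systemTable1) || tables.containsKey(systemTable2)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One made-up profile that knows about the "Blattner" ID system.
        List<SpeciesProfile> profiles = Arrays.asList(
                new SpeciesProfile(Collections.singletonMap("Blattner", "placeholder")));

        if (anySpeciesHandles(profiles, "Blattner", "GeneID")) {
            System.out.println("At least one species profile handles Blattner-GeneID");
        }
        if (!anySpeciesHandles(profiles, "Pfam", "GeneOntology")) {
            System.out.println("No species profile handles Pfam-GeneOntology");
        }
    }
}
```
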
>>
>> So, in reality, this last clause simply has to become a pure "else," since the condition has to be applied to *every single species profile*. The for loop can then take place within, and the if check happens *inside that*, since it has to happen for each selected species profile:
>>
>>     else {
>>         for (SpeciesProfile species: selectedSpeciesProfiles) {
>>             if ((species.getSpeciesSpecificSystemTables().containsKey(stp.systemTable1) ||
>>                     species.getSpeciesSpecificSystemTables().containsKey(stp.systemTable2)) &&
>>                     !"GeneOntology".equals(stp.systemTable2)) {
>>
>>                 // Have the species profile do the relationship table export.
>>
>>             }
>>         }
>>     }
>>
>> That's the overall structure. Now, some housecleaning: there is already a "pure else" clause at the bottom. However, if you look at that code, it effectively does nothing: it just emits a single record with blanks for its fields. I think you can safely skip that. Or, if you really do want to make sure that an inapplicable relationship table does have at least this dummy record, you can have a boolean in the code given above that indicates whether or not the relationship table was handled:
>>
>>     else {
>>         boolean relationshipTableWasHandled = false;
>>         /* Same for loop and if statement. */ {
>>             // This would be in the case that a species profile does perform an export.
>>             relationshipTableWasHandled = true;
>>         }
>>
>>         if (!relationshipTableWasHandled) {
>>             // Do the single-blank-row export here.
>>         }
>>     }
>>
>> So, that's the run of it. Hope this walkthrough and breakdown clears up the issues.
>>
>> John David N. Dionisio, PhD
>> Associate Professor, Computer Science
>> Associate Director, University Honors Program
>> Loyola Marymount University
>>
>>
>> On Oct 1, 2011, at 11:34 AM, Richard Brous wrote:
>>
>> > I'm hung up on the last else if conditional within getRelationshipTableManager()
>> >
>> > This is the "Species-X or X-Species" conditional, excluding GeneOntology
>> >
>> > The original single species conditional is:
>> >
>> > else if ((selectedSpeciesProfiles.get(0).getSpeciesSpecificSystemTables().containsKey(stp.systemTable1) ||
>> >     selectedSpeciesProfiles.get(0).getSpeciesSpecificSystemTables().containsKey(stp.systemTable2)) &&
>> >     !stp.systemTable2.equals("GeneOntology")) {
>> >
>> > Now it needs to be multispecies aware obviously so...
>> >
>> > Can I add a for loop within the else if and change the following tablemanager creation logic, such as:
>> >
>> > else if(
>> >     for(SpeciesProfile species : selectedSpeciesProfiles) {
>> >         (species.getSpeciesSpecificSystemTables().containsKey(stp.systemTable1) ||
>> >         species.getSpeciesSpecificSystemTables().containsKey(stp.systemTable2)) &&
>> >         !stp.systemTable2.equals("GeneOntology") }
>> > ) {
>> >
>> >
>> > adding the for loop is fraught with syntax errors, but the real question is ... am I missing a much simpler solution to solve this?
>> >
>> > Richard
>> >
>> >
>> > On Sun, Sep 25, 2011 at 9:20 PM, Richard Brous <rbr...@gm...> wrote:
>> > Worked on the code during the past few days and have committed some changes tonight.
>> > Also successfully exported without error.
>> >
>> > UniProtDatabaseProfile.java
>> >
>> >   • clean up of some comments and old code
>> >   • cleaned up logging code for getSystemTableManager()
>> >   • Worked on getRelationshipTableManager()
>> >       • [Uniprot - X conditional]
>> >           • rewrote SQL programmatically
>> >           • used looping to create correct setStrings
>> >           • added logging to surface details
>> >       • [X - X conditional]
>> >           • rewrite of programmatic SQL is in progress (created stringbuilder etc. in preparation)
>> >           • added logging
>> >       • [Species - X or X - Species]
>> >           • added minimal logging to be expanded upon
>> >
>> > DatabaseProfile.java
>> >
>> >   • cleaned up comments
>> >   • reviewed getRelationsTableManager() and added some logging to it
>> >
>> > ExportToGenMAPP.java
>> >
>> >   • cleaned up code and comments for readability
>> >
>> >
>> > Appreciate any feedback as usual =D
>> >
>> > Richard
>> >
>> >
>> > On Mon, Sep 19, 2011 at 6:22 PM, John David N. Dionisio <do...@lm...> wrote:
>> > Very cool; looks like we can move on now. Dr. Dahlquist and I suspect that the relationship tables may not actually be as hard as they seem; just a matter of tweaking the initial queries (which you're more comfortable doing now!) so that they return the corresponding records for all of the requested species. Onward we go :)
>> >
>> > John David N. Dionisio, PhD
>> > Associate Professor, Computer Science
>> > Associate Director, University Honors Program
>> > Loyola Marymount University
>> >
>> >
>> >
>> > On Sep 19, 2011, at 11:09 AM, Kam Dahlquist wrote:
>> >
>> > > Hi,
>> > >
>> > > LMU can accept 20 MB attachments now; I don't know how big your file is zipped, but that's an option. You could also use LionShare.
>> > >
>> > > Glad to see success!
>> > >
>> > > Kam
>> > >
>> > > At 06:05 PM 9/18/2011, you wrote:
>> > >> Tried to post my latest gdb export to the biodb wiki but receiving log in error (per separate email)
>> > >>
>> > >> But, I wanted to let you know I verified that each improper system table does in fact contain the id and species name (samples of each):
>> > >>
>> > >>
>> > >> Pfam
>> > >> ID Species Date
>> > >> PF02866 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> PF07050 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> PF03479 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> PF02317 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> PF03279 |Pseudomonas aeruginosa| 9/14/2011
>> > >> PF09922 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> PF04205 |Pseudomonas aeruginosa| 9/14/2011
>> > >> PF03379 |Pseudomonas aeruginosa| 9/14/2011
>> > >> PF05389 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >>
>> > >>
>> > >> RefSeq
>> > >> ID Species Date
>> > >> YP_039973 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> YP_040411 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> YP_039521 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> NP_253798 |Pseudomonas aeruginosa| 9/14/2011
>> > >> NP_254101 |Pseudomonas aeruginosa| 9/14/2011
>> > >> NP_248930 |Pseudomonas aeruginosa| 9/14/2011
>> > >> NP_251503 |Pseudomonas aeruginosa| 9/14/2011
>> > >> NP_249960 |Pseudomonas aeruginosa| 9/14/2011
>> > >> NP_253220 |Pseudomonas aeruginosa| 9/14/2011
>> > >> YP_040877 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> NP_253802 |Pseudomonas aeruginosa| 9/14/2011
>> > >>
>> > >>
>> > >> GeneId
>> > >> ID Species Date
>> > >> 879140 |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2859243 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> 879337 |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2860668 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> 881899 |Pseudomonas aeruginosa| 9/14/2011
>> > >> 882312 |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2860009 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> 881520 |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2861152 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> 881596 |Pseudomonas aeruginosa| 9/14/2011
>> > >>
>> > >>
>> > >> InterPro
>> > >> ID Species Date
>> > >> IPR022522 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR003538 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR016379 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR016920 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR000477 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR008948 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR005415 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR009651 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> IPR007895 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR008231 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR004558 |Pseudomonas aeruginosa| 9/14/2011
>> > >> IPR011067 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> IPR006358 |Pseudomonas aeruginosa| 9/14/2011
>> > >>
>> > >>
>> > >> PDB
>> > >> ID Species Date
>> > >> 2ZWS |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2F9I |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> 2EXV |Pseudomonas aeruginosa| 9/14/2011
>> > >> 3L34 |Pseudomonas aeruginosa| 9/14/2011
>> > >> 1EZM |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2WYB |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2F1L |Pseudomonas aeruginosa| 9/14/2011
>> > >> 1D7L |Pseudomonas aeruginosa| 9/14/2011
>> > >> 1Y12 |Pseudomonas aeruginosa| 9/14/2011
>> > >> 2IXH |Pseudomonas aeruginosa| 9/14/2011
>> > >> 1XAG |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> 2IXI |Pseudomonas aeruginosa| 9/14/2011
>> > >>
>> > >>
>> > >> EMBL
>> > >> ID Species Date
>> > >> AJ003006 |Pseudomonas aeruginosa| 9/14/2011
>> > >> BX571856 |Staphylococcus aureus (strain MRSA252)| 9/14/2011
>> > >> AJ633619 |Pseudomonas aeruginosa| 9/14/2011
>> > >> AJ633602 |Pseudomonas aeruginosa| 9/14/2011
>> > >> U07359 |Pseudomonas aeruginosa| 9/14/2011
>> > >> X54201 |Pseudomonas aeruginosa| 9/14/2011
>> > >> AY899300 |Pseudomonas aeruginosa| 9/14/2011
>> > >> AB085582 |Pseudomonas aeruginosa| 9/14/2011
>> > >> AB075926 |Pseudomonas aeruginosa| 9/14/2011
>> > >> AF306766 |Pseudomonas aeruginosa| 9/14/2011
>> > >> X99471 |Pseudomonas aeruginosa| 9/14/2011
>> > >> M21093 |Pseudomonas aeruginosa| 9/14/2011
>> > >>
>> > >>
>> > >> Richard
>> > >>
>> > >> On Wed, Sep 14, 2011 at 6:50 PM, Richard Brous <rbr...@gm...> wrote:
>> > >> OK, going to commit my changes to UniProtDatabaseProfile: getSystemTableManager()
>> > >>
>> > >> Here is some logging info to confirm result.next() within the while loop is both id and species name:
>> > >>
>> > >> 3070712 [Thread-4] INFO edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles.UniProtDatabaseProfile - getSystemTableManager(): while loop: ID:: IPR006314 Species:: Staphylococcus aureus (strain MRSA252)
>> > >> 3070712 [Thread-4] INFO edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles.UniProtDatabaseProfile - getSystemTableManager(): while loop: ID:: IPR013840 Species:: Staphylococcus aureus (strain MRSA252)
>> > >> 3070728 [Thread-4] INFO edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles.UniProtDatabaseProfile - getSystemTableManager(): while loop: ID:: IPR015887 Species:: Pseudomonas aeruginosa
>> > >> 3070728 [Thread-4] INFO edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles.UniProtDatabaseProfile - getSystemTableManager(): while loop: ID:: IPR016148 Species:: Pseudomonas aeruginosa
>> > >>
>> > >> Richard
>> > >>
>> > >> On Wed, Sep 14, 2011 at 3:20 PM, Richard Brous <rbr...@gm...> wrote:
>> > >> Solid progress made and I now have a compilable working copy.
>> > >>
>> > >> I'm now working through how to plug the results of the query into the proper slots under the while loop. I had stumbled on an error where I was supplying the column name instead of the actual data from the tuple.
>> > >> Solved that and am hoping the current export will be the last before I commit to sourceforge.
>> > >>
>> > >> Richard
>> > >>
>> > >>
>> > >> On Mon, Sep 12, 2011 at 10:59 PM, Richard Brous <rbr...@gm...> wrote:
>> > >> Spent the weekend reviewing sql and I have achieved some clarity.
>> > >>
>> > >> I'm still working through things but now at least I have better context and can ask intelligent questions.
>> > >>
>> > >> The sub query was a big help, I'm not sure how long it would have taken me to do all the joins using ON to return "hjid | species name"
>> > >>
>> > >> Dondi - I'll stop by after theory tomorrow to discuss further.
>> > >>
>> > >> Thanks.
>> > >>
>> > >> Richard
>> > >>
>> > >>
>> > >> On Thu, Sep 8, 2011 at 5:24 PM, John David N. Dionisio <do...@lm...> wrote:
>> > >> Hi Rich,
>> > >>
>> > >> As discussed in our meeting, here is a first step toward the new system table query:
>> > >>
>> > >>     SELECT entrytype.hjid, organismnametype.value
>> > >>     FROM entrytype
>> > >>     INNER JOIN organismtype ON (entrytype.organism = organismtype.hjid)
>> > >>     INNER JOIN organismnametype ON (organismtype.hjid = organismnametype.organismtype_name_hjid)
>> > >>     INNER JOIN dbreferencetype ON (dbreferencetype.organismtype_dbreference_hjid = organismtype.hjid)
>> > >>     WHERE dbreferencetype.type = 'NCBI Taxonomy' AND (id = '90371');
>> > >>
>> > >> (substitute the "id = " clause accordingly)
>> > >>
>> > >> John David N. Dionisio, PhD
>> > >> Associate Professor, Computer Science
>> > >> Associate Director, University Honors Program
>> > >> Loyola Marymount University
>> > >>
>> > >> _______________________________________________
>> > >> xmlpipedb-developer mailing list
>> > >> xml...@li...
>> > >> https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
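
For an export that covers more than one species, the "id = " clause of the Sep 8 query has to match several NCBI Taxonomy IDs. A minimal sketch of one way to build that clause by looping, assuming one "?" placeholder per requested taxon ID; the class name TaxonQuerySketch and the method buildSystemTableQuery are hypothetical and not taken from the project. The joins and column names are copied from the query in the thread.

```java
class TaxonQuerySketch {

    // Builds the system table query with one "id = ?" placeholder per taxon ID,
    // joined with OR. The caller is expected to bind the actual IDs afterwards.
    static String buildSystemTableQuery(int taxonCount) {
        if (taxonCount < 1) {
            throw new IllegalArgumentException("At least one taxon ID is required");
        }

        StringBuilder sql = new StringBuilder(
                "SELECT entrytype.hjid, organismnametype.value FROM entrytype "
                + "INNER JOIN organismtype ON (entrytype.organism = organismtype.hjid) "
                + "INNER JOIN organismnametype ON (organismtype.hjid = organismnametype.organismtype_name_hjid) "
                + "INNER JOIN dbreferencetype ON (dbreferencetype.organismtype_dbreference_hjid = organismtype.hjid) "
                + "WHERE dbreferencetype.type = 'NCBI Taxonomy' AND (");

        for (int i = 0; i < taxonCount; i++) {
            if (i > 0) {
                sql.append(" OR ");
            }
            sql.append("id = ?");
        }
        sql.append(")");

        return sql.toString();
    }

    public static void main(String[] args) {
        // Two species selected for export, so two placeholders are generated.
        System.out.println(buildSystemTableQuery(2));
    }
}
```

With a java.sql.PreparedStatement created from this string, each ID would then be bound in a second loop with setString(i + 1, taxonId), since JDBC parameter indexes start at 1. Placeholders are used here instead of concatenating the IDs into the SQL text so that the values never need quoting by hand.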