Re: [XMLPipeDB-developer] 499 - PROBLEM - M tuberculosis xml tag importation
From: Richard B. <rbr...@gm...> - 2011-02-22 06:01:09
hmm, not taking parentheses where I thought they should go... syntax error:

select count (*) from genenametype where type = ('ordered locus' or 'ORF') and value like 'Rv%';

also tried:

select count (*) from genenametype where (type = 'ordered locus' or type = 'ORF') and value like 'Rv%';

On Mon, Feb 21, 2011 at 9:40 PM, Richard Brous <rbr...@gm...> wrote:

> ah yes... i see it...
>
> On Mon, Feb 21, 2011 at 9:33 PM, John David N. Dionisio <do...@lm...> wrote:
>
>> Watch your parentheses: "and" has greater precedence than "or" :)
>>
>> John David N. Dionisio, PhD
>> Associate Professor, Computer Science
>> Loyola Marymount University
>>
>> On Feb 21, 2011, at 7:59 PM, Richard Brous <rbr...@gm...> wrote:
>>
>> OK, so here are my query results from raw SQL:
>>
>> 1. using: like 'Rv%'
>>
>> select count (*) from genenametype where type = 'ordered locus' and value like 'Rv%';
>> returns 3988
>>
>> select count (*) from genenametype where type = 'ORF' and value like 'Rv%';
>> returns 70
>>
>> select count (*) from genenametype where type = 'ordered locus' or type = 'ORF' and value like 'Rv%';
>> returns 7011
>>
>> 2. regular expression: value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*'
>>
>> select count (*) from genenametype where type = 'ordered locus' and value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>> returns 3988
>>
>> select count (*) from genenametype where type = 'ordered locus' or type = 'ORF' and value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>> returns 7011
>>
>> select count (*) from genenametype where type = 'ORF' and value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>> returns 70
>>
>> Conclusions:
>>
>> 1. It seems that querying for type = 'ORF' alone surfaces the 69 genes we were looking for plus one more (maybe the count for missing genes is off by 1?).
>>
>> 2. Combining the two types in a single query did not produce the results that I expected (7011? - how did that happen????) so this is likely not our solution...
>> unless of course the query syntax isn't actually doing what I think it is...
>>
>> 3. I would think the best course of action is to serially run two separate queries to capture all the required genes, then remove the one unneeded gene if it's truly not wanted.
>>
>> What do you think?
>>
>> Richard
>>
>> On Mon, Feb 21, 2011 at 5:17 PM, John David N. Dionisio <do...@lm...> wrote:
>>
>>> I don't recall the exact details of the missing 69, but if your query successfully returns them in raw SQL, then this is worth a try. You can integrate into the same query as long as the same columns are returned, which is the case here AFAIK, so go ahead and extend the existing query.
>>>
>>> John David N. Dionisio, PhD
>>> Associate Professor, Computer Science
>>> Loyola Marymount University
>>>
>>> On Feb 21, 2011, at 6:56 PM, Richard Brous <rbr...@gm...> wrote:
>>>
>>> So here is the appropriate code snippet from MycobacteriumTuberculosisUniProtSpeciesProfile.java:
>>>
>>> public TableManager getSystemTableManagerCustomizations(TableManager tableManager, TableManager primarySystemTableManager, Date version) throws SQLException, InvalidParameterException {
>>>     // Build the base query; we only use "ordered locus" and we only want
>>>     // IDs that begin with "Rv."
>>>     PreparedStatement ps = ConnectionManager.getRelationalDBConnection().prepareStatement(
>>>         "SELECT value, type " +
>>>         "FROM genenametype INNER JOIN entrytype_genetype " +
>>>         "ON (entrytype_genetype_name_hjid = entrytype_genetype.hjid) " +
>>>         "WHERE type = 'ordered locus' and value like 'Rv%' and entrytype_gene_hjid = ?");
>>>     ResultSet result;
>>>     for (Row row : primarySystemTableManager.getRows()) {
>>>         ps.setInt(1, Integer.parseInt(row.getValue("UID")));
>>>         result = ps.executeQuery();
>>>
>>>         // We actually want to keep the case where multiple ordered locus
>>>         // names appear.
>>>         while (result.next()) {
>>>             // We want this name to appear in the OrderedLocusNames
>>>             // system table.
>>>             for (String id : result.getString("value").split("/")) {
>>>                 tableManager.submit("OrderedLocusNames", QueryType.insert, new String[][] { { "ID", id }, { "Species", "|" + getSpeciesName() + "|" }, { "\"Date\"", GenMAPPBuilderUtilities.getSystemsDateString(version) }, { "UID", row.getValue("UID") } });
>>>             }
>>>         }
>>>     }
>>>
>>> ------------------------------------------------------------------------------
>>> So now we want to build the base query which uses "ordered locus" and "orf" and we only want IDs that begin with "Rv".
>>>
>>> I know there are more comprehensive ways to search for gene ID's by matching gene ID prefix but "like Rv%" seemed to work thus far, we just need to tell it to search for XML tag type orf in addition to ordered locus.
>>>
>>> "WHERE type = 'ordered locus' and type = 'orf' and value like 'Rv%' and entrytype_gene_hjid = ? "
>>>
>>> Here is a stab at it.... This part of our class was right as the server went down and my submission for week 6 assignment I can't seem to find.
>>>
>>> Is it possible to have two different types in the same query or should we rewrite a separate query for the orf tag?
>>>
>>> Richard
>>>
>>> On Sun, Feb 20, 2011 at 10:21 PM, Richard Brous <rbr...@gm...> wrote:
>>>
>>>> thanks and will do as directed.
>>>>
>>>> My previous, last paragraph comment - A way for programming code in email holding its format in a mail message similarly to how you can post code on forum pages?
>>>>
>>>> <code>
>>>> blah
>>>> blah
>>>> blah
>>>> </code>
>>>>
>>>> thanks!
>>>>
>>>> Richard
>>>>
>>>> On Sun, Feb 20, 2011 at 10:05 PM, John David N.
>>>> Dionisio <do...@lm...> wrote:
>>>>
>>>>> Greetings,
>>>>>
>>>>> Actually, gmbuilder.properties is for the TallyEngine only. When dealing with .gdb exports, look *only* at the SpeciesProfile class. So, to find those 69 IDs, it is the SpeciesProfile code, and *only* the SpeciesProfile code, that needs to be changed.
>>>>>
>>>>> Your take on how gmbuilder.properties is used, however, is understandable. It makes sense to assume that the TallyEngine code *and* the ID export code are based on the same characterization of the needed IDs. This replication is originally a historical artifact: SpeciesProfile was done first, and then TallyEngine was done later by another student.
>>>>>
>>>>> However, there are other factors beyond history that sort of necessitate this duplication of desired IDs: (skip the two bullets below if you'd rather cut to the chase of the work to be done, and discuss design issues later)
>>>>>
>>>>> - The actual XML import code is a black box: this is the "canned" JAXB library actually in action, and not our code at all. Plus, the XML import code really does not filter (nor should it), since the goal of the XML->relational database step is to fully capture the XML data in the relational database. So, XML count is necessarily separated from XML import.
>>>>>
>>>>> - The notion of a declarative mechanism for extracting IDs from the relational database (which is what gmbuilder.properties/TallyEngine uses) is interesting, but at the same time there is value in the arbitrary computation that can be done with Java (case in point: export two versions of an ID, with and without periods). This is not to say that it is impossible to do this declaratively, but let's just say that the procedural approach exists here and now, and a declarative approach will need more thought.
>>>>>
>>>>> These, and other factors, are good thoughts to hold onto and would be worthy of a good meeting discussion sometime, but bottom line for now: modifying the export behavior is a matter of editing the *SpeciesProfile* Java code, and not the gmbuilder.properties file. Turn your attention to that code.
>>>>>
>>>>> Now, as to annotating your code...I'd just put in code comments :) Or did you mean something else by tagging code in e-mail?
>>>>>
>>>>> John David N. Dionisio, PhD
>>>>> Associate Professor, Computer Science
>>>>> Loyola Marymount University
>>>>>
>>>>> On Feb 21, 2011, at 12:38 AM, Richard Brous wrote:
>>>>>
>>>>> > also, how do I tag code in email so it holds its formatting? I tried a few suggestions I found on the web but they aren't holding formatting or i'm just doing it wrong ;-D
>>>>> >
>>>>> > Richard
>>>>> >
>>>>> > On Sun, Feb 20, 2011 at 9:35 PM, Richard Brous <rbr...@gm...> wrote:
>>>>> > OK, have some updates and some suggestions:
>>>>> >
>>>>> > On Friday Dr. Dahlquist and I sat down and reviewed the gene testing report. We verified that XML match does indeed find 4066 unique matches - 75 of which are not in the gdb and need to be.
>>>>> >
>>>>> > Dr. Dahlquist informed me that she was the one who completed the gene db testing report, not a previous student of BIO367 and had already verified which genes were missing and where they were to be found. I had (mistakenly) assumed that since a student had performed the gene database testing I had to redo all of the verification.
>>>>> >
>>>>> > So that said, of the 75 genes missing - 69 need to be included and 6 excluded.
>>>>> > Per the gene db testing report: "69 of them have an "a", "b", or "d" suffix. They are all found in the ORF tag and need to be included in the gdb."
>>>>> >
>>>>> > To solve this we need to add additional search criteria into the M.
>>>>> > tuberculosis section in gmbuilder.properties below:
>>>>> >
>>>>> > # Mycobacterium tuberculosis
>>>>> > mycobacteriumtuberculosis_level_amount=1
>>>>> > mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> > mycobacteriumtuberculosis_query_level0=select count(*) from genenametype where type = 'ordered locus' and value like 'Rv%';
>>>>> > mycobacteriumtuberculosis_table_name_level0=Ordered Locus
>>>>> >
>>>>> > SOLUTIONS:
>>>>> >
>>>>> > 1. So am I correct in my understanding that the second line is the query used by TallyEngine to read the XML file? If so then this is the issue we need to table for the moment until we get the gdb verified and re-released. We will revisit this to discover why it is not only reporting incorrectly but also why it's added a second row of Ordered Locus on the TallyEngine results page.
>>>>> >
>>>>> > 2. The third line is the SQL query used by postgres during the export from XML to gdb.
>>>>> > To find and get the ORF tagged genes could we not add the following lines and change the count in the first line:
>>>>> >
>>>>> > # Mycobacterium tuberculosis
>>>>> > mycobacteriumtuberculosis_level_amount=2
>>>>> >
>>>>> > mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> > mycobacteriumtuberculosis_element_level1=uniprot/entry/gene/name&type&orf
>>>>> >
>>>>> > mycobacteriumtuberculosis_query_level0=select count(*) from genenametype where type = 'ordered locus';
>>>>> > mycobacteriumtuberculosis_query_level1=select count(*) from genenametype where type = 'orf';
>>>>> >
>>>>> > mycobacteriumtuberculosis_table_name_level0=Ordered Locus
>>>>> > mycobacteriumtuberculosis_table_name_level1=Ordered Locus
>>>>> >
>>>>> > ------------------------------------------------------------------------------
>>>>> >
>>>>> > Of course these queries would have to be manually verified prior to making these changes but this seems like we are moving in the right direction.
>>>>> >
>>>>> > Richard
>>>>> >
>>>>> > On Thu, Feb 17, 2011 at 7:47 PM, Richard Brous <rbr...@gm...> wrote:
>>>>> > Just got done reading previous email and understand the change in priority.
>>>>> >
>>>>> > Will work on the missing ID's for now and shelve the TallyEngine issue for the moment.
>>>>> >
>>>>> > Also great about a more formalized weekly meeting. I was going to suggest it myself as it has been slow going so far as maybe i'm a bit too independent in this independent study class =D
>>>>> >
>>>>> > Will dig further into the missing ID's later tonight and during day tomorrow and report back.
>>>>> >
>>>>> > Richard
>>>>> >
>>>>> > On Thu, Feb 17, 2011 at 4:34 PM, John David N.
>>>>> > Dionisio <do...@lm...> wrote:
>>>>> > Hi Rich,
>>>>> >
>>>>> > No problem. The pertinent line you're referring to, for XML, is this, right above the line you copied:
>>>>> >
>>>>> > mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> >
>>>>> > The slash-separated section is the "path" of XML tags leading to the element of interest; then, after the ampersand, is a name/value pair for the desired attribute to count. Note that there is no hint of a *content*-based filter (nor is there the capability for one, as far as I can tell in the code). By "content," I mean that we can't specify filters based on what's *between* the tags. We can only go as far as filter by attribute value, e.g., type="ordered locus".
>>>>> >
>>>>> > But anyway, as mentioned in the earlier e-mail, let's have the missing IDs in the .gdb take precedence for now. Please take a look at the tuberculosis, A. thaliana, and P. falciparum profiles to get an idea for how the ID output can be customized, then let me know if you have any questions or need to confirm anything.
>>>>> >
>>>>> > John David N. Dionisio, PhD
>>>>> > Associate Professor, Computer Science
>>>>> > Loyola Marymount University
>>>>> >
>>>>> > On Feb 17, 2011, at 3:04 PM, Richard Brous wrote:
>>>>> >
>>>>> > > Sorry been slammed with a programming assignment that kept needing continued iteration and it has been all consuming until last night. But I did get a chance to work with your comments and review the code again with a different mind set.
>>>>> > >
>>>>> > > Yes, I examined the gmbuilder.properties file ( the query is also in the MycobacteriumTuberculosisUniProtSpeciesProfile which I mentioned in a previous email ) but I don't think I see what you mean regarding the XML count.
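The element_level0 format described in this part of the thread (a slash-separated path of XML tags, then an ampersand-separated attribute name and value) can be sketched in Java as below. This is only an illustration: the ElementSpec class and its parse method are hypothetical names, not part of TallyEngine or XMLPipeDB, and the real parsing code may work differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one *_element_levelN value; not XMLPipeDB code. */
final class ElementSpec {
    final List<String> path;      // XML tags leading to the element of interest
    final String attributeName;   // attribute to test, e.g. "type"
    final String attributeValue;  // required attribute value, e.g. "ordered locus"

    private ElementSpec(List<String> path, String attributeName, String attributeValue) {
        this.path = path;
        this.attributeName = attributeName;
        this.attributeValue = attributeValue;
    }

    /** Splits "a/b/c&name&value" into tag path, attribute name, and attribute value. */
    static ElementSpec parse(String spec) {
        String[] parts = spec.split("&", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected path&name&value but got: " + spec);
        }
        List<String> path = Collections.unmodifiableList(Arrays.asList(parts[0].split("/")));
        return new ElementSpec(path, parts[1], parts[2]);
    }

    public static void main(String[] args) {
        ElementSpec spec = parse("uniprot/entry/gene/name&type&ordered locus");
        System.out.println(spec.path + " where @" + spec.attributeName
                + " = \"" + spec.attributeValue + "\"");
    }
}
```

Note that nothing in this format carries a filter on the text between the tags, which matches the point above that the XML count can only filter by attribute value.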
>>>>> > >
>>>>> > > I understood that: mycobacteriumtuberculosis_query_level0=select count(*) from genenametype where type = 'ordered locus' and value like 'Rv%'; was the db query but don't see which is the XML count... or do they share the same query and you are saying that XML count doesn't recognize and therefore cannot use the 'Rv%' parameter?
>>>>> > >
>>>>> > > Richard
>>>>> > >
>>>>> > > On Sat, Feb 12, 2011 at 11:46 PM, John David N. Dionisio <do...@lm...> wrote:
>>>>> > > Hi Rich,
>>>>> > >
>>>>> > > Sorry for the delay. Had some distractions coming into the weekend.
>>>>> > >
>>>>> > > You've looked at the code; have you looked at gmbuilder.properties? (I may have mentioned it a few e-mails ago, just as you were starting to dig into this)
>>>>> > >
>>>>> > > On the copy I have, the M. tuberculosis block looks like this (indentation is mine to set it apart):
>>>>> > >
>>>>> > >     # Mycobacterium tuberculosis
>>>>> > >     mycobacteriumtuberculosis_level_amount=1
>>>>> > >     mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> > >     mycobacteriumtuberculosis_query_level0=select count(*) from genenametype where type = 'ordered locus' and value like 'Rv%';
>>>>> > >     mycobacteriumtuberculosis_table_name_level0=Ordered Locus
>>>>> > >
>>>>> > > There, I think, is the rub. Notice that the XML count does not filter on RV%. The SQL query does.
>>>>> > >
>>>>> > > Unfortunately, I don't think the TallyEngine can include selective filtering in the XML counts. If the need to do selective filtering on XML is necessary, then I think we're looking at a new functionality for you to implement (or, if this throws things off too much, this may have to be noted somewhere, that the XML vs.
>>>>> > > database counts may be off because the database count is doing some text-based filtering but the XML count does not).
>>>>> > >
>>>>> > > What does xmlpipedb-match say? That will at least tell you whether the 'RV%' count is indeed correct.
>>>>> > >
>>>>> > > John David N. Dionisio, PhD
>>>>> > > Associate Professor, Computer Science
>>>>> > > Loyola Marymount University
>>>>> > >
>>>>> > > On Feb 11, 2011, at 4:52 PM, Richard Brous wrote:
>>>>> > >
>>>>> > > > OK here is what I was able to put together from the past few hours of code review:
>>>>> > > >
>>>>> > > > MycobacteriumTuberculosisUniProtSpeciesProfile.java:
>>>>> > > > -reveals that after the 2 System table modifications are made adding species name and link, a PreparedStatement is instantiated which builds and calls the base query.
>>>>> > > >
>>>>> > > > -The base query called is: ("SELECT value, type " + "FROM genenametype INNER JOIN entrytype_genetype " + "ON(entrytype_genetype_name_hjid = entrytype_genetype.hjid) " + "WHERE type = 'ordered locus' and value like 'Rv%' and entrytype_gene_hjid = ?")
>>>>> > > >
>>>>> > > > -So it's looking in the 'ordered locus' table/column for any tuple that starts with Rv (followed by any substring) and entrytype_gene_hjid = ?.
>>>>> > > > The 'like' comparator and % usage are clear with the 'type' entrytype_gene_hjid = ?
>>>>> > > >
>>>>> > > > -To me it seems the query makes sense so the problem is likely elsewhere.
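As a rough illustration of what the `like 'Rv%'` filter in the base query accepts, and how it differs from the regular expression `[Rr][Vv][0-9][0-9][0-9][0-9]*` that appears elsewhere in this thread, here is a minimal Java sketch. It is an approximation only: the class and method names are hypothetical, `startsWith` stands in for a case-sensitive LIKE prefix match, and `Matcher.find()` stands in for PostgreSQL's unanchored `~` operator.

```java
import java.util.regex.Pattern;

/** Hypothetical helper comparing the two ID filters from the thread; not XMLPipeDB code. */
final class RvFilterDemo {
    // Same character classes as the regular expression quoted in the thread.
    private static final Pattern RV_REGEX =
            Pattern.compile("[Rr][Vv][0-9][0-9][0-9][0-9]*");

    /** Approximates SQL: value LIKE 'Rv%' (case-sensitive, anchored at the start). */
    static boolean likeRvPrefix(String value) {
        return value.startsWith("Rv");
    }

    /** Approximates PostgreSQL: value ~ '...' (matches anywhere in the string). */
    static boolean regexRv(String value) {
        return RV_REGEX.matcher(value).find();
    }

    public static void main(String[] args) {
        // Made-up sample values, chosen only to show where the two filters disagree.
        String[] samples = {"Rv0001", "Rv0003a", "rv0001", "Rv12", "xRv123"};
        for (String s : samples) {
            System.out.println(s + "  like=" + likeRvPrefix(s) + "  regex=" + regexRv(s));
        }
    }
}
```

The two filters agree on IDs such as Rv0001, which would be consistent with both forms returning the same counts in the raw SQL results, but they are not equivalent in general: the regular expression is case-insensitive on the prefix, requires at least three digits, and is not anchored to the start of the value.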
>>>>> > > >
>>>>> > > > GenMappBuilder.java:
>>>>> > > > -I found method doTallies() at code line 895 which:
>>>>> > > > Instantiates a Configuration called hibernateConfiguration and assigns to it the current hibernate configuration
>>>>> > > > Validates database settings by analyzing hibernateConfiguration
>>>>> > > > Instantiates a CriterionList for uniprot and assigns to it TallyType.UNIPROT
>>>>> > > > Instantiates a CriterionList for go and assigns to it TallyType.GO
>>>>> > > > Determines if both xml files exist
>>>>> > > > Then getTallyResultsXML and getTallyResultsDatabase are run on both xml files and their respective CriterionList
>>>>> > > > Results are then formatted for display in a table.
>>>>> > > >
>>>>> > > > -So TallyType is an enum, which means that its values are the only valid types which TallyEngine accepts... good to know...
>>>>> > > >
>>>>> > > > -Based on the screen shot of Tally Engine it would seem that both getTallyResultsXML() and getTallyResultsDatabase() are returning incorrect results. Likely due to both using an incorrect query (as we previously supposed). But where are the queries?... the more I dig the more I think they are in the criteria that all the work is done against.
>>>>> > > >
>>>>> > > > continuing the review:
>>>>> > > > getTallyResultsXML() calls Tally Engine instance method getXmlFileCounts(xmlFile)
>>>>> > > > getTallyResultsDatabase() calls Tally Engine instance method getDbcounts(new QueryEngine(hibernateConfiguration))
>>>>> > > > Both of these instance methods originate from TallyEngine.java...
>>>>> > > >
>>>>> > > > TallyEngine.java:
>>>>> > > >
>>>>> > > > getXmlFileCounts() calls digestXmlFile() which instantiates a digester then processes against criteria...
>>>>> > > > but this quickly becomes confusing and is hard to follow
>>>>> > > >
>>>>> > > > getDbcounts() then starts a db session and executes a query but then I also get a bit lost with my limited db knowledge.
>>>>> > > >
>>>>> > > > ------------------------------------------------------------------------------
>>>>> > > >
>>>>> > > > OVERALL I think I'm getting closer to the issues but I still feel as if I'm missing some understanding to proceed further. Can you pass along some of that Dondi insight and steer me in the right direction? =D
>>>>> > > >
>>>>> > > > -DB Tally - Not having taken databases yet certainly is limiting my ability to determine where the "criteria" are being set and how they are followed during session activities. Also is the query we have been looking for this whole time in the criteria or someplace else?
>>>>> > > >
>>>>> > > > -XML Tally - again is the query contained within the criteria that digestXmlFile() uses to parse?
>>>>> > > >
>>>>> > > > Richard
>>>>> > > >
>>>>> > > > On Mon, Feb 7, 2011 at 5:50 PM, John David N. Dionisio <do...@lm...> wrote:
>>>>> > > > Right, schema issues are unlikely. Most count discrepancies like this that I've seen have boiled down to forming the right query. Then, knowing the right query (in both XML and SQL), it's a matter of making sure that TallyEngine asks that same query.
>>>>> > > >
>>>>> > > > John David N. Dionisio, PhD
>>>>> > > > Associate Professor, Computer Science
>>>>> > > > Loyola Marymount University
>>>>> > > >
>>>>> > > > On Feb 7, 2011, at 5:48 PM, Richard Brous wrote:
>>>>> > > >
>>>>> > > > > OK, so based on your approach:
>>>>> > > > >
>>>>> > > > > 1. I'll start with reviewing the queries for xmlpipedb-match and sql queries needed for the respective results as you requested.
>>>>> > > > >
>>>>> > > > > I was also thinking I may need to review the schema from xml into postgres but the issue isn't likely a schema error. The error most likely lies in how xmlpipedbutils queries the data from xml source and writes to the tables what it returns?
>>>>> > > > >
>>>>> > > > > 2. I'll review the code: trace the entrance of tally engine in the gmbuilder code then follow it through the xmlpipedbutils.
>>>>> > > > >
>>>>> > > > > Richard
>>>>> > > > >
>>>>> > > > > On Sat, Feb 5, 2011 at 10:28 AM, John David N. Dionisio <do...@lm...> wrote:
>>>>> > > > > Just wanted to confirm (since I wasn't sure in the first e-mail) --- the XMLPipeDB Utilities source code is in trunk/xmlpipedbutils in SourceForge's Subversion repo.
>>>>> > > > >
>>>>> > > > > John David N. Dionisio, PhD
>>>>> > > > > Associate Professor, Computer Science
>>>>> > > > > Loyola Marymount University
>>>>> > > > >
>>>>> > > > > On Feb 5, 2011, at 10:02 AM, Richard Brous wrote:
>>>>> > > > >
>>>>> > > > > > Hi Dondi,
>>>>> > > > > >
>>>>> > > > > > So I'm at the point in working with M tuberculosis that I was able to exactly reproduce Dr. Dahlquist's problematic TallyEngine results.
>>>>> > > > > >
>>>>> > > > > > gmb2b60 Results
>>>>> > > > > >
>>>>> > > > > > Now the proverbial question - What next to solve the Ordered Locus import/count issue?
>>>>> > > > > >
>>>>> > > > > > **********************************************
>>>>> > > > > > Here is my thought process:
>>>>> > > > > >
>>>>> > > > > > Step 1: How does the import process work at the high level? (obviously correct me if I'm wrong)
>>>>> > > > > >
>>>>> > > > > > I believe that basically as each XML tag is read, it is placed in the proper Postgres table(s) based on some criteria.
>>>>> > > > > > There is also likely some sort of check that each individual tag is in valid XML format unless we don't care at this stage (care at export) or maybe the parser just skips over and goes on to the next.
>>>>> > > > > >
>>>>> > > > > > Step 2: What could be the problem?
>>>>> > > > > >
>>>>> > > > > > Either -
>>>>> > > > > > a. XML tags are being parsed incorrectly (ignored/skipped)?
>>>>> > > > > > b. Decision criteria of which table they should be added to?
>>>>> > > > > >
>>>>> > > > > > **********************************************
>>>>> > > > > >
>>>>> > > > > > I read on the sourceforge wiki:
>>>>> > > > > >
>>>>> > > > > > XMLPipeDB has a modular architecture with three components that may be used separately or together. XSD-to-DB reads an XSD (XML Schema Definition) and automatically generates an SQL schema, Java classes, and Hibernate mappings. XMLPipeDB Utilities provides functionality for configuring the database, importing data, and performing queries. GenMAPP Builder is based on the XMLPipeDB Utilities and exports GenMAPP-compatible Gene Databases based on data from UniProt and Gene Ontology (GO).
>>>>> > > > > >
>>>>> > > > > > So I should probably start with the XMLPipeDB Utilities which are where? I don't see any in the basic distribution or are they not standalone and called from the command line?
>>>>> > > > > >
>>>>> > > > > > Thanks!
>>>>> > > > > >
>>>>> > > > > > Richard
>>>>> > > > >
>>>>> > > > > <ATT00001..txt><ATT00002..txt>
>>>>> > > >
>>>>> > > > ------------------------------------------------------------------------------
>>>>> > > > The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
>>>>> > > > Pinpoint memory and threading errors before they happen.
>>>>> > > > Find and fix more than 250 security defects in the development cycle.
>>>>> > > > Locate bottlenecks in serial and parallel code that limit performance.
>>>>> > > > http://p.sf.net/sfu/intel-dev2devfeb
>>>>> > > > _______________________________________________
>>>>> > > > xmlpipedb-developer mailing list
>>>>> > > > xml...@li...
>>>>> > > > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>>>>> > > >
>>>>> > > > <ATT00001..txt><ATT00002..txt>
>>>
>>> ------------------------------------------------------------------------------
>>> Index, Search & Analyze Logs and other IT data in Real-Time with Splunk
>>> Collect, index and harness all the fast moving IT data generated by your applications, servers and devices whether physical, virtual or in the cloud.
>>> Deliver compliance at lower cost and gain new business insights.
>>> Free Software Download: http://p.sf.net/sfu/splunk-dev2dev
>>>
>>> _______________________________________________
>>> xmlpipedb-developer mailing list
>>> xml...@li...
>>> https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
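
A note on the precedence point raised in this thread: because "and" binds more tightly than "or", the unparenthesized query is read as type = 'ordered locus' or (type = 'ORF' and value like 'Rv%'), so the 'Rv%' filter never applies to the ordered locus rows. That reading would be consistent with the 7011 count reported above, although the thread itself does not confirm the breakdown. The Java sketch below emulates both WHERE clauses on a tiny in-memory sample; the class name and the five sample rows are made up for illustration and are not taken from the real genenametype table.

```java
import java.util.Arrays;
import java.util.List;

/** Emulates the two WHERE clauses from the thread on hypothetical sample rows. */
final class PrecedenceDemo {

    /** Each row is {type, value}; the data is invented for this sketch. */
    static final List<String[]> SAMPLE = Arrays.asList(
        new String[] {"ordered locus", "Rv0001"},
        new String[] {"ordered locus", "MT0002"},  // wrong prefix
        new String[] {"ORF", "Rv0003a"},
        new String[] {"ORF", "MT0004"},            // wrong prefix
        new String[] {"primary", "Rv0005"}         // wrong type
    );

    /** type = 'ordered locus' or type = 'ORF' and value like 'Rv%' */
    static long countUnparenthesized(List<String[]> rows) {
        return rows.stream()
            .filter(r -> r[0].equals("ordered locus")
                      || (r[0].equals("ORF") && r[1].startsWith("Rv")))
            .count();
    }

    /** (type = 'ordered locus' or type = 'ORF') and value like 'Rv%' */
    static long countParenthesized(List<String[]> rows) {
        return rows.stream()
            .filter(r -> (r[0].equals("ordered locus") || r[0].equals("ORF"))
                      && r[1].startsWith("Rv"))
            .count();
    }

    public static void main(String[] args) {
        System.out.println("unparenthesized: " + countUnparenthesized(SAMPLE)); // 3
        System.out.println("parenthesized:   " + countParenthesized(SAMPLE));   // 2
    }
}
```

On the sample data the unparenthesized form also counts the ordered locus row with the wrong prefix, while the parenthesized form does not. In SQL the grouping can be written as in the second query at the top of this message, or equivalently as type in ('ordered locus', 'ORF') and value like 'Rv%'. By contrast, the first attempt, type = ('ordered locus' or 'ORF'), applies "or" to two strings rather than to two comparisons.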