Re: [XMLPipeDB-developer] 499 - PROBLEM - M tuberculosis xml tag importation
From: Richard B. <rbr...@gm...> - 2011-02-22 06:01:09
Hmm, it's not taking parentheses where I thought they should go... syntax error:
select count (*) from genenametype where type = ('ordered locus' or 'ORF')
and value like 'Rv%';
I also tried:
select count (*) from genenametype where (type = 'ordered locus' or type = 'ORF')
and value like 'Rv%';
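
For what it's worth, the effect of the grouping can be checked without the database. Below is a minimal sketch in plain Java over a handful of made-up rows (the class name `PrecedenceSketch` and the sample data are hypothetical, not taken from the XMLPipeDB code). SQL's AND binds tighter than OR, so the unparenthesized clause is read as type = 'ordered locus' or (type = 'ORF' and value like 'Rv%'), and every 'ordered locus' row is counted whether or not its value starts with Rv. Java's `&&` and `||` have the same relative precedence, so the two counts can be compared directly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a few genenametype rows: {type, value}.
final class PrecedenceSketch {
    static final List<String[]> ROWS = Arrays.asList(
        new String[] { "ordered locus", "Rv0001" },
        new String[] { "ordered locus", "MT0002" },  // ordered locus, but not an Rv ID
        new String[] { "ORF", "Rv0003a" },
        new String[] { "ORF", "XX0004" },            // ORF, but not an Rv ID
        new String[] { "primary", "Rv0005" });       // Rv ID, but a different type

    // Mirrors: type = 'ordered locus' or type = 'ORF' and value like 'Rv%'
    static int countWithoutParentheses() {
        int count = 0;
        for (String[] row : ROWS) {
            String type = row[0];
            String value = row[1];
            // && binds tighter than ||, just as AND binds tighter than OR.
            if (type.equals("ordered locus") || type.equals("ORF") && value.startsWith("Rv")) {
                count++;
            }
        }
        return count;
    }

    // Mirrors: (type = 'ordered locus' or type = 'ORF') and value like 'Rv%'
    static int countWithParentheses() {
        int count = 0;
        for (String[] row : ROWS) {
            String type = row[0];
            String value = row[1];
            if ((type.equals("ordered locus") || type.equals("ORF")) && value.startsWith("Rv")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("without parentheses: " + countWithoutParentheses());
        System.out.println("with parentheses:    " + countWithParentheses());
    }
}
```

With these five sample rows the unparenthesized version counts the non-Rv 'ordered locus' row as well, which is the same kind of inflation as the unexpectedly large combined count reported in this thread.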
On Mon, Feb 21, 2011 at 9:40 PM, Richard Brous <rbr...@gm...> wrote:
> ah yes... i see it...
>
>
> On Mon, Feb 21, 2011 at 9:33 PM, John David N. Dionisio <do...@lm...> wrote:
>
>> Watch your parentheses: "and" has greater precedence than "or" :)
>>
>>
>> John David N. Dionisio, PhD
>> Associate Professor, Computer Science
>> Loyola Marymount University
>>
>>
>> On Feb 21, 2011, at 7:59 PM, Richard Brous <rbr...@gm...> wrote:
>>
>> OK, so here are my query results from raw SQL:
>>
>> 1. using: like 'Rv%'
>>
>> select count (*) from genenametype where type = 'ordered locus' and value
>> like 'Rv%';
>> returns 3988
>>
>> select count (*) from genenametype where type = 'ORF' and value like
>> 'Rv%';
>> returns 70
>>
>> select count (*) from genenametype where type = 'ordered locus' or type =
>> 'ORF' and value like 'Rv%';
>> returns 7011
>>
>> 2. regular expression : value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*'
>>
>> select count (*) from genenametype where type = 'ordered locus' and value
>> ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>> returns 3988
>>
>> select count (*) from genenametype where type = 'ordered locus' or type =
>> 'ORF' and value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>> returns 7011
>>
>> select count (*) from genenametype where type = 'ORF' and value ~
>> '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>> returns 70
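
A note on the regular expression used in these queries: PostgreSQL's `~` operator succeeds when the pattern matches anywhere inside the value, and the trailing `[0-9]*` makes the fourth digit optional, so the pattern only requires "Rv" plus three digits somewhere in the string. The sketch below reproduces that behavior in Java with `Matcher.find()`, which is also an unanchored search. The stricter anchored pattern is purely a hypothetical illustration (four digits and an optional single-letter suffix are assumptions, not a rule stated in this thread), as are the class name and sample values.

```java
import java.util.regex.Pattern;

final class RvPatternSketch {
    // The pattern from the queries above, unchanged.
    static final Pattern LOOSE =
        Pattern.compile("[Rr][Vv][0-9][0-9][0-9][0-9]*");

    // Hypothetical stricter variant: anchored, exactly four digits,
    // optional one-letter suffix. This shape is an assumption.
    static final Pattern STRICT =
        Pattern.compile("^[Rr][Vv][0-9]{4}[A-Za-z]?$");

    // find() searches for a match anywhere in the input, like SQL "value ~ pattern".
    static boolean looseMatch(String value) {
        return LOOSE.matcher(value).find();
    }

    static boolean strictMatch(String value) {
        return STRICT.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] samples = { "Rv0001", "Rv0003a", "Rv123", "Rv12", "xxRv1234yy" };
        for (String sample : samples) {
            System.out.println(sample + " loose=" + looseMatch(sample)
                + " strict=" + strictMatch(sample));
        }
    }
}
```

Since the loose pattern and `like 'Rv%'` returned the same counts here, the difference may not matter for this data set; the sketch only shows what each form would and would not accept.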
>>
>> Conclusions:
>>
>> 1. It seems that querying for type = 'ORF' alone surfaces the 69 genes
>> we were looking for plus one more (maybe the count for missing genes is
>> off by 1?).
>>
>> 2. Combining the two types in a single query did not produce the results
>> that I expected (7011? - how did that happen????) so this is likely not our
>> solution... unless of course the query syntax isn't actually doing what I
>> think it is...
>>
>> 3. I would think the best course of action is to serially run two separate
>> queries to capture all the required genes, then remove the one unneeded
>> gene if it's truly not wanted.
>>
>> What do you think?
>>
>> Richard
>>
>>
>> On Mon, Feb 21, 2011 at 5:17 PM, John David N. Dionisio <do...@lm...> wrote:
>>
>>> I don't recall the exact details of the missing 69, but if your query
>>> successfully returns them in raw SQL, then this is worth a try. You can
>>> integrate into the same query as long as the same columns are returned,
>>> which is the case here AFAIK, so go ahead and extend the existing query.
>>>
>>>
>>> John David N. Dionisio, PhD
>>> Associate Professor, Computer Science
>>> Loyola Marymount University
>>>
>>> On Feb 21, 2011, at 6:56 PM, Richard Brous <rbr...@gm...> wrote:
>>>
>>> So here is the appropriate code snippet from
>>> MycobacteriumTuberculosisUniProtSpeciesProfile.java:
>>>
>>> public TableManager getSystemTableManagerCustomizations(TableManager tableManager,
>>>         TableManager primarySystemTableManager, Date version)
>>>         throws SQLException, InvalidParameterException {
>>>     // Build the base query; we only use "ordered locus" and we only want
>>>     // IDs that begin with "Rv."
>>>     PreparedStatement ps = ConnectionManager.getRelationalDBConnection().prepareStatement(
>>>         "SELECT value, type " +
>>>         "FROM genenametype INNER JOIN entrytype_genetype " +
>>>         "ON (entrytype_genetype_name_hjid = entrytype_genetype.hjid) " +
>>>         "WHERE type = 'ordered locus' and value like 'Rv%' and entrytype_gene_hjid = ?");
>>>     ResultSet result;
>>>     for (Row row : primarySystemTableManager.getRows()) {
>>>         ps.setInt(1, Integer.parseInt(row.getValue("UID")));
>>>         result = ps.executeQuery();
>>>
>>>         // We actually want to keep the case where multiple ordered locus
>>>         // names appear.
>>>         while (result.next()) {
>>>             // We want this name to appear in the OrderedLocusNames
>>>             // system table.
>>>             for (String id : result.getString("value").split("/")) {
>>>                 tableManager.submit("OrderedLocusNames", QueryType.insert, new String[][] {
>>>                     { "ID", id },
>>>                     { "Species", "|" + getSpeciesName() + "|" },
>>>                     { "\"Date\"", GenMAPPBuilderUtilities.getSystemsDateString(version) },
>>>                     { "UID", row.getValue("UID") } });
>>>             }
>>>         }
>>>     }
>>>
>>>
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> So now we want to build the base query which uses "ordered locus" and
>>> "orf" and we only want IDs that begin with "Rv".
>>>
>>> I know there are more comprehensive ways to search for gene ID's
>>> by matching gene ID prefix but "like Rv%" seemed to work thus far, we just
>>> need to tell it to search for XML tag type orf in addition to ordered locus.
>>>
>>> "WHERE type = 'ordered locus' and type = 'orf' and value like 'Rv%' and
>>> entrytype_gene_hjid = ? "
>>>
>>> Here is a stab at it... We covered this part in class right as the server
>>> went down, and I can't seem to find my submission for the week 6 assignment.
>>>
>>> Is it possible to have two different types in the same query or should we
>>> rewrite a separate query for the orf tag?
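
On the question of two types in one query: a single row's type column holds one value, so a condition of the form type = 'ordered locus' and type = 'orf' can never be true and would return nothing. The two types have to be joined with OR and the pair grouped in parentheses. The sketch below shows one possible shape for the query string; it is a hypothetical revision, not the code that was committed, and the class name and helper are made up for illustration. The upper-case 'ORF' follows the raw SQL counts reported elsewhere in this thread, since string comparison in PostgreSQL is case-sensitive.

```java
final class OrderedLocusQuerySketch {
    // Hypothetical revision of the base query: accept either gene name type,
    // grouped in parentheses so the Rv filter applies to both.
    static final String SQL =
        "SELECT value, type " +
        "FROM genenametype INNER JOIN entrytype_genetype " +
        "ON (entrytype_genetype_name_hjid = entrytype_genetype.hjid) " +
        "WHERE (type = 'ordered locus' OR type = 'ORF') " +
        "AND value LIKE 'Rv%' AND entrytype_gene_hjid = ?";

    // Counts the '?' placeholders, as a quick check that the statement still
    // expects exactly one bound parameter (the gene hjid).
    static int placeholderCount(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(SQL);
        System.out.println("placeholders: " + placeholderCount(SQL));
    }
}
```

Because the selected columns are unchanged, the surrounding loop that calls `ps.setInt(1, ...)` and reads "value" would not need to change under this sketch.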
>>>
>>> Richard
>>>
>>>
>>>
>>> On Sun, Feb 20, 2011 at 10:21 PM, Richard Brous <rbr...@gm...> wrote:
>>>
>>>
>>>> thanks and will do as directed.
>>>>
>>>> Regarding the last paragraph of my previous message: is there a way for
>>>> code in an email to hold its formatting, similar to how you can post
>>>> code on forum pages?
>>>>
>>>> <code>
>>>> blah
>>>> blah
>>>> blah
>>>> </code>
>>>>
>>>> thanks!
>>>>
>>>> Richard
>>>>
>>>> On Sun, Feb 20, 2011 at 10:05 PM, John David N. Dionisio <do...@lm...> wrote:
>>>>
>>>>
>>>>> Greetings,
>>>>>
>>>>> Actually, gmbuilder.properties is for the TallyEngine only. When
>>>>> dealing with .gdb exports, look *only* at the SpeciesProfile class. So, to
>>>>> find those 69 IDs, it is the SpeciesProfile code, and *only* the
>>>>> SpeciesProfile code, that needs to be changed.
>>>>>
>>>>> Your take on how gmbuilder.properties is used, however, is
>>>>> understandable. It makes sense to assume that the TallyEngine code *and*
>>>>> the ID export code are based on the same characterization of the needed IDs.
>>>>> This replication is originally a historical artifact: SpeciesProfile was
>>>>> done first, and then TallyEngine was done later by another student.
>>>>>
>>>>> However, there are other factors beyond history that sort of
>>>>> necessitate this duplication of desired IDs: (skip the two bullets below if
>>>>> you'd rather cut to the chase of the work to be done, and discuss design
>>>>> issues later)
>>>>>
>>>>> - The actual XML import code is a black box: this is the "canned" JAXB
>>>>> library actually in action, and not our code at all. Plus, the XML import
>>>>> code really does not filter (nor should it), since the goal of the
>>>>> XML->relational database step is to fully capture the XML data in the
>>>>> relational database. So, XML count is necessarily separated from XML
>>>>> import.
>>>>>
>>>>> - The notion of a declarative mechanism for extracting IDs from the
>>>>> relational database (which is what gmbuilder.properties/TallyEngine uses) is
>>>>> interesting, but at the same time there is value in the arbitrary
>>>>> computation that can be done with Java (case in point: export two versions
>>>>> of an ID, with and without periods). This is not to say that it is
>>>>> impossible to do this declaratively, but let's just say that the procedural
>>>>> approach exists here and now, and a declarative approach will need more
>>>>> thought.
>>>>>
>>>>> These, and other factors, are good thoughts to hold onto and would be
>>>>> worthy of a good meeting discussion sometime, but bottom line for now:
>>>>> modifying the export behavior is a matter of editing the *SpeciesProfile*
>>>>> Java code, and not the gmbuilder.properties file. Turn your attention to
>>>>> that code.
>>>>>
>>>>> Now, as to annotating your code...I'd just put in code comments :) Or
>>>>> did you mean something else by tagging code in e-mail?
>>>>>
>>>>> John David N. Dionisio, PhD
>>>>> Associate Professor, Computer Science
>>>>> Loyola Marymount University
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Feb 21, 2011, at 12:38 AM, Richard Brous wrote:
>>>>>
>>>>> > also, how do I tag code in email so it holds its formatting? I tried
>>>>> a few suggestions I found on the web but they aren't holding formatting or
>>>>> i'm just doing it wrong ;-D
>>>>> >
>>>>> > Richard
>>>>> >
>>>>> > On Sun, Feb 20, 2011 at 9:35 PM, Richard Brous <rbr...@gm...> wrote:
>>>>> > OK, have some updates and some suggestions:
>>>>> >
>>>>> > On Friday Dr. Dahlquist and I sat down and reviewed the gene testing
>>>>> report. We verified that XML match does indeed find 4066 unique matches - 75
>>>>> of which are not in the gdb and need to be.
>>>>> >
>>>>> > Dr. Dahlquist informed me that she was the one who completed the gene
>>>>> db testing report, not a previous student of BIO367 and had already verified
>>>>> which genes were missing and where they were to be found. I had (mistakenly)
>>>>> assumed that since a student had performed the gene database testing I had
>>>>> to redo all of the verification.
>>>>> >
>>>>> > So that said, of the 75 genes missing - 69 need to be included and 6
>>>>> excluded.
>>>>> > Per the gene db testing report: "69 of them have an "a", "b", or "d"
>>>>> suffix. They are all found in the ORF tag and need to be included in the
>>>>> gdb."
>>>>> >
>>>>> > To solve this we need to add additional search criteria into the M.
>>>>> tuberculosis section in gmbuilder.properties below:
>>>>> > # Mycobacterium tuberculosis
>>>>> > mycobacteriumtuberculosis_level_amount=1
>>>>> > mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> > mycobacteriumtuberculosis_query_level0=select count(*) from genenametype where type = 'ordered locus' and value like 'Rv%';
>>>>> > mycobacteriumtuberculosis_table_name_level0=Ordered Locus
>>>>> >
>>>>> > SOLUTIONS:
>>>>> >
>>>>> > 1. So am I correct in my understanding that the second line is the query used by TallyEngine to read the XML file? If so, then this is the issue we need to table for the moment until we get the gdb verified and re-released. We will revisit this to discover not only why it is reporting incorrectly but also why it has added a second row of Ordered Locus on the TallyEngine results page.
>>>>> >
>>>>> > 2. The third line is the SQL query used by postgres during the export
>>>>> from XML to gdb. To find and get the ORF tagged genes could we not add the
>>>>> following lines and change the count in the first line:
>>>>> >
>>>>> >
>>>>> > # Mycobacterium tuberculosis
>>>>> > mycobacteriumtuberculosis_level_amount=2
>>>>> > mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> > mycobacteriumtuberculosis_element_level1=uniprot/entry/gene/name&type&orf
>>>>> > mycobacteriumtuberculosis_query_level0=select count(*) from genenametype where type = 'ordered locus';
>>>>> > mycobacteriumtuberculosis_query_level1=select count(*) from genenametype where type = 'orf';
>>>>> > mycobacteriumtuberculosis_table_name_level0=Ordered Locus
>>>>> > mycobacteriumtuberculosis_table_name_level1=Ordered Locus
>>>>> >
>>>>> >
>>>>> ----------------------------------------------------------------------------------------------------------------------------
>>>>> >
>>>>> > Of course these queries would have to be manually verified prior to making these changes, but this seems like we are moving in the right direction.
>>>>> >
>>>>> > Richard
>>>>> >
>>>>> >
>>>>> > On Thu, Feb 17, 2011 at 7:47 PM, Richard Brous <rbr...@gm...> wrote:
>>>>> > Just got done reading previous email and understand the change in
>>>>> priority.
>>>>> >
>>>>> > Will work on the missing IDs for now and shelve the TallyEngine issue for the moment.
>>>>> >
>>>>> > Also great about a more formalized weekly meeting. I was going to
>>>>> suggest it myself as it has been slow going so far as maybe i'm a bit too
>>>>> independent in this independent study class =D
>>>>> >
>>>>> > Will dig further into the missing ID's later tonight and during day
>>>>> tomorrow and report back.
>>>>> >
>>>>> > Richard
>>>>> >
>>>>> > On Thu, Feb 17, 2011 at 4:34 PM, John David N. Dionisio <do...@lm...> wrote:
>>>>> > Hi Rich,
>>>>> >
>>>>> > No problem. The pertinent line you're referring to, for XML, is
>>>>> this, right above the line you copied:
>>>>> >
>>>>> >
>>>>> > mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> >
>>>>> > The slash-separated section is the "path" of XML tags leading to the
>>>>> element of interest; then, after the ampersand, is a name/value pair for the
>>>>> desired attribute to count. Note that there is no hint of a *content*-based
>>>>> filter (nor is there the capability for one, as far as I can tell in the
>>>>> code). By "content," I mean that we can't specify filters based on what's
>>>>> *between* the tags. We can only go as far as filter by attribute value,
>>>>> e.g., type="ordered locus".
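
The element line format described above (a slash-separated tag path, then an ampersand-separated attribute name and value) can be illustrated with a small parser. This is only a sketch of the format as described in this message; it is not how TallyEngine itself reads the property, and the class name `ElementSpecSketch` and its fields are hypothetical.

```java
final class ElementSpecSketch {
    final String[] path;          // XML tags leading to the element of interest
    final String attributeName;   // attribute to filter on, e.g. "type"
    final String attributeValue;  // required attribute value, e.g. "ordered locus"

    private ElementSpecSketch(String[] path, String attributeName, String attributeValue) {
        this.path = path;
        this.attributeName = attributeName;
        this.attributeValue = attributeValue;
    }

    // Splits "tag/tag/tag&name&value" into its three parts. The limit of 3
    // keeps any further ampersands inside the attribute value.
    static ElementSpecSketch parse(String spec) {
        String[] parts = spec.split("&", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected path&name&value but got: " + spec);
        }
        return new ElementSpecSketch(parts[0].split("/"), parts[1], parts[2]);
    }

    public static void main(String[] args) {
        ElementSpecSketch spec = parse("uniprot/entry/gene/name&type&ordered locus");
        System.out.println("path:      " + String.join(" > ", spec.path));
        System.out.println("attribute: " + spec.attributeName + " = " + spec.attributeValue);
    }
}
```

Note that nothing in this format has room for a filter on the text between the tags, which matches the point above that only attribute values can be filtered.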
>>>>> >
>>>>> > But anyway, as mentioned in the earlier e-mail, let's have the
>>>>> missing IDs in the .gdb take precedence for now. Please take a look at the
>>>>> tuberculosis, A. thaliana, and P. falciparum profiles to get an idea for how
>>>>> the ID output can be customized, then let me know if you have any questions
>>>>> or need to confirm anything.
>>>>> >
>>>>> > John David N. Dionisio, PhD
>>>>> > Associate Professor, Computer Science
>>>>> > Loyola Marymount University
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Feb 17, 2011, at 3:04 PM, Richard Brous wrote:
>>>>> >
>>>>> > > Sorry been slammed with a programming assignment that kept needing
>>>>> continued iteration and it has been all consuming until last night. But I
>>>>> did get a chance to work with your comments and review the code again with a
>>>>> different mind set.
>>>>> > >
>>>>> > > Yes, I examined the gmbuilder.properties file ( the query is also
>>>>> in the MycobacteriumTuberculosisUniProtSpeciesProfile which I mentioned in a
>>>>> previous email ) but I don't think I see what you mean regarding the XML
>>>>> count.
>>>>> > >
>>>>> > > I understood that: mycobacteriumtuberculosis_query_level0=select
>>>>> count(*) from genenametype where type = 'ordered locus' and value like
>>>>> 'Rv%'; was the db query but don't see which is the XML count... or do they
>>>>> share the same query and you are saying that XML count doesn't recognize and
>>>>> therefore cannot use the 'Rv%' parameter?
>>>>> > >
>>>>> > > Richard
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Sat, Feb 12, 2011 at 11:46 PM, John David N. Dionisio <do...@lm...> wrote:
>>>>> > > Hi Rich,
>>>>> > >
>>>>> > > Sorry for the delay. Had some distractions coming into the
>>>>> weekend.
>>>>> > >
>>>>> > > You've looked at the code; have you looked at gmbuilder.properties?
>>>>> (I may have mentioned it a few e-mails ago, just as you were starting to
>>>>> dig into this)
>>>>> > >
>>>>> > > On the copy I have, the M. tuberculosis block looks like this
>>>>> (indentation is mine to set it apart):
>>>>> > >
>>>>> > >     # Mycobacterium tuberculosis
>>>>> > >     mycobacteriumtuberculosis_level_amount=1
>>>>> > >     mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered locus
>>>>> > >     mycobacteriumtuberculosis_query_level0=select count(*) from genenametype where type = 'ordered locus' and value like 'Rv%';
>>>>> > >     mycobacteriumtuberculosis_table_name_level0=Ordered Locus
>>>>> > >
>>>>> > > There, I think, is the rub. Notice that the XML count does not filter on 'Rv%'. The SQL query does.
>>>>> > >
>>>>> > > Unfortunately, I don't think the TallyEngine can include selective
>>>>> filtering in the XML counts. If the need to do selective filtering on XML
>>>>> is necessary, then I think we're looking at a new functionality for you to
>>>>> implement (or, if this throws things off too much, this may have to be noted
>>>>> somewhere, that the XML vs. database counts may be off because the database
>>>>> count is doing some text-based filtering but the XML count does not).
>>>>> > >
>>>>> > > What does xmlpipedb-match say? That will at least tell you whether the 'Rv%' count is indeed correct.
>>>>> > >
>>>>> > > John David N. Dionisio, PhD
>>>>> > > Associate Professor, Computer Science
>>>>> > > Loyola Marymount University
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Feb 11, 2011, at 4:52 PM, Richard Brous wrote:
>>>>> > >
>>>>> > > > OK here is what I was able to put together from the past few
>>>>> hours of code review:
>>>>> > > >
>>>>> > > > MycobacteriumTuberculosisUniProtSpeciesProfile.java:
>>>>> > > > -reveals that after the 2 System table modifications are made
>>>>> adding species name and link, a PreparedStatement is instantiated which
>>>>> builds and calls the base query.
>>>>> > > >
>>>>> > > > -The base query called is: ("SELECT value, type " + "FROM
>>>>> genenametype INNER JOIN entrytype_genetype " +
>>>>> "ON(entrytype_genetype_name_hjid = entrytype_genetype.hjid) " + "WHERE type
>>>>> = 'ordered locus' and value like 'Rv%' and entrytype_gene_hjid = ?")
>>>>> > > >
>>>>> > > > -So its looking in 'ordered locus' table/column for any tuple
>>>>> that starts with Rv (followed by any substring) and entrytype_gene_hjid = ?
>>>>> .
>>>>> > > > The 'like' comparator and % usage are clear with the 'type'
>>>>> entrytype_gene_hjid = ?
>>>>> > > >
>>>>> > > > -To me it seems the query makes sense so the problem is likely
>>>>> elsewhere.
>>>>> > > >
>>>>> > > > GenMappBuilder.java:
>>>>> > > > -I found method doTallies() at code line 895 which:
>>>>> > > > Instantiates a Configuration called hibernateConfiguration and
>>>>> assigns to it the current hibernate configuration
>>>>> > > > Validates database settings by analyzing hibernateConfiguration
>>>>> > > > Instantiates a CriterionList for uniprot and assigns to it
>>>>> TallyType.UNIPROT
>>>>> > > > Instantiates a CriterionList for go and assigns to it
>>>>> TallyType.GO
>>>>> > > > Determines if both xml files exist
>>>>> > > > Then getTallyResultsXML and getTallyResultsDatabase are run on
>>>>> both xml files and their respective CriterionList
>>>>> > > > Results are then formatted for display in a table.
>>>>> > > >
>>>>> > > > -So TallyType is an enum, which means that its values are the only valid types that TallyEngine accepts... good to know...
>>>>> > > >
>>>>> > > > -Based on the screenshot of Tally Engine, it would seem that both getTallyResultsXML() and getTallyResultsDatabase() are returning incorrect results, likely because both use an incorrect query (as we previously supposed). But where are the queries? The more I dig, the more I think they are in the criteria that all the work is done against.
>>>>> > > >
>>>>> > > > continuing the review:
>>>>> > > > getTallyResultsXML() calls Tally Engine instance method
>>>>> getXmlFileCounts(xmlFile)
>>>>> > > > getTallyResultsDatabase() calls Tally Engine instance method getDbcounts(new QueryEngine(hibernateConfiguration))
>>>>> > > > Both of these instanced methods originate from
>>>>> TallyEngine.java...
>>>>> > > >
>>>>> > > > TallyEngine.java:
>>>>> > > >
>>>>> > > > getXmlFileCounts() calls digestXmlFile() which instantiates a
>>>>> digester then processes against criteria... but this quickly becomes
>>>>> confusing and is hard to follow
>>>>> > > >
>>>>> > > > getDbcounts() then starts a db session and executes a query but
>>>>> then I also get a bit lost with my limited db knowledge.
>>>>> > > >
>>>>> > > >
>>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> > > >
>>>>> > > > OVERALL I think I'm getting closer to the issues but I still feel
>>>>> as if I'm missing some understanding to proceed further. Can you pass along
>>>>> some of that Dondi insight and steer me in the right direction? =D
>>>>> > > >
>>>>> > > > -DB Tally - Not having taken databases yet certainly is limiting my ability to determine where the "criteria" are being set and how they are followed during session activities. Also, is the query we have been looking for this whole time in the criteria or someplace else?
>>>>> > > >
>>>>> > > > -XML Tally - again is the query contained within the criteria
>>>>> that digestXmlFile() uses to parse?
>>>>> > > >
>>>>> > > > Richard
>>>>> > > >
>>>>> > > >
>>>>> > > > On Mon, Feb 7, 2011 at 5:50 PM, John David N. Dionisio <do...@lm...> wrote:
>>>>> > > > Right, schema issues are unlikely. Most count discrepancies like
>>>>> this that I've seen have boiled down to forming the right query. Then,
>>>>> knowing the right query (in both XML and SQL), it's a matter of making sure
>>>>> that TallyEngine asks that same query.
>>>>> > > >
>>>>> > > > John David N. Dionisio, PhD
>>>>> > > > Associate Professor, Computer Science
>>>>> > > > Loyola Marymount University
>>>>> > > >
>>>>> > > >
>>>>> > > > On Feb 7, 2011, at 5:48 PM, Richard Brous wrote:
>>>>> > > >
>>>>> > > > > OK, so based on your approach:
>>>>> > > > >
>>>>> > > > > 1. I'll start with reviewing the queries for xmlpipedb-match
>>>>> and sql queries needed for the respective results as you requested.
>>>>> > > > >
>>>>> > > > > I was also thinking I may need to review the schema from xml
>>>>> into postgres but the issue isn't likely a schema error. The error most
>>>>> likely lies in how xmlpipedbutils queries the data from xml source and
>>>>> writes to the tables what it returns?
>>>>> > > > >
>>>>> > > > > 2. I'll review the code: trace the entrance of tally engine in
>>>>> the gmbuilder code then follow it through the xmlpipedbutils.
>>>>> > > > >
>>>>> > > > > Richard
>>>>> > > > >
>>>>> > > > > On Sat, Feb 5, 2011 at 10:28 AM, John David N. Dionisio <do...@lm...> wrote:
>>>>> > > > > Just wanted to confirm (since I wasn't sure in the first
>>>>> e-mail) --- the XMLPipeDB Utilities source code is in trunk/xmlpipedbutils
>>>>> in SourceForge's Subversion repo.
>>>>> > > > >
>>>>> > > > > John David N. Dionisio, PhD
>>>>> > > > > Associate Professor, Computer Science
>>>>> > > > > Loyola Marymount University
>>>>> > > > >
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > On Feb 5, 2011, at 10:02 AM, Richard Brous wrote:
>>>>> > > > >
>>>>> > > > > > Hi Dondi,
>>>>> > > > > >
>>>>> > > > > > So I'm at the point in working with M tuberculosis that I was
>>>>> able to exactly reproduce Dr. Dahlquist's problematic TallyEngine results.
>>>>> > > > > >
>>>>> > > > > > gmb2b60 Results
>>>>> > > > > >
>>>>> > > > > >
>>>>> > > > > >
>>>>> > > > > > Now the proverbial question - What next to solve the Ordered
>>>>> Locus import/count issue?
>>>>> > > > > >
>>>>> > > > > > **********************************************
>>>>> > > > > > Here is my thought process:
>>>>> > > > > >
>>>>> > > > > > Step 1: How does the import process work at the high level?
>>>>> (obviously correct me if I'm wrong)
>>>>> > > > > >
>>>>> > > > > > I believe that basically as each XML tag is read, it is
>>>>> placed in the proper Postgres table(s) based on some criteria. There is also
>>>>> likely some sort of check that each individual tag is in valid XML format
>>>>> unless we don't care at this stage (care at export) or maybe the parser just
>>>>> skips over and goes on to the next .
>>>>> > > > > >
>>>>> > > > > > Step 2: What could be the problem?
>>>>> > > > > >
>>>>> > > > > > Either -
>>>>> > > > > > a. XML tags are being parsed incorrectly (ignored/skipped)?
>>>>> > > > > > b. Decision criteria of which table they should be added to?
>>>>> > > > > >
>>>>> > > > > > **********************************************
>>>>> > > > > >
>>>>> > > > > > I read on the sourceforge wiki:
>>>>> > > > > >
>>>>> > > > > > XMLPipeDB has a modular architecture with three components
>>>>> that may be used separately or together. XSD-to-DB reads an XSD (XML Schema
>>>>> Definition) and automatically generates an SQL schema, Java classes, and
>>>>> Hibernate mappings. XMLPipeDB Utilities provides functionality for
>>>>> configuring the database, importing data, and performing queries. GenMAPP
>>>>> Builder is based on the XMLPipeDB Utilities and exports GenMAPP-compatible
>>>>> Gene Databases based on data from UniProt and Gene Ontology (GO).
>>>>> > > > > >
>>>>> > > > > > So I should probably start with the XMLPipeDB Utilities which
>>>>> are where? I don't see any in the basic distribution or are they not
>>>>> standalone and called from the command line?
>>>>> > > > > >
>>>>> > > > > > Thanks!
>>>>> > > > > >
>>>>> > > > > > Richard
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > <ATT00001..txt><ATT00002..txt>
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> ------------------------------------------------------------------------------
>>>>> > > > The ultimate all-in-one performance toolkit: Intel(R) Parallel
>>>>> Studio XE:
>>>>> > > > Pinpoint memory and threading errors before they happen.
>>>>> > > > Find and fix more than 250 security defects in the development
>>>>> cycle.
>>>>> > > > Locate bottlenecks in serial and parallel code that limit
>>>>> performance.
>>>>> > > > http://p.sf.net/sfu/intel-dev2devfeb
>>>>> > > > _______________________________________________
>>>>> > > > xmlpipedb-developer mailing list
>>>>> > > > xml...@li...
>>>>> > > > https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>>>>> > > >
>>>>> > > > <ATT00001..txt><ATT00002..txt>
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Index, Search & Analyze Logs and other IT data in Real-Time with Splunk
>>> Collect, index and harness all the fast moving IT data generated by your
>>> applications, servers and devices whether physical, virtual or in the
>>> cloud.
>>> Deliver compliance at lower cost and gain new business insights.
>>> Free Software Download: http://p.sf.net/sfu/splunk-dev2dev
>>>
>>> _______________________________________________
>>> xmlpipedb-developer mailing list
>>> xml...@li...
>>> https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>