Re: [XMLPipeDB-developer] 499 - PROBLEM - M tuberculosis xml tag importation

*OK here is what I was able to put together from the past few hours of code
review:*

MycobacteriumTuberculosisUniProtSpeciesProfile.java:
-After the 2 System table modifications that add the species name and link,
a PreparedStatement is instantiated which builds and runs the base query.

-The base query called is:
SELECT value, type
FROM genenametype INNER JOIN entrytype_genetype
ON (entrytype_genetype_name_hjid = entrytype_genetype.hjid)
WHERE type = 'ordered locus' and value like 'Rv%' and entrytype_gene_hjid = ?

-So it's looking for any row whose type is 'ordered locus', whose value
starts with Rv (followed by any substring), and whose entrytype_gene_hjid
matches the ? placeholder. The 'like' comparator and % usage are clear, as is
the 'type' test; entrytype_gene_hjid = ? is a parameter that gets bound
before the statement runs.

-To me it seems the query makes sense so the problem is likely elsewhere.
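
As a reading aid, here is a minimal sketch of how a query like the one quoted above is typically bound and run through JDBC. The class name, the orderedLocusIds() helper, and the matchesRvPattern() check are hypothetical and are not taken from MycobacteriumTuberculosisUniProtSpeciesProfile.java; only the SQL text comes from the source.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class OrderedLocusQuerySketch {
    // The SQL text as quoted in this message, joined into one string.
    static final String BASE_QUERY =
        "SELECT value, type "
        + "FROM genenametype INNER JOIN entrytype_genetype "
        + "ON (entrytype_genetype_name_hjid = entrytype_genetype.hjid) "
        + "WHERE type = 'ordered locus' and value like 'Rv%' "
        + "and entrytype_gene_hjid = ?";

    // Hypothetical helper: binds the single ? placeholder and collects the values.
    static List<String> orderedLocusIds(Connection connection, long geneHjid)
            throws SQLException {
        List<String> ids = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(BASE_QUERY)) {
            ps.setLong(1, geneHjid); // JDBC parameters are numbered from 1
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getString("value"));
                }
            }
        }
        return ids;
    }

    // Plain-Java equivalent of "value like 'Rv%'": a case-sensitive prefix test.
    static boolean matchesRvPattern(String value) {
        return value != null && value.startsWith("Rv");
    }
}
```

The prefix test is only there to make the LIKE semantics concrete: an ID such as MT0001 or a lowercase rv0001 would not be returned by this query.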

GenMappBuilder.java:
-I found method doTallies() at code line 895 which:
Instantiates a Configuration called hibernateConfiguration and assigns to it
the current hibernate configuration
Validates database settings by analyzing hibernateConfiguration
Instantiates a CriterionList for uniprot and assigns to it TallyType.UNIPROT
Instantiates a CriterionList for go and assigns to it TallyType.GO
Determines if both xml files exist
Then getTallyResultsXML and getTallyResultsDatabase are run on both xml
files and their respective CriterionList
Results are then formatted for display in a table.
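
To make the last step concrete, here is a rough sketch of comparing the XML-side and database-side counts per criterion. Everything in it (the class name, the mismatches() method, and the use of plain maps keyed by criterion name) is a hypothetical illustration and not how doTallies() actually stores or formats its results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TallyComparisonSketch {
    // Hypothetical: returns the criterion names whose XML and database counts differ.
    // A criterion missing from the database map is treated as a count of 0.
    static List<String> mismatches(Map<String, Integer> xmlCounts,
                                   Map<String, Integer> dbCounts) {
        List<String> differing = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : xmlCounts.entrySet()) {
            int xmlCount = entry.getValue();
            int dbCount = dbCounts.getOrDefault(entry.getKey(), 0);
            if (xmlCount != dbCount) {
                differing.add(entry.getKey());
            }
        }
        return differing;
    }
}
```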

-So TallyType is an enum, which means its values are the only valid types
that TallyEngine accepts... good to know...
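
A small sketch of why an enum restricts the valid values. The TallyType below is a hypothetical stand-in that lists only the two constants mentioned in this message; the real enum in the XMLPipeDB source may declare others.

```java
public class TallyTypeSketch {
    // Hypothetical stand-in for the real enum; only UNIPROT and GO are
    // named in this thread, so only those two are listed here.
    enum TallyType { UNIPROT, GO }

    // Returns true only if the name is one of the declared constants.
    static boolean isValidTallyType(String name) {
        try {
            TallyType.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            // valueOf rejects any name that is not a declared constant
            return false;
        }
    }
}
```

Any method that takes a TallyType parameter can therefore only ever receive one of the declared constants (or null), which the compiler enforces.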

-Based on the screenshot of TallyEngine, it would seem that
both getTallyResultsXML() and getTallyResultsDatabase() are returning
incorrect results, likely because both use an incorrect query (as we
previously supposed). But where are the queries?... The more I dig, the more
I think they are in the criteria that all the work is done against.

Continuing the review:
getTallyResultsXML() calls the TallyEngine instance method
getXmlFileCounts(xmlFile)
getTallyResultsDatabase() calls the TallyEngine instance method
getDbcounts(new QueryEngine(hibernateConfiguration))
Both of these instance methods are defined in TallyEngine.java...

TallyEngine.java:

getXmlFileCounts() calls digestXmlFile() which instantiates a digester then
processes against criteria... but this quickly becomes confusing and is hard
to follow
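
As an independent cross-check on the XML side, the count can be taken directly from the file without going through the digester. The sketch below uses the JDK's StAX parser rather than whatever digestXmlFile() does internally, and it assumes the UniProt XML layout in which ordered locus names appear as <name type="ordered locus"> elements containing only text. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OrderedLocusXmlCount {
    // Counts <name type="ordered locus"> elements whose text starts with the
    // given prefix. It does not check that the element sits inside <gene>.
    static int countOrderedLocus(String xml, String prefix) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())
                        && "ordered locus".equals(reader.getAttributeValue(null, "type"))
                        && reader.getElementText().startsWith(prefix)) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

If a count obtained this way with the prefix "Rv" disagrees with what the tally reports for the same file, that would point at the criteria rather than at the data.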

getDbcounts() then starts a db session and executes a query but then I also
get a bit lost with my limited db knowledge.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

OVERALL I think I'm getting closer to the issues but I still feel as if I'm
missing some understanding to proceed further. Can you pass along some of
that Dondi insight and steer me in the right direction? =D

-DB Tally - Not having taken databases yet certainly is limiting my ability
to determine where the "criteria" are being set and how they are followed
during session activities. Also, is the query we have been looking for this
whole time in the criteria or someplace else?

-XML Tally - Again, is the query contained within the criteria that
digestXmlFile() uses to parse?

Richard

On Mon, Feb 7, 2011 at 5:50 PM, John David N. Dionisio <do...@lm...> wrote:

> Right, schema issues are unlikely.  Most count discrepancies like this that
> I've seen have boiled down to forming the right query.  Then, knowing the
> right query (in both XML and SQL), it's a matter of making sure that
> TallyEngine asks that same query.
>
> John David N. Dionisio, PhD
> Associate Professor, Computer Science
> Loyola Marymount University
>
>
>  On Feb 7, 2011, at 5:48 PM, Richard Brous wrote:
>
> > OK, so based on your approach:
> >
> > 1. I'll start with reviewing the queries for xmlpipedb-match and sql
> queries needed for the respective results as you requested.
> >
> > I was also thinking I may need to review the schema from xml into
> postgres but the issue isn't likely a schema error. The error most likely
> lies in how xmlpipedbutils queries the data from xml source and writes to
> the tables what it returns?
> >
> > 2. I'll review the code: trace the entrance of tally engine in the
> gmbuilder code then follow it through the xmlpipedbutils.
> >
> > Richard
> >
> > On Sat, Feb 5, 2011 at 10:28 AM, John David N. Dionisio <do...@lm...>
> wrote:
> > Just wanted to confirm (since I wasn't sure in the first e-mail) --- the
> XMLPipeDB Utilities source code is in trunk/xmlpipedbutils in SourceForge's
> Subversion repo.
> >
> > John David N. Dionisio, PhD
> > Associate Professor, Computer Science
> > Loyola Marymount University
> >
> >
> >
> > On Feb 5, 2011, at 10:02 AM, Richard Brous wrote:
> >
> > > Hi Dondi,
> > >
> > > So I'm at the point in working with M tuberculosis that I was able to
> exactly reproduce Dr. Dahlquist's problematic TallyEngine results.
> > >
> > > gmb2b60 Results
> > >
> > >
> > >
> > > Now the proverbial question - What next to solve the Ordered Locus
> import/count issue?
> > >
> > > **********************************************
> > > Here is my thought process:
> > >
> > > Step 1: How does the import process work at the high level? (obviously
> correct me if I'm wrong)
> > >
> > > I believe that basically as each XML tag is read, it is placed in the
> proper Postgres table(s) based on some criteria. There is also likely some
> sort of check that each individual tag is in valid XML format unless we
> don't care at this stage (care at export) or maybe the parser just skips
> over and goes on to the next.
> > >
> > > Step 2: What could be the problem?
> > >
> > > Either -
> > > a. XML tags are being parsed incorrectly (ignored/skipped)?
> > > b. Decision criteria of which table they should be added to?
> > >
> > > **********************************************
> > >
> > > I read on the sourceforge wiki:
> > >
> > > XMLPipeDB has a modular architecture with three components that may be
> used separately or together. XSD-to-DB reads an XSD (XML Schema Definition)
> and automatically generates an SQL schema, Java classes, and Hibernate
> mappings. XMLPipeDB Utilities provides functionality for configuring the
> database, importing data, and performing queries. GenMAPP Builder is based
> on the XMLPipeDB Utilities and exports GenMAPP-compatible Gene Databases
> based on data from UniProt and Gene Ontology (GO).
> > >
> > > So I should probably start with the XMLPipeDB Utilities which are
> where? I don't see any in the basic distribution or are they not standalone
> and called from the command line?
> > >
> > > Thanks!
> > >
> > > Richard
> >
> >
>
>
>
> _______________________________________________
> xmlpipedb-developer mailing list
> xml...@li...
> https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>