Re: [XMLPipeDB-developer] 499 - PROBLEM - M tuberculosis xml tag importation

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hi,

That sounds good to me.

Best,
Dr. D

At 11:16 AM 2/23/2011, you wrote:
>Hmm... possible that the method to split up records has been 
>globally implemented for export from Postgres?
>
>Since the records are automatically splitting when exported from 
>postgres to gdb file, all we need to do is exclude Rv3346/55c 'ORF' 
>from "select count (*) from genenametype where (type = 'ordered 
>locus' or type = 'ORF') and value ~ 
>'[Rr][Vv][0-9][0-9][0-9][0-9]*';" and we should be set.
>
>That seem correct Dondi?
>
>Richard
>
>On Wed, Feb 23, 2011 at 11:04 AM, Kam Dahlquist 
><<mailto:kda...@lm...>kda...@lm...> wrote:
>Hi,
>
>I checked these out.  They all should be separated and kept as 
>separate OrderedLocusNames records.  In fact, the gdb already has 
>them separated into individual records, so it is only the tally for 
>Postgres that is off, they were correctly exported into the gdb.
>
>Best,
>Dr. D
>
>
>At 07:31 PM 2/22/2011, Richard Brous wrote:
>>OK I found the culprits:
>>
>>There are 3 ordered locus gene ID's that have slashes. As was 
>>previously discussed it seems XML Match is counting each one twice 
>>which would inflate the count by 3.
>>
>>Rv2561/Rv2562       'ordered locus'
>>Rv2880c/Rv2879c   'ordered locus'
>>Rv3021c/Rv3022c   'ordered locus'
>>
>>And as was previously noted, Rv3346/55c   'ORF' should be excluded.
>>
>>This will bring the both XML and sql db queries counts in sync at 4057.
>>
>>By my math:
>>------------------------------------------------------------------------------------------------------------
>>XML tags read by Match:
>>4066 unique matches - 6 genes excluded by Dr. D - 3 duplicates 
>>caused by slashes = total 4057 genes
>>------------------------------------------------------------------------------------------------------------
>>sql query of the db tables:
>>3988 ordered locus + 69 ORF = total 4057 genes
>>------------------------------------------------------------------------------------------------------------ 
>>
>>
>>TO MOVE FORWARD:
>>
>>1. We need to decide how to address the slash ID's (should we keep 
>>and split into separate tuples or should they be excluded)
>>
>>2. Adjust the queries to reflect what is needed and test with raw sql.
>>
>>3. Update the queries on MycobacteriumTuberculosisUniProtSpeciesProfile.java
>>
>>4. Export a new Mtb gene database for testing.
>>
>>
>>Richard
>>
>>
>>
>>On Tue, Feb 22, 2011 at 5:12 PM, Kam Dahlquist 
>><<mailto:kda...@lm...>kda...@lm...> wrote:
>>Hi,
>>I don't know why the numbers are off.  I think the only way to find 
>>out is to get the results of the Postgres query and line it up next 
>>to the match results and see what is different.  If you can send me 
>>the lists of IDs from the Postgres query and the match results, I 
>>can check them.
>>Best,
>>Dr. D
>>
>>At 04:57 PM 2/22/2011, you wrote:
>>>Here is where I am on the numbers:
>>>------------------------------------------------------------------------------------------------------------ 
>>>
>>>XML tags read by Match:
>>>4066 unique matches - 6 genes excluded by Dr. D = total 4060 genes
>>>------------------------------------------------------------------------------------------------------------ 
>>>
>>>sql query of the db tables:
>>>3988 ordered locus + 69 ORF = total 4057 genes
>>>------------------------------------------------------------------------------------------------------------ 
>>>
>>>So by my count we are off by 3 genes
>>>
>>>Maybe XML Match is counting slashed genes as 2 separate genes?
>>>
>>>IE. so if it encountered 3 slashed genes it would in effect be 
>>>counting those 3 as 6?
>>>
>>>Richard
>>>On Tue, Feb 22, 2011 at 3:06 PM, Kam Dahlquist 
>>><<mailto:kda...@lm...>kda...@lm...> wrote:
>>>Hi,
>>>I compared this list of gene IDs with the list I had on the 
>>>Testing Report on the wiki and found the following.
>>>Your list has 70 IDs.  69 of them are identical to what is in my 
>>>list of IDs in the Testing report.
>>>Indeed, the only difference is the one with the 
>>>slash.  "Rv3346/55c"  I looked this ID up at UniProt.org as 
>>>Rv3346.  It appears in the record for UniProt ID O50384.  For that 
>>>UniProt record, the ID referred to in the OrderedLocus tag is 
>>>Rv3355c which does appear in the gdb already.  Looking up Rv3346 
>>>on Tuberculist and the Stanford TB database, Rv3346 is not a real 
>>>gene ID.  So this entire record with the slash in it needs to be 
>>>excluded and all the rest of them need to be captured.
>>>I think then, that the numbers should add up correctly, is this true?
>>>Best,
>>>Dr. D
>>>At 01:09 PM 2/22/2011, Richard Brous wrote:
>>>>attachement now included...
>>>>On Tue, Feb 22, 2011 at 1:09 PM, Richard Brous 
>>>><<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>>Here is an export of the genes found using: select * from 
>>>>genenametype where type = 'ORF' and value ~ 
>>>>'[Rr][Vv][0-9][0-9][0-9][0-9]*'; and also attached as a csv file.
>>>>
>>>>647412|"org.uniprot.uniprot.GeneNameType"|0|"Rv1990A"|"ORF"|""|647409|2
>>>>5297|"org.uniprot.uniprot.GeneNameType"|0|"Rv2922A"|"ORF"|""|5292|4
>>>>647553|"org.uniprot.uniprot.GeneNameType"|0|"Rv1638A"|"ORF"|""|647550|2
>>>>647679|"org.uniprot.uniprot.GeneNameType"|0|"Rv1507A"|"ORF"|""|647676|2
>>>>647804|"org.uniprot.uniprot.GeneNameType"|0|"Rv1498A"|"ORF"|""|647801|2
>>>>647944|"org.uniprot.uniprot.GeneNameType"|0|"Rv1489A"|"ORF"|""|647941|2
>>>>211818|"org.uniprot.uniprot.GeneNameType"|0|"Rv0979A"|"ORF"|""|211814|3
>>>>648210|"org.uniprot.uniprot.GeneNameType"|0|"Rv1473A"|"ORF"|""|648207|2
>>>>648340|"org.uniprot.uniprot.GeneNameType"|0|"Rv1322A"|"ORF"|""|648337|2
>>>>648488|"org.uniprot.uniprot.GeneNameType"|0|"Rv1135A"|"ORF"|""|648485|2
>>>>648637|"org.uniprot.uniprot.GeneNameType"|0|"Rv1116A"|"ORF"|""|648634|2
>>>>648762|"org.uniprot.uniprot.GeneNameType"|0|"Rv1087A"|"ORF"|""|648759|2
>>>>649177|"org.uniprot.uniprot.GeneNameType"|0|"Rv0787A"|"ORF"|""|649174|2
>>>>649334|"org.uniprot.uniprot.GeneNameType"|0|"Rv0749A"|"ORF"|""|649331|2
>>>>649472|"org.uniprot.uniprot.GeneNameType"|0|"Rv0590A"|"ORF"|""|649469|2
>>>>649899|"org.uniprot.uniprot.GeneNameType"|0|"Rv0470A"|"ORF"|""|649896|2
>>>>650295|"org.uniprot.uniprot.GeneNameType"|0|"Rv0078A"|"ORF"|""|650292|2
>>>>174122|"org.uniprot.uniprot.GeneNameType"|0|"Rv1159A"|"ORF"|""|174119|2
>>>>174307|"org.uniprot.uniprot.GeneNameType"|0|"Rv3312A"|"ORF"|""|174303|3
>>>>312550|"org.uniprot.uniprot.GeneNameType"|0|"Rv0236A"|"ORF"|""|312547|2
>>>>331661|"org.uniprot.uniprot.GeneNameType"|0|"Rv3198A"|"ORF"|""|331658|2
>>>>445836|"org.uniprot.uniprot.GeneNameType"|0|"Rv3346/55c"|"ORF"|""|445833|2
>>>>621649|"org.uniprot.uniprot.GeneNameType"|0|"Rv3395A"|"ORF"|""|621647|1
>>>>622466|"org.uniprot.uniprot.GeneNameType"|0|"Rv3224B"|"ORF"|""|622464|1
>>>>622558|"org.uniprot.uniprot.GeneNameType"|0|"Rv3224A"|"ORF"|""|622556|1
>>>>622739|"org.uniprot.uniprot.GeneNameType"|0|"Rv3208A"|"ORF"|""|622736|2
>>>>622824|"org.uniprot.uniprot.GeneNameType"|0|"Rv3197A"|"ORF"|""|622821|2
>>>>623397|"org.uniprot.uniprot.GeneNameType"|0|"Rv3022A"|"ORF"|""|623394|2
>>>>623597|"org.uniprot.uniprot.GeneNameType"|0|"Rv3018A"|"ORF"|""|623594|2
>>>>623682|"org.uniprot.uniprot.GeneNameType"|0|"Rv2998A"|"ORF"|""|623680|1
>>>>623787|"org.uniprot.uniprot.GeneNameType"|0|"Rv2943A"|"ORF"|""|623785|1
>>>>624282|"org.uniprot.uniprot.GeneNameType"|0|"Rv0492A"|"ORF"|""|624280|1
>>>>624460|"org.uniprot.uniprot.GeneNameType"|0|"Rv0456A"|"ORF"|""|624458|1
>>>>625679|"org.uniprot.uniprot.GeneNameType"|0|"Rv3724B"|"ORF"|""|625676|2
>>>>625774|"org.uniprot.uniprot.GeneNameType"|0|"Rv3724A"|"ORF"|""|625771|2
>>>>626169|"org.uniprot.uniprot.GeneNameType"|0|"Rv2737A"|"ORF"|""|626167|1
>>>>626355|"org.uniprot.uniprot.GeneNameType"|0|"Rv2614A"|"ORF"|""|626353|1
>>>>626652|"org.uniprot.uniprot.GeneNameType"|0|"Rv2438A"|"ORF"|""|626650|1
>>>>626910|"org.uniprot.uniprot.GeneNameType"|0|"Rv2401A"|"ORF"|""|626908|1
>>>>627340|"org.uniprot.uniprot.GeneNameType"|0|"Rv2331A"|"ORF"|""|627338|1
>>>>627418|"org.uniprot.uniprot.GeneNameType"|0|"Rv2307B"|"ORF"|""|627416|1
>>>>627496|"org.uniprot.uniprot.GeneNameType"|0|"Rv2306B"|"ORF"|""|627494|1
>>>>627579|"org.uniprot.uniprot.GeneNameType"|0|"Rv2306A"|"ORF"|""|627577|1
>>>>627657|"org.uniprot.uniprot.GeneNameType"|0|"Rv2250A"|"ORF"|""|627655|1
>>>>627736|"org.uniprot.uniprot.GeneNameType"|0|"Rv2219A"|"ORF"|""|627734|1
>>>>627827|"org.uniprot.uniprot.GeneNameType"|0|"Rv2160A"|"ORF"|""|627825|1
>>>>628290|"org.uniprot.uniprot.GeneNameType"|0|"Rv1888A"|"ORF"|""|628288|1
>>>>629063|"org.uniprot.uniprot.GeneNameType"|0|"Rv1765A"|"ORF"|""|629061|1
>>>>629159|"org.uniprot.uniprot.GeneNameType"|0|"Rv1706A"|"ORF"|""|629157|1
>>>>629325|"org.uniprot.uniprot.GeneNameType"|0|"Rv1508A"|"ORF"|""|629323|1
>>>>630084|"org.uniprot.uniprot.GeneNameType"|0|"Rv1290A"|"ORF"|""|630082|1
>>>>630597|"org.uniprot.uniprot.GeneNameType"|0|"Rv1089A"|"ORF"|""|630594|2
>>>>631025|"org.uniprot.uniprot.GeneNameType"|0|"Rv1028A"|"ORF"|""|631022|2
>>>>632207|"org.uniprot.uniprot.GeneNameType"|0|"Rv0755A"|"ORF"|""|632205|1
>>>>632630|"org.uniprot.uniprot.GeneNameType"|0|"Rv0724A"|"ORF"|""|632628|1
>>>>633088|"org.uniprot.uniprot.GeneNameType"|0|"Rv0609A"|"ORF"|""|633086|1
>>>>633363|"org.uniprot.uniprot.GeneNameType"|0|"Rv0192A"|"ORF"|""|633361|1
>>>>645287|"org.uniprot.uniprot.GeneNameType"|0|"Rv3770B"|"ORF"|""|645284|2
>>>>645415|"org.uniprot.uniprot.GeneNameType"|0|"Rv3770A"|"ORF"|""|645412|2
>>>>645542|"org.uniprot.uniprot.GeneNameType"|0|"Rv3705A"|"ORF"|""|645539|2
>>>>645680|"org.uniprot.uniprot.GeneNameType"|0|"Rv3678A"|"ORF"|""|645677|2
>>>>645817|"org.uniprot.uniprot.GeneNameType"|0|"Rv3566A"|"ORF"|""|645814|2
>>>>646080|"org.uniprot.uniprot.GeneNameType"|0|"Rv3221A"|"ORF"|""|646077|2
>>>>646212|"org.uniprot.uniprot.GeneNameType"|0|"Rv3196A"|"ORF"|""|646209|2
>>>>646486|"org.uniprot.uniprot.GeneNameType"|0|"Rv2601A"|"ORF"|""|646483|2
>>>>646630|"org.uniprot.uniprot.GeneNameType"|0|"Rv2530A"|"ORF"|""|646627|2
>>>>646767|"org.uniprot.uniprot.GeneNameType"|0|"Rv2309A"|"ORF"|""|646764|2
>>>>646892|"org.uniprot.uniprot.GeneNameType"|0|"Rv2307D"|"ORF"|""|646889|2
>>>>647019|"org.uniprot.uniprot.GeneNameType"|0|"Rv2307A"|"ORF"|""|647016|2
>>>>647144|"org.uniprot.uniprot.GeneNameType"|0|"Rv2077A"|"ORF"|""|647141|2
>>>>*****The item of note I see is the gene with the slash separating 
>>>>gene id's which refer to the same gene.
>>>>
>>>>Richard
>>>>
>>>>On Mon, Feb 21, 2011 at 11:11 PM, Richard Brous 
>>>><<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>>Understood.
>>>>
>>>>I'll check in with Dr. D in the afternoon tomorrow and discuss.
>>>>
>>>>Richard
>>>>
>>>>On Mon, Feb 21, 2011 at 11:06 PM, John David N. Dionisio 
>>>><<mailto:do...@lm...>do...@lm...> wrote:
>>>>Hi Rich,
>>>>Addressing the release business first, let's put it this way: if 
>>>>the remaining loose ends can be addressed by tomorrow, we can 
>>>>probably wait until then.  If unexpected snags are encountered, 
>>>>then it would be worthwhile to release whatever you have.
>>>>With that in mind, considering that you pretty much know the 
>>>>patterns of the IDs that are needed, I think it will only take a 
>>>>little digital forensic work now to figure out exactly which IDs 
>>>>are still needed.  Once you know what those are, you should:
>>>>1. Find where they are in the XML file.
>>>>2. Knowing the XML location, find the corresponding table in the 
>>>>relational database (table names are generally derived from 
>>>>tag/element names).
>>>>3. Knowing the table in the database, write or extend the 
>>>>SpeciesProfile query to retrieve that data.
>>>>For the ID that must *not* be included, again it's a matter of 
>>>>tracking down what this ID is.  Knowing this straggler, you can 
>>>>then consult with Dr. Dahlquist if this ID is truly a unique 
>>>>one-off, or is representative of a pattern that we'll want to 
>>>>exclude.  Either way, this ID can be omitted by using "not" or 
>>>>"<>" or possibly "not like" or "not ~" (check PostgreSQL where 
>>>>clause syntax to see where the negation can be applied).
>>>>John David N. Dionisio, PhD
>>>>Associate Professor, Computer Science
>>>>Loyola Marymount University
>>>>On Feb 22, 2011, at 1:37 AM, Richard Brous wrote:
>>>> > actually i had a typo (emailing from desktop system but 
>>>> testing on my laptop... typed correctly here but wrong in 
>>>> pgadmin) but the results make much more sense now.
>>>> >
>>>> >
>>>> > select count (*) from genenametype where (type = 'ordered 
>>>> locus' or type = 'ORF') and value like 'Rv%';
>>>> > returns 4058
>>>> >
>>>> > select count (*) from genenametype where (type = 'ordered 
>>>> locus' or type = 'ORF') and value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>>>> > returns 4058
>>>> >
>>>> > 
>>>> -------------------------------------------------------------------------------------------------------------------------- 
>>>>
>>>> >
>>>> >
>>>> > Continuing forward -
>>>> >
>>>> > The testing report says that 4066 unique matches exist in XML 
>>>> but 6 of them were eliminated by Dr. D leaving the desired number at 4060.
>>>> >
>>>> > So now we are only 2 genes short with the query returning 
>>>> 4058... which is also (conveniently) the sum of the two separate 
>>>> queries of 'ordered locus' and 'ORF' respectively.
>>>> >
>>>> >  But recall that Dr. D said that only 69 genes of the missing 
>>>> 75 were tagged 'ORF' but we seem to have 1 extra gene tagged 
>>>> 'ORF' than we expected. Adding that into missing genes puts us 3 short...
>>>> >
>>>> > Should I make the changes to the code and export a gdb so that 
>>>> analysis can be done or wait until we work this through further?
>>>> >
>>>> > Richard
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Feb 21, 2011 at 10:04 PM, John David N. Dionisio 
>>>> <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > Hi Rich,
>>>> >
>>>> > The second form should have worked actually.  What exactly was 
>>>> the error?
>>>> >
>>>> > John David N. Dionisio, PhD
>>>> > Associate Professor, Computer Science
>>>> > Loyola Marymount University
>>>> >
>>>> >
>>>> >
>>>> > On Feb 22, 2011, at 1:01 AM, Richard Brous wrote:
>>>> >
>>>> > > hmm not taking parenthesis where I thought they should go... 
>>>> syntax error
>>>> > >
>>>> > > select count (*) from genenametype where type = ('ordered 
>>>> locus' or  'ORF') and value like 'Rv%';
>>>> > > also tried
>>>> > > select count (*) from genenametype where (type = 'ordered 
>>>> locus' or type = 'ORF') and value like 'Rv%';
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Feb 21, 2011 at 9:40 PM, Richard Brous 
>>>> <<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>> > > ah yes... i see it...
>>>> > >
>>>> > >
>>>> > > On Mon, Feb 21, 2011 at 9:33 PM, John David N. Dionisio 
>>>> <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > > Watch your parentheses: "and" has greater precedence than "or" :)
>>>> > >
>>>> > >
>>>> > > John David N. Dionisio, PhD
>>>> > > Associate Professor, Computer Science
>>>> > > Loyola Marymount University
>>>> > >
>>>> > >
>>>> > > On Feb 21, 2011, at 7:59 PM, Richard Brous 
>>>> <<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>> > >
>>>> > >> OK, so here are my query results from raw SQL:
>>>> > >>
>>>> > >> 1. using: like 'Rv%'
>>>> > >>
>>>> > >> select count (*) from genenametype where type = 'ordered 
>>>> locus' and value like 'Rv%';
>>>> > >> returns 3988
>>>> > >>
>>>> > >> select count (*) from genenametype where type = 'ORF' and 
>>>> value like 'Rv%';
>>>> > >> returns 70
>>>> > >>
>>>> > >> select count (*) from genenametype where type = 'ordered 
>>>> locus' or type = 'ORF' and value like 'Rv%';
>>>> > >> returns 7011
>>>> > >>
>>>> > >> 2. regular expression : value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*'
>>>> > >>
>>>> > >> select count (*) from genenametype where type = 'ordered 
>>>> locus' and value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>>>> > >> returns 3988
>>>> > >>
>>>> > >> select count (*) from genenametype where type = 'ordered 
>>>> locus' or type = 'ORF' and value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>>>> > >> returns 7011
>>>> > >>
>>>> > >> select count (*) from genenametype where type = 'ORF' and 
>>>> value ~ '[Rr][Vv][0-9][0-9][0-9][0-9]*';
>>>> > >> returns 70
>>>> > >>
>>>> > >> Conclusions:
>>>> > >>
>>>> > >> 1. It seems that querying for type = 'ORF' alone surfaces 
>>>> the 69 genes were were looking for plus one more (maybe the 
>>>> count for missing genes is off by 1?).
>>>> > >>
>>>> > >> 2. Combining the two types in a single query did not 
>>>> produce the results that I expected (7011? - how did that 
>>>> happen????) so this is likely not our solution... unless of 
>>>> course the query syntax isn't actually doing what I think it is...
>>>> > >>
>>>> > >> 3. I would think the best course of action is to serialy 
>>>> run two separate queries to capture all the required genes, then 
>>>> removing the one unneeded gene if its truly not wanted.
>>>> > >>
>>>> > >> What do you think?
>>>> > >>
>>>> > >> Richard
>>>> > >>
>>>> > >>
>>>> > >> On Mon, Feb 21, 2011 at 5:17 PM, John David N. Dionisio 
>>>> <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > >> I don't recall the exact details of the missing 69, but if 
>>>> your query successfully returns them in raw SQL, then this is 
>>>> worth a try.  You can integrate into the same query as long as 
>>>> the same columns are returned, which is the case here AFAIK, so 
>>>> go ahead and extend the existing query.
>>>> > >>
>>>> > >>
>>>> > >> John David N. Dionisio, PhD
>>>> > >> Associate Professor, Computer Science
>>>> > >> Loyola Marymount University
>>>> > >>
>>>> > >> On Feb 21, 2011, at 6:56 PM, Richard Brous 
>>>> <<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>> > >>
>>>> > >>> So here is the appropriate code snippet from 
>>>> MycobacteriumTuberculosisUniProtSpeciesProfile.java:
>>>> > >>> public
>>>> > >>>
>>>> > >>> TableManager 
>>>> getSystemTableManagerCustomizations(TableManager tableManager, 
>>>> TableManager primarySystemTableManager, Date version) throws 
>>>> SQLException, InvalidParameterException {
>>>> > >>>
>>>> > >>> // Build the base query; we only use "ordered locus" and 
>>>> we only want
>>>> > >>>
>>>> > >>> // IDs that begin with "Rv."
>>>> > >>> PreparedStatement ps = 
>>>> ConnectionManager.getRelationalDBConnection().prepareStatement(
>>>> > >>>
>>>> > >>> "SELECT value, type " +
>>>> > >>>
>>>> > >>> "FROM genenametype INNER JOIN entrytype_genetype " +
>>>> > >>>
>>>> > >>> "ON (entrytype_genetype_name_hjid = entrytype_genetype.hjid) " +
>>>> > >>>
>>>> > >>> "WHERE type = 'ordered locus' and value like 'Rv%' and 
>>>> entrytype_gene_hjid = ?");
>>>> > >>> ResultSet result;
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> for (Row row : primarySystemTableManager.getRows()) {
>>>> > >>> ps.setInt(1, Integer.parseInt(row.getValue(
>>>> > >>>
>>>> > >>> "UID")));
>>>> > >>> result = ps.executeQuery();
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> // We actually want to keep the case where multiple ordered locus
>>>> > >>>
>>>> > >>> // names appear.
>>>> > >>>
>>>> > >>> while (result.next()) {
>>>> > >>>
>>>> > >>> // We want this name to appear in the OrderedLocusNames
>>>> > >>>
>>>> > >>> // system table.
>>>> > >>>
>>>> > >>> for (String id : result.getString("value").split("/")) {
>>>> > >>> tableManager.submit(
>>>> > >>>
>>>> > >>> "OrderedLocusNames", QueryType.insert, new String[][] { { 
>>>> "ID", id }, { "Species", "|" + getSpeciesName() + "|" }, { 
>>>> "\"Date\"", 
>>>> GenMAPPBuilderUtilities.getSystemsDateString(version) }, { 
>>>> "UID", row.getValue("UID") } });
>>>> > >>> }
>>>> > >>>
>>>> > >>> }
>>>> > >>>
>>>> > >>> }
>>>> > >>>
>>>> > >>> 
>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
>>>>
>>>> > >>> So now we want to build the base query which uses "ordered 
>>>> locus" and "orf" and we only want IDs that begin with "Rv".
>>>> > >>>
>>>> > >>> I know there are more comprehensive ways to search for 
>>>> gene ID's by matching gene ID prefix but "like Rv%" seemed to 
>>>> work thus far, we just need to tell it to search for XML tag 
>>>> type orf in addition to ordered locus.
>>>> > >>>
>>>> > >>> "WHERE type = 'ordered locus' and type = 'orf' and value 
>>>> like 'Rv%' and entrytype_gene_hjid = ? "
>>>> > >>>
>>>> > >>> Here is a stab at it.... This part of our class was right 
>>>> as the server went down and my submission for week 6 assignment 
>>>> I can't seem to find.
>>>> > >>>
>>>> > >>> Is it possible to have two different types in the same 
>>>> query or should we rewrite a separate query for the orf tag?
>>>> > >>>
>>>> > >>> Richard
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> On Sun, Feb 20, 2011 at 10:21 PM, Richard Brous 
>>>> <<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>> > >>>
>>>> > >>> thanks and will do as directed.
>>>> > >>>
>>>> > >>> My previous, last paragraph comment - A way for 
>>>> programming code in email holding its format in a mail message 
>>>> similarly to how you can post code on forum pages?
>>>> > >>>
>>>> > >>> <code>
>>>> > >>> blah
>>>> > >>> blah
>>>> > >>> blah
>>>> > >>> </code>
>>>> > >>>
>>>> > >>> thanks!
>>>> > >>>
>>>> > >>> Richard
>>>> > >>>
>>>> > >>> On Sun, Feb 20, 2011 at 10:05 PM, John David N. Dionisio 
>>>> <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > >>>
>>>> > >>> Greetings,
>>>> > >>>
>>>> > >>> Actually, gmbuilder.properties is for the TallyEngine 
>>>> only.  When dealing with .gdb exports, look *only* at the 
>>>> SpeciesProfile class.  So, to find those 69 IDs, it is the 
>>>> SpeciesProfile code, and *only* the SpeciesProfile code, that 
>>>> needs to be changed.
>>>> > >>>
>>>> > >>> Your take on how gmbuilder.properties is used, however, is 
>>>> understandable.  It makes sense to assume that the TallyEngine 
>>>> code *and* the ID export code are based on the same 
>>>> characterization of the needed IDs.  This replication is 
>>>> originally a historical artifact: SpeciesProfile was done first, 
>>>> and then TallyEngine was done later by another student.
>>>> > >>>
>>>> > >>> However, there are other factors beyond history that sort 
>>>> of necessitate this duplication of desired IDs: (skip the two 
>>>> bullets below if you'd rather cut to the chase of the work to be 
>>>> done, and discuss design issues later)
>>>> > >>>
>>>> > >>> - The actual XML import code is a black box: this is the 
>>>> "canned" JAXB library actually in action, and not our code at 
>>>> all.  Plus, the XML import code really does not filter (nor 
>>>> should it), since the goal of the XML->relational database step 
>>>> is to fully capture the XML data in the relational 
>>>> database.  So, XML count is necessarily separated from XML import.
>>>> > >>>
>>>> > >>> - The notion of a declarative mechanism for extracting IDs 
>>>> from the relational database (which is what 
>>>> gmbuilder.properties/TallyEngine uses) is interesting, but at 
>>>> the same time there is value in the arbitrary computation that 
>>>> can be done with Java (case in point: export two versions of an 
>>>> ID, with and without periods).  This is not to say that it is 
>>>> impossible to do this declaratively, but let's just say that the 
>>>> procedural approach exists here and now, and a declarative 
>>>> approach will need more thought.
>>>> > >>>
>>>> > >>> These, and other factors, are good thoughts to hold onto 
>>>> and would be worthy of a good meeting discussion sometime, but 
>>>> bottom line for now: modifying the export behavior is a matter 
>>>> of editing the *SpeciesProfile* Java code, and not the 
>>>> gmbuilder.properties file.  Turn your attention to that code.
>>>> > >>>
>>>> > >>> Now, as to annotating your code...I'd just put in code 
>>>> comments  :)  Or did you mean something else by tagging code in e-mail?
>>>> > >>>
>>>> > >>> John David N. Dionisio, PhD
>>>> > >>> Associate Professor, Computer Science
>>>> > >>> Loyola Marymount University
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> On Feb 21, 2011, at 12:38 AM, Richard Brous wrote:
>>>> > >>>
>>>> > >>> > also, how do I tag code in email so it holds its 
>>>> formatting? I tried a few suggestions I found on the web but 
>>>> they aren't holding formatting or i'm just doing it wrong ;-D
>>>> > >>> >
>>>> > >>> > Richard
>>>> > >>> >
>>>> > >>> > On Sun, Feb 20, 2011 at 9:35 PM, Richard Brous 
>>>> <<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>> > >>> > OK, have some updates and some suggestions:
>>>> > >>> >
>>>> > >>> > On Friday Dr. Dahlquist and I sat down and reviewed the 
>>>> gene testing report. We verified that XML match does indeed find 
>>>> 4066 unique matches - 75 of which are not in the gdb and need to be.
>>>> > >>> >
>>>> > >>> > Dr. Dahlquist informed me that she was the one who 
>>>> completed the gene db testing report, not a previous student of 
>>>> BIO367 and had already verified which genes were missing and 
>>>> where they were to be found. I had (mistakenly) assumed that 
>>>> since a student had performed the gene database testing I had to 
>>>> redo all of the verification.
>>>> > >>> >
>>>> > >>> > So that said, of the 75 genes missing - 69 need to be 
>>>> included and 6 excluded.
>>>> > >>> > Per the gene db testing report: "69 of them have an "a", 
>>>> "b", or "d" suffix. They are all found in the ORF tag and need 
>>>> to be included in the gdb."
>>>> > >>> >
>>>> > >>> > To solve this we need to add additional search criteria 
>>>> into the M. tuberculosis section in gmbuilder.properties below:
>>>> > >>> > # Mycobacterium tuberculosis
>>>> > >>> >
>>>> > >>> > mycobacteriumtuberculosis_level_amount=
>>>> > >>> >
>>>> > >>> > 1
>>>> > >>> >
>>>> > >>> > mycobacteriumtuberculosis_element_level0=
>>>> > >>> >
>>>> > >>> > uniprot/entry/gene/name&type&ordered locus
>>>> > >>> >
>>>> > >>> > mycobacteriumtuberculosis_query_level0=
>>>> > >>> >
>>>> > >>> > select count(*) from genenametype where type = 'ordered 
>>>> locus' and value like 'Rv%';
>>>> > >>> >
>>>> > >>> > mycobacteriumtuberculosis_table_name_level0=
>>>> > >>> >
>>>> > >>> > Ordered Locus
>>>> > >>> > SOLUTIONS:
>>>> > >>> >
>>>> > >>> > 1. So am i correct in my understanding that the second 
>>>> line is the query used by TallyEngine to read the XML file? If 
>>>> so then this is the issue we need to table for the moment until 
>>>> we get the gbd verified and re-released. We will revisit this to 
>>>> discover why it is not only reporting incorrectly but also why 
>>>> its added a second row of Ordered Locus on the TallyEngine results page.
>>>> > >>> >
>>>> > >>> > 2. The third line is the SQL query used by postgres 
>>>> during the export from XML to gdb. To find and get the ORF 
>>>> tagged genes could we not add the following lines and change the 
>>>> count in the first line:
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > # Mycobacterium tuberculosis
>>>> > >>> >
>>>> > >>> > mycobacteriumtuberculosis_level_amount=2
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > 
>>>> mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered 
>>>> locus
>>>> > >>> > 
>>>> mycobacteriumtuberculosis_element_level1=uniprot/entry/gene/name&type&orf
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > mycobacteriumtuberculosis_query_level0=
>>>> > >>> >
>>>> > >>> > select count(*) from genenametype where type = 'ordered locus';
>>>> > >>> > mycobacteriumtuberculosis_query_level1=select count(*) 
>>>> from genenametype where type = 'orf';
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > mycobacteriumtuberculosis_table_name_level0=
>>>> > >>> >
>>>> > >>> > Ordered Locus
>>>> > >>> > mycobacteriumtuberculosis_table_name_level1=Ordered Locus
>>>> > >>> >
>>>> > >>> > 
>>>> ---------------------------------------------------------------------------------------------------------------------------- 
>>>>
>>>> > >>> >
>>>> > >>> > Of course these queries would have be manually verified 
>>>> prior to making these changes but this seems like we are moving 
>>>> in the right direction.
>>>> > >>> >
>>>> > >>> > Richard
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > On Thu, Feb 17, 2011 at 7:47 PM, Richard Brous 
>>>> <<mailto:rbr...@gm...>rbr...@gm...> wrote:
>>>> > >>> > Just got done reading previous email and understand the 
>>>> change in priority.
>>>> > >>> >
>>>> > >>> > Will work on the missing ID's for now and shelve the the 
>>>> TalleyEngine issue for the moment.
>>>> > >>> >
>>>> > >>> > Also great about a more formalized weekly meeting. I was 
>>>> going to suggest it myself as it has been slow going so far as 
>>>> maybe i'm a bit too independent in this independent study class =D
>>>> > >>> >
>>>> > >>> > Will dig further into the missing ID's later tonight and 
>>>> during day tomorrow and report back.
>>>> > >>> >
>>>> > >>> > Richard
>>>> > >>> >
>>>> > >>> > On Thu, Feb 17, 2011 at 4:34 PM, John David N. Dionisio 
>>>> <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > >>> > Hi Rich,
>>>> > >>> >
>>>> > >>> > No problem.  The pertinent line you're referring to, for 
>>>> XML, is this, right above the line you copied:
>>>> > >>> >
>>>> > >>> > 
>>>> mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered 
>>>> locus
>>>> > >>> >
>>>> > >>> > The slash-separated section is the "path" of XML tags 
>>>> leading to the element of interest; then, after the ampersand, 
>>>> is a name/value pair for the desired attribute to count.  Note 
>>>> that there is no hint of a *content*-based filter (nor is there 
>>>> the capability for one, as far as I can tell in the code).  By 
>>>> "content," I mean that we can't specify filters based on what's 
>>>> *between* the tags.  We can only go as far as filter by 
>>>> attribute value, e.g., type="ordered locus".
>>>> > >>> >
>>>> > >>> > But anyway, as mentioned in the earlier e-mail, let's 
>>>> have the missing IDs in the .gdb take precedence for 
>>>> now.  Please take a look at the tuberculosis, A. thaliana, and 
>>>> P. falciparum profiles to get an idea for how the ID output can 
>>>> be customized, then let me know if you have any questions or 
>>>> need to confirm anything.
>>>> > >>> >
>>>> > >>> > John David N. Dionisio, PhD
>>>> > >>> > Associate Professor, Computer Science
>>>> > >>> > Loyola Marymount University
>>>> > >>> >
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > On Feb 17, 2011, at 3:04 PM, Richard Brous wrote:
>>>> > >>> >
>>>> > >>> > > Sorry been slammed with a programming assignment that 
>>>> kept needing continued iteration and it has been all consuming 
>>>> until last night. But I did get a chance to work with your 
>>>> comments and review the code again with a different mind set.
>>>> > >>> > >
>>>> > >>> > > Yes, I examined the gmbuilder.properties file ( the 
>>>> query is also in the 
>>>> MycobacteriumTuberculosisUniProtSpeciesProfile which I mentioned 
>>>> in a previous email ) but I don't think I see what you mean 
>>>> regarding the XML count.
>>>> > >>> > >
>>>> > >>> > > I understood that: 
>>>> mycobacteriumtuberculosis_query_level0=select count(*) from 
>>>> genenametype where type = 'ordered locus' and value like 
>>>> 'Rv%';  was the db query but don't see which is the XML count... 
>>>> or do they share the same query and you are saying that XML 
>>>> count doesn't recognize and therefore cannot use the 'Rv%' parameter?
>>>> > >>> > >
>>>> > >>> > > Richard
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > > On Sat, Feb 12, 2011 at 11:46 PM, John David N. 
>>>> Dionisio <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > >>> > > Hi Rich,
>>>> > >>> > >
>>>> > >>> > > Sorry for the delay.  Had some distractions coming 
>>>> into the weekend.
>>>> > >>> > >
>>>> > >>> > > You've looked at the code; have you looked at 
>>>> gmbuilder.properties?  (I may have mentioned it a few e-mails 
>>>> ago, just as you were starting to dig into this)
>>>> > >>> > >
>>>> > >>> > > On the copy I have, the M. tuberculosis block looks 
>>>> like this (indentation is mine to set it apart):
>>>> > >>> > >
>>>> > >>> > >        # Mycobacterium tuberculosis
>>>> > >>> > >        mycobacteriumtuberculosis_level_amount=1
>>>> > >>> > >
>>>> > >>> > > 
>>>> mycobacteriumtuberculosis_element_level0=uniprot/entry/gene/name&type&ordered 
>>>> locus
>>>> > >>> > >
>>>> > >>> > >        mycobacteriumtuberculosis_query_level0=select 
>>>> count(*) from genenametype where type = 'ordered locus' and value like 'Rv%';
>>>> > >>> > >
>>>> > >>> > > 
>>>> mycobacteriumtuberculosis_table_name_level0=Ordered Locus
>>>> > >>> > >
>>>> > >>> > > There, I think, is the rub.  Notice that the XML count 
>>>> does not filter on RV%.  The SQL query does.
>>>> > >>> > >
>>>> > >>> > > Unfortunately, I don't think the TallyEngine can 
>>>> include selective filtering in the XML counts.  If the need to 
>>>> do selective filtering on XML is necessary, then I think we're 
>>>> looking at a new functionality for you to implement (or, if this 
>>>> throws things off too much, this may have to be noted somewhere, 
>>>> that the XML vs. database counts may be off because the database 
>>>> count is doing some text-based filtering but the XML count does not).
>>>> > >>> > >
>>>> > >>> > > What does xmlpipedb-match say?  That will at least 
>>>> tell you whether the 'RV%' count is indeed correct.
>>>> > >>> > >
>>>> > >>> > > John David N. Dionisio, PhD
>>>> > >>> > > Associate Professor, Computer Science
>>>> > >>> > > Loyola Marymount University
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > > On Feb 11, 2011, at 4:52 PM, Richard Brous wrote:
>>>> > >>> > >
>>>> > >>> > > > OK here is what I was able to put together from the 
>>>> past few hours of code review:
>>>> > >>> > > >
>>>> > >>> > > > MycobacteriumTuberculosisUniProtSpeciesProfile.java:
>>>> > >>> > > > -reveals that after the 2 System table modifications 
>>>> are made adding species name and link, a PreparedStatement is 
>>>> instantiated which builds and calls the base query.
>>>> > >>> > > >
>>>> > >>> > > > -The base query called is: ("SELECT value, type " + 
>>>> "FROM genenametype INNER JOIN entrytype_genetype " + 
>>>> "ON(entrytype_genetype_name_hjid = entrytype_genetype.hjid) " + 
>>>> "WHERE type = 'ordered locus' and value like 'Rv%' and 
>>>> entrytype_gene_hjid = ?")
>>>> > >>> > > >
>>>> > >>> > > > -So its looking in 'ordered locus' table/column for 
>>>> any tuple that starts with Rv (followed by any substring) and 
>>>> entrytype_gene_hjid = ? .
>>>> > >>> > > > The 'like' comparator and % usage are clear with the 
>>>> 'type' entrytype_gene_hjid = ?
>>>> > >>> > > >
>>>> > >>> > > > -To me it seems the query makes sense so the problem 
>>>> is likely elsewhere.
>>>> > >>> > > >
>>>> > >>> > > > GenMappBuilder.java:
>>>> > >>> > > > -I found method doTallies() at code line 895 which:
>>>> > >>> > > > Instantiates a Configuration called 
>>>> hibernateConfiguration and assigns to it the current hibernate configuration
>>>> > >>> > > > Validates database settings by analyzing 
>>>> hibernateConfiguration
>>>> > >>> > > > Instantiates a CriterionList for uniprot and assigns 
>>>> to it TallyType.UNIPROT
>>>> > >>> > > > Instantiates a CriterionList for go and assigns to 
>>>> it TallyType.GO
>>>> > >>> > > > Determines if both xml files exist
>>>> > >>> > > > Then getTallyResultsXML and getTallyResultsDatabase 
>>>> are run on both xml files and their respective CriterionList
>>>> > >>> > > > Results are then formatted for display in a table.
>>>> > >>> > > >
>>>> > >>> > > > -So enum TallyType which means that they are the 
>>>> only valid datatypes which TallyEngine accepts... go to know ...
>>>> > >>> > > >
>>>> > >>> > > > -Based on the screen shot of Tally Engine it would 
>>>> seem that both getTallyResultsXML() and 
>>>> getTallyResultsDatabase() are incorrectly returning. Likely due 
>>>> to both using an incorrect query (as we previously supposed). 
>>>> But where are the queries?... the more I dig the more I think 
>>>> they are in the criterial all the work is done against.
>>>> > >>> > > >
>>>> > >>> > > > continuing the review:
>>>> > >>> > > > getTallyResultsXML() calls Tally Engine instance 
>>>> method getXmlFileCounts(xmlFile)
>>>> > >>> > > > getTallyResultsDatabase() calls Tally Engine 
>>>> instance method getDbcounts(new QueryEngine(hibernateConfiguration)
>>>> > >>> > > > Both of these instanced methods originate from 
>>>> TallyEngine.java...
>>>> > >>> > > >
>>>> > >>> > > > TallyEngine.java:
>>>> > >>> > > >
>>>> > >>> > > > getXmlFileCounts() calls digestXmlFile() which 
>>>> instantiates a digester then processes against criteria... but 
>>>> this quickly becomes confusing and is hard to follow
>>>> > >>> > > >
>>>> > >>> > > > getDbcounts() then starts a db session and executes 
>>>> a query but then I also get a bit lost with my limited db knowledge.
>>>> > >>> > > >
>>>> > >>> > > > 
>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
>>>>
>>>> > >>> > > >
>>>> > >>> > > > OVERALL I think I'm getting closer to the issues but 
>>>> I still feel as if I'm missing some understanding to proceed 
>>>> further. Can you pass along some of that Dondi insight and steer 
>>>> me in the right direction? =D
>>>> > >>> > > >
>>>> > >>> > > > -DB Tally - Not having taken databases yet certainly 
>>>> is limiting my ability determine where the "criteria" are being 
>>>> set and how they are followed during session activities. Also is 
>>>> the query we have been looking for this whole time in the 
>>>> criteria or someplace else?
>>>> > >>> > > >
>>>> > >>> > > > -XML Tally - again is the query contained within the 
>>>> criteria that digestXmlFile() uses to parse?
>>>> > >>> > > >
>>>> > >>> > > > Richard
>>>> > >>> > > >
>>>> > >>> > > >
>>>> > >>> > > > On Mon, Feb 7, 2011 at 5:50 PM, John David N. 
>>>> Dionisio <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > >>> > > > Right, schema issues are unlikely.  Most count 
>>>> discrepancies like this that I've seen have boiled down to 
>>>> forming the right query.  Then, knowing the right query (in both 
>>>> XML and SQL), it's a matter of making sure that TallyEngine asks 
>>>> that same query.
>>>> > >>> > > >
>>>> > >>> > > > John David N. Dionisio, PhD
>>>> > >>> > > > Associate Professor, Computer Science
>>>> > >>> > > > Loyola Marymount University
>>>> > >>> > > >
>>>> > >>> > > >
>>>> > >>> > > > On Feb 7, 2011, at 5:48 PM, Richard Brous wrote:
>>>> > >>> > > >
>>>> > >>> > > > > OK, so based on your approach:
>>>> > >>> > > > >
>>>> > >>> > > > > 1. I'll start with reviewing the queries for 
>>>> xmlpipedb-match and sql queries needed for the respective 
>>>> results as you requested.
>>>> > >>> > > > >
>>>> > >>> > > > > I was also thinking I may need to review the 
>>>> schema from xml into postgres but the issue isn't likely a 
>>>> schema error. The error most likely lies in how xmlpipedbutils 
>>>> queries the data from xml source and writes to the tables what it returns?
>>>> > >>> > > > >
>>>> > >>> > > > > 2. I'll review the code: trace the entrance of 
>>>> tally engine in the gmbuilder code then follow it through the xmlpipedbutils.
>>>> > >>> > > > >
>>>> > >>> > > > > Richard
>>>> > >>> > > > >
>>>> > >>> > > > > On Sat, Feb 5, 2011 at 10:28 AM, John David N. 
>>>> Dionisio <<mailto:do...@lm...>do...@lm...> wrote:
>>>> > >>> > > > > Just wanted to confirm (since I wasn't sure in the 
>>>> first e-mail) --- the XMLPipeDB Utilities source code is in 
>>>> trunk/xmlpipedbutils in SourceForge's Subversion repo.
>>>> > >>> > > > >
>>>> > >>> > > > > John David N. Dionisio, PhD
>>>> > >>> > > > > Associate Professor, Computer Science
>>>> > >>> > > > > Loyola Marymount University
>>>> > >>> > > > >
>>>> > >>> > > > >
>>>> > >>> > > > >
>>>> > >>> > > > > On Feb 5, 2011, at 10:02 AM, Richard Brous wrote:
>>>> > >>> > > > >
>>>> > >>> > > > > > Hi Dondi,
>>>> > >>> > > > > >
>>>> > >>> > > > > > So I'm at the point in working with M 
>>>> tuberculosis that I was able to exactly reproduce Dr. 
>>>> Dahlquist's problematic TallyEngine results.
>>>> > >>> > > > > >
>>>> > >>> > > > > > gmb2b60 Results
>>>> > >>> > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > > > Now the proverbial question - What next to solve 
>>>> the Ordered Locus import/count issue?
>>>> > >>> > > > > >
>>>> > >>> > > > > > **********************************************
>>>> > >>> > > > > > Here is my thought process:
>>>> > >>> > > > > >
>>>> > >>> > > > > > Step 1: How does the import process work at the 
>>>> high level? (obviously correct me if I'm wrong)
>>>> > >>> > > > > >
>>>> > >>> > > > > > I believe that basically as each XML tag is 
>>>> read, it is placed in the proper Postgres table(s) based on some 
>>>> criteria. There is also likely some sort of check that each 
>>>> individual tag is in valid XML format unless we don't care at 
>>>> this stage (care at export) or maybe the parser just skips over 
>>>> and goes on to the next .
>>>> > >>> > > > > >
>>>> > >>> > > > > > Step 2: What could be the problem?
>>>> > >>> > > > > >
>>>> > >>> > > > > > Either -
>>>> > >>> > > > > > a. XML tags are being parsed incorrectly 
>>>> (ignored/skipped)?
>>>> > >>> > > > > > b. Decision criteria of which table they should 
>>>> be added to?
>>>> > >>> > > > > >
>>>> > >>> > > > > > **********************************************
>>>> > >>> > > > > >
>>>> > >>> > > > > > I read on the sourceforge wiki:
>>>> > >>> > > > > >
>>>> > >>> > > > > > XMLPipeDB has a modular architecture with three 
>>>> components that may be used separately or together. XSD-to-DB 
>>>> reads an XSD (XML Schema Definition) and automatically generates 
>>>> an SQL schema, Java classes, and Hibernate mappings. XMLPipeDB 
>>>> Utilities provides functionality for configuring the database, 
>>>> importing data, and performing queries. GenMAPP Builder is based 
>>>> on the XMLPipeDB Utilities and exports GenMAPP-compatible Gene 
>>>> Databases based on data from UniProt and Gene Ontology (GO).
>>>> > >>> > > > > >
>>>> > >>> > > > > > So I should probably start with the XMLPipeDB 
>>>> Utilities which are where? I don't see any in the basic 
>>>> distribution or are they not standalone and called from the command line?
>>>> > >>> > > > > >
>>>> > >>> > > > > > Thanks!
>>>> > >>> > > > > >
>>>> > >>> > > > > > Richard
>
>
>
>------------------------------------------------------------------------------
>Free Software Download: Index, Search & Analyze Logs and other IT data in
>Real-Time with Splunk. Collect, index and harness all the fast moving IT data
>generated by your applications, servers and devices whether physical, virtual
>or in the cloud. Deliver compliance at lower cost and gain new business
>insights. 
><http://p.sf.net/sfu/splunk-dev2dev>http://p.sf.net/sfu/splunk-dev2dev
>_______________________________________________
>xmlpipedb-developer mailing list
><mailto:xml...@li...>xml...@li...
>https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>
>
>
>
>------------------------------------------------------------------------------
>Free Software Download: Index, Search & Analyze Logs and other IT data in
>Real-Time with Splunk. Collect, index and harness all the fast moving IT data
>generated by your applications, servers and devices whether physical, virtual
>or in the cloud. Deliver compliance at lower cost and gain new business
>insights. 
><http://p.sf.net/sfu/splunk-dev2dev>http://p.sf.net/sfu/splunk-dev2dev
>_______________________________________________
>xmlpipedb-developer mailing list
><mailto:xml...@li...>xml...@li...
>https://lists.sourceforge.net/lists/listinfo/xmlpipedb-developer
>
>
>