xmlpipedb-developer Mailing List for XMLPipeDB (Page 18)
Brought to you by:
kdahlquist,
zugzugglug
From: John D. N. D. <do...@lm...> - 2010-04-01 05:54:20
|
Greetings all, GenMAPP Builder 2.0b42 is now available. This version updates the UniProt XML database code to correspond to the latest UniProt XSD, released 2/23/2010. Thus, this version will only import UniProt XML files released after that date. John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University |
From: John D. N. D. <do...@lm...> - 2010-02-17 19:03:09
|
A new GenMAPP Builder version is available (2.0 beta 41). This version introduces a new customization for Staphylococcus aureus. There are no changes for other species. John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University |
From: John D. N. D. <do...@lm...> - 2010-02-08 21:26:52
|
Greetings all, Yes, sorry, serious mea culpa on that one. There was a procedural error in packaging the files for that release which I missed, then failed to validate. The .zip file has been fixed in SourceForge now; please redownload and the error should be resolved. John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University |
From: John D. N. D. <do...@lm...> - 2010-02-08 01:16:21
|
A new GenMAPP Builder version is available (2.0 beta 40). This version restores the previously lost S. cerevisiae customizations (still make sure to choose S. cerevisiae as species name though), and includes customizations for P. aeruginosa. Version information is also now available in an About dialog as well as the window title. The default note in an exported .gdb also includes "Exported by..." the running version of GenMAPP Builder. Please download and test when you can...thanks! John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University |
From: John D. N. D. <do...@lm...> - 2010-02-04 16:36:28
|
OK, thanks, I will investigate then report.

John David N. Dionisio, PhD
Assistant Professor, Computer Science
Loyola Marymount University
From: Kam D. <kda...@lm...> - 2010-02-04 16:19:13
|
Hi,

I have results from a new export with gmb2b39 that I ran overnight.

1. Choosing "Saccharomyces cerevisiae" from the drop-down menu in the wizard does invoke a different species profile, with different modifications to the tables. My recollection is that Arabidopsis and E. coli do not work this way. We need to think about how to deal with this issue. Ideally, "Baker's yeast" would not be an option, since it is the common name for the species and we don't want people to use it. At a minimum, however, both names should invoke the right species profile.

2. However, the SGD table and the OrderedLocusNames table are completely missing from the exported gdb. The TallyEngine is counting the IDs, and the relationship tables between SGD and the other ID systems are there, but the SGD table itself is not there, nor is the OrderedLocusNames table. The OrderedLocusNames relationship tables to other IDs are empty. This is different from the "By" species profile, where the SGD table got made but just had the "S000000000" IDs, and where the OrderedLocusNames table had a mixture of ID types.

Kam
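The requirement in the message above, that both the common name and the scientific name select the same species profile, can be sketched as a small alias lookup. This is only an illustration in Python: the `SPECIES_ALIASES` table and the `canonical_species` function are hypothetical names, not part of GenMAPP Builder, and only the two yeast names come from the thread.

```python
# Hypothetical alias table: keys are lower-cased display names, values are the
# scientific name that a species profile would be registered under.
SPECIES_ALIASES = {
    "saccharomyces cerevisiae": "Saccharomyces cerevisiae",
    "baker's yeast": "Saccharomyces cerevisiae",
}


def canonical_species(name: str) -> str:
    """Map a name chosen in the wizard to the name used to pick a profile.

    Whitespace is collapsed and case is ignored. Names that are not in the
    alias table are returned with whitespace collapsed but otherwise unchanged.
    """
    cleaned = " ".join(name.split())
    return SPECIES_ALIASES.get(cleaned.lower(), cleaned)
```

With a lookup like this, choosing either "Baker's yeast" or "Saccharomyces cerevisiae" would lead to the same profile, and the database prefix and species fields could be derived from the canonical name rather than from whatever the user picked.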
From: Kam D. <kda...@lm...> - 2010-02-03 18:33:22
|
Hi,

In my past experience, it does not matter which name is chosen in the GUI wizard; the program does what it does anyway (that's one of the issues with the GUI: the drop-down menu for species doesn't seem to matter).

I can confirm this, but since the export takes so long, I don't want to start it this morning while my students need the computer. I can start it later, but we won't know the answer until tomorrow.

At some point in the near future, I hope we can dump the OrderedLocusNames set of tables, since we don't need them and creating them is probably adding a lot of time to the export.

Kam
From: John D. N. D. <do...@lm...> - 2010-02-03 18:16:30
|
Greetings,

I believe the first two items may be related --- the species profile may not have been triggered correctly if "Baker's yeast" is the chosen species name. Would it be possible to re-run an export with "Saccharomyces..." chosen as the name?

Good suggestion about embedding the version information...I'll include that in some future build.

John David N. Dionisio, PhD
Assistant Professor, Computer Science
Loyola Marymount University
From: Kam D. <kda...@lm...> - 2010-02-03 16:24:46
|
Hi,

I checked the yeast gdb export that Bernadette started yesterday using gmbuilder-2.0b39. There are a couple of issues I can see right away:

1. The SGD table is being created, but all it has are the fields ID, Date, Remarks. The ID is the "S" form of the ID. I went back and looked at the export run last week with gmbuilder-2.0b38 and the SGD table is there, exactly like this (I just must've missed it last time). So anyway, this build did not have any of the customizations you made for the SGD table.

2. For some reason, these two versions of gmbuilder (38 and 39) are using "Baker's yeast" instead of "Saccharomyces cerevisiae" as the species name. This means that they are automatically naming the database with a "By_" prefix and also inserting "Baker's yeast" into the species fields throughout the database. This is odd because I don't think it was doing this before.

3. One last random thought. Would it be possible to display the version of GenMAPP Builder somewhere in the GUI, either in an About drop-down menu, or just as text at the bottom of the active window, like "gmbuilder-2.0b39"? I realized that I could not easily check the version number of the program while it was open. Also, it would be nice to embed the version of GenMAPP Builder into the gdb itself, perhaps by inserting it into the Notes field of the Info table, with language like "Created with gmbuilder-2.0b39 available from http://sourceforge.net/projects/xmlpipedb". I know this is an extra feature request, but it would be helpful.

Thanks,
Kam
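As a rough illustration of the third item, the note text could be built from a single version string so that the window title, an About dialog, and the exported gdb all report the same value. The function name `export_note` is hypothetical and the sketch is in Python purely for brevity; only the wording of the note and the URL come from the message above.

```python
# Project URL as quoted in the message above.
PROJECT_URL = "http://sourceforge.net/projects/xmlpipedb"


def export_note(version: str, url: str = PROJECT_URL) -> str:
    """Build the text proposed for the Notes field of the Info table.

    `version` is the build label without the "gmbuilder-" prefix,
    for example "2.0b39".
    """
    return f"Created with gmbuilder-{version} available from {url}"
```

Keeping the version in one place and formatting it on demand avoids the situation described above, where the running build could not easily be identified.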
From: John D. N. D. <do...@lm...> - 2010-01-29 19:43:12
|
Greetings all, A new GenMAPP Builder version is available (beta 39). This version includes the SGD table customizations (unmodified from before, so nothing new from yesterday's investigations of some yeast IDs). I was also able to throw in the Tally Engine customization for MRSA (adding ordered locus to the counts). One note about the SGD table customization --- I checked the commit records, and noted I committed those customizations before the prior version, beta 38, was released. So, either some lines were crossed and beta 38 somehow was built before those changes were made, or a new issue has emerged that keeps the customizations from happening. So Bernadette, keep your eyes peeled on Monday when generating a new export, and let me know if the custom SGD table still does not appear. John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University |
From: Kam D. <kda...@lm...> - 2010-01-28 17:13:56
|
Hi Dondi,

I'm just now getting to the yeast task list. The way I'm going to approach dealing with these IDs is to look them all up one by one to make individual determinations for what is going on. I suspect that I will be communicating with the Integr8 or UniProt people to fix the primary entries for those genes/proteins. Having said that, it looks like for some of the instances, you provided some examples, but not the entire list. So, would you go and find all the cases for me so that I can check them all?

In the meantime, for any case where an SGD ID exists and an ORF ID exists, but a gene symbol does not, use the ORF ID for the gene symbol.

Bernadette ran a new yeast gdb off of build 38 on Tuesday. I'll make sure she works on the testing report when she gets in today so we will have more info from that as well.

Cheers,
Kam
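The interim rule in the message above (use the ORF ID as the gene symbol when the symbol is missing) can be sketched as a small record filter. This is a Python illustration only; the function name and the tuple layout are hypothetical, and skipping records that lack an SGD ID or an ORF ID is an assumption of the sketch rather than something this message specifies.

```python
from typing import Optional, Tuple

Record = Tuple[str, str, str]  # (sgd_id, symbol, orf)


def resolve_symbol(sgd_id: Optional[str],
                   symbol: Optional[str],
                   orf: Optional[str]) -> Optional[Record]:
    """Apply the ORF-for-symbol fallback to one candidate SGD record.

    Returns a cleaned (sgd_id, symbol, orf) tuple, or None when the record
    has no SGD ID or no ORF ID (assumed here to mean "skip the record").
    """
    sgd_id = (sgd_id or "").strip()
    symbol = (symbol or "").strip()
    orf = (orf or "").strip()
    if not sgd_id or not orf:
        return None
    # Missing gene symbol: copy the ORF ID into the symbol field.
    return (sgd_id, symbol or orf, orf)
```

Records that already carry all three IDs pass through unchanged, so the fallback only affects the entries that were previously being dropped.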
From: John D. N. D. <do...@lm...> - 2009-12-05 20:00:12
|
Greetings, There's a new gdb version of the test file: https://www.cs.lmu.edu/biodb/fall2009/index.php/File:Sc-Std_20091203-test.gdb I decided to just keep the filename the same so that the sequence of tests can be under the same wiki file but just as different versions. Anyway, this one uses the ORF name when the primary is absent. The SGD record count goes up to 5719 as a result. The "slash" IDs are still there, so presumably that will require some action (depending on the action, maybe this can be something for the students? --- I can just give some instructions). Any other null values get skipped. Oh also, the OrderedLocus relationship tables still get generated; I'll look into ditching those. Anyway, take a gander when you can and let me know... John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University |
From: John D. N. D. <do...@lm...> - 2009-12-05 07:47:02
|
Greetings Kam,

In refining that modified SGD table query, I found some "interesting" sets of records. For your consideration...

  hjid   | id | symbol | orf
---------+----+--------+---------
 1799877 |    | LPT1   |
 1801238 |    | ALD6   |
 1802608 |    |        | YDR539W
 1803959 |    |        |
 1805321 |    | IME1   | YJR094C
 1806664 |    | RME1   | YGR044C
 1808027 |    | RSF1   | YMR030W
 1809403 |    |        |

...these records are entries that have no SGD IDs. Some supplementary information about them:

LPT1 - A9EDP4_YEAST - Lysophospholipid acyltransferase
ALD6 - A9LRZ7_YEAST - Cytosolic aldehyde dehydrogenase
YDR539W - B2NII0_YEAST - Putative uncharacterized protein YDR539W
IME1 - B8XW28_YEAST - Ime1p
RME1 - B8XW41_YEAST - Rme1p
RSF1 - B8XW45_YEAST - Rsf1p

I don't know what to say about the 2 records with *none* of the IDs though...hjid doesn't help because that is generated for the relational database, and is not part of the XML file. I'm guessing that these simply don't make it to SGD because there is no SGD ID?

Here's another "interesting" set...

  hjid   |     id     |  symbol  | orf
---------+------------+----------+-----------
 1509564 | S000000068 | TY1A-PR1 | YAR010C
 1509564 | S000000068 | TY1A-PR1 | YPR137C-A
 1509564 | S000000068 | TY1A-A   | YAR010C
 1509564 | S000000068 | TY1A-A   | YPR137C-A
 1128516 | S000000168 | RPS8A    | YBL072C
 1128516 | S000000168 | RPS8B    | YBL072C
 1128516 | S000000168 | RPS8A    | YER102W
 1128516 | S000000168 | RPS8B    | YER102W
 1043286 | S000000183 | RPL23B   | YER117W
 1043286 | S000000183 | RPL23B   | YBL087C
 1043286 | S000000183 | RPL23A   | YBL087C
 1043286 | S000000183 | RPL23A   | YER117W

These guys (among others) either have 2 symbol tags, 2 ORF names, or both. Presumably an issue since the SGD IDs must be unique in the SGD table?

And finally, here are some "old friends" (again a subset):

 1529063 | S000000200 |        | YBL104C/YBL103C-A
  712746 | S000000302 | MMS4   | YBR098W/YBR100W
  888697 | S000000510 | PGS1   | YCL004W/YCL003W
  145116 | S000000520 | BUD3   | YCL014W/YCL013W/YCL012W
 1738671 | S000005435 |        | YOL075C/YOL074C
 1745387 | S000005522 |        | YOL162W/YOL163W
 1745387 | S000005523 |        | YOL162W/YOL163W
 1792808 | S000005613 | YVC1   | YOR087W/YOR088W
    5540 | S000005765 | ABP140 | YOR239W/YOR240W

...there are 41 of these in all. Presumably we'll want to "split" somehow, but I figure that their being in the SGD table may cause an issue, since that would duplicate/triplicate the SGD ID record?

Let me know how you'd like to deal with these...meanwhile, I'll implement what you said about the ORF filling in for the symbol/primary when that isn't around. I'll let you know if there's a new test build lying around.

John David N. Dionisio, PhD
Assistant Professor, Computer Science
Loyola Marymount University
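The concern raised in the message above, that splitting the "slash" ORF values would repeat an SGD ID across rows, can be checked with a short script. This is a hedged Python sketch, not the query GenMAPP Builder uses: the function names and the `(sgd_id, symbol, orf)` row layout are hypothetical, and the thread leaves open how the split records should actually be handled.

```python
from collections import Counter


def split_orf_field(orf_field):
    """Split a value such as 'YBR098W/YBR100W' into its individual ORF names."""
    return [part.strip() for part in orf_field.split("/") if part.strip()]


def expand_rows(rows):
    """Turn each (sgd_id, symbol, orf_field) row into one row per ORF name."""
    expanded = []
    for sgd_id, symbol, orf_field in rows:
        for orf in split_orf_field(orf_field):
            expanded.append((sgd_id, symbol, orf))
    return expanded


def repeated_sgd_ids(rows):
    """Return, in sorted order, the SGD IDs that occur on more than one row."""
    counts = Counter(sgd_id for sgd_id, _symbol, _orf in rows)
    return sorted(sgd_id for sgd_id, n in counts.items() if n > 1)
```

Running `repeated_sgd_ids(expand_rows(rows))` on the candidate rows lists every SGD ID that would violate a uniqueness constraint after splitting, which gives a concrete list to review one by one.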
From: Kam D. <kda...@lm...> - 2009-12-04 21:42:23
|
Hi,

Yes, I saw that list too. I told Kenny that I wanted to review them myself before we decided what to do with them. I may or may not get a chance to do this before I leave for the conference. If I don't get caught up, we'll just leave them on the "to do" list for later. Unfortunately, his list doesn't have the crucial piece of information, which is what UniProt record is it a part of and is it the gene that belongs to that record.

Cheers,
Kam
From: John D. N. D. <do...@lm...> - 2009-12-04 20:02:38
|
OK, I'll adjust as specified and let you know when there is a new test gdb. I just checked and Kenny has listed "the 33" in his online notebook, under the December 3 entry. It looks like they also finished surveying where in the XML these IDs were found, so that may be ready for some action decisions :) John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University On Dec 4, 2009, at 11:20 AM, Kam Dahlquist wrote: > Hi, > > See below: > > Kam > > At 09:41 PM 12/3/2009, you wrote: >> Hi Kam, >> >> I've uploaded a test export to the wiki: >> >> https://www.cs.lmu.edu/biodb/fall2009/index.php/File:Sc-Std_20091203-test.gdb >> >> This .gdb has a table called SGDTest, which is a candidate for what the SGD table should really be. Please take a look and see if this appears correct (or at least the right track :) ). There are fewer records overall in this combo SGDTest table, as it represents only the records with all 3 IDs. > > This is the right track, but... > > If a gene does not have a gene symbol (like ACT1), it's ORF ID is used instead. The difference between the SGD table and the SGDTest is 1219 records, I am guessing that most, if not all of them got left out because they did not have a gene symbol. In that case, their ORF ID should be copied over into the gene symbol field. > > Also, don't forget that some of the ORF IDs are not in the "Y" form, but are in the form as follows: Q####. These are mitochondrial genes. > > Somehow in the 2006 yeast gdb, empty data is being tolerated in the ORF or Symbol fields; I'm not sure how that is. I'm hoping that for every S######### ID there is at least an ORF ID so that if you copy over that to the gene symbol for the ones that are missing, we won't lose any records. > > The 2006 gdb is actually quite poor in terms of data integrity, not that I look at it. 
> > >> Also, I noticed that the Ensembl table in the 2006 version also has more columns...should this also be replicated in the GenMAPP Builder export? Are there other tables that I might not be remembering? > > No. Keep in mind, GenMAPP.org is using Ensembl as a primary database so they have the ability to capture more data from Ensembl directly. I don't think we should try to replicate this table because the info there is pretty much in the UniProt or SGD tables. The only issue for users is that if they made a MAPP or Expression Dataset using Ensembl in a previous version of the gdb, it won't be compatible with ours. However, I would wager that 99% of the yeast community would choose SGD as their choice of system, so I think we're OK. > > >> Meanwhile, before I left campus, Kenny and Don were off investigating matched XML IDs that were not found in the database. There were 33 in all, and by the time I left a few categories had already emerged --- IDs in comment text only, another ID in a paper title but nowhere else --- so this may turn out to be like A. thaliana. We'll see what their final report looks like. > > Yeah, except that if it's only found in comment text or a paper title, we probably don't want them. If there's a list of the 33 somewhere, we can go through them one by one to make these determinations. > > Progress is definitely being made! > > >> John David N. Dionisio, PhD >> Assistant Professor, Computer Science >> Loyola Marymount University > > |
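The quoted reminder above, that ORF IDs come in a "Y" form and a "Q####" form for mitochondrial genes, can be turned into a simple classifier for spotting unexpected values. This Python sketch rests on assumptions: the thread only says "Y form" and "Q####", so the exact regular expressions below (chromosome letter A-P, arm L or R, three digits, strand W or C, optional one-letter suffix) are an illustrative guess at the systematic naming pattern, not a validated specification.

```python
import re

# Assumed pattern for nuclear ORF names such as YBL072C or YPR137C-A.
NUCLEAR_ORF = re.compile(r"^Y[A-P][LR]\d{3}[WC](-[A-Z])?$")
# "Q####" form described in the thread for mitochondrial genes.
MITO_ORF = re.compile(r"^Q\d{4}$")


def orf_form(orf: str) -> str:
    """Classify an ORF ID as 'nuclear', 'mitochondrial', or 'unknown'."""
    value = orf.strip()
    if NUCLEAR_ORF.match(value):
        return "nuclear"
    if MITO_ORF.match(value):
        return "mitochondrial"
    return "unknown"
```

Anything reported as "unknown", including the slash-joined values, is a candidate for manual review before it is copied into the gene symbol field.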
From: Kam D. <kda...@lm...> - 2009-12-04 19:20:33
|
Hi, See below: Kam At 09:41 PM 12/3/2009, you wrote: >Hi Kam, > >I've uploaded a test export to the wiki: > > >https://www.cs.lmu.edu/biodb/fall2009/index.php/File:Sc-Std_20091203-test.gdb > >This .gdb has a table called SGDTest, which is a candidate for what the >SGD table should really be. Please take a look and see if this appears >correct (or at least the right track :) ). There are fewer records >overall in this combo SGDTest table, as it represents only the records >with all 3 IDs. This is the right track, but... If a gene does not have a gene symbol (like ACT1), its ORF ID is used instead. The difference between the SGD table and the SGDTest is 1219 records; I am guessing that most, if not all, of them got left out because they did not have a gene symbol. In that case, their ORF ID should be copied over into the gene symbol field. Also, don't forget that some of the ORF IDs are not in the "Y" form, but are in the form Q####. These are mitochondrial genes. Somehow in the 2006 yeast gdb, empty data is being tolerated in the ORF or Symbol fields; I'm not sure how that is. I'm hoping that for every S######### ID there is at least an ORF ID so that if you copy over that to the gene symbol for the ones that are missing, we won't lose any records. The 2006 gdb is actually quite poor in terms of data integrity, now that I look at it. >Also, I noticed that the Ensembl table in the 2006 version also has more >columns...should this also be replicated in the GenMAPP Builder >export? Are there other tables that I might not be remembering? No. Keep in mind, GenMAPP.org is using Ensembl as a primary database so they have the ability to capture more data from Ensembl directly. I don't think we should try to replicate this table because the info there is pretty much in the UniProt or SGD tables. The only issue for users is that if they made a MAPP or Expression Dataset using Ensembl in a previous version of the gdb, it won't be compatible with ours. 
However, I would wager that 99% of the yeast community would choose SGD as their choice of system, so I think we're OK. >Meanwhile, before I left campus, Kenny and Don were off investigating >matched XML IDs that were not found in the database. There were 33 in all, >and by the time I left a few categories had already emerged --- IDs in >comment text only, another ID in a paper title but nowhere else --- so >this may turn out to be like A. thaliana. We'll see what their final >report looks like. Yeah, except that if it's only found in comment text or a paper title, we probably don't want them. If there's a list of the 33 somewhere, we can go through them one by one to make these determinations. Progress is definitely being made! >John David N. Dionisio, PhD >Assistant Professor, Computer Science >Loyola Marymount University |
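The fallback rule discussed in this message (use the ORF ID as the gene symbol when a record has no symbol) can be sketched roughly as follows. This is only an illustration in Python, not GenMAPP Builder code: the record layout, the function name `fill_missing_symbols`, and the sample IDs are all hypothetical placeholders.

```python
def fill_missing_symbols(records):
    """Copy the ORF ID into the Symbol field wherever the symbol is empty.

    Each record is a dict with the keys "ID", "Symbol", and "ORF"
    (a hypothetical layout modeled on the SGD table fields).
    Returns (filled, unresolved): records with a usable symbol, and
    records that have neither a symbol nor an ORF ID, so the latter
    can be reviewed by hand instead of being dropped silently.
    """
    filled, unresolved = [], []
    for record in records:
        row = dict(record)  # work on a copy; leave the input untouched
        symbol = (row.get("Symbol") or "").strip()
        orf = (row.get("ORF") or "").strip()
        if not symbol:
            if not orf:
                unresolved.append(row)
                continue
            row["Symbol"] = orf  # fall back to the systematic name
        filled.append(row)
    return filled, unresolved


# Placeholder rows for illustration only.
sample = [
    {"ID": "S000000001", "Symbol": "ACT1", "ORF": "YAL001C"},
    {"ID": "S000000002", "Symbol": "", "ORF": "YAL002W"},
    {"ID": "S000000003", "Symbol": None, "ORF": "Q0045"},
    {"ID": "S000000004", "Symbol": "", "ORF": ""},
]
filled, unresolved = fill_missing_symbols(sample)
```

Keeping the unresolved rows in their own list makes it easy to count how many S######### IDs truly lack both fields before deciding what to do with them.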
From: Kam D. <kda...@lm...> - 2009-12-03 19:23:19
|
Hi, You'll have seen in my previous message that MAPPFinder is stalling because of the sheer size of the GeneOntology data; specifically, I think the culprit is the GeneOntologyTree table. After I sent the e-mail, I went and looked, and it has 200,000 records instead of the ~38,000 that were in the 2006 database. Obviously, this is not something we are going to be able to fix this semester, but I think there is a viable fix we can try to pursue with our research team next semester: GO has created something called "GO Slim", in which a lot of the very specific child terms have been removed so that you only get the broad categories of GO. We can try using GO Slim instead of the entire GO for our gdb. The only hitch, of course, is that it is only provided in the OBO format, not the OBO-XML format. There is a Perl script called map2slim (http://search.cpan.org/~cmungall/go-perl/scripts/map2slim) that purports to take a gene associations file (like our GOA, though it remains to be seen whether it accepts the exact GOA format) and re-map it to the GO Slim terms, so I think that is probably what we will need to do. I'm guessing that it will take some rounds of testing before we are sure it is working properly for us. So in the meantime, Bernie is going forward with analyzing the Arava data with the old gdb and hopefully Kenny and Don are making some headway on the gdb. Cheers, Kam |
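To make the re-mapping idea concrete, here is a toy sketch in Python. It is not how map2slim is implemented and it does not read OBO or GOA files; it only shows the general technique of walking up from each annotated term to the first slim term reached on each path. The function names, the `parents` dictionary, and the term labels are hypothetical.

```python
def map_to_slim(term, parents, slim_terms):
    """Return the first slim term reached on each upward path from `term`.

    `parents` maps a term to the list of its direct parent terms.
    A term that is itself in `slim_terms` maps to itself.
    Simplification: slim terms found on different paths are all kept,
    even if one of them is an ancestor of another.
    """
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)  # stop climbing along this path
            continue
        stack.extend(parents.get(current, ()))
    return found


def remap_associations(associations, parents, slim_terms):
    """Rewrite (gene, term) pairs as de-duplicated (gene, slim term) pairs."""
    remapped = set()
    for gene, term in associations:
        for slim in map_to_slim(term, parents, slim_terms):
            remapped.add((gene, slim))
    return sorted(remapped)


# Toy ontology with placeholder labels instead of real GO IDs.
parents = {"child": ["mid"], "mid": ["broad"], "broad": ["root"], "other": ["root"]}
slim_terms = {"broad", "root"}
pairs = remap_associations(
    [("g1", "child"), ("g1", "mid"), ("g2", "other")], parents, slim_terms
)
```

Because several specific terms collapse onto the same broad term, the number of distinct gene-to-term pairs can only stay the same or shrink, which is the effect one would hope for when the full tree is too large.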
From: Kam D. <kda...@lm...> - 2009-11-25 02:16:21
|
Hi, Just wanted to log that the TallyEngine does not calculate the counts for OrderedLocusNames if there is no species profile for that particular species, so this will be something that the coders have to do for their species profile. See screenshots for Pseudomonas on the class wiki: https://www.cs.lmu.edu/biodb/fall2009/index.php/Ravenclaw_TR110909_kp Cheers, Kam |
From: Dahlquist, K. D. <kda...@lm...> - 2009-11-18 05:46:01
|
Hi, Here is the minimum for a new species profile: 1. In Systems table, Species field of OrderedLocusNames record needs to read |<Genus species>|. It is currently being left blank, with just two bars (||). 2. In Systems table, Link field of OrderedLocusNames record needs to have the appropriate URL query added, where the ~ character refers to the ID to be inserted. For example, for Vibrio it should be: "http://cmr.jcvi.org/tigr-scripts/CMR/shared/GenePage.cgi?locus=~" 3. For yeast, the SGD table has the fields ID, Symbol, ORF, Species, Date, Remarks instead of just ID, Date, and Remarks for a normal OrderedLocusNames table. The ID is the Primary SGD ID of the form S#########. The Symbol is the standard name or gene symbol, like CIN5, but it can also be a duplicate of the ORF or systematic name. The ORF is the systematic name, Y[A-P][R/L]###[W/C] and sometimes -A or -B. I didn't see any -C, but there might always be one. When dealing with yeast, we will probably have the same issues we did with TAIR because it already exists in the Systems table and it is essentially being made twice, the second time as OrderedLocusNames. Cheers, Kam |
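The ID formats and the ~ placeholder described above can be checked with a short Python sketch. This is an illustration only, not part of GenMAPP Builder: the function names and the sample locus `VC0001` are hypothetical. The ORF pattern accepts chromosome letters A through P (for the sixteen yeast chromosomes) and any single-letter suffix such as -A, -B, or a possible -C; tighten either if a stricter check is wanted.

```python
import re

# Primary SGD ID: "S" followed by nine digits.
SGD_ID = re.compile(r"S\d{9}")
# Nuclear systematic name: Y, chromosome letter, arm (L/R), three digits,
# strand (W/C), and an optional one-letter suffix such as -A or -B.
NUCLEAR_ORF = re.compile(r"Y[A-P][LR]\d{3}[WC](-[A-Z])?")
# Mitochondrial systematic name: "Q" followed by four digits.
MITO_ORF = re.compile(r"Q\d{4}")

# Link template quoted in the message; "~" marks where the ID goes.
VIBRIO_LINK = "http://cmr.jcvi.org/tigr-scripts/CMR/shared/GenePage.cgi?locus=~"


def classify_yeast_id(value):
    """Return "sgd_id", "orf", or "other" for a candidate yeast identifier."""
    if SGD_ID.fullmatch(value):
        return "sgd_id"
    if NUCLEAR_ORF.fullmatch(value) or MITO_ORF.fullmatch(value):
        return "orf"
    return "other"  # for example a gene symbol such as CIN5


def expand_link(template, identifier):
    """Substitute the identifier for the ~ placeholder in a Link template."""
    return template.replace("~", identifier)


url = expand_link(VIBRIO_LINK, "VC0001")  # placeholder locus, for illustration
```

A check like this could be run over an exported table to count how many ORF values fall outside the expected forms before comparing record totals between versions.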
From: John D. N. D. <do...@lm...> - 2009-11-02 03:37:29
|
Hello Gao Shan, Here would be my checklist of things to narrow down the issue: - What query/method did you use to check for data in the database? - Did you see any errors on the console when importing? You can manually edit the log level if you want to see more messages. Thanks! John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University On Nov 1, 2009, at 7:31 PM, wang peter wrote: > dear phd John David: > i am gao shan from china. another problem is > i have used gmbuilder on red hat linux. > my postgresql work well and i have created all the tables > by sql scripts. > when i load uniprot xml file, it showed "" > but no data were import in database > my configuration as follows: > > hibernate.connection.url: jdbc:postgresql://localhost:5432/uniprot > hibernate.connection.driver_class: org.postgresql.Driver > hibernate.connection.username: postgres > hibernate.connection.password: postgres > hibernate.dialect: org.hibernate.dialect.PostgreSQLDialect |
From: John D. N. D. <do...@lm...> - 2009-10-29 17:00:15
|
Greetings, Sorry for the delay in getting back to you. Probably the quickest way for you to get going for now is to access the GenMAPP Builder source code. The code has a fully functional UniProt importer component; you can pattern your own code against that. When you have the code, and specific questions come up, just post them to this mailing list and we can address them. GenMAPP Builder is an ant-built Java project, and its source can be retrieved using Subversion via this URL: https://xmlpipedb.svn.sourceforge.net/svnroot/xmlpipedb/trunk/gmbuilder The directory tree is self-contained. It does use our related XMLPipeDB Utilities and UniProtDB/GODB subprojects, but they are included as binaries in the above code base instead of having to be compiled from source. Hope this helps! John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University On Oct 26, 2009, at 8:05 PM, wang peter wrote: > dear sir or madam: > i am from a bioinformatics group, nankai university, china. > thank you for your XMLPipeDB utilities files. > it is a very useful tool for us. > i have used xsd2db tools to translate xsd document and > construct the structure of UNIPROT database including > 61 tables. the next step, i want to transfer data from xml document > to relational database of UNIPROT on our local computers. i have read > the manual carefully. it is too simple about how to use those > classes in xpdutils-2.1.zip. the key problem is > the sample has no attached source codes. > can you give me some information about how to use those > classes to develop our project, or give us > some source codes to construct UNIPROT locally. > > i will write acknowledgement to you on any papers which used your > tools thank u very much > best regard > gao shan > > |
From: wang p. <wng...@gm...> - 2009-10-27 03:05:25
|
dear sir or madam: i am from a bioinformatics group, nankai university, china. thank you for your XMLPipeDB utilities files. it is a very useful tool for us. i have used xsd2db tools to translate xsd document and construct the structure of UNIPROT database including 61 tables. the next step, i want to transfer data from xml document to relational database of UNIPROT on our local computers. i have read the manual carefully. it is too simple about how to use those classes in xpdutils-2.1.zip. the key problem is the sample has no attached source codes. can you give me some information about how to use those classes to develop our project, or give us some source codes to construct UNIPROT locally. i will write acknowledgement to you on any papers which used your tools thank u very much best regard gao shan |
From: Kam D. <kda...@lm...> - 2009-10-27 00:12:04
|
Hi, The export finished and looks good. I'll file a complete report on the SourceForge wiki. There are differences between this gdb and our release version, but in a good way--we increased in our counts of genes. One thing we could consider the students doing tomorrow is to run MAPPFinder twice, once with each version of the database. Kam At 02:45 PM 10/26/2009, Kam Dahlquist wrote: >Hi, > >I'm testing with gmb2b36. Importing GO at the moment and will be doing an >export when that finishes. I'll let you know how it goes. > >Kam > >At 01:29 PM 10/26/2009, John David N. Dionisio wrote: > >I was able to replicate the error and addressed the issue with a new > >2.0b36 release. The issue was the Java version that was used when > >building 2.0b35. I tested 2.0b36 on a machine on which 2.0b35 failed, > >and it worked there. > > > >Let me know if 2.0b36 now fares better on your test machine too. > > > >John David N. Dionisio, PhD > >Assistant Professor, Computer Science > >Loyola Marymount University > > > > > >On Oct 26, 2009, at 9:47 AM, Kam Dahlquist wrote: > > > >>Hi, > >> > >>I downloaded the new version and tried to launch it using the .bat > >>file and it throws an exception and quits so fast that I can't read > >>the error message on the console. > >> > >>Kam |
From: John D. N. D. <do...@lm...> - 2009-10-26 23:38:29
|
I was able to replicate the error and addressed the issue with a new 2.0b36 release. The issue was the Java version that was used when building 2.0b35. I tested 2.0b36 on a machine on which 2.0b35 failed, and it worked there. Let me know if 2.0b36 now fares better on your test machine too. John David N. Dionisio, PhD Assistant Professor, Computer Science Loyola Marymount University On Oct 26, 2009, at 9:47 AM, Kam Dahlquist wrote: > Hi, > > I downloaded the new version and tried to launch it using the .bat > file and it throws an exception and quits so fast that I can't read > the error message on the console. > > Kam |
From: Kam D. <kda...@lm...> - 2009-10-26 21:45:30
|
Hi, I'm testing with gmb2b36. Importing GO at the moment and will be doing an export when that finishes. I'll let you know how it goes. Kam At 01:29 PM 10/26/2009, John David N. Dionisio wrote: >I was able to replicate the error and addressed the issue with a new >2.0b36 release. The issue was the Java version that was used when >building 2.0b35. I tested 2.0b36 on a machine on which 2.0b35 failed, >and it worked there. > >Let me know if 2.0b36 now fares better on your test machine too. > >John David N. Dionisio, PhD >Assistant Professor, Computer Science >Loyola Marymount University > > >On Oct 26, 2009, at 9:47 AM, Kam Dahlquist wrote: > >>Hi, >> >>I downloaded the new version and tried to launch it using the .bat >>file and it throws an exception and quits so fast that I can't read >>the error message on the console. >> >>Kam |