From: Michael S. <msa...@pc...> - 2005-06-16 12:34:02
|
Hi Jian, We're trying to put together a group discount at a local hotel. I hope to have more details about this, other hotels, and transportation later today. Thanks, Mike

On 6/14/05 10:38 AM, "Jian Lu" <jl...@vb...> wrote:
> Could you also provide information about local hotel and transportation? |
From: Jian Lu <jl...@vb...> - 2005-06-16 08:40:20
|
Could you also provide information about local hotel and transportation?

Michael Saffitz wrote:
> All, The GUS workshop is quickly approaching, and we have several fantastic speakers and workshops lined up. [...] |
From: Amruta J. <am...@st...> - 2005-06-15 23:31:12
|
Hi, I want to create a sub-class view on NAFeatureImp table and insert tuples into it using the SubmitRow plugin. I've created the view. A corresponding row also needs to be entered in the core.TableInfo relation. Is there some other metadata that needs to be updated while creating a new view? thanks, Amruta --- Research Assistant, Relman Lab, Stanford University, CA - 94305. |
From: Angel P. <an...@ma...> - 2005-06-14 17:55:41
|
Sorry for the late reply on this, but I would like to put this conversation in a proper context. Let's review what the GBParser currently does:

First, GBParser has no function for restarting other than looping through the records until it finds one that needs an update or insert. Every GB record has a modification date associated with it. Accessions and modification dates are stored in the NAEntry table. Any GB record that does not match the data stored in NAEntry is put through the update process; all others are skipped. GUS object trees are built from the database entry (dbTree) and from the record in the flatfile (ffTree). The parser tries to match each feature of the dbTree to each feature of the ffTree by scoring how close the values are. Perfect matches are deleted from the ffTree. Any feature not matched in the dbTree is marked deleted. Any feature left in the ffTree is added to the dbTree.

We need the new algorithm because:
1. This matching is not optimal, and an MD5SUM would come in very handy.
2. We cannot rely on other external DBs to provide modification dates, hence the need for the checksum on the sequence entry.

(more comments below)

Steve Fischer wrote:
> i am not persuaded that this functionality will be used by many other plugins. Most do inserts, not updates. And, many that do updates are given difference files and have stable identifiers, so the problems of this plugin don't apply.

If what Steve says is true, then we do not need a table to store md5sums, since GB entries can rely on the modification date in the NAEntry table to determine when there should be an update operation. I would like to see at least one other plugin for which a checksum table would be useful, in order to see some value in a database table.

> as far as the schema is concerned, you've reminded me that i left something out. we need to have a fourth column, giving this:
> digest, primary_key, type, ext_db_rls_id
> the ext_db_rls_id differentiates different datasets stored in the table.

I don't think so. The primary key will change across different datasets (e.g. external_db_rls_ids).

So to summarize, I am not convinced that a table is needed more than the current load process to enable re-starts. If this is true, then a simple flat file log will do. I am also not convinced that we need this file at all if we can efficiently compute these values on the fly from the DB and flat file entries. Last, before we go through the trouble of implementing this, I would like to see it be useful for other plugins.

-angel |
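The dbTree/ffTree reconciliation described above can be sketched roughly as follows. This is a toy illustration only, not the actual GBParser code: the dict-based feature records and the `score` function are hypothetical stand-ins for GBParser's real scoring of GUS objects.

```python
def score(db_feat, ff_feat):
    """Fraction of attribute values two features share (toy scorer)."""
    keys = set(db_feat) | set(ff_feat)
    return sum(1 for k in keys if db_feat.get(k) == ff_feat.get(k)) / len(keys)

def reconcile(db_feats, ff_feats):
    """Match db features to flatfile features; return (kept, deletions, insertions)."""
    ff_left = list(ff_feats)
    kept, deletions = [], []
    for dbf in db_feats:
        # find the best-scoring flatfile feature for this db feature
        best = max(ff_left, key=lambda f: score(dbf, f), default=None)
        if best is not None and score(dbf, best) == 1.0:
            ff_left.remove(best)   # perfect match: delete it from the ffTree
            kept.append(dbf)
        else:
            deletions.append(dbf)  # unmatched db feature: mark deleted
    insertions = ff_left           # anything left in the ffTree is added
    return kept, deletions, insertions
```

The weakness Angel points out is visible here: matching is quadratic and approximate, whereas a per-record checksum would make "unchanged" detection a single hash lookup.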
From: Michael S. <msa...@pc...> - 2005-06-14 13:33:39
|
All, The GUS workshop is quickly approaching, and we have several fantastic speakers and workshops lined up. The full agenda will be ready shortly. In the meantime, we now have a GUS Workshop website available at: http://gusdb.org/workshop/ In addition to the website, there is now a registration form available for the workshop. If you're planning on attending, please take a minute to complete the short registration form so that we can get an approximate head count: http://www.gusdb.org/workshop/registration.php Thanks for your interest and participation! |
From: Michael S. <msa...@pc...> - 2005-06-13 19:30:59
|
All, I've made a svn web viewer available at: https://www.cbil.upenn.edu/svnweb/ I'll be tweaking the configuration over the next few days, but I wanted to get it out there as soon as possible to support the development that's going on. --Mike |
From: Steve F. <sfi...@pc...> - 2005-06-13 16:02:37
|
yeah, i think you're right. steve

Aaron J. Mackey wrote:
> On Jun 12, 2005, at 10:28 PM, Steve Fischer wrote:
>> Because we are treating a feature tree as a unit, all the features that are in a tree will have the same digest. They will each have their own row in the DigestTable.
>
> Why is this? I would have expected only one row for the top-level parent feature. What facility does the duplication provide?
>
> -Aaron |
From: Aaron J. M. <am...@pc...> - 2005-06-13 15:26:38
|
On Jun 12, 2005, at 10:28 PM, Steve Fischer wrote: > Because we are treating a feature tree as a unit, all the features > that are in a tree will have the same digest. They will each have > their own row in the DigestTable. Why is this? I would have expected only one row for the top-level parent feature. What facility does the duplication provide? -Aaron -- Aaron J. Mackey, Ph.D. Project Manager, ApiDB Bioinformatics Resource Center Penn Genomics Institute, University of Pennsylvania email: am...@pc... office: 215-898-1205 fax: 215-746-6697 postal: Penn Genomics Institute Goddard Labs 212 415 S. University Avenue Philadelphia, PA 19104-6017 |
From: Steve F. <sfi...@pc...> - 2005-06-13 14:26:48
|
i am not persuaded that this functionality will be used by many other plugins. Most do inserts, not updates. And, many that do updates are given difference files and have stable identifiers, so the problems of this plugin don't apply. We'll start out with the table in App space, and when we have a second plugin that needs it we'll move it over to core.

as far as the schema is concerned, you've reminded me that i left something out. we need to have a fourth column, giving this:

digest, primary_key, type, ext_db_rls_id

the ext_db_rls_id differentiates different datasets stored in the table.

you mentioned date and algorithm stuff. that is handled by the standard GUS overhead rows, which this table must have if it's going to be a GUS object.

i am confused by your proposal that the life-time of the digest data is the life-time of the version. data is versioned continually, whenever it is modified. I said the lifetime of the project. By that i mean the digest data must remain as long as it is possible that there may be any more updates. (maybe that's what you meant)

steve

Ed Robinson wrote:
> I followed along and thought everything was great until you created the state table. If we are going to make a state table, I would recommend finding someplace for it in the schema, preferably core. [...] |
From: Ed R. <ero...@ug...> - 2005-06-13 14:09:56
|
I followed along and thought everything was great until you created the state table. If we are going to make a state table, I would recommend finding someplace for it in the schema, preferably core. What we are creating here is a methodology that all plugins should follow, so we don't want to recreate another case of plugins competing for temp_table names, which is even worse than not specifying controlled vocabularies.

If checksums and restarts are going to be a standard part of our architecture, then we need to make the entire architecture transparent by making the table a permanent part of the architecture, and all plugins should use the same table. The data should remain for the life-time of the version, not the project; i.e. this table should disappear when the data loaded is versioned and passed to the version tables. So long as the data is live, i.e. updatable, you will need this state information. My suggestion is the following:

Core.DataDigest
Date, Digest, type, primary_key, AlgorithmID (to id the plugin), Algorithm_version.

Also, type in this case is up to the plugin. LSF would have two types, Seq and Feats; other plugins could have whatever types they want to checksum. This field does NOT need to be controlled because the key is multi-column (it includes the AlgID).

-ed

---- Original message ----
> Date: Sun, 12 Jun 2005 22:28:48 -0400
> From: Steve Fischer <sfi...@pc...>
> Subject: [GUSDEV] using checksums for loading seqs and features
> To: gusdev-gusdev <gus...@li...>, an...@ma...
> [...]

-----------------
Ed Robinson
Center for Tropical and Emerging Global Diseases
University of Georgia, Athens, GA 30602
ero...@ug.../(706)542.1447/254.8883 |
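Ed's proposed shared digest table can be illustrated with a small DDL sketch. SQLite stands in purely for illustration (the real table would be a GUS/Oracle table with the standard overhead columns, which are omitted here), and the column types are guesses from Ed's column list:

```python
import sqlite3

# In-memory sketch of the proposed Core.DataDigest table. The multi-column
# key is the point Ed makes: because algorithm_id is part of the key, `type`
# is plugin-defined and need not come from a controlled vocabulary.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DataDigest (
        entry_date        TEXT,
        digest            TEXT    NOT NULL,
        type              TEXT    NOT NULL,  -- plugin-defined, e.g. 'Seq' or 'Feat'
        primary_key       INTEGER NOT NULL,  -- row id in the target table
        algorithm_id      INTEGER NOT NULL,  -- identifies the plugin
        algorithm_version TEXT,
        PRIMARY KEY (digest, type, primary_key, algorithm_id)
    )
""")
# Two different plugins (algorithm_ids) may reuse the same type name:
conn.execute("INSERT INTO DataDigest VALUES ('2005-06-13','abc123','Seq',1,10,'1.0')")
conn.execute("INSERT INTO DataDigest VALUES ('2005-06-13','abc123','Seq',1,11,'1.0')")
```

An exact duplicate of an existing (digest, type, primary_key, algorithm_id) combination is rejected by the key constraint.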
From: Steve F. <sfi...@pc...> - 2005-06-13 02:28:35
|
folks-

LoadSequencesAndFeatures is a new name for LoadAnnotatedSequences, the replacement for the GBParser and the TIGR xml and EMBL plugins that Ed developed. (Aaron felt that "annotated sequences" connoted an annotation center's output while the plugin is broader than that...)

Aaron and I have come up with a design for using digests (MD5) to help manage restart and updating. Using this design the logic of the plugin is the same whether doing an insert, a restart or an update.

The design requires state in the database. Rather than pollute the GUS schema with it, the plugin will take as a command line argument the name of an application-specific table that has three columns: digest, type (seq or feat), primary_key. The table persists for the duration of the project. We'll call it DigestTable here. DigestTable must also have a GUS object for itself if we want transaction-level robustness.

We assume for now that the organism we are using isn't too huge, i.e., that we can hold DigestTable in memory.

SEQUENCES

Initialization:
- read the digests for the sequences from DigestTable. write them into a hash, with the digest as key and the na_sequence_id as the value. This is the SequenceDigest hash
- read the source_ids for the sequences from GUS, and place them as a key in a hash, and put their na_sequence_id as value. This is the SequenceSourceId hash

For each sequence:
- create the digest as follows:
  - unpack all the info from the bioperl sequence object and its children, but excluding feature children.
  - unpack it into a hash, with the name of the attribute as key and the value as value.
  - for weakly typed fields, use the tag name as key and the value as the value.
  - loop through the keys in sorted order (using Perl's sort), and concatenate the values into a string
  - pass the string to the MD5 processor
  - create a DigestTable object from the na_sequence_id and the digest value
  - add that object as a child of the NASequence
- use the digest as an index into the SequenceDigest hash. if it is found then the sequence record in the db is fine. if it is not found then either:
  - if it is not in the SequenceSourceId hash then it is a new sequence, in which case we do a normal insert
  - otherwise we fall into update logic. We trace the objects that are associated with this sequence in the database (excluding features) to get their foreign keys, build up an updated gus object tree, and submit, letting the object layer handle the update.
- when we submit the sequence the DigestTable child object will be submitted as part of the same transaction.

Because sequences have stable identifiers (source_ids), it is possible for us to identify a sequence in the database even if some of its values have changed. this allows us to do a real update and, in theory, to keep some of the analysis against the sequence if irrelevant bits of it have changed.

FEATURES

Features, however, are different. They don't have stable ids. Nor do they have alternate keys (no, type and location is not good enough). This means that if a feature has changed, we have no choice but to take the delete-and-insert approach to updating. Here is how we do it....

Initialization: read from DigestTable and create the FeatureDigest hash with digest as key and na_feature_id as value.

Because we are treating a feature tree as a unit, all the features that are in a tree will have the same digest. They will each have their own row in the DigestTable.

For each bioperl feature tree:
- generate a string representation of the feature tree by:
  - initializing an empty string to hold the string version of the feature tree
  - recursively traversing the tree in a reproducible way
  - for each individual feature (nodes of the tree), get all its values, sort by tag name, and concatenate to the growing string
  - when done recursing, make a digest with that string
- use the digest as an index into the FeatureDigestHash
- if we find one or more features, then the feature tree is ok; remove those features from the FeatureDigestHash
- if we don't find any:
  - for each feature in the tree, make a new DigestTable object with the tree's digest and the feature's feature_id. add each DigestTable object to the corresponding feature
  - insert the tree

When all features have been processed, delete from the database any feature remaining in the FeatureDigestHash.

steve |
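The digest construction in the design above (sort the attribute names, concatenate the values, hash the result) and the insert/update/skip decision can be sketched as follows. Python stands in for the Perl plugin here; `hashlib` plays the role Digest::MD5 would, and the function names are illustrative, not the plugin's actual API.

```python
import hashlib

def record_digest(attrs):
    """MD5 over attribute values concatenated in sorted-key order.

    Visiting keys in sorted order makes the digest reproducible: two
    records with the same attributes hash identically no matter how
    the record was unpacked.
    """
    parts = [str(attrs[key]) for key in sorted(attrs)]
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()

def classify(attrs, source_id, sequence_digests, source_ids):
    """Decide skip / insert / update for one sequence record.

    sequence_digests: digest -> na_sequence_id (the SequenceDigest hash)
    source_ids:       source_id -> na_sequence_id (the SequenceSourceId hash)
    """
    if record_digest(attrs) in sequence_digests:
        return "unchanged"   # db record already matches: nothing to do
    if source_id not in source_ids:
        return "insert"      # never seen this source_id: new sequence
    return "update"          # known sequence whose content changed
```

The same `record_digest` idea extends to feature trees: traverse the tree in a reproducible order, appending each node's sorted values to one growing string, and digest that string once at the end.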
From: Eric E. S. <es...@vb...> - 2005-06-10 20:27:48
|
As the original poster, I should say that the XML-based loaders people have mentioned (thanks, BTW!) are the ultimate solution I was looking for. While the specific problem I have today came to me as a tab-delimited file (TIGR annotation from their web site), the data must be mapped to the schema somehow; XML seems like the obvious way to do that. If it simultaneously addresses the other issues, then we are way ahead of the game. As Pablo suggests, maybe the specific problems of tab-delimited data as a medium of exchange are best addressed by encouraging adoption of XML by more data providers.

> From: "Pablo N. Mendes" <pa...@pa...>
> Subject: Re: [GUSDEV] Generic GUS data loader for tab delimited files
> Date: Fri, 10 Jun 2005 11:05:58 -0300
> Hi folks,
> I find working with tab delimited files quite uncomfortable and sometimes dangerous. We don't have ways to check well-formedness or schema compliance (like in XML with XSDs or DTDs). This could cause execution halts after a long time running or, worse, wrong data loaded into the database.
> Any thoughts on this? |
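Pablo's well-formedness point is cheap to act on: a loader can reject a malformed input file up front instead of dying hours into a run. A minimal sketch using only Python's standard library (full XSD or DTD validation would need an external library such as lxml, which is not shown here):

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the document parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Reject bad input before any database work starts:
assert is_well_formed("<sequences><seq id='AB1'/></sequences>")
assert not is_well_formed("<sequences><seq id='AB1'></sequences>")  # unclosed <seq>
```

Tab-delimited files have no equivalent structural check, which is exactly the gap Pablo describes.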
From: Michael S. <msa...@pc...> - 2005-06-10 19:09:40
|
No, it looks like it may have been an end-of-line conversion issue. The conversion utility has some additional options to address this; for future switches I'll explore the trade-offs of using them. Fortunately there aren't too many binary files floating around the repository, so this is a relatively minor issue.

--Mike

On 6/10/05 2:58 PM, "Aaron J. Mackey" <am...@pc...> wrote:

> Did you do CVS keyword substitution during the CVS export? If so, you mangled any binary file that happened to have a keyword-like "string" in it.
>
> -Aaron
>
> On Jun 10, 2005, at 2:52 PM, Michael Saffitz wrote:
>
>> Folks,
>>
>> We have our first Subversion switch issue. It looks like binary files (i.e. jars) did not get properly handled in the move. I'm addressing them as I get errors, which can be strange:
>>
>> [javac] An exception has occurred in the compiler (1.4.2_04). Please file a bug at the Java Developer Connection (http://java.sun.com/cgi-bin/bugreport.cgi) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
>> [javac] java.lang.InternalError: jzentry == 0,
>> [javac] jzfile = -1502537536,
>> [javac] total = 103,
>> [javac] name = /home/msaffitz/pggus/gus/lib/java/pg74jdbc3.jar,
>> [javac] i = 1,
>> [javac] message = invalid LOC header (bad signature)
>> [javac] at java.util.zip.ZipFile$2.nextElement(ZipFile.java:321)
>> [javac] at com.sun.tools.javac.v8.code.ClassReader.openArchive(ClassReader.java:975)
>> [javac] at com.sun.tools.javac.v8.code.ClassReader.list(ClassReader.java:1218)
>> [javac] at com.sun.tools.javac.v8.code.ClassReader.listAll(ClassReader.java:1339)
>> [javac] at com.sun.tools.javac.v8.code.ClassReader.fillIn(ClassReader.java:1361)
>> [javac] at com.sun.tools.javac.v8.code.ClassReader.complete(ClassReader.java:1052)
>> [javac] at com.sun.tools.javac.v8.code.Symbol.complete(Symbol.java:372)
>> [javac] at com.sun.tools.javac.v8.comp.Enter.visitTopLevel(Enter.java:467)
>> [javac] at com.sun.tools.javac.v8.tree.Tree$TopLevel.accept(Tree.java:390)
>> [javac] at com.sun.tools.javac.v8.comp.Enter.classEnter(Enter.java:442)
>> [javac] at com.sun.tools.javac.v8.comp.Enter.classEnter(Enter.java:456)
>> [javac] at com.sun.tools.javac.v8.comp.Enter.complete(Enter.java:596)
>> [javac] at com.sun.tools.javac.v8.comp.Enter.main(Enter.java:582)
>> [javac] at com.sun.tools.javac.v8.JavaCompiler.compile(JavaCompiler.java:331)
>> [javac] at com.sun.tools.javac.v8.Main.compile(Main.java:569)
>> [javac] at com.sun.tools.javac.Main.compile(Main.java:36)
>> [javac] at com.sun.tools.javac.Main.main(Main.java:27)
>>
>> -------------------------------------------------------
>> This SF.Net email is sponsored by: NEC IT Guy Games. How far can you shotput a projector? How fast can you ride your desk chair down the office luge track? If you want to score the big prize, get to know the little guy.
>> Play to win an NEC 61" plasma display: http://www.necitguy.com/?r=20
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>
> --
> Aaron J. Mackey, Ph.D.
> Project Manager, ApiDB Bioinformatics Resource Center
> Penn Genomics Institute, University of Pennsylvania
> email: am...@pc...
> office: 215-898-1205
> fax: 215-746-6697
> postal: Penn Genomics Institute
> Goddard Labs 212
> 415 S. University Avenue
> Philadelphia, PA 19104-6017 |
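Not part of the original thread, but the "invalid LOC header (bad signature)" symptom above can be caught before javac trips over it: a jar is an ordinary zip archive, so a CRC sweep over its members detects transfer damage up front. A minimal sketch in Python (function name `jar_is_intact` is invented for illustration; the corruption below only mimics what a text-mode transfer does to binary content):

```python
import io
import zipfile

def jar_is_intact(data: bytes) -> bool:
    """True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # testzip() returns the name of the first bad member, or None
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False

# Build a tiny archive in memory, then flip one byte of member data to
# simulate the kind of damage a lossy repository migration inflicts.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", "hello")
good = buf.getvalue()
offset = good.index(b"hello")                 # locate the stored member data
bad = good[:offset] + b"X" + good[offset + 1:]  # one flipped byte -> CRC mismatch
```

Running `jar_is_intact` over every `.jar` in a freshly converted checkout would have flagged `pg74jdbc3.jar` immediately instead of mid-build.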
From: Aaron J. M. <am...@pc...> - 2005-06-10 18:58:35
|
Did you do CVS keyword substitution during the CVS export? If so, you mangled any binary file that happened to have a keyword-like "string" in it.

-Aaron

On Jun 10, 2005, at 2:52 PM, Michael Saffitz wrote:

> Folks,
>
> We have our first Subversion switch issue. It looks like binary files (i.e. jars) did not get properly handled in the move. I'm addressing them as I get errors, which can be strange:
>
> [javac] An exception has occurred in the compiler (1.4.2_04). Please file a bug at the Java Developer Connection (http://java.sun.com/cgi-bin/bugreport.cgi) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
> [javac] java.lang.InternalError: jzentry == 0,
> [javac] jzfile = -1502537536,
> [javac] total = 103,
> [javac] name = /home/msaffitz/pggus/gus/lib/java/pg74jdbc3.jar,
> [javac] i = 1,
> [javac] message = invalid LOC header (bad signature)
> [javac] at java.util.zip.ZipFile$2.nextElement(ZipFile.java:321)
> [javac] at com.sun.tools.javac.v8.code.ClassReader.openArchive(ClassReader.java:975)
> [javac] at com.sun.tools.javac.v8.code.ClassReader.list(ClassReader.java:1218)
> [javac] at com.sun.tools.javac.v8.code.ClassReader.listAll(ClassReader.java:1339)
> [javac] at com.sun.tools.javac.v8.code.ClassReader.fillIn(ClassReader.java:1361)
> [javac] at com.sun.tools.javac.v8.code.ClassReader.complete(ClassReader.java:1052)
> [javac] at com.sun.tools.javac.v8.code.Symbol.complete(Symbol.java:372)
> [javac] at com.sun.tools.javac.v8.comp.Enter.visitTopLevel(Enter.java:467)
> [javac] at com.sun.tools.javac.v8.tree.Tree$TopLevel.accept(Tree.java:390)
> [javac] at com.sun.tools.javac.v8.comp.Enter.classEnter(Enter.java:442)
> [javac] at com.sun.tools.javac.v8.comp.Enter.classEnter(Enter.java:456)
> [javac] at com.sun.tools.javac.v8.comp.Enter.complete(Enter.java:596)
> [javac] at com.sun.tools.javac.v8.comp.Enter.main(Enter.java:582)
> [javac] at com.sun.tools.javac.v8.JavaCompiler.compile(JavaCompiler.java:331)
> [javac] at com.sun.tools.javac.v8.Main.compile(Main.java:569)
> [javac] at com.sun.tools.javac.Main.compile(Main.java:36)
> [javac] at com.sun.tools.javac.Main.main(Main.java:27)

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email: am...@pc...
office: 215-898-1205
fax: 215-746-6697
postal: Penn Genomics Institute
Goddard Labs 212
415 S. University Avenue
Philadelphia, PA 19104-6017 |
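Aaron's point can be shown with a small simulation (not part of the thread; the regex below only mimics CVS's behavior, it is not CVS itself): keyword substitution rewrites `$Keyword$` markers in place, which is harmless in text files but shifts byte offsets whenever a binary file happens to contain a keyword-like run of bytes.

```python
import re

# Rough imitation of CVS keyword expansion on checkout/export.
KEYWORD = re.compile(rb"\$Id[^$\n]*\$")

def expand(data: bytes) -> bytes:
    """Rewrite $Id...$ markers the way CVS would in a text file."""
    return KEYWORD.sub(b"$Id: file.txt,v 1.1 2005/06/10 author $", data)

text = b'# $Id$\nprint("hi")\n'
binary = b"PK\x03\x04" + b"$Id$" + b"\x00\x01\x02"  # zip magic + accidental marker

expanded_binary = expand(binary)  # longer than the input: the archive is now corrupt
```

The standard remedy is to mark such files binary so no substitution happens (`cvs admin -kb`, or `-kb` on export); the Subversion analogue is setting `svn:mime-type` to `application/octet-stream` on jars and other binaries.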
From: Michael S. <msa...@pc...> - 2005-06-10 18:52:20
|
Folks,

We have our first Subversion switch issue. It looks like binary files (i.e. jars) did not get properly handled in the move. I'm addressing them as I get errors, which can be strange:

[javac] An exception has occurred in the compiler (1.4.2_04). Please file a bug at the Java Developer Connection (http://java.sun.com/cgi-bin/bugreport.cgi) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
[javac] java.lang.InternalError: jzentry == 0,
[javac] jzfile = -1502537536,
[javac] total = 103,
[javac] name = /home/msaffitz/pggus/gus/lib/java/pg74jdbc3.jar,
[javac] i = 1,
[javac] message = invalid LOC header (bad signature)
[javac] at java.util.zip.ZipFile$2.nextElement(ZipFile.java:321)
[javac] at com.sun.tools.javac.v8.code.ClassReader.openArchive(ClassReader.java:975)
[javac] at com.sun.tools.javac.v8.code.ClassReader.list(ClassReader.java:1218)
[javac] at com.sun.tools.javac.v8.code.ClassReader.listAll(ClassReader.java:1339)
[javac] at com.sun.tools.javac.v8.code.ClassReader.fillIn(ClassReader.java:1361)
[javac] at com.sun.tools.javac.v8.code.ClassReader.complete(ClassReader.java:1052)
[javac] at com.sun.tools.javac.v8.code.Symbol.complete(Symbol.java:372)
[javac] at com.sun.tools.javac.v8.comp.Enter.visitTopLevel(Enter.java:467)
[javac] at com.sun.tools.javac.v8.tree.Tree$TopLevel.accept(Tree.java:390)
[javac] at com.sun.tools.javac.v8.comp.Enter.classEnter(Enter.java:442)
[javac] at com.sun.tools.javac.v8.comp.Enter.classEnter(Enter.java:456)
[javac] at com.sun.tools.javac.v8.comp.Enter.complete(Enter.java:596)
[javac] at com.sun.tools.javac.v8.comp.Enter.main(Enter.java:582)
[javac] at com.sun.tools.javac.v8.JavaCompiler.compile(JavaCompiler.java:331)
[javac] at com.sun.tools.javac.v8.Main.compile(Main.java:569)
[javac] at com.sun.tools.javac.Main.compile(Main.java:36)
[javac] at com.sun.tools.javac.Main.main(Main.java:27) |
From: Michael S. <msa...@pc...> - 2005-06-10 15:31:00
|
All,

The move to Subversion for the GUS repository (GUS, WDK, WDKToySite, and install projects) is complete. Please review the documentation at:

https://www.gusdb.org/wiki/index.php/UsingSubversion

Specifically, you _must_ subscribe to the GUS-commits mailing list if you plan on committing to the repository (see documentation).

Please go easy on the repository for the next few days, and let me know if you have any issues.

Note: When writing to the repository, you will be prompted for a password. This is the same password that you use on the wikis.

Thanks,
Mike |
From: Ed R. <ero...@ug...> - 2005-06-10 14:58:13
|
It is robust for EMBL, but it is not fully tested for TIGR. The main TIGR dataset we use is not consistent with the TIGR DTD, so the testing on that data is incomplete. If anyone uses the plugin, they will have to update the XML map to add their features, and they may need to make a few other modifications to handle the structure of their GB data. When I used the sequence loader to load S. mansoni data, we found that gene features had multiple db_xrefs, so I had to modify that subroutine.

IF ANYBODY IN THE GUS COMMUNITY WANTS TO USE THE PLUGIN FOR THEIR GB, EMBL OR TIGR DATA, I WILL GLADLY DO THE SUPPORT WORK ON THE PLUGIN.

The more data we load with the plugin, the more robust it will become. This is the best way I can think of to work out the kinks. Ultimately, the plugin should support GB, EMBL, DDBJ, TIGR, Chado and any other rich-seq format supported by BioPerl.

-ed

---- Original message ----
>Date: Fri, 10 Jun 2005 10:19:19 -0400
>From: Steve Fischer <sfi...@pc...>
>Subject: Re: [GUSDEV] Generic GUS data loader for tab delimited files
>To: "Pablo N. Mendes" <pa...@pa...>
>Cc: gus...@li...
>
>The UGA and Penn folks are working on a plugin that uses BioPerl to parse the input sequence/feature files and then load them into GUS. It takes a simple XML mapping file that specifies how to go from the BioPerl objects to GUS objects.
>
>It is nowhere near as sophisticated as the GUS XML made by Terry Clark.
>
>It will handle GenBank, TIGR XML and EMBL.
>
>So far it is working in production for GenBank files (but it only inserts; it will update soon).
>
>Basically, at the start it will be a replacement for the GBParser plugin.
>
>For the 3.5 release it will be called InsertGenbankSequenceRecords (up for debate). (Ed, how robust is it for EMBL and/or TIGR XML?)
>
>steve
>
>Pablo N. Mendes wrote:
>
>> Hi folks,
>> I find working with tab-delimited files quite uncomfortable and sometimes dangerous. We don't have ways to check well-formedness or schema compliance (as in XML, with XSDs or DTDs). This could cause execution halts after a long running time or, worse, wrong data loaded into the database.
>>
>> I support the idea of having such a generic plugin for loading XML into GUS, also based on a data description file. I've noticed that NCBI already offers XML as a possible download format. Other data sources tend to do the same.
>>
>> Any thoughts on this?
>>
>> About the GUS XML effort, I find it very interesting. I'll check the material to get to know it better.
>>
>> Best,
>> Pablo
>>
>> ----- Original Message ----- From: "Terry Clark" <tc...@it...>
>> To: "Eric E. Snyder" <es...@vb...>
>> Cc: <gus...@li...>
>> Sent: Thursday, June 09, 2005 7:43 PM
>> Subject: Re: [GUSDEV] Generic GUS data loader for tab delimited files
>>
>>> Dear Eric,
>>> We have such an effort underway using XML-formatted input data. Here's a pointer to the project: http://flora.ittc.ku.edu/xmlgus/
>>> This method requires: some_format -> GUS' XML -> GUS object layer
>>>
>>> The system, running as a plugin, reads input in a GUS XML format that is formatted to correspond with relational tables and GUS objects. The mapping is instantiated in the XMLGUS framework as a YACC grammar, chosen for structure and the declarative approach for the plugin. We're adding automation to some of the intermediate steps presently. I'd be happy to help you try this out if you are interested.
>>>
>>> all the best,
>>>
>>> Terry
>>>
>>> On 0, "Eric E. Snyder" <es...@vb...> wrote:
>>>
>>>> Dear GUSdev,
>>>>
>>>> We have been having some trouble loading DNA annotation data via the gbparser plugin. We have been able to get around the problem in this instance by using addrow, which is quite general but impossibly slow. I cannot help but think there must be a generic tool for loading tab-delimited data files into GUS.
>>>>
>>>> Assuming there isn't, I think it would be time well spent if someone wrote a plugin for GUS that would *efficiently* load data in tab-delimited format based on instructions described in a general-purpose data description file. This file would identify the tables and fields corresponding to each column in the input file. It would also need to define the rules for associating data from records stored in multiple tables, and probably do other things as well.
>>>>
>>>> Any takers? I would be happy to spend whatever time is necessary to define the requirements for such a system. If it doesn't already exist somewhere in the GUS community, I certainly think it would be useful.
>>>>
>>>> I apologize in advance if this is a recent or frequent topic for this list. I just subscribed and wasn't able to access sourceforge to check the archives.
>>>>
>>>> Thanks!
>>>> eesnyder
>>>> --
>>>> Eric E. Snyder, Ph.D.
>>>> Virginia Bioinformatics Institute
>>>> Washington Street Phase 1 (0447)
>>>> Virginia Polytechnic Institute and State University
>>>> Blacksburg, VA 24061
>>>> USA
>>>>
>>>> Office: (540) 231-5428
>>>> Mobile: (540) 230-5225
>>>> Fax: (540) 231-2891
>>>> Email: ees...@vb...
>>>> JDAM: N 37 12'01.6", W 80 24'26.9"

-----------------
Ed Robinson
Center for Tropical and Emerging Global Diseases
University of Georgia, Athens, GA 30602
ero...@ug.../(706)542.1447/254.8883 |
From: Steve F. <sfi...@pc...> - 2005-06-10 14:19:02
|
The UGA and Penn folks are working on a plugin that uses BioPerl to parse the input sequence/feature files and then load them into GUS. It takes a simple XML mapping file that specifies how to go from the BioPerl objects to GUS objects.

It is nowhere near as sophisticated as the GUS XML made by Terry Clark.

It will handle GenBank, TIGR XML and EMBL.

So far it is working in production for GenBank files (but it only inserts; it will update soon).

Basically, at the start it will be a replacement for the GBParser plugin.

For the 3.5 release it will be called InsertGenbankSequenceRecords (up for debate). (Ed, how robust is it for EMBL and/or TIGR XML?)

steve

Pablo N. Mendes wrote:

> Hi folks,
> I find working with tab-delimited files quite uncomfortable and sometimes dangerous. We don't have ways to check well-formedness or schema compliance (as in XML, with XSDs or DTDs). This could cause execution halts after a long running time or, worse, wrong data loaded into the database.
>
> I support the idea of having such a generic plugin for loading XML into GUS, also based on a data description file. I've noticed that NCBI already offers XML as a possible download format. Other data sources tend to do the same.
>
> Any thoughts on this?
>
> About the GUS XML effort, I find it very interesting. I'll check the material to get to know it better.
>
> Best,
> Pablo
>
> ----- Original Message ----- From: "Terry Clark" <tc...@it...>
> To: "Eric E. Snyder" <es...@vb...>
> Cc: <gus...@li...>
> Sent: Thursday, June 09, 2005 7:43 PM
> Subject: Re: [GUSDEV] Generic GUS data loader for tab delimited files
>
>> Dear Eric,
>> We have such an effort underway using XML-formatted input data. Here's a pointer to the project: http://flora.ittc.ku.edu/xmlgus/
>> This method requires: some_format -> GUS' XML -> GUS object layer
>>
>> The system, running as a plugin, reads input in a GUS XML format that is formatted to correspond with relational tables and GUS objects. The mapping is instantiated in the XMLGUS framework as a YACC grammar, chosen for structure and the declarative approach for the plugin. We're adding automation to some of the intermediate steps presently. I'd be happy to help you try this out if you are interested.
>>
>> all the best,
>>
>> Terry
>>
>> On 0, "Eric E. Snyder" <es...@vb...> wrote:
>>
>>> Dear GUSdev,
>>>
>>> We have been having some trouble loading DNA annotation data via the gbparser plugin. We have been able to get around the problem in this instance by using addrow, which is quite general but impossibly slow. I cannot help but think there must be a generic tool for loading tab-delimited data files into GUS.
>>>
>>> Assuming there isn't, I think it would be time well spent if someone wrote a plugin for GUS that would *efficiently* load data in tab-delimited format based on instructions described in a general-purpose data description file. This file would identify the tables and fields corresponding to each column in the input file. It would also need to define the rules for associating data from records stored in multiple tables, and probably do other things as well.
>>>
>>> Any takers? I would be happy to spend whatever time is necessary to define the requirements for such a system. If it doesn't already exist somewhere in the GUS community, I certainly think it would be useful.
>>>
>>> I apologize in advance if this is a recent or frequent topic for this list. I just subscribed and wasn't able to access sourceforge to check the archives.
>>>
>>> Thanks!
>>> eesnyder
>>> --
>>> Eric E. Snyder, Ph.D.
>>> Virginia Bioinformatics Institute
>>> Washington Street Phase 1 (0447)
>>> Virginia Polytechnic Institute and State University
>>> Blacksburg, VA 24061
>>> USA
>>>
>>> Office: (540) 231-5428
>>> Mobile: (540) 230-5225
>>> Fax: (540) 231-2891
>>> Email: ees...@vb...
>>> JDAM: N 37 12'01.6", W 80 24'26.9" |
From: Pablo N. M. <pa...@pa...> - 2005-06-10 14:06:18
|
Hi folks,

I find working with tab-delimited files quite uncomfortable and sometimes dangerous. We don't have ways to check well-formedness or schema compliance (as in XML, with XSDs or DTDs). This could cause execution halts after a long running time or, worse, wrong data loaded into the database.

I support the idea of having such a generic plugin for loading XML into GUS, also based on a data description file. I've noticed that NCBI already offers XML as a possible download format. Other data sources tend to do the same.

Any thoughts on this?

About the GUS XML effort, I find it very interesting. I'll check the material to get to know it better.

Best,
Pablo

----- Original Message ----- From: "Terry Clark" <tc...@it...>
To: "Eric E. Snyder" <es...@vb...>
Cc: <gus...@li...>
Sent: Thursday, June 09, 2005 7:43 PM
Subject: Re: [GUSDEV] Generic GUS data loader for tab delimited files

> Dear Eric,
> We have such an effort underway using XML-formatted input data. Here's a pointer to the project: http://flora.ittc.ku.edu/xmlgus/
> This method requires: some_format -> GUS' XML -> GUS object layer
>
> The system, running as a plugin, reads input in a GUS XML format that is formatted to correspond with relational tables and GUS objects. The mapping is instantiated in the XMLGUS framework as a YACC grammar, chosen for structure and the declarative approach for the plugin. We're adding automation to some of the intermediate steps presently. I'd be happy to help you try this out if you are interested.
>
> all the best,
>
> Terry
>
> On 0, "Eric E. Snyder" <es...@vb...> wrote:
>
>> Dear GUSdev,
>>
>> We have been having some trouble loading DNA annotation data via the gbparser plugin. We have been able to get around the problem in this instance by using addrow, which is quite general but impossibly slow. I cannot help but think there must be a generic tool for loading tab-delimited data files into GUS.
>>
>> Assuming there isn't, I think it would be time well spent if someone wrote a plugin for GUS that would *efficiently* load data in tab-delimited format based on instructions described in a general-purpose data description file. This file would identify the tables and fields corresponding to each column in the input file. It would also need to define the rules for associating data from records stored in multiple tables, and probably do other things as well.
>>
>> Any takers? I would be happy to spend whatever time is necessary to define the requirements for such a system. If it doesn't already exist somewhere in the GUS community, I certainly think it would be useful.
>>
>> I apologize in advance if this is a recent or frequent topic for this list. I just subscribed and wasn't able to access sourceforge to check the archives.
>>
>> Thanks!
>> eesnyder
>> --
>> Eric E. Snyder, Ph.D.
>> Virginia Bioinformatics Institute
>> Washington Street Phase 1 (0447)
>> Virginia Polytechnic Institute and State University
>> Blacksburg, VA 24061
>> USA
>>
>> Office: (540) 231-5428
>> Mobile: (540) 230-5225
>> Fax: (540) 231-2891
>> Email: ees...@vb...
>> JDAM: N 37 12'01.6", W 80 24'26.9" |
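Pablo's point about failing loudly can be illustrated outside the thread with a few lines of Python: a well-formedness check rejects a broken record before any load starts, whereas a stray tab in a delimited record just silently shifts every later column. (This is only well-formedness; full XSD validation needs an external library such as lxml's `XMLSchema`.)

```python
import xml.etree.ElementTree as ET

def well_formed(doc: str) -> bool:
    """Cheap pre-flight check: refuse a record before any load starts."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

ok = well_formed("<gene><id>ABC1</id><product>kinase</product></gene>")
broken = well_formed("<gene><id>ABC1</gene>")  # mismatched tag -> rejected
```

A loader that runs such a check over every record first turns "execution halt after a long run" (or worse, silently wrong rows) into an immediate, local error message.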
From: Terry C. <tc...@it...> - 2005-06-09 22:43:37
|
Dear Eric,

We have such an effort underway using XML-formatted input data. Here's a pointer to the project: http://flora.ittc.ku.edu/xmlgus/

This method requires: some_format -> GUS' XML -> GUS object layer

The system, running as a plugin, reads input in a GUS XML format that is formatted to correspond with relational tables and GUS objects. The mapping is instantiated in the XMLGUS framework as a YACC grammar, chosen for structure and the declarative approach for the plugin. We're adding automation to some of the intermediate steps presently. I'd be happy to help you try this out if you are interested.

all the best,

Terry

On 0, "Eric E. Snyder" <es...@vb...> wrote:

> Dear GUSdev,
>
> We have been having some trouble loading DNA annotation data via the gbparser plugin. We have been able to get around the problem in this instance by using addrow, which is quite general but impossibly slow. I cannot help but think there must be a generic tool for loading tab-delimited data files into GUS.
>
> Assuming there isn't, I think it would be time well spent if someone wrote a plugin for GUS that would *efficiently* load data in tab-delimited format based on instructions described in a general-purpose data description file. This file would identify the tables and fields corresponding to each column in the input file. It would also need to define the rules for associating data from records stored in multiple tables, and probably do other things as well.
>
> Any takers? I would be happy to spend whatever time is necessary to define the requirements for such a system. If it doesn't already exist somewhere in the GUS community, I certainly think it would be useful.
>
> I apologize in advance if this is a recent or frequent topic for this list. I just subscribed and wasn't able to access sourceforge to check the archives.
>
> Thanks!
> eesnyder
> --
> Eric E. Snyder, Ph.D.
> Virginia Bioinformatics Institute
> Washington Street Phase 1 (0447)
> Virginia Polytechnic Institute and State University
> Blacksburg, VA 24061
> USA
>
> Office: (540) 231-5428
> Mobile: (540) 230-5225
> Fax: (540) 231-2891
> Email: ees...@vb...
> JDAM: N 37 12'01.6", W 80 24'26.9" |
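Eric's proposal of a data-description-driven loader can be sketched in a few lines (not from the thread; the table/field names below are illustrative, not a real GUS mapping, and `parse_rows` is an invented name). The description maps each input column to a (table, field) target; a real plugin would submit the resulting per-table dicts through the GUS object layer instead of returning them.

```python
import csv
import io

# Hypothetical data-description: one (table, field) target per input column.
DESCRIPTION = [
    ("DoTS.GeneFeature", "source_id"),
    ("DoTS.GeneFeature", "product"),
    ("DoTS.NALocation", "start_min"),
]

def parse_rows(tab_text: str, description):
    """Turn tab-delimited lines into per-table field dicts, ready to submit."""
    rows = []
    for record in csv.reader(io.StringIO(tab_text), delimiter="\t"):
        if not record:
            continue  # skip blank lines
        row = {}
        for (table, field), value in zip(description, record):
            row.setdefault(table, {})[field] = value
        rows.append(row)
    return rows

rows = parse_rows("ABC1\thypothetical protein\t1200\n", DESCRIPTION)
```

The "rules for associating records across tables" Eric mentions would live in the same description file (e.g. which field of `DoTS.NALocation` keys back to the `DoTS.GeneFeature` row); that part is omitted here.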
From: Fernan A. <fe...@ii...> - 2005-06-09 21:05:58
|
+----[ Michael Saffitz <msa...@pc...> (09.Jun.2005 17:44):
|
| Hi Fernan,

Hi Junmin and Michael, and thanks for your help,

| > Where should I do my edits? Or perhaps the object layer is
| > created by reading from the database instance only at
| > installation time?
|
| This is correct, what you'll need to do is to rebuild the objects. Before you do that, however, you'll need to add entries to Core.Tableinfo for any new tables you've created. (Also make sure your tables have proper sequences and primary keys.)

OK, but not needed in this case, since there are no new tables, just new columns and a renamed column.

| Once you've done that, you touch the VERSION file for the schema like so:
|
| touch $PROJECT_HOME/GUS/Model/schema/VERSION
|
| And then you rebuild:
|
| build GUS install -append

That did it ... thanks!

| More details are on the wiki:
|
| http://www.gusdb.org/wiki/index.php/UpdateAWorkingGusInstallation
|
| You can skip the step about getting the latest GUS. Sorry there aren't better instructions -- there will be for 3.5.
|
| --Mike
|
+----] |
From: Michael S. <msa...@pc...> - 2005-06-09 20:43:50
|
Hi Fernan,

> Where should I do my edits? Or perhaps the object layer is created by reading from the database instance only at installation time?

This is correct; what you'll need to do is to rebuild the objects. Before you do that, however, you'll need to add entries to Core.Tableinfo for any new tables you've created. (Also make sure your tables have proper sequences and primary keys.)

Once you've done that, touch the VERSION file for the schema like so:

touch $PROJECT_HOME/GUS/Model/schema/VERSION

And then rebuild:

build GUS install -append

More details are on the wiki:

http://www.gusdb.org/wiki/index.php/UpdateAWorkingGusInstallation

You can skip the step about getting the latest GUS. Sorry there aren't better instructions -- there will be for 3.5.

--Mike

On 6/9/05 4:18 PM, "Fernan Aguero" <fe...@ii...> wrote:

> Hi!
>
> Regarding the changes proposed in https://www.cbil.upenn.edu/tracker/show_bug.cgi?id=33, I want to move on and have them available for testing in a database instance. We have a working GUS test installation, and we just used the submitted SQL DDL to make the changes to the schema _in the database_.
>
> The question is: how do I make these changes show up in the object layer? I have a new est_uid column and will thus need new set/get methods for it.
>
> I was tempted to edit the corresponding _Table.pm and _Row.pm ... but since they are autogenerated, perhaps it's better if I ask first.
>
> Where should I do my edits? Or perhaps the object layer is created by reading from the database instance only at installation time?
>
> Thanks in advance,
>
> Fernan
>
> PS: this is still in GUS 3.0 |
From: Fernan A. <fe...@ii...> - 2005-06-09 20:18:54
|
Hi!

Regarding the changes proposed in https://www.cbil.upenn.edu/tracker/show_bug.cgi?id=33, I want to move on and have them available for testing in a database instance. We have a working GUS test installation, and we just used the submitted SQL DDL to make the changes to the schema _in the database_.

The question is: how do I make these changes show up in the object layer? I have a new est_uid column and will thus need new set/get methods for it.

I was tempted to edit the corresponding _Table.pm and _Row.pm ... but since they are autogenerated, perhaps it's better if I ask first.

Where should I do my edits? Or perhaps the object layer is created by reading from the database instance only at installation time?

Thanks in advance,

Fernan

PS: this is still in GUS 3.0 |
From: Sharma, S. <Sur...@ng...> - 2005-06-09 19:20:15
|
Hi All,

We have downloaded protein data in UniProt format for F. tularensis from the URL listed below:

http://www.ebi.ac.uk/integr8/FtpSearch.do?orgProteomeID=20762

We are using the NRDBEntry plugin to load this UniProt data, but it is giving us errors. Are we using the right plugin? Is there another we should be using to load this data?

Thanks,
Surabhi |
From: Y. T. G. <yg...@pc...> - 2005-06-09 17:42:11
|
Is the move completed yet? What is the repository URL? I'd like to use it but cannot seem to find it.

-Thomas

> -----Original Message-----
> From: gus...@li... [mailto:gus...@li...] On Behalf Of Michael Saffitz
> Sent: Thursday, June 09, 2005 11:50 AM
> To: Steve Fischer; Gusdev gusdev-gusdev
> Subject: Re: [GUSDEV] you MUST check in your work
>
> All,
>
> A really effective way of doing this is to use the cvs release command. When done at the top of a project, it will confirm that there are no pending (uncommitted) changes, and then note in the history file that you've released.
>
> You then won't be able to commit changes from that checkout again.
>
> --Mike
>
> On 6/9/05 12:06 PM, "Steve Fischer" <sfi...@pc...> wrote:
>
> > Folks-
> >
> > You MUST check in your work on the GUS project by 6 am EST Friday 6/9 (tomorrow);
> > otherwise your changes will be LOST.
> >
> > As described in previous mail from Mike, we are switching from the Sanger CVS service to a Subversion service at CBIL.
> >
> > steve |