From: Aaron J. M. <am...@pc...> - 2005-07-14 19:50:09
As we struggle to use GUS the "right way", this is throwing us for a loop. On the one hand, our GUS client applications want to see features in the coordinate system of the assembly (i.e. the virtual sequence) -- on the other hand, it makes sense from a data integrity viewpoint to only load/store feature coordinates with respect to the static underlying scaffold coordinates, since the scaffold-to-chromosome mapping (as defined by DoTS.SequencePiece) may change over time.

One option is to instantiate a read-only materialized view of the NALocation for clients to use.

A second option (which we've just discussed, and people seem to like) is for the InsertVirtualSequenceFromMapping plugin we just wrote to (re)generate duplicate versions of all NALocations attached to a given SequencePiece in the new coordinate system (requiring the virtual sequence building to be the last step in our pipeline, instead of the first).

-Aaron

On Jul 14, 2005, at 2:53 PM, Chris Stoeckert wrote:

> Hi Aaron,
> I don't have a strong argument for either way. In terms of
> coordinate mapping utilities, I'm not aware of one so certainly
> would welcome yours (but if others know of ones please speak up).
>
> Chris
>
> On Jul 14, 2005, at 11:13 AM, Aaron J. Mackey wrote:
>
>> Thanks Chris, I got it.
>>
>> If we are going to start hanging features off these, should we
>> hang them off the virtual chromosome sequence entries, or the
>> scaffold entries in externalnasequence? Would it make sense to
>> "codify" this usage with associated PL/SQL code to reconstruct
>> virtual sequence and associated features in the virtual coordinate
>> space? I guess one way to do this would be to have
>> Virtual*Feature read-only views (and thus target everything to the
>> "real" coordinate system such that future rebuilds of the virtual
>> sequence would not require recalculation of feature locations)?
>>
>> Relatedly, is there coordinate mapping code already in some GUS
>> utility module (if not, I'm happy to contribute mine, based on
>> BioPerl's powerful Bio::Coordinate::Map framework)?
>>
>> -Aaron
>>
>> On Jul 14, 2005, at 11:05 AM, Chris Stoeckert wrote:
>>
>>> Hi Aaron,
>>>
>>>> 1) VirtualSequence has a required sequence_version attribute -
>>>> what is this for? Is this redundant to
>>>> external_database_release_id?
>>>
>>> This is a superclass attribute inherited by all NASequence views.
>>> My recollection is that individual GenBank sequence entries have
>>> version tags at the end of accessions as in "DQ094190.1" for
>>> Toxoplasma gondii ATP-binding cassette protein subfamily B member
>>> 3 (found in VERSION field).
>>>
>>>> 2) VirtualSequence has a clob for storing the assembled sequence
>>>> (I suspect), but the Perl object layer doesn't use this slot,
>>>> instead rebuilding the sequence from the sequence pieces. Am I
>>>> correct in this usage, or should I not, in fact, be storing the
>>>> assembled sequence in VirtualSequence?
>>>
>>> Again this is a superclass attribute. I think using it is
>>> optional. Reasons not to use it are that the virtual sequence is
>>> hard to represent as a single entity (e.g., contains gaps) or is
>>> very large and has a significant overhead cost of storing what
>>> can be easily regenerated (and avoid denormalization). Reasons to
>>> use are for convenience and efficiency of retrieving the sequence
>>> without the need to rebuild.
>>>
>>> Chris
>>>
>>>> Thanks,
>>>>
>>>> -Aaron
>>>>
>>>> --
>>>> Aaron J. Mackey, Ph.D.
>>>> Project Manager, ApiDB Bioinformatics Resource Center
>>>> Penn Genomics Institute, University of Pennsylvania
>>>> email: am...@pc...
>>>> office: 215-898-1205
>>>> fax: 215-746-6697
>>>> postal: Penn Genomics Institute
>>>> Goddard Labs 212
>>>> 415 S. University Avenue
>>>> Philadelphia, PA 19104-6017
>>>>
>>>> _______________________________________________
>>>> Gusdev-gusdev mailing list
>>>> Gus...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
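
For illustration only: the thread's central point is that features stored in scaffold coordinates can be re-projected into virtual-sequence coordinates whenever the scaffold-to-chromosome mapping changes. Below is a minimal Python sketch of that projection. It is not GUS or BioPerl code; the `Piece` and `Location` classes, their field names, and the `to_virtual` function are all hypothetical and do not correspond to actual DoTS.SequencePiece or NALocation columns. It assumes 1-based inclusive coordinates and that each scaffold is placed whole, either forward or reverse-complemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """Placement of one scaffold on the virtual sequence (hypothetical fields)."""
    scaffold_id: str
    virtual_start: int        # 1-based virtual position of the leftmost placed base
    length: int               # scaffold length in bases
    is_reversed: bool = False  # True if the scaffold is placed reverse-complemented


@dataclass(frozen=True)
class Location:
    """A feature location: 1-based, inclusive, with strand +1 or -1."""
    start: int
    end: int
    strand: int


def to_virtual(loc: Location, piece: Piece) -> Location:
    """Project a scaffold-coordinate location into virtual-sequence coordinates."""
    if not (1 <= loc.start <= loc.end <= piece.length):
        raise ValueError("location falls outside the scaffold")
    if piece.is_reversed:
        # Scaffold base i lands at virtual_start + length - i, so the
        # interval ends swap and the strand flips.
        start = piece.virtual_start + piece.length - loc.end
        end = piece.virtual_start + piece.length - loc.start
        return Location(start, end, -loc.strand)
    offset = piece.virtual_start - 1
    return Location(loc.start + offset, loc.end + offset, loc.strand)


if __name__ == "__main__":
    feature = Location(start=10, end=20, strand=+1)
    forward = Piece("scaffold_7", virtual_start=1001, length=500)
    flipped = Piece("scaffold_7", virtual_start=1001, length=500, is_reversed=True)
    print(to_virtual(feature, forward))
    print(to_virtual(feature, flipped))
```

Under either option discussed above (a read-only view or regenerated duplicate locations), the stored scaffold-coordinate record stays untouched; only the projected copy changes when a piece is moved or flipped.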