From: Aaron J. M. <am...@pc...> - 2005-07-14 21:52:23
Exactly. No logic is required, because we simply copy any and all NALocation objects attached to the sequences and generate new NALocation objects that point to the virtual sequence, with new coordinates/strand, but all other foreign keys remain the same (i.e. children of the same feature).

Hmm, that means that if you blindly pull locations for a given feature, you will get two locations, not just one (so you'll need to specify which reference sequence you wish to obtain the location on).

-Aaron

On Jul 14, 2005, at 5:41 PM, Chris Stoeckert wrote:

> Let's see if I understand your proposal. Generate features and
> locations based on the static scaffold sequence coordinates. Then
> at the end of the pipeline generate the same (conceptual) features
> with locations based on the virtual sequence coordinates. That
> makes sense to me. The advantage is that you have both, one that is
> stable (scaffold) and one that can be regenerated as needed
> (virtual) but stored for convenience. I don't really see a
> disadvantage - sure it's twice as many rows but if you materialize
> a view you are adding these anyway.
>
> Chris
>
> On Jul 14, 2005, at 3:50 PM, Aaron J. Mackey wrote:
>
>> As we struggle to use GUS the "right way", this is throwing us for
>> a loop. On the one hand, our GUS client applications want to see
>> features in the coordinate system of the assembly (i.e. the
>> virtual sequence) -- on the other hand, it makes sense from a data
>> integrity viewpoint to only load/store feature coordinates with
>> respect to the static underlying scaffold coordinates, since the
>> scaffold-to-chromosome mapping (as defined by DoTS.SequencePiece)
>> may change over time.
>>
>> One option is to instantiate a read-only materialized view of the
>> NALocation for clients to use.
>>
>> A second option (which we've just discussed, and people seem to
>> like) is for the InsertVirtualSequenceFromMapping plugin we just
>> wrote to (re)generate duplicate versions of all NALocations
>> attached to a given SequencePiece in the new coordinate system
>> (requiring the virtual sequence building to be the last step in
>> our pipeline, instead of the first).
>>
>> -Aaron
>>
>> On Jul 14, 2005, at 2:53 PM, Chris Stoeckert wrote:
>>
>>> Hi Aaron,
>>> I don't have a strong argument either way. In terms of
>>> coordinate mapping utilities, I'm not aware of one so certainly
>>> would welcome yours (but if others know of any please speak up).
>>>
>>> Chris
>>>
>>> On Jul 14, 2005, at 11:13 AM, Aaron J. Mackey wrote:
>>>
>>>> Thanks Chris, I got it.
>>>>
>>>> If we are going to start hanging features off these, should we
>>>> hang them off the virtual chromosome sequence entries, or the
>>>> scaffold entries in externalnasequence? Would it make sense to
>>>> "codify" this usage with associated PL/SQL code to reconstruct
>>>> virtual sequence and associated features in the virtual
>>>> coordinate space? I guess one way to do this would be to have
>>>> Virtual*Feature read-only views (and thus target everything to
>>>> the "real" coordinate system such that future rebuilds of the
>>>> virtual sequence would not require recalculation of feature
>>>> locations)?
>>>>
>>>> Relatedly, is there coordinate mapping code already in some GUS
>>>> utility module (if not, I'm happy to contribute mine, based on
>>>> BioPerl's powerful Bio::Coordinate::Map framework)?
>>>>
>>>> -Aaron
>>>>
>>>> On Jul 14, 2005, at 11:05 AM, Chris Stoeckert wrote:
>>>>
>>>>> Hi Aaron,
>>>>>
>>>>>> 1) VirtualSequence has a required sequence_version attribute -
>>>>>> what is this for? Is this redundant to
>>>>>> external_database_release_id?
>>>>>
>>>>> This is a superclass attribute inherited by all NASequence
>>>>> views. My recollection is that individual GenBank sequence
>>>>> entries have version tags at the end of accessions as in
>>>>> "DQ094190.1" for Toxoplasma gondii ATP-binding cassette protein
>>>>> subfamily B member 3 (found in VERSION field).
>>>>>
>>>>>> 2) VirtualSequence has a clob for storing the assembled
>>>>>> sequence (I suspect), but the Perl object layer doesn't use
>>>>>> this slot, instead rebuilding the sequence from the sequence
>>>>>> pieces. Am I correct in this usage, or should I not, in fact,
>>>>>> be storing the assembled sequence in VirtualSequence?
>>>>>
>>>>> Again this is a superclass attribute. I think using it is
>>>>> optional. Reasons not to use it are that the virtual sequence
>>>>> is hard to represent as a single entity (e.g., contains gaps)
>>>>> or is very large and has a significant overhead cost of storing
>>>>> what can be easily regenerated (and avoiding denormalization).
>>>>> Reasons to use it are convenience and efficiency of retrieving
>>>>> the sequence without the need to rebuild.
>>>>>
>>>>> Chris
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Aaron
>>>>>>
>>>>>> --
>>>>>> Aaron J. Mackey, Ph.D.
>>>>>> Project Manager, ApiDB Bioinformatics Resource Center
>>>>>> Penn Genomics Institute, University of Pennsylvania
>>>>>> email: am...@pc...
>>>>>> office: 215-898-1205
>>>>>> fax: 215-746-6697
>>>>>> postal: Penn Genomics Institute
>>>>>> Goddard Labs 212
>>>>>> 415 S. University Avenue
>>>>>> Philadelphia, PA 19104-6017
>>>>>> _______________________________________________
>>>>>> Gusdev-gusdev mailing list
>>>>>> Gus...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email: am...@pc...
office: 215-898-1205
fax: 215-746-6697
postal: Penn Genomics Institute
Goddard Labs 212
415 S. University Avenue
Philadelphia, PA 19104-6017
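
For readers following the thread, here is a minimal sketch of the duplication step discussed at the top of this message: each location on a scaffold is copied to the virtual sequence with new coordinates and strand, and the feature reference is left unchanged. It is written in Python purely for illustration and is not the InsertVirtualSequenceFromMapping plugin or any GUS API. The class names, field names (`offset`, `is_reversed`, `sequence_id`, and so on), and the 1-based inclusive coordinate convention are all assumptions made for the example, not the actual NALocation or SequencePiece columns.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Piece:
    """Hypothetical stand-in for one scaffold's placement on a virtual sequence."""
    scaffold_id: str
    virtual_id: str
    offset: int          # number of virtual bases that precede this scaffold
    length: int          # scaffold length in bases
    is_reversed: bool = False


@dataclass(frozen=True)
class Location:
    """Hypothetical stand-in for a location row (1-based, inclusive)."""
    feature_id: int
    sequence_id: str
    start: int
    end: int
    strand: int          # +1 or -1


def to_virtual(loc: Location, piece: Piece) -> Location:
    """Copy a scaffold location into virtual coordinates, keeping feature_id."""
    if loc.sequence_id != piece.scaffold_id:
        raise ValueError("location is not on this piece's scaffold")
    if piece.is_reversed:
        # A reversed scaffold flips both the coordinates and the strand.
        start = piece.offset + piece.length - loc.end + 1
        end = piece.offset + piece.length - loc.start + 1
        strand = -loc.strand
    else:
        start = piece.offset + loc.start
        end = piece.offset + loc.end
        strand = loc.strand
    return replace(loc, sequence_id=piece.virtual_id,
                   start=start, end=end, strand=strand)


def duplicate_locations(locs: List[Location], piece: Piece) -> List[Location]:
    """Return the original locations plus their virtual-coordinate copies."""
    copies = [to_virtual(l, piece) for l in locs
              if l.sequence_id == piece.scaffold_id]
    return list(locs) + copies


def locations_for(feature_id: int, sequence_id: str,
                  locs: List[Location]) -> List[Location]:
    """Pull a feature's locations on one named reference sequence only."""
    return [l for l in locs
            if l.feature_id == feature_id and l.sequence_id == sequence_id]
```

After `duplicate_locations` runs, a feature has one location per reference sequence, so a lookup such as `locations_for` has to name the sequence it wants; that is the "two locations, not just one" caveat raised above.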