folks-
We expect that LoadAnnotatedSeqs will use BioPerl's
Bio::SeqFeature::Tools::Unflattener to create the feature hierarchy from
GenBank/EMBL records. (the TIGR XML already has that info.)
the unflattener (written by Chris Mungall) has a "smart" algorithm that
infers the containment relationships that are only implicit in the
sloppy GenBank syntax.
here is the documentation for the unflattener:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqFeature/Tools/Unflattener.pm?rev=1.25&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
if you are interested, you might want to check out the algorithm.
by default, the unflattener only handles:
Gene, RNA, CDS, exon, intron and pseudogenes
probably we'll start out either using that default or modifying it to
include, say, polyA_signals.
we would leave it to future iterations of the plugin to make the
containment rules configurable.
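
to make "containment" concrete, here is a rough sketch in Python (not
the Unflattener's actual algorithm, which uses more heuristics than
this). it assumes containment can be decided from coordinates alone,
and the names Feature, DEFAULT_CONTAINMENT and build_hierarchy are
made up for illustration. the map from child type to parent type is
also an assumption; the last lines show how adding polyA_signal to it
would change the result.

```python
from dataclasses import dataclass, field

# Hypothetical containment map: child feature type -> expected parent type.
# This is an illustrative assumption, not the Unflattener's data structure.
DEFAULT_CONTAINMENT = {
    "mRNA": "gene",
    "CDS": "mRNA",
    "exon": "mRNA",
    "intron": "mRNA",
}


@dataclass
class Feature:
    type: str
    start: int
    end: int
    children: list = field(default_factory=list)


def contains(outer, inner):
    """True if inner lies entirely within outer's coordinates."""
    return outer.start <= inner.start and inner.end <= outer.end


def build_hierarchy(features, containment=DEFAULT_CONTAINMENT):
    """Attach each feature to the smallest enclosing feature of its
    expected parent type; return the features left at the top level."""
    top = []
    for feat in features:
        parent_type = containment.get(feat.type)
        candidates = [
            p for p in features
            if p is not feat and p.type == parent_type and contains(p, feat)
        ]
        if not candidates:
            # Unknown type, or no enclosing parent found: keep at top level.
            top.append(feat)
            continue
        parent = min(candidates, key=lambda p: p.end - p.start)
        parent.children.append(feat)
    return top


# A flat feature list, as it might come out of a GenBank record.
flat = [
    Feature("gene", 100, 900),
    Feature("mRNA", 100, 900),
    Feature("CDS", 200, 800),
    Feature("polyA_signal", 850, 860),
]

# Extending the default map so polyA_signal is nested under its gene.
extended = {**DEFAULT_CONTAINMENT, "polyA_signal": "gene"}
top = build_hierarchy(flat, extended)
```

with the default map, the polyA_signal would simply stay at the top
level next to the gene, since the map has no parent type for it.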
steve