From: Steve F. <sfi...@pc...> - 2005-09-05 15:35:34
Tony-

See below.

- steve

Tony Zhang wrote:

> Hi Chris,
>
> Thanks for your reply.
>
> InsertSequenceFeatures.pm is helpful. I have more questions about this.
> 1. Does InsertSequenceFeatures have the same (or more) functionality
> as GBParser? Should we use InsertSequenceFeatures (supported) instead
> of GBParser (community)?

The functionality that GBParser supports and ISF does not is updates.
ISF is an Insert plugin: it only inserts rows, it never updates rows.
GBParser can update a sequence or its features if you have new versions
of them.

But we are no longer supporting GBParser (that is why it is Community).

Also, the current release of ISF does not yet support restart. It will
very soon, hopefully next week. It will do so by providing an --undo
option that backs out the previous run that failed, so you can rerun it
cleanly after you have corrected whatever the problem was.

> 2. To use InsertSequenceFeatures, should we define "mapFile" (XML)
> ourselves and pass it to InsertSequenceFeatures as an argument?

You can either use the provided map file or define one yourselves. But
only define one if you know that the provided one does not meet your
needs.

> 3. Seems that there are many ways to use GUS to deal with the
> relationship of "DNA->gene->CDS->AASEQUENCE". The documentation of
> InsertSequenceFeatures.pm seems not very straightforward to me. Could
> you or other GUS people give me a suggested method? For example, I
> have an example genbank file below.

ISF uses the BioPerl feature parser. It includes an "unflattener" that
analyzes feature locations of genes, rna, cds, etc. and constructs gene
feature trees of them (i.e., gene models). ISF preserves those
relationships in GUS.

AASequences are stored using the following relationship, specified by a
"special case" called 'aaseq' (see the mapping file).
The special cases are hard-coded in the plugin, so to understand them,
read the code:

    Translation->TranslatedAAFeature->TranslatedAASequence

With respect to the documentation, I have improved the notes provided
with the plugin. Here they are. What else would you like to see?

----------------snip-----------------------------
The BioPerl parser includes an "unflattener" that analyzes feature
locations of genes, rna, cds, etc. and constructs gene feature trees of
them (i.e., gene models). (See the BioPerl API documentation for
Bio::SeqFeature::Tools::Unflattener.) The plugin preserves these
relationships (using the feature's parent_id to capture the tree).

The mapping XML file includes five "special cases." These are cases in
which some of the qualifiers are stored in tables other than the feature
table. The five special cases are: 'dbxref', 'product', 'note', 'gene',
and 'aaseq'. The special cases are hard-coded in the plugin. To
understand how the special cases each work, see the plugin code.
-----------------------------------------------------

>
> LOCUS       NC_006932            2124241 bp    DNA     circular BCT 08-APR-2005
> DEFINITION  Brucella abortus biovar 1 str. 9-941 chromosome I, complete
>             sequence.
> ACCESSION   NC_006932
> VERSION     NC_006932.1  GI:62288991
> KEYWORDS    .
> SOURCE      Brucella abortus biovar 1 str. 9-941
>   ORGANISM  Brucella abortus biovar 1 str. 9-941
>             Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales;
>             Brucellaceae; Brucella.
> REFERENCE   1  (bases 1 to 2124241)
>   AUTHORS   Halling,S.M., Peterson-Burch,B.D., Bricker,B.J., Zuerner,R.L.,
>             Qing,Z., Li,L.L., Kapur,V., Alt,D.P. and Olsen,S.C.
>   TITLE     Completion of the genome sequence of Brucella abortus and
>             comparison to the highly similar genomes of Brucella
>             melitensis and Brucella suis
>   JOURNAL   J. Bacteriol. 187 (8), 2715-2726 (2005)
>    PUBMED   15805518
> REFERENCE   2  (bases 1 to 2124241)
>   AUTHORS   .
>   CONSRTM   NCBI Genome Project
>   TITLE     Direct Submission
>   JOURNAL   Submitted (06-APR-2005) National Center for Biotechnology
>             Information, NIH, Bethesda, MD 20894, USA
> REFERENCE   3  (bases 1 to 2124241)
>   AUTHORS   Halling,S.M., Bricker,B.J., Alt,D.P., Peterson-Burch,B.D.,
>             Zuerner,R.L., Olsen,S.C., Whipple,D.L., Zhang,Q., Li,L.-L. and
>             Kapur,V.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (03-FEB-2004) ARS, USDA, National Animal Disease
>             Center, 2300 N. Dayton, P.O. Box 70, Ames, IA 50010, USA
> COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to
>             final NCBI review. The reference sequence was derived from
>             AE017223.
>             COMPLETENESS: full length.
> FEATURES             Location/Qualifiers
>      source          1..2124241
>                      /organism="Brucella abortus biovar 1 str. 9-941"
>                      /mol_type="genomic DNA"
>                      /strain="9-941"
>                      /db_xref="taxon:262698"
>                      /chromosome="I"
>                      /biovar="1"
>      gene            784..2274
>                      /gene="dnaA"
>                      /locus_tag="BruAb1_0001"
>                      /db_xref="GeneID:3339217"
>      CDS             784..2274
>                      /gene="dnaA"
>                      /locus_tag="BruAb1_0001"
>                      /note="similar to BR0001, chromosomal replication
>                      initiator protein DnaA"
>                      /codon_start=1
>                      /transl_table=11
>                      /product="DnaA, chromosomal replication initiator
>                      protein"
>                      /protein_id="YP_220785.1"
>                      /db_xref="GI:62288992"
>                      /db_xref="GeneID:3339217"
>                      /translation="MKMDSAVSEEAFERLTAKLKARVGGEIYSSWFGRLKLDDISKSI
>                      VRLSVPTAFLRSWINNHYSELLTELWQEENPQILKVEVVVRGVSRVVRSAAPAETCDN
>                      AEAKPAVTPREKMVFPVGQSFGGQSLGEKRGSAVVAESAAATGAVLGSPLDPRYTFDT
>                      FVDGASNRVALAAARTIAEAGSSAVRFNPLFIHASVGLGKTHLLQAIAAAALQRQEKA
>                      RVVYLTAEYFMWRFATAIRDNNALSFKEQLRDIDLLVIDDMQFLQGKSIQHEFCHLLN
>                      TLLDSAKQVVVAADRAPSELESLDVRVRSRLQGGVALEVAAPDYEMRLEMLRRRLASA
>                      QCEDASLDIGEEILAHVARTVTGSGRELEGAFNQLLFRQSFEPNISIDRVDELLGHLT
>                      RAGEPKRIRIEEIQRIVARHYNVSKQDLLSNRRTRTIVKPRQVAMYLAKMMTPRSLPE
>                      IGRRFGGRDHTTVLHAVRKIEDLVGADTKLAQELELLKRLINDQAA"
> ...
>
> ORIGIN
>         1 ttttccacac ttatccacag ggcgcgggcg ggactcggtt gcccctctga gtcaagcata
> ...
>
> I consider myself a newbie to GUS and the above questions may be naive
> to many of you. However, your help is highly appreciated.
>
> Thanks again.
>
> - Tony
>
> Chris Stoeckert wrote:
>
>> Hi Tony,
>> A beta release of the plugin InsertSequenceFeatures.pm is now
>> available for loading GenBank records (as well as other formats). The
>> semantics of how to use the GUS tables for this purpose are encoded
>> in that plugin. Is that what you were asking?
>> Chris
>>
>> On Sep 2, 2005, at 10:31 AM, Tony Zhang wrote:
>>
>>> This is an old topic and it may have been discussed many times. Still,
>>> I would like to get suggestions again about how to use GUS to store
>>> DNA, CDS, and the corresponding protein sequence. Suppose I have one
>>> original record in GenBank format. Thanks.
>>>
>>> - Tony
>>>
>>> -------------------------------------------------------
>>> SF.Net email is Sponsored by the Better Software Conference & EXPO
>>> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
>>> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
>>> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
>>> _______________________________________________
>>> Gusdev-gusdev mailing list
>>> Gus...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>>
>> Chris Stoeckert, Ph.D.
>> Research Associate Professor, Dept. of Genetics
>> 1415 Blockley Hall, Center for Bioinformatics
>> 423 Guardian Dr., University of Pennsylvania
>> Philadelphia, PA 19104
>> Ph: 215-573-4409 FAX: 215-573-3111
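
As a rough illustration of the gene-model idea described in the reply above (an unflattener groups flat features into trees, and the plugin records each feature's parent_id), here is a minimal Python sketch. It is not the algorithm used by Bio::SeqFeature::Tools::Unflattener and it is not the plugin's code. The Feature class, the RANK table, and the build_tree function are hypothetical names, and parentage is assumed to follow simple location containment. The coordinates are those of the dnaA gene and CDS in the quoted record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Feature:
    """A flat feature as it might appear in a GenBank feature table (hypothetical)."""
    feature_id: int
    kind: str                      # e.g. "gene", "CDS", "source"
    start: int
    end: int
    qualifiers: Dict[str, str] = field(default_factory=dict)
    parent_id: Optional[int] = None  # filled in by build_tree()


# Assumed nesting order for this sketch: gene > mRNA > CDS.
RANK = {"gene": 0, "mRNA": 1, "CDS": 2}


def build_tree(features: List[Feature]) -> List[Feature]:
    """Set parent_id on each feature using location containment.

    A toy stand-in for an unflattener: a feature's parent is the
    enclosing feature of the nearest higher level in RANK.
    Feature kinds not listed in RANK (such as "source") are ignored.
    """
    ranked = sorted(
        (f for f in features if f.kind in RANK),
        key=lambda f: (RANK[f.kind], f.start),
    )
    for child in ranked:
        enclosing = [
            p for p in ranked
            if RANK[p.kind] < RANK[child.kind]
            and p.start <= child.start
            and child.end <= p.end
        ]
        if enclosing:
            parent = max(enclosing, key=lambda p: RANK[p.kind])
            child.parent_id = parent.feature_id
    return ranked


if __name__ == "__main__":
    gene = Feature(1, "gene", 784, 2274, {"gene": "dnaA"})
    cds = Feature(2, "CDS", 784, 2274, {"gene": "dnaA", "protein_id": "YP_220785.1"})
    build_tree([cds, gene])
    print(cds.parent_id)  # 1, the feature_id of the enclosing gene
```

The real unflattener handles many cases this sketch does not (split locations, strand, shared exons, missing gene features), so treat it only as a picture of what "preserving the tree via parent_id" means.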