From: Steve F. <sfi...@pc...> - 2006-03-09 16:23:58
ricardo-

no we don't have any immediate plans to improve ISF in this direction.

what i think you should do is write a pre-processor which searches for known issues. (we pre-process our data all the time.)

for example, you could write a program that passes each genbank record separately to bioperl/unflattener for parsing (steal that code from ISF). if there is an exception, catch it (use perl's eval statement), write the broken accession to a log file, and keep going.

now you have a list of accessions to inspect and/or programmatically remove.

steve

Ricardo Balbi wrote:
> Steve,
>
> We have a 1.5 GB GenBank file, and fixing every issue we find
> manually is a very costly task, because we do not have that much
> time. Do you have plans to improve the ISF in this direction?
>
> Ricardo
>
> On 3/7/06, *Steve Fischer* <sfi...@pc...> wrote:
>
> > ISF does not now support any way to ignore a GI.
> >
> > so, for now, copy the genbank file to a .fixed version, and fix the
> > problem, either by removing the GI or by tweaking the features
> >
> > steve
> >
> > Ricardo Balbi wrote:
> >
> > > Ok Aaron, but what do we have to do? edit the GenBank file? ignore
> > > this GI? if yes, how to ignore?
> > >
> > > On 2/16/06, *Aaron J. Mackey* <am...@pc...> wrote:
> > >
> > > > On Feb 17, 2006, at 6:58 PM, davila wrote:
> > > >
> > > > > 3. 102829.txt - Apparently a BioPerl error
> > > > > A 102829.gb file is an example
> > > >
> > > > Read the error message, it tells you exactly what is wrong:
> > > >
> > > > ------------- EXCEPTION -------------
> > > > MSG: 1 there is a conflict with exons; there was an explicitly stated
> > > > exon with location 12..21, yet I cannot generate this exon from the
> > > > supplied mRNA locations
> > > > 1 There are some inferred exons that are not in the explicit exon
> > > > list; they are the exons at locations:
> > > > 25..447
> > > >
> > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > > > /usr/lib/perl5/site_perl/5.8.6/Bio/SeqFeature/Tools/Unflattener.pm:1606
> > > > ...
> > > >
> > > > Then look at the GenBank file:
> > > >
> > > >      gene            1..447
> > > >                      /gene="ESAG1"
> > > >      primer_bind     1..11
> > > >                      /gene="ESAG1"
> > > >      exon            <12..21
> > > >                      /gene="ESAG1"
> > > >                      /note="min-exon"
> > > >      CDS             25..>447
> > > >                      /gene="ESAG1"
> > > >
> > > > I have to agree with the Unflattener, this doesn't make any sense at
> > > > all.
> > > >
> > > > Garbage in, garbage out.
> > > >
> > > > -Aaron
> > > >
> > > > --
> > > > Aaron J. Mackey, Ph.D.
> > > > Project Manager, ApiDB Bioinformatics Resource Center
> > > > Penn Genomics Institute, University of Pennsylvania
> > > > email:  am...@pc...
> > > > office: 215-898-1205 (Biology, 212 Goddard Labs)
> > > >         215-746-7018 (PCBI, 1428 Blockley Hall)
> > > > fax:    215-746-6697 (Penn Genomics Institute)
> > > > postal: Penn Genomics Institute
> > > >         Goddard Labs 212
> > > >         415 S. University Avenue
> > > >         Philadelphia, PA 19104-6017
> > > >
> > > > _______________________________________________
> > > > Gusdev-gusdev mailing list
> > > > Gus...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
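
A rough sketch of the pre-processor described in Steve's reply at the top of this thread. Steve suggests Perl's eval around the BioPerl/Unflattener call; the version here uses Python's try/except as the analogous construct, purely to show the control flow. The function names (iter_genbank_records, accession_of, find_broken_records) and the parse_record callback are hypothetical and are not part of ISF or BioPerl; parse_record stands in for whatever parsing step ISF actually runs. It also assumes each record ends with a line containing only "//" and carries an ACCESSION or LOCUS line.

```python
import io
import re
from typing import Callable, Iterator, List, TextIO


def iter_genbank_records(handle: TextIO) -> Iterator[str]:
    """Yield one GenBank record at a time from a multi-record flat file.

    Assumption: a record ends with a line containing only '//'.
    """
    lines: List[str] = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    # Emit a trailing record that lacks the '//' terminator, if any.
    if any(text.strip() for text in lines):
        yield "".join(lines)


def accession_of(record: str) -> str:
    """Return the first ACCESSION value, falling back to the LOCUS name."""
    for key in ("ACCESSION", "LOCUS"):
        match = re.search(rf"^{key}\s+(\S+)", record, re.MULTILINE)
        if match:
            return match.group(1)
    return "UNKNOWN"


def find_broken_records(
    handle: TextIO,
    parse_record: Callable[[str], object],  # hypothetical stand-in for the real parser
    log: TextIO,
) -> int:
    """Parse each record separately; log failures and keep going.

    Returns the number of records whose parse raised an exception.
    """
    broken = 0
    for record in iter_genbank_records(handle):
        try:
            parse_record(record)
        except Exception as exc:  # same role as Perl's eval { ... } block
            broken += 1
            message = str(exc).strip()
            first_line = message.splitlines()[0] if message else type(exc).__name__
            log.write(f"{accession_of(record)}\t{first_line}\n")
    return broken
```

Each line of the log holds one accession and the first line of its error, so the file can be read by eye or fed to a second pass that drops those records before loading.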