From: Susan W. <ax...@me...> - 2008-11-06 19:07:02
Back again. Sorry to have to have my hand held through all this, but I think there is still a problem with the GFF file:

$ perl gmod_bulk_load_gff3.pl --recreate_cache --dbname dev_chado_01c --dbxref GeneID --organism fromdata --gff /oracle/flybase-dmel_r5.9/dmel-2L-r5.12.gff
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the dev_chado_01c database
(This may take a while ...)
Unable to find srcfeature 2L in the database.
Perhaps you need to rerun your data load with the '--recreate_cache' option. at /oracle/genbank2chado/lib/Bio/GMOD/DB/Adapter.pm line 3887
Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x89b419c)', 'Bio::SeqFeature::Annotated=HASH(0x89a2074)') called at gmod_bulk_load_gff3.pl line 692
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().

$ head dmel-2L-r5.12.gff
##gff-version 3
##sequence-region 2L -204333 23011544
2L FlyBase chromosome_band -204333 1326937 . + . ID=band-21_chromosome_band;Name=band-21
2L FlyBase chromosome_band -204333 22221 . + . ID=band-21A_chromosome_band;Name=band-21A
2L FlyBase chromosome_band -204333 -153714 . + . ID=band-21A1_chromosome_band;Name=band-21A1
2L FlyBase chromosome_band -153713 -101818 . + . ID=band-21A2_chromosome_band;Name=band-21A2
2L FlyBase chromosome_band -101817 -66427 . + . ID=band-21A3_chromosome_band;Name=band-21A3
2L FlyBase chromosome_band -66426 -22869 . + . ID=band-21A4_chromosome_band;Name=band-21A4
2L FlyBase chromosome_band -22868 22221 . + . ID=band-21A5_chromosome_band;Name=band-21A5
2L FlyBase chromosome_arm 1 23011544 . . . ID=2L;Dbxref=GB:AE014134

I tried a single # on the sequence-region line. I also tried deleting the sequence-region line. Same difference...

Susan

On Nov 6, 2008, at 11:37 AM, Scott Cain wrote:

> Hi Susan,
>
> There are two problems: most immediately, the bulk load script doesn't
> uncompress files for you, so you'll need to ungzip the file:
>
> gzip -d dmel-all-r5.12.gff.gz
>
> Second, the bulk loader doesn't deal well with really huge files (like
> a whole fly genome), so it would be best to use the individual arm
> files and load them separately.
>
> Scott
>
>
> On Thu, Nov 6, 2008 at 1:32 PM, axiom7 <ax...@me...> wrote:
>>
>> Hi again,
>>
>> I downloaded dmel-all-r5.12.gff.gz from flybase, but now I have the
>> following problem:
>>
>> perl gmod_bulk_load_gff3.pl --dbname dev_chado_01c --dbxref GeneID --organism fromdata --gff /oracle/flybase-dmel_r5.9/dmel-all-r5.12.gff.gz
>> Preparing data for inserting into the dev_chado_01c database
>> (This may take a while ...)
>> Use of uninitialized value in pattern match (m//) at gmod_bulk_load_gff3.pl line 661, <GEN0> line 1.
>> Use of uninitialized value in pattern match (m//) at gmod_bulk_load_gff3.pl line 679, <GEN0> line 1.
>> no cvterm for at /oracle/genbank2chado/lib/Bio/GMOD/DB/Adapter.pm line 3911, <GEN0> line 1.
>> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
>>
>> My GMOD_ROOT is created by following instructions at
>> http://gmod.org/wiki/Chado:
>>
>> cvs -d:pserver:ano...@gm...:/cvsroot/gmod login
>>
>> Enter blank password. Then do:
>>
>> cvs -d:pserver:ano...@gm...:/cvsroot/gmod co schema
>>
>> and then doing a make;make install
>>
>> Susan
>>
>> axiom7 wrote:
>>>
>>> Hi,
>>>
>>> I have filed the anomaly in the gmod project as you suggested. I didn't
>>> use the flybase data source, as I was following the directions from gmod
>>> for the genbank2chado package. I will try the other source(s) you
>>> suggested and get back to you.
>>>
>>> Thanks Scott.
>>> Susan
>>>
>>>
>>> Scott Cain-3 wrote:
>>>>
>>>> Hi Susan,
>>>>
>>>> I can certainly see what is wrong; the fix is another matter: GFF3
>>>> lines are only allowed to have a single ID, but the mRNA line you
>>>> pointed to has two: CG17683.t01 and CG17683.t06. Why this happened is
>>>> not clear to me; I would have to assume a bug in bp_genbank2gff3.pl.
>>>> If you could file this as a bug in the gmod project (as part of
>>>> Chado), I should be able to look at it in the next few days:
>>>>
>>>> https://sourceforge.net/tracker2/?group_id=27707&atid=391291
>>>>
>>>> On another track, why aren't you using the Dmel GFF3 from flybase:
>>>>
>>>> ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r5.9_FB2008_06/gff/
>>>>
>>>> (Full disclosure: I haven't tried to load the flybase GFF into a Chado
>>>> instance recently, so I can't comment on whether it will really work
>>>> or not--but it has a much better chance). Or, using the flybase
>>>> database dump of Chado:
>>>>
>>>> ftp://ftp.flybase.net/releases/current/psql/
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Thu, Nov 6, 2008 at 11:07 AM, axiom7 <ax...@me...> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have downloaded the Drosophila melanogaster *.gbk.gz files from
>>>>> bio-mirror.net/biomirror/ncbigenomes/Drosophila_melanogaster and run
>>>>> bp_genbank2gff3.pl on them to create the *.gbk.gz.gff files. However, the
>>>>> load fails immediately:
>>>>>
>>>>> perl bin/gmod_bulk_load_gff3.pl --dbname dev_chado_01c -dbxref GeneID --organism fromdata --gff data/Drosophila_melanogaster/CHR_2/NT_033778.gbk.gz.gff
>>>>> (Re)creating the uniquename cache in the database...
>>>>> Creating table...
>>>>> Populating table...
>>>>> Creating indexes...Done.
>>>>> Preparing data for inserting into the dev_chado_01c database
>>>>> (This may take a while ...)
>>>>> Organism Drosophila melanogaster from data
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: Error in line:
>>>>> NT_033778 GenBank mRNA 18442 18629 . + . ID=CG17683.t01,CG17683.t06;Parent=CG17683,CG17683;locus_tag=Dmel_CG17683;gene=CG17683;product=CG17683-RA%2C transcript variant A;Dbxref=GI:116007463,FLYBASE:FBgn0040002,GeneID:3355011;transcript_id=NM_001042963.1
>>>>>
>>>>> A feature may have at most one ID value
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /oracle/genbank2chado/lib/Bio/Root/Root.pm:359
>>>>> STACK: Bio::FeatureIO::gff::_handle_feature /oracle/genbank2chado/lib/Bio/FeatureIO/gff.pm:696
>>>>> STACK: Bio::FeatureIO::gff::next_feature /oracle/genbank2chado/lib/Bio/FeatureIO/gff.pm:165
>>>>> STACK: bin/gmod_bulk_load_gff3.pl:819
>>>>> -----------------------------------------------------------
>>>>> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
>>>>>
>>>>> The "head" command on the file is as follows, which shows the script failing
>>>>> on the first mRNA line:
>>>>>
>>>>> head data/Drosophila_melanogaster/CHR_2/NT_033778.gbk.gz.gff
>>>>> ##gff-version 3
>>>>> # sequence-region NT_033778 1 21146708
>>>>> # conversion-by bp_genbank2gff3.pl
>>>>> # organism Drosophila melanogaster
>>>>> # date 14-MAY-2008
>>>>> # Note Drosophila melanogaster chromosome 2R.
>>>>> NT_033778 GenBank chromosome 1 21146708 . + . ID=NT_033778;mol_type=genomic DNA;date=14-MAY-2008;comment1=REVIEWED REFSEQ: This record has been curated by FlyBase. The reference sequence was derived from AE013599. On Oct 10%2C 2006 this sequence version replaced gi:56407907. COMPLETENESS: full length. ;Note=Drosophila melanogaster chromosome 2R.;Alias=2R;chromosome=2R;Dbxref=taxon:7227;organism=Drosophila melanogaster
>>>>> NT_033778 GenBank region 1 1285689 . + . ID=GenBank:region:NT_033778:1:1285689;Note=Heterochromatic sequence
>>>>> NT_033778 GenBank gene 18442 20468 . + . ID=CG17683;locus_tag=Dmel_CG17683;gene=CG17683;Note=CG17683%3B Annotated by Drosophila Heterochromatin Genome Project%2C Lawrence Berkeley National Lab%2C http://www.dhgp.org;Dbxref=FLYBASE:FBgn0040002,GeneID:3355011
>>>>> NT_033778 GenBank mRNA 18442 18629 . + . ID=CG17683.t01,CG17683.t06;Parent=CG17683,CG17683;locus_tag=Dmel_CG17683;gene=CG17683;product=CG17683-RA%2C transcript variant A;Dbxref=GI:116007463,FLYBASE:FBgn0040002,GeneID:3355011;transcript_id=NM_001042963.1
>>>>>
>>>>> I obtained the scripts from
>>>>> rsync://eugenes.org/argos/gmod/web/gmod/genbank2chado:
>>>>>
>>>>> head bin/bp_genbank2gff3.pl
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> #$Id: genbank2gff3.PLS,v 1.11 2007/03/19 16:42:05 bosborne Exp $;
>>>>>
>>>>>
>>>>> head bin/gmod_bulk_load_gff3.pl
>>>>> #!/usr/bin/perl
>>>>>
>>>>>
>>>>> =item dgg notes, 2007 march
>>>>>
>>>>> Can anybody see what is wrong with this?
>>>>>
>>>>> Thanks.
>>>>> Susan
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/gmod_bulk_load_gff3-of-Drosophila-melanogaster-fails-tp20364068p20364068.html
>>>>> Sent from the gmod-devel mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------
>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>>>>> Grand prize is a trip for two to an Open Source event anywhere in the world
>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>>> _______________________________________________
>>>>> Gmod-devel mailing list
>>>>> Gmo...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-devel
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D. scott at scottcain dot net
>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>>> Ontario Institute for Cancer Research
>>>>
>>>
>>>
>>
>> --
>> View this message in context: http://www.nabble.com/gmod_bulk_load_gff3-of-Drosophila-melanogaster-fails-tp20364068p20367047.html
>> Sent from the gmod-devel mailing list archive at Nabble.com.
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
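
Note for readers of this thread: several patterns in the files quoted above can be looked for before a load is attempted. These are a gzipped file passed directly to the loader, a feature line carrying more than one ID value, coordinates below 1 (as in the chromosome_band lines of the 2L file), and a seqid such as 2L that is used before any feature with ID=2L appears. The following is a minimal Python sketch of such a pre-check. It is not part of GMOD or BioPerl, the function names open_gff and check_gff3 are hypothetical, and it assumes tab-separated GFF3 columns. Whether the last two patterns are what triggers the "Unable to find srcfeature 2L" error is not established in this thread; the script only reports what it sees.

```python
import gzip


def open_gff(path):
    """Open a GFF3 file as text; gzip is detected by its two magic bytes."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def check_gff3(lines):
    """Yield (line_number, message) for each pattern found in GFF3 text lines."""
    seen_ids = set()   # every ID= value encountered so far
    reported = set()   # seqids already flagged, to avoid repeated messages
    for n, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            yield n, "expected 9 tab-separated columns, found %d" % len(cols)
            continue
        seqid, start, end, attributes = cols[0], cols[3], cols[4], cols[8]

        # Collect the values of the ID attribute (comma-separated if several).
        ids = []
        for pair in attributes.split(";"):
            key, _, value = pair.partition("=")
            if key.strip() == "ID":
                ids.extend(v for v in value.split(",") if v)
        if len(ids) > 1:
            yield n, "more than one ID value: " + ", ".join(ids)
        seen_ids.update(ids)

        # A feature that defines its own seqid (ID=2L on seqid 2L) passes here.
        if seqid not in seen_ids and seqid not in reported:
            reported.add(seqid)
            yield n, "seqid '%s' used before any feature with that ID" % seqid

        for name, text in (("start", start), ("end", end)):
            try:
                if int(text) < 1:
                    yield n, "%s coordinate below 1: %s" % (name, text)
            except ValueError:
                yield n, "%s is not an integer: %s" % (name, text)


# Three feature lines adapted from the output quoted in this thread.
sample = [
    "##gff-version 3",
    "\t".join(["2L", "FlyBase", "chromosome_band", "-204333", "22221",
               ".", "+", ".", "ID=band-21A_chromosome_band;Name=band-21A"]),
    "\t".join(["2L", "FlyBase", "chromosome_arm", "1", "23011544",
               ".", ".", ".", "ID=2L;Dbxref=GB:AE014134"]),
    "\t".join(["NT_033778", "GenBank", "mRNA", "18442", "18629",
               ".", "+", ".", "ID=CG17683.t01,CG17683.t06;Parent=CG17683,CG17683"]),
]
problems = list(check_gff3(sample))
for n, msg in problems:
    print("line %d: %s" % (n, msg))
```

To run the same check on a file, iterate over the handle returned by open_gff, for example: with open_gff(path) as fh: for n, msg in check_gff3(fh): print(n, msg). Many valid GFF3 files never define a landmark feature for each seqid, so the seqid message is a hint about ordering rather than a format error.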