From: Victor de J. <vic...@nb...> - 2008-05-03 17:14:47
Hi Scott,

The options to show the insert statements will help a lot. Most tools for our legacy database are written in Python, but there is no GFF formatter yet (which I'm writing). Zheng Zha pointed me at xmlxort as another option in his notes on the WormBase migration.

Victor

-----Original Message-----
From: Scott Cain [mailto:cai...@gm...]
Sent: Wednesday 30 April 2008 4:14
To: Victor de Jager
Cc: gmo...@li...
Subject: Re: [Gmod-schema] gff chado loading question

Hi Victor,

Sorry about the exon thing--I don't know much about bacterial genomics, but you are welcome to complain to SO or GenBank (depending on who is the offender :-)

There isn't a HOWTO for using SQL to insert features; however, there is an option for the GFF bulk loader to have it write out INSERT statements instead of PostgreSQL bulk loading format. You'd probably want to use these options:

  --inserts --noload --save_tmpfiles

and then you could examine the result for pointers.

Scott

On Tue, 2008-04-29 at 10:00 +0200, Victor de Jager wrote:
> Thanks John,
>
> In the meantime I have written an extensive page on the GMOD wiki about my
> installation on Debian stable. Please feel free to correct where I went
> wrong. The trick is to use the GenBank file, convert it to GFF and then
> load the data.
> I sometimes run into features with non-SO descriptions, like 'pseudotRNA',
> which I filter out, but overall all of the bacterial genomes I tried
> loaded fine using this scheme. The only thing to argue about is the use of
> 'exon' as a feature in bacterial genomes.
>
> Is there a howto on inserting features using plain SQL?
>
> Victor
>
> -----Original Message-----
> From: Scott Cain [mailto:cai...@gm...]
> Sent: Monday 28 April 2008 19:56
> To: Victor de Jager
> Cc: gmo...@li...
> Subject: Re: [Gmod-schema] gff chado loading question
>
> Hello Victor,
>
> I will have to look into the warnings about the 'my' variable, but I
> don't think that is the main problem.
> There is at least one problem with
> the GFF file you pointed to, and I wouldn't be surprised to find more.
> The first problem I found is that it lacks a reference sequence line (at
> least, a proper one). The first GFF line is this:
>
> NC_007404.1 RefSeq source 1 2909809 . + . organism=Thiobacillus%20denitrificans%20ATCC%2025259;mol_type=genomic%20DNA;strain=ATCC%2025259;db_xref=ATCC:25259;db_xref=taxon:292415
>
> It does not have "ID=NC_007404.1". Also, many of the gene models lack
> ID/Parent tags to indicate what belongs to what, which will result in
> data that is fairly difficult to use (GBrowse certainly won't work correctly).
>
> Scott
>
>
> On Fri, 2008-04-18 at 17:20 +0200, Victor de Jager wrote:
> > Hi all,
> >
> > I have installed a copy of the Chado schema on my local Debian
> > installation following the instructions for Ubuntu.
> > (loaded ontologies 1,2,3 and 4)
> > However, I have a problem loading a RefSeq GFF file from NCBI.
> > I try to follow the Load gff into Chado page.
> >
> > I want to load
> > ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Thiobacillus_denitrificans_ATCC_25259/NC_007404.gff
> >
> > I created an organism using the INSERT statement on the wiki
> > INSERT INTO organism (abbreviation, genus, species, common_name)
> > VALUES ('NC_007404.1', 'Thiobacillus',
> > 'denitrificans', 'Thiobacillus denitrificans ATCC 25259');
> >
> >
> > gmod_gff3_preprocessor.pl --gfffile /tmp/NC_007404.gff --outfile tbd.sorted.gff
> > results in the following error message:
> > --------------
> > "my" variable %seen masks earlier declaration in same scope
> > at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4199.
> > "my" variable %seen masks earlier declaration in same scope
> > at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4223.
> > Sorting the contents of /tmp/tmp.gff ...
> >
> > Writing sorted contents to /tmp/tmp.gff.sorted ...
> > ------------
> > tbd.sorted.gff is not created
> >
> > When I try to load the GFF as is:
> >
> > gmod_bulk_load_gff3.pl --organism 'Thiobacillus denitrificans ATCC 25259' --gfffile NC_007404.gff --recreate_cache
> >
> > I get the following message:
> >
> > (Re)creating the uniquename cache in the database...
> > Creating table...
> > Populating table...
> > Creating indexes...Done.
> > Preparing data for inserting into the dev_chado_01 database
> > (This may take a while ...)
> > Unable to find srcfeature NC_007404.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache'
> > option. at /usr/local/share/perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 4026
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x8bb6808)', 'Bio::SeqFeature::Annotated=HASH(0x8c4cd2c)') called at /usr/local/bin/gmod_bulk_load_gff3.pl line 758
> > Issuing rollback() for database handle being DESTROY'd without
> > explicit disconnect().
> >
> > This is where I am stuck. Where should I go from here?
> >
> > Victor
> >
> > _______________________________________________
> > Gmod-schema mailing list
> > Gmo...@li...
> > https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cai...@gm...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
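
The two problems Scott describes in the thread (no feature carrying "ID=NC_007404.1" for the reference sequence, and features without ID/Parent tags) can be checked before attempting a load. The following is only a minimal Python sketch, assuming a standard nine-column, tab-separated GFF3 file; the function names parse_attributes and check_gff3 are hypothetical and are not part of the GMOD tools.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        tag, value = pair.split("=", 1)
        # A tag may repeat (e.g. db_xref) and may hold comma-separated values.
        attrs.setdefault(tag, []).extend(unquote(v) for v in value.split(","))
    return attrs


def check_gff3(lines):
    """Return (seqids with no feature whose ID equals the seqid,
    number of features that have neither an ID nor a Parent tag)."""
    seqids = set()
    ids = set()
    untagged = 0
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature rows
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row (assumption: silently ignored here)
        attrs = parse_attributes(cols[8])
        seqids.add(cols[0])
        ids.update(attrs.get("ID", []))
        if "ID" not in attrs and "Parent" not in attrs:
            untagged += 1
    return sorted(seqids - ids), untagged
```

Passing an open file handle for NC_007404.gff to check_gff3 would return the list of reference sequence names that still need an ID line, plus a count of rows lacking both tags. Whether a clean result is sufficient for gmod_bulk_load_gff3.pl is not established by this thread; the check only mirrors the two issues named above.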