From: Scott C. <ca...@cs...> - 2006-07-11 15:30:54
Ah, right. In schema/chado/load/bin there is a file called
new_bulk_load_gff3.PLS. To create the loader, execute the command
`perl new_bulk_load_gff3.PLS`, which is what will get done during the
make process (after the name is changed).

More problematic, though, is the wormbase file. There are several
issues:

- It's too big. Unless you have a server with an insane amount of RAM,
  the loader will end up crashing as it runs out of memory. I typically
  separate GFF files by reference sequence (or srcfeature in
  chado-ese). That may still result in files that are too large; they
  should be a few hundred thousand lines apiece to be safe. (For
  reference, my 2G server can process files with around 600K lines.)

- There are no lines for the chromosomes. Unless you are loading those
  first from a separate file, you need GFF lines for each chromosome
  before you reference a feature on it.

- I can't tell if parent features always come before their children,
  but that is a potential problem.

- Computational results are mixed with curated annotations. When
  loading, you have to specify whether the file contains computational
  results or not, and it can be only one or the other (see the docs for
  prepping the database to accept computational results and the
  command-line args for loading).

I am working on a GFF preprocessor that will take care of many of these
aspects, but up until now, I've been handling these things with quick
Perl scripts.

Scott

On Tue, 2006-07-11 at 10:39 +0100, Anthony Rogers wrote:
> Scott Cain wrote:
> > Hi Ant,
> >
> > It looks to me like you are trying to use gmod_load_gff3.pl, which I
> > haven't tested and is very likely not working (as you seem to have found
> > out). Try gmod_new_bulk_load_gff3.pl (which will have its name changed
> > later this week to gmod_bulk_load_gff3.pl).
> >
> > Related to that, I'll be curious about how the loading goes. The last
> > time I looked at wormbase GFF, it looked to me like it wasn't sorted the
> > way that the loader will like. It is on my list of things to write a
> > GFF3 preprocessor so that the loader will like it.
> >
> Hi Scott,
> Where do I find gmod_new_bulk_load_gff3.pl? When I do a CVS update the
> gmod_bulk_load_gff3.pl script is still unchanged from May 16th.
>
> Payan, in Lincoln's group, has generated a new GFF3 version of WormBase
> data which I've been trying to use. It's available at . . .
> http://dev.wormbase.org/~canaran/gff3_download_2006-06-27/elegansWS159.gff3.08.sorted.gz
>
> Ant
> > Nevertheless, I will try to fix the Class::DBI errors that are occurring
> > in your output below, since the recently released modware relies on it.
> >
> > Scott
> >
> >
> > On Fri, 2006-07-07 at 12:15 +0100, Anthony Rogers wrote:
> >
> >> I've come back to my CHADO adventure and I'm stuck on this error with the
> >> gff loader
> >>
> >> perl new_gff_loader.pl --gfffile /wormsrv2/GMOD/DOWNLOADS/mini.gff3
> >> -organism worm --srcdb wormbase
> >> .
> >> .
> >> .
> >> 7.4 pg_catalog. gencode
> >> 7.4 pg_catalog. gencode_codon_aa
> >> Use of uninitialized value in split at
> >> /nfs/disk100/wormpub/lib/perl5/site_perl/5.8.7/Class/DBI/Pg.pm line 108.
> >> gencode_codon_aa has no primary key at ../../lib/Bio/GMOD/Load/GFF.pm
> >> line 5
> >> Compilation failed in require at ../../lib/Bio/GMOD/Load/GFF.pm line 5.
> >> BEGIN failed--compilation aborted at ../../lib/Bio/GMOD/Load/GFF.pm line 5.
> >> Compilation failed in require at new_gff_loader.pl line 8.
> >> BEGIN failed--compilation aborted at new_gff_loader.pl line 8.
> >>
> >> look familiar ?
> >> This obviously happens in compilation so it can't be data. There must be
> >> something wrong in the database.
> >>
> >> Any help appreciated . .
> >> Ant
> >>
> >>
>
>

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                       ca...@cs...
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
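
A rough sketch of the kind of pre-flight check described in the message above, for three of the four issues: lines per reference sequence, chromosome lines missing before first use, and children that appear before their parents. This is not the GMOD preprocessor or any of the quick Perl scripts mentioned; the function names (`parse_attributes`, `check_gff3`) and the sample rows are made up for illustration, and it assumes that a chromosome's own GFF line carries `ID=<seqid>`, which may not hold for every file.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of key -> list of values."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip().split(",")
    return attrs


def check_gff3(lines):
    """Scan GFF3 lines once and report potential loading problems.

    Returns (line_counts, missing_refs, early_children):
      line_counts    -- Counter of feature lines per reference sequence
      missing_refs   -- sorted seqids used before a feature with that ID appeared
      early_children -- (child_id, parent_id) pairs whose parent was not yet seen
    """
    line_counts = Counter()
    seen_ids = set()
    missing_refs = set()
    early_children = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid = cols[0]
        attrs = parse_attributes(cols[8])
        own_ids = attrs.get("ID", [])
        line_counts[seqid] += 1
        # Assumption: the chromosome's own line carries ID=<seqid>.
        if seqid not in seen_ids and seqid not in own_ids:
            missing_refs.add(seqid)
        for parent in attrs.get("Parent", []):
            if parent not in seen_ids:
                child = own_ids[0] if own_ids else "?"
                early_children.append((child, parent))
        seen_ids.update(own_ids)
    return line_counts, sorted(missing_refs), early_children


def row(*cols):
    """Build one tab-separated GFF3 line (helper for the sample data only)."""
    return "\t".join(cols)


# Hypothetical sample: "I" has no chromosome line; on "II" an exon precedes its mRNA.
sample = [
    "##gff-version 3",
    row("I", "WormBase", "gene", "100", "500", ".", "+", ".", "ID=gene1"),
    row("I", "WormBase", "mRNA", "100", "500", ".", "+", ".", "ID=mrna1;Parent=gene1"),
    row("II", "WormBase", "chromosome", "1", "15000", ".", ".", ".", "ID=II"),
    row("II", "WormBase", "exon", "10", "90", ".", "+", ".", "ID=exon1;Parent=mrna2"),
    row("II", "WormBase", "mRNA", "10", "90", ".", "+", ".", "ID=mrna2"),
]

counts, missing, early = check_gff3(sample)
print(dict(counts))
print(missing)
print(early)
```

On a real file, passing the open file handle as `lines` keeps memory use modest, since only IDs and per-sequence counts are retained. The counts indicate which reference sequences would still need further splitting to stay within a few hundred thousand lines per file. Separating computational results from curated annotations is not covered here.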