Re: [Gmod-schema] [GMOD-devel] gencode_codon_aa has no primary key

Ah, right.  In schema/chado/load/bin there is a file called
new_bulk_load_gff3.PLS.  To create the loader, execute the command `perl
new_bulk_load_gff3.PLS`, which is what the make process will do (once
the name has been changed).

More problematic, though, is the wormbase file.  There are several
issues:

  - it's too big.  Unless you have a server with an insane amount of
RAM, the loader will end up crashing as it runs out of memory.  I
typically separate GFF files by reference sequence (or srcfeature in
chado-ese).  That may still result in files that are too large; to be
safe, they should be a few hundred thousand lines apiece.  (For
reference, my 2G server can process files with around 600K lines.)

  - there are no lines for the chromosomes.  Unless you are loading
those first from a separate file, you need GFF lines for each chromosome
before you reference a feature on it.

  - I can't tell if parent features always come before their children,
but that is a potential problem.

  - Computational results are mixed with curated annotations.  When
loading, you have to specify whether the file contains computational
results or not, and it can be only one or the other (see the docs for
prepping the database to accept computational results and commandline
args for loading).
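
As an illustration of the file splitting described in the first point, here is a minimal sketch in Python (not the Perl used for the loader, and not part of GMOD). The function name `split_gff3`, the output naming scheme, and the default of 500,000 lines are assumptions chosen for the example. It streams the input so that the whole file is never held in memory, groups feature lines by column 1 (the reference sequence), and starts a new part file whenever a part reaches the line limit. It does not reorder lines, so a cut can still separate a parent from its children, and it assumes reference sequence names are safe to use in file names.

```python
import os


def split_gff3(path, out_dir, max_lines=500_000):
    """Split a GFF3 file into per-reference-sequence parts of at most max_lines features.

    Hypothetical helper: the name, the "<seqid>.partN.gff3" naming scheme and the
    default limit are illustrative assumptions, not part of any GMOD tool.
    """
    os.makedirs(out_dir, exist_ok=True)
    state = {}    # seqid -> [open handle, feature lines in current part, part number]
    written = []  # output paths, in the order they were created
    with open(path) as src:
        for line in src:
            if line.startswith("##FASTA"):
                break  # everything after this directive is sequence, not features
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            seqid = line.split("\t", 1)[0]
            entry = state.get(seqid)
            if entry is None or entry[1] >= max_lines:
                part = 1 if entry is None else entry[2] + 1
                if entry is not None:
                    entry[0].close()
                name = os.path.join(out_dir, f"{seqid}.part{part}.gff3")
                handle = open(name, "w")
                handle.write("##gff-version 3\n")
                entry = state[seqid] = [handle, 0, part]
                written.append(name)
            entry[0].write(line)
            entry[1] += 1
    for handle, _, _ in state.values():
        handle.close()
    return written
```

A call such as `split_gff3("elegans.gff3", "parts")` (file names hypothetical) would return the list of part files it wrote, each of which can then be loaded separately.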

I am working on a GFF preprocessor that will take care of many of these
aspects, but up until now, I've been handling these things with quick
Perl scripts.
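
To make the ordering issues above concrete, the following is a small Python sketch of the kind of check such a preprocessor might run. It is not the preprocessor mentioned here, and the function name `check_gff3_order` is hypothetical. It assumes that a reference sequence is declared by a feature line whose `ID` attribute equals the value in column 1, which may not match every data set. It reports features that appear before their reference sequence line and features whose `Parent` has not been seen yet.

```python
def check_gff3_order(lines):
    """Return (line number, message) pairs for ordering problems in GFF3 lines.

    Assumption: a reference sequence counts as "defined" once a feature whose
    ID attribute equals its seqid (column 1) has been seen.
    """
    seen_ids = set()
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((lineno, "expected 9 tab-separated columns"))
            continue
        # Parse column 9 (key=value pairs separated by semicolons).
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key.strip()] = value.strip()
        seqid = cols[0]
        feature_id = attrs.get("ID")
        # A line that defines the reference sequence itself is exempt.
        if feature_id != seqid and seqid not in seen_ids:
            problems.append((lineno, f"reference sequence {seqid} not defined yet"))
        # Parent may hold several comma-separated IDs.
        for parent in filter(None, attrs.get("Parent", "").split(",")):
            if parent not in seen_ids:
                problems.append((lineno, f"parent {parent} not defined yet"))
        if feature_id:
            seen_ids.add(feature_id)
    return problems
```

Running it over an open file handle, for example `check_gff3_order(open("mini.gff3"))` (file name hypothetical), gives an empty list when every feature follows both its reference sequence line and its parents.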

Scott

On Tue, 2006-07-11 at 10:39 +0100, Anthony Rogers wrote:
> Scott Cain wrote:
> > Hi Ant,
> >
> > It looks to me like you are trying to use gmod_load_gff3.pl, which I
> > haven't tested and is very likely not working (as you seem to have found
> > out).  Try gmod_new_bulk_load_gff3.pl (which will have its name changed
> > later this week to gmod_bulk_load_gff3.pl).
> >
> > Related to that, I'll be curious about how the loading goes.  The last
> > time I looked at wormbase GFF, it looked to me like it wasn't sorted the
> > way that the loader will like.  It is on my list of things to write a
> > GFF3 preprocessor so that the loader will like it.
> >
> Hi Scott,
> Where do I find gmod_new_bulk_load_gff3.pl?  When I do a CVS update the
> gmod_bulk_load_gff3.pl script is still unchanged from May 16th.
>
> Payan, in Lincoln's group, has generated a new GFF3 version of WormBase
> data which I've been trying to use.  It's available at . . .
> http://dev.wormbase.org/~canaran/gff3_download_2006-06-27/elegansWS159.gff3.08.sorted.gz
>
> Ant
> > Nevertheless, I will try to fix the Class::DBI errors that are occurring
> > in your output below, since the recently released modware relies on it.
> >
> > Scott
> >
> >
> > On Fri, 2006-07-07 at 12:15 +0100, Anthony Rogers wrote:
> >
> >> I've come back to my CHADO adventure and I'm stuck on this error with the
> >> gff loader
> >>
> >> perl new_gff_loader.pl --gfffile /wormsrv2/GMOD/DOWNLOADS/mini.gff3
> >> -organism worm --srcdb wormbase
> >> .
> >> .
> >> .
> >> 7.4     pg_catalog.     gencode
> >> 7.4     pg_catalog.     gencode_codon_aa
> >> Use of uninitialized value in split at
> >> /nfs/disk100/wormpub/lib/perl5/site_perl/5.8.7/Class/DBI/Pg.pm line 108.
> >> gencode_codon_aa has no primary key at ../../lib/Bio/GMOD/Load/GFF.pm
> >> line 5
> >> Compilation failed in require at ../../lib/Bio/GMOD/Load/GFF.pm line 5.
> >> BEGIN failed--compilation aborted at ../../lib/Bio/GMOD/Load/GFF.pm line 5.
> >> Compilation failed in require at new_gff_loader.pl line 8.
> >> BEGIN failed--compilation aborted at new_gff_loader.pl line 8.
> >>
> >> look familiar ?
> >> This obviously happens in compilation so it can't be data.  There must be
> >> something wrong in the database.
> >>
> >> Any help appreciated . .
> >> Ant
> >>
> >>
>
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         ca...@cs...
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory