Thread: [Gmod-schema] Chado schema for the bulk loader

Hi,

I have a question about creating a template for the bulk loader. I am 
aware of the tutorial on tripal.org, which is absolutely useful, as well 
as some posts on the list, but my file seems to be too large to fit 
into a Chado database using the steps from the tutorial.

In the file each protein has multiple references to external databases 
(UniProt, PDB, PFAM) and EC number, information about relevant genes (GI 
number, GN, GO), the organism (Taxonomy ID, domain, class, family, 
genus, species), and some fields for the protein itself (ID, 
description, sequence length). All in all about 20 columns.

I guess creating a template the usual way (as described in the tutorial) 
would lead to a mess, as there are too many fields to use and 
cross-link. So I wonder if there is a (recommended) way to load such 
data into Chado more efficiently. Would long format help (instead of the 
wide format, which is being used now)? Or is it better to split the data 
into several subsets and load each independently? Or re-organise the 
columns somehow?
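
To make the options concrete, here is a minimal Python sketch of two of 
them: splitting the wide table into smaller de-duplicated subsets, and 
turning the external-database columns into long format. The column 
names, the sample values and the function names (split_wide, to_long) 
are all made up for illustration; none of this is part of Tripal or 
Chado, and whether the bulk loader copes better with the resulting 
files is still the open question.

```python
import csv
import io

# Hypothetical sample standing in for the real ~20-column file.
HEADER = ["protein_id", "description", "length", "uniprot", "pdb",
          "pfam", "taxonomy_id", "genus", "species"]
DATA = [
    ["P1", "kinase", "300", "U1", "", "PF1", "9606", "Homo", "sapiens"],
    ["P2", "lyase", "150", "U2", "X1", "", "9606", "Homo", "sapiens"],
]
TEXT = "\n".join("\t".join(fields) for fields in [HEADER] + DATA)

# Assumed grouping: one subset per kind of record, each a list of columns.
GROUPS = {
    "protein": ["protein_id", "description", "length", "taxonomy_id"],
    "organism": ["taxonomy_id", "genus", "species"],
}


def split_wide(rows, groups):
    """Return one de-duplicated list of records per column group."""
    subsets = {name: [] for name in groups}
    seen = {name: set() for name in groups}
    for row in rows:
        for name, cols in groups.items():
            record = tuple(row[c] for c in cols)
            if record not in seen[name]:
                seen[name].add(record)
                subsets[name].append(dict(zip(cols, record)))
    return subsets


def to_long(rows, key, cols):
    """Melt the chosen columns into (key, field, value) triples, skipping blanks."""
    return [(row[key], c, row[c]) for row in rows for c in cols if row[c]]


rows = list(csv.DictReader(io.StringIO(TEXT), delimiter="\t"))
subsets = split_wide(rows, GROUPS)
long_refs = to_long(rows, "protein_id", ["uniprot", "pdb", "pfam"])
```

With this layout each subset could be written to its own file and 
loaded with a small template, and the long-format triples would give 
one row per external reference instead of one column per database.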

Any help would be very much appreciated.

Olga
