From: Olga K. <Olg...@st...> - 2018-01-25 12:49:10
Hi,

I have a question about creating a template for the bulk loader. I am aware of the tutorial on tripal.org, which is very useful, as well as some posts on the list, but my file seems to have too many columns to fit into a Chado database using the steps from the tutorial.

In the file, each protein has multiple references to external databases (UniProt, PDB, PFAM) and an EC number, information about relevant genes (GI number, GN, GO), the organism (Taxonomy ID, domain, class, family, genus, species), and some fields for the protein itself (ID, description, sequence length). All in all, about 20 columns.

I guess creating a template the usual way (as described in the tutorial) would lead to a mess, as there are too many fields to use and cross-link. So I wonder if there is a (recommended) way to load such data into Chado more efficiently. Would a long format help (instead of the wide format I am using now)? Or is it better to split the data into several subsets and load each independently? Or to re-organise the columns somehow?

Any help would be very much appreciated.

Olga
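To make the two reorganisations asked about above concrete (a long format for the database references, and splitting the file into subsets that can be loaded independently), here is a minimal sketch. It is not a Tripal or Chado API and not a recommended procedure from the list. The column names, the sample values, and the assumption that multiple accessions share one cell separated by `;` are all hypothetical.

```python
import csv
import io

# Hypothetical column names; the original post does not give the real headers.
DBXREF_COLUMNS = ["uniprot", "pdb", "pfam"]
ORGANISM_COLUMNS = ["taxonomy_id", "genus", "species"]


def split_wide_rows(rows):
    """Split wide protein rows into three smaller subsets."""
    proteins = []
    organisms = {}
    dbxrefs = []
    for row in rows:
        # Protein subset: fields that describe the protein itself.
        proteins.append({
            "protein_id": row["protein_id"],
            "description": row["description"],
            "taxonomy_id": row["taxonomy_id"],
        })
        # Organism subset: keyed by taxonomy ID so repeated organisms collapse.
        organisms[row["taxonomy_id"]] = {c: row[c] for c in ORGANISM_COLUMNS}
        # Long format: one (protein, database, accession) record per reference.
        # Assumes multiple accessions are ';'-separated; empty cells are skipped.
        for db in DBXREF_COLUMNS:
            for accession in filter(None, row[db].split(";")):
                dbxrefs.append({
                    "protein_id": row["protein_id"],
                    "db": db,
                    "accession": accession,
                })
    return proteins, list(organisms.values()), dbxrefs


# Made-up sample data, for illustration only.
SAMPLE = (
    "protein_id\tdescription\ttaxonomy_id\tgenus\tspecies\tuniprot\tpdb\tpfam\n"
    "P1\tkinase\t9606\tHomo\tsapiens\tQ00001\t1ABC;2DEF\tPF00069\n"
    "P2\tprotease\t9606\tHomo\tsapiens\tQ00002\t\tPF00082\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
proteins, organisms, dbxrefs = split_wide_rows(rows)
```

Each of the three lists could then be written to its own tab-delimited file, so that each file needs only a small template with a handful of fields rather than one template covering all 20 columns.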
From: Scott C. <sc...@sc...> - 2018-01-25 18:45:41
Hi Olga,

This is more of a Tripal question, so I'm cc'ing the Tripal list.

Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research