From: Scott C. <sc...@sc...> - 2018-01-25 18:45:41
Hi Olga,

This is more of a Tripal question, so I'm cc'ing the Tripal list.

Scott

On Thu, Jan 25, 2018 at 7:33 AM, Olga Klonova <Olg...@st...> wrote:

> Hi,
>
> I have a question about creating a template for the bulk loader. I am
> aware of the tutorial on tripal.org, which is absolutely useful, as well
> as some posts on the list, but my file seems to be too large to fit into
> a Chado database using the steps from the tutorial.
>
> In the file each protein has multiple references to external databases
> (UniProt, PDB, PFAM) and EC number, information about relevant genes (GI
> number, GN, GO), the organism (Taxonomy ID, domain, class, family, genus,
> species), and some fields for the protein itself (ID, description, sequence
> length). All in all about 20 columns.
>
> I guess creating a template the usual way (as described in the tutorial)
> would lead to a mess, as there are too many fields to use and cross-link.
> So I wonder if there is a (recommended) way to load such data into Chado
> more efficiently. Would long format help (instead of the wide format, which
> is being used now)? Or is it better to split the data into several subsets
> and load each independently? Or re-organise the columns somehow?
>
> Any help would be very much appreciated.
>
> Olga
>
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research