From: Sook J. <so...@gm...> - 2018-02-02 21:41:01
Hello,

Since there is a discussion about whether or not to use the nd module for storing phenotype and genotype data in Chado, we wanted to share what we do at GDR, CottonGEN and our other databases.

Using the nd_experiment table to link stock and genotype (and project, etc.) gave us (and others) a severe performance issue. Lacey came up with the idea of using a genotype_call table to link genotype and stock (and project, etc.):

https://github.com/UofS-Pulse-Binfo/nd_genotypes/wiki/How-to-Store-your-Data

We adopted this and it has been working very well. We decided to make a similar table to link the stock and phenotype tables as well, for large-scale phenotype data. We have the BIMS Tripal module, which breeders use to store and manage their private data, and we anticipate that the number of users (and so the amount of data) will grow extensively. Some of them will have phenomics data, and the volume of phenotype data will become extensive.

The phenotype_call table we created is as follows:

phenotype_call_id    INT                            PK
phenotype_id         INT                            FK
project_id           INT                            FK
stock_id             INT                            FK
nd_geolocation_id    INT                            FK
time                 TIMESTAMP WITHOUT TIME ZONE

This table basically links the 'sample' stored in the stock table (a specific tree or plot of plants being phenotyped) to a specific phenotypic value stored in the 'phenotype' table. The time field is needed because the exact phenotyping time can differ for each phenotype of the same sample (this data comes from the FieldBook App that many breeders use). nd_geolocation_id is added because we don't go through nd_experiment and we still need to store the location of the plant (or animal/insect).

I don't think this means that we can get rid of the nd module. The nd_experiment table and the other associated nd tables are used for experiments on stocks other than phenotyping and genotyping. Examples are crosses, field collections, etc. There are also databases that store various protocols and reagents associated with experiments (hence nd_protocol, nd_reagents, etc).
For us, we need the nd_experiment table to store cross data, and we don't have data for nd_protocol and nd_reagents.

One of the important things that came out of the ND discussions was using the stock table to store 'samples' and using the stock_relationship table to link a stock to its samples. There were also thoughts of creating a separate table such as 'observation_unit'. The same principle applies to the project and project_relationship tables for storing hierarchical datasets.

I think we can treat genotype and phenotype as specific kinds of 'experiment' that generate a HUGE quantity of data, so we use specific linker tables without going through nd_experiment. It would be nice to open up the conversation and reach a consensus, but for now we have to use these two tables to speed up loading of our data.

Hope it helps!

Thanks!
Sook and Taein
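
For readers who want to try the idea, the phenotype_call linker table described above can be sketched roughly as follows. This is only an illustration, not the actual GDR/BIMS DDL: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, the stock, project, nd_geolocation and phenotype tables are cut down to a couple of columns each, and the sample names and values ('plot-001', 'demo-trial', 'fruit_weight', and so on) are made up.

```python
import sqlite3

# Simplified stand-ins for the Chado tables. Real Chado tables have more
# columns and live in PostgreSQL; this is an assumption-laden sketch.
SCHEMA = """
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE nd_geolocation (nd_geolocation_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT, value TEXT);

-- Linker table with the columns listed in the message.
CREATE TABLE phenotype_call (
    phenotype_call_id INTEGER PRIMARY KEY,
    phenotype_id      INTEGER REFERENCES phenotype (phenotype_id),
    project_id        INTEGER REFERENCES project (project_id),
    stock_id          INTEGER REFERENCES stock (stock_id),
    nd_geolocation_id INTEGER REFERENCES nd_geolocation (nd_geolocation_id),
    time              TEXT  -- SQLite has no timestamp storage class; ISO-8601 text
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical demo rows: one sample (a plot), one project, one location.
conn.execute("INSERT INTO stock VALUES (1, 'plot-001')")
conn.execute("INSERT INTO project VALUES (1, 'demo-trial')")
conn.execute("INSERT INTO nd_geolocation VALUES (1, 'demo-field')")
conn.executemany(
    "INSERT INTO phenotype VALUES (?, ?, ?)",
    [(1, "fruit_weight", "152.3"), (2, "fruit_color", "red")],
)
# Each call row ties one phenotype value to the sample, project, location
# and the time at which that particular trait was scored.
conn.executemany(
    "INSERT INTO phenotype_call "
    "(phenotype_id, project_id, stock_id, nd_geolocation_id, time) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, "2017-08-01 09:15:00"), (2, 1, 1, 1, "2017-08-01 09:20:00")],
)


def phenotypes_for_stock(connection, stock_name):
    """Return (phenotype, value, time) rows for one sample, oldest first."""
    query = """
        SELECT p.uniquename, p.value, pc.time
        FROM phenotype_call pc
        JOIN stock s ON s.stock_id = pc.stock_id
        JOIN phenotype p ON p.phenotype_id = pc.phenotype_id
        WHERE s.uniquename = ?
        ORDER BY pc.time
    """
    return connection.execute(query, (stock_name,)).fetchall()


for row in phenotypes_for_stock(conn, "plot-001"):
    print(row)
```

The point of the sketch is that the query reaches phenotype from stock with a single hop through phenotype_call, without joining through nd_experiment and its linker tables.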