From: Jascha S. <jot...@gm...> - 2012-03-03 11:59:34
On 1 Mar 2012, at 09:53, Inus Scheepers wrote:

> On 2012/03/01 9:44 AM, Sven Klages wrote:
>> Being on the quest for a usable and affordable LIMS for a "next
>> generation sequencing" group, I am wondering whether Bika LIMS offers
>> this kind of sample/workflow/data handling? We are running Illumina
>> and Roche sequencers. I couldn't find any hint in the docs; version 3
>> looks promising, but is not completed yet ...
>>
>> Any thoughts?
>>
>> thanks, Sven
>
> Sven, Bika-LIMS already supports the normal workflow of a generic lab,
> i.e. analysis request, sample reception, results capture,
> verification/retraction. As it is built on Zope/Plone and in Python,
> modification of the workflow and automated access to e.g. file-based
> instrument data are of course possible.
>
> What would be needed to start is a short requirements analysis, to
> roughly spec out what would have to be added code-wise. In which form
> is the data, and what volumes do you anticipate?
>
> regards

Hi all,

the point to be mentioned here is that next-generation sequencing (NGS) produces vast amounts of data, far more than simple analyses do. Whereas the analysis of, say, a blood sample may produce tens of individual results (glucose concentration, etc.), the analysis of a sample for its genomic content may yield hundreds of millions of data points (think of the reads in a sequencing experiment). So naturally, storing this data presents a challenge and requires a different approach on the LIMS side.

But not only is the storage of sequencing data a challenging task; even more so is the production of meaningful results from these data. This is mostly not a simple matter of assigning a single numeric value (think glucose concentration) to an analysis as its result; rather, it requires a complex computational process and can itself produce large amounts of data, such as a gene sequence.
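To put that difference in scale into rough numbers, here is a back-of-the-envelope calculation. The read count, read length and bytes-per-base figures are illustrative assumptions, not the specs of any particular instrument:

```python
# Back-of-the-envelope comparison of data volumes (illustrative numbers only).

# A routine clinical panel: a few dozen scalar results per sample.
panel_results = 30                   # e.g. glucose, electrolytes, enzymes, ...
panel_bytes = panel_results * 8      # one double-precision value each

# A hypothetical NGS run: hundreds of millions of short reads.
reads_per_run = 300_000_000          # assumed read count
read_length = 100                    # assumed bases per read
bases_per_run = reads_per_run * read_length

# FASTQ stores roughly 2 bytes per base (sequence + quality), ignoring headers.
fastq_bytes = bases_per_run * 2

print(f"panel: {panel_bytes} bytes")          # a few hundred bytes
print(f"run:   {fastq_bytes / 1e9:.0f} GB")   # tens of gigabytes
print(f"ratio: ~{fastq_bytes // panel_bytes:,}x")
```

Even with conservative assumptions the raw data of a single run outweighs a lifetime of conventional panel results by many orders of magnitude, which is why the usual "one result field per analysis" model breaks down.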
The deduction of results from genomic (or any other "-omics") data places heavy demands on the computational infrastructure: it requires dedicated servers and clusters with large memory and processing capacity, as well as specialized software packages optimized to run on parallelized hardware.

How, then, to best store genomics/-omics sample data and produce results from them via a LIMS? A viable strategy might be to use the LIMS mainly to keep the different pieces of information together under a consistent and user-friendly interface, while using dedicated systems for the storage of data and the processing of results.

Data storage is certainly the smaller of the two problems, as many candidates come to mind quite readily: relational, document and object databases as well as flat-file formats all exist and can be scaled to very large sizes. Producing usable results from -omics data, on the other hand, is a complicated process, and the methods by which the processing works are still evolving rapidly, so every organization tends to create its own recipe. From proprietary device software and device-specific formats to collections of open-source packages and custom-written scripts, all combinations of software and data formats are in use.

It might well be possible to package an entire results-processing system into a virtual machine, complete with the exact software environment required for the analysis of a certain type of -omics data, and then host multiple virtual machines for processing data from different platforms to produce consistent, reliable results. The virtual machines could be run in the cloud, taking advantage of its scalability, and the results could be returned to the LIMS upon completion for easy access by the user. This combination of a LIMS, a scalable data store and a scalable, reproducible computation platform would certainly be very useful to any -omics project.
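As a sketch of the split proposed above (all names here are hypothetical illustrations, not part of Bika's API): the LIMS keeps only a small record per analysis, dispatches the heavy processing to a per-platform environment (in practice, a dedicated VM image per sequencer type), and stores a reference to the bulk data plus the derived summary rather than the data itself.

```python
# Sketch: per-platform -omics processing, returning a LIMS-sized summary.
# All class and function names are hypothetical, not Bika APIs.

from dataclasses import dataclass

@dataclass
class ProcessingResult:
    sample_id: str
    platform: str
    data_ref: str   # pointer into the dedicated data store, not the data itself
    summary: dict   # small, LIMS-sized derived results

def process_illumina(sample_id: str, raw_path: str) -> ProcessingResult:
    # In reality this would run inside a VM holding the full pipeline software.
    return ProcessingResult(sample_id, "illumina", raw_path,
                            {"reads": 300_000_000, "status": "complete"})

def process_roche(sample_id: str, raw_path: str) -> ProcessingResult:
    return ProcessingResult(sample_id, "roche", raw_path,
                            {"reads": 1_000_000, "status": "complete"})

# One processing environment (e.g. one VM image) per sequencing platform.
PIPELINES = {"illumina": process_illumina, "roche": process_roche}

def submit(sample_id: str, platform: str, raw_path: str) -> ProcessingResult:
    """What the LIMS side would call; only the small record comes back."""
    return PIPELINES[platform](sample_id, raw_path)

result = submit("S-0001", "illumina", "/runs/2012-03/S-0001.fastq")
print(result.summary)
```

The design point is that `data_ref` stays a reference: the LIMS never ingests the gigabytes, it only links the analysis request to wherever the scalable store keeps them.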
The LIMS is the central element here: it interfaces between the user and the high-level pieces of information ("what sample? entered by whom? when? which analyses were performed? and what were their results?") and also connects this information with the underlying computational infrastructure and the large data blobs from which the results were derived.

Just my two cents, but having read Jordi's latest post ("Extending calculation functionalities") I can tell I'm not the only one thinking about how to extend the LIMS functionality Bika offers. Maybe we can connect all these ideas into some kind of common framework.

Best, Jascha