From: Richard S. <r.a...@ru...> - 2008-12-15 15:47:38
Dear all,

I'm writing a metabolomics Java library for my processing software, targeted at high-resolution LC/MS data (components such as peak picking, noise detection, etc.). The basic element of this library is a mass chromatogram (x-axis time, y-axis intensity). To deal with the size of the datasets and to let other people easily write components, an intermediate file format is required. I've already defined one myself to see what a really dynamic pipeline would require, and built a library on top of it.

To hop on the international standards bandwagon I would of course like to move to a recognized standard like mzML. However, this _seems_ to fall short of my requirements. I've been browsing through this list and the standards document and tried to implement something, but I can't figure out how to approach this. What I would like to be able to store is the following:

- Multiple runs in a single file
  From the specification document: "A run in mzML should correspond to a single, consecutive and coherent set of scans on an instrument". Does this mean that I can essentially only store data from a single raw file? I would like to mix models (see sets).

- MassChromatograms (single mass)
  I would like to ubiquitously store mass chromatograms picked from the raw data. This means that mass chromatograms made from centroid data as well as from profile data need to be stored. There is the option to store chromatogram data, but this seems limited to '2D' data, whereas mass chromatogram data built from profile data needs '3D' data. To solve this, would accession="MS:1000627" name="selected ion current chromatogram" need sub-children? Or can it be set globally in the header or run? But then I would still like to be able to mix models (see sets).

- BackgroundIons
  The use of background ions is part of the pipeline. To store these, an addition to the CV needs to be made:

    name: backgroundion chromatogram
    is_a: chromatogram type (MS:1000626)
    definition: chromatogram created by creating an array of a ubiquitously present mass.

- sets
  The goal of the pipeline is to combine information from lots of measurements (biological and technical replicates, different machines, etc.) and perform various analysis methods on it. This means, for example, that background ions or mass chromatograms from various measurements need to be combined into sets. Different operations and visualizations are then possible on the data stored in the files. The same issue as with the different runs applies here. Another option would be to make sets of sets, which means that the relation needs to be recursive.

I can probably solve some of the issues with id-fields, but that would make it hard to parse for other people and would more or less rule out the recursive relation. I can of course solve a lot with userParam tags, but then other people will have a hard time reading the data.

Another thing: I once heard somebody mention AnalysisML as being something along these lines; however, this project seems to have met a fatal end, as I cannot find anything.

Am I trying to use mzML outside of its boundaries? If so, is there a viable alternative that I have so far been unable to find?

Cheers,
Richard

--
Drs. ing. RA Scheltema
Groningen Bioinformatics Centre    Tel: +31 50 363 8078
Kerklaan 30                        Fax: +31 50 363 7976
9751 NN, Haren                     Mob: +31 6 140 280 21
The Netherlands
Public PGP key: http://pgp.surfnet.nl:11371/
Key Fingerprint: 3A4F 3029 DF7D 2562 6653 053B 458C 39E7 C428 4618
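
To make the recursive "sets" requirement from the message above more concrete, here is a minimal Java sketch of what such a relation might look like in memory. It is only an illustration under stated assumptions: the names SetMember, MassChromatogram, MeasurementSet and countChromatograms are hypothetical, and nothing here is taken from mzML, its CV, or the library described in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not part of mzML or any existing library.

// Anything that can be placed in a set: a single chromatogram or another set.
interface SetMember {
    int countChromatograms();
}

// A single-mass chromatogram: intensity as a function of time.
final class MassChromatogram implements SetMember {
    final String sourceRun;       // id of the measurement this trace was picked from
    final double mass;            // the single mass the trace was built for
    final double[] times;         // x-axis
    final double[] intensities;   // y-axis

    MassChromatogram(String sourceRun, double mass, double[] times, double[] intensities) {
        if (times.length != intensities.length) {
            throw new IllegalArgumentException("times and intensities must have the same length");
        }
        this.sourceRun = sourceRun;
        this.mass = mass;
        this.times = times;
        this.intensities = intensities;
    }

    @Override
    public int countChromatograms() {
        return 1;
    }
}

// A set may hold chromatograms from different runs, or other sets (sets of sets).
final class MeasurementSet implements SetMember {
    final String label;
    private final List<SetMember> members = new ArrayList<>();

    MeasurementSet(String label) {
        this.label = label;
    }

    MeasurementSet add(SetMember member) {
        members.add(member);
        return this;
    }

    // Recursively counts the chromatograms in this set and all nested sets.
    @Override
    public int countChromatograms() {
        int total = 0;
        for (SetMember member : members) {
            total += member.countChromatograms();
        }
        return total;
    }
}
```

Because a set only knows its members through the shared interface, nesting to any depth needs no special handling. A flat id-field scheme would instead have to encode the hierarchy by convention, which is the parsing burden the message worries about.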