From: jeleclai <jel...@un...> - 2016-01-26 14:12:05
Hi,

thanks for your email.

So far, the file format is implemented for ion mobility data, but the
design is applicable to non-IMS data as well, so support for "regular"
experiments is also planned.

We are aware of the mz5 implementation; it is a good example of how mass
spectrometry data can be transferred to a binary format while keeping the
benefits of the mzML ontology. Nevertheless, it seems that mz5, although
it has been available for quite some time (since 2012), has not been used
extensively. mz5 is more efficient than mzML in terms of file size and
I/O, but this gain in performance alone did not seem to convince the
broad mass spectrometry community to change to a new file format. I can
see two reasons for that: first, "never change a running system"; second,
data was not "large enough" to push a change to a different file format.

But especially with the addition of ion mobility as an extra dimension of
separation, data complexity is increasing and will continue to increase
with evolving techniques. Therefore, although it requires an additional
library (HDF5), a binary format will become essential when working with
large data. HDF5 is portable and has C, C++, Java and Python interfaces,
so it should not be an obstacle. In addition, it is well established as a
scientific file format with many different tools, including tools to
inspect files (HDFView), and the HDF Group offers good service (detailed
documentation, free email support, a new release twice a year...).

Numpress definitely sounds interesting and I will take a look at it. ;-)

Best wishes,
Jenny

On 25.01.2016 at 16:43, Johan Teleman wrote:
> Hi Jenny,
>
> Having access to a binary format could clearly benefit many
> applications. Will this format be solely intended for ion mobility
> experiments, or do you also aim to target "regular" experiment data?
>
> The HDF5 library has already been used once for an MS data format: mz5
> (http://www.mcponline.org/content/11/1/O111.011379.full.pdf+html), so
> maybe something can be reused from there. Depending on how data will be
> stored locally, numerical compression through Numpress might be
> interesting? Further, I would recommend going the extra mile and
> benchmarking the file format using data files from different vendors
> and different samples. Your results will vary depending on data density.
>
> One last point: I have had great experience with a Google
> protobuf-based binary file format, in case HDF5 does not work out for
> you. Protobuf does NOT have any support for random access though, so
> that would require a custom solution.
>
> Just my few cents, good luck with the implementation!
>
> /J
>
> Johan Teleman
> Ph.D. student
> Dept. of Immunotechnology
> Lund University
>
> ________________________________________
> From: jeleclai [jel...@un...]
> Sent: 25 January 2016 13:34
> To: psi...@li...
> Cc: Navarro, Pedro
> Subject: [Psidev-ms-dev] HDF5-based mass spectrometry formats
>
> Hi all,
>
> I am a PhD student in Stefan Tenzer's lab. We are currently working on
> a new format for mass spectrometric data, based on HDF5.
>
> We started to develop this format because we struggled with ion
> mobility data: random access to vendors' raw files is, as everyone
> knows, suboptimal, and especially when ion mobility is included, we
> foresee better ways to organize MS data in order to improve random
> access.
>
> In this regard, the HDF5 library is a great advantage, as it combines a
> simple and flexible way to structure the data with several efficient
> methods for I/O access.
>
> Additionally, the HDF5 library offers good support for developers. It
> is already successfully used in many other scientific fields like
> Astronomy, Climatology, Genetics...
> All of this makes it a good candidate for the future development of
> standard formats for large data that are queried by random access.
>
> We would be very happy to bring this topic up for discussion at the
> next PSI meeting!
>
> Best wishes,
>
> Jenny
>
> --
>
> Jennifer Leclaire
> M.Sc. Angewandte Bioinformatik
>
> --------------------------------------------------
> Institut für Immunologie
> Universitätsmedizin der
> Johannes Gutenberg-Universität
> Langenbeckstr.1
> 55131 Mainz
> www.immunologie.uni-mainz.de
>
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--
Jennifer Leclaire
M.Sc. Angewandte Bioinformatik

--------------------------------------------------
Institut für Immunologie
Universitätsmedizin der
Johannes Gutenberg-Universität
Langenbeckstr.1
55131 Mainz
www.immunologie.uni-mainz.de