From: jeleclai <jel...@un...> - 2016-01-26 15:39:52
Hi Andy,

I am using the C++ libraries, but lately we tried HDF5's .NET framework with C# and successfully read and wrote HDF5 files. I currently have little experience with Java and Python, but as far as I know most HDF5 functionality is supported in these languages, and some datatypes (such as compound and variable-length datatypes) that were problematic some time ago are now supported by translation to alternative datatypes. Concerning the Java interface, there are some functions that are not supported, most of them connected to pointers. See https://www.hdfgroup.org/products/java/JNI/jhi5/jhi5_unsupported_functions.html for a complete list.

So far, we have converted Waters HDMS data to our format without storing empty scans. I will attend the PSI meeting, so it would be great to discuss it with you.

Best,
Jenny

On 26.01.2016 at 15:36, Jones, Andy wrote:
> Hi Jenny,
>
> When we last evaluated mz5, one of the issues appeared to be lack of good libraries beyond C++ for HDF5, so I would be interested to see if Java and Python libraries have improved in the last few years - do you know if much has changed? My recollection was that the Java methods only worked by bridging into C++, and did not provide native support for all reading and writing tasks that would be needed. As such, unless we took on the job of building true HDF5 support into Java and other languages, it seemed that other developers would be locked out. In the PSI we need to be quite conservative about the technologies we build upon, and this was certainly a concern at that time.
>
> All that said, I think there is still interest in the PSI for exploring how data representations might evolve in the future. It is ultimately up to Eric Deutsch who leads the PSI-MS part, but I would be keen to hear about what you've been doing at the upcoming PSI meeting in Ghent (18-20th April) if you can come along?
>
> Eric - what do you think, is there room on the agenda for this?
>
> I also have a few projects going with Waters, and I'm far from convinced that mzML is the right solution for representing their HDMS (ion mobility) data sets, at least without very specialised compression for all the empty scans that result. Independently of progress within the PSI, I would be interested to see what you've been developing.
>
> best wishes
> Andy
>
> ________________________________________
> From: jeleclai [jel...@un...]
> Sent: 26 January 2016 14:14
> To: psi...@li...
> Subject: Re: [Psidev-ms-dev] HDF5-based mass spectrometry formats
>
> Hi,
>
> thanks for your email.
>
> So far, the file format is implemented for ion mobility data but the
> design is applicable to non-IMS data as well. So the support for
> "regular" experiments is planned as well.
>
> We are aware of the mz5 implementation as it is a good example of how mass
> spectrometry data can be transferred to a binary format in combination
> with the benefits from the mzML ontology. Nevertheless, it seems that
> mz5, although it has been available for quite some time (since 2012), has
> not been extensively used. mz5 is more efficient in terms of file size
> and I/O compared to mzML. But this gain in performance alone didn't seem
> to convince the broad mass spectrometry community to change to a new
> file format. I can see two reasons for that: First, never change a
> running system. Second, data was not "large enough" to push a change in
> using a different file format. But especially with the addition of ion
> mobility as an extra dimension of separation, data complexity is
> increasing and will continue to increase in the future with evolving
> techniques. Therefore, although it requires the use of an additional
> library (HDF5), a binary format will become essential when working with
> large data. HDF5 is portable and has C, C++, Java and Python
> interfaces, so it should not be an obstacle.
> In addition, it is
> well-established as a scientific file format with many different tools,
> including tools to inspect the file (HDFView), and The HDF Group offers
> good service (detailed documentation, free email support, new release
> twice a year...).
>
> Numpress definitely sounds interesting and I will take a look at it. ;-)
>
> Best wishes,
>
> Jenny
>
>
> On 25.01.2016 at 16:43, Johan Teleman wrote:
>> Hi Jenny,
>>
>> Having access to a binary format could clearly benefit many applications. Will this format be solely intended for ion mobility experiments, or do you also aim to target "regular" experiment data?
>>
>> The HDF5 library was utilized once already for a MS data format: mz5 (http://www.mcponline.org/content/11/1/O111.011379.full.pdf+html), maybe something can be reused from there. Depending on how data will be stored locally, numerical compression through Numpress might be interesting? Further I would recommend to walk the extra mile and benchmark the file format using data-files from different vendors and on different samples. Your result will vary depending on data density.
>>
>> A last mention is that I've had great experience with Google protobuf based binary file format, if the HDF5 does not work out for you. Protobuf does NOT have any support for random access though, so that would require a custom solution.
>>
>> Just my few cents, good luck with the implementation!
>>
>> /J
>>
>> Johan Teleman
>> Ph.D. student
>> Dept. of Immunotechnology
>> Lund University
>>
>> ________________________________________
>> From: jeleclai [jel...@un...]
>> Sent: 25 January 2016 13:34
>> To: psi...@li...
>> Cc: Navarro, Pedro
>> Subject: [Psidev-ms-dev] HDF5-based mass spectrometry formats
>>
>> Hi all,
>>
>> I am a PhD student in Stefan Tenzer's lab. We are currently working on a
>> new format for mass spectrometric data, based on HDF5.
>>
>> We started to develop this format because we struggled with ion mobility
>> data: random access to vendors' raw files is, as everyone knows,
>> suboptimal, and especially in the case of including ion mobility, we
>> foresee better ways to organize MS data in order to improve random access.
>>
>> In this regard, the HDF5 library is of great advantage as it combines a
>> simple and flexible way to structure the data with several efficient
>> methods for I/O access.
>>
>> Additionally, the HDF5 library offers good support for developers. It is
>> already successfully implemented in many other scientific fields like
>> Astronomy, Climatology, Genetics... All of this makes it a good
>> candidate for future development of standard formats of large data, which
>> are interrogated in a random access way.
>>
>> We would be very happy to bring this topic to discussion for the next
>> PSI meeting!
>>
>> Best wishes,
>>
>> Jenny
>>
>> --
>>
>> Jennifer Leclaire
>> M.Sc. Angewandte Bioinformatik
>>
>> --------------------------------------------------
>> Institut für Immunologie
>> Universitätsmedizin der
>> Johannes Gutenberg-Universität
>> Langenbeckstr.1
>> 55131 Mainz
>> www.immunologie.uni-mainz.de
>>
>>
>> ------------------------------------------------------------------------------
>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>> Monitor end-to-end web transactions and take corrective actions now
>> Troubleshoot faster and improve end-user experience. Signup Now!
>> http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
>> _______________________________________________
>> Psidev-ms-dev mailing list
>> Psi...@li...
>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--

Jennifer Leclaire
M.Sc. Angewandte Bioinformatik

--------------------------------------------------
Institut für Immunologie
Universitätsmedizin der
Johannes Gutenberg-Universität
Langenbeckstr.1
55131 Mainz
www.immunologie.uni-mainz.de