From: Angel P. <an...@ma...> - 2007-04-20 20:24:04
On 4/20/07, Coleman, Michael <MK...@st...> wrote:
>
> Angel, thanks for the clarification.

no prob

> I agree that there'd generally be no reason not to zlib-compress--the base64
> string is already quite opaque. The only factors I'd see would be (a)
> occasionally zlib will make strings larger (though not by too much), and

This does happen occasionally for spectra without any "real" data in them,
but as you say, the increase is small and this is an edge case anyway.

> (b) whether it'd be a burden for dumb instruments to have to know zlib.

By way of (b), I sincerely doubt that any instrument vendor will use dataXML
as their native format, so there will always be some sort of
native-format-to-dataXML conversion process, which usually lives on a "smart"
data analysis machine ;)

-angel

>
> Mike
>
>
> -----Original Message-----
> *From:* Brian Pratt [mailto:bri...@in...]
> *Sent:* Friday, April 20, 2007 3:04 PM
> *To:* 'Angel Pizarro'; Coleman, Michael
> *Cc:* psi...@li...; 'Andreas Römpp'
> *Subject:* RE: [Psidev-ms-dev] Separate binary file for very large data sets?
>
> Angel is correct - I did mean compression (base64 encoding actually bloats
> the data a bit).
>
> I believe the proposed next version of mzData adopts Patrick Pedrioli's
> mzXML 3.0 technique of optionally first compressing (with zlib) the binary
> data before encoding it (with base64). I'd argue for making it mandatory,
> since it's not like there's any loss of human-readability, and the files do
> shrink admirably.
>
> - Brian
>
> ------------------------------
> *From:* psi...@li... [mailto:psi...@li...] *On Behalf Of *Angel Pizarro
> *Sent:* Friday, April 20, 2007 12:43 PM
> *To:* Coleman, Michael
> *Cc:* psi...@li...; Brian Pratt; Andreas Römpp
> *Subject:* Re: [Psidev-ms-dev] Separate binary file for very large data sets?
>
> Brian is referring to the way mzXML 3.0 (and the new dataXML format) zlib
> compress the "base64-encoded" spectra.
> While not an official part of the mzData 1.05 schema, I believe there are
> several groups that use this technique in-house for storage of mzData files.
>
> -angel
>
>
> On 4/20/07, Coleman, Michael <MK...@st...> wrote:
> >
> > If by "compressed" you mean "base64-encoded", I think it's important to
> > use the latter term, to avoid giving the wrong impression. As far as I
> > know, compression is not a feature--nor a goal--of mzData.
> >
> > For what it's worth, I encountered my first mzData file in a work
> > situation this week. It's 2.7 times as large as the corresponding ms2
> > file.
> >
> > Mike
> >
> > > -----Original Message-----
> > > From: psi...@li...
> > > [mailto:psi...@li... ] On Behalf Of Brian Pratt
> > > Sent: Friday, April 20, 2007 2:02 PM
> > > To: 'Andreas Römpp'; psi...@li...
> > > Subject: Re: [Psidev-ms-dev] Separate binary file for very large data sets?
> > >
> > > I wonder if it wouldn't make as much sense to treat the mzData file as
> > > the "binary file" and come up with a sort of summary schema of your own
> > > that could point into the mzData file. You'd get maximum reuse of
> > > community source code that way.
> > >
> > > But first, I'd say try it with straight-up mzData with compressed peak
> > > lists and see if you really need to go to the bother of a separate file.
> > > I'm guessing you'll be pleasantly surprised. Plus, I really, really
> > > dislike the use of interdependent files - one or the other is forever
> > > getting out of sync, lost, renamed, etc.
> > >
> > > Hope this helps,
> > >
> > > Brian Pratt
> > > www.insilicos.com
> > >
> > > -----Original Message-----
> > > From: psi...@li...
> > > [mailto:psi...@li... ] On Behalf Of Andreas Römpp
> > > Sent: Friday, April 20, 2007 8:45 AM
> > > To: psi...@li...; And...@an...
> > > Subject: [Psidev-ms-dev] Separate binary file for very large data sets?
> > >
> > > Hello everybody,
> > >
> > > We develop software for imaging mass spectrometry in the framework of a
> > > project funded by the European Union. We intend to use dataXML as a
> > > standard format to exchange data between the different partner labs and
> > > also (as far as possible) as the internal data format for a joint
> > > processing software suite. However, we run into the problem of very
> > > large data sets, which can easily exceed 1 GB (e.g. 256*256 pixels with
> > > one high-resolution mass spectrum each). Therefore we thought about
> > > storing the spectrum data ('MassToChargeRatioArray' and
> > > 'IntensityArray') in a separate binary file. This would make data
> > > handling much faster and easier (e.g. when parsing the XML file). So
> > > instead of writing the binary data in the XML file, we plan to include a
> > > link to a separate file (file location, start and end position of the
> > > spectrum in the binary file).
> > > This problem is somewhat similar to the already discussed issue of an
> > > index file.
> > > Would it be possible to include such an option (external binary file)
> > > in the dataXML standard?
> > >
> > > Best regards,
> > > Andreas
> > >
> > > --
> > > Dr. Andreas Roempp
> > > Institute of Inorganic and Analytical Chemistry
> > > - Analytical Chemistry -
> > > Justus Liebig University Giessen
> > > Schubertstrasse 60, Build. 16
> > > D-35392 Giessen
> > > Germany
> > >
> > > phone: +49-641-99 34802
> > > fax: +49-641-99 34809
> > > email: And...@an...
> > > Internet: http://www.uni-giessen.de/analytik/
> > >
> > > -------------------------------------------------------------------------
> > > This SF.net email is sponsored by DB2 Express
> > > Download DB2 Express C - the FREE version of DB2 express and take
> > > control of your XML. No limits. Just data. Click to get it now.
> > > http://sourceforge.net/powerbar/db2/
> > > _______________________________________________
> > > Psidev-ms-dev mailing list
> > > Psi...@li...
> > > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>
>
> --
> Angel Pizarro
> Director, Bioinformatics Facility
> Institute for Translational Medicine and Therapeutics
> University of Pennsylvania
> 806 BRB II/III
> 421 Curie Blvd.
> Philadelphia, PA 19104-6160
>
> P: 215-573-3736
> F: 215-573-9004
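The two storage options debated in this thread can be compared with a minimal Python sketch. None of the code comes from the messages; it is an illustration under stated assumptions. Each peak array is assumed to be stored as big-endian 32-bit floats (real mzXML declares precision and byte order in attributes and interleaves m/z-intensity pairs, which this sketch skips). encode_peaks follows the order Brian describes: zlib-compress the raw bytes first, then base64-encode. write_external and read_external are hypothetical helpers illustrating Andreas's proposal of a separate binary file addressed by start and end byte offsets; they are not part of mzData, mzXML, or dataXML.

```python
import base64
import os
import struct
import tempfile
import zlib


def encode_peaks(values, compress=True):
    """Pack floats as big-endian float32 (an assumption), optionally
    zlib-compress the raw bytes, then base64-encode the result."""
    raw = struct.pack(f">{len(values)}f", *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text, compress=True):
    """Invert encode_peaks: base64-decode, optionally inflate, unpack."""
    raw = base64.b64decode(text)
    if compress:
        raw = zlib.decompress(raw)
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


def write_external(spectra, path):
    """Hypothetical external-file layout: append each spectrum's raw float32
    bytes to one binary file and return (start, end) byte offsets, which the
    XML document would store instead of the data itself."""
    index = []
    with open(path, "wb") as fh:
        for values in spectra:
            start = fh.tell()
            fh.write(struct.pack(f">{len(values)}f", *values))
            index.append((start, fh.tell()))
    return index


def read_external(path, start, end):
    """Read one spectrum back by seeking to its recorded byte range."""
    with open(path, "rb") as fh:
        fh.seek(start)
        raw = fh.read(end - start)
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


if __name__ == "__main__":
    peaks = [100.0, 200.5, 300.25]
    # 12 raw bytes become 16 base64 characters
    print(len(encode_peaks(peaks, compress=False)))
    # empty spectrum: prints "12 0", i.e. zlib framing makes it larger
    print(len(encode_peaks([])), len(encode_peaks([], compress=False)))
    flat = [0.0] * 1000  # highly repetitive data compresses well
    print(len(encode_peaks(flat)), len(encode_peaks(flat, compress=False)))
    path = os.path.join(tempfile.mkdtemp(), "spectra.bin")
    index = write_external([peaks, [1.5]], path)
    print(index, read_external(path, *index[1]))
```

Two points from the thread show up directly when this runs: an empty spectrum grows under zlib (8 bytes of zlib framing, which base64 turns into 12 characters), and base64 on its own turns every 3 raw bytes into 4 characters, which is the bloat Brian mentions. The offset index in write_external also shows the coupling Brian warns about: the recorded byte ranges are only valid as long as the binary file is never rewritten, renamed, or separated from the XML that points into it.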