From: Brian P. <bri...@in...> - 2007-04-20 20:09:58
Oops, right. "dataXML" and "mzData" aren't the same thing.

I think my brain keeps saying "mzData" because it just can't handle the awfulness of saying "dataXML".

- Brian

_____

From: Brian Pratt [mailto:bri...@in...]
Sent: Friday, April 20, 2007 1:04 PM
To: 'Angel Pizarro'; 'Coleman, Michael'
Cc: 'psi...@li...'; 'Andreas Römpp'
Subject: RE: [Psidev-ms-dev] Separate binary file for very large data sets?

Angel is correct - I did mean compression (base64 encoding actually bloats the data a bit).

I believe the proposed next version of mzData adopts Patrick Pedrioli's mzXML 3.0 technique of optionally first compressing (with zlib) the binary data before encoding it (with base64). I'd argue for making it mandatory, since it's not like there's any loss of human-readability, and the files do shrink admirably.

- Brian

_____

From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: Friday, April 20, 2007 12:43 PM
To: Coleman, Michael
Cc: psi...@li...; Brian Pratt; Andreas Römpp
Subject: Re: [Psidev-ms-dev] Separate binary file for very large data sets?

Brian is referring to the way mzXML 3.0 (and the new dataXML format) zlib compress the "base64-encoded" spectra. While not an official part of the mzData 1.05 schema, I believe there are several groups that use this technique in-house for storage of mzData files.

-angel

On 4/20/07, Coleman, Michael <MK...@st...> wrote:

If by "compressed" you mean "base64-encoded", I think it's important to use the latter term, to avoid giving the wrong impression. As far as I know, compression is not a feature--nor a goal--of mzData.

For what it's worth, I encountered my first mzData file in a work situation this week. It's 2.7 times as large as the corresponding ms2 file.

Mike

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Brian Pratt
> Sent: Friday, April 20, 2007 2:02 PM
> To: 'Andreas Römpp'; psi...@li...
> Subject: Re: [Psidev-ms-dev] Separate binary file for very large data sets?
>
> I wonder if it wouldn't make as much sense to treat the mzData file as the "binary file" and come up with a sort of summary schema of your own that could point into the mzData file. You'd get maximum reuse of community source code that way.
>
> But first, I'd say try it with straight-up mzData with compressed peak lists and see if you really need to go to the bother of a separate file. I'm guessing you'll be pleasantly surprised. Plus, I really, really dislike the use of interdependent files - one or the other is forever getting out of synch, lost, renamed, etc.
>
> Hope this helps,
>
> Brian Pratt
> www.insilicos.com
>
> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Andreas Römpp
> Sent: Friday, April 20, 2007 8:45 AM
> To: psi...@li...; And...@an...
> Subject: [Psidev-ms-dev] Separate binary file for very large data sets?
>
> Hello everybody,
>
> We develop software for imaging mass spectrometry in the framework of a project funded by the European Union. We intend to use dataXML as a standard format to exchange data between the different partner labs and also (as far as possible) as the internal data format for a joint processing software suite. However, we run into the problem of very large data sets which can easily exceed 1 GB (e.g. 256*256 pixels with one high resolution mass spectrum each). Therefore we thought about storing the spectrum data ('MassToChargeRatioArray' and 'IntensityArray') in a separate binary file. This would make data handling much faster and easier (e.g. when parsing the XML file). So instead of writing the binary data in the XML file we plan to include a link to a separate file (file location, start and end position of spectrum in binary file).
>
> This problem is somewhat similar to the already discussed issue of an index file.
>
> Would it be possible to include such an option (external binary file) into the dataXML standard?
>
> Best regards,
> Andreas
>
> --
> Dr. Andreas Roempp
> Institute of Inorganic and Analytical Chemistry
> - Analytical Chemistry -
> Justus Liebig University Giessen
> Schubertstrasse 60, Build. 16
> D-35392 Giessen
> Germany
>
> phone: +49-641-99 34802
> fax: +49-641-99 34809
> email: And...@an...
> Internet: http://www.uni-giessen.de/analytik/
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
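
For readers following the thread, the order of operations Brian describes (compress with zlib first, then base64-encode) can be sketched roughly as below. This is only an illustration, not code from mzXML, mzData, or dataXML: the function names `encode_peaks` and `decode_peaks` are hypothetical, and the choice of 32-bit big-endian floats is an assumption made for the example. The sample data is synthetic.

```python
import base64
import struct
import zlib


def encode_peaks(values, compress=True):
    """Pack floats as 32-bit big-endian (an assumption for this sketch),
    optionally zlib-compress the bytes, then base64-encode them."""
    raw = struct.pack(f">{len(values)}f", *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text, compressed=True):
    """Reverse encode_peaks: base64-decode, optionally decompress, unpack."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


if __name__ == "__main__":
    # Synthetic, mostly-zero intensity array; real spectra will compress
    # differently, so treat the sizes printed here as illustrative only.
    intensities = [0.0] * 1000 + [1234.5, 0.0, 42.0] + [0.0] * 1000
    raw_size = len(intensities) * 4
    plain = encode_peaks(intensities, compress=False)
    packed = encode_peaks(intensities, compress=True)
    print("raw bytes:", raw_size)
    print("base64 only:", len(plain))
    print("zlib + base64:", len(packed))
    assert decode_peaks(packed) == intensities
```

Base64 on its own turns every 3 bytes into 4 characters, which is the "bloat" mentioned above; compressing before encoding is what lets the encoded text end up smaller than the raw binary for compressible data. How much it shrinks depends entirely on the spectra.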