From: Angel P. <an...@ma...> - 2007-04-20 19:43:59
Brian is referring to the way mzXML 3.0 (and the new dataXML format) zlib-compress the spectra that are stored "base64-encoded". While not an official part of the mzData 1.05 schema, I believe there are several groups that use this technique in-house for storage of mzData files.

-angel

On 4/20/07, Coleman, Michael <MK...@st...> wrote:
>
> If by "compressed" you mean "base64-encoded", I think it's important to
> use the latter term, to avoid giving the wrong impression. As far as I
> know, compression is not a feature--nor a goal--of mzData.
>
> For what it's worth, I encountered my first mzData file in a work
> situation this week. It's 2.7 times as large as the corresponding ms2
> file.
>
> Mike
>
> > -----Original Message-----
> > From: psi...@li...
> > [mailto:psi...@li...] On Behalf Of Brian Pratt
> > Sent: Friday, April 20, 2007 2:02 PM
> > To: 'Andreas Römpp'; psi...@li...
> > Subject: Re: [Psidev-ms-dev] Separate binary file for very large data sets?
> >
> > I wonder if it wouldn't make as much sense to treat the mzData file as
> > the "binary file" and come up with a sort of summary schema of your own
> > that could point into the mzData file. You'd get maximum reuse of
> > community source code that way.
> >
> > But first, I'd say try it with straight-up mzData with compressed peak
> > lists and see if you really need to go to the bother of a separate file.
> > I'm guessing you'll be pleasantly surprised. Plus, I really, really
> > dislike the use of interdependent files - one or the other is forever
> > getting out of synch, lost, renamed, etc.
> >
> > Hope this helps,
> >
> > Brian Pratt
> > www.insilicos.com
> >
> > -----Original Message-----
> > From: psi...@li...
> > [mailto:psi...@li...] On Behalf Of Andreas Römpp
> > Sent: Friday, April 20, 2007 8:45 AM
> > To: psi...@li...; And...@an...
> > Subject: [Psidev-ms-dev] Separate binary file for very large data sets?
> >
> > Hello everybody,
> >
> > We develop software for imaging mass spectrometry in the framework of a
> > project funded by the European Union. We intend to use dataXML as a
> > standard format to exchange data between the different partner labs and
> > also (as far as possible) as the internal data format for a joint
> > processing software suite. However, we run into the problem of very large
> > data sets which can easily exceed 1GB (e.g. 256*256 pixels with one high
> > resolution mass spectrum each). Therefore we thought about storing the
> > spectrum data ('MassToChargeRatioArray' and 'IntensityArray') in a
> > separate binary file. This would make data handling much faster and
> > easier (e.g. when parsing the XML file). So instead of writing the binary
> > data in the XML file we plan to include a link to a separate file (file
> > location, start and end position of spectrum in binary file).
> > This problem is somewhat similar to the already discussed issue of an
> > index file.
> > Would it be possible to include such an option (external binary file)
> > into the dataXML standard?
> >
> > Best regards,
> > Andreas
> >
> > --
> > ----------------------------------------------------------------------------
> > Dr. Andreas Roempp
> > Institute of Inorganic and Analytical Chemistry
> > - Analytical Chemistry -
> > Justus Liebig University Giessen
> > Schubertstrasse 60, Build. 16
> > D-35392 Giessen
> > Germany
> >
> > phone: +49-641-99 34802
> > fax: +49-641-99 34809
> > email: And...@an...
> > Internet: http://www.uni-giessen.de/analytik/
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by DB2 Express Download DB2 Express C - the
> > FREE version of DB2 express and take control of your XML. No limits. Just
> > data. Click to get it now.
> > http://sourceforge.net/powerbar/db2/
> > _______________________________________________
> > Psidev-ms-dev mailing list
> > Psi...@li...
> > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

-- 
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160

P: 215-573-3736
F: 215-573-9004
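A minimal Python sketch of the two storage options discussed in this thread, for illustration only. It packs a peak array as 64-bit floats, optionally zlib-compresses the bytes, and then base64-encodes them, which is roughly the compress-then-encode approach Angel attributes to mzXML 3.0. It also includes a hypothetical reader for Andreas's proposal of a separate binary file addressed by start and end byte offsets. The function names, the big-endian 64-bit layout, and the synthetic data are assumptions, not part of mzData, mzXML, or dataXML.

```python
import base64
import os
import struct
import tempfile
import zlib

# Assumption: values are stored as 64-bit big-endian IEEE doubles.
# mzData declares byte order per array; mzXML peak data use "network" (big-endian) order.
FMT_PREFIX = ">"


def encode_peaks(values, compress=True):
    """Pack floats as doubles, optionally zlib-compress, then base64-encode."""
    raw = struct.pack(f"{FMT_PREFIX}{len(values)}d", *values)
    if compress:
        raw = zlib.compress(raw)  # compression is applied to the binary bytes
    return base64.b64encode(raw).decode("ascii")  # ASCII text that can sit inside XML


def decode_peaks(text, compressed=True):
    """Reverse encode_peaks: base64-decode, optionally decompress, unpack doubles."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack(f"{FMT_PREFIX}{len(raw) // 8}d", raw))


def read_external_spectrum(path, start, end):
    """Hypothetical reader for the separate-binary-file idea:
    unpack the raw doubles stored in bytes [start, end) of `path`."""
    with open(path, "rb") as fh:
        fh.seek(start)
        raw = fh.read(end - start)
    return list(struct.unpack(f"{FMT_PREFIX}{len(raw) // 8}d", raw))


if __name__ == "__main__":
    # Synthetic, made-up intensities: mostly zeros with three peaks.
    intensities = [0.0] * 10000
    for i in (1200, 3400, 7800):
        intensities[i] = 1.0e5

    plain = encode_peaks(intensities, compress=False)
    packed = encode_peaks(intensities, compress=True)
    assert decode_peaks(packed) == intensities
    print("base64 only:", len(plain), "chars; zlib + base64:", len(packed), "chars")

    # Write the same array after a 16-byte dummy header and read it back by offset.
    with tempfile.NamedTemporaryFile(delete=False) as fh:
        fh.write(b"\x00" * 16)
        fh.write(struct.pack(f"{FMT_PREFIX}{len(intensities)}d", *intensities))
        path = fh.name
    try:
        start, end = 16, 16 + 8 * len(intensities)
        assert read_external_spectrum(path, start, end) == intensities
    finally:
        os.remove(path)
```

Base64 on its own expands binary data by about a third (every 3 bytes become 4 characters), so encoding alone never makes a file smaller. How much zlib recovers depends on the spectra themselves, so for 1 GB imaging data sets it would need to be measured on real files before deciding whether a separate binary file is worth the extra bookkeeping.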