From: Coleman, M. <MK...@St...> - 2006-09-20 18:17:27
> Angel Pizarro:
> I am cringing as I write this, since I really think you should not go
> this route, but look at the supplementary data tags.

I am cringing with you. :-) Abusing the supplementary tags for this purpose is definitely out--this is an even more unpleasant option than going with a home-grown mzData extension.

> ah, yes, but most probably you have to either zcat the file or unzip it
> in order to read the floats, then zip the whole file back again once
> finished, a situation not unlike decoding byte arrays and base64
> strings....

Yes, having zipped files does imply having gzip/etc around, and you are correct that this is in some ways similar. A notable difference is that zip tools are already ubiquitous, standard, reliable, and well-understood by users. The scripts I'll have to write to decode mzData won't be. (Note, too, that it is not necessary to unzip and rezip in order to just read a compressed file. The 'zcat' program and its variations (there's surely a Ruby module, for instance) can read the file without disturbing it.)

> I'll add to those arguments that we should look at the computational
> costs of un/zipping whole files as opposed to stream en/decoding
> individual mzData spectra.

I agree that zip'ing will have a greater cost than generating base64. I don't think the cost is great, and in any case, zip'ing isn't necessary unless you're hurting for disk space. Disk is cheap. If I zip'ed these files, it would be as much to get the checksumming as to save the disk space.

> 1) it can handle encoding of integers, single and double precision
> float arrays without loss of information

As far as I know, a textual representation can also do this perfectly.

> 2) comparable compression with zipped plain text of the same precision

I agree that they're similar, within the bounds that I care about (2-3x).

> 3) better performance with respect to accessing individual spectra vs.
> compressed plain text

If you mean that you can easily seek to a particular spectrum in a file (presuming that some index is already present), I agree that this is simpler and much faster. As far as I know, seeking in a zip file isn't really efficient. If I thought I was going to need to do this, I'd want to store the files uncompressed. (As a practical matter, I can't think of a reason we'd need to do this here.)

Mike
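[For reference, the decoding step Mike describes really is a short script. This is a minimal sketch, not code from any mzData toolkit; the function name and parameter conventions are illustrative. It decodes a base64-encoded array of IEEE-754 floats of the kind mzData stores:]

```python
import base64
import struct

def decode_peak_array(b64_text, precision=32, byte_order="little"):
    """Decode an mzData-style base64 binary array into a list of floats."""
    raw = base64.b64decode(b64_text)
    fmt_char = "f" if precision == 32 else "d"
    prefix = "<" if byte_order == "little" else ">"
    count = len(raw) // struct.calcsize(fmt_char)
    return list(struct.unpack(f"{prefix}{count}{fmt_char}", raw))

# Round-trip three m/z values through the 32-bit encoding.
mzs = [445.12, 445.35, 446.01]
encoded = base64.b64encode(struct.pack("<3f", *mzs)).decode("ascii")
print(encoded)
print(decode_peak_array(encoded))
```

[Note that the decoded values are float32 approximations of the inputs, which is precisely the precision issue discussed later in this thread.]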
From: Brian P. <bri...@in...> - 2006-09-20 17:14:17
Hello All,

> This works quite nicely!
> <-snip->
> <MyList>1.1 1.2 1.3</MyList>

Sure, but in practice it's not really all that readable: make that list some realistic length and you're going to need to snork it up into a table so that you can find the n'th item in the list to match it with the n'th item in some other list. At that point, you have once again passed the file through a software tool and may as well reap the benefits of base64 encoding.

On the topic of software support tools, the TPP (and the IPP) furnish a fairly broad set of tools that read mzData, including the ability to dump it to ASCII.

Excellent point by Randy about ASCII representations giving a false sense of computational precision. We can't ever forget that under the hood these boxen are base 2.

BTW, if there really are integer data to be had, then mzData/mzXML ought to be able to hold those data as integers. In the converters I've worked with I don't recall seeing any such scan data, though. (AFAIK "ion counts" are really just inferred from a digitized analog sensor signal - there's not actually anything in there going "I see one ion, two ions, three ions..." - but I'm no MS hardware expert.)

Brian Pratt
www.insilicos.com/IPP

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Randy Julian
> Sent: Wednesday, September 20, 2006 8:23 AM
> To: 'Geer, Lewis (NIH/NLM/NCBI) [E]'; psi...@li...
> Subject: Re: [Psidev-ms-dev] Why base64?
>
> Hi,
>
> This works quite nicely!
>
> <-snip->
From: Jimmy E. <jk...@gm...> - 2006-09-20 17:12:23
I believe base64 encoding makes more sense for some large class of applications that will hopefully be digesting these files, but I'm sure everyone can see the obvious benefits of plain-text encoding of peak lists. The question I have is regarding the representation of space-delimited lists as Lewis and Randy have drawn up. Does this address the needs of Michael, Steve, Akhilesh, and others? Hopefully they'll all chime in.

My concern would be that having a horizontal, space-separated list of numbers, where m/z and intensity will possibly be written as separate lists of floats and ints, doesn't really serve the notion of readability. Lots of folks are used to looking at lists of peaks as ordered in .mgf or .dta files, and I'm not sure a horizontal list of numbers (especially if it's 2 lists, one for m/z and one for intensity) gives you that same sense of readability. I don't really see any regular use-case scenarios where people would be scrolling over to the 68th m/z in the list and then somehow counting over to the location of the 68th intensity to get its value.

So _if_ this really doesn't address the needs of the folks who have concerns about the base64 encoding and would like to see plain text, speak up. The last thing the format needs is more complexity in the form of another optional way of representing the data that only a handful of people will ever end up using.

- Jimmy

On 9/20/06, Randy Julian <rkj...@in...> wrote:
> Hi,
>
> This works quite nicely!
>
> <-snip->
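[Jimmy's readability concern is essentially that two parallel xs:list arrays have to be re-zipped by software before a human can read them as peak rows. A minimal sketch of that step; the element names here are illustrative toys, not taken from any mzData draft:]

```python
import xml.etree.ElementTree as ET

# A toy document in the style of the xs:list proposal, with m/z and
# intensity stored as two parallel whitespace-separated lists.
doc = """<spectrum>
  <mzArray>445.12 445.35 446.01</mzArray>
  <intensityArray>120.0 85.5 940.2</intensityArray>
</spectrum>"""

root = ET.fromstring(doc)
mzs = [float(x) for x in root.findtext("mzArray").split()]
intensities = [float(x) for x in root.findtext("intensityArray").split()]
assert len(mzs) == len(intensities), "parallel arrays must have equal length"

# Re-zip the parallel lists into .dta-style rows, the layout readers expect.
for mz, inten in zip(mzs, intensities):
    print(f"{mz:.2f}\t{inten:.1f}")
```

[The point being: the plain-text form is parseable without a decoder, but a tool is still needed before the peaks read like an .mgf or .dta listing.]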
From: Randy J. <rkj...@in...> - 2006-09-20 15:27:55
Hi, This works quite nicely! <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:element name="root"> <xs:complexType> <xs:sequence> <xs:element name="MyList"> <xs:simpleType> <xs:list itemType="xs:float"/> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema> Validates: <?xml version="1.0" encoding="UTF-8"?> <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="list.xsd"> <MyList>1.1 1.2 1.3</MyList> </root> Any thoughts about the use of this in the schema? Randy -----Original Message----- From: Geer, Lewis (NIH/NLM/NCBI) [E] [mailto:le...@nc...] Sent: Wednesday, September 20, 2006 10:27 AM To: Randy Julian; psi...@li... Subject: RE: [Psidev-ms-dev] Why base64? Hi, XML-schema does allow space delimited lists: <xsd:simpleType name="listOfMyIntType"> <xsd:list itemType="integer"/> </xsd:simpleType> <listOfMyInt>20003 15037 95977 95945</listOfMyInt> Lewis > -----Original Message----- > From: Randy Julian [mailto:rkj...@in...] > Sent: Wednesday, September 20, 2006 10:12 AM > To: psi...@li... > Subject: Re: [Psidev-ms-dev] Why base64? > > This is a very interesting question which has come up several > times before. > As we work to develop dataXML (mzData 2.0) we should take all of these > concerns into consideration. > > Originally, mzData had both a binary and regular XML notation > for both data > vectors. The XML-schema data types where tested by most of > the vendors who > did not see the file size compression benefits you mention > because they did > not feel they had the ability to round either of the vectors > in the way you > suggest. 
Since the use case: 'user opens mzData file with > notepad and see > peaks' was not viewed as a major request, the vendors > unanimously voted the > non-binary arrays out for size and performance reasons (see > the meeting > notes from the PSI meeting in Nice). > > The loss of readability may now have larger consequences than > we considered > back then. Steve Stein's comments are good ones. I we now have broad > enough adoption that we want to be able to open the file and > see the numbers > written out in XML, then we should reconsider the validity of > the use case. > To do this with mzData 1.05 you would have to use the > supplemental data > vector (the alternative Angel suggested). > > The supplemental data vectors hold any type of XSD data type including > normal XML. However in mzData 1.05, the binary vectors are > not optional, so > you have to populate them to comply with the spec - even if > you repeat the > information in the supplemental vector. > > The suggested 'white space separated list' is not a valid XML > data type, so > if we want to keep with the XSD standard for validation, the > peak lists have > to be in markup like: > > <peak> > <mz> > <float>0.1</float> > </mz> > <inten> > <float>100.1</float> > </inten> > </peak> > > or something similar. Other semantics could reduce the > verbosity, but the > basic idea is that we can only use valid XSD data types. > > As we move to dataXML, we will need to store other data > objects besides mass > spectra (MRM chromatograms for example), so we will have to > come up with a > more general data section regardless of the data types > allowed. During this > design phase we should decide what data types we want. > > As a historical note, the previous (current) LC-MS standard > format uses > netCDF as the data representation which is fully binary and utterly > unreadable in any respect without an API. Thus this > situation has existed > in mass spectrometry for quite some time. 
The readability of > these files > has never been viewed as a serious weakness, although the > 1.5-2x increase in > file size over the original vendor file was the source of constant > complaint. > > Just as a note for your comment #3, this is not so straight > forward. If the > instrument collects data using an Intel chip, floating-point > raw data will > most likely have a IEEE-754 representation. So any time you > have a number > in a file like 0.1, the internal representation was > originally different > (0.1 cannot be exactly represented in IEEE-754). When you > read from the file > into an IEEE standard format, it will not be 0.1 in any of > the math you do. > > Let the PSI-MS team know what requirements you would like to > see the HUPO > standards meet. If there is strong user support for missing > features, the > team will include them in the development roadmap. > > Let's keep the discussion of improvements going! > > Randy > > > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Coleman, > Michael > Sent: Tuesday, September 19, 2006 4:39 PM > To: psi...@li... > Subject: [Psidev-ms-dev] Why base64? > > Hi, > > Does anyone know why base64 encoding is being used for peak mz and > intensity values in the mzData format? It appears to me that > there are > three significant disadvantages to doing so: > > 1. Loss of readability. One of the primary reasons to use XML in the > first place is that it is human-readable--one can in principle inspect > and understand its contents with any text editor. > Base64-encoding peak > data destroys this transparency. (It also makes it more difficult to > write scripts to process the data.) > > 2. Increased file size. At least for our spectra, it appears that a > compressed (gzip/etc) ms2 file is about 15% smaller than the > equivalent > mzData file with the single-precision (32-bit) encoding, and > 22% smaller > than the double-precision version. 
The *uncompressed* > single-precision > mzData file is about 15% smaller than the uncompressed ms2 file; > the double-precision version is almost twice as large. (These figures > are for 'gzip' default compression.) > > (Currently our ms2 files have mz values rounded to one > decimal place and > intensity values with about 4-5 significant places.) > > 3. Potential loss of precision information. For example, with > single-precision encoding, a value originally given as > 12345.1 might be > encoded as 12345.0996. It's not easy to see from that > encoding that the > original value was given with one decimal place. Worse still, if the > original value is significant to more than 7-or-so digits and it gets > 32-bit encoded, precision will be lost, probably in a way not > immediately apparent to the user. (32-bit encoding will probably be a > temptation, given the size of the 64-bit encoding.) > > Even if base64-encoding cannot be dropped at this point, it seems like > it would be useful to add a "no encode" option, which would > present peak > data as the obvious whitespace-separated list of numeric values. > > Am I missing something here? I could not find any discussion of this > issue on the list. > > --Mike > > > Mike Coleman, Scientific Programmer, +1 816 926 4419 > Stowers Institute for Biomedical Research > 1000 E. 50th St., Kansas City, MO 64110, USA > > -------------------------------------------------------------- > ----------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the > chance to share your > opinions on IT & business topics through brief surveys -- and > earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge > &CID=DEVDEV > _______________________________________________ > Psidev-ms-dev mailing list > Psi...@li... 
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev > |
From: Geer, L. (NIH/NLM/NCBI) [E] <le...@nc...> - 2006-09-20 14:27:04
|
Hi, XML-schema does allow space delimited lists: <xsd:simpleType name="listOfMyIntType"> <xsd:list itemType="integer"/> </xsd:simpleType> <listOfMyInt>20003 15037 95977 95945</listOfMyInt> Lewis > -----Original Message----- > From: Randy Julian [mailto:rkj...@in...] > Sent: Wednesday, September 20, 2006 10:12 AM > To: psi...@li... > Subject: Re: [Psidev-ms-dev] Why base64? > > [...] |
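Lewis's point is easy to demonstrate: an element whose content is an xsd:list is just whitespace-separated text, so a consumer can recover the values with a plain split. A minimal sketch in Python; the element names here are illustrative only, not taken from the mzData 1.05 schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical "no encode" spectrum using xsd:list-style content.
doc = """<spectrum>
  <mzArray>100.5 101.2 250.75</mzArray>
  <intenArray>12 340 56</intenArray>
</spectrum>"""

root = ET.fromstring(doc)
# xsd:list values are whitespace-separated, so str.split() is enough.
mz = [float(x) for x in root.findtext("mzArray").split()]
inten = [int(x) for x in root.findtext("intenArray").split()]
assert mz == [100.5, 101.2, 250.75]
assert inten == [12, 340, 56]
```

The same split-based parsing would work in any scripting language, which is the transparency Mike is asking for.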
From: Randy J. <rkj...@in...> - 2006-09-20 14:13:06
|
This is a very interesting question which has come up several times before. As we work to develop dataXML (mzData 2.0) we should take all of these concerns into consideration. Originally, mzData had both a binary and regular XML notation for both data vectors. The XML-schema data types were tested by most of the vendors who did not see the file size compression benefits you mention because they did not feel they had the ability to round either of the vectors in the way you suggest. Since the use case: 'user opens mzData file with notepad and see peaks' was not viewed as a major request, the vendors unanimously voted the non-binary arrays out for size and performance reasons (see the meeting notes from the PSI meeting in Nice). The loss of readability may now have larger consequences than we considered back then. Steve Stein's comments are good ones. If we now have broad enough adoption that we want to be able to open the file and see the numbers written out in XML, then we should reconsider the validity of the use case. To do this with mzData 1.05 you would have to use the supplemental data vector (the alternative Angel suggested). The supplemental data vectors hold any type of XSD data type including normal XML. However in mzData 1.05, the binary vectors are not optional, so you have to populate them to comply with the spec - even if you repeat the information in the supplemental vector. The suggested 'white space separated list' is not a valid XML data type, so if we want to keep with the XSD standard for validation, the peak lists have to be in markup like: <peak> <mz> <float>0.1</float> </mz> <inten> <float>100.1</float> </inten> </peak> or something similar. Other semantics could reduce the verbosity, but the basic idea is that we can only use valid XSD data types. 
As we move to dataXML, we will need to store other data objects besides mass spectra (MRM chromatograms for example), so we will have to come up with a more general data section regardless of the data types allowed. During this design phase we should decide what data types we want. As a historical note, the previous (current) LC-MS standard format uses netCDF as the data representation which is fully binary and utterly unreadable in any respect without an API. Thus this situation has existed in mass spectrometry for quite some time. The readability of these files has never been viewed as a serious weakness, although the 1.5-2x increase in file size over the original vendor file was the source of constant complaint. Just as a note for your comment #3, this is not so straightforward. If the instrument collects data using an Intel chip, floating-point raw data will most likely have an IEEE-754 representation. So any time you have a number in a file like 0.1, the internal representation was originally different (0.1 cannot be exactly represented in IEEE-754). When you read from the file into an IEEE standard format, it will not be 0.1 in any of the math you do. Let the PSI-MS team know what requirements you would like to see the HUPO standards meet. If there is strong user support for missing features, the team will include them in the development roadmap. Let's keep the discussion of improvements going! Randy -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Coleman, Michael Sent: Tuesday, September 19, 2006 4:39 PM To: psi...@li... Subject: [Psidev-ms-dev] Why base64? Hi, Does anyone know why base64 encoding is being used for peak mz and intensity values in the mzData format? It appears to me that there are three significant disadvantages to doing so: 1. Loss of readability. 
One of the primary reasons to use XML in the first place is that it is human-readable--one can in principle inspect and understand its contents with any text editor. Base64-encoding peak data destroys this transparency. (It also makes it more difficult to write scripts to process the data.) 2. Increased file size. At least for our spectra, it appears that a compressed (gzip/etc) ms2 file is about 15% smaller than the equivalent mzData file with the single-precision (32-bit) encoding, and 22% smaller than the double-precision version. The *uncompressed* single-precision mzData file is about 15% smaller than the uncompressed ms2 file; the double-precision version is almost twice as large. (These figures are for 'gzip' default compression.) (Currently our ms2 files have mz values rounded to one decimal place and intensity values with about 4-5 significant places.) 3. Potential loss of precision information. For example, with single-precision encoding, a value originally given as 12345.1 might be encoded as 12345.0996. It's not easy to see from that encoding that the original value was given with one decimal place. Worse still, if the original value is significant to more than 7-or-so digits and it gets 32-bit encoded, precision will be lost, probably in a way not immediately apparent to the user. (32-bit encoding will probably be a temptation, given the size of the 64-bit encoding.) Even if base64-encoding cannot be dropped at this point, it seems like it would be useful to add a "no encode" option, which would present peak data as the obvious whitespace-separated list of numeric values. Am I missing something here? I could not find any discussion of this issue on the list. --Mike Mike Coleman, Scientific Programmer, +1 816 926 4419 Stowers Institute for Biomedical Research 1000 E. 50th St., Kansas City, MO 64110, USA |
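Both points in this exchange - Randy's note that 0.1 has no exact IEEE-754 representation, and Mike's observation that 12345.1 comes back as 12345.0996 after 32-bit encoding - can be reproduced in a couple of lines. A quick Python check (struct is used here just to force the 32-bit round trip; the values themselves come from the thread):

```python
import struct

# 0.1 has no exact IEEE-754 representation: the stored double is only
# the nearest representable value, as Randy notes.
print('%.20f' % 0.1)   # not exactly 0.1

# Forcing 12345.1 through a 32-bit float, as a single-precision peak
# array would, yields the value Mike describes:
f32 = struct.unpack('<f', struct.pack('<f', 12345.1))[0]
print(f32)  # 12345.099609375
```

So the precision question is not introduced by base64 itself but by the width of the binary representation chosen before encoding.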
From: Angel P. <an...@ma...> - 2006-09-20 13:27:12
|
On Wednesday 20 September 2006 07:53, Steve Stein wrote: > All, > > I also have the concerns expressed by Michael - transparency is important > to us, but precision even more so. We have long stored our data in ASCII to > avoid the problem, even though some judgement is sometimes necessary. As we > know 1.0000 and 0.9999 are very different things, usually the former is > really meant to be an integer. Also, abundances, since derived from ion > counts, are 'naturally' integral, as m/z values are real - of course data > systems need not conform to nature. I have dealt with MS formats where > everything is, in effect, integral. > > In our library, for example, we want the users to see the values that we > put there, so we use ASCII. It would be very desirable for us if the same > were offered in the XML's - otherwise we will have to go non-standard. > > Perhaps the ultimate answer is some way of associating uncertainty with > values, but I suppose this is a long way off. > Hmmm...... well the XML schema base64Binary type can encode integer arrays, but in mzData 1.05 we have defined the arrays as floats in the specification, but not the schema, hence this is not actually enforced. One could encode the intenArrayBinary data as ints, but it would still be a non-standard usage. It would be better to supply the integer intensity in the supDataArrayBinary and describe the array in the supDataDesc tag. So what I am getting at is that your use case is handled by mzData, but the consumer of the data would have to know to use the supplementary data arrays as the intensity values. Note that you would still have to specify the intensity values in the intenArrayBinary as floats, since this is a requirement of the schema. angel > -Steve Stein > > p.s. (this is NOT NIST speaking, just one of its employees). > > At 9/19/2006 07:56 PM Tuesday, Brian Pratt wrote: > >Oh, and I forgot one extremely important thing: performance. 
It's > >expensive converting those base 10 representations back to base 2 > >for number crunching, visualization etc. It's much cheaper to read them > >directly as binary, even with the overhead of base64 > >decoding. > > > >Brian Pratt > >www.insilicos.com/IPP > > > > > -----Original Message----- > > > From: psi...@li... > > > [mailto:psi...@li...] On > > > Behalf Of Brian Pratt > > > Sent: Tuesday, September 19, 2006 4:31 PM > > > To: psi...@li... > > > Subject: Re: [Psidev-ms-dev] Why base64? > > > > > > > > > When we developed the mzXML format we went through the same > > > questions. This is how I understood things: > > > > > > Readability: We as developers are an unusual use case. The > > > more likely use case for these formats is in visualization or > > > automated > > > processing, neither of which require direct eyeballing of the > > > peak lists under normal circumstances. Or at least that's how we saw > > > it. If you do really need to eyeball the peak lists there > > > are lots of tools available that will do the translation for you. > > > > > > Accuracy: Mass spec data in its raw form is generally stored > > > in binary formats, since mass specs are front ended by binary > > > computers. Conversion to and from base 10 human readable > > > representations introduces error. It's best to hold the data at its > > > original precision and translate out to human readable format > > > at whatever precision is deemed useful for eyeballing. > > > > > > File size: Sure, you can make files smaller by throwing away > > > precision, but as you begin to desire higher precision base64 quickly > > > becomes much more efficient. An excellent way to reduce file > > > size is to compress the peaklists before base64'ing them, as is done > > > in mzXML 3.0, and you do not sacrifice precision. > > > > > > Potential loss of precision information: That information > > > wasn't ever there, really. 
Again, mass specs are front ended > > > by binary > > > computers, so that base 10 precision information (does > > > '12345.099923123' mean '12345.1' or '12345.10' > > > or'12345.100'?) wasn't ever in > > > the datastream in the first place. The mass spec just wrote > > > a bunch of 32 or 64 bit binary numbers to the best of its (base 2) > > > ability. Looking at the bit patterns would be more revealing > > > of the precision, and base64 preserves them. As a developer, you > > > should be pleased that you don't have to wonder how many > > > digits of that value are for real and not just an artifact of > > > the base 2 to > > > base 10 formatting conversion - with base64 binary values > > > you're working with the original raw data, so those artifacts > > > aren't an > > > issue. > > > > > > Hope this helps, > > > > > > Brian Pratt > > > www.insilicos.com/IPP > > > > > > > -----Original Message----- > > > > From: psi...@li... > > > > [mailto:psi...@li...] On > > > > Behalf Of Coleman, Michael > > > > Sent: Tuesday, September 19, 2006 3:58 PM > > > > To: Angel Pizarro; psi...@li... > > > > Subject: Re: [Psidev-ms-dev] Why base64? > > > > > > > > > From: Angel Pizarro > > > > > > > > > > > 1. Loss of readability. ... > > > > > > > > > > There actually is a space for "human readable spectra" in the > > > > > mzData format, > > > > > > > > I'm glad to hear that. I looked for this, but I did not > > > > > > see it in the > > > > > > > spec here > > > > > > http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData > > > > > > > I was looking for something like a 'mzArray' and 'intenArray' tags, > > > > which would be the textual alternatives to 'mzArrayBinary' and > > > > 'intenArrayBinary'. Can you point me to an example? > > > > > > > > > but really who reads individual mz and intensity values? > > > > > > > > Well--I do. 
As a programmer I don't think it's an > > > > exaggeration to say > > > > that I'm looking at the peak lists in our ms2 files every > > > > day. I find > > > > being able to see at a glance that the peaks are basically sane, and > > > > their gross attributes (precision, count, etc.) very useful. > > > > > > > > Of course, as a programmer I can easily whip up a script to > > > > decode this > > > > file format. I suspect most users would be stymied, though, > > > > and I think > > > > that that would be unfortunate. Since these files are part > > > > of a chain > > > > of scientific argument, I think that as much as possible > > > > they ought to > > > > be transparent and as open as possible to verification by > > > > eyeball (mine > > > > and those of our scientists) and alternative pieces of software. > > > > > > > > I'm not saying that this transparency is an absolute good. > > > > Perhaps it > > > > is worth impairing so that we can have X, Y, and Z, which are > > > > considered > > > > more valuable. I'm not seeing what X, Y, and Z are, though. > > > > > > > > > > 2. Increased file size. ... > > > > > > > > > > Not a fair comparison. Most of the space in an mzData file is > > > > > actually taken up by the human-readable parameters and parameter > > > > > values of the spectra. > > > > > > > > Sorry, I should have been clearer. The numbers I gave were > > > > just for the > > > > peak lists (base64 vs text) and nothing else--no tags, no other > > > > metadata. The rest of the mzData fields would add more > > > > overhead, but I > > > > have no objection about that part. > > > > > > > > If we implemented mzData here today, our files would be bigger if we > > > > used the base64 encoding than if we used the textual > > > > numbers (as they > > > > are in our ms2 files). > > > > > > > > > 3. Potential loss of precision information. ... 
> > > > > > > > > > Actually the situation may be reversed. Thermofinnigan, for > > > > > example, stores measured values coming off of the instrument > > > > > as double precision floats, later formatting the numbers as > > > > > needed with respect to the specific instrument's limit of > > > > > > detection. > > > > > > > > 12345.1 may have originally been 12345.099923123 in the vendor's > > > > > proprietary format. > > > > > > > > Okay, but isn't '12345.1' what I really want to see in this case > > > > (assuming that the vendor is correct about the instrument's > > > > > > accuracy)? > > > > > > > For this particular instance, the string '12345.1' tells me > > > > what I need > > > > to know, and a double-precision floating point value (e.g., > > > > 12345.10000000000036379) would sort of let me guess it (since > > > > double-precision has significantly more significant figures). But a > > > > single-precision value would leave me in a sort of gray area. > > > > That is, > > > > does '12345.099923123' mean '12345.1' or '12345.10' or > > > > '12345.100', for > > > > example? > > > > > > > > > I wrote an email a few days ago showing how to translate in ruby > > > > > the base64 arrays > > > > > > > > I saw it and it was quite useful to me. Part of the reason > > > > I'm asking > > > > these questions is that I noticed in your examples that the > > > > base64-encoded values actually took more space than the > > > > original data. > > > > > > > > Just to reiterate my main question, it looks like using > > > > base64 will make > > > > mzData less usable and more complex, as compared to straight > > > > text. What > > > > benefits come with it that offset these drawbacks? > > > > > > > > Mike -- Angel Pizarro Director, Bioinformatics Facility Institute for Translational Medicine and Therapeutics University of Pennsylvania 806 BRB II/III 421 Curie Blvd. Philadelphia, PA 19104-6160 P: 215-573-3736 F: 215-573-9004 E: an...@ma... |
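Angel's suggestion - required float arrays plus an exact integer copy in a supplemental array - comes down to how the base64 data elements are packed. A sketch of that packing in Python: the precision and endian handling mirrors the attributes mzData carries on its data elements, but the function names here are my own, not part of any API:

```python
import base64
import struct

def encode_peaks(values, precision=32, endian="little"):
    """Pack a numeric vector mzData-style: IEEE-754 floats at the given
    precision, then base64. (Integer arrays could be packed the same way
    with an integer struct format.)"""
    fmt = ('<' if endian == "little" else '>') + \
          ('f' if precision == 32 else 'd') * len(values)
    return base64.b64encode(struct.pack(fmt, *values)).decode('ascii')

def decode_peaks(text, precision=32, endian="little"):
    """Reverse the packing above."""
    raw = base64.b64decode(text)
    width = 4 if precision == 32 else 8
    fmt = ('<' if endian == "little" else '>') + \
          ('f' if precision == 32 else 'd') * (len(raw) // width)
    return list(struct.unpack(fmt, raw))

# Values exactly representable at 32 bits round-trip without loss:
mz = [100.5, 250.25, 1024.125]
assert decode_peaks(encode_peaks(mz)) == mz
```

Note that the lossless round trip above holds only because those values are exactly representable at 32 bits; a value like 12345.1 would come back as its nearest single-precision neighbour, which is exactly the precision concern raised in this thread.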
From: Angel P. <an...@ma...> - 2006-09-20 13:18:38
|
On Tuesday 19 September 2006 18:58, Coleman, Michael wrote: > > I'm glad to hear that. I looked for this, but I did not see it in the > spec here > > http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData > I am cringing as I write this, since I really think you should not go this route, but look at the supplementary data tags. > > Well--I do. As a programmer I don't think it's an exaggeration to say > that I'm looking at the peak lists in our ms2 files every day. I find > being able to see at a glance that the peaks are basically sane, and > their gross attributes (precision, count, etc.) very useful. ah, yes, but most probably you have to either zcat the file or unzip it in order to read the floats, then zip the whole file back again once finished, a situation not unlike decoding byte arrays and base64 strings.... > > Of course, as a programmer I can easily whip up a script to decode this > file format. I suspect most users would be stymied, though, and I think > that that would be unfortunate. Since these files are part of a chain > of scientific argument, I think that as much as possible they ought to > be transparent and as open as possible to verification by eyeball (mine > and those of our scientists) and alternative pieces of software. > This is really where mzData has failed the end user, namely in the set of tools that support it. Even basic marshal/unmarshal scripts are lacking. The "Specify it and they will come" development hasn't panned out for us, sadly, so I am starting a development cycle here at UPenn to address these needs. Specifically, a reasonably fast ruby framework for dealing with mzData (akin to some aspects of the TPP) starting off based on some code written by John Prince @ UTexas, called mspire. > Sorry, I should have been clearer. The numbers I gave were just for the > peak lists (base64 vs text) and nothing else--no tags, no other > metadata. 
The rest of the mzData fields would add more overhead, but I > have no objection about that part. > > If we implemented mzData here today, our files would be bigger if we > used the base64 encoding than if we used the textual numbers (as they > are in our ms2 files). Point taken. See Brian Pratt's responses as to why base64 is the way both mzData and mzXML are going (irrespective of the planned merge of the formats). I'll add to those arguments that we should look at the computational costs of un/zipping whole files as opposed to stream en/decoding individual mzData spectra. > > > > 3. Potential loss of precision information. ... Brian Pratt addressed these issues much more eloquently than me in his reply. > > Just to reiterate my main question, it looks like using base64 will make > mzData less usable and more complex, as compared to straight text. What > benefits come with it that offset these drawbacks? 1) it can handle encoding of integers, single and double precision float arrays without loss of information 2) comparable compression with zipped plain text of the same precision 3) better performance with respect to accessing individual spectra vs. compressed plain text -angel |
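Angel's points (2) and (3) describe the scheme mzXML 3.0 adopted: compress each peak list before base64-encoding it, so nothing is lost and each spectrum still decodes on its own without unzipping the whole file. A rough sketch of that idea; the byte order and 64-bit width here are illustrative choices, not taken from either specification:

```python
import base64
import struct
import zlib

def pack_spectrum(values):
    """Lossless per-spectrum storage in the style of mzXML 3.0's
    compressed peak lists: pack as 64-bit floats, zlib-compress,
    then base64 the result for embedding in XML."""
    raw = struct.pack('<%dd' % len(values), *values)
    return base64.b64encode(zlib.compress(raw)).decode('ascii')

def unpack_spectrum(text):
    """Reverse pack_spectrum: base64-decode, decompress, unpack."""
    raw = zlib.decompress(base64.b64decode(text))
    return list(struct.unpack('<%dd' % (len(raw) // 8), raw))

peaks = [400.0 + i * 0.25 for i in range(1000)]
encoded = pack_spectrum(peaks)
assert unpack_spectrum(encoded) == peaks   # no precision lost
# Each spectrum decodes independently, unlike gzipping the whole file.
```

This is what makes random access to individual spectra cheap: only the one encoded string has to be decompressed, not the surrounding document.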
From: Steve S. <ste...@ni...> - 2006-09-20 11:54:17
All, I also share the concerns Michael expressed: transparency is important to us, but precision even more so. We have long stored our data in ASCII to avoid the problem, even though some judgement is sometimes necessary. As we know, 1.0000 and 0.9999 are very different things; usually the former is really meant to be an integer. Abundances, being derived from ion counts, are 'naturally' integral, whereas m/z values are real -- though of course data systems need not conform to nature. I have dealt with MS formats where everything is, in effect, integral. In our library, for example, we want users to see exactly the values that we put there, so we use ASCII. It would be very desirable for us if the XML formats offered the same option; otherwise we will have to go non-standard. Perhaps the ultimate answer is some way of associating an uncertainty with each value, but I suppose this is a long way off.

-Steve Stein

p.s. This is NOT NIST speaking, just one of its employees.

At 9/19/2006 07:56 PM Tuesday, Brian Pratt wrote:

> Oh, and I forgot one extremely important thing: performance. It's
> expensive converting those base 10 representations back to base 2 for
> number crunching, visualization, etc. It's much cheaper to read them
> directly as binary, even with the overhead of base64 decoding.
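Steve's point about ASCII preserving the curator's intended digits can be sketched in a few lines of Python (an illustration, not anything from the thread): an ASCII field keeps exactly the digits that were written, while a binary float records only the nearest representable value, with no notion of how many digits were meant.

```python
import struct

# A library entry stored as ASCII keeps exactly the digits the curator chose.
recorded = "1.0000"  # four deliberate decimal places: "really an integer"

# Round-trip through a 32-bit float, as a binary peak-list encoding would:
f32 = struct.unpack('<f', struct.pack('<f', float(recorded)))[0]
print(recorded)   # prints 1.0000 -- the curated presentation survives
print(repr(f32))  # prints 1.0    -- the formatting (and the intent) is gone

# Many short decimals are not even exactly representable in binary,
# so the round-trip can perturb the value itself:
g = struct.unpack('<f', struct.pack('<f', 0.9999))[0]
print(g == 0.9999)  # False
```

This is the sense in which "1.0000 and 0.9999 are very different things": the first collapses to an exact binary integer, the second cannot be represented exactly at all.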
From: Brian P. <bri...@in...> - 2006-09-19 23:56:30
Oh, and I forgot one extremely important thing: performance. It's expensive converting those base 10 representations back to base 2 for number crunching, visualization, etc. It's much cheaper to read them directly as binary, even with the overhead of base64 decoding.

Brian Pratt
www.insilicos.com/IPP
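Brian's performance claim is easy to check with a rough, synthetic benchmark (illustrative only; absolute timings depend on the runtime, and real peak lists differ from random data):

```python
import base64
import random
import struct
import time

# A hypothetical peak list of 100k m/z values.
random.seed(1)
peaks = [random.uniform(200.0, 2000.0) for _ in range(100_000)]

# The two encodings under discussion: base 10 text vs base64-packed doubles.
text_form = " ".join(f"{p:.4f}" for p in peaks)
b64_form = base64.b64encode(struct.pack(f"<{len(peaks)}d", *peaks))

t0 = time.perf_counter()
from_text = [float(s) for s in text_form.split()]
t_text = time.perf_counter() - t0

t0 = time.perf_counter()
raw = base64.b64decode(b64_form)
from_binary = struct.unpack(f"<{len(raw) // 8}d", raw)
t_binary = time.perf_counter() - t0

print(f"parse text:      {t_text * 1000:.1f} ms")
print(f"base64 + unpack: {t_binary * 1000:.1f} ms")

# The binary path also reproduces the original doubles bit-for-bit:
print(list(from_binary) == peaks)  # True
```

Whatever the exact ratio on a given machine, the binary path does no per-character numeric parsing, which is where the text path spends its time.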
From: Brian P. <bri...@in...> - 2006-09-19 23:31:56
When we developed the mzXML format we went through the same questions. This is how I understood things:

Readability: We as developers are an unusual use case. The more likely use for these formats is visualization or automated processing, neither of which requires direct eyeballing of the peak lists under normal circumstances. Or at least that's how we saw it. If you really do need to eyeball the peak lists, there are lots of tools available that will do the translation for you.

Accuracy: Mass spec data in its raw form is generally stored in binary formats, since mass specs are front-ended by binary computers. Conversion to and from base 10 human-readable representations introduces error. It's best to hold the data at its original precision and translate out to human-readable form, at whatever precision is deemed useful, only for eyeballing.

File size: Sure, you can make files smaller by throwing away precision, but as you begin to desire higher precision, base64 quickly becomes much more efficient. An excellent way to reduce file size without sacrificing precision is to compress the peak lists before base64'ing them, as is done in mzXML 3.0.

Potential loss of precision information: That information wasn't ever there, really. Again, mass specs are front-ended by binary computers, so that base 10 precision information (does '12345.099923123' mean '12345.1' or '12345.10' or '12345.100'?) wasn't ever in the datastream in the first place. The mass spec just wrote a bunch of 32- or 64-bit binary numbers to the best of its (base 2) ability. Looking at the bit patterns would be more revealing of the precision, and base64 preserves them. As a developer, you should be pleased that you don't have to wonder how many digits of a value are real rather than an artifact of the base 2 to base 10 formatting conversion: with base64 binary values you're working with the original raw data, so those artifacts aren't an issue.

Hope this helps,

Brian Pratt
www.insilicos.com/IPP
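The file-size point can be made concrete with a small sketch (synthetic data; real peak lists, which are far less random, compress much better under the mzXML 3.0 compress-then-base64 approach):

```python
import base64
import random
import struct
import zlib

random.seed(2)
peaks = [random.uniform(200.0, 2000.0) for _ in range(10_000)]

# Text with 4 decimal places vs base64 of packed 32/64-bit floats,
# plus zlib-compressed 64-bit floats as in mzXML 3.0.
text = " ".join(f"{p:.4f}" for p in peaks).encode()
b64_f32 = base64.b64encode(struct.pack(f"<{len(peaks)}f", *peaks))
b64_f64 = base64.b64encode(struct.pack(f"<{len(peaks)}d", *peaks))
b64_f64_z = base64.b64encode(zlib.compress(struct.pack(f"<{len(peaks)}d", *peaks)))

# base64 emits 4 output bytes per 3 input bytes, so 32-bit floats
# (~5.3 bytes/value) beat ~9-10 characters of text per value,
# while uncompressed 64-bit floats (~10.7 bytes/value) do not.
print("text:           ", len(text))
print("base64 float32: ", len(b64_f32))
print("base64 float64: ", len(b64_f64))
print("base64 zlib f64:", len(b64_f64_z))
```

This matches both sides of the thread: Mike's observation that base64 doubles are larger than his one-decimal text, and Brian's point that the comparison flips as the precision demanded of the text goes up.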
From: Akhilesh P. <pa...@jh...> - 2006-09-19 23:02:12
I agree with Mike about the human-readable part and the size issues. I insist in our lab that all files to be manipulated be 'scanned' before 'crunching.' If there are no compelling reasons for base64, I do not see why this decision should not be reconsidered.

Akhilesh Pandey
From: Coleman, M. <MK...@St...> - 2006-09-19 22:58:23
> From: Angel Pizarro
>
> > 1. Loss of readability. ...
>
> There actually is a space for "human readable spectra" in the
> mzData format,

I'm glad to hear that. I looked for this, but I did not see it in the spec here:

http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData

I was looking for something like 'mzArray' and 'intenArray' tags, which would be the textual alternatives to 'mzArrayBinary' and 'intenArrayBinary'. Can you point me to an example?

> but really who reads individual mz and intensity values?

Well--I do. As a programmer, I don't think it's an exaggeration to say that I'm looking at the peak lists in our ms2 files every day. I find being able to see at a glance that the peaks are basically sane, along with their gross attributes (precision, count, etc.), very useful.

Of course, as a programmer I can easily whip up a script to decode this file format. I suspect most users would be stymied, though, and I think that would be unfortunate. Since these files are part of a chain of scientific argument, I think that as much as possible they ought to be transparent and open to verification by eyeball (mine and those of our scientists) and by alternative pieces of software.

I'm not saying that this transparency is an absolute good. Perhaps it is worth impairing so that we can have X, Y, and Z, which are considered more valuable. I'm not seeing what X, Y, and Z are, though.

> > 2. Increased file size. ...
>
> Not a fair comparison. Most of the space in an mzData file is
> actually taken up by the human-readable parameters and parameter
> values of the spectra.

Sorry, I should have been clearer. The numbers I gave were just for the peak lists (base64 vs text) and nothing else--no tags, no other metadata. The rest of the mzData fields would add more overhead, but I have no objection to that part.

If we implemented mzData here today, our files would be bigger if we used the base64 encoding than if we used the textual numbers (as they are in our ms2 files).

> > 3. Potential loss of precision information. ...
>
> Actually the situation may be reversed. Thermofinnigan, for
> example, stores measured values coming off of the instrument as
> double precision floats, later formatting the numbers as needed
> with respect to the specific instrument's limit of detection.
> 12345.1 may have originally been 12345.099923123 in the vendor's
> proprietary format.

Okay, but isn't '12345.1' what I really want to see in this case (assuming that the vendor is correct about the instrument's accuracy)? For this particular instance, the string '12345.1' tells me what I need to know, and a double-precision floating point value (e.g., 12345.10000000000036379) would sort of let me guess it (since double precision carries considerably more significant figures). But a single-precision value would leave me in a sort of gray area. That is, does '12345.099923123' mean '12345.1' or '12345.10' or '12345.100', for example?

> I wrote an email a few days ago showing how to translate in ruby
> the base64 arrays

I saw it, and it was quite useful to me. Part of the reason I'm asking these questions is that I noticed in your examples that the base64-encoded values actually took more space than the original data.

Just to reiterate my main question: it looks like using base64 will make mzData less usable and more complex, as compared to straight text. What benefits come with it that offset these drawbacks?

Mike
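Mike's single- vs double-precision "gray area" is easy to reproduce (a sketch, not from the thread): the nearest 32-bit float to 12345.1 is 12345.099609375, which no longer reveals the intended one decimal place, while a 64-bit float carries enough digits for a shortest-representation formatter to recover '12345.1'.

```python
import struct

v = 12345.1  # the value as the instrument software might display it

# Round-trip through a 32-bit float, viewed back as a double:
f32 = struct.unpack('<f', struct.pack('<f', v))[0]
print(f32)  # 12345.099609375

# A 64-bit float keeps ~15-16 significant decimal digits, so the
# intended '12345.1' survives a double-precision round trip:
f64 = struct.unpack('<d', struct.pack('<d', v))[0]
print(repr(f64))  # 12345.1
```

So both sides are right in part: the binary value preserves exactly what the 32-bit encoder saw, but the reader can no longer tell how many of those base 10 digits were ever meaningful.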
From: Angel P. <an...@ma...> - 2006-09-19 21:38:56
|
Hi Mike, I have some answers that may or may not explain all of your concerns. On Tuesday 19 September 2006 16:39, Coleman, Michael wrote: > Hi, > > Does anyone know why base64 encoding is being used for peak mz and > intensity values in the mzData format? It appears to me that there are > three significant disadvantages to doing so: > > 1. Loss of readability. One of the primary reasons to use XML in the > first place is that it is human-readable--one can in principle inspect > and understand its contents with any text editor. Base64-encoding peak > data destroys this transparency. (It also makes it more difficult to > write scripts to process the data.) There actually is a space for "human readable spectra" in the mzData format, but really who reads individual mz and intensity values? The situation is akin to microarray data, does anyone really need to see each individual probe value? The normal usage of this data is to load the entire result set into a processing or search algorithm, or turn it into a nice spectra graph, all of which are handled by software which does not have a problem with decoding the strings. > > 2. Increased file size. At least for our spectra, it appears that a > compressed (gzip/etc) ms2 file is about 15% smaller than the equivalent > mzData file with the single-precision (32-bit) encoding, and 22% smaller > than the double-precision version. The *uncompressed* single-precision > mzData file is about about 15% smaller than the uncompressed ms2 file; > the double-precision version is almost twice as large. (These figures > are for 'gzip' default compression.) > Not a fair comparison. Most of the space in an mzData file is actually taken up by the human-readable parameters and parameter values of the spectra. I'll have to do some tests to see the actual space taken by spectra, but my "feeling" is that the byte and base64 encoding is actually a better compression of the data than gzipped XML with space delimitted floats. 
> (Currently our ms2 files have mz values rounded to one decimal place and
> intensity values with about 4-5 significant places.)
>
> 3. Potential loss of precision information. For example, with
> single-precision encoding, a value originally given as 12345.1 might be
> encoded as 12345.0996. It's not easy to see from that encoding that the
> original value was given with one decimal place. Worse still, if the
> original value is significant to more than 7-or-so digits and it gets
> 32-bit encoded, precision will be lost, probably in a way not
> immediately apparent to the user. (32-bit encoding will probably be a
> temptation, given the size of the 64-bit encoding.)

Actually the situation may be reversed. ThermoFinnigan, for example, stores measured values coming off of the instrument as double-precision floats, later formatting the numbers as needed with respect to the specific instrument's limit of detection. 12345.1 may have originally been 12345.099923123 in the vendor's proprietary format.

> Even if base64-encoding cannot be dropped at this point, it seems like
> it would be useful to add a "no encode" option, which would present peak
> data as the obvious whitespace-separated list of numeric values.

See my remark about who really needs to see the raw numbers. I wrote an email a few days ago showing how to translate the base64 arrays in Ruby, and there is also a Java example posted with the mzData specification.

> Am I missing something here? I could not find any discussion of this
> issue on the list.
>
> --Mike
>
> Mike Coleman, Scientific Programmer, +1 816 926 4419
> Stowers Institute for Biomedical Research
> 1000 E. 50th St., Kansas City, MO 64110, USA
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: an...@ma...
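Angel's "feeling" about the relative compression of the two encodings is easy to check empirically. A minimal sketch, using synthetic peak data rather than real spectra (the percentages will vary with the actual values and gzip settings, so treat the printed numbers as illustrative only):

```ruby
require 'zlib'

# Synthetic peak list: 1000 (mz, intensity) pairs. Real spectra will
# compress differently; this only illustrates the bookkeeping.
srand(42)
mz        = Array.new(1000) { 100.0 + rand * 1900.0 }
intensity = Array.new(1000) { rand * 1.0e6 }
floats    = mz.zip(intensity).flatten

# Text encoding, roughly what a "no encode" option would store
text = floats.map { |f| f.to_s }.join(' ')

# 32-bit little-endian binary, then base64 (pack('m0') = base64, no newlines)
binary = floats.pack('e*')
b64    = [binary].pack('m0')

puts "text:          #{text.bytesize} bytes (#{Zlib::Deflate.deflate(text).bytesize} deflated)"
puts "base64 32-bit: #{b64.bytesize} bytes (#{Zlib::Deflate.deflate(b64).bytesize} deflated)"
```

Base64-encoded 32-bit floats cost a fixed ~5.4 bytes per value, while full-precision decimal text typically needs 15-20 characters per value; compression narrows the gap considerably, which is consistent with the gzip figures Mike reports above.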
From: Coleman, M. <MK...@St...> - 2006-09-19 20:39:31
Hi,

Does anyone know why base64 encoding is being used for peak mz and intensity values in the mzData format? It appears to me that there are three significant disadvantages to doing so:

1. Loss of readability. One of the primary reasons to use XML in the first place is that it is human-readable--one can in principle inspect and understand its contents with any text editor. Base64-encoding peak data destroys this transparency. (It also makes it more difficult to write scripts to process the data.)

2. Increased file size. At least for our spectra, it appears that a compressed (gzip/etc) ms2 file is about 15% smaller than the equivalent mzData file with the single-precision (32-bit) encoding, and 22% smaller than the double-precision version. The *uncompressed* single-precision mzData file is about 15% smaller than the uncompressed ms2 file; the double-precision version is almost twice as large. (These figures are for 'gzip' default compression.)

(Currently our ms2 files have mz values rounded to one decimal place and intensity values with about 4-5 significant places.)

3. Potential loss of precision information. For example, with single-precision encoding, a value originally given as 12345.1 might be encoded as 12345.0996. It's not easy to see from that encoding that the original value was given with one decimal place. Worse still, if the original value is significant to more than 7-or-so digits and it gets 32-bit encoded, precision will be lost, probably in a way not immediately apparent to the user. (32-bit encoding will probably be a temptation, given the size of the 64-bit encoding.)

Even if base64-encoding cannot be dropped at this point, it seems like it would be useful to add a "no encode" option, which would present peak data as the obvious whitespace-separated list of numeric values.

Am I missing something here? I could not find any discussion of this issue on the list.
--Mike

Mike Coleman, Scientific Programmer, +1 816 926 4419
Stowers Institute for Biomedical Research
1000 E. 50th St., Kansas City, MO 64110, USA
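Mike's precision example (point 3) is easy to reproduce. A small sketch of what a 32-bit round trip does to a value like 12345.1, using Ruby's pack formats ('e' is a little-endian 32-bit float, 'E' a little-endian 64-bit double):

```ruby
value = 12345.1

# Round-trip through a 32-bit (single-precision) encoding
as_f32 = [value].pack('e').unpack1('e')

# Round-trip through a 64-bit (double-precision) encoding
as_f64 = [value].pack('E').unpack1('E')

puts "original: #{value}"
puts "32-bit:   #{as_f32}"   # roughly 12345.0996, as described above
puts "64-bit:   #{as_f64}"   # a Ruby Float is a double, so this survives unchanged
```

This shows both halves of the argument: the 32-bit encoding visibly perturbs the value, while the 64-bit encoding preserves it at the cost of twice the bytes.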
From: Angel P. <an...@ma...> - 2006-09-14 12:52:41
Folks, in the interest of a smaller example, and to show how one does a complete round trip of float array <--> byte-encoded string <--> base64binary, here is an irb (interactive Ruby) session showing the steps, fully commented:

# require the base64 encoding library
irb(main):002:0> require 'base64'
=> true

# create the array of floats
irb(main):003:0> a = [123.45, 124.5634, 1234.34121]
=> [123.45, 124.5634, 1234.34121]

# Encode the array as double-precision (i.e. 64-bit) floats in a byte string
# in little-endian order. A lowercase "e" would encode it as single-precision
# (32-bit) floats. "G" and "g" would correspond to double and single precision
# in big-endian byte order. See the Ruby API docs for the pack and unpack
# methods for more info on the types of byte encoding and what the "*"
# actually means ;)
irb(main):004:0> s = a.pack('E*')
=> "\315\314\314\314\314\334^@@\244\337\276\016$_@F|'f]I\223@"

# encode in base64
irb(main):006:0> sb64 = Base64.encode64(s)
=> "zczMzMzcXkBApN++DiRfQEZ8J2ZdSZNA\n"

# decode to byte string again
irb(main):007:0> s2 = Base64.decode64(sb64)
=> "\315\314\314\314\314\334^@@\244\337\276\016$_@F|'f]I\223@"

# unpack the byte string into the float array
irb(main):008:0> a2 = s2.unpack('E*')
=> [123.45, 124.5634, 1234.34121]

And that is the long and short of it. Cheers!

-angel

On Wednesday 13 September 2006 11:25, Angel Pizarro wrote:
> Hello all,
>
> In the hopes of fostering mzData as a format, I am putting into the
> docstore an example of decoding the mzData base64binary float arrays for
> m/z and intensity using Ruby, my new language of choice.
>
> The docstore path is Documents/PSI_MS/mzData/decode_base64.rb
> Here is the URL:
>
> http://psidev.sourceforge.net/docstore/view.php?sess=0&parent=7&expand=1&order=name&sortname=ASC&id=104&action=file_details
>
> The code and comments should be self-explanatory, but if not, send me an
> email and I will be happy to augment the file. Look for a Ruby API to
> read/write mzData sometime soon!
> (airware at the moment)
>
> -angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: an...@ma...
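The round trip above can be extended to pull a float array straight out of an mzData document. A sketch follows; note that the element and attribute names (mzArrayBinary/data with precision, endian and length attributes) follow my reading of the mzData schema and should be verified against the published mzdata.xsd before relying on them:

```ruby
require 'rexml/document'

# Pick the pack/unpack format from the mzData "precision" and "endian"
# attributes: E/e = little-endian 64/32-bit, G/g = big-endian 64/32-bit.
def mzdata_format(precision, endian)
  little = (endian == 'little')
  precision.to_i == 64 ? (little ? 'E*' : 'G*') : (little ? 'e*' : 'g*')
end

# Build a tiny mzData-like fragment around a freshly encoded array, then
# decode it back out. The element/attribute names here are an assumption
# taken from the mzData schema as I read it -- check against mzdata.xsd.
mz  = [123.45, 124.5634, 1234.34121]
b64 = [mz.pack('E*')].pack('m0')
xml = <<XML
<spectrum id="1">
  <mzArrayBinary>
    <data precision="64" endian="little" length="#{mz.length}">#{b64}</data>
  </mzArrayBinary>
</spectrum>
XML

doc    = REXML::Document.new(xml)
data   = REXML::XPath.first(doc, '//mzArrayBinary/data')
fmt    = mzdata_format(data.attributes['precision'], data.attributes['endian'])
floats = data.text.strip.unpack1('m0').unpack(fmt)
puts floats.inspect
```

Since the precision and endianness travel as attributes on the data element, a reader never has to guess the byte layout; it just selects the matching unpack format.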
From: Angel P. <an...@ma...> - 2006-09-13 15:28:03
Hello all,

In the hopes of fostering mzData as a format, I am putting into the docstore an example of decoding the mzData base64binary float arrays for m/z and intensity using Ruby, my new language of choice.

The docstore path is Documents/PSI_MS/mzData/decode_base64.rb
Here is the URL:

http://psidev.sourceforge.net/docstore/view.php?sess=0&parent=7&expand=1&order=name&sortname=ASC&id=104&action=file_details

The code and comments should be self-explanatory, but if not, send me an email and I will be happy to augment the file. Look for a Ruby API to read/write mzData sometime soon! (airware at the moment)

-angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: an...@ma...
From: Andy J. <aj...@cs...> - 2006-09-12 11:00:20
Hi all,

Apologies if you receive multiple copies. In the Gel group, we are shortly going to start writing the spec document for GelML. Before starting I would like to have a quick discussion about the format of the document, because it would be good if we adopt a similar format across all documents produced from now on. I see there are various current specs that ought to be considered.

- Autogenerated docs for MI: http://psidev.sourceforge.net/mi/rel25/doc/
- Draft of mzData spec: http://psidev.sourceforge.net/docstore/browse.php?sess=593acb54e566e0e2e0808278349f8294&parent=7&expand=1&order=name&sortname=ASC
- We've also just put out the internal draft of the FuGE v1 specs: http://fuge.sourceforge.net/Version1Candidate/

The FuGE spec is divided into two documents:

1. A user guide which has general tasks, class diagrams and a general description of the model.
2. A reference manual (UML + XML Schema), which is autogenerated by an AndroMDA cartridge I've written.

The difference between MI/mzData and, say, GelML is that we have primarily developed a UML model with a defined mapping to the XML Schema. Therefore the specs are likely to focus more closely on the object model. I would like to propose that for models developed on top of FuGE, it might be advantageous to adopt a similar format to the FuGE specs, whereby:

i) The reference manual covers both the UML and XML Schema, and it is autogenerated, therefore minimal work.
ii) I can produce a template "user guide" which can be populated manually.

All the models extending from FuGE would then have consistent specs. The user guide could include some of the facets of the mzData spec, such as Goals, Requirements and an Appendix on CV usage.
Could group chairs take a look at the draft FuGE specs and comment on the suitability of this format (sections 3 to 5 of the FuGE user guide would not be applicable)? I guess we could adopt different policies for FuGE-based and non-FuGE-based formats if necessary?

Best wishes,
Andy
From: Randy J. <rj...@pu...> - 2006-09-11 17:55:02
The minutes from the 22 August teleconference are available from the PSI docstore: http://psidev.sourceforge.net/docstore/download.php?&id=103

The next teleconference is TOMORROW (12 September 2006) at 1500 GMT:

1. Examples from dataXML model
2. Organization at the Washington PSI meeting

----

HUPO Proteomics Standards Initiative meeting, to be held from the 25th to 27th of September 2006 in Washington DC, at the headquarters of the American Chemical Society. All meeting details, including the draft program and registration form, can be found at: http://psidev.sourceforge.net/meetings/2006-09/

The call details are:

PSI MS WG Teleconference
12 September, 1600 British Summer (London) Time
http://www.timeanddate.com/worldclock/fixedtime.html?year=2006&month=9&day=12&hour=15&min=0&sec=0

Please dial the most convenient number to access the teleconference:

UK: +44 870 240 7821 or +44 207 819 3600
US: West coast: +1 4089616553 East coast: +1 7183541169
Switzerland: +41 1800 9449

(If you require access from any other location, please contact Phil Jones or Lennart Martens.)

The passcode to access the conversation is: 8885686#
From: Randy J. <rkj...@in...> - 2006-08-23 12:16:18
To improve the accessibility of the mzData documents (schema, specification, CV), hard URLs were created to the latest versions:

http://psidev.sourceforge.net/ms/xml/mzdata/psi-ms-cv-latest.obo

The current release version is 1.7.2 and is stored as a specifically named file:

http://psidev.sourceforge.net/ms/xml/mzdata/psi-ms-cv-1.7.2.obo

Additions, corrections and changes to the CV

This directory also has the latest version of the specification document. Appendix A contains cvParam allowed values by section. The specification document is not complete yet, but should be useful - please help with this document by sending edits to rkj...@in... or anyone in the PSI-MS working group.

http://psidev.sourceforge.net/ms/xml/mzdata/mzdata_spec.doc

The schema and the HTML documentation of the schema are also there:

http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.xsd
http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html

Please let us know if there are any problems, suggestions, comments on any of the mzData documents.

Thanks,
Randy
From: <dh...@md...> - 2006-07-18 19:09:57
I will be out of the office starting 07/15/2006 and will not return until 08/02/2006. I will respond to your message when I return.
From: Randy J. <rkj...@in...> - 2006-07-18 11:52:32
There have been a large number of people out over the last month (me included), but we need to get back on the line to set the schedule for the next steps in the mzData merger and forward progress on the MS ontology. Phil has graciously agreed to start the call.

For today's PSI-MS teleconference:

http://www.timeanddate.com/worldclock/fixedtime.html?month=7&day=18&year=2006&hour=16&min=0&sec=0&p1=136
(16:00 British Summer Time)

Please dial the most convenient number to access the teleconference:

UK: +44 870 240 7821 or +44 207 819 3600
US: West coast: +1 4089616553 East coast: +1 7183541169
Switzerland: +41 1800 9449

(If you require access from any other location, please let me know.)

The passcode to access the conversation is: 8885686#

This is 1500 GMT (1600 BST) and you can look on timeanddate.com for your location. The agenda is to talk about the next steps and set the schedule and agenda for upcoming calls and meetings - essentially, to get back to work.

Thanks,
Randy
From: Chris T. <chr...@eb...> - 2006-07-11 10:05:39
Hi all. Just a heads up, for all involved in data standardization.

The quote below comes from the 'opportunities' list generated by the FDA under their 'Critical Path Initiative' (aimed at getting novel therapies to patients quicker, but without increasing risk, by addressing bottlenecks). The opportunities document is linked from the main page for the initiative, and has lots of other interesting stuff in it: http://www.fda.gov/oc/initiatives/criticalpath/

One major objective might be to get this tool -- http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/ -- to speak FuGE..? That'd then make it much simpler to leverage some of the offspring of FuGE (MAGE2, GelML, ultimately MS and now NMR formats) in that tool and others FDA might develop in the future (for example, for proteomics and metabolomics data).

Anyway, the quote:

"44. Development of Data Standards. Currently, clinical investigators, clinical study personnel, data managers, and FDA reviewers must cope with a plethora of data formats and conventions. Some clinical investigators report the presence of many different computer systems for data entry at their sites (for various trials), each of which uses different data conventions. Lack of standardization is not only inefficient, it multiplies the potential for error. Important standards work is underway, but much remains before the promise of shared data standards for clinical trials is realized. CDISC is paving the way by developing its Study Data Tabulation Model for describing observations in drug trials [1]. That model could someday encompass observations needed for other types of trials. Health Level 7 and CDISC are working to create standards that can be used for the exchange, management, and integration of electronic healthcare information to increase the effectiveness and efficiency of healthcare delivery [2].

In addition to improving and expanding the Model, sponsors and the FDA must undertake the hard work of retooling hardware and software to apply the new standards. This retooling includes training researchers to collect and FDA reviewers to expect data in these formats. Standardizing data archiving conventions would also enable the creation of shared data repositories, facilitating meta-analyses, data mining, and modeling to improve clinical trial design and analysis."

[1] For more on CDISC (the Clinical Data Interchange Standards Consortium), see http://www.cdisc.org/.
[2] See also http://www.hl7.org/.

Sorry to those who get this several times...

Cheers, Chris.

~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://psidev.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~
From: Trish W. <wh...@pc...> - 2006-06-27 14:28:49
Is the PSI-MS working group having a call today, and what is the call-in number?

Thanks,
Trish

> PSI-MS working group:
>
> Following ASMS, and the push to complete the mzData NBT paper, we have not
> returned to our regularly scheduled teleconferences. There are several
> items to discuss and given the summer vacation season, I would like to
> propose the following:
>
> 27 June 2006 - 1600 BST, PSI-MS Teleconference
>
> (11 July will be the PSI-PI Teleconference according to Angel)
>
> 18 July 2006 - 1600 BST, PSI-MS Teleconference
>
> With additional calls if needed?
>
> In addition to the mzData/mzXML merger discussion, are there other topics
> for a call next week?
>
> Thanks,
> Randy
From: Randy J. <rkj...@in...> - 2006-06-20 14:30:31
PSI-MS working group:

Following ASMS, and the push to complete the mzData NBT paper, we have not returned to our regularly scheduled teleconferences. There are several items to discuss and, given the summer vacation season, I would like to propose the following:

27 June 2006 - 1600 BST, PSI-MS Teleconference

(11 July will be the PSI-PI Teleconference according to Angel)

18 July 2006 - 1600 BST, PSI-MS Teleconference

With additional calls if needed?

In addition to the mzData/mzXML merger discussion, are there other topics for a call next week?

Thanks,
Randy