Thread: [Psidev-ms-dev] Indexing in mzML

I really like the new name for the format.  I read the meeting minutes and
such and it seems like you all have gotten the best of both formats (and
then some) into the new format; I will be happy to see mzData and mzXML go
away.  But one thing I've missed a lot in mzData (even though I think it's a
better format because of the flat spectra list) is an index to quickly
access a given scan number.  I know indexes in XML are not pretty or simple,
but I really think having one is the difference between having to load the
whole file into memory (or continually parse the file to find the desired
scan number(s)) and merely jumping right to the correct point in the file to
start parsing.  For spectra files of a hundred megabytes or more, and
especially when reading them over a network drive, loading or repeatedly
parsing the whole file is a very bad proposition.  On a related note, is
there any guarantee in mzML (or mzData
for that matter) that the spectrum IDs or scan numbers are given in
ascending order?  Such a guarantee would at least make the absence of an
index more tolerable when looking for some range of scan numbers.
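
To make the index idea concrete, here is a minimal sketch in Python of a
scan-number-to-byte-offset lookup.  It is an illustration only, not anything
defined by mzML or mzData: the `scanNumber` attribute, the simplified
`<spectrum>` markup, and the helper names `build_offset_index` and
`read_spectrum` are all assumptions, and it assumes each spectrum start tag
sits on a single line.  The index here is built with one full pass over the
file; an index stored in the file itself would avoid even that pass.

```python
import io
import re

# Hypothetical, simplified markup: a <spectrum> start tag carrying a
# scanNumber attribute. Real mzML/mzData elements look different.
SPECTRUM_OPEN = re.compile(rb'<spectrum\b[^>]*\bscanNumber="(\d+)"')
SPECTRUM_CLOSE = b"</spectrum>"


def build_offset_index(stream):
    """Map scan number -> byte offset of its <spectrum> start tag.

    Requires one sequential pass over a binary, seekable stream.
    """
    index = {}
    stream.seek(0)
    offset = 0  # byte offset of the start of the current line
    for line in stream:
        match = SPECTRUM_OPEN.search(line)
        if match:
            index[int(match.group(1))] = offset + match.start()
        offset += len(line)
    return index


def read_spectrum(stream, index, scan_number):
    """Seek straight to one spectrum and return its raw bytes."""
    stream.seek(index[scan_number])
    chunk = bytearray()
    for line in stream:
        chunk += line
        end = chunk.find(SPECTRUM_CLOSE)
        if end != -1:
            return bytes(chunk[: end + len(SPECTRUM_CLOSE)])
    raise ValueError(f"no closing tag found for scan {scan_number}")
```

With the offsets in hand, fetching one scan costs a single seek plus reading
that spectrum's bytes, regardless of file size.  If scan numbers were also
guaranteed to be ascending, a range of scans could be read by seeking to the
first one and parsing forward until the last.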

Thanks,

Matt Chambers
