Re: [Psidev-ms-dev] mzML 0.99 remarks


Mike Coleman wrote:
> I can see why having a 'count' might make it easier for novice
> programmers to *write* a processing program, but I cannot see why
> having a 'count' would make more than a negligible difference in
> performance, if even that.  As a worst case, one could read the mzML
> file into memory, scan it once to calculate the count, and then
> proceed as before.  The additional time required to do a sweep through
> RAM would be trivial.

Isn't storing raw scan data one of the features of mzML?  If so, I 
imagine it won't be long before users are generating multi-GB files 
(possibly even with just peak lists) that:

(i) Won't fit within the 32-bit address space limits of the OS;

(ii) Or, if you're using 64-bit or mapping chunks, will cause I/O and 
paging issues, since the file has to be read twice (once for the 
counting scan and again for the parser), unless you have a huge amount 
of RAM of course.

Not to mention that the source of the data might not support stream 
positioning anyway (e.g. a compressed stream), or it might simply have 
been passed to your program/library as an open stream handle that you 
can't reopen, so you only get one shot.
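
To make the one-shot case concrete, here is a rough sketch in Python 
(my choice of language, nothing mzML-specific).  It walks a stream 
exactly once using the standard library's incremental parser, so the 
number of spectra is only known once the end of the stream is reached. 
The helper names iter_spectra and count_spectra are made up for 
illustration, and the element name "spectrum" is an assumption about the 
document layout:

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_spectra(stream, tag="spectrum"):
    """Yield each matching element from a stream, reading it only once.

    `tag` is an assumed element name; adjust it to the actual schema.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Drop any namespace prefix: "{uri}spectrum" -> "spectrum".
        if elem.tag.rsplit("}", 1)[-1] == tag:
            yield elem
            # Runs when the caller asks for the next element: discard
            # this element's contents to keep memory use low.
            elem.clear()


def count_spectra(stream):
    """Count spectra the only way possible without a 'count' attribute:
    by consuming the whole stream."""
    return sum(1 for _ in iter_spectra(stream))


if __name__ == "__main__":
    # A tiny stand-in document (not schema-valid mzML), gzip-compressed
    # to mimic a source that cannot cheaply be read a second time.
    xml = (b'<mzML><spectrumList>'
           b'<spectrum id="a"/><spectrum id="b"/><spectrum id="c"/>'
           b'</spectrumList></mzML>')
    stream = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(xml)))
    print(count_spectra(stream))
```

Once count_spectra() returns, the stream is used up, so a consumer that 
needs the total before processing the first spectrum has to either 
buffer everything or be told the count up front.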

Regards,
Chris