From: Chris A. <ch...@ma...> - 2007-10-09 17:25:36
Mike Coleman wrote:
> I can see why having a 'count' might make it easier for novice
> programmers to *write* a processing program, but I cannot see why
> having a 'count' would make more than a negligible difference in
> performance, if even that. As a worst case, one could read the mzML
> file into memory, scan it once to calculate the count, and then
> proceed as before. The additional time required to do a sweep through
> RAM would be trivial.

Isn't one of the features of mzML to store raw scan data? If so, I imagine it wouldn't be long before users were generating multi-GB files (even possibly with just peak lists) that:

(i) won't map into the 32-bit address space limits of the OS; or
(ii) if you're either using 64-bit or else mapping chunks, will hit I/O and paging issues, as the file will have to be read twice (once for the scan and again for the parser), unless you have a huge amount of RAM of course.

Not to mention that the source of the data might not support stream positioning anyway (e.g. a compressed stream), or might simply have been passed to your program/library as an open stream handle that you can't reopen, so you only have one shot.

Regards,
Chris
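
To make the "one shot" case concrete, here is a minimal Python sketch, not anything from the mzML tooling itself. It assumes a heavily simplified stand-in document, and the names `ForwardOnly` and `process_spectra` are hypothetical. The wrapper exposes only `read()`, much like a pipe or a handle passed in by a caller, so a separate counting pre-scan is not possible and the total is only known once the single pass has finished.

```python
import io
import xml.etree.ElementTree as ET


class ForwardOnly:
    """Hypothetical wrapper exposing only read(), like a pipe or socket handle."""

    def __init__(self, raw):
        self._raw = raw

    def read(self, size=-1):
        return self._raw.read(size)


def process_spectra(stream, tag="spectrum"):
    """Handle every <spectrum> element in one forward pass; return how many were seen.

    The stream is never rewound or reopened, so the count is a by-product
    of the pass rather than something available before it starts.
    """
    seen = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Strip an XML namespace prefix such as "{uri}spectrum", if present.
        if elem.tag.rsplit("}", 1)[-1] == tag:
            seen += 1
            # ... per-spectrum work would go here ...
            elem.clear()  # drop the element's contents to limit memory use
    return seen


# Simplified stand-in for an mzML document (assumption: real files are far richer).
doc = b"<run><spectrumList>" + b"<spectrum/>" * 3 + b"</spectrumList></run>"
total = process_spectra(ForwardOnly(io.BytesIO(doc)))
```

Anything that wants the number of spectra before processing starts (a progress bar, a preallocated array) would have to get it from the file itself in this situation, since there is no second chance to read the data.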