From: Mike C. <tu...@gm...> - 2007-10-09 19:06:30
I knew I was going to regret that (over-)simplification.

Okay, so in reality I would never actually read the file twice--that's just easier to describe than something more realistic. Just off the top of my head, I would do roughly what C++ std::vectors (or Python lists, etc.) do in terms of memory allocation. This lets you read in a single pass, and uses memory in proportion to what is actually needed. (There are ways to deal with fragmentation as well, but that's *way* outside the bounds of what the mzML spec should care about.)

Also worth noting, in my not-so-humble opinion: (a) for general computation, 32-bit hardware is dead, and (b) if you don't have enough RAM to comfortably hold single mzML files, you probably should just buy more.

Mike

On 10/9/07, Chris Allen <ch...@ma...> wrote:
>
> Mike Coleman wrote:
> > I can see why having a 'count' might make it easier for novice
> > programmers to *write* a processing program, but I cannot see why
> > having a 'count' would make more than a negligible difference in
> > performance, if even that. As a worst case, one could read the mzML
> > file into memory, scan it once to calculate the count, and then
> > proceed as before. The additional time required to do a sweep through
> > RAM would be trivial.
>
> Isn't one of the features of mzML to store raw scan data? If so I
> imagine it wouldn't be long before users were generating multi-GB files
> (even possibly with just peak lists) that:
>
> (i) Won't map into the 32bit address space limits of the OS;
>
> (ii) Or if you're either using 64bit or else mapping chunks, you'll hit
> i/o and paging issues as the file will have to be read twice (once for
> the scan and again for the parser) unless you have a huge amount of RAM
> of course.
>
> Not to mention that the source of the data might not support stream
> positioning anyway (eg. compressed stream) or which was simply passed as
> an open stream handle to your program/library and you can't reopen it so
> you only have one shot.
>
> Regards,
> Chris
>
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
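
To make the std::vector-style allocation mentioned in the reply concrete, here is a rough sketch of a single-pass read with no 'count' known up front. Everything in it is an assumption made for illustration: the names GrowableDoubles and read_values_single_pass are hypothetical, the input is taken to be plain whitespace-separated numbers rather than real mzML, and the doubling policy is just one common growth strategy, not something the mzML spec or any particular parser prescribes.

```python
import io
from array import array


class GrowableDoubles:
    """Hypothetical growable array of doubles that doubles its capacity."""

    def __init__(self):
        self._buf = array("d", [0.0])  # capacity 1, nothing stored yet
        self._size = 0

    def append(self, value):
        if self._size == len(self._buf):
            # Buffer is full: double the capacity, so the total copying
            # work stays proportional to the number of values appended.
            self._buf.extend([0.0] * len(self._buf))
        self._buf[self._size] = value
        self._size += 1

    def __len__(self):
        return self._size

    def to_list(self):
        return list(self._buf[: self._size])


def read_values_single_pass(stream, chunk_size=4096):
    """Read whitespace-separated numbers from a stream in one pass.

    The stream is only ever read forward (no seek, no reopen), and the
    number of values is not known until the end.
    """
    values = GrowableDoubles()
    tail = ""  # possibly incomplete token carried over between chunks
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        tokens = (tail + chunk).split()
        tail = ""
        if not chunk[-1].isspace():
            # The last token may continue in the next chunk; hold it back.
            tail = tokens.pop()
        for token in tokens:
            values.append(float(token))
    if tail:
        values.append(float(tail))
    return values


if __name__ == "__main__":
    stream = io.StringIO("1.5 2.25\n3.0 4.75 5")
    values = read_values_single_pass(stream, chunk_size=3)
    print(len(values), values.to_list())
```

The count falls out at the end as len(values), and the buffer is never more than about twice the size of the data actually read, which is the "memory in proportion to what is actually needed" property. Because the stream is consumed strictly forward, the same approach would apply to a compressed stream or an already-open handle that cannot be repositioned.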