From: Eric D. <ede...@sy...> - 2007-10-09 20:01:56
Splendid, we appear to be reaching a conclusion. I tally:

- Brian votes to keep
- Angel votes to keep
- Marc votes to keep
- David votes to keep
- Eric votes to keep
- Matt is neutral
- ChrisA is neutral
- Mike does not want them
- everyone else abstains

The ayes have it. The schema stays as is with respect to the count attributes.

Thank you!

Eric

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...]
> On Behalf Of Brian Pratt
> Sent: Tuesday, October 09, 2007 12:34 PM
> To: 'Mass spectrometry standard development'
> Subject: Re: [Psidev-ms-dev] mzML 0.99 remarks
>
> As to the performance implications of heap fragmentation, have a look at
> http://www.microquill.com/, which sells a nice heap replacement library
> that can have an impressive impact on program performance, without any
> code changes, just by managing the heap more intelligently (I've used it;
> it's for real). But if you can't have a clever heap manager, then you
> have to be clever in how you manage the heap.
>
> >> I would do roughly what C++ std::vector's (or Python lists, etc.) do
>
> I expect you are referring to the way std::vector initially allocates
> room for, say, up to 10 items, then, when that turns out not to be
> enough, reallocates for 20, then 40, 80, 160, ..., 655360, 1310720, ...
> But consider also std::vector's reserve() method, which is a great
> illustration of the usefulness of the count. It allows you to declare
> the *expected* final size of the collection without demanding that it be
> the *actual* final size. It preallocates enough memory to accommodate
> the addition of up to n elements to the vector before any reallocation
> takes place, so heap fragmentation is avoided, along with a great many
> copy-constructor executions (which probably cause even more heap
> fragmentation). If an (n+1)th element is added, reallocation takes place
> and performance isn't what it could be, but the program still runs
> without error. So it's a risk-free and very simple way to use the count
> info.
>
> If your collection class of choice doesn't have some means of exploiting
> a hint about the expected size of the collection, well, no harm done.
> Anyone who is not using robust collection classes, and is thus
> susceptible to running off the end of an array allocated based on the
> declared count, is working harder than they need to.
>
> But Angel is right: it's fun to trade tips and tricks, but we should
> just vote... I vote keep 'em.
>
> - Brian
>
> -----Original Message-----
> From: psi...@li...
> [mailto:psi...@li...] On Behalf Of Mike Coleman
> Sent: Tuesday, October 09, 2007 12:07 PM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] mzML 0.99 remarks
>
> I knew I was going to regret that (over-)simplification. Okay, so in
> reality I would never actually read the file twice--that's just easier
> to describe than something more realistic. Just off the top of my
> head, I would do roughly what C++ std::vector's (or Python lists,
> etc.) do in terms of memory allocation. This lets you read in a
> single pass, and uses memory in proportion to what is actually needed.
> (There are ways to deal with fragmentation as well, but that's *way*
> outside the bounds of what the mzML spec should care about.)
>
> Also worth noting, in my not-so-humble opinion: (a) for general
> computation, 32-bit hardware is dead, and (b) if you don't have enough
> RAM to comfortably hold single mzML files, you probably should just
> buy more.
>
> Mike
>
>
> On 10/9/07, Chris Allen <ch...@ma...> wrote:
> >
> > Mike Coleman wrote:
> > > I can see why having a 'count' might make it easier for novice
> > > programmers to *write* a processing program, but I cannot see why
> > > having a 'count' would make more than a negligible difference in
> > > performance, if even that.
> > > As a worst case, one could read the mzML
> > > file into memory, scan it once to calculate the count, and then
> > > proceed as before. The additional time required to do a sweep through
> > > RAM would be trivial.
> >
> > Isn't one of the features of mzML the ability to store raw scan data? If
> > so, I imagine it wouldn't be long before users were generating multi-GB
> > files (possibly even with just peak lists) that:
> >
> > (i) won't fit within the 32-bit address space limits of the OS; or
> >
> > (ii) if you're using 64-bit or else mapping chunks, will cause I/O and
> > paging issues, as the file will have to be read twice (once for the
> > scan and again for the parser), unless of course you have a huge amount
> > of RAM.
> >
> > Not to mention that the source of the data might not support stream
> > positioning anyway (e.g. a compressed stream), or might simply have been
> > passed to your program/library as an open stream handle that you can't
> > reopen, so you only have one shot.
> >
> > Regards,
> > Chris
> >
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems? Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >> http://get.splunk.com/
> > _______________________________________________
> > Psidev-ms-dev mailing list
> > Psi...@li...
> > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
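The trade-off argued in this thread (Brian's reserve() with a count hint versus Mike's single-pass geometric growth, on input that Chris notes may only be readable once) can be illustrated with a short C++ sketch. This is not code from mzML or from any parser mentioned here: `ReadResult`, `read_values` and `declared_count` are hypothetical names, and the input is assumed to be plain whitespace-separated numbers read once from a `std::istream`. The count is used only as a hint to `std::vector::reserve()`, so a wrong count costs reallocations, never correctness.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function out
#include <vector>

// Hypothetical result type for this sketch: the parsed values plus the
// number of times the vector's buffer was (re)allocated while appending.
struct ReadResult {
    std::vector<double> values;
    std::size_t reallocations = 0;
};

// Reads whitespace-separated numbers from `in` in a single pass, so it also
// works on streams that cannot be rewound (e.g. pipes or decompressing streams).
// `declared_count` stands in for a count attribute (0 = no count given).
// It is only a capacity hint: reserve() preallocates, and if the count is
// too small, push_back() falls back to geometric growth. A wrong count
// therefore costs extra reallocations, never an out-of-bounds write.
ReadResult read_values(std::istream& in, std::size_t declared_count = 0) {
    ReadResult result;
    if (declared_count > 0) {
        result.values.reserve(declared_count);
        // reserve(n) guarantees room for at least n elements.
        assert(result.values.capacity() >= declared_count);
    }
    double x;
    while (in >> x) {
        const std::size_t before = result.values.capacity();
        result.values.push_back(x);
        if (result.values.capacity() != before) {
            ++result.reallocations;  // buffer was (re)allocated to grow
        }
    }
    return result;
}
```

With an accurate count (for example `read_values(in, 10)` on ten values) no reallocation occurs after the initial reserve(); with no count, or a count that is too small, the vector falls back to its geometric growth and still reads every value. The exact number of reallocations in those cases depends on the standard library's growth policy, which the C++ standard does not fix.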