Re: [Psidev-ms-dev] mzML 0.99 remarks


Splendid, we appear to be reaching a conclusion. I tally:

- Brian votes to keep
- Angel votes to keep
- Marc votes to keep
- David votes to keep
- Eric votes to keep

- Matt is neutral
- ChrisA is neutral

- Mike does not want them

- everyone else abstains

The ayes have it. The schema stays as is with respect to count attributes.

Thank you!
Eric

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Brian Pratt
> Sent: Tuesday, October 09, 2007 12:34 PM
> To: 'Mass spectrometry standard development'
> Subject: Re: [Psidev-ms-dev] mzML 0.99 remarks
>
> As to performance implications of heap fragmentation, have a look at
> http://www.microquill.com/ - they sell a nice heap replacement library that
> can have an impressive impact on program performance without any code
> changes, just by managing the heap more intelligently (I've used it, it's
> for real).  But if you can't have a clever heap manager, then you have to
> be clever in how you manage the heap.
>
> >> I would do roughly what C++ std::vector's (or Python lists, etc.) do
>
> I expect you are referring to the way std::vector initially allocates room
> for, say, up to 10 items, then when that turns out to be not enough
> reallocates for 20, then 40, 80, 160, ..., 655360, 1310720, ... - but
> consider also std::vector's reserve() method, which is a great
> illustration of the usefulness of the count.  It allows you to declare the
> *expected* final size of the collection without demanding that it be the
> *actual* final size.  It preallocates enough memory to accommodate the
> addition of up to n elements to the vector before any reallocation takes
> place, and heap fragmentation is thus avoided along with a great many copy
> constructor executions (which probably engender even more heap
> fragmentation).  If an (n+1)th element is added, reallocation takes place
> and performance isn't what it could be, but the program still runs without
> error.  So it's a risk-free and very simple way to use the count info.
>
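The reserve() approach Brian describes can be sketched roughly as follows. This is an illustration under stated assumptions, not code from any mzML library: Spectrum, LoadResult, and loadWithHint are hypothetical names, the declared count is assumed to have already been parsed from the file's count attribute, and actualCount merely simulates how many elements the file really holds. The loop records how often the vector's capacity changes, which makes the effect of the hint visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one parsed spectrum.
struct Spectrum {
    std::size_t index;
};

struct LoadResult {
    std::vector<Spectrum> spectra;
    std::size_t reallocations;  // times capacity grew after the initial reserve()
};

// Treats declaredCount (assumed to come from a count attribute) purely as a
// capacity hint; actualCount simulates how many elements the file contains.
LoadResult loadWithHint(std::size_t declaredCount, std::size_t actualCount) {
    LoadResult result{{}, 0};
    // One up-front allocation sized to the *expected* number of elements.
    result.spectra.reserve(declaredCount);
    for (std::size_t i = 0; i < actualCount; ++i) {
        const std::size_t before = result.spectra.capacity();
        result.spectra.push_back(Spectrum{i});
        if (result.spectra.capacity() != before) {
            // The hint was too small: the vector reallocates but stays correct.
            ++result.reallocations;
        }
    }
    return result;
}
```

With an accurate hint (for example loadWithHint(1000, 1000)) the standard guarantees no reallocation, so reallocations stays at 0. With too small a hint the result is still complete, only with some extra growth, which is the risk-free behavior Brian points out.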
> If your collection class of choice doesn't have some means of exploiting a
> hint about the expected size of the collection, well, no harm done.
> Anyone who is not using robust collection classes, and is thus susceptible
> to running off the end of an array allocated based on the declared count,
> is working harder than they need to.
>
> But Angel is right, it's fun to trade tips and tricks but we should just
> vote... I vote keep 'em.
>
> - Brian
>
> -----Original Message-----
> From: psi...@li...
> [mailto:psi...@li...] On Behalf Of Mike Coleman
> Sent: Tuesday, October 09, 2007 12:07 PM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] mzML 0.99 remarks
>
> I knew I was going to regret that (over-)simplification.  Okay, so in
> reality I would never actually read the file twice--that's just easier
> to describe than something more realistic.  Just off the top of my
> head, I would do roughly what C++ std::vector's (or Python lists,
> etc.) do in terms of memory allocation.  This lets you read in a
> single pass, and uses memory in proportion to what is actually needed.
> (There are ways to deal with fragmentation as well, but that's *way*
> outside the bounds of what the mzML spec should care about.)
>
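For comparison, the single-pass growth strategy Mike mentions can be sketched with a hand-rolled buffer. This is an illustration only: DoublingBuffer is a hypothetical class, and the initial capacity of 10 and growth factor of 2 are assumptions borrowed from Brian's example (real std::vector implementations choose their own growth factors). The elementsCopied() counter shows the cost of not knowing the final size up front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical growable array: starts at capacity 10 and doubles when full,
// so data can be read in a single pass without knowing its final size.
class DoublingBuffer {
public:
    void push(double x) {
        if (size_ == capacity_) {
            grow();
        }
        data_[size_++] = x;
    }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    // Total element copies caused by reallocations so far.
    std::size_t elementsCopied() const { return copied_; }
    double operator[](std::size_t i) const { return data_[i]; }

private:
    void grow() {
        const std::size_t newCapacity = capacity_ == 0 ? 10 : capacity_ * 2;
        std::unique_ptr<double[]> fresh(new double[newCapacity]);
        // Move the existing contents into the larger block.
        std::copy(data_.get(), data_.get() + size_, fresh.get());
        copied_ += size_;
        data_ = std::move(fresh);
        capacity_ = newCapacity;
    }

    std::unique_ptr<double[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t copied_ = 0;
};
```

Pushing 1,000 values into this buffer ends with a capacity of 1,280 and 1,270 element copies from reallocation. Memory stays within a factor of two of what is needed, as Mike says, but the repeated copying is the overhead that reserve() with a trustworthy count would avoid.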
> Also worth noting, in my not-so-humble opinion: (a) for general
> computation, 32-bit hardware is dead, and (b) if you don't have enough
> RAM to comfortably hold single mzML files, you probably should just
> buy more.
>=20
> Mike
>
> On 10/9/07, Chris Allen <ch...@ma...> wrote:
> >
> > Mike Coleman wrote:
> > > I can see why having a 'count' might make it easier for novice
> > > programmers to *write* a processing program, but I cannot see why
> > > having a 'count' would make more than a negligible difference in
> > > performance, if even that.  As a worst case, one could read the mzML
> > > file into memory, scan it once to calculate the count, and then
> > > proceed as before.  The additional time required to do a sweep through
> > > RAM would be trivial.
> >
> > Isn't one of the features of mzML to store raw scan data?  If so, I
> > imagine it wouldn't be long before users were generating multi-GB files
> > (even possibly with just peak lists) that:
> >
> > (i) won't map into the 32-bit address space limits of the OS; or
> >
> > (ii) if you're using 64-bit or else mapping chunks, will hit I/O and
> > paging issues, as the file will have to be read twice (once for the
> > scan and again for the parser), unless of course you have a huge amount
> > of RAM.
> >
> > Not to mention that the source of the data might not support stream
> > positioning anyway (e.g. a compressed stream), or might simply have been
> > passed to your program/library as an open stream handle that you can't
> > reopen, so you only have one shot.
> >
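Chris's one-shot case can still be handled in a single pass, with the count used only as a hint. The sketch below reads a toy stream whose first token is a declared count followed by whitespace-separated values; that format and the name readOnePass are hypothetical stand-ins, not mzML or any real parser's API. std::istringstream plays the role of a source that cannot be repositioned, since the function never seeks or rewinds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads a toy stream (hypothetical format, not mzML): a declared count
// followed by whitespace-separated values. The stream is consumed exactly
// once, so this also works for pipes or decompressing streams.
std::vector<double> readOnePass(std::istream& in) {
    std::size_t declared = 0;
    in >> declared;  // on failure, declared is 0 and the loop below reads nothing
    std::vector<double> values;
    // Treat the count as a hint only, and cap it (arbitrary assumed limit)
    // so a corrupt header cannot trigger a huge up-front allocation.
    const std::size_t kMaxReserve = std::size_t{1} << 20;
    values.reserve(std::min(declared, kMaxReserve));
    double x = 0.0;
    while (in >> x) {
        values.push_back(x);  // still correct if the declared count is wrong
    }
    return values;
}
```

Because the loop stops at end of input rather than at the declared count, an inaccurate count only changes how much memory is reserved up front, never the values returned.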
> > Regards,
> > Chris
> >
> >
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems?  Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >> http://get.splunk.com/
> > _______________________________________________
> > Psidev-ms-dev mailing list
> > Psi...@li...
> > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
> >