From: Angel P. <an...@ma...> - 2007-10-09 19:00:58
Hi all,

I was never arguing against counts for the spectra, only *maybe* against annotations, and it seems that more people than not want them in, so I say keep 'em. In the interest of not diverting effort from more important issues, can we just take a vote and leave it at that?

My vote: keep counts.

-angel

On 10/9/07, Matthew Chambers <mat...@va...> wrote:
>
> We are a bit off topic, but this is interesting. :) To really assess the
> performance issues here you have to dig deeper than just heap
> fragmentation, though. Assuming a list to store the SpectrumHeaders and
> vectors to store m/z values and intensities, and without preallocation
> based on counts, then because of the tree-like nature of mzML you'd end
> up with a memory footprint like:
>
> Spectrum1Header Spectrum1Mz1...P Spectrum1Inten1...P Spectrum2Header
> Spectrum2Mz1...P Spectrum2Inten1...P ... SpectrumNHeader
> SpectrumNMz1...P SpectrumNInten1...P
>
> If you preallocated the SpectrumHeaders in the list based on the count
> attribute, you'd instead get a footprint like:
>
> Spectrum1Header Spectrum2Header ... SpectrumNHeader Spectrum1Mz1...P
> Spectrum1Inten1...P ... SpectrumNMz1...P SpectrumNInten1...P
>
> So you're going to have a tradeoff in fragmentation either way. The
> fragmentation in the first case would be worse for quick sequential
> access to each SpectrumHeader, but better for accessing the peaks of a
> particular spectrum. The fragmentation in the second case would be
> better for quick sequential access to each SpectrumHeader, but worse for
> accessing the peaks of a particular spectrum. Access to the peaks could
> be further improved by storing the Mz and Inten values together (i.e. in
> a struct { float mz, inten; } ). This is all incredibly superfluous,
> though, and I doubt this fragmentation has an appreciable performance
> impact on data with any kind of density to it.
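Matt's second layout, with the spectrum records allocated up front from a declared count and each spectrum's peaks stored as interleaved m/z and intensity pairs, might look roughly like this in C++. This is a minimal sketch under stated assumptions: it uses std::vector rather than a linked list (a std::list cannot be preallocated as one contiguous block), and Peak, Spectrum, makeSpectrumList and addSpectrum are hypothetical names, not part of any mzML reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Interleaved peak storage (hypothetical type): m/z and intensity for one
// peak sit side by side, so walking a spectrum's peaks touches a single
// contiguous array instead of two parallel ones.
struct Peak {
    float mz;
    float inten;
};

// Hypothetical per-spectrum record; real mzML spectra carry far more metadata.
struct Spectrum {
    std::string id;
    std::vector<Peak> peaks;
};

// When the file declares how many spectra it holds, reserve room for all
// Spectrum records in one allocation. With no count (declaredCount == 0),
// the vector simply grows and reallocates as spectra are appended.
std::vector<Spectrum> makeSpectrumList(std::size_t declaredCount) {
    std::vector<Spectrum> spectra;
    if (declaredCount > 0) {
        spectra.reserve(declaredCount);
    }
    return spectra;
}

// Append one spectrum and reserve its peak array. peakCount plays the same
// role for the peaks that declaredCount plays for the spectra.
// The returned reference is invalidated if a later append reallocates.
Spectrum& addSpectrum(std::vector<Spectrum>& spectra, const std::string& id,
                      std::size_t peakCount) {
    spectra.push_back(Spectrum{id, {}});
    spectra.back().peaks.reserve(peakCount);
    return spectra.back();
}
```

With the count honoured, the Spectrum records end up in one block, as in Matt's second footprint, while each spectrum's peaks remain a separate heap block. If the declared count is too small, the vector reallocates and any references from addSpectrum become dangling, so a reader should not hold them across appends.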
>
> So if you needed extremely responsive performance on very sparse spectra,
> you might think about this stuff, but most of us are far more limited by
> the sheer number of peaks. And if extreme responsiveness is your goal, no
> conceivable XML format is going to help you!
>
> -Matt
>
> Brian Pratt wrote:
> > Heap fragmentation has a performance cost that persists past the initial
> > allocation(s), since it affects further allocations as well. If it can be
> > avoided with a relatively simple mechanism like this, that's a good thing.
> >
> > I started coding in 1977, FWIW. Long enough to learn to prefer the simple
> > solution over the one that requires a gestalt...
> >
> > To be fair, having done this stuff for a long time isn't really a predictor
> > of me being any good at it, but I get by OK.
> >
> > - Brian
> >
> > -----Original Message-----
> > From: psi...@li...
> > [mailto:psi...@li...] On Behalf Of Mike Coleman
> > Sent: Tuesday, October 09, 2007 9:21 AM
> > To: Mass spectrometry standard development
> > Subject: Re: [Psidev-ms-dev] mzML 0.99 remarks
> >
> > I can see why having a 'count' might make it easier for novice
> > programmers to *write* a processing program, but I cannot see why
> > having a 'count' would make more than a negligible difference in
> > performance, if even that. As a worst case, one could read the mzML
> > file into memory, scan it once to calculate the count, and then
> > proceed as before. The additional time required to do a sweep through
> > RAM would be trivial.
> >
> > Mike
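Mike's worst case (read the file into memory, sweep once to count the spectra, then parse as before) could look roughly like this. A minimal sketch, assuming the whole document already sits in a std::string; countStartTags is a hypothetical helper, not part of any mzML library, and a plain substring scan like this will miscount tags that appear inside comments or CDATA sections.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count start tags named `name` in an in-memory XML buffer (hypothetical
// helper). A match must be followed by whitespace, '>' or '/', so that
// "<spectrumList" is not counted as "<spectrum". End tags ("</spectrum>")
// never match because they begin with "</". Comments and CDATA are not
// skipped.
std::size_t countStartTags(const std::string& xml, const std::string& name) {
    const std::string open = "<" + name;
    std::size_t count = 0;
    std::size_t pos = xml.find(open);
    while (pos != std::string::npos) {
        const std::size_t after = pos + open.size();
        if (after < xml.size()) {
            const char c = xml[after];
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
                c == '>' || c == '/') {
                ++count;
            }
        }
        pos = xml.find(open, after);
    }
    return count;
}
```

The sweep is a single linear pass over memory, which is Mike's point about its cost; the result could then be used to reserve storage before the real parse begins.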
--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160

P: 215-573-3736
F: 215-573-9004