From: Joshua T. <jt...@sy...> - 2007-06-19 19:42:14
Hi Mike,

Not to throw any coals on the fire, but just contributing my experience as a software developer here in the Aebersold Lab (ISB):

All of our mzXML processing software (the TPP, etc.) relies on pre-computed indexes. Yes, it's a matter of trust, but we find it much more efficient to calculate this index once, at the time the mzXML file is created. As Eric has mentioned, in practice we don't find many (any?) errors with this, and the mzXML files actually store a checksum of themselves up to the index, which can be used to give some assurance that the index data is appropriate (I'm sure XML purists are groaning, but it works.)

You may be coming from a much more stringent background, such as trying to provide regulatory compliance. As far as I can tell from today's discussions, the index will be optional, and your more stringent programs will be free to generate index-less files, ignore previously generated indexes, rewrite them to a new file, or recalculate them yourself in your own programs.

I still don't think text files are the best way to store large binary arrays (versus, for example, the netCDF format), but we've found XML with indexes to be a reasonable and useful compromise for keeping all the data in one human-readable file.

Hope this helps,

Josh

Mike Coleman wrote:
> On 6/19/07, Eric Deutsch <ede...@sy...> wrote:
>> - While index/data mismatch is a potential source of problem, it has
>> been our experience that problems are rare and the benefits huge.
>
> Just to be clear, I'm not arguing against indexing in general (which
> would be silly), but rather just questioning whether it makes sense to
> include indices in (or alongside) mzML files.
>
> From a programming perspective, this seems like an implementation
> detail. One can imagine that many consumers of these files either
> have no use for an index or else are easily capable of simply
> generating an index of their own.
> Furthermore, applications will often have more information about the
> specific sort of index that would be best.
>
> As you note, if an index is included in the mzML file it can be
> checked for sanity. And, in fact, proper engineering requires this.
> If a program generates the index itself, it can afford to be somewhat
> trusting, but if the index is generated elsewhere, it really needs to
> be quite paranoid, which requires extra code.
>
> So the worry would be that this feature, which is intended to make
> life simpler, might end up actually making things more difficult for
> implementers (both producers and consumers) and bloat the mzML files
> to boot.
>
> Mike
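
To make the trade-off in this thread concrete, here is a minimal Python sketch of a pre-computed index guarded by a checksum of everything that precedes it. The trailer layout, the `<scan num="...">` pattern, and the function names (`build_index`, `write_with_index`, `read_index`) are assumptions made up for this illustration; this is not the actual mzXML or mzML index format, nor code from the TPP.

```python
import hashlib
import re

# Hypothetical scan-tag pattern; real files may order attributes differently.
SCAN_RE = re.compile(rb'<scan\s+num="(\d+)"')


def build_index(body: bytes) -> dict:
    """Map each scan number to the byte offset of its <scan> tag."""
    return {int(m.group(1)): m.start() for m in SCAN_RE.finditer(body)}


def write_with_index(body: bytes) -> bytes:
    """Append a toy index plus a SHA-1 of everything before the index."""
    index = build_index(body)
    digest = hashlib.sha1(body).hexdigest()
    lines = [f"{num} {offset}" for num, offset in sorted(index.items())]
    trailer = "<index>\n" + "\n".join(lines) + f"\n</index>\n<sha1>{digest}</sha1>\n"
    return body + trailer.encode("ascii")


def read_index(data: bytes, verify: bool = True) -> dict:
    """Return the stored index; with verify=True, check it before trusting it."""
    start = data.rindex(b"<index>\n")
    body = data[:start]
    trailer = data[start:].decode("ascii")
    stored = re.search(r"<sha1>([0-9a-f]{40})</sha1>", trailer)
    entries = re.search(r"<index>\n(.*?)\n</index>", trailer, re.S)
    if stored is None or entries is None:
        raise ValueError("malformed trailer")

    index = {}
    for line in entries.group(1).splitlines():
        num, offset = line.split()
        index[int(num)] = int(offset)

    if verify:
        # "Paranoid" path: the checksum shows the data has not changed since
        # the index was written, and each offset must land on the right tag.
        if hashlib.sha1(body).hexdigest() != stored.group(1):
            raise ValueError("checksum mismatch: data changed after indexing")
        for num, offset in index.items():
            m = SCAN_RE.match(body, offset)
            if m is None or int(m.group(1)) != num:
                raise ValueError(f"index entry for scan {num} is stale")
    return index
```

A trusting consumer would call `read_index(data, verify=False)` and seek straight to the stored offsets. A stricter one keeps the default and pays for a full pass over the data, or ignores the stored index entirely and calls `build_index` itself, which are the options both sides of the thread describe.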