From: Mike C. <tu...@gm...> - 2007-06-19 15:30:17
On 6/19/07, Matthew Chambers <mat...@va...> wrote:
> On a related note, is there any guarantee in mzML (or mzData
> for that matter) that the spectrum IDs or scan numbers are given in
> ascending order?

This is a good question. I haven't read the spec closely, but if the answer isn't in there, it ought to be. Along those lines, are IDs and scan numbers even guaranteed to be unique within a file? (I hope the answer will be "yes".)

> But one thing I've missed a lot in mzData (even though I think it's a
> better format because of the flat spectra list) is an index to quickly
> access a given scan number.

I'm torn on this myself. On the one hand, adding *any* redundant information seems to go against the basic idea of just representing the experimental data. On the other hand, it *would* make some operations more convenient. Random access reads become easier, altering the file becomes harder, and something like XSLT transformations probably become impossible (I'm not an XSLT fan anyway).

One point to consider: do we think that all of the various producers (and transformers) of these files will be capable of producing correct (bug-free) indices? If they're not *always* correct, or if you have to validate the file before you trust it, you're basically having to recreate the index anyway. If that's so, maybe it should just be left out of the mzML file altogether.

It looks like indices are currently stored in a separate, optional file. This seems like a good compromise.

It's worth noting that these arguments also apply to the other redundant information in the file (counts and checksums, for example). I wouldn't mind seeing those also moved to a separate file. If they're left in, maybe something should be said about what's supposed to happen when the redundant information is inconsistent.

Mike
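
To make the validation point concrete, here is a rough Python sketch of what checking a stored index against the data amounts to. It is only an illustration under stated assumptions: the `<spectrum id="...">` element and attribute names are guesses and not taken from the mzML spec, and `build_index` and `stale_entries` are hypothetical helper names. A regex scan stands in for a real XML parser. Note that `stale_entries` has to call `build_index` to do its job, which is the argument above: validating an index means recreating it.

```python
import re

# Assumed element and attribute names; the real mzML schema may differ.
SPECTRUM_TAG = re.compile(rb'<spectrum\b[^>]*?\bid="([^"]*)"')


def build_index(data: bytes) -> dict:
    """Map each spectrum id to the byte offset of its opening tag.

    Raises ValueError if an id occurs more than once, since an index
    keyed by id only makes sense when ids are unique within the file.
    """
    index = {}
    for match in SPECTRUM_TAG.finditer(data):
        spectrum_id = match.group(1).decode("utf-8")
        if spectrum_id in index:
            raise ValueError(f"duplicate spectrum id: {spectrum_id}")
        index[spectrum_id] = match.start()
    return index


def stale_entries(stored: dict, data: bytes) -> list:
    """Return the ids whose stored offset disagrees with a fresh scan."""
    fresh = build_index(data)  # validating means rebuilding the index
    ids = set(stored) | set(fresh)
    return sorted(i for i in ids if stored.get(i) != fresh.get(i))


# Tiny made-up document, used only to exercise the helpers.
sample = (
    b'<spectrumList>'
    b'<spectrum id="1">a</spectrum>'
    b'<spectrum id="2">b</spectrum>'
    b'</spectrumList>'
)
index = build_index(sample)
```

A reader that trusts a stored index without calling something like `stale_entries` saves the full scan, but it then needs a defined behaviour for the case where the index and the data disagree.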