From: Wilfred H T. <Ta...@ap...> - 2008-08-27 05:58:31
* When m/z vs. intensity data is written out in profile mode, it is pretty common for a LARGE majority of the intensities to be zero. Given the preponderance of zero intensities, a space-efficient way to write the data out would be to specify a point spacing in the m/z dimension and then write out a (m/z, intensity) pair only if the intensity is non-zero. (Call this method 1.) The alternative, less space-efficient way would be to write out all of the (m/z, intensity) data pairs even though most of them have zero intensities and hence are not all that interesting. (Call this method 2.) For method 1 to work well, there must be a way to specify an m/z point spacing. Is there a way to do this currently? Furthermore, the program reading in the mzML must understand that the m/z point spacing implicitly requires reconstruction of all the zero-intensity data pairs; otherwise, for example, a mass spectrum plot would look funny. A further complication for method 1 is that the m/z point spacing may not necessarily be a constant. For example, for the AB/Sciex QSTAR instrument, the m/z spacing is proportional to the square root of m/z, which is a natural consequence of this being a TOF instrument.

* The <scanWindow> element should accept <referenceableParamGroupRef> as a subelement. Banning <userParam> as a subelement should not lead to also banning <referenceableParamGroupRef> as a subelement.

* The validator expects elements to appear in a certain order. This is due to the usage of xs:sequence in the XSD file. All deviations from the specified order are marked as errors, and I don't think that this is really the desired behavior. There's nothing intrinsic to XML that makes restricting order desirable, and in most cases for mzML, there is absolutely nothing to be gained by restricting order.

* The validator doesn't appear to recognize <userParam> at all, i.e., any time <userParam> is put into the mzML, the validator gives an error. This may possibly be related to the previous point, but I tried putting <userParam> in all possible locations, and nothing seemed to work.

* For the <sourceFile> element, the cvParam mapping rule "MUST supply a *child* term of MS:1000561 (data file checksum type) one or more times" should be deleted. The checksum of the SOURCE data file seems to be completely irrelevant.

* There is a mistake somewhere in the rules regarding the specification of the mass analyzer. There are numerous instrument types that have multiple mass analyzers, but the validator rejects any instrument that contains more than one mass analyzer. Currently, only one <analyzer> subelement is allowed under <componentList>, and the <analyzer> element is only allowed to have one child mass analyzer type CV term.

* There are serious problems with the CV terms under scan-->scanning method and spectrum-->spectrum type. There is partial duplication of analogous terms (example of analogous terms: "SIM spectrum" <===> "selected ion monitoring") between the two categories. While it's not clear that the duplication of analogous terms is desirable, as pointed out in a previous email thread, in any case, there should be either no duplication or full duplication; partial duplication is obviously flat-out wrong. Is it possible to devise a plan to resolve this? I don't think it will take too much time and effort to work things out, but the current state of these CV terms is unworkable.

* Somewhat related to the previous point, I suggest that the CV terms "full scan" and "zoom scan" under scan-->scanning method be renamed. The reason for doing a zoom scan is to scan more slowly over a smaller, zoomed m/z range (without sacrificing time) in order to obtain a spectrum with improved resolution. A name more descriptive of the purpose would be "improved resolution" or "enhanced resolution". "Full scan" does not convey all that much information, so it could actually be removed.

Thanks,
Wilfred
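
For illustration, method 1 from the first point above could look roughly like the Python sketch below. It is only a sketch under stated assumptions: the grid is taken to be uniform in sqrt(m/z), as one would expect for a TOF instrument sampled at a fixed time interval, so the m/z spacing grows approximately in proportion to sqrt(m/z). The function names and the three grid parameters (mz_start, step, n_points) are hypothetical and are not part of mzML or of any existing library.

```python
import math


def sqrt_grid(mz_start, step, n_points):
    """Hypothetical grid spec: points are uniform in sqrt(m/z),
    i.e. m_i = (sqrt(mz_start) + i * step) ** 2."""
    s0 = math.sqrt(mz_start)
    return [(s0 + i * step) ** 2 for i in range(n_points)]


def encode_nonzero(mz, intensity):
    """Writer side of method 1: keep only pairs with non-zero intensity."""
    return [(m, y) for m, y in zip(mz, intensity) if y != 0]


def decode_nonzero(pairs, mz_start, step, n_points):
    """Reader side of method 1: rebuild the full grid from the spacing
    spec and fill every point that was not written out with zero."""
    s0 = math.sqrt(mz_start)
    intensity = [0.0] * n_points
    for m, y in pairs:
        # Map the stored m/z back to its nearest grid index.
        i = round((math.sqrt(m) - s0) / step)
        intensity[i] = y
    return sqrt_grid(mz_start, step, n_points), intensity


# Usage: a 10-point profile spectrum in which most intensities are zero.
mz = sqrt_grid(100.0, 0.001, 10)
intensity = [0, 0, 5, 12, 3, 0, 0, 0, 7, 0]
pairs = encode_nonzero(mz, intensity)
mz_full, intensity_full = decode_nonzero(pairs, 100.0, 0.001, 10)
```

The point of the sketch is that the reader needs the grid specification (here mz_start, step, n_points, plus the knowledge that the grid is uniform in sqrt(m/z)) in order to restore the zero-intensity points; without it, a plot drawn from the non-zero pairs alone would connect peaks that are really separated by baseline.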