From: Wilfred H T. <Ta...@ap...> - 2008-05-25 02:03:17
I recently looked over the mzML format draft and have a few comments.

Regarding the schema (http://www.sbeams.org/tmp/mzML0.99.12.html):

* For the "cvParam Mapping Rules," I suggest that the number of "musts" be decreased significantly (or changed to "mays"). For example, for the element <sourceFile>, there's a rule saying: "MUST supply a *child* term of MS:1000561 (data file checksum type) one or more times." This sort of information does not seem to be critical for downstream processing of mzML files and thus doesn't deserve "must" status. There are quite a few similar examples.

* For the elements <fileDescription>, <sourceFileList>, <sourceFile>, etc., would it make sense to change the names to be more general (such as dataSource) to reflect the fact that not all instrument data is stored in files? For example, among the Applied Biosystems|MDS Sciex instruments, some (such as the QSTAR or QTRAP systems) store data in .wiff files, while others (such as the 4700 and 4800 systems) store data in an Oracle database.

Regarding the controlled vocabulary (http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo):

* Under "spectrum"-->"spectrum representation", the only two choices are "centroid mass spectrum" and "profile mass spectrum". That doesn't adequately capture the full range of spectrum representations. For example, in addition to centroiding, the data could be de-isotoped, smoothed, converted to +1 charge, etc. Also, having just the two choices of centroid vs. profile is inconsistent with the software processing options listed under "data transformation"-->"data processing action", where a much wider range of options is listed ("baseline reduction", "charge deconvolution", "deisotoping", etc.). It seems desirable to expand the choices under "spectrum"-->"spectrum representation" to at least be consistent. Alternatively, maybe a slightly different categorization might make sense - something like raw data vs. processed data (full data vs. reduced data).

* Under "spectrum"-->"spectrum type", some triple quad-type scans are missing: precursor ion (scan Q1, fixed Q3) and neutral loss (scan Q1 and Q3 together with a constant difference between Q1 and Q3).

Please accept my apologies in advance if any of these topics have already been discussed or resolved, as I am new to this discussion.

Thanks,
Wilfred