Thread: [Psidev-ms-dev] mzML 0.99.0 comments

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hi,

My comments on mzML0.99.0 after reading (most of) the posts on the 
mailing list and trying to convert a peak list into the format are as 
follows:

The standard is composed of a schema with little control and a lot of 
cvParams that are controlled by a separate file. Updates to the CV does 
not require schema updates, and the CV rules file should also be stable. 
For the validation of files it would, as pointed out by several people, 
be straightforward to automate generate an XSD which reflect the current 
CV. Otherwise the semantic Java validator also does the job (and also 
have other benefits when it comes to large files). For us it doesn't 
matter which method is used, but the real issue is how to handle 
versions of the CV. As long as nothing is deleted from the CV everything 
should be fine from an implementation point of view though.

A major problem would be if something is added to the CV which breaks 
current parsers. A new compression type could be added to the CV without 
notice, and if someone is using that compression type they're producing 
standard compliant files, but parsers that are supposed to be standard 
compliant would not be able to parse the file correctly. So, there are a 
few places where I think the allowed values should be set under enum 
constraints in the main standard schema, so that a new schema version is 
enforced if these fields are changed. I have the feeling that CV version 
will not be as controlled as the schema version. Fields that I propose 
should be enums are (this is maybe one step back again...):

In binaryDataArray:

compressionType (no compression/zlib compression)
valueType (32-bit float, 64-bit float, 16-bit integer, 32-bit integer or 
64-bit integer)

In spectrum:

spectrumType (centroid, profile).

these parameters could be attributes or cvParams (but under schema 
control) if CV accession numbers are important.

Other comments:

There is also an acquisitionList spectrumType attribute which probably 
could be removed since we have spectrumDescription - 
spectrumRepresentation (spectrumType). Only use would be if the 
acquisitions were in profile mode but the peak picking algorithm that 
worked on the spectra turned them into a centroid peak list and one 
would like to specify this (?).

If the spectrum is a combination of multiple scans (as specified using 
acquistionList) one would normally not use the 'scan' element. The 
question is then how to give the retention time? We did not succeed in 
doing this in a valid way, see 
http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/FF_070504_MSMS_5B.mzML 

for a simple (but invalid way of doing it). More correct would be to put 
the cvParam under the acquisition with the retention time, but this is 
not allowed either.

Why not allow softwareParam to be userParam or cvParam or must all 
software that work on mzML be in the CV?

How about having precursor m/z, intensity and charge state as 
non-required attributes to ionSelection? These fields are really used in 
every file.

Final comment is though that all these things are really minor, and that 
getting the standard released is what matters!

Regards

Fredrik

Thread: [Psidev-ms-dev] mzML 0.99.0 comments

psidev-ms-dev