From: Matthew C. <mat...@va...> - 2007-10-04 17:05:45
I'll comment here on the mzML schema and on the validation of mzML instances. I do not see why a proper XML schema with semantic significance could not be generated for mzML. XML Schema can place robust restrictions on both elements and attributes, and such a schema could be generated automatically from the CV itself (combined with a skeleton model of mzML). Some people complain that mzML is not true XML. That's rather misleading. Others say it needs a special "semantic" validator with its own mapping file. I say that is duplicative and even overkill. Existing schema technology can handle the format specified here, but I grant that the schema WILL have to be very complicated (there won't be a single cvParam type or ParamGroupType; each part of the schema will have its own cvParam elements with semantically relevant restrictions on the accession numbers) and almost certainly should be machine-generated. I see nothing wrong with a complicated schema, though, because the variety of data we intend to represent is also very complicated! I don't know whether existing automatic code generators work with very complicated schemas, but automatic XML validators definitely should. So when the semantic relationships can be encapsulated in an automatically generated XML schema, the need for a separate "semantic" validator is unclear to me.
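To make the target concrete, here is a hypothetical mzML instance fragment of the kind a per-context schema would have to accept: a contact element whose cvParam children differ only in their accession, name and value attributes. The accession numbers and term names are the ones used in the contact example that follows; the values are invented placeholders, and the surrounding mzML structure is omitted.

```xml
<!-- Hypothetical fragment: the value attributes are placeholders, not real data. -->
<contact>
  <cvParam cvLabel="MS" accession="MS:1000586" name="contact name" value="Jane Doe"/>
  <cvParam cvLabel="MS" accession="MS:1000587" name="contact address" value="123 Example Street"/>
  <cvParam cvLabel="MS" accession="MS:1000588" name="contact URL" value="http://www.example.org/"/>
  <cvParam cvLabel="MS" accession="MS:1000589" name="contact email" value="jane.doe@example.org"/>
</contact>
```

A single generic cvParam type can only check that these attributes are strings. The point of the proposal is that a generated schema would also check that each accession is allowed inside contact and that, for example, the contact URL value is a valid xs:anyURI.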
For example, the <contact> element could be defined semantically in XML Schema like this:

    <xs:complexType name="ContactParamGroupType">
      <xs:sequence>
        <xs:element name="paramGroupRef" type="dx:ContactParamGroupRefType" minOccurs="0" maxOccurs="unbounded"/>
        <!-- contact name -->
        <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
          <xs:complexType>
            <xs:attribute name="cvLabel">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="accession">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS:1000586"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="name">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="contact name"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="value" type="xs:string"/>
          </xs:complexType>
        </xs:element>
        <!-- contact address -->
        <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
          <xs:complexType>
            <xs:attribute name="cvLabel">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="accession">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS:1000587"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="name">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="contact address"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="value" type="xs:string"/>
          </xs:complexType>
        </xs:element>
        <!-- contact URL -->
        <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
          <xs:complexType>
            <xs:attribute name="cvLabel">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="accession">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS:1000588"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="name">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="contact URL"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="value" type="xs:anyURI"/>
          </xs:complexType>
        </xs:element>
        <!-- contact email -->
        <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
          <xs:complexType>
            <xs:attribute name="cvLabel">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="accession">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="MS:1000589"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="name">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="contact email"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="value" type="dx:email"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="userParam" type="dx:UserParamType" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

    <!-- local declaration inside the parent element's content model -->
    <xs:element name="contact" type="dx:ContactParamGroupType" minOccurs="0" maxOccurs="unbounded"/>

As I said, this needs to be machine-generated, but it would produce an XML schema that removes the need for any separate semantic mapping, and for any new tool to perform validation with that mapping.

Now that I think about it again, this kind of frequently updated schema would violate the stability requirement in the specification: "It was hoped that the actual xsd schema could remain stable for many years while the accompanying controlled vocabulary could be frequently updated to support new technologies, instruments, and methods of acquiring data." But what is the difference between a frequently updated mapping file that is REQUIRED for semantic validation, and a frequently updated primary schema that is REQUIRED for semantic validation?

-Matt

Lennart Martens wrote:
> That mapping file is effectively in use by our mzML semantic validator,
> for exactly the reasons you outlined above!
> So yes - this has been made available in the larger mzML kit and has
> also been implemented online (your above example indeed does not validate).
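A caveat for anyone trying Matt's contact fragment with an XML Schema 1.0 processor: declaring several sibling cvParam elements with different anonymous types in one sequence breaks the "Element Declarations Consistent" constraint, and four optional same-named elements in a row also make the content model ambiguous under the Unique Particle Attribution rule. The following is a minimal sketch of one way around this. It assumes an XML Schema 1.1 processor (XSD 1.1 was still a draft in 2007 and became a W3C Recommendation in 2012) and uses conditional type assignment (xs:alternative) to choose the type of a single cvParam declaration from its accession attribute, plus an xs:assert to keep each term to one occurrence. All type names and the email pattern are hypothetical, not part of any mzML release, and the paramGroupRef and userParam children are left out.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical generic cvParam. value is xs:anySimpleType so that the
       alternatives below can restrict it to xs:anyURI, which is not
       derived from xs:string. -->
  <xs:complexType name="CvParamType">
    <xs:attribute name="cvLabel" type="xs:string" use="required"/>
    <xs:attribute name="accession" type="xs:string" use="required"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="value" type="xs:anySimpleType"/>
  </xs:complexType>

  <!-- Hypothetical stand-in for the undefined dx:email type. -->
  <xs:simpleType name="EmailType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^@\s]+@[^@\s]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions of CvParamType that only tighten the value attribute;
       the other attribute uses are inherited from the base type. -->
  <xs:complexType name="ContactUrlParamType">
    <xs:complexContent>
      <xs:restriction base="CvParamType">
        <xs:attribute name="value" type="xs:anyURI"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="ContactEmailParamType">
    <xs:complexContent>
      <xs:restriction base="CvParamType">
        <xs:attribute name="value" type="EmailType"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="ContactType">
    <xs:sequence>
      <!-- One declaration, so no EDC or UPA conflict; the governing type
           is picked per instance from the accession attribute. -->
      <xs:element name="cvParam" type="CvParamType" minOccurs="0" maxOccurs="unbounded">
        <xs:alternative test="@accession = 'MS:1000588'" type="ContactUrlParamType"/>
        <xs:alternative test="@accession = 'MS:1000589'" type="ContactEmailParamType"/>
        <!-- Reject any accession other than contact name/address here;
             those two fall through to CvParamType. -->
        <xs:alternative test="not(@accession = 'MS:1000586' or @accession = 'MS:1000587')"
                        type="xs:error"/>
      </xs:element>
    </xs:sequence>
    <!-- Each accession may appear at most once within a contact. -->
    <xs:assert test="every $a in cvParam/@accession satisfies count(cvParam[@accession = $a]) eq 1"/>
  </xs:complexType>

  <xs:element name="contact" type="ContactType"/>
</xs:schema>
```

Alternatives are tried in document order and the first whose test is true wins, which is why the xs:error catch-all comes last. A generator along the lines Matt describes could emit one alternative per allowed term instead of one element declaration per term.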