From: Matthew C. <mat...@va...> - 2009-07-20 16:39:42
|
Hi all,

As we discussed at last week's call, there's some debate about the value of having a separate version attribute. There seem to be two sides to the debate:

1) Asserts that the attribute is not redundant with the schemaLocation because the schemaLocation is "optional" and the version attribute can be "required." Thus, this side proposes to keep the attribute and enforce its value based on the current schema version. Parsers are expected to determine the version from this attribute. This side asserts that the schema can be determined by parsing the version attribute.

2) Asserts that the attribute is redundant with the schemaLocation because we version the schema in the xsd's filename. Although it's true that the schemaLocation is "optional", without a schema to validate against, there is by definition no concept of "optional" and "required" in XML. Without a schema or DTD, the only concept that can be validated is well-formedness. Thus, this side proposes to deprecate the attribute and make its value officially meaningless, possibly removing the attribute in a future schema revision. Parsers are expected to determine the version by parsing it from the schemaLocation; note that simply parsing a schemaLocation string does not require downloading the schema. This side of the debate asserts that the schemaLocation attribute must be present so that the schema can be known without making assumptions. The only weakness to this approach that I can think of is that someone could produce mzML documents with different schemaLocations like:
schemaLocation="http://psi.hupo.org/ms/mzml http://mysite.org/xsds/mzML/1.1.0/mzML.xsd"

We can prevent that in the documentation by mandating that the schemaLocation take the "standard" format:
schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd"

So in my analysis it comes down to two choices: either we require the version attribute to take a certain form in the standard specification document, or we require the schemaLocation attribute to take a certain form in the standard specification document. In either case, it doesn't make sense to put this requirement into the schema itself.

-Matt |
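Option 2 relies on reading the version out of the schemaLocation string without fetching the schema. A minimal Python sketch of that idea follows. The function name and the regular expression are illustrative assumptions; only the namespace URI and the "mzML1.1.0.xsd" filename form come from the message above.

```python
import re

# Assumed filename convention from the message: ".../mzML<major>.<minor>.<revision>.xsd"
_XSD_VERSION = re.compile(r"mzML(\d+)\.(\d+)\.(\d+)\.xsd$")

MZML_NAMESPACE = "http://psi.hupo.org/ms/mzml"


def version_from_schema_location(schema_location):
    """Return (major, minor, revision) parsed from a schemaLocation value, or None.

    No network access happens here: the value is treated purely as a string.
    """
    parts = schema_location.split()
    # schemaLocation is a whitespace-separated list of (namespace URI, schema URL) pairs.
    for namespace, url in zip(parts[0::2], parts[1::2]):
        if namespace == MZML_NAMESPACE:
            match = _XSD_VERSION.search(url)
            if match:
                return tuple(int(group) for group in match.groups())
    return None
```

With the "standard" format this yields a version tuple; with the non-standard `http://mysite.org/xsds/mzML/1.1.0/mzML.xsd` form it returns None, which is the weakness described above.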
From: Marc S. <st...@in...> - 2009-07-21 04:25:52
|
Hi all,

I vote for keeping the schema attribute and forcing its value with a regular expression.

My reasons for that choice are:
- a file is valid and usable without a specified schema location. If the version attribute is given, the parser can assess if it can read the file and can even validate the file against a cached schema.
- even if a schema is given in approach 2, it might refer to a local schema without the version string in the name.
=> version 1 is the only one that ensures that the version information is available.

-Marc

> 1) Asserts that the attribute is not redundant with the schemaLocation > because the schemaLocation is "optional" and the version attribute can > be "required." Thus, this side proposes to keep the attribute and > enforce its value based on the current schema version. Parsers are > expected to determine the version from this attribute. This side asserts > that the schema can be determined by parsing the version attribute. > > 2) Asserts that the attribute is redundant with the schemaLocation > because we version the schema in the xsd's filename. Although it's true > that the schemaLocation is "optional", without a schema to validate > against, there is by definition no concept of "optional" and "required" > in XML. Without a schema or DTD, the only concept that can be validated > is well-formedness. Thus, this side proposes to deprecate the attribute > and make its value officially meaningless, possibly removing the > attribute in a future schema revision. Parsers are expected to determine > the version by parsing it from the schemaLocation; note that simply > parsing a schemaLocation string does not require downloading the schema. > This side of the debate asserts that the schemaLocation attribute must > be present so that the schema can be known without making assumptions. 
> The only weakness to this approach that I can think of is that someone > could produce mzML documents with different schemaLocations like: > schemaLocation="http://psi.hupo.org/ms/mzml > http://mysite.org/xsds/mzML/1.1.0/mzML.xsd". > > We can prevent that in the documentation by mandating that the > schemaLocation take the "standard" format: > schemaLocation="http://psi.hupo.org/ms/mzml > http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd" > > So in my analysis it comes down to two choices: either we require the > version attribute to take a certain form in the standard specification > document, or we require the schemaLocation attribute to take a certain > form in the standard specification document. In either case, it doesn't > make sense to put this requirement into the schema itself. > > -Matt > |
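Marc's proposal (constrain the version attribute with a regular expression, then pick a cached schema from it) might look roughly like the Python sketch below. The pattern, the cache table, and the file paths are hypothetical; the thread does not specify them. In the schema itself the same constraint would presumably be expressed as a pattern restriction on the attribute's type.

```python
import re

# Assumed form of the version attribute: "<major>.<minor>.<revision>".
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)")

# Hypothetical local cache of schemas shipped with the application,
# keyed by (major, minor) because those are the compatibility-breaking parts.
CACHED_SCHEMAS = {
    (1, 0): "schemas/mzML1.0.0.xsd",
    (1, 1): "schemas/mzML1.1.0.xsd",
}


def cached_schema_for(version_attribute):
    """Return the path of a cached schema for this version, or None if unsupported.

    Raises ValueError when the attribute does not have the expected form.
    """
    match = VERSION_PATTERN.fullmatch(version_attribute)
    if match is None:
        raise ValueError(f"malformed version attribute: {version_attribute!r}")
    major, minor, _revision = (int(group) for group in match.groups())
    return CACHED_SCHEMAS.get((major, minor))
```

A reader built this way can decide whether it supports a file before validating it, which is the benefit Marc describes; it still has to trust that the attribute is present and honest.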
From: Steffen N. <sne...@ip...> - 2009-07-21 05:48:03
|
On Tue, 2009-07-21 at 06:25 +0200, Marc Sturm wrote:
> I vote for keeping the schema attribute and forcing its value with a
> regular expression

I fully agree with Marc (and others) here.

Yours,
Steffen |
From: Matt C. <mat...@va...> - 2009-07-21 13:54:46
|
I'm voting for #2.

Marc, the W3C definition of "valid XML" does not apply to XML files without a specified schema or DTD. And since we don't have a DTD, an mzML document cannot be "valid" without a schema specified. Unless you're talking about validity according to the mzML specification, which is what we're voting on, because right now the specification document doesn't mandate a schema or the version attribute.

Consider the following case:
<mzML schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.0.0.xsd" version="1.1.0" >

According to the W3C, the correct action is to read/validate the file according to the mzML1.0.0 schema no matter what any other attribute says. That is fundamental to XSD semantics.

What is the correct behavior in the following case where the version takes an unconventional form?
<mzML version="v1.1r0" >

Without assuming a schema (which defeats the proposed benefit of having a version attribute), the only way we can try to avoid this in the mzML standard is to say the version attribute must take a certain form in the specification document. And if we must say that, why don't we just say that the schemaLocation XSD filename must always take the form "mzML1.1.0.xsd"? That way people can cache it locally but users can still parse the version easily. And we can get rid of this redundant attribute which unnecessarily deviates from XML/XSD conventions.

-Matt

Marc Sturm wrote: > Hi all, > > I vote for keeping the schema attribute and forcing its value with a > regular expression > > My reasons for that choice are: > - a file is valid and usable without a specified schema location. If > the version attribute is given, the parser can assess if it can read > the file and can even validate the file against a cached schema. > - even if a schema is given in approach 2, it might reference to a > local schema without the version string in the name. 
> => version 1 is the only one that ensures that the version information > is available. > > -Marc > > >> 1) Asserts that the attribute is not redundant with the schemaLocation >> because the schemaLocation is "optional" and the version attribute can >> be "required." Thus, this side proposes to keep the attribute and >> enforce its value based on the current schema version. Parsers are >> expected to determine the version from this attribute. This side asserts >> that the schema can be determined by parsing the version attribute. >> >> 2) Asserts that the attribute is redundant with the schemaLocation >> because we version the schema in the xsd's filename. Although it's true >> that the schemaLocation is "optional", without a schema to validate >> against, there is by definition no concept of "optional" and "required" >> in XML. Without a schema or DTD, the only concept that can be validated >> is well-formedness. Thus, this side proposes to deprecate the attribute >> and make its value officially meaningless, possibly removing the >> attribute in a future schema revision. Parsers are expected to determine >> the version by parsing it from the schemaLocation; note that simply >> parsing a schemaLocation string does not require downloading the schema. >> This side of the debate asserts that the schemaLocation attribute must >> be present so that the schema can be known without making assumptions. >> The only weakness to this approach that I can think of is that someone >> could produce mzML documents with different schemaLocations like: >> schemaLocation="http://psi.hupo.org/ms/mzml >> http://mysite.org/xsds/mzML/1.1.0/mzML.xsd". 
>> >> We can prevent that in the documentation by mandating that the >> schemaLocation take the "standard" format: >> schemaLocation="http://psi.hupo.org/ms/mzml >> http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd" >> >> So in my analysis it comes down to two choices: either we require the >> version attribute to take a certain form in the standard specification >> document, or we require the schemaLocation attribute to take a certain >> form in the standard specification document. In either case, it doesn't >> make sense to put this requirement into the schema itself. >> >> -Matt >> > |
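The two cases Matt raises (a version attribute that disagrees with the schema, and a version attribute in an unconventional form with no schema) can be sketched as follows. This is only an illustration of the "schemaLocation wins" position, assuming the root attributes are already available as a plain dict keyed by the unprefixed names used in the examples above; the function name and warning texts are made up.

```python
import re

# Assumed XSD filename convention: "mzML<major>.<minor>.<revision>.xsd".
_XSD_NAME = re.compile(r"mzML(\d+\.\d+\.\d+)\.xsd$")


def resolve_version(attributes):
    """Return (version, warnings), taking the version from schemaLocation only.

    The separate version attribute is never trusted; it is only compared
    against the schema-derived version so a mismatch can be reported.
    """
    warnings = []
    tokens = attributes.get("schemaLocation", "").split()
    # The schema URL is the last token of the (namespace, URL) pair.
    match = _XSD_NAME.search(tokens[-1]) if tokens else None
    schema_version = match.group(1) if match else None
    declared = attributes.get("version")
    if schema_version is None:
        warnings.append("no version recoverable from schemaLocation")
    elif declared is not None and declared != schema_version:
        warnings.append(
            f"version attribute {declared!r} disagrees with schema "
            f"{schema_version!r}; using the schema"
        )
    return schema_version, warnings
```

For the first example the result is version "1.0.0" plus a mismatch warning; for the second there is nothing to fall back on, so the version is None.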
From: Chris A. <ch...@ma...> - 2009-07-21 14:24:45
|
Matthew Chambers wrote:
> As we discussed at least week's call, there's some debate about the
> value of having a separate version attribute. There seem to be two sides
> to the debate:

Just out of interest: What's the intention with backwards compatibility policy here? If there are breaking changes to the schema that render previous instance documents no longer valid, anyone using a validating parser (eg. XercesC, MSXML) will struggle to support reading more than one major version without a namespace change since you can only map each unique namespace URI to one xsd file^^, and that mapping has to be given to the parser _before_ it starts reading the document (chicken and egg). Relying only on a version attribute in the document or the "schemaLocation" would require a 2-pass approach, which isn't very practical if you're reading the document from a (forward-only) stream.

See also:
http://209.85.229.132/search?q=cache%3AVyWUtbQUsFAJ%3Awww.xfront.com%2FVersioning.pdf+%22XML+Schema+Versioning+Best+Practices%22&hl=en&gl=uk

^^ Yes, instance documents can also provide this mapping (via "schemaLocation") but it is only a hint and quite often it's not practical to rely on for an application because it is optional and may point to a schema location that doesn't exist locally (eg. transferring file between platforms) or isn't accessible for some reason. Usually the application will ship with a copy of the schema that it supports and it will want to use that instead.

btw, if you choose to encode the full version number in the XSD filename, that potentially means the application requires a copy of each and every version it will support. If the changes are backwards compatible that seems over the top since the newer version will still validate older instance documents.

Regards,
Chris |
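The namespace-to-XSD mapping Chris describes can be illustrated with a small Python sketch. It assumes, hypothetically, that each incompatible schema version had its own namespace URI; the versioned URIs and local paths below are invented for the example and are not the real mzML namespace. A validating parser configured with such a table up front would do this lookup internally; the code only shows why the root element's namespace is enough to choose a schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical: one namespace URI per incompatible schema version,
# each mapped to a schema copy shipped with the application.
SCHEMA_BY_NAMESPACE = {
    "http://psi.hupo.org/ms/mzml/1.0": "schemas/mzML1.0.0.xsd",
    "http://psi.hupo.org/ms/mzml/1.1": "schemas/mzML1.1.0.xsd",
}


def schema_for_stream(stream):
    """Return the local schema path for the root element's namespace, or None."""
    for _event, element in ET.iterparse(stream, events=("start",)):
        # ElementTree spells namespaced tags as "{namespace}localname".
        tag = element.tag
        namespace = tag[1:].partition("}")[0] if tag.startswith("{") else ""
        # Only the first start event (the root element) is needed.
        return SCHEMA_BY_NAMESPACE.get(namespace)
    return None
```

For example, `schema_for_stream(io.BytesIO(b'<mzML xmlns="http://psi.hupo.org/ms/mzml/1.1"/>'))` selects the 1.1 schema without consulting any attribute inside the document.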
From: Matthew C. <mat...@va...> - 2009-07-21 15:05:21
|
Our version system as I understand it is major.minor.revision. A major or minor change indicates a break in backward compatibility. Thus, 1.0 parsers will not be able to read 1.1 files. Strict 1.1 parsers won't be able to read 1.0 files either, but we have tried to avoid making it very hard to support 1.0 from a 1.1 parser.

The chicken and the egg problem occurs in another place in XML: the encoding. A parser must start reading the file before it can know what encoding it is (e.g. ascii, utf8, utf16, etc.). If it's a truly forward-only stream, that does make things difficult, but dealing with the encoding is much harder than dealing with the schema, so forward-only XML readers probably have a way to deal with the latter if they have a way to deal with the former.

Like you said, the "schemaLocation" attribute is an optional "hint" - but without a schema, ALL attributes are optional, including the root "version" attribute (not to be confused with the document declaration "version" attribute, another reason not to use it). And the only way to get a schema without the hint from schemaLocation is to get it based on some other hint. Better the official hint than an unofficial one. :)

-Matt

Chris Allen wrote: > Matthew Chambers wrote: > >> As we discussed at least week's call, there's some debate about the >> value of having a separate version attribute. There seem to be two sides >> to the debate: >> > > Just out of interest: What's the intention with backwards compatibility > policy here? If there are breaking changes to the schema that render > previous instance documents no longer valid, anyone using a validating > parser (eg. XercesC, MSXML) will struggle to support reading more than > one major version without a namespace change since you can only map each > unique namespace URI to one xsd file^^, and that mapping has to be given > to the parser _before_ it starts reading the document (chicken and egg). 
> Relying only on a version attribute in the document or the > "schemaLocation" would require a 2-pass approach, which isn't very > practical if you're reading the document from a (forward-only) stream. > > See also: > http://209.85.229.132/search?q=cache%3AVyWUtbQUsFAJ%3Awww.xfront.com%2FVersioning.pdf+%22XML+Schema+Versioning+Best+Practices%22&hl=en&gl=uk > > ^^ Yes, instance documents can also provide this mapping (via > "schemaLocation") but it is only a hint and quite often it's not > practical to rely on for an application because it is optional and may > point to a schema location that doesn't exist locally (eg. transferring > file between platforms) or isn't accessible for some reason. Usually > the application will ship with a copy of the schema that it supports and > it will want to use that instead. > > btw, if you choose to encode the full version number in the XSD > filename, that potentially means the application requires a copy of each > and every version it will support. If the changes are backwards > compatible that seems over the top since the newer version will still > validate older instance documents. > > Regards, > Chris > > |
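The encoding chicken-and-egg problem Matt mentions is usually handled by sniffing the first bytes of the document. The Python sketch below is a deliberately simplified illustration of that idea (byte-order marks, then the XML declaration, then the UTF-8 default). It ignores UTF-32 and EBCDIC, and the function name is made up; real parsers do more.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration in an
# ASCII-compatible byte stream.
_DECLARED_ENCODING = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._\-]*)["']"""
)


def sniff_xml_encoding(head):
    """Guess the encoding of an XML document from its first bytes (simplified)."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # UTF-16 with a byte-order mark, or "<?" encoded as two-byte units without one.
    if head.startswith(codecs.BOM_UTF16_BE) or head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    match = _DECLARED_ENCODING.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # No byte-order mark and no declaration: XML defaults to UTF-8.
    return "utf-8"
```

The point for this thread is that the parser has to buffer and inspect the start of the stream before it can decode anything, which is the same kind of look-ahead that choosing a schema from the root element requires.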
From: Chris A. <ch...@ma...> - 2009-07-21 16:46:37
|
Matthew Chambers wrote:
> Our version system as I understand it is major.minor.revision. A major
> or minor change indicates a break in backward compatibility. Thus, 1.0
> parsers will not be able to read 1.1 files. Strict 1.1 parsers won't be
> able to read 1.0 files either, but we have tried to avoid making it very
> hard to support 1.0 from a 1.1 parser.
>
> The chicken and the egg problem occurs in another place in XML: the
> encoding. A parser must start reading the file before it can know what
> encoding it is (e.g. ascii, utf8, utf16, etc.). If it's a truly
> forward-only stream, that does make things difficult, but dealing with
> the encoding is much harder than dealing with the schema, so
> forward-only XML readers probably have a way to deal with the latter if
> they have a way to deal with the former.

Yep, that's true. However most XML parsers take care of the encoding detection issue behind the scenes (eg. using an internal buffer until they see the <?xml?> encoding declaration). My point was simply that if you choose a versioning scheme that will require peering into the file first to figure out which schema to use, then readers will have to implement some kind of similar dual-parse/buffering mechanism themselves which is extra work.

>
> Like you said, the "schemaLocation" attribute is an optional "hint" -
> but without a schema, ALL attributes are optional, including the root
> "version" attribute (not to be confused with the document declaration
> "version" attribute, another reason not to use it). And the only way to
> get a schema without the hint from schemaLocation is to get it based on
> some other hint. Better the official hint than an unofficial one. :)

OK, but from an application perspective it is more reliable to override the hints in the document and specify the namespace to XSD location mapping yourself (eg. "setExternalSchemaLocation()" in Xerces) because the application knows where its schema copies are located, and that means it can validate documents even if the schema location hint is missing or refers to a location that is not accessible.

With the above approach, you also have the option of avoiding the two-pass problem providing that you require that each incompatible version of the schema has a unique namespace URI which you can then map to different XSD files.

Regards,
Chris |
From: Matthew C. <mat...@va...> - 2009-07-21 16:53:39
|
Regardless of whether the application stores the schema locally or not, it's impossible for the parser to know which schema to use if it doesn't look in the file. The only alternative I can think of is file extension and that is not merely unreliable: it can't feasibly give version information.

Bottom line is an mzML consuming application cannot know whether to validate with mzML 1.0 or mzML 1.1 without looking in the file. If strict 1.1 parsers could read 1.0 it would be a different matter (the application could always validate with 1.1) but that is not the case here.

-Matt

Chris Allen wrote: > OK, but from an application perspective it is more reliable to override > the hints in the document and specify the namespace to XSD location > mapping yourself (eg. "setExternalSchemaLocation()" in Xerces) because > the application knows where its schema copies are located, and that > means it can validate documents even if the schema location hint is > missing or refers to a location that is not accessible. > > With the above approach, you also have the option of avoiding the > two-pass problem providing that you require that each incompatible > version of the schema has a unique namespace URI which you can then map > to different XSD files. > > Regards, > Chris > |
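"Looking in the file" on a forward-only stream means buffering the beginning, reading the root element, and then replaying the buffered bytes to the real (validating) parser. A rough Python sketch of the peek step is below. The function name, chunk size, and byte limit are arbitrary assumptions, and the replay step is left to the caller.

```python
import io
import xml.etree.ElementTree as ET


def peek_root(stream, chunk_size=4096, max_bytes=1 << 20):
    """Read a forward-only binary stream until the root start tag has been seen.

    Returns (root_tag, root_attributes, buffered_bytes). The caller must hand
    buffered_bytes, followed by the rest of the stream, to the real parser,
    because the bytes consumed here cannot be read from the stream again.
    """
    parser = ET.XMLPullParser(events=("start",))
    buffered = bytearray()
    while len(buffered) < max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffered += chunk
        parser.feed(chunk)
        for _event, element in parser.read_events():
            # The first start event is the root element.
            return element.tag, dict(element.attrib), bytes(buffered)
    raise ValueError("no root element found at the start of the stream")
```

Once the root attributes are known, the application can choose between its mzML 1.0 and 1.1 schemas and then parse `buffered_bytes + stream.read()` (or an equivalent chained stream) for real. This is the extra buffering work Chris refers to.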
From: Darren K. <da...@pr...> - 2009-07-28 13:41:57
|
Hi all,

I just realized that I can't make the meeting this morning.

Regarding the version attribute, I still agree with Matt that having an extra version attribute is redundant, and therefore unnecessary.

It's not difficult to implement, of course, but I just don't see the point -- implementors will either:
1) ignore the field, or
2) read it, and verify that it matches the name of the xsd filename, reporting an error if there is a mismatch

Darren |
From: Matthew C. <mat...@va...> - 2009-07-28 14:29:51
|
I can't make the call this morning either unfortunately.

What I've tried to impress is that whatever option we choose, the version attribute or the schemaLocation, it can only be a hint, because before you know what schema to use there is no concept of attributes being required. Thus, whatever option we choose, it would be something that needs to be documented in the specification doc.

So my recommendation is to not remove the version attribute in 1.1.1 but to document that it is meaningless and the schemaLocation hint should be present with the specified xsd filename. The alternative - forcing the version attribute to be a specific version (and presumably documenting that the schemaLocation is meaningless?) - seems unnecessarily un-xml-ish.

Thanks,
Matt

Darren Kessner wrote: > Hi all, > > I just realized that I can't make the meeting this morning. > > Regarding the version attribute, I still agree with Matt that having > an extra version attribute is redundant, and therefore unnecessary. > > It's not difficult to implement, of course, but I just don't see the > point -- implementors will either: > 1) ignore the field, or > 2) read it, and verify that it matches the name of the xsd filename, > reporting error if there is a mismatch > > > Darren > > |
From: Chris A. <ch...@ma...> - 2009-07-21 17:53:25
|
Matthew Chambers wrote:
> Regardless of whether the application stores the schema locally or not,
> it's impossible for the parser to know which schema to use if it doesn't
> look in the file. The only alternative I can think of is file extension
> and that is not merely unreliable: it can't feasibly give version
> information. Bottom line is an mzML consuming application cannot know
> whether to validate with mzML 1.0 or mzML 1.1 without looking in the
> file. If strict 1.1 parsers could read 1.0 it would be a different
> matter (the application could always validate with 1.1) but that is not
> the case here.

Agreed, although it's really only an issue for 1.0 readers at present. If you were to say that going forward any backwards compatibility breakages require a namespace change then 1.1-only (and future) readers wouldn't need to worry about figuring out which schema to use as the XML parser would handle it. Anyway, it's probably a bit late now.

Regards,
Chris |