The explanations added for @xml:space still need work. I'm afraid the author was laboring under misunderstandings about @xml:space comparable to the ones I had on my first of many forays into the whitespace forest.
The three bulleted points are untrue. And the advice about using "preserve" in transcriptions, if taken literally, is sure to mislead.
Some clarifications, reminders, principles:
- XML defines only "preserve" and "default" for @xml:space. It does not have, for example, XSD's "collapse" or "replace".
- "preserve", "collapse", "normalize", and "trim" are different.
- XML does not define what "default" processing is; it says only that the application's own default whitespace handling is acceptable. If an XML processor's handling of "default" is predictable, it is for some other reason -- convention, quirks everyone knows about, programming culture, standard settings, some other spec, whatever.
- There is no such thing as a whitespace character. There are whitespace characters (plural).
Yes, the behavior of XML processors is generally predictable, but the expected behavior is not to PRESERVE but to COLLAPSE whitespace. In text nodes, XML treats the presence of whitespace as significant, but not the particular whitespace characters that make it up. So processors will generally COLLAPSE "carriage return - tab - space - space - space" and treat that string as if it had been one space character. To PRESERVE that sequence means to retain all five characters as is. (To REPLACE is to convert it into five space characters.)
PRESERVE and COLLAPSE are also different from TRIM and NORMALIZE. Whether the processor will trim -- and that's the crucial and tricky piece in mixed-content documents -- is less predictable. A stock XSL transformation will, but one designed for mixed-content documents might not. But again, no processor will by default preserve -- collapse yes, trim probably (to frequent disappointment), preserve definitely not.
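To make the five terms concrete, here is a minimal Python sketch. The function names are mine, not from any spec. It follows the usage in this comment, where collapsing and trimming are separate steps (XSD's own "collapse" also trims), and it assumes "normalize" means what XPath's normalize-space() does: collapse, then trim.

```python
import re

# The four characters XML counts as whitespace: space, tab, CR, LF.
XML_WS = " \t\r\n"

def preserve(text: str) -> str:
    """Keep every whitespace character exactly as written."""
    return text

def replace(text: str) -> str:
    """Turn each whitespace character into one space (length unchanged)."""
    return re.sub(f"[{XML_WS}]", " ", text)

def collapse(text: str) -> str:
    """Turn each run of whitespace into a single space; edges are kept."""
    return re.sub(f"[{XML_WS}]+", " ", text)

def trim(text: str) -> str:
    """Drop leading and trailing whitespace; interior runs are untouched."""
    return text.strip(XML_WS)

def normalize(text: str) -> str:
    """Collapse, then trim (assumed to match XPath normalize-space())."""
    return trim(collapse(text))

# The five-character run discussed above: CR, tab, three spaces.
run = "\r\t   "
sample = f" word{run}word "

print(repr(preserve(run)))      # all five characters, unchanged
print(repr(replace(run)))       # five space characters
print(repr(collapse(run)))      # one space character
print(repr(trim(sample)))       # edges gone, interior run intact
print(repr(normalize(sample)))  # edges gone, interior run collapsed
```

The point of keeping them as separate functions is that a processor can do any one of them without the others, which is exactly where mixed-content documents get hurt.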
Bullet 3 and the list example:
It is not true that XML generally assumes whitespace between elements is insignificant. By default XSL must preserve whitespace nodes as significant. To reverse this, the programmer must insert <xsl:strip-space elements="*"/>. Many, many XSL programmers have never dealt with mixed-content XML and very, very few with a hybrid like TEI, where mixed-content and structured content models vary element by element. Most XSL programmers are trained to work in corporate IT departments where structured data is all they will ever see. They will insert the global strip-space command because that's what they've always done, and forget why it is there or even that it is there.
So a lot of XML programming culture assumes whitespace between elements is insignificant, but the programming tools by default assume the exact opposite.
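The same default can be seen at the parser level. This is only a sketch using Python's standard-library minidom, not XSLT, and the <list>/<item> document is made up for illustration; the point is that the whitespace-only text nodes between elements are in the tree unless something is told to remove them.

```python
from xml.dom import minidom

# An indented, purely structured fragment (element names are illustrative).
SOURCE = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"

def child_summary(xml_text):
    """List the root's children: element names, or repr() of text-node data."""
    root = minidom.parseString(xml_text).documentElement
    return [
        child.nodeName if child.nodeType == child.ELEMENT_NODE else repr(child.data)
        for child in root.childNodes
    ]

summary = child_summary(SOURCE)
print(len(summary), summary)
```

Two <item> elements, but more than two child nodes: the indentation between them is kept as text nodes, with no instruction needed to keep it.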
It is wrong to say that "not all processors can detect [the significance of inter-element whitespace] reliably." Processors will treat the whitespace exactly as told. There is no detecting for the processor to do. If <xsl:strip-space> (or some other schema-communicating instruction) tells them to strip space, they will. Otherwise they won't. Authors of vocabularies are responsible for telling processors which elements are mixed-content and which are structured. (Sebastian posted this once, but I don't believe it became a standard part of a release.)
TEI makes such communication difficult, because some elements are mixed-content, some are structured, and -- to make it worse -- some are defined as mixed-content yet treated, even in the Guidelines, as structured. The document will appear conformant but then get corrupted downstream when the consumers assume the element was used as spec'd. (An error like this occurred to one TEI user who contacted me after they corrupted a whole batch of TEI files and didn't realize it soon enough and couldn't even figure out what went wrong.)
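A small sketch of how that kind of corruption can happen, again with Python's minidom standing in for whatever tool is downstream. The helper names are hypothetical, and the helper imitates a global strip-space by deleting every whitespace-only text node, as if every element were structured. The <p>/<hi> fragment is only an illustration of mixed content.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"

def strip_whitespace_only_nodes(node):
    """Remove every whitespace-only text node below `node` (global strip)."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip(XML_WS):
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_only_nodes(child)

def text_of(node):
    """Concatenate the data of all text nodes below `node`, in order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)

# Mixed content: the single space between the two <hi> elements is real text.
MIXED = "<p><hi>Hello</hi> <hi>world</hi></p>"

doc = minidom.parseString(MIXED)
before = text_of(doc.documentElement)
strip_whitespace_only_nodes(doc.documentElement)
after = text_of(doc.documentElement)

print(repr(before))
print(repr(after))
```

The document is still well-formed and would still validate afterwards, which is why this sort of damage can go unnoticed for a whole batch of files.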
The case of transcription:
The advice sounds incorrect and like a trap for the unwary. PRESERVE means "do not REMOVE whitespace characters." It says nothing about INSERTING whitespace between elements where no whitespace exists -- and it shouldn't need to. No consumer of an XML file should ever do that. That would add a node to the tree. Bad. Maybe some application does that as part of a suite of tools, but then that application is not an XML editor. It's an XML changer.
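To show what "adds a node to the tree" means in practice, here is a sketch that uses minidom's toprettyxml() as a stand-in for such an application. I am not claiming it is the tool anyone had in mind; it is just a re-indenter that ships with Python. The fragment is illustrative.

```python
from xml.dom import minidom

# No whitespace at all between the two <hi> elements.
ORIGINAL = "<p><hi>a</hi><hi>b</hi></p>"

def child_count(xml_text):
    """Number of child nodes (elements and text nodes) under the root."""
    return len(minidom.parseString(xml_text).documentElement.childNodes)

# Re-indent the fragment, then parse the result again.
reindented = minidom.parseString(ORIGINAL).documentElement.toprettyxml(indent="  ")

print(child_count(ORIGINAL))
print(child_count(reindented))
```

The second count is larger than the first: the re-indented file contains text nodes that the original never had, so it is a different document.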
So there should be no need to add 'preserve' just to stop downstream apps from inserting whitespace nodes. But also, adding 'preserve' to a DIV tells processors to retain every space character, tab, and carriage return anywhere in that DIV. It's hard to imagine a scenario where that is intended.
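A sketch of why the scope matters: an @xml:space value applies to everything inside the element that carries it, unless a descendant overrides it. The helper below is hypothetical and uses Python's ElementTree; None stands for "no intention signalled", and the <div>/<p> fragment is made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree spells xml:space with the XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

DOC = (
    '<div xml:space="preserve">\n'
    '  <p>one</p>\n'
    '  <p xml:space="default">two</p>\n'
    '</div>'
)

def effective_xml_space(root):
    """Map each element to the xml:space value in force for it (or None)."""
    result = {}

    def walk(elem, inherited):
        # An element's own attribute wins; otherwise the ancestor's applies.
        value = elem.get(XML_SPACE, inherited)
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, None)
    return result

root = ET.fromstring(DOC)
first, second = list(root)
space = effective_xml_space(root)

# The indentation strings that sit directly inside the preserving <div>.
indentation = [root.text, first.tail, second.tail]

print(space[root], space[first], space[second])
print(indentation)
```

The first <p> inherits "preserve" from the <div> without saying so, and every indentation string in the list is now something a conforming consumer is being asked to keep.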
Anyway, anyway -- the attempt to better explain @xml:space, I'm afraid, makes matters worse. Whitespace handling in XML is difficult enough. In TEI it's even more so. The Guidelines need to be super accurate.
Whoever gets stuck with writing a short bit for that passage really needs to understand, I think, everything it took me so long to figure out and that I put in http://wiki.tei-c.org/index.php/XML_Whitespace -- as well as all that is in the external resources linked there, including the discussion about @xml:space on xml-dev, where some really top XML experts got challenged with how to apply xml:space in an architecture like TEI's.
I don't think the new passage needs just a few tweaks. If the subtleties of @xml:space can trip up someone as experienced as whoever wrote that passage, then general readers need specific, careful, and accurate guidance indeed. I think the passage needs substantive reworking.