So far we have taken care only of the SGML and XML parsers' needs, making sure they would like the result. What can we do for human readers?
In the case of SGML, there are many little tweaks we could apply to the code to make the output more condensed. We could leave out the quotes around token-valued attributes, for example, or shorten the end-tags to </>. It is questionable, however, whether these shortcuts improve or impede the readability of the markup; the answer mostly depends on the readers' personal tastes and habits. We'll steer clear of these reefs.
There is one thing, however, that we can change and that is (almost) universally acknowledged as a readability improvement: adding whitespace. A glance at the following two XML fragments, equivalent except for the whitespace, should be a sufficient demonstration of its utility:

<ul><li><p>So that was that.</p></li><li><p>And now...</p><p>something <em>completely<sup>1</sup></em> different.</p></li></ul>

<ul>
  <li>
    <p>So that was that.</p>
  </li>
  <li>
    <p>And now...</p>
    <p>something <em>completely<sup>1</sup></em> different.</p>
  </li>
</ul>

All we've done above is add newlines and indentation. More specifically, a newline and indentation precede every element tag in any context where they are ignorable, i.e., where only elements are allowed. We cannot add whitespace to mixed content, such as around the <sup> element, because it could change the meaning of the content. OmniMark's validating XML parser strips away all ignorable whitespace, so the two XML fragments above are identical as far as your markup rules are concerned.
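
The rule can also be stated outside OmniMark. The following is a minimal, language-neutral sketch in Python, not the code this chapter builds. It assumes a hypothetical set `ELEMENT_ONLY` standing in for the DTD's element-only content models, a hypothetical function name `pretty`, and it ignores attributes for brevity:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical stand-in for the DTD: elements whose content model allows
# only child elements, so whitespace directly inside them is ignorable.
ELEMENT_ONLY = {"ul", "li"}

def pretty(elem, indent="\n", step="  ", parent_element_only=False):
    """Serialize elem, adding whitespace only where it is ignorable."""
    out = []
    if parent_element_only:
        out.append(indent)              # newline + indentation before the start-tag
    out.append("<" + elem.tag + ">")    # attributes are omitted in this sketch
    element_only = elem.tag in ELEMENT_ONLY
    if not element_only:
        out.append(escape(elem.text or ""))    # mixed content: text kept verbatim
    for child in elem:
        out.append(pretty(child, indent + step, step, element_only))
        if not element_only:
            out.append(escape(child.tail or ""))
    if element_only and len(elem):
        out.append(indent)              # line the end-tag up with its start-tag
    out.append("</" + elem.tag + ">")
    return "".join(out)

compact = ("<ul><li><p>So that was that.</p></li><li><p>And now...</p>"
           "<p>something <em>completely<sup>1</sup></em> different.</p></li></ul>")
print(pretty(ET.fromstring(compact)))
```

Applied to the compact fragment, the sketch indents the `ul`, `li`, and `p` tags but leaves the text around `em` and `sup` untouched, because `p` is not listed as element-only.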
We can now sprinkle some whitespace-emitting code into our XML serializer:

domain-bound global string indentation initial { "%n" }
domain-bound global string indentation-increment

element #implied
   output indentation
      when content of parent is element
   output "<%q"
   using group "translate attributes"
      repeat over specified attributes as a
         output " " || key of attribute a || '="%v(a)"'
      again
   output "/"
      when content is empty-tag
   do
      save indentation
      set indentation to indentation || indentation-increment
      output ">%c"
   done
   do unless content is empty-tag
      output indentation
         when content is element & children != 0
      output "</%q>"
   done

This new pretty-printing rule is similar to the XML serializer implemented earlier, except that it keeps track of and outputs the indentation. Similar modifications would need to be made to the markup-comment, processing-instruction, and other rules as well.
Instead of maintaining these two slightly different ways of producing XML output, we can have the pretty-printer delegate the XML serialization to the omxmlwrite library, and insert the whitespace into the markup stream just before it reaches the serializer:

import "omxmlwrite.xmd" prefixed by xml.

export string source function
   pretty-printed-xml value markup source s
as
   output xml.written from indented s

The function indented is independent of the serializer. All it does is insert the extra whitespace indentation into the markup stream:

domain-bound global string indentation initial { "%n" }
domain-bound global string indentation-increment

export markup source function
   indented value markup source s
         by value string increment optional initial { " " }
as
   save indentation-increment
   set indentation-increment to increment
   do markup-parse s
      output "%c"
   done

element #implied
   output indentation
      when content of parent is element
   signal throw #markup-start #current-markup-event
   do
      save indentation
      set indentation to indentation || indentation-increment
      output "%c"
   done
   output indentation
      when content is element & children != 0
   signal throw #markup-end #current-markup-event

Apart from avoiding the code duplication, this way of organizing the code is more flexible. We can, for example, use the function indented with the omsgmlwrite library instead of omxmlwrite, without any change to its implementation.
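
The same separation of concerns can be sketched in Python as a generator that filters a stream of markup events before any serializer sees it. This is only an illustration of the design, not the OmniMark libraries: the event tuples, the `ELEMENT_ONLY` set (a stand-in for the DTD), and the toy serializers `written_xml` and `written_sgml` are all hypothetical names, and the serializers skip attributes and text escaping.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, str]   # ("start" | "end" | "text", element name or text)

# Hypothetical stand-in for the DTD's element-only content models.
ELEMENT_ONLY = {"ul", "li"}

def indented(events: Iterable[Event], step: str = "  ") -> Iterator[Event]:
    """Pass events through, inserting whitespace text where it is ignorable."""
    indent = "\n"
    stack = []            # [name, number of child elements] for each open element
    for kind, value in events:
        if kind == "start":
            if stack:
                stack[-1][1] += 1
                if stack[-1][0] in ELEMENT_ONLY:
                    yield ("text", indent)      # before a child's start-tag
            stack.append([value, 0])
            indent += step
        elif kind == "end":
            name, children = stack.pop()
            indent = indent[:len(indent) - len(step)]
            if name in ELEMENT_ONLY and children:
                yield ("text", indent)          # before the parent's end-tag
        yield (kind, value)

def written_xml(events: Iterable[Event]) -> str:
    """Toy XML serializer: full end-tags."""
    forms = {"start": "<{}>", "end": "</{}>", "text": "{}"}
    return "".join(forms[kind].format(value) for kind, value in events)

def written_sgml(events: Iterable[Event]) -> str:
    """Toy SGML-style serializer: shortened end-tags."""
    forms = {"start": "<{}>", "end": "</>", "text": "{}"}
    return "".join(forms[kind].format(value) for kind, value in events)

sample = [("start", "ul"), ("start", "li"), ("start", "p"), ("text", "Hi "),
          ("start", "em"), ("text", "there"), ("end", "em"),
          ("end", "p"), ("end", "li"), ("end", "ul")]
print(written_xml(indented(sample)))
print(written_sgml(indented(sample)))
```

Because `indented` only adds text events to the stream, either serializer can consume its output unchanged, which is the flexibility described above.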