Markup pretty-printer Wiki

OmniMark library to safely indent markup for SGML or XML serialization

Status: Beta

Brought to you by: omnimark-code

Pretty-printing

So far we have taken care only of the SGML and XML parsers' needs, making sure they would like the result. What can we do for human readers?

In the case of SGML, there are many little tweaks we could apply to the code to make the output more condensed. We could leave out the quotes around token-valued attributes, for example, or shorten the end-tags to </>. It is questionable, however, whether these shortcuts improve or impede the readability of the markup, and the answer mostly depends on the readers' personal tastes and habits. We'll steer clear of these reefs.

There is one thing, however, that we can change and that is (almost) universally acknowledged as a readability improvement: adding whitespace. A glance at the following two XML fragments, equivalent except for the whitespace, should be a sufficient demonstration of its utility:

<ul><li><p>So that was that.</p></li><li><p>And now...</p><p>something
<em>completely<sup>1</sup></em> different.</p></li></ul>

<ul>
  <li>
    <p>So that was that.</p>
  </li>
  <li>
    <p>And now...</p>
    <p>something
<em>completely<sup>1</sup></em> different.</p>
  </li>
</ul>

All we've done above is add newlines and indentation. More specifically, a newline and indentation precede every element tag in any context where they are ignorable, i.e., where only elements are allowed. We cannot add whitespace to mixed content, such as around the <sup> element, because it could change the meaning of the content. OmniMark's validating XML parser strips away all ignorable whitespace, so the two XML fragments above are identical as far as your markup rules are concerned.
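The rule above can be sketched outside OmniMark as well. The following Python analogue is only an illustration under stated assumptions, not part of the library: it has no DTD, so the hypothetical helper has_element_content guesses that an element has element content when it has children and no non-whitespace text between them. A validating parser knows this from the content model instead, and the guess is wrong for a mixed-content element that happens to contain only child elements. Namespaces, comments, and processing instructions are ignored.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def has_element_content(elem):
    """Heuristic stand-in for a DTD (assumption): children only, no real text."""
    if len(elem) == 0:
        return False
    if elem.text and elem.text.strip():
        return False
    return all(not (child.tail and child.tail.strip()) for child in elem)


def write(elem, indentation, increment, parent_is_element_content, out):
    # A newline plus indentation goes before the start-tag only where
    # whitespace is ignorable, i.e. when the parent has element content.
    if parent_is_element_content:
        out.append(indentation)
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    if len(elem) == 0 and not elem.text:
        out.append(f"<{elem.tag}{attrs}/>")
        return
    out.append(f"<{elem.tag}{attrs}>")
    element_content = has_element_content(elem)
    # In element content any existing whitespace is ignorable and is dropped;
    # in mixed content every character is preserved as it was.
    if not element_content and elem.text:
        out.append(escape(elem.text))
    for child in elem:
        write(child, indentation + increment, increment, element_content, out)
        if not element_content and child.tail:
            out.append(escape(child.tail))
    # The end-tag gets its own line only if the children did.
    if element_content:
        out.append(indentation)
    out.append(f"</{elem.tag}>")


def indented(xml_text, increment="  "):
    """Return xml_text with whitespace added only where it is ignorable."""
    out = []
    write(ET.fromstring(xml_text), "\n", increment, False, out)
    return "".join(out)


if __name__ == "__main__":
    print(indented("<ul><li><p>So that was that.</p></li></ul>"))
```

Applied to the first fragment above, this sketch should reproduce the second one, leaving the mixed content of the last <p> untouched. Because ignorable whitespace is dropped before new whitespace is added, running it on its own output changes nothing.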

We can now sprinkle some whitespace-emitting code into our XML serializer:

domain-bound global string indentation initial { "%n" }
domain-bound global string indentation-increment
 
element #implied
   output indentation
      when content of parent is element
   output "<%q"
   using group "translate attributes"
   repeat over specified attributes as a
      output " " || key of attribute a || '="%v(a)"'
   again
   output "/"
      when content is empty-tag
   do
      save indentation
      set indentation to indentation || indentation-increment
      output ">%c"
   done
   do unless content is empty-tag
      output indentation
         when content is element & children != 0
      output "</%q>"
   done

This new pretty-printing rule is similar to the XML serializer implemented earlier, except that it keeps track of the indentation and outputs it. Similar modifications would need to be made to the markup-comment, processing-instruction, and other rules as well.

Instead of maintaining these two slightly different ways of producing XML output, we can have the pretty-printer delegate the XML serialization to the omxmlwrite library, and insert the whitespace into the markup stream just before it reaches the serializer:

import "omxmlwrite.xmd" prefixed by xml.
 
export string source function
   pretty-printed-xml value markup source s
as
   output xml.written from indented s

The function indented is independent of the serializer. All it does is insert the extra whitespace indentation into the markup stream:

domain-bound global string indentation initial { "%n" }
domain-bound global string indentation-increment
 
export markup source function
   indented value markup source s
         by value string        increment optional initial { "  " }
as
   save indentation-increment
 
   set indentation-increment to increment
   do markup-parse s
      output "%c"
   done
 
element #implied
   output indentation
      when content of parent is element
   signal throw #markup-start #current-markup-event
   do
      save indentation
      set indentation to indentation || indentation-increment
      output "%c"
   done
   output indentation
      when content is element & children != 0
   signal throw #markup-end #current-markup-event

Besides avoiding code duplication, this way of organizing the code is more flexible. We can, for example, use the function indented with the omsgmlwrite library instead of omxmlwrite, without any change to its implementation.
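The same separation of concerns can be illustrated with a small Python analogue. This is a sketch under assumptions, not the OmniMark API: the event tuples, the element-content flag carried on each event, and the two toy writers written_xml and written_sgml are all hypothetical stand-ins invented for the illustration (the real omxmlwrite and omsgmlwrite libraries work on OmniMark markup streams). The point is only that the indenting filter never looks at how the events are eventually written out.

```python
from xml.sax.saxutils import escape

# An event is (kind, payload, element_content), with kind one of
# "start", "end", "text". For start and end events the flag says that the
# element has element-only content and at least one child (assumption: a
# real parser would supply this from the DTD).


def indented(events, increment="  "):
    """Insert whitespace text events; knows nothing about serialization."""
    indentation = "\n"
    open_flags = [False]  # the root element has no parent
    for kind, payload, element_content in events:
        if kind == "start":
            if open_flags[-1]:
                yield ("text", indentation, False)
            yield (kind, payload, element_content)
            open_flags.append(element_content)
            indentation += increment
        elif kind == "end":
            indentation = indentation[: len(indentation) - len(increment)]
            open_flags.pop()
            if element_content:
                yield ("text", indentation, False)
            yield (kind, payload, element_content)
        else:
            yield (kind, payload, element_content)


def written_xml(events):
    """Toy XML writer (hypothetical stand-in for a real serializer)."""
    parts = []
    for kind, payload, _ in events:
        if kind == "start":
            parts.append(f"<{payload}>")
        elif kind == "end":
            parts.append(f"</{payload}>")
        else:
            parts.append(escape(payload))
    return "".join(parts)


def written_sgml(events):
    """Toy SGML-flavoured writer: same events, empty end-tags </>."""
    parts = []
    for kind, payload, _ in events:
        if kind == "start":
            parts.append(f"<{payload}>")
        elif kind == "end":
            parts.append("</>")
        else:
            parts.append(escape(payload))
    return "".join(parts)


EVENTS = [
    ("start", "ul", True),
    ("start", "li", True),
    ("start", "p", False),
    ("text", "Hi", False),
    ("end", "p", False),
    ("end", "li", True),
    ("end", "ul", True),
]

if __name__ == "__main__":
    print(written_xml(indented(EVENTS)))
    print(written_sgml(indented(EVENTS)))
```

Swapping one writer for the other requires no change to indented, which mirrors the claim made for omxmlwrite and omsgmlwrite.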