From the point of view of a mere serializer, SGML is a very different beast from XML. A parsed SGML document cannot be exactly reproduced. That is true of XML to a lesser extent, as we have already seen. With SGML, however, there are many more things that the parser either elides from the raw input or inserts where they should have been:
There are many shortcuts available to an SGML document author that are invisible on the other side of the parser: the element
<greeting>Hello, World!</greeting>
for example, could also be written as
<greeting>Hello, World!</>
or even
<greeting/Hello, World!/
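To make the point concrete, here is a toy sketch in Python (not OmniMark) that maps the three spellings above to the same fully tagged form. It is emphatically not an SGML parser: the `normalize` function and its patterns are hypothetical, and they assume a single flat element with plain text content and SHORTTAG minimization enabled.

```python
import re

# Hypothetical, simplified patterns for one flat element with text content.
NAME = r"(?P<name>[A-Za-z][A-Za-z0-9.-]*)"
PATTERNS = [
    re.compile(r"<" + NAME + r">(?P<text>[^<]*)</(?P=name)>"),  # full end tag
    re.compile(r"<" + NAME + r">(?P<text>[^<]*)</>"),           # empty end tag
    re.compile(r"<" + NAME + r"/(?P<text>[^/<]*)/"),            # null end tag
]

def normalize(source: str) -> str:
    """Return the fully tagged spelling of a single minimized element."""
    for pattern in PATTERNS:
        m = pattern.fullmatch(source)
        if m:
            return "<{0}>{1}</{0}>".format(m["name"], m["text"])
    raise ValueError("unrecognized markup: " + repr(source))

if __name__ == "__main__":
    for spelling in ("<greeting>Hello, World!</greeting>",
                     "<greeting>Hello, World!</>",
                     "<greeting/Hello, World!/"):
        print(normalize(spelling))
```

Once the three inputs have been reduced to the same element, nothing records which spelling the author used, so a serializer has only the normalized form to work from.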
For the reasons listed above and others besides, the generated SGML will of necessity be heavily normalized and XML-like. Our main goal will be to make it valid.
There are four additional considerations in element tag formatting compared to the XML case:
If the element is declared empty or if a conref attribute is specified, the end tag must not be output.
If the element content is declared cdata, instead of translating the content we merely make sure it contains no </ delimiters.
The resulting element rule has grown quite a bit:
element #implied
   output "<%q"
   using group "translate attributes"
      repeat over specified attributes as a
         output " " || key of attribute a || '="%v(a)"'
      again
   output ">"
   output "<!USEMAP #EMPTY>"
      unless usemap is (#empty | #none)
   output "%n"
      when content is element
   do when content is cdata
      repeat scan "%zc"
      match any ++ => rest lookahead "</"
         output rest
      match "</"
         not-reached message "A CDATA element content cannot contain the string '</'"
      again
   else
      output "%c"
   done
   output "</%q>"
      unless content is (empty | conref)
   output "%n"
      when content of parent is element
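For readers who do not know OmniMark, the decisions made by this rule can be restated as a small Python sketch. This is an illustration under stated assumptions, not a translation of the rule: the function `serialize_element` and all of its parameters are hypothetical, the content is passed in as an already serialized string, and attribute values are assumed to have been translated beforehand.

```python
def serialize_element(name, attributes, content_kind, content="", *,
                      usemap_active=False, parent_has_element_content=False):
    """Serialize one element.

    content_kind is one of "empty", "conref", "cdata", "element" or "mixed"
    (hypothetical labels for this sketch).
    """
    out = ["<" + name]
    for key, value in attributes.items():
        # Only specified attributes are expected here; values already escaped.
        out.append(' {}="{}"'.format(key, value))
    out.append(">")
    if usemap_active:
        # Switch off any short reference map that is in effect.
        out.append("<!USEMAP #EMPTY>")
    if content_kind == "element":
        out.append("\n")
    if content_kind == "cdata":
        # CDATA content is copied verbatim, so it must not contain "</".
        if "</" in content:
            raise ValueError(
                "A CDATA element content cannot contain the string '</'")
        out.append(content)
    elif content_kind not in ("empty", "conref"):
        out.append(content)
    if content_kind not in ("empty", "conref"):
        # No end tag for EMPTY elements or when a conref attribute is specified.
        out.append("</" + name + ">")
    if parent_has_element_content:
        out.append("\n")
    return "".join(out)

if __name__ == "__main__":
    print(serialize_element("greeting", {"lang": "en"}, "mixed", "Hello, World!"))
    print(serialize_element("br", {}, "empty"))
```

The sketch raises an exception where the rule above reports a not-reached message; either way the serializer refuses to emit CDATA content that would end the element prematurely.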