From: Martin Q. <mqu...@us...> - 2005-04-06 10:12:36
Update of /cvsroot/flexml/flexml
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21246

Modified Files:
    paper.html
Log Message:
some validating work, not done yet

Index: paper.html
===================================================================
RCS file: /cvsroot/flexml/flexml/paper.html,v
retrieving revision 1.11
retrieving revision 1.12
diff -u -d -r1.11 -r1.12
--- paper.html 12 Feb 2003 02:55:41 -0000 1.11
+++ paper.html 6 Apr 2005 10:12:27 -0000 1.12
@@ -1,4 +1,6 @@
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
+ "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
 <html>
 <head>
 <title>Generating Fast Validating XML Processors</title>
@@ -39,34 +41,34 @@
 </h6>
 <h6>
- Kristoffer Rose<br>
- LIP, ENS-Lyon<br>
+ Kristoffer Rose<br/>
+ LIP, ENS-Lyon<br/>
 <a href="mailto:kri...@de...">kri...@de...</a>
 </h6>
 <h6>
 Abstract
 </h6>
- We present <em>FleXML</em>, a program that generates fast
+ <p>We present <em>FleXML</em>, a program that generates fast
 validating XML processors from `self-contained' XML DTDs. It
 uses the <em>flex</em> (lexical analyser generator) program to
 translate the DTD into a <em>finite automaton</em> enriched with
 a stack with the `element context'. This means that the XML
 processor will act directly on each character received. The
 program is freely redistributable and modifyable (under GNU
- `copyleft').
+ `copyleft').</p>
 <h6>
 Keywords
 </h6>
- Validating XML, DTD, lexical analysis, finite automata.
+ <p>Validating XML, DTD, lexical analysis, finite automata.</p>
 <h4>
 Overview
 </h4>
- The `X' of XML stands for <em>Extensible</em> [<cite><a
+ <p>The `X' of XML stands for <em>Extensible</em> [<cite><a
 href="#XML">XML</a></cite>]. This signifies that each and every
 XML document specifies in its header the details of the format
 that it will use and <em>may</em> change its format a bit relative
- to the used base format.
+ to the used base format.</p>
 <p>
 This has influenced the tools available to write validating XML
 processors: they use a <em>call-back</em> model where the XML
@@ -76,13 +78,13 @@
 extending its own notation with more tags and attributes. For
 <em>well-formed</em> but non-validated XML documents this makes
 a lot of sense, of course, but we would in general like to
- exploit the information in the DTD for optimizations.
+ exploit the information in the DTD for optimizations.</p>
 <p>
 In particular, for many applications a <em>fixed</em> format
 suffices in the sense that a single DTD is used without
 individual extensions for a large number of documents. In that
 case we can do much better because the possible tags and
- attributes are static.
+ attributes are static.</p>
 <p>
 We have implemented an XML processor <em>generator</em> using
 the <cite><a href="#Flex">Flex</a></cite> scanner generator that
@@ -91,7 +93,7 @@
 no overhead for XML processing: the generated XML processors read
 the XML document character by character and can immediately
 dispatch the actions associated with each element (or reject the
- document as invalid).
+ document as invalid).</p>
 <p>
 Furthermore we have devised a simple extension of the C
 programming language that facilitates the writing of `element
@@ -99,18 +101,18 @@
 applications. In particular we represent XML attribute values
 efficiently in C when this is possible, thus avoiding the
 otherwise ubiquitous conversions between strings and data
- values.
+ values.</p>
 <p>
 FleXML is available for free (from <a
 href="http://flexml.sourceforge.net">SourceForge</a>). In this
 paper we present FleXML through an elaborated <a
 href="#what">example</a> and discuss some of the
- <a href="#how">technical issues</a>.
+ <a href="#how">technical issues</a>.</p>
 <h4><a name="what">What can it do?</a></h4>
- Assume that we have an XML document <code>my-joke.xml</code>
+ <p>Assume that we have an XML document <code>my-joke.xml</code>
 containing the classical joke
 <blockquote><code><pre><!DOCTYPE joke SYSTEM "my.dtd">
@@ -130,12 +132,12 @@
 </pre></code></blockquote>
 and, furthermore, we wish to write an XML application for
- displaying such messages in an amusing way.
+ displaying such messages in an amusing way.</p>
 <p>
 With FleXML this can be done by creating an `actions file'
 <code>my-show.act</code> which implements the desired actions
 for each element. The remainder of this section explains the
- contents of such an actions file.
+ contents of such an actions file.</p>
 <p>
 An actions file is itself an XML document which must begin with
@@ -144,17 +146,17 @@
 </pre></code></blockquote>
 (the <code>flexml-act.dtd</code> DTD is part of the FleXML
- system and is reproduced in the manual page.
+ system and is reproduced in the manual page.</p>
 <p>
 We decide that our application should react to a
 <code>line</code> element by printing the text inside it, and
 that it should differentiate between the three possible `type'
- attribute values by printing corresponding trailing punctuation.
+ attribute values by printing corresponding trailing punctuation.</p>
 <p>
 This introduces a slight complication, because the attribute
 values are available when parsing the start tag whereas the
 element text is not available until we parse the end tag (where
- it has been read).
+ it has been read).</p>
 <p>
 This means that we must declare a top-level variable.
@@ -165,7 +167,7 @@
 Notice how we use <code>CDATA</code> sections to make sure that
 all characters (including white-space) are passed unmodified to
- the C compiler.
+ the C compiler.</p>
 <p>
 With this we can write the action to set it when reading the
 <code>line</code> start tag as
@@ -178,7 +180,7 @@
 case {type=punch-line}: terminator = "!!"; break;
 }
 ]]></start>
-</pre></code></blockquote>
+</pre></code></blockquote></p>
 <p>
 The idea is that the enumeration attribute <code>type</code> is
@@ -194,7 +196,7 @@
 setting the attribute; in this example the attribute has a
 default value so this can never happen, however, we include the
 choice anyway to prevent the C compiler from issuing warnings
- about missing choices in <code>switch</code> statements.
+ about missing choices in <code>switch</code> statements.</p>
 <p>
 With this in place we can write the action for
 <code></line></code>. Since it prints something, however,
@@ -207,7 +209,7 @@
 <end tag='line'><![CDATA[
 printf("%s%s\n", pcdata, terminator);
 ]]></end>
-</pre></code></blockquote>
+</pre></code></blockquote></p>
 <p>
 Finally, we will make the application amusing by `displaying'
@@ -221,7 +223,7 @@
 <start tag='suspense'><![CDATA[
 sleep(2);
 ]]></start>
-</pre></code></blockquote>
+</pre></code></blockquote></p>
 <p>
 That is all; the only remaining thing is to terminate the action