Thread: [exprla-devel] Re: [XPL] Oracle and Sun debut "translets" and virtual machine for XSLT
From: reid_spencer <ras...@re...> - 2002-01-31 09:24:16
--- In xpl-dev@y..., Jonathan Burns <saski@w...> wrote:

Richard Anthony Hein wrote:
> Everyone,
>
> www.xml.com has articles about some things we need to be informed about,
> including the foundational infrastructure of the 'net and how XML makes too
> much of a demand on the current infrastructure, and one about "translets"
> and an XSLT virtual machine! Very important to XPL I think!

No kidding. That www.xml.com/pub is a very interesting place.

I just scanned the St. Laurent and Dodds articles - I get part of them, but more of them makes reference to issues I haven't begun to study. What they're talking about, though, is related to what I've been brooding about while offline. Does XML demand something new in our Web paradigm? Should we expect compiled XSLT to make a real difference?

Compiling is what I'm talking about here. There is a persistent interest on the list in compiling XPL. I share in it, but from a skewed perspective. From one angle, I'm keen on grammars - it's a disappointment for me that EBNF should be set into the foundations, as a means of defining correct source parsing, but ignored as a high-level mechanism for combining XML structures. From another angle... I think that the benefits of compilation will not be transferred easily (if at all) from complex applications resident on single machines to complex interactions distributed via comms protocols. I think they will show up to a degree on servers that are dealing heavily in XML - but only when a whole lot of related efficiency issues are addressed at the same time.

Roughly estimating, in the time my system downloads 1 kilobyte of HTML, the CPU can execute 100 million instructions. That wealth of processing power is employed by my browser to access local resources like fonts, to render the content as X Windows primitives, and to pass them through to X Windows - which uses more CPU power to get them to the graphics board. Compared with all of that going on, the processing requirements of an XML parser should be marginal. It could be implemented quite inefficiently and hardly make a dent. Which gives us valuable leeway for more important requirements.

I like it that we are starting to see XML parsers being written in all the common scripting languages. It means you can choose your own platform-above-the-platform, and XML will be available to you. If you think about it, it's just an extension to CGI - i.e. processing in the interpreted language of your choice, including the generation of HTML output on the fly. I stress: your choice of conceptual Web lubricant.

The downside is: what happens when development efforts for the various scripting languages get out of step with one another? And what happens when they get out of step with XML tech developments? There is the horrid potential for a Balkanization of the platform-independent platforms - with one crowd of developers rushing in to capitalize on XML-via-Java, while another exploits XML-via-Perl - with the same wheels (hell, with giant chains of interdependencies) being invented on both sides of the divide. Supplementing the chaos with compiled XML-via-C, or -via-i386 machine architecture, brings nothing to the table except some additional processing speed in the parsing and transformation parts of XML processing - which, on the client side, would hardly be noticed.

What about the server side, then? And what about the Internet relay in between?
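As a rough sanity check on that download-versus-CPU estimate - assuming a 56 kbit/s modem link and a 700 MHz CPU retiring about one instruction per cycle, figures chosen purely for illustration - the arithmetic comes out in the same ballpark:

// Back-of-the-envelope check: time to download 1 KB vs. instructions executed meanwhile.
// Assumed figures (illustrative only): 56 kbit/s link, 700 MHz CPU, ~1 instruction per cycle.
public class BottleneckEstimate {
    public static void main(String[] args) {
        double bitsPerKilobyte = 1024 * 8;            // 8192 bits
        double linkBitsPerSecond = 56000;             // 56 kbit/s modem
        double cpuInstructionsPerSecond = 700e6;      // 700 MHz, one instruction per cycle

        double downloadSeconds = bitsPerKilobyte / linkBitsPerSecond;           // ~0.15 s
        double instructionsMeanwhile = downloadSeconds * cpuInstructionsPerSecond;

        System.out.printf("1 KB takes %.3f s; CPU executes ~%.0f million instructions%n",
                downloadSeconds, instructionsMeanwhile / 1e6);                  // ~102 million
    }
}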
Naturally, I've thought about how that 1K of HTML or XML is The Bottleneck, and about how to pack more value into that 1K. We could compress the text, of course, before transmission, and unpack it on receipt. Or we could tokenize it - encode it into a stream of binary numbers. That would double or triple the content of the average kilobyte. Maybe it's worth doing, but my sense is that a compression stage would be so straightforward that people will be doing it without advice from me :-) And as for tokenization, that has problems - namespace and addressing problems (i.e. any two processes communicating by numbers must share equivalent lookup tables for what the numbers mean).

On the server side, is there enough XML processing going on in one place that compilation is a significant gain? Maybe - and the Oracle people must think so, if they're excited by compiled XSLT translets.

I'm thinking about online transaction processing (OLTP). Here we are in the DB and application services context, surrounded by interface formats - SQL and a thousand COM and CORBA interface schemata. To filter and join and translate among them is relatively easy - but it takes a bit of effort to set up, and probably the effort has to be reinvented system by system to some degree. And above all, the result of the effort is a translation stage which could be a bad bottleneck in a high-transaction-rate pipeline. If the translation stage can be compiled, no more bottleneck. And if it can be compiled automatically, from an XML document set which contains the source and target interfaces in XML form, then no more system-by-system reinvention of the translator. That's the rationale I'm seeing for compilation. I think that's what the translet stuff is about.

Does XPL change this context? Or is it changed in this context? There are a lot of factors here, and a huge discussion, of which this post just scratches the surface.

For a minute, put yourself in the position of a server system - whether it's raw data you're serving, or personalized interactions. Under your control is an inventory of data, the bulk of it perhaps of the same type, but generally heterogeneous. Your business is to search it, sort it, reformat and rearrange it, pack it up for transmission, unpack it on receipt, and maybe do some calculations on it. You are equipped with XPL, which we'll assume is some extension of XSLT.

By default, what you're doing most of is accessing XPL source (tags, indentations and all) and passing it to an interpreter. The interpreter parses the source, builds a tree structure (parse tree), and sets this tree to work on the data at hand. (Below, I'll split this into a parser stage and a tree-processing stage, and use "interpreter" for the latter.) Some kind of cursor runs up and down the parse tree - as directed by the XML data it's working on - and as a result, cursors run up and down trees of XML data as well, identifying elements and leaves. As a further result, the parse tree elements are activated, causing elements to be added to output trees in process of construction. In some cases, activated parse tree elements will make requests of the native system, e.g. to render the state of processing in a window. But by and large the server system is self-contained, the way that an HTML browser is.

The basic rule of economy is: never do the same job three times. That is, if you find yourself doing something for the second time, and you could recognize a third time in advance if you saw it coming - then don't just do the job and forget about it. Instead, cache the results. When the third time comes, just output the cached results. You can save lots of time that way.
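A minimal sketch of that rule applied to stylesheets, using the standard Java TrAX API (javax.xml.transform): build each stylesheet's processor-internal form (compiled or not - that is up to the XSLT engine) once, cache the resulting Templates object, and reuse it for every later transformation. Class and file names here are illustrative only.

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative cache: never parse and rebuild the same stylesheet twice.
public class StylesheetCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new HashMap<String, Templates>();

    // Returns the processor's internal form of the stylesheet,
    // building it only on the first request for a given path.
    public synchronized Templates get(String stylesheetPath) throws TransformerException {
        Templates compiled = cache.get(stylesheetPath);
        if (compiled == null) {
            compiled = factory.newTemplates(new StreamSource(new File(stylesheetPath)));
            cache.put(stylesheetPath, compiled);
        }
        return compiled;
    }

    // Each transformation gets a fresh, cheap Transformer from the cached Templates.
    public void transform(String stylesheetPath, File input, File output) throws TransformerException {
        Transformer t = get(stylesheetPath).newTransformer();
        t.transform(new StreamSource(input), new StreamResult(output));
    }
}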
In the days when CPU time was expensive, this cache-the-results technique was taken to extremes in respect of processing overhead. The entire range of jobs which an application was to perform was worked out in advance, coded in some language, and pre-translated to machine code. Compilation. Ironically, the art of caching data was an afterthought, effectively done only in major shops. The job that had been automated was the translation from high-level source to machine code. It had only to be done once, in advance, never on the fly. Interpreted languages, which repeated and repeated the parsing and machine-code-generation overhead, were regarded as something less than rocket science.

The catch was this: in compilation, information was thrown away. This was partly because memory space was also at a premium. That which actually did the job, raw machine code, contained no labels, no syntactic niceties, no structured programming constructs, and of course no comments. There was no possibility for decompilation into something legible, nor for reflexive operations on the working code.

With all this in mind, consider how that server is spending its time, given XPL.

I find it plausible to suppose that the server is I/O bound - if its XPL-based software is simplistic. I think it will be spending most of its time queued on communications, with brief periods in which it is queued on local disk access - and eyeblinks in which it is actually CPU-bound. During the I/O-bound intervals, processing will be going on, though not nearly to the capacity of the CPU. There will be CPU time to waste - and it will indeed be wasted.

On the other hand, I find it plausible that the server may spend a good deal of time CPU-bound - if it is being fed a steady transaction stream and also its XPL-based software is sophisticated, with sorting and caching and hashing employed to supply the end-use XPL processes with precisely the data which needs to be worked on.

In the latter case it makes sense to ask: is the XPL processing efficient in itself? Or is it throwing away results, and repeating operations needlessly?

Well, for one thing there will be a lot of parsing going on, by default, of both XPL code and XML data. That is sensible if the source text is usually different with each parse; but it is wasteful if the same source is being parsed repeatedly, just to build the same parse trees over and over. Most of the XPL code will be fixed - and so its parsing should be done just once, and its parse trees retained. But most of the XML data will be heterogeneous, selected from all over the place, and some of it will be volatile, i.e. its content will be changing as it is updated, written out, read back in - and re-parsed to updated parse trees. In that case, there will be benefit in making the parsing of data fast.

So let's assume that the parser will be compiled. As I've said in earlier posts, the way to get a fast parser for an EBNF language is to employ some equivalent of Yacc, to produce a recognizer automaton for the XML grammar. The form of the automaton is a lookup table - and looking up tables, and jumping from row to row, are based on a very small primitive set of operations, quite cheap to reimplement for multiple platforms.
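To make the lookup-table point concrete, here is a toy table-driven recognizer - nothing like a real XML grammar, just a three-state automaton checking that '<' and '>' alternate. The states, character classes and table are invented for illustration; a Yacc-style generator would emit the same shape of thing, only much bigger.

// Toy table-driven recognizer: the whole "machine" is one lookup table.
// States and character classes are invented for illustration only.
public class TinyRecognizer {
    private static final int OUTSIDE = 0, INSIDE = 1, ERROR = 2;   // states

    // NEXT[state][charClass] -> next state; charClass: 0 = '<', 1 = '>', 2 = anything else
    private static final int[][] NEXT = {
        { INSIDE,  ERROR,   OUTSIDE },   // OUTSIDE: '<' opens a tag, '>' is an error
        { ERROR,   OUTSIDE, INSIDE  },   // INSIDE:  '>' closes the tag, '<' is an error
        { ERROR,   ERROR,   ERROR   }    // ERROR:   sink state
    };

    private static int charClass(char c) {
        if (c == '<') return 0;
        if (c == '>') return 1;
        return 2;
    }

    // Accepts iff every '<' is matched by a following '>' with no nesting.
    public static boolean accepts(String text) {
        int state = OUTSIDE;
        for (int i = 0; i < text.length(); i++) {
            state = NEXT[state][charClass(text.charAt(i))];   // the row-to-row jump
        }
        return state == OUTSIDE;
    }

    public static void main(String[] args) {
        System.out.println(accepts("<a>hi</a>"));   // true
        System.out.println(accepts("<a <b>"));      // false
    }
}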
This leaves us pretty much with the hard core of XPL processing - traversal and reconstruction of trees, with a little number-crunching on the side. Is there enough needless reproduction of results to justify compilation?

On the negative side, we have here a process which can be considered a series of little processes, in which an XPL parse tree is traversed, with the effect that a data tree is also traversed, and an output tree produced. Likely enough, parts of the code tree will be traversed many times - there has to be some equivalent of looping, after all. But also likely, there will not be much needlessly repeated overhead merely from shifting from node to node of the code tree via links. The fact is, once we have parsed the source and created the code tree, we have more or less compiled the code already.

Good compiled code - lean, mean machine code, Real Programmers' code - is a string of primitives translated directly to the machine instruction set, and held together by the brute fact that they follow one another in memory. The minimal overhead of loading up the address of the next instruction is carried out by the CPU itself, except for loops and calls. Not an instruction is wasted.

Good semi-compiled code allows a bit more slack. It is permissible that the next instruction is not hardwired in, but discovered on the fly, by handing a token to a tiny interpreter, or indexing into a lookup table. Finite-state automata are in this class; so are threaded languages like Forth; and so is Java, with its virtual machine architecture.

In the server scenarios I've sketched, we have the slack. To imagine the server being CPU-bound, I had to imagine it being driven to the limits of its I/O by a continuous transaction stream, and its code having been heroically engineered to squeeze out unnecessary repetitions of data fetching. Within reasonable bounds, we can implement our low-level tree processing on whatever little interpreter is appropriate - say the JVM - without accusations flying around that we're wasting CPU power.

On the positive side ... Yes, yes, there's a positive side :-) ...

The ideal is that our server is spending most of its time traversing trees. That's where the work gets done. To approach the ideal, we need the XML data we're working on to be in tree form. Before even that, we need it to be in memory. (I've just lately been to Tim Bray's Annotated XML 1.0 Spec - an intricately hyperlinked document, backed by a couple thousand lines of Javascript. Tim notes that there's a problem getting the whole document into memory. He suggests the need for a "virtual tree-walking" mechanism, analogous to virtual memory. It's a little scary to consider that one document can occupy several meg of RAM.)

I think - this is vague as yet - that we get the most use of our CPU if most of our code and data are in tree form, and the tree form is succinct. I see a parsed document as a list of nodes, side by side in memory in tree-traversal order. Each node has addresses of parent, sibs and kiddies, token numbers for each attribute, and the address of a data structure which contains a property definition of its element type - including all values used for each attribute, by every element of its type within the document. I'd guess 20-40 bytes per node, average.
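A sketch of that node layout, holding parent/sibling/child links as int indices into one flat array rather than as object pointers. The field names and exact field set are guesses at what "sibs and kiddies" might become in practice, but the per-node footprint lands in roughly the range estimated above.

// One parsed document as a flat array of nodes in tree-traversal order.
// Links are int indices into the same array (-1 = none), not object pointers,
// which keeps each node compact. Field names are illustrative, not a settled design.
public class DocumentTree {
    public static final int NONE = -1;

    static final class Node {
        int parent;        // index of parent node, or NONE for the root
        int nextSibling;   // index of next sibling, or NONE
        int firstChild;    // index of first child, or NONE
        int elementType;   // index into a shared table of element-type property records
        int[] attrTokens;  // token numbers for this node's attribute values
    }

    final Node[] nodes;    // side by side, in tree-traversal (document) order

    DocumentTree(Node[] nodes) {
        this.nodes = nodes;
    }

    // Depth-first traversal without recursion: follow firstChild, then nextSibling,
    // then climb back up - cursors running up and down the tree.
    void traverse(java.util.function.IntConsumer visit) {
        int current = nodes.length == 0 ? NONE : 0;
        while (current != NONE) {
            visit.accept(current);
            Node n = nodes[current];
            if (n.firstChild != NONE) {
                current = n.firstChild;
            } else {
                int up = current;
                while (up != NONE && nodes[up].nextSibling == NONE) {
                    up = nodes[up].parent;
                }
                current = (up == NONE) ? NONE : nodes[up].nextSibling;
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node();  root.parent = NONE; root.firstChild = 1;    root.nextSibling = NONE;
        Node child = new Node(); child.parent = 0;   child.firstChild = NONE; child.nextSibling = NONE;
        new DocumentTree(new Node[] { root, child })
                .traverse(i -> System.out.println("visit node " + i));   // prints 0 then 1
    }
}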
With that, we can keep the tree structure of a good many kilonode documents in memory - and stand a fair chance of keeping one kilonode document in a hardware data cache, once we've read it from end to end.

CDATA leaves are special. They stand for the actual content, and read that content into memory when requested. They have some extra gear in them, to support hashing and sorting and stuff. XLink leaves are special too. They stand for separate documents and specific nodes in them. Physically, they contain the addresses of proxy elements, which specify whether the document in question is parsed in at present, and if so where it is, and if not, where to find it as a resource.

Put the pieces all together, and the picture emerges of our server comprising three major processes:

(1) The parser, running on a queue of document requests; compiled to EBNF automaton form, constantly converting XML text to tree form.

(2) The interpreter, running on a queue of execution requests; traversing in-memory parse trees, and building new ones; written in JVM code, or something similar.

(3) The deparser, converting new parse trees to source form, and flushing them back to disk; probably compiled, because it must maintain the free memory reserve.

That's the kind of system I think would keep a server I/O-bound, as it should be, with disk, RAM and CPU running pretty much in harmony.
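A skeleton of that three-process arrangement in plain Java threads and blocking queues; the String payloads standing in for documents and trees, and the placeholder parse/transform/flush steps, are purely illustrative.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Skeleton of the parser -> interpreter -> deparser pipeline.
// Strings stand in for XML text and for parse trees; real payloads would differ.
public class ServerPipeline {
    private final BlockingQueue<String> documentRequests = new LinkedBlockingQueue<String>();
    private final BlockingQueue<String> executionRequests = new LinkedBlockingQueue<String>();
    private final BlockingQueue<String> flushRequests = new LinkedBlockingQueue<String>();

    public void start() {
        // (1) Parser: XML text in, tree out.
        new Thread(() -> {
            try {
                while (true) {
                    String xmlText = documentRequests.take();
                    executionRequests.put("tree(" + xmlText + ")");   // placeholder "parse"
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, "parser").start();

        // (2) Interpreter: traverse trees, build new ones.
        new Thread(() -> {
            try {
                while (true) {
                    String tree = executionRequests.take();
                    flushRequests.put("result(" + tree + ")");        // placeholder "transform"
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, "interpreter").start();

        // (3) Deparser: serialize result trees back to source form.
        new Thread(() -> {
            try {
                while (true) {
                    String resultTree = flushRequests.take();
                    System.out.println("flushed: " + resultTree);     // placeholder "write to disk"
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, "deparser").start();
    }

    public void submit(String xmlText) throws InterruptedException {
        documentRequests.put(xmlText);
    }

    public static void main(String[] args) throws InterruptedException {
        ServerPipeline pipeline = new ServerPipeline();
        pipeline.start();                      // worker threads loop forever in this sketch
        pipeline.submit("<doc>hello</doc>");
    }
}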
There's more to a good XML processing system than I've described here. For instance, there's a content manager, which accesses and works through a mass of CDATA, searching and sorting - ultimately to return selected CDATA lists to the interpreter. Think of it as our internal search engine. There's need for an XML-based internal file system architecture, which can handle and cache directory searches and such. Without taking those into account, though, I think I see the outlines of an XML system which runs, byte for byte of source text, about as fast as your average C compiler.

More important than speed is correctness. But that's another story.

Tata for now
Jonathan

A client! Okay, you guys start coding, and I'll go and see what they want.

--- End forwarded message ---

From: reid_spencer <ras...@re...> - 2002-01-31 09:24:43
--- In xpl-dev@y..., cagle@o... wrote:

Just a quick observation. I think we need to qualify what is specifically meant by compilation here, and to note that similar compiled stylesheets exist on the Microsoft side in the form of IXSLProcessor entities.

-- Kurt

----- Original Message -----
From: Jonathan Burns
To: xpl@e...
Sent: Sunday, June 25, 2000 5:29 AM
Subject: Re: [XPL] Oracle and Sun debut "translets" and virtual machine for XSLT
--- End forwarded message ---
From: reid_spencer <ras...@re...> - 2002-01-31 09:25:00
--- In xpl-dev@y..., Jonathan Burns <saski@w...> wrote:

cagle@o... wrote:
> Just a quick observation. I think we need to qualify what is specifically meant by
> compilation here, and to note that similar compiled stylesheets exist on the
> Microsoft side in the form of IXSLProcessor entities.

You're right. It's the first time I've pushed the argument right through, in my own understanding. You're a writer, you understand :-)

What's implicit in my exposition is that the usual idea of compilation falls apart, into separate connotations, in the context of XML development.

Here's the definition - and I'll have to stick to it, because it's what most readers will understand by it: Compilation is the process which translates a definition of a process, expressed in a human-readable source syntax, to a series of instructions in a machine code architecture which actually carries out the process.

Do we want compilation for XPL, then? NO WAY! People go to all this trouble to define a platform-independent syntax for XML - and we propose to give it a machine-dependent semantics? We'd have to be nuts.

But we still want speed, and memory economy. So we're bound to propose something like compilation, but machine-independent. There are two dimensions along which we can modify the strict definition.

(1) We can define a virtual machine architecture, which is similar to actual machine architectures, and translate to that. Loosely speaking, we can "compile to JVM bytecode", for example. Problem solved - provided we include a JVM as part of the XPL environment.

(2) We can include under the heading of compilation, correctly, translation to a list of indirectly-expressed instructions, which is executed in traversal. Strictly speaking, this is what we do in (1). But the broadened definition includes executables such as Forth - subroutine-threaded code, expressed as a list of subroutine addresses, with embedded machine code for a small set of primitives.

(2a) And if we can do that, then why can't we traverse a tree structure of indirectly-expressed instructions, in memory, in the same form as parsed XML data trees? It's only a degree more abstract than (2).

Just where along the line the mechanism departs from the reader's understanding of compilation is a matter of the reader's background. Instead of saying "compilation", we should be saying "parsing" for translation of source (e.g. paths and templates) to logical tree structure; "realization" or perhaps "encoding" for implementation of the trees as instructions on one of the models above; and "execution" for the actual transform process.

This may all be clearer once I've researched SAX, and your XPipes.

Jonathan
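A toy illustration of option (2): a program held as a list of indirectly-expressed instructions and executed by traversing the list, dispatching on each opcode, rather than as hardwired machine code. The opcode set and the little stack machine are invented for the example.

// Toy "semi-compiled" program: a list of indirect instructions plus a tiny
// interpreter that dispatches on each opcode. Opcodes and the stack machine
// are invented purely to illustrate option (2).
public class TinyThreadedCode {
    enum Op { PUSH, ADD, MUL, PRINT }

    static final class Instr {
        final Op op;
        final int arg;            // only used by PUSH
        Instr(Op op, int arg) { this.op = op; this.arg = arg; }
    }

    static void run(Instr[] program) {
        int[] stack = new int[64];
        int top = 0;
        for (Instr instr : program) {          // "executed in traversal" of the list
            switch (instr.op) {
                case PUSH:  stack[top++] = instr.arg; break;
                case ADD:   stack[top - 2] += stack[top - 1]; top--; break;
                case MUL:   stack[top - 2] *= stack[top - 1]; top--; break;
                case PRINT: System.out.println(stack[top - 1]); break;
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4, expressed as an instruction list rather than machine code; prints 20.
        run(new Instr[] {
            new Instr(Op.PUSH, 2), new Instr(Op.PUSH, 3), new Instr(Op.ADD, 0),
            new Instr(Op.PUSH, 4), new Instr(Op.MUL, 0), new Instr(Op.PRINT, 0)
        });
    }
}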
--- End forwarded message ---

From: reid_spencer <ras...@re...> - 2002-01-31 09:25:15
--- In xpl-dev@y..., cagle@o... wrote:

Jonathan,

I was basically stumbling tired yesterday, or I would have responded more cogently on this. I personally don't think you'll see "one" language emerge under the rubric of XPL. Rather, I sense that we're talking about a methodology for creating languages within the constraints of an XML environment, and that, just as there are multiple procedural languages that have already filled their respective niches (you wouldn't program low-level system components with Visual Basic, nor would you write high-level "Business Logic" with C++), we'll see analogous low- and high-level XML-based languages, many based upon some variation of XSLT.

The way I'm handling XPipes (which I really hope to get up to the VBXML site sometime this week) is that it is essentially an uncompiled language that is then processed into XSLT. I liken it to the Java model --> the original source code is first compiled into Java bytecode, which is then interpreted by the Java virtual machine into a binary representation. XPipes works on a similar premise: XPL Precompiled code --> XSLT Raw code --> XSLT Compiled code. The difference between these is that whereas the Java VM is an event-driven message-loop environment (it is stateful), XPipes is stateless and is driven by the movement of streams. That's not to say that you couldn't create an event-driven version of this in a client environment (a stream of XML triggers a series of events within the XPipes message loop which in turn cascade into responses), but the model assumes that most of the code there is still XSLT-specific.

One of the goals that I have for XPipes and similar XPL languages is that they serve as testbeds or catalysts for the XSLT2 developments. XSLT is Turing complete, but it's also a language with some gaping holes. Its scoping model is rather skewed, you have to create multiple recursive instances of for-each loops to handle indexed-for expressions, and it really does need regular expressions as an integral part of XPath (why they took it out is beyond me; that was the first thing to make sense in XSLT for me in a long time). The binding between XSLT and XML Schema needs to be written. There needs to be a much tighter story for module deployment standards (otherwise we have this proliferation of procedural scripting languages to handle the shortfall, reducing interoperability and adding to the general headache of developers).

I would agree with you, though, on the notion that compilation in and of itself for any XSLT language is a local rather than a global thing. I view an XSLT stylesheet as a filter which takes one or more incoming XML streams and creates zero or more outgoing XML streams, though it may in the process perform some side effect that is the actual desired result (much as a function may take information to blit an image to a screen and then return an error code as a result -- the error code is very much secondary to the blitting, but the blitting is effectively just a side effect). In short, there is a fairly high degree of correspondence between an XSLT stylesheet and a compiled function.

Making the jump to the next level of abstraction -- between a stylesheet and a component -- is a more sophisticated process, but certainly doable; however, in this case you're effectively talking about the "methods" potentially spanning more than one computer, as would the component itself. In this case, the methods may be compiled, even though the component itself is most certainly not (it in fact exists not as a discrete entity but rather as a pattern of actions). I think we're going to find this to be the case with a number of elements that have traditionally been compiled -- the compilation process creates a tightly bound entity, whereas XML tends to create decoupled systems, much more loosely bound than is traditional with procedural languages. This will in turn force us to reconsider our paradigms, and look more closely at the nature of programming across distributed systems. I think the results will be much more organic and self-organizing than procedural programming, but I'm not a hundred percent sure of this.

-- Kurt Cagle
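A small sketch of the stylesheet-as-filter view using the standard Java TrAX API: two transforms composed so that the output stream of the first becomes the input stream of the second. The stylesheet and document file names are placeholders; this shows only the composition pattern, not XPipes itself.

import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Two stylesheets treated as composable filters: XML stream in, XML stream out.
// Stylesheet and document file names are placeholders.
public class FilterChain {
    public static void main(String[] args) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates stageOne = factory.newTemplates(new StreamSource(new File("stage1.xsl")));
        Templates stageTwo = factory.newTemplates(new StreamSource(new File("stage2.xsl")));

        // First filter: source document -> intermediate XML (held in memory here).
        StringWriter intermediate = new StringWriter();
        stageOne.newTransformer().transform(
                new StreamSource(new File("input.xml")), new StreamResult(intermediate));

        // Second filter: intermediate XML -> final result.
        stageTwo.newTransformer().transform(
                new StreamSource(new StringReader(intermediate.toString())),
                new StreamResult(new File("output.xml")));
    }
}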
----- Original Message -----
From: Jonathan Burns
To: xpl@e...
Sent: Sunday, June 25, 2000 4:57 PM
Subject: Re: [XPL] Oracle and Sun debut "translets" and virtual machine for XSLT
--- End forwarded message ---