From: Ian B. <ia...@co...> - 2001-04-10 07:35:28
Attachments:
wvHtml.xml
Gee whiz, all this back and forth. The number of lines of code written on this subject compared to the number of lines that will actually be used...

Anyway, I was thinking about the idea I proposed earlier, with structured text, so I implemented something based on it. I haven't documented it, but I'd be interested in what people think of it in theory.

There are a couple of levels. The parsing of the document is done without any particular concept of how it will later be used. The parsing is vaguely general -- a number of trivial syntaxes could be used (like {}, @[], <% %>, <tmpl:something></tmpl:something>, <!--#BeginEditable "something"--><!--#EndEditable-->).

Which reminds me -- does anyone know what WYSIWYG tools use for templates natively? I know what DreamWeaver does. I can't figure out if FrontPage has templates at all (!?). Adobe GoLive? Others? Imitating these templates should be easier and cleaner than complicated template languages like ZPT (ZPL?) or such.

Anyway, these are all fairly shallow generalisms. The real concept: whatever the syntax, they all map to a structured form of the text. The structure is made up of nodes, each of which has a name (basically a free-form string). Nodes can contain further nodes and strings. There is a distinction between nodes which have empty contents and those which have no contents at all. So, for instance:

    Test doc
    {name1:}{name 2}text{name3:}{end name3}{end name1}
    doc finished

will map to:

    head:
      constant: 'Test doc\012'
      node 'name1':
        node 'name 2': None
        constant: 'text'
        node 'name3': []    # no contents, but not "None" contents
      constant: '\012doc finished'

This is the base class for a simple template system (just substitution) and for a slightly more complicated template system (that has if and for). But this basic data structure is the most important part. It's almost like XML, except for the distinction between empty containers ({name3:}{end}) and non-containers ({name 2}).
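The structure described above can be sketched roughly as follows. This is a minimal sketch, not Ian's implementation: it assumes only the `{...}` delimiter syntax, with `{name:}` opening a container and `{end name}` (or a bare `{end}`) closing it, as in the example. `Node`, `parse`, and the error handling are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical concrete syntax: {name:} opens a container, {end name} (or a
# bare {end}) closes it, and any other {name} is a non-container.
TAG = re.compile(r"\{([^{}]*)\}")


@dataclass
class Node:
    name: str
    contents: Optional[list] = None  # None = no contents at all; [] = empty


def parse(text):
    """Parse `text` into a list of constant strings and Node objects."""
    root = []
    stack = [("", root)]  # (name of open container, list receiving children)
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            stack[-1][1].append(text[pos:m.start()])  # constant text
        pos = m.end()
        tag = m.group(1)
        if tag == "end" or tag.startswith("end "):
            name = tag[4:]  # "" for a bare {end}: close whatever is open
            if len(stack) == 1 or (name and name != stack[-1][0]):
                raise ValueError("unexpected {%s}" % tag)
            stack.pop()
        elif tag.endswith(":"):
            node = Node(tag[:-1], [])  # container: empty list until filled
            stack[-1][1].append(node)
            stack.append((node.name, node.contents))
        else:
            stack[-1][1].append(Node(tag))  # non-container: contents is None
    if pos < len(text):
        stack[-1][1].append(text[pos:])
    if len(stack) > 1:
        raise ValueError("unclosed container: " + stack[-1][0])
    return root
```

Parsing the example text `"Test doc\n{name1:}{name 2}text{name3:}{end name3}{end name1}\ndoc finished"` gives `["Test doc\n", Node("name1", [Node("name 2", None), "text", Node("name3", [])]), "\ndoc finished"]`, keeping the `None` versus `[]` distinction between non-containers and empty containers.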
Unlike XML, though, it's entirely orthogonal to HTML, XHTML, or even XML. I think this is advantageous -- you don't need well-formed XML, you don't need to quote the constant output, and it works much better in WYSIWYG editors.

Just for the curious, a *real* XML-based HTML generation file is attached, taken from wvWare. You'll notice how hard it is to read. I don't know how sold anyone was on the beauty of XML, but... I'm less than impressed. Or at least, if you want something simple and general, as opposed to complicated and specific (like XSL), it ends up looking pretty ugly.

The template systems use this structured text like so (but this is only my immediate convention -- this part of the implementation is isolated from the earlier parts): if a tag is not a container, it is meant to be substituted. Container tags *define* values, while non-containers *ask for* values.

To have values substituted, you provide an evaluator object. The template uses one method on it, .eval(name), which returns an object to be inserted. The name can be just about any string that shows up in the {} delimiters (or whatever other delimiters you might use). This could be something for use with NamedValueAccess, or it could even be evaluated as Python code -- whatever you want. It's all up to the evaluator implementation.

Unlike subclassing the template to handle evaluation, this makes it easy to compose different templates with different evaluators, which in turn makes it easy to feed in information. In particular, I've made an evaluator that uses NamedValueAccess, and another evaluator that uses the defined values in another template (i.e., one document is the framework, and the other is the content). You could easily do something that composed a few of these -- like searching the field namespace, then the cookie namespace, then a default namespace, etc. You could do acquisition and do DTML (or at least an XML version of DTML). It can't do everything by any means, but it's decently general.
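The evaluator protocol and the composition described above might look something like this. It is a sketch under stated assumptions: `Node`, `DictEvaluator` (a stand-in for a NamedValueAccess-based evaluator), `ChainEvaluator`, `TemplateEvaluator`, and `render` are all hypothetical names, a `KeyError` is assumed to mean "not found here", and a container in the framework template is assumed to simply render its own contents.

```python
class Node:
    """Minimal stand-in for a parsed node (hypothetical, not Ian's class)."""

    def __init__(self, name, contents=None):
        self.name = name
        self.contents = contents  # None: asks for a value; list: defines one


class DictEvaluator:
    """Answers names from a dict (a stand-in for NamedValueAccess lookups)."""

    def __init__(self, values):
        self.values = values

    def eval(self, name):
        return self.values[name]  # KeyError signals "not found here"


class ChainEvaluator:
    """Tries several evaluators in order, e.g. fields, cookies, then defaults."""

    def __init__(self, *evaluators):
        self.evaluators = evaluators

    def eval(self, name):
        for evaluator in self.evaluators:
            try:
                return evaluator.eval(name)
            except KeyError:
                pass
        raise KeyError(name)


def render(items, evaluator):
    """Substitute non-container nodes; containers render their own contents."""
    out = []
    for item in items:
        if isinstance(item, str):
            out.append(item)
        elif item.contents is None:
            out.append(str(evaluator.eval(item.name)))
        else:
            out.append(render(item.contents, evaluator))
    return "".join(out)


class TemplateEvaluator:
    """Answers names from the top-level containers defined in a content template."""

    def __init__(self, content_items, inner):
        self.definitions = {
            item.name: item.contents
            for item in content_items
            if isinstance(item, Node) and item.contents is not None
        }
        self.inner = inner  # evaluates the leaves inside each definition

    def eval(self, name):
        return render(self.definitions[name], self.inner)


if __name__ == "__main__":
    framework = ["<title>", Node("title"), "</title><p>", Node("body"), "</p>"]
    content = [Node("title", ["Hello, ", Node("user")]),
               Node("body", ["Welcome back."])]
    fields = DictEvaluator({"user": "Ian"})
    defaults = DictEvaluator({"user": "guest"})
    evaluator = TemplateEvaluator(content, ChainEvaluator(fields, defaults))
    print(render(framework, evaluator))  # <title>Hello, Ian</title><p>Welcome back.</p>
```

The template itself never knows where values come from; swapping `fields` for an empty `DictEvaluator({})` makes the same templates fall through to the defaults and render `Hello, guest`.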
It's general enough that you can do things you probably shouldn't do. The template can also parse the name somewhat: the template system that does for and if checks for these particular constructs and does special things based on them.

Anyway, this is a slightly different take on templates, I think. You can't optimize it like you can optimize straight substitution, but that's always a cost of more generality.

Opinions? Problems this doesn't solve (but you think templates should solve)?

Gee, what a long email I've written. Thanks for reading to the bottom.

Ian
From: Chuck E. <ec...@mi...> - 2001-04-27 16:44:20
At 02:35 AM 4/10/2001 -0500, Ian Bicking wrote:
>Opinions? Problems this doesn't solve (but you think templates should
>solve)?
>
>Gee, what a long email I've written. Thanks for reading to the
>bottom.

Hey, I just read to the bottom today. :-)

Looks like an interesting idea. Have you looked at Webware.WebUtils.HTMLTag? It just so happens that it uses the same structure you described. I wrote it for the purpose of parsing HTML into a structured format that I could then inspect. This comes in handy in automated regression testing.

Looks like we have some implementation overlap.

If you read the doc strings for HTMLTag, you'll see that it has issues with "nebulous" tags like <p>, which are sometimes closed (</p>) and sometimes not, although I do have a proposed solution mentioned in there. Can you take a look at this and tell me if you had the same problem and how you dealt with it?

-Chuck
From: - 2001-04-27 18:06:46
On Fri, Apr 27, 2001 at 12:41:36PM -0400, Chuck Esterbrook wrote:
> If you read the doc strings for HTMLTag, you'll see that it has issues with

Where is HTMLTag? I'm not finding it in rc3, under WebUtils or anywhere.

--
-Mike (Iron) Orr, ir...@ms... (if mail problems: ms...@ji...)
 http://mso.oz.net/  English * Esperanto * Russkiy * Deutsch * Espan~ol
From: Chuck E. <ec...@mi...> - 2001-04-27 18:45:38
At 11:04 AM 4/27/2001 -0700, Mike Orr wrote:
>On Fri, Apr 27, 2001 at 12:41:36PM -0400, Chuck Esterbrook wrote:
> > If you read the doc strings for HTMLTag, you'll see that it has issues
> with
>
>Where is HTMLTag? I'm not finding it in rc3, under WebUtils or anywhere.

I see it in CVS:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/Webware/Webware/WebUtils/?sortby=date

-Chuck
From: Ian B. <ia...@co...> - 2001-04-27 18:20:04
Chuck Esterbrook <ec...@mi...> wrote:
> Looks like an interesting idea. Have you looked at
> Webware.WebUtils.HTMLTag? It just so happens that it uses the same
> structure you described. I wrote it for the purpose of parsing HTML into a
> structured format that I could then inspect. This comes in handy in
> automated regression testing.

Why, it is indeed nearly the same thing. Too bad I didn't find it earlier. Slightly (but just barely) different interface, and I don't have the searching methods (but could probably copy your methods over with few changes).

> Looks like we have some implementation overlap.
>
> If you read the doc strings for HTMLTag, you'll see that it has issues with
> "nebulous" tags like <p> which sometimes are closed (</p>) and sometimes
> not. Although I do have a proposed solution mentioned in there.

I've done it so that certain tags can't nest inside each other. So if <li> ... <li> is encountered, the parser sees that the enclosing tag would be <li>, ends that tag, and starts a new <li>. Ditto with <p>, <th>, and <dt> -- maybe there are others. Then all tags end implicitly if their enclosing tag ends, so that <ol> <li> </ol> will close <li>.

In an earlier version with a more monolithic parser, I had <li> look back for previous unclosed <li> elements, until it found <ol> or <ul>. This way <li> would implicitly end any dangling tags... in particular:

  <ol>
    <li> Some text. <p>
      Another paragraph
    <li> Some more text.
  </ol>

Would become:

  <ol>
    <li> Some text. <p>
      Another paragraph
    </p></li><li> Some more text.
  </li></ol>

But with my present method this would become:

  <ol>
    <li> Some text. <p>
    <li> Some more text.
  </li></p></li></ol>

Which isn't as correct. That's a bummer. But it would make the code much messier right now to do this -- maybe I should anyway, though.

Ian
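Ian's two strategies can be compared on his own example with a small open-element stack. This is a hypothetical sketch, not his parser: `close_implicit` and `NO_SELF_NEST` are illustrative names, the regex tokenizer only handles simple tags, and dropping stray end tags is an assumption.

```python
import re

NO_SELF_NEST = {"li", "p", "th", "dt"}  # Ian's list; "maybe there are others"


def close_implicit(html, look_back=False):
    """Rewrite `html` with implied end tags made explicit.

    look_back=False: a tag only closes an identical *immediately* enclosing tag.
    look_back=True:  <li> searches back for an open <li>, stopping at <ol>/<ul>.
    """
    out, stack = [], []
    for piece in re.split(r"(<[^<>]+>)", html):
        if not piece:
            continue
        m = re.fullmatch(r"<(/?)(\w+)[^>]*>", piece)
        if not m:
            out.append(piece)  # text (or a comment) passes through
            continue
        closing, tag = m.group(1) == "/", m.group(2).lower()
        if closing:
            if tag in stack:
                # Ending a tag implicitly ends everything opened inside it.
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == tag:
                        break
            continue  # a stray end tag is silently dropped (assumption)
        if look_back and tag == "li":
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] in ("ol", "ul"):
                    break  # reached the list itself: nothing to close
                if stack[i] == "li":
                    while len(stack) > i:
                        out.append(f"</{stack.pop()}>")
                    break
        elif stack and stack[-1] == tag and tag in NO_SELF_NEST:
            out.append(f"</{stack.pop()}>")
        stack.append(tag)
        out.append(piece)
    while stack:  # close anything still open at end of input
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

On `<ol><li>Some text. <p>Another paragraph <li>Some more text. </ol>`, the default mode nests the second `<li>` inside the `<p>` and ends with `</li></p></li></ol>`, while `look_back=True` emits `</p></li>` before the second `<li>`, which is the output Ian preferred.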
From: Chuck E. <ec...@mi...> - 2001-04-27 18:38:03
At 01:20 PM 4/27/2001 -0500, Ian Bicking wrote:
>Why, it is indeed nearly the same thing. Too bad I didn't find it
>earlier. Slightly (but just barely) different interface, and I don't
>have the searching methods (but could probably copy your methods over
>with few changes).

The searching methods still need a lot of enhancement, so the code will change in the future. Let's avoid the "copy" reusability technique. How about one or more of inheritance, mix-ins, wrappers, delegates, and callbacks?

HTMLTag already has a focus on parsing, plus a test suite and doc strings. Could you subclass it?

-Chuck
From: Mike O. <ir...@ms...> - 2001-04-27 19:05:38
On Fri, Apr 27, 2001 at 02:35:19PM -0400, Chuck Esterbrook wrote:
> The searching methods

Speaking of searching, what are people using as a search engine for their site? Are there any plans for a SearchKit? :)

--
-Mike (Iron) Orr, ir...@ms... (if mail problems: ms...@ji...)
 http://mso.oz.net/  English * Esperanto * Russkiy * Deutsch * Espan~ol
From: Chuck E. <ec...@mi...> - 2001-04-27 19:14:35
At 12:05 PM 4/27/2001 -0700, Mike Orr wrote:
>On Fri, Apr 27, 2001 at 02:35:19PM -0400, Chuck Esterbrook wrote:
> > The searching methods
>
>Speaking of searching, what are people using as a search engine for
>their site? Are there any plans for a SearchKit? :)

I use http://www.atomz.com/. It's not perfect, but it has plenty of features and is easy to use. Obviously, depending on your site, it may or may not fit. In the cases where it has fit, I really liked it.

I'm sure SearchKit will show up some day. htDig might be a place to start for ideas and examples. I think Zope would probably have some interesting search ideas, possibly both good and bad.

-Chuck
From: Chuck E. <ec...@mi...> - 2001-04-27 18:30:11
At 01:20 PM 4/27/2001 -0500, Ian Bicking wrote:
>I've done it so that certain tags can't nest inside each other. So if
><li> ... <li> is encountered, the parser sees that the enclosing tag
>would be <li>, and then ends that tag and starts a new <li> over.
>Ditto with <p>, <th>, and <dt> -- maybe there's others. Then all tags
>end implicitly if their enclosing tag ends, so that <ol> <li> </ol>
>will close <li>.
>
>In an earlier version with a more monolithic parser, I had <li> look
>back for previous unclosed <li> elements, until it found <ol> or
><ul>. This way <li> would implicitly end any dangling tags... in
>particular:
>
><ol>
>  <li> Some text. <p>
>    Another paragraph
>  <li> Some more text.
></ol>

What about the idea I describe in HTMLTag? For various tags like <p>, you could specify "abrupt terminators" such as p, li, table, etc. This wouldn't require structural changes to HTMLTag.

It would be tedious to create the specification of abrupt terminators, but once created, it should take care of most cases.

HTMLTag would still barf on unbalanced tags, but I think that's OK.

-Chuck
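The "abrupt terminators" idea amounts to a lookup table consulted whenever a start tag arrives. The sketch below is hypothetical: `ABRUPT_TERMINATORS` and `open_tag` are illustrative names, not HTMLTag's API, and the table is deliberately incomplete (filling it in is the tedious part Chuck mentions).

```python
# Hypothetical, deliberately incomplete table: open tag -> start tags that end it.
ABRUPT_TERMINATORS = {
    "p": {"p", "li", "ul", "ol", "table", "blockquote"},
    "li": {"li"},
    "dt": {"dt", "dd"},
    "dd": {"dt", "dd"},
}


def open_tag(stack, tag):
    """Push `tag` onto the open-element stack; return the tags it implicitly closes.

    Elements are closed from the innermost outward for as long as the innermost
    open element lists `tag` among its abrupt terminators.
    """
    closed = []
    while stack and tag in ABRUPT_TERMINATORS.get(stack[-1], ()):
        closed.append(stack.pop())
    stack.append(tag)
    return closed
```

In Ian's example, opening the second `<li>` with `["ol", "li", "p"]` open closes `p` and then `li`, which matches the output he wanted from his earlier look-back approach, without any special-casing of `<li>`.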
From: Ian B. <ia...@co...> - 2001-04-27 18:44:14
Chuck Esterbrook <ec...@mi...> wrote:
> What about the idea I describe in HTMLTag? For various tags like <p> you
> could specify "abrupt terminators" such as p, li, table, etc. This wouldn't
> require structural changes to HTMLTag.
>
> It would be tedious to create the specification of abrupt terminators, but
> once created, that should take care of most cases.

Well, I guess this is what the HTML DTD really implies. There are certain things at the paragraph level, and I don't think they can be nested -- <blockquote> implies a paragraph break, for instance. Or, rather, they can be nested, but it's redundant -- after you took out the redundancies, you might get what you are looking for. This is similar to what I was thinking of with looking up the parse tree and closing everything you find until you find a real enclosing element. I don't know enough about SGML to really understand how this works and how HTML is defined.

> HTMLTag would still barf on unbalanced tags, but I think that's OK.

I suppose so -- I'd just as soon make it permissive, but there are flaws with that too.

Ian
From: Chuck E. <ec...@mi...> - 2001-04-27 18:56:59
At 01:44 PM 4/27/2001 -0500, Ian Bicking wrote:
>I don't know enough about SGML to really understand how this works and
>how HTML is defined.

I feel I have enough intuition with HTML to know this technique will work. I even suspect it is used by browsers. But there's one way to find out for sure. :-)

> > HTMLTag would still barf on unbalanced tags, but I think that's OK.
>
>I suppose so -- I'd just as soon make it permissive, but there's flaws
>with that too.

In both cases, abrupt termination and unbalanced tags, I suggest we make the behavior configurable in HTMLTag.

-Chuck
From: Ian B. <ia...@co...> - 2001-04-27 19:25:34
Chuck Esterbrook <ec...@mi...> wrote:
> At 01:44 PM 4/27/2001 -0500, Ian Bicking wrote:
> >I don't know enough about SGML to really understand how this works and
> >how HTML is defined.
>
> I feel I have enough intuition with HTML to know this technique will work.
> I even suspect it used by browsers.

I think browsers use all sorts of heuristics, probably in part for speed. I certainly wouldn't want to copy Netscape's (4 and below) heuristics for interpreting HTML, for instance.

Look at this page... it presents a slightly more concrete notion of "content models" and "parent models" (which are also noted explicitly for each tag in other pages):
http://www.blooberry.com/indexdot/html/tagpages/shorthands.htm

I still feel this is approaching what the DTD defines formally and completely. But I just looked at the DTD <http://www.w3.org/TR/html4/sgml/dtd.html>, and it wasn't very clear.

One way of doing abrupt termination is simply that all block-level tags (defined formally) end all inline tags. But there's more to it than that.

Ian
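The simple "block-level tags end all inline tags" rule can be sketched as below. The tag sets are a partial, hypothetical subset of what the HTML 4.01 DTD groups as block-level and inline elements, and `open_tag_block_rule` is an illustrative name, not part of HTMLTag.

```python
# Partial, hypothetical subsets of HTML 4.01's block-level and inline elements.
BLOCK = {"p", "div", "ul", "ol", "dl", "pre", "blockquote", "table",
         "form", "hr", "address", "h1", "h2", "h3", "h4", "h5", "h6"}
INLINE = {"a", "b", "i", "tt", "em", "strong", "code", "span", "cite",
          "q", "sub", "sup", "small", "big"}


def open_tag_block_rule(stack, tag):
    """Push `tag`; a block-level tag first closes any open inline elements."""
    closed = []
    if tag in BLOCK:
        while stack and stack[-1] in INLINE:
            closed.append(stack.pop())
    stack.append(tag)
    return closed
```

With `["body", "p", "b", "i"]` open, a `<blockquote>` closes `i` and `b` but leaves the `<p>` open, which illustrates Ian's caveat that "there's more to it than that": paragraph-level termination still needs something like Chuck's per-tag terminators.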