From: Jeff D. <da...@ma...> - 2001-03-05 17:59:35
|
> Had anybody already thought about coding transform.php completely
> in object oriented style?

Here's a bunch of random thoughts I've been having regarding the transform code:

Jeffs_hacks-branch has an OO transform.php. Nobody really liked it completely (not even myself).

The "new" transform code in 1.3.x is not really very different from the "old" code. This is both good and bad.

I think that the transform code could be improved. Turning it into a bona fide parser, as Thomas suggests, would really help in several areas:

1. It would help guarantee correctness of generated HTML.

2. It would make extensions to the markup such as Reini's <code></code> blocks cleaner to implement. (I'll refrain from commenting on whether <code> blocks are a good idea.)

3. I've started thinking about how to colorize (or otherwise mark) the transformed text to highlight diffs. It's not easy to fit a colorizing scheme into the current transform code --- all the ways I've thought of to do this would surely not receive the Arno seal of code simplicity. If done correctly, a two-stage transformation --- parsing, then output --- would allow this to be done in a much cleaner way.

I think that any such new parser should probably operate in two steps. First it should parse the "in-line" markup elements (bold, italic, links) --- then the "block-level" elements (paragraphs, lists, tables, <code> blocks). Perhaps this can all be handled in one parse step; but the distinction between the two types of mark-up needs to be made pretty clear for the sake of correct HTML generation, among other things.

(I'm thinking perhaps that the inline markup should be handled by adding the ability to mark regions of text as having specific "flavors". Flavors would include bold, italic, link (I think), as well as things like 'deleted', 'added', 'modified-add', 'modified-del'.)

(This is all brain-storming, don't take it too seriously.)
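(Editorial illustration, not part of the original message.) The "flavors" idea can be pictured as a list of text runs, each carrying a set of flavor names, with a separate output stage that turns the runs into HTML. The sketch below is in Python rather than PHP and is only one possible reading of the brainstorm: the `Run` class, `render_runs`, and the flavor-to-tag table are all hypothetical names, and the tag choices are assumptions.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical flavor-to-tag table: the flavor names on the left come from
# the message, the tag choices on the right are assumptions.
FLAVOR_TAGS = {
    "bold": "b",
    "italic": "i",
    "added": "ins",
    "deleted": "del",
}


@dataclass(frozen=True)
class Run:
    """A stretch of text plus the set of flavors that apply to it."""
    text: str
    flavors: frozenset = frozenset()


def render_runs(runs):
    """Output stage: wrap each run in one tag per flavor, in a fixed order."""
    parts = []
    for run in runs:
        html = escape(run.text)
        # Sorting makes the nesting order deterministic.
        for flavor in sorted(run.flavors):
            tag = FLAVOR_TAGS[flavor]
            html = f"<{tag}>{html}</{tag}>"
        parts.append(html)
    return "".join(parts)


print(render_runs([
    Run("plain "),
    Run("new words", frozenset({"italic", "added"})),
]))
```

Because every run is opened and closed on its own, the tags always nest properly, and a diff flavor such as 'added' can be layered on top of 'italic' without the parsing stage knowing anything about diffs.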
----

I'm not sure that the ability to generate other mark-up (TeX or whatever) is of great importance, but turning transform into a proper parser would certainly make that easier as well.

----

To change topics slightly, a personal peeve I have with the current markup is this thing about having to put entire paragraphs (& list items, etc...) on one line. Having been raised on troff and then TeX, those really-long lines just drive me batty. Looking at the textarea in the edit page on my browser, it's impossible to tell whether there's a real \n or not between lines. I often find myself manually deleting all spaces from the ends of lines to make sure there is no \n in there.

To confuse matters more, for plain paragraphs, the requirement that the paragraph be on one line is silently waived --- currently, it is still enforced for list items (& tables).

I think it would be a good idea to make it so that all lines which do not begin with some sort of block type mark-up (e.g. a '*', '#', ';', '|', or space) are interpreted as continuations of the preceding line. (The only reason I see for not making this change is that it will break existing pages.)

I.e.

A sentence.
Another.
* Item.
More item.

should be interpreted as

<p>A sentence. Another.</p>
<ul>
<li>Item. More item.
</ul>

instead of

<p>A sentence. Another.</p>
<ul><li>Item.</ul>
<p>More item.</p>

(I don't see any reason why italicization and boldizization shouldn't be able to span these continuation lines as well.) E.g.:

;:''Here are some followup comments.
I really don't like this at all. --Jeff''

should work.

-------

Comments?

Jeff
|
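(Editorial illustration, not part of the original message.) The continuation rule proposed above can be sketched as a small pre-pass that folds physical lines into logical ones before any block markup is interpreted. This is a Python sketch, not PhpWiki's transform code. The function name `join_continuation_lines` is hypothetical, and treating a blank line as a separator that is never folded is an assumption the message does not spell out.

```python
# Characters that, per the message, start a new block when they begin a line.
BLOCK_MARKERS = ("*", "#", ";", "|", " ")


def join_continuation_lines(text):
    """Fold lines that do not start with block markup into the preceding line."""
    logical = []
    for raw in text.split("\n"):
        if raw.strip() == "":
            # Assumption: a blank line separates blocks and is kept as-is.
            logical.append("")
        elif logical and logical[-1] != "" and not raw.startswith(BLOCK_MARKERS):
            # Continuation: append to the previous logical line.
            logical[-1] = logical[-1].rstrip() + " " + raw
        else:
            logical.append(raw)
    return logical


for line in join_continuation_lines("A sentence.\nAnother.\n* Item.\nMore item."):
    print(line)
```

On the example from the message this yields two logical lines, "A sentence. Another." and "* Item. More item.", which is the grouping the desired HTML output implies.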
From: Jeff D. <da...@da...> - 2001-03-05 19:33:03
|
> I doubt that you will be able to ensure HTML4 compliance.

I'm not sure I understand this comment. Why not? Surely it's not impossible. Whether it's worth the effort or not is (as always) open to debate.

> As far as I have tested out, "correct" markup will produce correct HTML.
> Why deal with stuff like double-indented lists without a single list first?
> Wrong markup produces wrong HTML - simple enough.

';:' is often used to introduce a "block-indented paragraph", not a "list item." There's no reason ';;:' shouldn't produce a block paragraph with more indentation --- and no reason it should have to be nested within another list. (Note that even though it produced "invalid" HTML, this construct worked, at least with my browser.) Why not fix things (especially since, in this case, the fix was easy) so that this generates correct HTML?

And I see your point, that if given garbage, one shouldn't worry if one outputs garbage. On the other hand, if it's not too hard, why not output valid garbage? (Perhaps one could even try to flag the offending garbage to help the user guess what he did wrong.)

>> (Paragraphs all on one line...)
> I think this is a frequent complaint. Maybe we should deal with this.
> I'm not sure if it's a good idea, but maybe (if modifications are small)
> we could add this as option?

Yes, I think it's simple to implement --- a minor modification to the current transform code. (Except maybe for tables -- I have to think about that a bit more.)

Just as an aside, I think proliferation of options (particularly one like this which doesn't really change the functionality of PhpWiki at all) should be avoided. Having numerous options makes testing harder and also complicates wiki administration and plug-and-playability.

Another idea (though I think it's more trouble than it's worth) is to auto-convert the old pages when restoring (from a zip- or serialized- dump) to a new version of PhpWiki. Currently, the old-style tab-delimited lists and triple-quote bold are unsupported in the 1.3.x branch. This could be a way to deal with that as well.

>> (I don't see any reason why italicization and boldizization shouldn't be
>> able to span these continuation lines as well.)
>
> Because, "errors" are then contained to one line? The thing I really like about
> wiki markup is, that no matter what I (or others!!) do on line 3, line 4 will
> always be shown as I have meant it to be displayed. Furthermore, currently if
> there's only one '' in the line, italics won't even be opened. It needs an open
> and close tag to be recognized by the regexp.

Yes, but it's a common wiki-idiom to italicize an entire paragraph. Currently, the paragraph is supposed to be on one line (though it may span many lines in the textarea display). I'm just suggesting that if we relax the restriction that the paragraph must be on one line, we should similarly relax the restriction for italics. (The italics markup must still be contained within the paragraph, but since the paragraph can be split across lines, so can the italicization.)

Just to reiterate a point from my last message, since I think I've found a better way to say it: I think the current transform code generally marks up the inline and block-level elements in the wrong order. The block-level markup should be processed first, then the in-line markup should be processed on a block-by-block basis.

Jeff
|
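(Editorial illustration, not part of the original message.) The ordering argued for above, block-level markup first and inline markup afterwards within each block, might look roughly like the Python sketch below. It is not PhpWiki's transform.php: `render_blocks` and `render_inline` are hypothetical names, only '*' list items, plain paragraphs, and ''italics'' are handled, and the input is assumed to be a list of already-joined logical lines.

```python
import re
from html import escape


def render_inline(text):
    """Inline pass: escape HTML, then turn paired '' markers into <i>...</i>."""
    # quote=False leaves the ' characters alone so the '' markers survive.
    return re.sub(r"''(.+?)''", r"<i>\1</i>", escape(text, quote=False))


def render_blocks(logical_lines):
    """Block pass first; the inline pass then runs on one block at a time."""
    out = []
    in_list = False
    for line in logical_lines:
        if line.startswith("*"):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>" + render_inline(line[1:].strip()) + "</li>")
        else:
            if in_list:
                out.append("</ul>")
                in_list = False
            if line.strip():
                out.append("<p>" + render_inline(line.strip()) + "</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


print(render_blocks(["''A sentence.'' Another.", "* Item. ''More'' item."]))
```

Since the inline pass only ever sees the text of a single block, an unmatched '' stays confined to its own paragraph or list item. That keeps the containment property Arno values while still letting italics span the continuation lines of one paragraph.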
From: <ho...@sb...> - 2001-03-05 18:49:44
|
Jeff, my net connection is down and maybe will remain so for another week (aaarrggh). I cannot send emails from my regular address, so I can't post to phpwiki-talk. Could you please forward this email?

[turning transform.php into a bona fide parser]

> 1. It would help guarantee correctness of generated HTML.

I doubt that you will be able to ensure HTML4 compliance. As far as I have tested out, "correct" markup will produce correct HTML. Why deal with stuff like double-indented lists without a single list first? Wrong markup produces wrong HTML - simple enough.

> 2. It would make extensions to the markup such as Reini's
> <code></code> blocks cleaner to implement.

Block stuff could easily be fitted into the current transform.php with some minor modifications. I have already thought about that. (About block markup itself: I guess you know where I stand ;o)

> 3. I've started thinking about how to colorize (or otherwise mark)
> the transformed text to highlight diffs. It's not easy to fit a
> colorizing scheme into the current transform code

Hm, if you think of the one diff-mode where you have the two versions side by side, it should be fairly easy to implement. Using some variation of the current diff-mode might be hard to implement, I agree.

> all the ways I've thought of to do this would surely not
> receive the Arno seal of code simplicity.

:o) Well, I'd also argue that the diff stuff is not fundamental to wiki, and that the current diff is good enough for 95% of all cases. But I'm also interested to learn about the different ideas you had!

> To change topics slightly, a personal peeve I have with the current
> markup, is this thing about having to put entire paragraphs (& list items,
> etc...) on one line.

I think this is a frequent complaint. Maybe we should deal with this. Your simple solution (everything keeps current mode until a newline or another mode-markup is encountered) seems straightforward and easy to implement. I'm not sure if it's a good idea, but maybe (if modifications are small) we could add this as option?

> (I don't see any reason why italicization and boldizization shouldn't be
> able to span these continuation lines as well.)

Because, "errors" are then contained to one line? The thing I really like about wiki markup is, that no matter what I (or others!!) do on line 3, line 4 will always be shown as I have meant it to be displayed. Furthermore, currently if there's only one '' in the line, italics won't even be opened. It needs an open and close tag to be recognized by the regexp.

/Arno
|