From: Steve W. <sw...@wc...> - 2000-07-18 22:05:48
On Tue, 18 Jul 2000, Jeff Dairiki wrote:

> >You can do without them in most places, but there are some where you
> >absolutely need them.
>
> Not that I doubt you, but, out of curiosity: where?

Oh, bugger... where was that? Arno's right though, there are places where
preg_* are the only solution.

> The one drawback I see offhand is that it's possible for (invalid?) wiki
> markup to generate invalid HTML.
>
> Eg.: "''__'' ''__''" becomes "<i><b></i> <i></b></i>".
>
> Perhaps we can live with that?

At some point you have to decide the user is sane and has some
intelligence... we can concoct pathological situations all day and develop
workarounds, but I don't think that would make for a fun project. :-)

> Yes, you could tokenize the <br> and <hr> or not --- since the tokenizing
> mechanism is already in place (and must remain so for the links, at
> least) it really makes no difference in readability or complexity, and
> negligible difference in run time.

Probably true...

> My thinking was that by tokenizing anything containing HTML markup,
> the HTML is protected from being mangled by subsequent transforms.
> As long as each transform individually produces complete (and correct)
> HTML entities, the proper nesting of the final HTML output is guaranteed.
>
> This helps to minimize the sensitivity to the ordering of the
> transforms. I view this as somewhat important since it will make the
> writing of (well-behaved) transforms in (as yet unimagined) future
> extension modules simpler.

I agree; in a way this is a variation on the argument for storing all
links in a separate table and storing the pages in a semi-rendered state.
What will the long-term benefits be? In this case you can eliminate
line-by-line processing entirely, but that would also require changes to
the markup language (for plain text, you'd have to have some substitute
for the <pre> tag instead of indenting with spaces like we do now; lists
would be a nightmare; and we'd reinvent HTML, something I've repeatedly
told users I have no intention of doing.)

(Implementing XHTML might be worthwhile, though. Mind you, I'm not
suggesting this for 1.2 or even 1.4 (2.0?) but just speculating.)

> I suppose we could eliminate the recursable logic, while keeping the
> tokenization, by applying each of the currently recursed transformations
> twice.
>
> 1. Transform "''"s
> 2. Transform "'''"s
> 3. Transform "__"s
> 4. Transform "''"s again
> 5. Transform "'''"s again
>
> This, I think, handles everything that your method does (while
> eliminating the possibility of invalid HTML output.)

Not having read the code yet, I'm not sure what the fuss is about... I did
solve the whole issue of order-of-transformations in wiki_transform.php3
ages ago.
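To make the tokenizing idea concrete, here is a rough sketch of the
mechanism in present-day PHP (this is not the actual wiki_transform.php3
code; the tokenize/detokenize names and the chr(1) delimiter are invented
for illustration):

    <?php
    // A transform that emits finished HTML swaps it for an opaque token;
    // later transforms then cannot mangle it.  After the last transform,
    // the tokens are swapped back.  (Sketch only; names are invented.)
    $tokens = array();

    function tokenize($html) {
        global $tokens;
        $tokens[] = $html;
        // chr(1) should never appear in user input, so it marks tokens.
        return "\001" . (count($tokens) - 1) . "\001";
    }

    function detokenize($line) {
        global $tokens;
        return preg_replace_callback('/\001(\d+)\001/',
            function ($m) use ($tokens) { return $tokens[(int)$m[1]]; },
            $line);
    }

    // Example: the link transform runs first and tokenizes its output,
    // so the italics transform below cannot touch the <a> tag.
    $line = "see http://wcsb.org/ for ''details''";
    $line = preg_replace_callback('#\bhttp://\S+#',
        function ($m) { return tokenize("<a href=\"$m[0]\">$m[0]</a>"); },
        $line);
    $line = preg_replace("/''(.+?)''/", '<i>$1</i>', $line);
    print detokenize($line) . "\n";
    // see <a href="http://wcsb.org/">http://wcsb.org/</a> for <i>details</i>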
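And a sketch of Jeff's five steps, again with an invented function name.
One deviation: the "'''" rule runs before the "''" rule here, since
otherwise the two-quote pattern would eat the opening of the three-quote
delimiter:

    <?php
    // Non-recursive emphasis: apply each rule once, then run the two
    // quote rules a second time (Jeff's steps 4 and 5) to pick up any
    // matches the first pass left behind.  (Sketch, not PhpWiki code.)
    function emphasis($line) {
        $rules = array(
            "/'''(.+?)'''/" => '<b>$1</b>',   // bold; must precede ''
            "/''(.+?)''/"   => '<i>$1</i>',   // italics
            '/__(.+?)__/'   => '<b>$1</b>',   // also bold
        );
        foreach ($rules as $pat => $rep)      // pass 1
            $line = preg_replace($pat, $rep, $line);
        // pass 2
        $line = preg_replace("/'''(.+?)'''/", '<b>$1</b>', $line);
        $line = preg_replace("/''(.+?)''/", '<i>$1</i>', $line);
        return $line;
    }

    print emphasis("''one'' '''two''' __three__") . "\n";
    // <i>one</i> <b>two</b> <b>three</b>

Fed the pathological input above, this still prints
"<i><b></i> <i></b></i>" -- the trade-off we already agreed to live with.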
Also, being performance-minded is a good thing, but don't let it corner
you into writing 10x the amount of code, or seriously complex code, just
to gain small benefits. Wikis do not scale. Wikis cannot scale. They can
grow a lot wider, but there is a low limit on how many people can edit a
given topic before lost updates create confusion and frustration. Do not
write bubble sorts; do not write loops that call external programs; but
don't be afraid to use Perl regular expressions or make deep copies of
objects, because we have the room to do it.

sw

...............................ooo0000ooo.................................
Hear FM quality freeform radio through the Internet: http://wcsb.org/
home page: www.wcsb.org/~swain