From: Steve W. <sw...@wc...> - 2000-07-18 22:05:48
On Tue, 18 Jul 2000, Jeff Dairiki wrote:

> >You can do without them in most places, but there are some where you
> >absolutely need them.
>
> Not that I doubt you, but, out of curiosity: where?

Oh, bugger... where was that? Arno's right though, there are places where
preg_* are the only solution.

> The one drawback I see offhand is that it's possible for (invalid?) wiki
> markup to generate invalid HTML.
>
> Eg.: "''__'' ''__''" becomes "<i><b></i> <i></b></i>".
>
> Perhaps we can live with that?

At some point you have to decide the user is sane and has some
intelligence... we can concoct pathological situations all day and develop
workarounds, but I don't think that would make for a fun project. :-)

> Yes, you could tokenize the <br> and <hr> or not --- since the tokenizing
> mechanism is already in place (and must remain so for the links, at
> least) it really makes no difference in readability or complexity, and
> negligible difference in run time.

Probably true...

> My thinking was that by tokenizing anything containing HTML markup,
> the HTML is protected from being mangled by subsequent transforms.
> As long as each transform individually produces complete (and correct)
> HTML entities, the proper nesting of the final HTML output is guaranteed.
>
> This helps to minimize the sensitivity to the ordering of the
> transforms. I view this as somewhat important since it will make the
> writing of (well-behaved) transforms in (as yet unimagined) future
> extension modules simpler.

I agree; in a way this is a variation on the argument for storing all
links in a separate table and storing the pages in a semi-rendered state.
What will the long-term benefits be? In this case you can eliminate
line-by-line processing entirely, but that would also require changes to
the markup language (for plain text, you'd have to have some substitute
for the <pre> tag instead of indenting with spaces like we do now; lists
would be a nightmare; and we'd reinvent HTML, something I've repeatedly
told users I have no intention of doing.)

(Implementing XHTML might be worthwhile, though. Mind you, I'm not
suggesting this for 1.2 or even 1.4 (2.0?) but just speculating.)

> I suppose we could eliminate the recursable logic, while keeping the
> tokenization, by applying each of the currently recursed transformations
> twice.
>
> 1. Transform "''"s
> 2. Transform "'''"s
> 3. Transform "__"s
> 4. Transform "''"s again
> 5. Transform "'''"s again
>
> This, I think, handles everything that your method does (while
> eliminating the possibility of invalid HTML output.)

Not having read the code yet, I'm not sure what the fuss is about... I did
solve the whole issue of order-of-transformations in wiki_transform.php3
ages ago.
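To make the tokenizing idea concrete, here is a rough sketch of the
mechanism in present-day PHP (this is not the actual wiki_transform.php3
code; the tokenize/detokenize names and the chr(1) delimiter are invented
for illustration):

    <?php
    // A transform that emits finished HTML swaps it for an opaque token;
    // later transforms then cannot mangle it.  After the last transform,
    // the tokens are swapped back.  (Sketch only; names are invented.)
    $tokens = array();

    function tokenize($html) {
        global $tokens;
        $tokens[] = $html;
        // chr(1) should never appear in user input, so it marks tokens.
        return "\001" . (count($tokens) - 1) . "\001";
    }

    function detokenize($line) {
        global $tokens;
        return preg_replace_callback('/\001(\d+)\001/',
            function ($m) use ($tokens) { return $tokens[(int)$m[1]]; },
            $line);
    }

    // Example: the link transform runs first and tokenizes its output,
    // so the italics transform below cannot touch the <a> tag.
    $line = "see http://wcsb.org/ for ''details''";
    $line = preg_replace_callback('#\bhttp://\S+#',
        function ($m) { return tokenize("<a href=\"$m[0]\">$m[0]</a>"); },
        $line);
    $line = preg_replace("/''(.+?)''/", '<i>$1</i>', $line);
    print detokenize($line) . "\n";
    // see <a href="http://wcsb.org/">http://wcsb.org/</a> for <i>details</i>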
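And a sketch of Jeff's five steps, again with an invented function name.
One deviation: the "'''" rule runs before the "''" rule here, since
otherwise the two-quote pattern would eat the opening of the three-quote
delimiter:

    <?php
    // Non-recursive emphasis: apply each rule once, then run the two
    // quote rules a second time (Jeff's steps 4 and 5) to pick up any
    // matches the first pass left behind.  (Sketch, not PhpWiki code.)
    function emphasis($line) {
        $rules = array(
            "/'''(.+?)'''/" => '<b>$1</b>',   // bold; must precede ''
            "/''(.+?)''/"   => '<i>$1</i>',   // italics
            '/__(.+?)__/'   => '<b>$1</b>',   // also bold
        );
        foreach ($rules as $pat => $rep)      // pass 1
            $line = preg_replace($pat, $rep, $line);
        // pass 2
        $line = preg_replace("/'''(.+?)'''/", '<b>$1</b>', $line);
        $line = preg_replace("/''(.+?)''/", '<i>$1</i>', $line);
        return $line;
    }

    print emphasis("''one'' '''two''' __three__") . "\n";
    // <i>one</i> <b>two</b> <b>three</b>

Fed the pathological input above, this still prints
"<i><b></i> <i></b></i>" -- the trade-off we already agreed to live with.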
Also, being performance-minded is a good thing, but don't let it corner
you into writing 10x the amount of code, or seriously complex code, just
to gain small benefits. Wikis do not scale. Wikis cannot scale. They can
grow a lot wider, but there is a low limit on how many people can edit a
given topic before lost updates create confusion and frustration. Do not
write bubble sorts; do not write loops that call external programs; but
don't be afraid to use Perl regular expressions or make deep copies of
objects, because we have the room to do it.

sw

...............................ooo0000ooo.................................
Hear FM quality freeform radio through the Internet: http://wcsb.org/
home page: www.wcsb.org/~swain