Re: [Phpwiki-talk] New transform code.

 > >Some Windows PHP's don't have preg_* functions.
 > >You can do without them in most places, but there are some where you
 > >absolutely need them.
 > 
 > Not that I doubt you, but, out of curiosity: where?

The one place I can think of right now is the use of preg_match_all()
in wiki_transform. Also, eregs don't have non-greedy matches. Can't
remember which one, but I recall that there is at least one match
which needs non-greediness.
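
To illustrate the non-greediness point, here is a minimal sketch. It is
in Python rather than PHP, only because it is easy to run; Python's re
module has Perl-style quantifiers like preg_*, and re.findall() plays
roughly the role of preg_match_all(). The pattern and the sample line
are made up for illustration and are not taken from wiki_transform.
POSIX eregs only offer the greedy form.

```python
import re

line = "''one'' and ''two''"

# Greedy: .* runs on to the last '' in the line, so both words
# end up inside a single match.
greedy = re.findall(r"''(.*)''", line)

# Non-greedy: .*? stops at the first closing '', one match per word.
lazy = re.findall(r"''(.*?)''", line)

print(greedy)
print(lazy)
```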

 > The one drawback I see offhand is that it's possible for (invalid ?) wiki 
 > markup to generate invalid HTML.
 > 
 > Eg.:  "''__'' ''__''" becomes "<i><b></i> <i></b></i>".

This is indeed invalid HTML. But the other way around (with tokens)
the inner '' will have no effect at all if __ is processed before '',
or the __ come through verbatim ("__ __") if __ is processed after ''.
So the actual behaviour is not immediately apparent from the markup
but depends on the implementation. Not much difference.
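
A rough sketch of the two approaches, to make the comparison concrete.
This is Python, not the actual PHP transform code; the regexps, the
<i>/<b> tags, the placeholder format and the function names are all
assumptions for illustration.

```python
import re

ITALIC = re.compile(r"''(.*?)''")
BOLD = re.compile(r"__(.*?)__")

def in_place(line):
    # HTML-in-place: each transform rewrites the line directly.
    line = ITALIC.sub(r"<i>\1</i>", line)
    return BOLD.sub(r"<b>\1</b>", line)

def tokenized(line, order):
    # Token variant: every match is swapped for a placeholder, so
    # later transforms cannot see the HTML produced so far.
    tokens = []

    def stash(tag):
        def repl(m):
            tokens.append("<%s>%s</%s>" % (tag, m.group(1), tag))
            return "\x00%d\x00" % (len(tokens) - 1)
        return repl

    for pattern, tag in order:
        line = pattern.sub(stash(tag), line)
    # Expand newest first: a later token may contain an earlier one.
    for i in reversed(range(len(tokens))):
        line = line.replace("\x00%d\x00" % i, tokens[i])
    return line

markup = "''__'' ''__''"
print(in_place(markup))
print(tokenized(markup, [(BOLD, "b"), (ITALIC, "i")]))
print(tokenized(markup, [(ITALIC, "i"), (BOLD, "b")]))
```

In this sketch the in-place version gives the badly nested HTML from
your example, while the token version always nests properly but gives
a different result for each processing order.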

 > Perhaps we can live with [invalid HTML]?

I can, because the above case will not appear very often, will it?

 > My thinking was that by tokenizing anything containing HTML markup,
 > the HTML is protected from being mangled by subsequent transforms.
 > As long as each transform individually produces complete (and correct)
 > HTML entities, the proper nesting of the final HTML output is guaranteed.

A valid point.

 > This helps to minimize the sensitivity on the ordering of
 > the transforms.  I view this as somewhat important since it will
 > make the writing of (well-behaved) transforms in (as yet unimagined)
 > future extension modules simpler.

Ordering will always play a role. Though I have to agree that hiding
HTML reduces one conflict point in the future for those "yet
unimagined" extension modules.

Btw, as your FIXME states: the recursive logic does not work as
advertised: "__''word''__" renders ok, but "''__word__''" is not
rendered - instead __ is inserted verbatim. Just looking at the code it
becomes clear where the "fault" lies: you are always processing $line.
Real recursion means processing the created tokens. (I guess you are
aware of that already.) Oddly enough, replacing __ with ''' makes it
work in both cases, but that is due to the regexp and not
because of the recursion.
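
What I mean by processing the tokens, as a minimal sketch: again
Python and not the real code, with made-up names (render, stash), an
assumed placeholder format and assumed <i>/<b> tags. With recurse=False
the text captured in a token is never looked at again, which is the
behaviour described above; with recurse=True the transforms are applied
to the captured text before it is stored.

```python
import re

TRANSFORMS = [
    (re.compile(r"''(.*?)''"), "i"),
    (re.compile(r"__(.*?)__"), "b"),
]

def render(text, recurse=True):
    tokens = []

    def stash(tag, inner):
        if recurse:
            # Transform the token's own content, not the whole line.
            inner = transform(inner)
        tokens.append("<%s>%s</%s>" % (tag, inner, tag))
        return "\x00%d\x00" % (len(tokens) - 1)

    def transform(s):
        for pattern, tag in TRANSFORMS:
            s = pattern.sub(lambda m, tag=tag: stash(tag, m.group(1)), s)
        return s

    line = transform(text)
    # Inner tokens always get lower numbers than the token holding
    # them, so expanding newest first resolves everything.
    for i in reversed(range(len(tokens))):
        line = line.replace("\x00%d\x00" % i, tokens[i])
    return line

print(render("''__word__''", recurse=False))
print(render("''__word__''"))
print(render("__''word''__"))
```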

 > I suppose we could eliminate the recursable logic, while keeping the
 > tokenization by applying each of the currently recursed transformations
 > twice.
 > 
 >   1. Transform "''"s
 >   2. Transform "'''"s
 >   3. Transform "__"s
 >   4. Transform "''"s again
 >   5. Transform "'''"s again

Apart from doing ''' before '' (otherwise '''word''' becomes 'word')
it does not immediately solve the problem. You need to transform the
tokens and not $line as you do right now.
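
The ordering point can be checked with a small sketch (Python, made-up
regexps, and the <i>/<b> tags are an assumption; plain in-place
substitution, no tokens):

```python
import re

ITALIC = (re.compile(r"''(.*?)''"), r"<i>\1</i>")
BOLD = (re.compile(r"'''(.*?)'''"), r"<b>\1</b>")

def apply_in_order(line, transforms):
    # Apply each (pattern, replacement) pair to the whole line in turn.
    for pattern, replacement in transforms:
        line = pattern.sub(replacement, line)
    return line

# '' first: it eats two of the three quotes on each side, leaving
# stray apostrophes and nothing for the ''' transform to match.
print(apply_in_order("'''word'''", [ITALIC, BOLD]))

# ''' first: the whole span is consumed before '' gets to see it.
print(apply_in_order("'''word'''", [BOLD, ITALIC]))
```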

So my conclusion is: recursion adds complexity (while having its benefits).
Let's start with HTML-in-place right now, and once some time has
passed and the dust has settled, we can do the recursion stuff - we
will then have a better understanding of the issue.

[Or you write a functioning and beautiful recursion right away ;o)]

/Arno