Re: [Phpwiki-talk] New transform code.

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

In message <147...@da...>,Arno Hollosi writes:
>I had a look at your new wiki_transform.
>Overall impressive work.

Thanks!

>- the class interface (functions and variables) looks ok.
>  Some functions will have to be added in order to make it useable for
>  extracting links when using it from wiki_savepage.
>  (e.g. some way to access the array in WikiTokenizer())

It's already there (mostly).  $page_renderer->wikilinks[] gets populated
with all the WikiLinks found in the page.  All that's needed is a bit
more elegant API to get at the list.

>- the regexp's are too complex in some places (which makes the
>  overall rendering slower than necessary):
>  Take for example:  /__ ( (?: [^_] (?:[^_]|_[^_])* )? ) __/x
>  which renders __strong__ emphasis. Apparantly this regexp ensures
>  two things: no "_" after "__" and no "__" inside the emphasis.
>  How about: /__(.*?)__/ instead? ".*?" is non-greedy and thus
>  "__" cannot appear inside the emphasis. Also, why forbid "_" after
>  "__"? In your case "___text__" is rendered as "_<strong>text</strong>"
>  in my case it's rendered as "<strong>_text</strong>". What's the
>  difference?

Okay, okay.  So I'm paranoid.  Yes the regexps should be cleaned up.
My guess is that (at least in most cases) the speed differences are
negligible --- I readily admit the regexps could be more readable.

(Speaking of which: it would probably be possible to avoid the use
of the Perl regexps altogether, in favor of PHP's ereg_'s.  Is this
worth considering?  How many PHP's are out there without PCRE support?)

>- Also, I don't think that all those "(?" extended regex syntax
>  is really necessary. It may be in some places, where it's important
>  to have a proper \0 match-value. But in all other places it adds
>  to complexity without any benefits (and makes the regexp slower, no?)

Okay, okay already! :-)

>- Ok, I don't like the groups. But groupTransforms() is plain ugly.
>  I understand that this stems from your goal to combine as many
>  transforms into a single $group as possible. I don't understand
>  the benefit of this approach - the only difference is that the
>  inner loops of render_line() are executed more often than the
>  outer for-loop. So what?

The point was to do as much of the looping logic as possible (the grouping)
only once rather once each line.  It does make a speed difference.

It is butt-ugly.  I don't like it either.

>- Maybe you are trying too hard with the concept of tokenization of a
>  line. E.g. is it really necessary to tokenize emphasises like "__"
>  and "'''"/"''"?  Why not generate the HTML directly (<strong><b><i>)?
>  All you have to do is make sure, that later transforms don't mess
>  with the inserted HTML tags. By ordering the transforms (as you plan
>  to do anyway) this can be achieved easily. This would also solve
>  your problem of recursive transforms. Take the easy route first.
>  If we ever come accross a markup that requires recursive stuff,
>  then we can add recursive transforms. Right now I don't see the
>  need for them.

The tokenization is not really necessary in all cases, but it is needed
(I think) for the various links (or else the regexps get horrendous).
If you accept that the tokenization code is needed, then it makes little
difference (in complexity or time) whether <b>'s and <i>'s are tokenized or 
not.
Tokenizing (I think) is safer --- less chance of some not-quite-completely-
well-conceived custom WikiTransform mucking things up.

As for recursiveness: I don't really see how direct substitution of the
HTML gets around the root of the problem.  How do you deal with 
__''Bold-italic'' and bold__ (or ''__Bold-italic__ and italic'')?
(Or should we just punt on that?)

>So my suggestions:
>
>- get rid of groups - implement priority sort order instead

Yes, we need some sort of priority sorting anyhow, so that the WikiTransforms
don't have to be registered() in a specific order.

The groups stuff is there to deal with the recursable stuff --- you haven't
yet convinced me that the recursable stuff is unnecessary.

>- get rid of recursive markup - right now it's only needed for
>  emphasis. Insert the HTML tags instead.

Again, I don't yet see how this helps?

>- final transfroms can be dealt with one if-clause like
>  if($t->final) break;

Yes that's the way I did it before I added the code to deal with the
recursive stuff.  (But then ''__Bold-italic__'' was broken.)

>- make your regexps simpler. And if one regexp becomes too
>  complex split it into two transfroms.

Okay, already!

As a footnote though: I'm pretty sure that in most cases one transform
with a complex regexp is faster than two transforms with simple regexps.

Okay, so I guess my main counter-response is:

Either:
  a) Convince me that the recursable stuff really is not needed.
or
  b) Suggest a cleaner way to deal with the recursable stuff.

Jeff