From: Arno H. <aho...@in...> - 2000-07-18 20:46:00
> (Speaking of which: it would probably be possible to avoid the use
> of the Perl regexps altogether, in favor of PHP's ereg_'s. Is this
> worth considering? How many PHP's are out there without PCRE support?)

Some Windows PHP's don't have preg_* functions. You can do without them in
most places, but there are some where you absolutely need them. So if
there's no way around them, you can use them throughout.

> As a footnote though: I'm pretty sure that in most cases one transform
> with a complex regexp is faster than two transforms with simple regexps.

Point taken.

> The groups stuff is there to deal with the recursable stuff --- you haven't
> yet convinced me that the recursable stuff is unnecessary.

Ok, trying to convince you :o)

We need tokenization at least for links and stuff. That's for sure.
But do we need it for emphasis markup and the like? Right now, recursive
transforms are only used for '', ''' and __. Please correct me if I'm wrong.

Suppose the following line:
  "__Bold and ''bold italics''__"

Transforms are registered in this order:
  1. __
  2. '''
  3. ''

Instead of tokenizing $line, you directly substitute the HTML into $line.
So in step 1, $line is changed to
  "<strong>Bold and ''bold italics''</strong>"
Step 2 does nothing, and step 3 executes without nesting (no tokens in $line):
  "<strong>Bold and <i>bold italics</i></strong>"
Voila :o)

If there's something like "Look at __WikiLink__" it becomes:
  "Look at __$token$__"
  "Look at <strong>$token$</strong>"
  "Look at <strong><a href="...">WikiLink</a></strong>"
Problem solved.

Only use tokens where they are absolutely necessary. I don't see the need
to tokenize emphasis markup or things like '%%%' and '^-{4,}'.
By ensuring that transforms are executed in the right order, the freshly
inserted HTML tags won't interfere with later transformations. E.g. it's
important to do links and the '&<>' transform before doing the rest.

Did I convince you?
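To make the ordering concrete, here is a minimal sketch of the scheme above. It is written in Python rather than PHP just to keep it short, and it is not the actual PhpWiki code. The function name render_line, the CamelCase link pattern, the placeholder href, the <b> tag for ''' and the NUL-delimited token format (standing in for $token$) are all assumptions made for illustration.

```python
import html
import re

# Hypothetical link rule: a bare CamelCase word such as WikiLink.
WIKI_WORD = re.compile(r"(?<![A-Za-z0-9])(?:[A-Z][a-z]+){2,}(?![A-Za-z0-9])")

# Token format used in place of "$token$": NUL, link index, NUL.
TOKEN = re.compile(r"\x00(\d+)\x00")

# Emphasis is substituted in place, in the registration order from the mail.
# The <b> tag for ''' is an assumption; the mail only names <strong> and <i>.
EMPHASIS = [
    (re.compile(r"__(.*?)__"), r"<strong>\1</strong>"),
    (re.compile(r"'''(.*?)'''"), r"<b>\1</b>"),
    (re.compile(r"''(.*?)''"), r"<i>\1</i>"),
]


def render_line(line):
    links = []

    def stash(match):
        # Keep the finished anchor aside and leave only a token in the line.
        word = match.group(0)
        links.append('<a href="%s">%s</a>' % (word, word))  # placeholder href
        return "\x00%d\x00" % (len(links) - 1)

    # 1. Links are the one place where tokens are needed.
    line = WIKI_WORD.sub(stash, line)
    # 2. The '&<>' transform runs before any HTML is inserted into the line.
    line = html.escape(line, quote=False)
    # 3. Emphasis markup becomes HTML directly, with no tokens and no recursion.
    for pattern, replacement in EMPHASIS:
        line = pattern.sub(replacement, line)
    # 4. Swap the stashed anchors back in for their tokens.
    return TOKEN.sub(lambda m: links[int(m.group(1))], line)
```

Calling render_line("__Bold and ''bold italics''__") walks through the same three steps as the example above. Because the escape step runs before the emphasis step, the inserted tags are never escaped themselves.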

Sure, the new architecture is then a mixture of tokens and HTML-in-place -
compared to your tokens-only approach. But it's much simpler - less
complexity. And I don't think it's too ugly from a structural point of view
either.

/Arno