From: Arno H. <aho...@in...> - 2000-07-18 22:21:03
> >Some Windows PHP's don't have preg_* functions.
> >You can do without them in most places, but there are some where you
> >absolutely need them.
>
> Not that I doubt you, but, out of curiosity: where?

The one place I can think of right now is the use of preg_match_all() in
wiki_transform. Also, eregs don't have non-greedy matches. Can't remember
which one, but I recall that there is at least one match which needs
non-greediness.

> The one drawback I see offhand is that it's possible for (invalid ?) wiki
> markup to generate invalid HTML.
>
> Eg.: "''__'' ''__''" becomes "<i><b></i> <i></b></i>".

This is indeed invalid HTML. But the other way around (with tokens) the inner
'' will have no effect at all (effectively: <i><i></i><i>) if __ is processed
before '', or it becomes "<i>__</i> <i>__</i>" if __ is processed after ''.
So the actual behaviour is not immediately apparent from the markup but
depends on the implementation. Not much difference.

> Perhaps we can live with [invalid HTML]?

I can, because the above case will not appear very often, will it?

> My thinking was that by tokenizing anything containing HTML markup,
> the HTML is protected from being mangled by subsequent transforms.
> As long as each transform individually produces complete (and correct)
> HTML entities, the proper nesting of the final HTML output is guaranteed.

A valid point.

> This helps to minimize the sensitivity on the ordering of
> the transforms. I view this as somewhat important since it will
> make the writing of (well-behaved) transforms in (as yet unimagined)
> future extension modules simpler.

Ordering will always play a role. Though I have to agree that hiding HTML
reduces one conflict point in the future for those "yet unimagined"
extension modules.

Btw, as your FIXME states: the recursive logic does not work as advertised:
"__''word''__" renders ok, but "''__word__''" is not rendered - instead __ is
inserted verbatim.
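
To make the "''__'' ''__''" case discussed above concrete, here is a minimal Python sketch. It is not the PhpWiki PHP code: the function names, the two regexps and the NUL-delimited token format are all assumptions invented for illustration. It only contrasts rewriting the line in place with hiding each generated HTML fragment behind a token, and it relies on non-greedy matches (.*?), which is the feature eregs lack.

```python
import re

# Hypothetical patterns (assumptions, not taken from wiki_transform).
# The non-greedy (.*?) takes the shortest span between two markers.
ITALIC = (r"''(.*?)''", "i")
BOLD = (r"__(.*?)__", "b")

TOKEN_RE = re.compile(r"\x00(\d+)\x00")


def in_place(line, transforms):
    """Apply each transform directly to the line; later transforms see raw HTML."""
    for pattern, tag in transforms:
        line = re.sub(pattern, r"<%s>\1</%s>" % (tag, tag), line)
    return line


def tokenized(line, transforms):
    """Replace each generated HTML fragment by an opaque token, expand at the end."""
    tokens = []

    def stash(html):
        tokens.append(html)
        return "\x00%d\x00" % (len(tokens) - 1)

    for pattern, tag in transforms:
        line = re.sub(
            pattern,
            lambda m: stash("<%s>%s</%s>" % (tag, m.group(1), tag)),
            line,
        )
    # A token may itself contain earlier tokens, so expand until none remain.
    while TOKEN_RE.search(line):
        line = TOKEN_RE.sub(lambda m: tokens[int(m.group(1))], line)
    return line


markup = "''__'' ''__''"
print(in_place(markup, [ITALIC, BOLD]))   # <i><b></i> <i></b></i>
print(tokenized(markup, [ITALIC, BOLD]))  # <i>__</i> <i>__</i>
print(tokenized(markup, [BOLD, ITALIC]))  # <i><b>'' ''</b></i>
```

Under these assumptions the in-place version produces the misnested HTML from the quoted example, while the tokenized version always nests properly but gives a different result depending on which transform runs first.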
Just looking at the code it becomes clear where the "fault" lies: you are
always processing $line. Real recursion means processing the created tokens.
(I guess you are aware of that already)
Oddly enough replacing __ with ''' makes it work in both cases, but that is
due to the regexp and not because of the recursion.

> I suppose we could eliminate the recursable logic, while keeping the
> tokenization by applying each of the currently recursed transformations
> twice.
>
> 1. Transform "''"s
> 2. Transform "'''"s
> 3. Transform "__"s
> 4. Transform "''"s again
> 5. Transform "'''"s again

Apart from doing ''' before '' (otherwise '''word''' becomes '<i>word</i>')
it does not immediately solve the problem. You need to transform the tokens
and not $line as you do right now.

So my conclusion is: recursion adds complexity (while having its benefits).
Let's start with HTML-in-place right now, and once some time has passed and
the dust has settled, we can do the recursion stuff - we will then have a
better understanding of the issue.
[Or you write a functioning and beautiful recursion right away ;o)]

/Arno