From: Jeff D. <da...@da...> - 2000-07-18 23:48:46
In message <147...@da...>, Arno Hollosi writes:

>The one place I can think of right now is the use of preg_match_all()
>in wiki_transform. Also, eregs don't have non-greedy matches. Can't
>remember which one, but I recall that there is at least one match
>which needs non-greediness.

Of course, "need" is always relative. :-)

>> Perhaps we can live with [invalid HTML]?
>
>I can, because the above case will not appear very often, will it?

Not except as a result of typos and brainos. If the wiki markup is
esoteric or just wrong, I don't mind if it comes out looking like garbage
(in fact, it should). However, broken HTML makes me nervous. Who knows
what it will come out looking like on whatever random browser I happen to
be using? (I'll admit the world is unlikely to end.)

>Btw, as your FIXME states: the recursive logic does not work as
>advertised: "__''word''__" renders ok, but "''__word__''" is not
>rendered - instead __ is inserted verbatim. Just looking at the code it
>becomes clear where the "fault" lies: you are always processing $line.
>Real recursion means processing the created tokens. (I guess you are
>aware of that already.) Oddly enough, replacing __ with ''' makes it
>work in both cases, but that is due to the regexp and not because of
>the recursion.

You're right. Actually, my original intent was to handle this via
regexps. The idea (not that it made it into the code) was that none of
the "''", "'''", or "__" quoted expressions is recognized unless it
contains no (untokenized) occurrence of either "''" or "__". I.e. the
regexp for the __Bold__ expressions should have been:

  "__[^_'](?:[^_']+|_(?!_)|'(?!'))+(?<!_)__"

There! Haha. Make sense? No, really, you're right. It's broken.

>> I suppose we could eliminate the recursable logic, while keeping the
>> tokenization by applying each of the currently recursed transformations
>> twice.
>
>Apart from doing ''' before '' (otherwise '''word''' becomes '<i>word</i>'),
>it does not immediately solve the problem. You need to transform the
>tokens and not $line as you do right now.

Of course. Okay, so never mind...

>So my conclusion is: recursion adds complexity (while having its benefits).
>Let's start with HTML-in-place right now, and once some time has
>passed and the dust settled, we can do the recursion stuff - we will
>then have a better understanding of the issue.
>
>[Or you write a functioning and beautiful recursion right away ;o)]

Let me search for a nicer solution for a little while longer. (A week or
two.) As I see it, there's no big rush for this, as the present
wiki_transform works just fine.

Jeff

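To make the quoted pattern concrete, here is a minimal sketch (not the
actual PhpWiki code; the sample input is illustrative only) of applying
that "paranoid" __strong__ regexp with PCRE in PHP:

<?php
// A sketch of the __strong__ rule quoted above. The body may contain a
// lone ' or _, but never '' or __, so nested emphasis markup cannot
// sneak inside and produce mis-nested HTML.
$pattern = "/__([^_'](?:[^_']+|_(?!_)|'(?!'))+)(?<!_)__/";

$line = "__Bold__ and ''italic'' text";
echo preg_replace($pattern, '<strong>$1</strong>', $line), "\n";
// -> <strong>Bold</strong> and ''italic'' text
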
From: Arno H. <aho...@in...> - 2000-07-18 22:35:20
>> Line-by-line processing is inherited from 1.0, which is how most Wikis
>> do things.
>
> Do we want to get away from line-by-line processing?

I don't. Keep the line-by-line approach. As you said: errors don't spill
over into the rest of the page. That makes the wiki more fun to
experiment with.

/Arno

From: Steve W. <sw...@wc...> - 2000-07-18 22:28:25
On Tue, 18 Jul 2000, Jeff Dairiki wrote:

> Do we want to get away from line-by-line processing?
> It can be done.
> Now's the time to do it.
> I think it might be faster that way besides.
>
> However, I kind of like the line-by-line processing. It keeps goofs in
> one line from hosing the whole page.

True... in a browser it's easy to see where you goofed by using View
Source; not quite as practical here, though.

Would search be impacted? Probably not... we can still iterate through
lines by exploding() the text. Would storage be impacted? Again, I think
not... these are separated for a reason.

I don't know. I think it's not a necessary change right now, and it
creates even more work because certain markup has to be changed too.
(Unless we're only talking about <b>, <i> and friends; then it's a minor
point. To do all of them (<hr>, <pre>, etc.) is too major a change,
especially for 1.2.)

Just thinking out loud again because I don't want to work on work,

sw

From: J C L. <cl...@ka...> - 2000-07-18 22:24:46
On Tue, 18 Jul 2000 15:10:28 -0700, Jeff Dairiki <da...@da...> wrote:

> However, I kind of like the line-by-line processing. It keeps
> goofs in one line from hosing the whole page.

It is significantly easier to do proper table support with line-by-line
processing. Further, whole-file processing allows several easy
optimisations.

--
J C Lawrence        Home: cl...@ka...
---------(*)        Other: co...@ka...
http://www.kanga/nu/~claw/    Keys etc: finger cl...@ka...
--=| A man is as sane as he is dangerous to his environment |=--

From: Arno H. <aho...@in...> - 2000-07-18 22:21:03
>> Some Windows PHPs don't have preg_* functions.
>> You can do without them in most places, but there are some where you
>> absolutely need them.
>
> Not that I doubt you, but, out of curiosity: where?

The one place I can think of right now is the use of preg_match_all()
in wiki_transform. Also, eregs don't have non-greedy matches. Can't
remember which one, but I recall that there is at least one match
which needs non-greediness.

> The one drawback I see offhand is that it's possible for (invalid?) wiki
> markup to generate invalid HTML.
>
> E.g.: "''__'' ''__''" becomes "<i><b></i> <i></b></i>".

This is indeed invalid HTML. But the other way around (with tokens) the
inner '' will have no effect at all (effectively: <i><i></i><i>) if __ is
processed before '', or it becomes "<i>__</i> <i>__</i>" if __ is
processed after ''. So the actual behaviour is not immediately apparent
from the markup but depends on the implementation. Not much difference.

> Perhaps we can live with [invalid HTML]?

I can, because the above case will not appear very often, will it?

> My thinking was that by tokenizing anything containing HTML markup,
> the HTML is protected from being mangled by subsequent transforms.
> As long as each transform individually produces complete (and correct)
> HTML entities, the proper nesting of the final HTML output is guaranteed.

A valid point.

> This helps to minimize the sensitivity to the ordering of
> the transforms. I view this as somewhat important since it will
> make the writing of (well-behaved) transforms in (as yet unimagined)
> future extension modules simpler.

Ordering will always play a role. Though I have to agree that hiding HTML
removes one conflict point in the future for those "yet unimagined"
extension modules.

Btw, as your FIXME states: the recursive logic does not work as
advertised: "__''word''__" renders ok, but "''__word__''" is not
rendered - instead __ is inserted verbatim. Just looking at the code it
becomes clear where the "fault" lies: you are always processing $line.
Real recursion means processing the created tokens. (I guess you are
aware of that already.) Oddly enough, replacing __ with ''' makes it
work in both cases, but that is due to the regexp and not because of
the recursion.

> I suppose we could eliminate the recursable logic, while keeping the
> tokenization by applying each of the currently recursed transformations
> twice.
>
> 1. Transform "''"s
> 2. Transform "'''"s
> 3. Transform "__"s
> 4. Transform "''"s again
> 5. Transform "'''"s again

Apart from doing ''' before '' (otherwise '''word''' becomes '<i>word</i>'),
it does not immediately solve the problem. You need to transform the
tokens and not $line as you do right now.

So my conclusion is: recursion adds complexity (while having its
benefits). Let's start with HTML-in-place right now, and once some time
has passed and the dust has settled, we can do the recursion stuff - we
will then have a better understanding of the issue.

[Or you write a functioning and beautiful recursion right away ;o)]

/Arno

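The ordering point above ("''' before ''") can be seen with a minimal
sketch (illustrative code only, not PhpWiki's transforms):

<?php
// Run the ''' (bold) rule before the '' (italic) rule; otherwise the
// italic rule eats the first two quotes of every '''...''' group.
$line = "'''word''' and ''word''";

$bold_first = preg_replace("/'''(.*?)'''/", '<b>$1</b>', $line);
$bold_first = preg_replace("/''(.*?)''/", '<i>$1</i>', $bold_first);
// -> <b>word</b> and <i>word</i>

$italic_first = preg_replace("/''(.*?)''/", '<i>$1</i>', $line);
// -> <i>'word</i>' and <i>word</i>   (the ''' markup is mangled)

echo "$bold_first\n$italic_first\n";
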
From: Jeff D. <da...@da...> - 2000-07-18 22:11:20
In message <Pin...@bo...>, Steve Wainstead writes:

>The minor drawback is that it's line-by-line processing, and if you want
>to have successive lines in italics in preformatted text, every line must
>start and end with:
>
>  ''here is my preformatted text in italics''
>
>Line-by-line processing is inherited from 1.0, which is how most Wikis do
>things.

Do we want to get away from line-by-line processing? It can be done.
Now's the time to do it. I think it might be faster that way besides.

However, I kind of like the line-by-line processing. It keeps goofs in
one line from hosing the whole page.

From: Steve W. <sw...@wc...> - 2000-07-18 22:05:48
On Tue, 18 Jul 2000, Jeff Dairiki wrote:

>> You can do without them in most places, but there are some where you
>> absolutely need them.
>
> Not that I doubt you, but, out of curiosity: where?

Oh, bugger... where was that? Arno's right, though; there are places
where preg_* are the only solution.

> The one drawback I see offhand is that it's possible for (invalid?)
> wiki markup to generate invalid HTML.
>
> E.g.: "''__'' ''__''" becomes "<i><b></i> <i></b></i>".
>
> Perhaps we can live with that?

At some point you have to decide the user is sane and has some
intelligence... we can concoct pathological situations all day and
develop workarounds, but I don't think that would make for a fun
project. :-)

> Yes, you could tokenize the <br> and <hr> or not --- since the tokenizing
> mechanism is already in place (and must remain so for the links, at least)
> it really makes no difference to readability or complexity, and negligible
> difference in run time.

Probably true...

> My thinking was that by tokenizing anything containing HTML markup,
> the HTML is protected from being mangled by subsequent transforms.
> As long as each transform individually produces complete (and correct)
> HTML entities, the proper nesting of the final HTML output is guaranteed.
>
> This helps to minimize the sensitivity to the ordering of
> the transforms. I view this as somewhat important since it will
> make the writing of (well-behaved) transforms in (as yet unimagined)
> future extension modules simpler.

I agree; in a way this is a variation on the argument for storing all
links in a separate table and storing the pages in a semi-state. What
will the long term benefits be?

In this case you can eliminate line-by-line processing entirely, but that
would also require changes to the markup language (for plain text, you'd
have to have some substitute for the <pre> tag instead of indenting with
spaces like we do now; lists would be a nightmare; and we'd reinvent
HTML, something I've repeatedly told users I have no intention of doing.)
(Implementing XHTML might be worthwhile, though. Mind you, I'm not
suggesting this for 1.2 or even 1.4 (2.0?) but just speculating.)

> I suppose we could eliminate the recursable logic, while keeping the
> tokenization by applying each of the currently recursed transformations
> twice.
>
> 1. Transform "''"s
> 2. Transform "'''"s
> 3. Transform "__"s
> 4. Transform "''"s again
> 5. Transform "'''"s again
>
> This, I think, handles everything that your method does (while eliminating
> the possibility of invalid HTML output).

Not having read the code yet, I'm not sure what the fuss is about... I
did solve the whole issue of order-of-transformations in
wiki_transform.php3 ages ago.

Also, being performance minded is a good thing, but don't let it corner
you into writing 10x the amount of code, or seriously complex code, just
to gain small benefits. Wikis do not scale. Wikis cannot scale. They can
grow a lot wider, but there is a low limit on how many people can edit a
given topic before lost updates create confusion and frustration. Do not
write bubble sorts; do not write loops that call external programs; but
don't be afraid to use Perl regular expressions or make deep copies of
objects, because we have the room to do it.

sw

From: Steve W. <sw...@wc...> - 2000-07-18 21:39:08
On Tue, 18 Jul 2000, Arno Hollosi wrote:

> Sure, the new architecture is then a mixture of tokens and
> HTML-in-place - compared to your tokens-only approach.
> But it's much simpler - less complexity. And I don't think it's
> too ugly from a structural point of view either.

The minor drawback is that it's line-by-line processing, and if you want
to have successive lines in italics in preformatted text, every line must
start and end with:

  ''here is my preformatted text in italics''

Line-by-line processing is inherited from 1.0, which is how most Wikis do
things.

Just a minor point,

sw

From: Jeff D. <da...@da...> - 2000-07-18 21:25:40
In message <147...@da...>, Arno Hollosi writes:

>Some Windows PHPs don't have preg_* functions.
>You can do without them in most places, but there are some where you
>absolutely need them.

Not that I doubt you, but, out of curiosity: where?

>Instead of tokenizing $line, you directly substitute the HTML into $line.
>So, in step 1 $line is changed to
>"<strong>Bold and ''bold italics''</strong>"
>Step 2 does nothing and step 3 executes without nesting (no tokens
>in $line):
>"<strong>Bold and <i>bold italics</i></strong>"
>
>Voila :o)

Okay, I get it now.

The one drawback I see offhand is that it's possible for (invalid?) wiki
markup to generate invalid HTML.

E.g.: "''__'' ''__''" becomes "<i><b></i> <i></b></i>".

Perhaps we can live with that?

>Problem solved. Only use tokens where they are absolutely necessary.
>I don't see the need to tokenize emphasis markup or things like
>'%%%' and '^-{4,}'

Yes, you could tokenize the <br> and <hr> or not --- since the tokenizing
mechanism is already in place (and must remain so for the links, at least)
it really makes no difference to readability or complexity, and negligible
difference in run time.

My thinking was that by tokenizing anything containing HTML markup,
the HTML is protected from being mangled by subsequent transforms.
As long as each transform individually produces complete (and correct)
HTML entities, the proper nesting of the final HTML output is guaranteed.

This helps to minimize the sensitivity to the ordering of the transforms.
I view this as somewhat important since it will make the writing of
(well-behaved) transforms in (as yet unimagined) future extension modules
simpler.

I suppose we could eliminate the recursable logic, while keeping the
tokenization, by applying each of the currently recursed transformations
twice:

  1. Transform "''"s
  2. Transform "'''"s
  3. Transform "__"s
  4. Transform "''"s again
  5. Transform "'''"s again

This, I think, handles everything that your method does (while
eliminating the possibility of invalid HTML output).

Jeff

From: Arno H. <aho...@in...> - 2000-07-18 20:46:00
> (Speaking of which: it would probably be possible to avoid the use
> of the Perl regexps altogether, in favor of PHP's ereg_'s. Is this
> worth considering? How many PHPs are out there without PCRE support?)

Some Windows PHPs don't have preg_* functions.
You can do without them in most places, but there are some where you
absolutely need them. So if there's no way around it, you can use them
throughout.

> As a footnote though: I'm pretty sure that in most cases one transform
> with a complex regexp is faster than two transforms with simple regexps.

Point taken.

> The groups stuff is there to deal with the recursable stuff --- you haven't
> yet convinced me that the recursable stuff is unnecessary.

Ok, trying to convince you :o)

We need tokenization at least for links and stuff. That's for sure.
But do we need it for emphasis markup and the like? Right now, recursive
transforms are only used for '', ''', and __. Please correct me if I'm
wrong.

Suppose the following line:

  "__Bold and ''bold italics''__"

Transforms are registered in this order:

  1. __
  2. '''
  3. ''

Instead of tokenizing $line, you directly substitute the HTML into $line.
So, in step 1 $line is changed to

  "<strong>Bold and ''bold italics''</strong>"

Step 2 does nothing, and step 3 executes without nesting (no tokens in
$line):

  "<strong>Bold and <i>bold italics</i></strong>"

Voila :o)

If there's something like "Look at __WikiLink__" it becomes:

  "Look at __$token$__"
  "Look at <strong>$token$</strong>"
  "Look at <strong><a href="...">WikiLink</a></strong>"

Problem solved. Only use tokens where they are absolutely necessary.
I don't see the need to tokenize emphasis markup or things like
'%%%' and '^-{4,}'.

By ensuring that transforms are executed in the right order, the freshly
inserted HTML tags won't interfere with later transformations. E.g. it's
important to do links and the '&<>' transform before doing the rest.

Did I convince you?

Sure, the new architecture is then a mixture of tokens and HTML-in-place
- compared to your tokens-only approach. But it's much simpler - less
complexity. And I don't think it's too ugly from a structural point of
view either.

/Arno

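For readers following the walkthrough above, here is a minimal sketch of
the mixed approach: tokens for links, HTML-in-place for emphasis. The
function names, token marker, WikiWord regexp, and URL form are
assumptions for illustration, not the real PhpWiki code.

<?php
// Sketch only: links are tokenized first so later transforms cannot
// mangle the generated <a> markup; emphasis is substituted in place;
// tokens are expanded at the end.
$tokens = array();

function tokenize_links($line, &$tokens) {
    return preg_replace_callback('/\b(?:[A-Z][a-z]+){2,}\b/',
        function ($m) use (&$tokens) {
            $tokens[] = '<a href="index.php?page=' . $m[0] . '">' . $m[0] . '</a>';
            return "\x1e" . (count($tokens) - 1) . "\x1e";   // opaque token
        }, $line);
}

$line = "Look at __WikiLink__ and __bold with ''italics''__";
$line = tokenize_links($line, $tokens);

// Emphasis rules applied in place, in order: __ first, then ''', then ''.
$line = preg_replace("/__(.*?)__/", '<strong>$1</strong>', $line);
$line = preg_replace("/'''(.*?)'''/", '<b>$1</b>', $line);
$line = preg_replace("/''(.*?)''/", '<i>$1</i>', $line);

// Finally, expand the tokens back into the finished line.
foreach ($tokens as $i => $html) {
    $line = str_replace("\x1e$i\x1e", $html, $line);
}
echo $line, "\n";
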
From: Jeff D. <da...@da...> - 2000-07-18 20:15:36
I'm trying to start a new branch in the CVS in which to hack in support
for PATH_INFO. Every time I try to execute a CVS command referencing a
tagged version, I get the error message:

  cvs [server aborted]: cannot write /cvsroot/phpwiki/CVSROOT/val-tags:
  Permission denied

For example, all of the following commands fail with the same message:

  cvs diff -rrelease-1_1_7
  cvs co -rrelease-1_1_7
  cvs rtag -rjeffs_pathinfo_hacks-root -b jeffs_patchinfo_hacks-branch phpwiki

(The last command is the one I really want to do.)

Note that I had no problem creating the tag jeffs_pathinfo_hacks-root.

Any ideas?

Jeff

From: Jeff D. <da...@da...> - 2000-07-18 20:10:43
In message <147...@da...>, Arno Hollosi writes:

>I had a look at your new wiki_transform.
>Overall impressive work.

Thanks!

>- the class interface (functions and variables) looks ok.
>  Some functions will have to be added in order to make it useable for
>  extracting links when using it from wiki_savepage.
>  (e.g. some way to access the array in WikiTokenizer())

It's already there (mostly). $page_renderer->wikilinks[] gets populated
with all the WikiLinks found in the page. All that's needed is a bit more
elegant API to get at the list.

>- the regexps are too complex in some places (which makes the
>  overall rendering slower than necessary):
>  Take for example: /__ ( (?: [^_] (?:[^_]|_[^_])* )? ) __/x
>  which renders __strong__ emphasis. Apparently this regexp ensures
>  two things: no "_" after "__" and no "__" inside the emphasis.
>  How about /__(.*?)__/ instead? ".*?" is non-greedy and thus
>  "__" cannot appear inside the emphasis. Also, why forbid "_" after
>  "__"? In your case "___text__" is rendered as "_<strong>text</strong>";
>  in my case it's rendered as "<strong>_text</strong>". What's the
>  difference?

Okay, okay. So I'm paranoid. Yes, the regexps should be cleaned up. My
guess is that (at least in most cases) the speed differences are
negligible --- I readily admit the regexps could be more readable.

(Speaking of which: it would probably be possible to avoid the use of the
Perl regexps altogether, in favor of PHP's ereg_'s. Is this worth
considering? How many PHPs are out there without PCRE support?)

>- Also, I don't think that all that "(?" extended regex syntax
>  is really necessary. It may be in some places, where it's important
>  to have a proper \0 match-value. But in all other places it adds
>  to complexity without any benefits (and makes the regexp slower, no?)

Okay, okay already! :-)

>- Ok, I don't like the groups. But groupTransforms() is plain ugly.
>  I understand that this stems from your goal to combine as many
>  transforms into a single $group as possible. I don't understand
>  the benefit of this approach - the only difference is that the
>  inner loops of render_line() are executed more often than the
>  outer for-loop. So what?

The point was to do as much of the looping logic as possible (the
grouping) only once, rather than once per line. It does make a speed
difference. It is butt-ugly. I don't like it either.

>- Maybe you are trying too hard with the concept of tokenization of a
>  line. E.g. is it really necessary to tokenize emphases like "__"
>  and "'''"/"''"? Why not generate the HTML directly (<strong><b><i>)?
>  All you have to do is make sure that later transforms don't mess
>  with the inserted HTML tags. By ordering the transforms (as you plan
>  to do anyway) this can be achieved easily. This would also solve
>  your problem of recursive transforms. Take the easy route first.
>  If we ever come across a markup that requires recursive stuff,
>  then we can add recursive transforms. Right now I don't see the
>  need for them.

The tokenization is not really necessary in all cases, but it is needed
(I think) for the various links (or else the regexps get horrendous).
If you accept that the tokenization code is needed, then it makes little
difference (in complexity or time) whether <b>'s and <i>'s are tokenized
or not. Tokenizing (I think) is safer --- less chance of some
not-quite-completely-well-conceived custom WikiTransform mucking things
up.

As for recursiveness: I don't really see how direct substitution of the
HTML gets around the root of the problem. How do you deal with
__''Bold-italic'' and bold__ (or ''__Bold-italic__ and italic'')?
(Or should we just punt on that?)

>So my suggestions:
>
>- get rid of groups - implement priority sort order instead

Yes, we need some sort of priority sorting anyhow, so that the
WikiTransforms don't have to be registered in a specific order.

The groups stuff is there to deal with the recursable stuff --- you
haven't yet convinced me that the recursable stuff is unnecessary.

>- get rid of recursive markup - right now it's only needed for
>  emphasis. Insert the HTML tags instead.

Again, I don't yet see how this helps.

>- final transforms can be dealt with by one if-clause like
>  if($t->final) break;

Yes, that's the way I did it before I added the code to deal with the
recursive stuff. (But then ''__Bold-italic__'' was broken.)

>- make your regexps simpler. And if one regexp becomes too
>  complex, split it into two transforms.

Okay, already!

As a footnote though: I'm pretty sure that in most cases one transform
with a complex regexp is faster than two transforms with simple regexps.

Okay, so I guess my main counter-response is either:

 a) Convince me that the recursable stuff really is not needed, or
 b) Suggest a cleaner way to deal with the recursable stuff.

Jeff

From: Arno H. <aho...@in...> - 2000-07-18 18:56:09
Jeff,

I had a look at your new wiki_transform. Overall impressive work.
In some places it seems a little bit awkward. Actually, I had problems
understanding how it works at first.

I'm not sure I like the split of the transform objects into groups.
The distinction final/normal/recursive seems necessary, but I'm sure it
can be solved in a different way. See below (we can do away with
recursive tokenization, and the distinction final/normal can be dealt
with by one easy if-clause in render_line() instead of having groups and
two different loops).

Random thoughts:

- The class interface (functions and variables) looks ok.
  Some functions will have to be added in order to make it useable for
  extracting links when using it from wiki_savepage
  (e.g. some way to access the array in WikiTokenizer()).

- The regexps are too complex in some places (which makes the
  overall rendering slower than necessary).
  Take for example: /__ ( (?: [^_] (?:[^_]|_[^_])* )? ) __/x
  which renders __strong__ emphasis. Apparently this regexp ensures
  two things: no "_" after "__" and no "__" inside the emphasis.
  How about /__(.*?)__/ instead? ".*?" is non-greedy and thus
  "__" cannot appear inside the emphasis. Also, why forbid "_" after
  "__"? In your case "___text__" is rendered as "_<strong>text</strong>";
  in my case it's rendered as "<strong>_text</strong>". What's the
  difference?

- Also, I don't think that all that "(?" extended regex syntax
  is really necessary. It may be in some places, where it's important
  to have a proper \0 match-value. But in all other places it adds
  to complexity without any benefits (and makes the regexp slower, no?).

- Ok, I don't like the groups. But groupTransforms() is plain ugly.
  I understand that this stems from your goal to combine as many
  transforms into a single $group as possible. I don't understand
  the benefit of this approach - the only difference is that the
  inner loops of render_line() are executed more often than the
  outer for-loop. So what?

- Maybe you are trying too hard with the concept of tokenization of a
  line. E.g. is it really necessary to tokenize emphases like "__"
  and "'''"/"''"? Why not generate the HTML directly (<strong><b><i>)?
  All you have to do is make sure that later transforms don't mess
  with the inserted HTML tags. By ordering the transforms (as you plan
  to do anyway) this can be achieved easily. This would also solve
  your problem of recursive transforms. Take the easy route first.
  If we ever come across a markup that requires recursive stuff,
  then we can add recursive transforms. Right now I don't see the
  need for them.

So my suggestions:

- Get rid of groups - implement a priority sort order instead.
- Get rid of recursive markup - right now it's only needed for
  emphasis. Insert the HTML tags instead.
- Final transforms can be dealt with by one if-clause like
  if($t->final) break;
- Make your regexps simpler. And if one regexp becomes too
  complex, split it into two transforms.

Again, a very promising start. Good work.

/Arno

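The "___text__" comparison above can be checked directly; this is an
illustrative sketch only (the patterns are taken from the message, the
rest is not PhpWiki code):

<?php
// Compare the two __strong__ regexps on the "___text__" edge case;
// only the placement of the stray "_" differs.
$input = "___text__";

// The stricter pattern (no "_" allowed right after the opening "__"):
$strict = preg_replace('/__((?:[^_](?:[^_]|_[^_])*)?)__/',
                       '<strong>$1</strong>', $input);
// -> _<strong>text</strong>

// The simpler, non-greedy pattern:
$simple = preg_replace('/__(.*?)__/', '<strong>$1</strong>', $input);
// -> <strong>_text</strong>

echo "$strict\n$simple\n";
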
From: Jeff D. <da...@da...> - 2000-07-18 05:35:30
>The pages are stored as MIME e-mail messages, with the meta-data stored
>as parameters in the Content-Type: header.
>
>I also added the ability to make a zip including the archived versions of
>the pages. In this case you still get one file per page, formatted
>as a multipart MIME message: one part for each version of the page.

Okay, so now how to use these zip files? Here's how:

The CVS version now has a new config constant WIKI_PGSRC (in
wiki_config), which controls the source for the initial page contents
when index.php3 is first invoked on an empty database (i.e. no
FrontPage). If WIKI_PGSRC is set to the name of a zip file, that zip file
is used for the initial page contents. If WIKI_PGSRC is set to './pgsrc',
then the old behavior results.

Note that the unzipping code only supports the 'store' (non-compressed)
and 'deflate' compression methods --- furthermore, the 'deflate' method
is only supported if PHP was compiled with zlib support.

Also, I'm somewhat unconvinced that the unzip code will work on deflated
data from all zip programs. According to the zip spec, the file CRC and
compressed file size can be stored either ahead of the file data or after
the file data. My code only works if they're stored ahead of the file
data. (I think this is fixable, but is a bit of a pain --- one must
determine the compressed data size from the compressed data stream
itself.) I don't see much point in fixing it unless this is a problem for
some major zipper (e.g. PKZIP). (The unzipper should work on all
uncompressed zip files.)

So far I've only tested this code with zip files from wiki_zip and from
Info-Zip's zip 2.3. If y'all could test it on anything else you've got,
that would be great.

Jeff

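As a hedged illustration of the two WIKI_PGSRC settings described above
(the define() form and the file name "wikidump.zip" are assumptions, not
a quote of the actual wiki_config file):

<?php
// Hypothetical wiki_config excerpt.

// Either seed a brand-new wiki from a zip archive of MIME-formatted pages...
define("WIKI_PGSRC", "wikidump.zip");

// ...or keep the old behavior: one plain file per page under ./pgsrc.
// define("WIKI_PGSRC", "./pgsrc");
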
From: Jeff D. <da...@da...> - 2000-07-17 16:47:27
Here's a current snapshot of my thoughts on the new transform code.

This currently is in the form of a drop-in replacement for
wiki_transform. However, if I were to insert this into PhpWiki now, most
of it would go into wiki_stdlib. Some would go into new custom-feature
module files. Only a skeleton would remain in wiki_transform.

Here are some random thoughts, in order of increasing entropy:

Currently this only implements wiki_transform. However, it should be
clear that class WikiRenderer can also be used as the basis for a modular
replacement for GeneratePage().

The main thing that I'm not completely happy with (and which is not yet
complete) is how the order of the WikiTransforms is specified. (It is
clear that some sort of 'order' or 'precedence' parameter is required ---
that's easy, I just haven't done it yet.) The hard part is handling the
following issues in an efficient, clean, clear way (these issues are
handled by this snapshot, but I'm not sure I'm happy with the
implementation):

 o Some transforms are "final". When they are matched, they terminate
   the page rendering.
 o Some transforms (might) need to be applied repeatedly. Consider
   constructs like "__''bold-italic''__".

Another issue is that putting the logic to handle these details into
(what is now) the inner loop (over transforms) is slow. I think I'll try
reversing the order of the loops (e.g. make the loop over lines the inner
loop) and see if that helps.

Comments welcome.

Jeff

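Since the ordering question recurs throughout this thread, here is a
minimal sketch of registering transforms with an explicit priority and
applying them in sorted order. The class and function names are
assumptions; the real WikiTransform code being discussed may differ.

<?php
// Sketch of a priority-ordered transform registry (assumed names).
class SimpleTransform {
    public $pattern;
    public $replacement;
    public $priority;   // lower number = applied earlier

    public function __construct($pattern, $replacement, $priority) {
        $this->pattern     = $pattern;
        $this->replacement = $replacement;
        $this->priority    = $priority;
    }
}

function apply_transforms($line, array $transforms) {
    // Sort once by priority so registration order does not matter.
    usort($transforms, function ($a, $b) { return $a->priority - $b->priority; });
    foreach ($transforms as $t) {
        $line = preg_replace($t->pattern, $t->replacement, $line);
    }
    return $line;
}

$transforms = array(
    new SimpleTransform("/''(.*?)''/",   '<i>$1</i>',           30),
    new SimpleTransform("/'''(.*?)'''/", '<b>$1</b>',           20),
    new SimpleTransform("/__(.*?)__/",   '<strong>$1</strong>', 10),
);

echo apply_transforms("__Bold__, '''bold''' and ''italic''", $transforms), "\n";
// -> <strong>Bold</strong>, <b>bold</b> and <i>italic</i>
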
From: Jeff D. <da...@da...> - 2000-07-17 16:09:33
In message <147...@da...>, Arno Hollosi writes:
>
>I gave this some more thought. Here's what I've come up with.

Good summary, Arno.

>Let me state again that the wikilink table can be used
>with or without link tokenization. The benefits of this table are not
>bound to tokenization.

I agree completely. I think we should implement the link table soon.
(In addition to the features we've been talking about, it will make the
back-link search fast and correct.)

>Pros:
>
>> 3. Faster page rendering.
>
>Whether or not this is true: it's a moot point.

Just to add a data point: wiki_transform takes about a second on the
current TestPage (on a PII/450). That is a fair amount of juice, and I
can see that being an issue for some (though it isn't for me, really).
I don't think the new transform code is going to be any faster.

>Jeff, I'd really like to see the class definitions of your
>transform code.

Okay! I'll send out my current working version in a separate email.

Jeff

From: Steve W. <sw...@wc...> - 2000-07-17 14:16:04
Great summary, Arno. As long as we architect 1.2 with this possibility in
mind, I'm happy.

sw

From: Arno H. <aho...@in...> - 2000-07-17 13:27:33
I gave this some more thought. Here's what I've come up with.

Let me state again that the wikilink table can be used with or without
link tokenization. The benefits of this table are not bound to
tokenization.

Pros:
 * Eliminate name collisions when merging wikis -- long term benefit
 * Easy automatic link fixing when a page is renamed -- short term
   benefit for a seldom (or not so seldom) used feature
 * Pages (and their referencing links) can be deleted easily -- short
   term benefit for a seldom (or not so seldom) used feature

Note that for the last two points, "seldom vs. often used feature"
depends on what kind of wiki you are running. In common wikis they would
be used seldom, I reckon.

Page deletes *without* deleting references can easily be done without
tokenization too.

Con:
 * Complexity, and if it becomes too complex, bugs may cause "Bad Things".

Other things mentioned:

> Undefined pages can be listed separately (ListOfUndefinedPages)

This can be done without tokenization as well.
Or is there more to this and I've overlooked something essential?

>> 3. Inversion of the pre-transform [is hairy]
>> (E.g. was the link entered as "WikiLink", "[WikiLink]",
>> or "[ WikiLink ]"?)

This is a moot point. Say links are always displayed as [WikiLink]
to editors afterwards. What's the drawback?

>> 2. Bugs in transform code are more likely to cause
>> permanent changes/losses of page content.

Only if the transform code becomes too complex.

>> 3. Faster page rendering.

Whether or not this is true: it's a moot point.

To sum it up: some (small?) short term benefits plus a long term benefit,
weighed against added complexity.

I vote for postponing this change until 1.4. Eventually it will be done,
but 1.2 is too early for this.

Let's concentrate on the high priority issues first:
 - making phpwiki more modular for easier extension and customization
 - refactoring the db interface (going OOP?)
 - adding new navigation possibilities through use of our new db schema

When this is done we can roll out 1.2. And then we can start the really
crazy things.

Jeff, I'd really like to see the class definitions of your transform
code.

/Arno

P.S.: I have to switch vacations with a colleague. This means that I'm on
vacation from Thursday until the end of July. Probably without email
access, but unable to code on phpwiki for sure :(

--
secret plan:
1. world domination
2. get all cookies
3. eat all cookies

From: Steve W. <sw...@wc...> - 2000-07-17 03:28:10
On Sun, 16 Jul 2000, Jeff Dairiki wrote:

> As of yet, I'm not at all convinced this is worth the effort. It's a big
> kettle of fish. I see three reasons to consider saving the pages in
> a partially transformed state:
>
> 1. Eliminate name collisions when merging wikis.
>
> So, I think this is a moot point.

Yes, I was recounting that I thought about the problem when I was trying
to merge two wikis, not that it was a motivating factor here. Our
criteria here should be: will this bring real benefits?

> 2. Easy automatic link fixing when a page is renamed.
>
> I don't think this makes it worth the effort. It will be easy enough
> to translate the links (e.g. with my forthcoming generalized
> wiki_transform) in the raw markup.

OK...

> 3. Faster page rendering.
>
> This might be an issue. However, if it is, I think the best way to speed
> page rendering is just to cache the transformed HTML in the database.

Oh, I hadn't thought of that... but I doubt the gain would be much.

> Drawbacks of partial pre-transforming:
>
> 1. Complexity. (Not that I'm not a fan of complexity ;-? )
> 2. Bugs in transform code are more likely to cause permanent
>    changes/losses of page content.
> 3. Inversion of the pre-transform is another kettle of fish, especially
>    if one wants to ensure that the output of the inversion matches the
>    original markup. (E.g. was the link entered as "WikiLink",
>    "[WikiLink]", or "[ WikiLink ]"?)

No doubt, there is risk. "Fortune favors the bold." :-) Point 3 is
particularly troublesome, though.

I sketched out a simple pair of algorithms to help me think about this:

On submit of a page:
  replace all links with tokens
  update list of links
  save page to db
  save list to db

On select of a page:
  fetch page
  fetch link list
  replace tokens with links
  send page to user

So what are the possible benefits? Here's a list of what I thought of:
undefined pages can be listed separately (ListOfUndefinedPages), page
names can be changed, and pages can be deleted (with links to deleted
pages replaced by a message, or a link to a different page).

Three things, initially.

sw

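A minimal sketch of the two algorithms above, assuming a numbered-token
format and hypothetical storage helpers (the names, token syntax, and
WikiWord regexp are illustrative, not actual PhpWiki code):

<?php
// On submit: replace links with tokens and keep a per-page link list;
// on select: fetch both pieces and expand the tokens again.

function tokenize_page($text, &$links) {
    $links = array();
    return preg_replace_callback('/\b(?:[A-Z][a-z]+){2,}\b/',
        function ($m) use (&$links) {
            $links[] = $m[0];
            return '[[link:' . (count($links) - 1) . ']]';
        }, $text);
}

function detokenize_page($text, $links) {
    return preg_replace_callback('/\[\[link:(\d+)\]\]/',
        function ($m) use ($links) {
            return $links[(int)$m[1]];
        }, $text);
}

// Submit path:
$raw = "See the FrontPage and RecentChanges for details.";
$tokenized = tokenize_page($raw, $links);
// save_page($pagename, $tokenized);        // hypothetical db helpers
// save_link_list($pagename, $links);

// Select path:
echo detokenize_page($tokenized, $links), "\n";
// -> See the FrontPage and RecentChanges for details.
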
From: Jeff D. <da...@da...> - 2000-07-16 17:39:32
In message <147...@da...>, Arno Hollosi writes:

>>> Or do you think we should store the intermediate state instead?
>>
>> Yes, this is an idea I had a long time ago.
>> I thought that [...] it would open up some interesting possibilities
>
>Sure :o)
>I'm in favour of this change. We have to think about some side effects
>like links to pages not yet existing.

As of yet, I'm not at all convinced this is worth the effort. It's a big
kettle of fish. I see three reasons to consider saving the pages in a
partially transformed state:

1. Eliminate name collisions when merging wikis.

   Well, before we can merge wikis we need an interchange format.
   (Currently, this is shaping up to be my zipfile.) Then, page name
   tokenization only helps if the interchanged format contains tokenized
   names. I'm pretty sure this is not the greatest idea.

   So, I think this is a moot point.

2. Easy automatic link fixing when a page is renamed.

   I don't think this makes it worth the effort. It will be easy enough
   to translate the links (e.g. with my forthcoming generalized
   wiki_transform) in the raw markup.

3. Faster page rendering.

   This might be an issue. However, if it is, I think the best way to
   speed page rendering is just to cache the transformed HTML in the
   database.

Drawbacks of partial pre-transforming:

1. Complexity. (Not that I'm not a fan of complexity ;-? )
2. Bugs in transform code are more likely to cause permanent
   changes/losses of page content.
3. Inversion of the pre-transform is another kettle of fish, especially
   if one wants to ensure that the output of the inversion matches the
   original markup. (E.g. was the link entered as "WikiLink",
   "[WikiLink]", or "[ WikiLink ]"?)

Jeff

From: Jeff D. <da...@da...> - 2000-07-16 17:25:25
I've just checked in a new version of wiki_zip.php3 (& wiki_adminform.php).
This does away with the secret zip header field for meta-data.

The pages are stored as MIME e-mail messages, with the meta-data stored
as parameters in the Content-Type: header.

I also added the ability to make a zip including the archived versions of
the pages. In this case you still get one file per page, formatted as a
multipart MIME message: one part for each version of the page.

Jeff

From: Steve W. <sw...@wc...> - 2000-07-16 16:23:47
I added comments to some of the code this morning as I was figuring out
what it does... it's up on the FTP site now, which I cannot reach at the
moment, so pick the URL off http://phpwiki.sourceforge.net/ if you can
reach it.

sw

From: Steve W. <sw...@wc...> - 2000-07-16 06:55:35
I've checked in a version of wiki_dbmlib.php3 that should allow us to
refactor the access to the data store. I checked out a fresh copy from
CVS and tested it, and it looks good (I can edit, save, view info, diff,
and retrieve the copy from the archive).

This includes new code to pad out the serialized data (with spaces, via
sprintf()) to make DBM files more space efficient in the long run.

I am going to drop the HTML form for rebuilding the DBM files since
a) there's a Perl script, b) it would have been a very hairy process with
lots of file operations through a web browser, which scared me, and
c) space loss shouldn't be too bad anymore.

sw

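A small sketch of the pad-on-write / trim-on-read idea mentioned above,
so a DBM value slot can be reused instead of growing. The 500-byte block
size and helper names are assumptions (and str_pad() is used here for
brevity where the message mentions sprintf()); this is not the actual
wiki_dbmlib code.

<?php
define("PAD_BLOCK", 500);

function pad_record($serialized) {
    // Round the stored value up to a multiple of PAD_BLOCK with spaces,
    // so a slightly larger rewrite can still reuse the same DBM slot.
    $len = (int)ceil(strlen($serialized) / PAD_BLOCK) * PAD_BLOCK;
    return str_pad($serialized, $len, " ");
}

function unpad_record($stored) {
    // Safe for serialize() output, which never ends in a space.
    return rtrim($stored, " ");
}

$page = serialize(array("version" => 3, "content" => "Some page text."));
$stored = pad_record($page);
$roundtrip = unserialize(unpad_record($stored));
var_dump(strlen($page), strlen($stored), $roundtrip["version"]);
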
From: Steve W. <sw...@wc...> - 2000-07-16 04:09:04
I have been working on the refactoring of the DBM support, and came back
to the problem of the memory leak. I wrote a test script and did some
experimenting, and here's what I found:

When you insert a key/value pair into a DBM and that key already exists:
if the new value is less than or equal to the existing value in size, the
space is reused; otherwise new space is allocated and the old space is
not reclaimed. The dbmdelete() function does not help at all.

If you loop over a DBM and delete all key/value pairs, and then replace
them with the same pairs, there is no change in the file size. However,
if you delete all pairs, close the DBM, reopen it and reinsert all
key/value pairs, the DBM file size doubles. If you delete all pairs and
insert new ones with slightly larger value sizes, you more than double
the file size.

This would suggest (if I feel ambitious) that for a DBM implementation
all pages should be padded out to a certain size, say 500 bytes, and when
they are fetched the padding is stripped. This probably wouldn't be too
hard with the Perl regexp package and spaces or $FieldSeparator.

Test script below,

sw

<?
$page = implode("", file("pgsrc/AddingPages"));
$h = dbmopen("/tmp/AAA", "c");
$time = time();
for ($x = 0; $x < 500; $x++) {
    if (dbmexists($h, "$x")) {
        $page = dbmfetch($h, "$x");
        $page .= "$x$time";
    }
    dbmdelete($h, "$x");
    dbmreplace($h, "$x", $page);
}
echo system("ls -l /tmp/AAA");
dbmclose($h);
?>

From: Steve W. <sw...@wc...> - 2000-07-15 21:58:37
I've posted to freshmeat.net, updated the links on
http://phpwiki.sourceforge.net/phpwiki/, and downloaded and tested the
tarball to make sure it's good, and it is...

sw
