From: Campo W. <rf...@nl...> - 2007-10-01 12:52:04
On Mon, Oct 01, 2007 at 01:35:02PM +0200, Sabri LABBENE wrote:
> Reini Urban wrote:
> >Campo Weijerman schrieb:
> >> On Fri, Sep 28, 2007 at 10:57:24AM +0200, Sabri LABBENE wrote:
> >>> Hi all,
> >>> I'm using phpwiki-1.3.12 and I'm trying to make it recognize CamelCase
> >>> words with numbers inside as wikiwords, for example:
> >>> - CamelCase2 -> is a wikiword
> >>> - Camel2Case -> is also a wiki word
> >>> - 2CamelCase -> is also a wiki word
> >>>
> >>> I think there should be a regular expression somewhere in the code that
> >>> decides if a word is a wikiword. Can someone tell me where to find it?
> >>> Will there be any side effects if numbers are allowed in wikiwords?
> >>
> >> Hi,
> >>
> >> We had a similar requirement and solved it back with phpwiki 1.3.3 by
> >> changing the definition of $WikiNameRegexp in index.php
> >>
> >> With more recent releases there is WIKI_NAME_REGEXP in
> >> config/config.ini
> >>
> >> It takes some tweaking to arrive at the right compromise between the
> >> regex being too wide or too narrow. I think too wide is worse than
> >> too narrow: you can always force linking to a page by putting the name
> >> in [brackets], which is less painful than having to escape every other
> >> word on a page...
>
> I tried the regexp and it keeps catching CamelCase words without digits
> inside. I don't understand why you need to escape some other words in your
> page. Maybe you have a requirement to link only pagenames that contain
> digits.
Sure. The problem is, if you start tweaking the regexp it is easy to
come up with something that considers too many words a WikiWord, and
you'll end up having to escape lots of words.
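
To illustrate the "too wide" problem, here is a minimal sketch in Python. It is not PhpWiki code: the POSIX classes are replaced by ASCII ranges because Python's re module does not support [[:upper:]] and friends, and TOO_WIDE is a hypothetical pattern made up for the comparison.

```python
import re

AN = "[A-Za-z0-9]"  # ASCII stand-in for [[:alnum:]] (assumption)

# Default WIKI_NAME_REGEXP from this thread, translated to ASCII ranges.
DEFAULT = re.compile(rf"(?<!{AN})(?:[A-Z][a-z]+){{2,}}(?!{AN})")

# Hypothetical over-wide pattern: any capitalized word of 2+ characters.
TOO_WIDE = re.compile(rf"(?<!{AN})[A-Z]{AN}+(?!{AN})")

TEXT = "The Server runs PhpWiki on Linux."

default_hits = DEFAULT.findall(TEXT)
wide_hits = TOO_WIDE.findall(TEXT)
print(default_hits)  # only the bumpy word
print(wide_hits)     # every capitalized word, each of which would need escaping
```

With the wide pattern, ordinary words such as "The" and "Linux" would be linked, which is the escaping burden described above.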
> >> We have been using this for years now:
> >>
> >> WIKI_NAME_REGEXP = "(?<![[:alnum:]])[[:upper:]][[:alnum:]]*?[[:lower:]][[:alnum:]]*?[[:upper:]][[:alnum:]]*(?![[:alnum:]])";
> >>
> >> Btw, the default is
> >>
> >> WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:]]+){2,}(?![[:alnum:]])"
> >
> >config-dist.ini in CVS has these options:
> >http://phpwiki.cvs.sourceforge.net/phpwiki/phpwiki/config/config-dist.ini?revision=1.83&view=markup
> >
> >; Perl regexp for WikiNames ("bumpy words"):
> >; (?<!..) & (?!...) used instead of '\b' because \b matches '_' as well
> >; Allow digits: BumpyVersion132
> >; WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:][:digit:]]+){2,}(?![[:alnum:]])"
> >; Allow lower+digits+dots: BumpyVersion1.3.2
> >; WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:][:digit:]\.]+){2,}(?![[:alnum:]])"
> >; Default old behaviour, no digits as lowerchars.
> >;WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:]]+){2,}(?![[:alnum:]])"
>
> Great, it works !
> Thanks Reini and Campo.
Actually, the suggestions offered by Reini better match what you asked
for. I use phpwiki mostly for documenting IT-related stuff, and as we
all know, many acronyms are used there. The traditional definition of
WikiWord will not include anything containing an embedded acronym, such
as DocBookXML2LaTeX (at least, I don't think it does). The alternative
regexp I am now using will match any sequence of non-blank,
non-punctuation characters that starts with a capital letter and
alternates sufficiently between lowercase and uppercase. This works
pretty well.
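
For anyone who wants to compare the three regexps from this thread, here is a rough sketch in Python. It is an approximation only: PhpWiki uses PCRE, and I have translated the POSIX classes to ASCII ranges, since Python's re module does not support them. The names PATTERNS and is_wikiword are made up for this example.

```python
import re

# ASCII translations of the POSIX classes used in the thread (assumption:
# in PCRE, [[:upper:]] etc. may be locale-aware; these ranges are not).
UP, LO, AN = "[A-Z]", "[a-z]", "[A-Za-z0-9]"

PATTERNS = {
    # Default: two or more "Capital + lowercase run" humps.
    "default": rf"(?<!{AN})(?:{UP}{LO}+){{2,}}(?!{AN})",
    # Reini's "allow digits" option: digits count as lowercase.
    "digits": rf"(?<!{AN})(?:{UP}[a-z0-9]+){{2,}}(?!{AN})",
    # Campo's alternative: upper, later a lower, later an upper again.
    "alternating": rf"(?<!{AN}){UP}{AN}*?{LO}{AN}*?{UP}{AN}*(?!{AN})",
}

def is_wikiword(name, word):
    """Return True if the named pattern matches the whole word."""
    return re.fullmatch(PATTERNS[name], word) is not None

WORDS = ["CamelCase", "CamelCase2", "Camel2Case", "2CamelCase",
         "DocBookXML2LaTeX", "IBM"]

for word in WORDS:
    verdicts = {name: is_wikiword(name, word) for name in PATTERNS}
    print(word, verdicts)
```

Under this translation, only the alternating pattern accepts DocBookXML2LaTeX, and none of the three accepts a word with a leading digit such as 2CamelCase, because each pattern requires an uppercase first character.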
Regards,
--
$_ = "Campo Weijerman [rfc822://nl.ibm.com/]" and tr-[:]/-<@>-d and print;