Re: [Phpwiki-talk] Numbers in wikiwords


On Mon, Oct 01, 2007 at 01:35:02PM +0200, Sabri LABBENE wrote:
> Reini Urban wrote:
> >Campo Weijerman schrieb:
> >> On Fri, Sep 28, 2007 at 10:57:24AM +0200, Sabri LABBENE wrote:
> >>> Hi all,
> >>> I'm using phpwiki-1.3.12 and I'm trying to make it recognize
> >>> CamelCase words with numbers inside as wikiwords, for example:
> >>> - CamelCase2 -> is a wikiword
> >>> - Camel2Case -> is also a wiki word
> >>> - 2CamelCase -> is also a wiki word
> >>>
> >>> I think there should be a regular expression somewhere in the code
> >>> that decides if a word is a wikiword. Can someone tell me where to
> >>> find it? Will there be any side effects if numbers are allowed in
> >>> wikiwords?
> >> 
> >> Hi,
> >> 
> >> We had a similar requirement and solved it back with phpwiki 1.3.3 by 
> >> changing the definition of $WikiNameRegexp in index.php
> >> 
> >> With more recent releases there is WIKI_NAME_REGEXP in 
> >> config/config.ini
> >> 
> >> It takes some tweaking to arrive at the right compromise between the 
> >> regex being too wide or too narrow.  I think too wide is worse than 
> >> too narrow: you can always force linking to a page by putting the name 
> >> in [brackets], which is less painful than having to escape every other 
> >> word on a page...
> 
> I tried the regexp and it still catches CamelCase words without digits
> inside. I don't understand why you need to escape some other words in your
> page. Maybe you have a requirement to link only page names that contain
> digits.

Sure.  The problem is, if you start tweaking the regexp it is easy to
come up with something that considers too many words a WikiWord, and
you'll end up having to escape lots of words.

> >> We have been using this for years now:
> >> 
> >> WIKI_NAME_REGEXP = "(?<![[:alnum:]])[[:upper:]][[:alnum:]]*?[[:lower:]][[:alnum:]]*?[[:upper:]][[:alnum:]]*(?![[:alnum:]])";
> >> 
> >> Btw, the default is
> >> 
> >> WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:]]+){2,}(?![[:alnum:]])"
> >
> >config-dist.ini in CVS has these options:
> >http://phpwiki.cvs.sourceforge.net/phpwiki/phpwiki/config/config-dist.ini?revision=1.83&view=markup
> >
> >; Perl regexp for WikiNames ("bumpy words"):
> >;   (?<!..) & (?!...) used instead of '\b' because \b matches '_' as well
> >; Allow digits: BumpyVersion132
> >;   WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:][:digit:]]+){2,}(?![[:alnum:]])"
> >; Allow lower+digits+dots: BumpyVersion1.3.2
> >;   WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:][:digit:]\.]+){2,}(?![[:alnum:]])"
> >; Default old behaviour, no digits as lowerchars.
> >;WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:]]+){2,}(?![[:alnum:]])"
> 
> Great, it works !
> Thanks Reini and Campo.
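
For anyone who wants to check the "allow digits" variant against the
three example words before editing config.ini, here is a minimal
sketch in Python.  It is only an approximation: Python's re module
does not support the POSIX classes used above, so I am assuming
ASCII-only stand-ins ([A-Z], [a-z0-9], [A-Za-z0-9]), and the helper
name is_wikiword is made up for the illustration.  PhpWiki itself
runs the pattern through PHP's PCRE, which may behave differently for
non-ASCII page names.

```python
import re

# ASCII-only stand-ins for [[:upper:]], [[:lower:]], [[:digit:]] and
# [[:alnum:]]; this translation is an assumption, not PhpWiki code.
DEFAULT = r"(?<![A-Za-z0-9])(?:[A-Z][a-z]+){2,}(?![A-Za-z0-9])"
DIGITS = r"(?<![A-Za-z0-9])(?:[A-Z][a-z0-9]+){2,}(?![A-Za-z0-9])"


def is_wikiword(pattern, word):
    """Return True if the whole word matches the given pattern."""
    return re.fullmatch(pattern, word) is not None


# Print, for each word, whether the default and the digits variant accept it.
for word in ("CamelCase", "CamelCase2", "Camel2Case", "2CamelCase"):
    print(word, is_wikiword(DEFAULT, word), is_wikiword(DIGITS, word))
```

If the translation is faithful, the digits variant should accept
CamelCase2 and Camel2Case but not 2CamelCase, because every "bump"
still has to start with an uppercase letter.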

Actually, the suggestions offered by Reini better match what you asked
for.  I use phpwiki mostly for documenting IT-related stuff, and as we
all know, many acronyms are used there.  The traditional definition of
WikiWord will not include anything containing an embedded acronym, like
for example DocBookXML2LaTeX (at least, I don't think it does).  The
alternative regexp I am now using will match any sequence of
alphanumeric characters that starts with a capital letter, then
contains a lowercase letter, and then another uppercase letter.  This
works pretty well.
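
To illustrate the difference on the acronym example, here is a small
Python sketch that runs both patterns over a sample sentence.  As a
caveat, Python's re module has no POSIX classes, so the patterns are
rewritten with ASCII ranges; that rewrite, the sample text, and the
helper name find_wikiwords are my assumptions for the illustration,
not anything taken from PhpWiki.

```python
import re

# ASCII approximations of the two WIKI_NAME_REGEXP values discussed above.
DEFAULT = r"(?<![A-Za-z0-9])(?:[A-Z][a-z]+){2,}(?![A-Za-z0-9])"
ALTERNATIVE = (
    r"(?<![A-Za-z0-9])"
    r"[A-Z][A-Za-z0-9]*?"   # starts with an uppercase letter
    r"[a-z][A-Za-z0-9]*?"   # then a lowercase letter somewhere
    r"[A-Z][A-Za-z0-9]*"    # then another uppercase letter
    r"(?![A-Za-z0-9])"
)


def find_wikiwords(pattern, text):
    """Return every substring of text that the pattern treats as a WikiWord."""
    return re.findall(pattern, text)


text = "See DocBookXML2LaTeX and CamelCase, but not IBM or Hello."
print(find_wikiwords(DEFAULT, text))
print(find_wikiwords(ALTERNATIVE, text))
```

Under these assumptions the default pattern should pick up only
CamelCase, while the alternative should also link DocBookXML2LaTeX
and still leave IBM and Hello alone.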

Regards,
-- 
$_ = "Campo Weijerman [rfc822://nl.ibm.com/]" and tr-[:]/-<@>-d and print;