From: Campo W. <rf...@nl...> - 2007-10-01 12:52:04
On Mon, Oct 01, 2007 at 01:35:02PM +0200, Sabri LABBENE wrote:
> Reini Urban wrote:
> >Campo Weijerman wrote:
> >> On Fri, Sep 28, 2007 at 10:57:24AM +0200, Sabri LABBENE wrote:
> >>> Hi all,
> >>> I'm using phpwiki-1.3.12 and I'm trying to make it recognize
> >>> CamelCase words with numbers inside as wikiwords, for example:
> >>> - CamelCase2 -> is a wikiword
> >>> - Camel2Case -> is also a wikiword
> >>> - 2CamelCase -> is also a wikiword
> >>>
> >>> I think there should be a regular expression somewhere in the
> >>> code that decides if a word is a wikiword. Can someone tell me
> >>> where to find it? Will there be any side effects when numbers
> >>> are allowed in wikiwords?
> >>
> >> Hi,
> >>
> >> We had a similar requirement and solved it back with phpwiki 1.3.3 by
> >> changing the definition of $WikiNameRegexp in index.php
> >>
> >> With more recent releases there is WIKI_NAME_REGEXP in
> >> config/config.ini
> >>
> >> It takes some tweaking to arrive at the right compromise between the
> >> regex being too wide or too narrow. I think too wide is worse than
> >> too narrow: you can always force linking to a page by putting the name
> >> in [brackets], which is less painful than having to escape every other
> >> word on a page...
>
> I tried the regexp and it keeps catching CamelCase words without digits
> inside. I don't understand why you need to escape some other words in your
> page. Maybe you have a requirement to only link page names that contain
> digits.

Sure. The problem is, if you start tweaking the regexp it is easy to
come up with something that considers too many words a WikiWord, and
you'll end up having to escape lots of words.

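
One way to judge how wide a candidate pattern is, before putting it in
config.ini, is to run it over some typical page text and look at what it
would link. The following is only a rough sketch in Python, not PHP: the
POSIX classes ([[:upper:]] and so on) are replaced by ASCII ranges
because Python's re module does not support them, so the results may
differ from phpwiki for non-ASCII text. The names PATTERNS, SAMPLE and
wikiwords() are made up for this illustration, and the sample sentence
is invented.

```python
import re

# ASCII stand-ins for the POSIX classes used in config.ini (an assumption:
# PCRE's [[:upper:]] etc. can cover more than ASCII, depending on locale).
UPPER, LOWER, ALNUM = "A-Z", "a-z", "A-Za-z0-9"
HEAD, TAIL = f"(?<![{ALNUM}])", f"(?![{ALNUM}])"

PATTERNS = {
    # default: two or more runs of "capital followed by lowercase letters"
    "default": f"{HEAD}(?:[{UPPER}][{LOWER}]+){{2,}}{TAIL}",
    # config-dist.ini variant that lets digits count as lowercase
    "digits": f"{HEAD}(?:[{UPPER}][{LOWER}0-9]+){{2,}}{TAIL}",
    # alternating-case pattern: capital ... lowercase ... capital
    "alternating": (
        f"{HEAD}[{UPPER}][{ALNUM}]*?[{LOWER}][{ALNUM}]*?"
        f"[{UPPER}][{ALNUM}]*{TAIL}"
    ),
}

# Hypothetical page text mixing the words from this thread with acronyms.
SAMPLE = (
    "PhpWiki links CamelCase2 and Camel2Case, but what about "
    "2CamelCase, OpenSSH, MySQL or DocBookXML2LaTeX?"
)


def wikiwords(pattern_name, text):
    """Return every substring of text that the named pattern would link."""
    return re.findall(PATTERNS[pattern_name], text)


if __name__ == "__main__":
    for name in PATTERNS:
        print(f"{name:12} {wikiwords(name, SAMPLE)}")
```

The longer the list a pattern returns for ordinary prose, the more words
you would have to escape by hand. Under these ASCII approximations none
of the three patterns accepts 2CamelCase, since each one requires a
leading capital letter.
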
> >> We have been using this for years now:
> >>
> >> WIKI_NAME_REGEXP =
> >> "(?<![[:alnum:]])[[:upper:]][[:alnum:]]*?[[:lower:]][[:alnum:]]*?[[:upper:]][[:alnum:]]*(?![[:alnum:]])";
> >>
> >> Btw, the default is
> >>
> >> WIKI_NAME_REGEXP =
> >> "(?<![[:alnum:]])(?:[[:upper:]][[:lower:]]+){2,}(?![[:alnum:]])"
> >
> >config-dist.ini in CVS has these options:
> >http://phpwiki.cvs.sourceforge.net/phpwiki/phpwiki/config/config-dist.ini?revision=1.83&view=markup
> >
> >; Perl regexp for WikiNames ("bumpy words"):
> >; (?<!..) & (?!...) used instead of '\b' because \b matches '_' as well
> >; Allow digits: BumpyVersion132
> >; WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:][:digit:]]+){2,}(?![[:alnum:]])"
> >; Allow lower+digits+dots: BumpyVersion1.3.2
> >; WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:][:digit:]\.]+){2,}(?![[:alnum:]])"
> >; Default old behaviour, no digits as lowerchars.
> >;WIKI_NAME_REGEXP = "(?<![[:alnum:]])(?:[[:upper:]][[:lower:]]+){2,}(?![[:alnum:]])"
>
> Great, it works!
> Thanks Reini and Campo.

Actually, the suggestions offered by Reini better match what you asked
for.

I use phpwiki mostly for documenting IT-related stuff, and as we all
know there are many acronyms in use. The traditional definition of
WikiWord will not include anything containing an embedded acronym, like
for example DocBookXML2LaTeX (at least, I don't think it does).

The alternative regexp I am now using will match any sequence of
non-blank, non-punctuation characters that starts with a capital letter
and alternates sufficiently between lowercase and uppercase, that is, it
has at least one lowercase letter followed somewhere later by another
capital. This works pretty well.

Regards,
--
$_ = "Campo Weijerman [rfc822://nl.ibm.com/]" and tr-[:]/-<@>-d and print;