Re: [Phpwiki-talk] current CVS InlineParser problem fixed


Reini Urban wrote:
> I have now found why the current InlineParser fails only 
> on sf.net:
> The PHP at sf.net appears to have less memory available for regular 
> expressions than a typical PHP. Both have an 8M memory_limit, but 
> anchored PCRE regexes apparently allocate from somewhere else.

Problem on http://phpwiki.sf.net/demo/ fixed.

It was not memory after all. It was an endless loop caused by an empty 
definition of WIKI_NAME_REGEXP, which I have now fixed in IniConfig.php.
This particular constant was not checked for its default setting.
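
A minimal sketch of why an empty pattern can loop forever, and of the kind 
of default check that avoids it. This is not the actual IniConfig.php code: 
regexp_or_default(), count_tokens(), the step cap and the default pattern 
are all hypothetical names and values chosen for illustration.

```php
<?php
// Hypothetical helper, not PhpWiki's actual code: use $default whenever the
// configured value is missing or empty.
function regexp_or_default($configured, $default)
{
    $configured = trim((string) $configured);
    return $configured === '' ? $default : $configured;
}

// A naive scanner that advances by the length of each anchored match.
// A zero-length match never moves it forward, so without the $maxSteps
// safety cap (added only so this demo terminates) it would spin forever.
function count_tokens($regexp, $text, $maxSteps = 100)
{
    $pos = 0;
    $steps = 0;
    while ($pos < strlen($text) && $steps < $maxSteps) {
        $steps++;
        if (preg_match('/' . $regexp . '/A', $text, $m, 0, $pos)) {
            $pos += strlen($m[0]);   // adds 0 for an empty match
        } else {
            $pos++;                  // no token here, skip one character
        }
    }
    return $steps;
}

// Assumed default pattern, for illustration only.
$default = '[A-Z][a-z]+(?:[A-Z][a-z]+)+';

// The empty pattern matches the empty string at offset 0 again and again.
$stuck = count_tokens('', 'WikiWord and more');

// With the default substituted, the scanner reaches the end of the text.
$ok = count_tokens(regexp_or_default('', $default), 'WikiWord and more');
```

Here $stuck ends up at the cap of 100 steps, while $ok stays well below it.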

Anyway, the huge regexp string is now gone as well, and the whole inline 
parsing is now a lot better: it falls back to the previous hairy code 
only if two conflicting markups are found in the same block.

> The problem is RegexpSet::_match with the huge regexp string, which now, 
> with the added inline plugin markup, overflows its limit.
> 
> The pattern is constructed from
>   $pat= "/ ( . $repeat ) ( (" . join(')|(', $regexps) . ") ) /Asx";
> The modifier A (ANCHORED) tells pcre to store the block; $regexps is an 
> array of 10 rather complicated regex strings; and $repeat goes from 
> "*?" to "{nn}" towards the end, so that the prematch gets longer and 
> longer until nothing more is found and the final "$" regexp 
> matches. This ends the loop.
> 
> On sf.net we don't get an endless loop; instead we run out of memory 
> because of the continued anchored matching of the same huge regexp 
> until $repeat gets large enough. The /A tells pcre to store the matching 
> block to notify match() which regexps actually matched, and to be able 
> to recurse into shorter substrings afterwards.
> 
> I have now rewritten that critical part; it is somewhat slower but needs 
> much less memory.
> We don't really need to string-join the regexps array together.
> It is sufficient to loop through all regexps until one balanced or 
> simple markup matches.
> The problem is that the longest substring should be favoured, so that it 
> recurses into matches; that's what /A is for.
> E.g. for "<small>*WikiWord*</small>" it has to match the balanced 
> <small> tag first, then the *...* emphasis, and finally the wikiword 
> inside.
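
A rough sketch of the "loop through the regexps separately" idea described 
above, not the actual RegexpSet code. The function first_markup_match() and 
the three simplified patterns are made up for illustration. Each regexp is 
tried on its own; the earliest match wins, and on a tie the longest one.

```php
<?php
// Hypothetical sketch: try each regexp separately instead of joining them
// into one huge alternation. Returns the winning index, offset and text,
// or null if nothing matches.
function first_markup_match($regexps, $text)
{
    $best = null;
    foreach ($regexps as $i => $re) {
        if (!preg_match('/' . $re . '/sx', $text, $m, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        list($match, $pos) = $m[0];
        if ($match === '') {
            continue;   // never accept an empty match
        }
        // Prefer the earliest match; on a tie prefer the longest one.
        if ($best === null
            || $pos < $best['pos']
            || ($pos == $best['pos'] && strlen($match) > strlen($best['match']))) {
            $best = array('index' => $i, 'pos' => $pos, 'match' => $match);
        }
    }
    return $best;
}

// Simplified stand-ins for the real markup regexps (assumptions).
$regexps = array(
    '<small>.*?<\/small>',               // 0: balanced tag
    '\*[^*]+\*',                         // 1: emphasis
    '\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b',   // 2: WikiWord
);

$outer  = first_markup_match($regexps, '<small>*WikiWord*</small>');
$middle = first_markup_match($regexps, '*WikiWord*');   // body of the tag
$inner  = first_markup_match($regexps, 'WikiWord');     // body of emphasis
```

For the example string this picks the balanced tag first (index 0), then 
the emphasis (index 1) and then the WikiWord (index 2) as the caller 
recurses into each matched body.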
> 
> The largest partial regexp is the interwiki map, which constructs 
> "(moniker1:|moniker2:|moniker3:|moniker4:|moniker5:|moniker6:|moniker7:|...)" 
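
To illustrate how such an alternation grows with the map, here is a small 
sketch that builds it from a list of monikers. The function 
interwiki_regexp() and the three sample monikers are assumptions made for 
this example; the real interwiki map is much longer.

```php
<?php
// Hypothetical sketch: build "(moniker1:|moniker2:|...)" from a moniker
// list. preg_quote() escapes regex metacharacters and the "/" delimiter.
function interwiki_regexp($monikers)
{
    $quoted = array();
    foreach ($monikers as $moniker) {
        $quoted[] = preg_quote($moniker, '/') . ':';
    }
    return '(' . join('|', $quoted) . ')';
}

// Sample monikers, invented for illustration.
$re = interwiki_regexp(array('Wikipedia', 'MeatBall', 'C++Wiki'));

// Anchored test against the start of a link.
$found = preg_match('/' . $re . '/A', 'MeatBall:FrontPage', $m);
```

Every moniker adds one more branch, so a long map makes this single 
alternative by far the biggest part of any joined pattern.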
-- 
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/