From: Reini U. <ru...@x-...> - 2004-05-08 22:55:57
Reini Urban wrote:
> I have now found the problem with the current InlineParser, and why it
> fails only on sf.net:
> PHP at sf.net has less memory for regular expressions than a typical
> PHP. Both have an 8M memory_limit, but anchored PCRE regexes apparently
> allocate memory from somewhere else.

The problem on http://phpwiki.sf.net/demo/ is fixed. It was not memory
after all; it was an endless loop caused by an empty definition of
WIKI_NAME_REGEXP, which I have now fixed in IniConfig.php. This was the
one constant that wasn't checked for its default setting.

Anyway, the huge regexp string is gone now as well, and inline parsing
as a whole is a lot better: it falls back to the previous hairy code
only if two conflicting markups are found in the same block.

> The problem is RegexpSet::_match with the huge regexp string, which,
> with the added Inline plugin markup, now overflows its limit.
>
> The pattern is constructed from
> $pat= "/ ( . $repeat ) ( (" . join(')|(', $regexps) . ") ) /Asx";
> The modifier A (ANCHORED) forces each match to start at the beginning
> of the block. $regexps is an array of 10 rather complicated regex
> strings, and $repeat goes from "*?" up to {nn} towards the end, so the
> prematch gets longer and longer until nothing more is found and the
> final "$" regexp matches. This ends the loop.
>
> On sf.net we don't get an endless loop; instead we run out of memory,
> because the same huge regexp is matched (anchored) over and over until
> $repeat gets large enough. The capture groups tell match() which of the
> regexps actually matched, so that it can then recurse into shorter
> substrings.
>
> I have now rewritten that critical part to be somewhat slower but to
> need much less memory. We don't really need to join the regexps array
> into one string; it is sufficient to loop through all regexps until a
> balanced or simple markup matches.
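A minimal PHP sketch of the loop-over-regexps idea follows, for illustration only; it is not the actual PhpWiki RegexpSet code. The function name find_first_markup() and the sample patterns are hypothetical. The sketch runs each regexp on its own and keeps the earliest match, preferring the longer one on a tie, rather than stopping at the first regexp that matches anywhere. It also skips empty patterns, on the assumption that an empty regexp (such as an empty WIKI_NAME_REGEXP) matches the empty string at every position and can keep a parser loop from advancing.

```php
<?php
// Hypothetical sketch, not PhpWiki's RegexpSet::_match(): try each markup
// regexp on its own instead of joining them into one big /A alternation.

/**
 * Find the earliest markup match in $text.
 *
 * @param string   $text    text to scan
 * @param string[] $regexps bare patterns without delimiters (assumed to
 *                          contain no unescaped '/')
 * @return array|null [index into $regexps, byte offset, matched text],
 *                    or null if no regexp matches
 */
function find_first_markup(string $text, array $regexps): ?array
{
    $best = null;
    foreach ($regexps as $i => $re) {
        // An empty pattern matches the empty string at every position,
        // which can keep a parser loop from ever advancing: skip it.
        if ($re === '') {
            continue;
        }
        // Unanchored search (no /A): PCRE finds the leftmost match itself.
        if (!preg_match('/' . $re . '/sx', $text, $m, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        [$matched, $offset] = $m[0];
        // Keep the earliest match; on a tie prefer the longer one.
        if ($best === null
            || $offset < $best[1]
            || ($offset === $best[1] && strlen($matched) > strlen($best[2]))) {
            $best = [$i, $offset, $matched];
        }
    }
    return $best;
}
```

A caller would pass the text before the returned offset through unchanged, handle the matched markup, and resume scanning after offset + strlen(match). Each step runs several small regexps instead of one huge one, which trades some speed for memory in the way the quoted mail describes.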
>
> The problem is that the longest substring should be favoured, so that
> the parser can recurse into matches; that's what /A is for. For
> example, "<small>*WikiWord*</small>" has to match the balanced <small>
> tag first, then the *...* emphasis, and finally the WikiWord inside.
>
> The largest partial regexp is the interwiki map, which constructs
> "(moniker1:|moniker2:|moniker3:|moniker4:|moniker5:|moniker6:|moniker7:|...)"

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
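The outer-to-inner order described for "<small>*WikiWord*</small>" can be sketched in PHP as a small recursive parser. This sketch does not use the /A prefix-growing technique from the mail. Instead, it substitutes a plain leftmost search per rule, keeps the longer match on a tie, and recurses into the captured content. The rule names, the three patterns, and the nested-array output format are hypothetical.

```php
<?php
// Hypothetical sketch: recursive inline parsing that matches the outer
// markup first and then recurses into its content. Rule names, patterns
// and the nested-array output format are illustrative, not PhpWiki's.

const INLINE_RULES = [
    'small'    => '/<small>(.*?)<\/small>/s',             // balanced tag
    'emphasis' => '/\*([^*]+)\*/',                        // *...* emphasis
    'wikiword' => '/\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b/',  // WikiWord
];

/** Parse $text into a list of plain strings and [rule => children] nodes. */
function parse_inline(string $text): array
{
    $out = [];
    while ($text !== '') {
        // Leftmost match over all rules; on a tie keep the longer one,
        // so markup that encloses another match at the same position wins.
        $best = null;
        foreach (INLINE_RULES as $name => $re) {
            if (!preg_match($re, $text, $m, PREG_OFFSET_CAPTURE)) {
                continue;
            }
            [$whole, $pos] = $m[0];
            if ($best === null || $pos < $best['pos']
                || ($pos === $best['pos'] && strlen($whole) > $best['len'])) {
                $best = ['name' => $name, 'pos' => $pos,
                         'len' => strlen($whole), 'inner' => $m[1][0]];
            }
        }
        if ($best === null) {
            $out[] = $text; // no more markup: the rest is plain text
            break;
        }
        if ($best['pos'] > 0) {
            $out[] = substr($text, 0, $best['pos']);
        }
        // A WikiWord is a leaf; other markup is parsed recursively.
        $children = $best['name'] === 'wikiword'
            ? [$best['inner']]
            : parse_inline($best['inner']);
        $out[] = [$best['name'] => $children];
        $text = (string) substr($text, $best['pos'] + $best['len']);
    }
    return $out;
}
```

With these rules, parse_inline('<small>*WikiWord*</small>') returns the small node containing the emphasis node, which in turn contains the WikiWord. An interwiki rule would add one more pattern built from the moniker map, and that alternation grows with every entry in the map.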