I too see the long preg_match regex error, even when just running runJobs.php to clear the job queue.  Perhaps it has something to do with Unicode pages with lots of semantic data (the only connection I can see between Temlakos' wiki and mine).  Please let me know how I can help track down the source of this glitch.


On Fri, Jul 11, 2008 at 8:07 PM, Temlakos <temlakos@gmail.com> wrote:
Markus Krötzsch wrote:
> On Freitag, 11. Juli 2008, Temlakos wrote:
>> Everyone:
>> Several weeks ago, I finally figured out how to install SMW's
>> maintenance scripts as symlinks in my server's wiki maintenance
>> subdirectory so that I could run them.
>> But when I ran SMW_refreshData.php, I got multiple warnings saying that
>> a call to preg_match failed on an overly long regular expression. The
>> implicated file was my custom "historical date" file. And after multiple
>> such "warnings," the execution of the file finally ended with one word:
>> "Killed."
>> Markus, I believe you have a copy of the historical-date file
>> (SMW_DV_HxDate.php). The longest regular expression (regex) in it is
>> $screenpat, and my file calls preg_match with that string in order to
>> screen out date texts that are not in a form that the script would
>> recognize. I do that to ensure that any annotated date that passed that
>> test would be sure to represent a valid date, so long as month names
>> were spelled correctly, etc.
>> But if a long regex is creating a problem, then I must solve it today,
>> before I update my wiki. Otherwise, SMW_refreshData.php will kill itself
>> again, and it will leave the job unfinished.
>> How long can a regex be and not cause a problem with the execution of
>> SMW_refreshData?
>> The regex strings in the file are $screenpat and $format1, $format2,
>> $format3, $format4, and $format5.
>> These strings have 219, 83, 89, 84, 85, and 55 characters, respectively.
>> Any assistance would be appreciated. Furthermore, if anyone else hopes
>> to use the Historical Date script, then I can't have it creating a
>> problem every time someone wants to run SMW_refreshData.php.
> I never encountered a similar problem. We also have long regexps in SMW, and
> the lengths you gave do not sound impressively long to me either. Are all
> regexps static and do not use any variables of possibly unexpected content?
> Can a websearch help you on your warning/error messages?
> Of course, "Killed" sounds like an emergency brake due to the shortage of some
> resource (such as memory). Does the problem occur when you start on that very
> page (using -v and then -s <id> as options for refreshData)?
> SMW_refreshData.php as such does not do many things that would be different
> from normal page editing, though it calls functions in a slightly different
> program context (I just fixed some bugs in SemanticCalendar, which relied on
> the global $wgTitle that is not guaranteed to contain anything during parsing in
> general and refreshData in particular). Besides these things, it would
> normally use the same code as during writing a page. Of course, the php
> command on a server may have different behaviour than the php module in
> Apache (and the admissible length of regexps appears to be rather specific to
> PHP).
> Anyway, you can use the option -v to see which page id causes the problem, and
> then use the option -s <id+1> to continue after that page. This way, you skip
> one page but can still refresh the rest. Use the MediaWiki API or the
> database to find out which page causes the problem, and check whether it
> works normally when read/edited on the web.
> Markus
> P.S. It seems that this is a discussion for the developers' list ...
Duly noted. I will now publish this to the development list as well,
though the other users might want to see my answers.

I have no insight on the problem of long regex strings. My regexes are
static, first of all. The warning messages created such confusion and
went by so fast that I didn't have a chance to see where the "kill"
occurred before it happened. This might or might not be significant: I
did not at first run refreshData and restrict it to refreshing type and
property pages only. Instead I ran it on the entire database, using the
-v option. That's why I saw all those warnings.
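
One way to catch the offending input on a rerun, offered only as a rough
sketch: wrap the preg_match call so that a failure is logged together with
the text being tested. The helper name checked_preg_match is hypothetical
(it is not part of SMW or of SMW_DV_HxDate.php), and the 60-character
excerpt length is an arbitrary choice. It relies only on preg_match,
preg_last_error and error_log, which are standard PHP functions.

```php
<?php
// Hypothetical helper, not part of SMW or SMW_DV_HxDate.php.
// Wraps preg_match() so that a failed call records which pattern and
// which input were involved, instead of scrolling past as a bare warning.
function checked_preg_match($pattern, $subject, &$matches = null)
{
    // @ suppresses PHP's own warning; the failure is reported below instead.
    $result = @preg_match($pattern, $subject, $matches);

    if ($result === false) {
        // preg_match() returns false on error (not 0, which means "no match").
        error_log(sprintf(
            'preg_match failed (PCRE error code %d); pattern length %d, ' .
            'subject length %d, subject starts with: %s',
            preg_last_error(),
            strlen($pattern),
            strlen($subject),
            substr($subject, 0, 60)
        ));
        return false;
    }

    // 1 means the pattern matched, 0 means it did not.
    return $result === 1;
}
```

Calling checked_preg_match($screenpat, $text) in place of
preg_match($screenpat, $text) would then leave one log line per failure,
so the input that triggers the warning can be read afterwards rather than
caught as it scrolls by. It does not explain the "Killed" itself, which may
be a separate resource problem.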

Terry A. Hurlbut

PS: Thank you for detailing the -s option. I did not at first see that
in the commented documentation in the file.


Semediawiki-devel mailing list
