From: Temlakos <tem...@gm...> - 2008-07-11 14:07:51
|
Everyone:

Several weeks ago, I finally figured out how to install SMW's maintenance scripts as symlinks in my server's wiki maintenance subdirectory so that I could run them.

But when I ran SMW_refreshData.php, I got multiple warnings saying that a call to preg_match failed on an overly long regular expression. The implicated file was my custom "historical date" file. And after multiple such "warnings," the execution of the file finally ended with one word: "Killed."

Markus, I believe you have a copy of the historical-date file (SMW_DV_HxDate.php). The longest regular expression (regex) in it is $screenpat, and my file calls preg_match with that string in order to screen out date texts that are not in a form that the script would recognize. I do that to ensure that any annotated date that passed that test would be sure to represent a valid date, so long as month names were spelled correctly, etc.

But if a long regex is creating a problem, then I must solve it today, before I update my wiki. Otherwise, SMW_refreshData.php will kill itself again, and it will leave the job unfinished.

How long can a regex be and not cause a problem with the execution of SMW_refreshData?

The regex strings in the file are $screenpat and $format1, $format2, $format3, $format4, and $format5.

These strings have 219, 83, 89, 84, 85, and 55 characters, respectively.

Any assistance would be appreciated. Furthermore, if anyone else hopes to use the Historical Date script, then I can't have it creating a problem every time someone wants to run SMW_refreshData.php.

Temlakos
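One way to test whether pattern length is really the culprit is to run each pattern in isolation and see whether preg_match fails on it at all. The following is only a minimal sketch: the two patterns are hypothetical stand-ins (the real $screenpat and $format1 to $format5 are not reproduced in this thread), and checkPatterns() is a helper name invented for illustration. It relies on preg_match returning false on failure, as opposed to 0 for "no match".

```php
<?php
// Hypothetical stand-ins: the real patterns from SMW_DV_HxDate.php
// are not shown in this thread.
$patterns = array(
    'screenpat' => '/^\d{1,2}\s+[A-Za-z]+\s+\d{1,4}(\s+(BC|AD))?$/',
    'format1'   => '/^(\d{1,4})\s+(BC|AD)$/',
);

/**
 * Report the length of each pattern and whether PCRE could compile and
 * run it against the given subject. preg_match() returns false on
 * failure, which is different from 0 (pattern is fine, no match).
 */
function checkPatterns(array $patterns, $subject) {
    $report = array();
    foreach ($patterns as $name => $pattern) {
        // '@' only hides the warning text; the false return value remains.
        $result = @preg_match($pattern, $subject);
        $report[$name] = array(
            'length' => strlen($pattern),
            'ok'     => ($result !== false),
        );
    }
    return $report;
}

foreach (checkPatterns($patterns, '11 July 2008') as $name => $info) {
    printf("%-10s length=%3d ok=%s\n",
        $name, $info['length'], $info['ok'] ? 'yes' : 'no');
}
```

If every pattern reports ok=yes when run this way from the command-line php binary, the static patterns themselves are probably not what triggers the warning.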
From: Markus K. <ma...@se...> - 2008-07-11 19:54:47
|
I have never encountered a similar problem. We also have long regexps in SMW, and the lengths you gave do not sound impressively long to me either. Are all regexps static, with no variables of possibly unexpected content? Can a web search on your warning/error messages help?

Of course, "Killed" sounds like an emergency brake due to the shortage of some resource (such as memory). Does the problem occur when you start on that very page (using -v and then -s <id> as options for refreshData)?

SMW_refreshData.php as such does not do many things that would be different from normal page editing, though it calls functions in a slightly different program context (I just fixed some bugs in SemanticCalendar, which relied on the global $wgTitle, which is not guaranteed to contain anything during parsing in general and refreshData in particular). Besides these things, it would normally use the same code as during writing a page. Of course, the php command on a server may behave differently from the PHP module in Apache (and the admissible length of regexps appears to be rather specific to PHP).

Anyway, you can use the option -v to see which page id causes the problem, and then use the option -s <id+1> to continue after that page. This way, you skip one page but can still refresh the rest. Use the MediaWiki API or the database to find out which page causes the problem, and check whether it works normally when read/edited on the web.

Markus

P.S. It seems that this is a discussion for the developers' list ...

--
Markus Krötzsch
Semantic MediaWiki
http://semantic-mediawiki.org
http://korrekt.org
ma...@se...
From: Temlakos <tem...@gm...> - 2008-07-12 03:07:18
|
Markus Krötzsch wrote:
> P.S. It seems that this is a discussion for the developers' list ...

Duly noted. I will now post this to the development list as well, though the other users might want to see my answers.

I have no insight into the problem of long regex strings. My regexes are static, first of all. The warning messages created such confusion and went by so fast that I didn't have a chance to see where the "kill" occurred before it happened. This might or might not be significant: I did not at first run refreshData restricted to refreshing type and property pages only. Instead I ran it on the entire database, using the -v option. That's why I saw all those warnings.

Terry A. Hurlbut

PS: Thank you for detailing the -s option. I did not at first see that in the commented documentation in the file.

TAH
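When warnings scroll past too quickly to read, one option is to write each one to a file as it happens, so that the record survives even if the process is later killed. The sketch below shows the general idea with PHP's set_error_handler. It is not part of SMW_refreshData.php: the handler name, the log location, and the $currentPageId variable are all assumptions made for illustration, and the deliberately broken pattern exists only to trigger a warning.

```php
<?php
// Hypothetical: id of the page currently being refreshed. A real script
// would have to update this itself before processing each page.
$currentPageId = 42;

// Assumed log location; any writable file would do.
$logFile = tempnam(sys_get_temp_dir(), 'smw');

/**
 * Append each warning to the log file immediately, tagged with the page id.
 */
function logWarning($errno, $errstr, $errfile, $errline) {
    global $logFile, $currentPageId;
    $line = sprintf("page %d: %s (%s:%d)\n",
        $currentPageId, $errstr, $errfile, $errline);
    file_put_contents($logFile, $line, FILE_APPEND);
    return true; // skip PHP's default warning output
}

set_error_handler('logWarning', E_WARNING);

// Deliberately invalid pattern, used here only to provoke a warning.
preg_match('/(unclosed/', 'some text');

restore_error_handler();

echo file_get_contents($logFile);
```

Because every entry is appended as soon as the warning fires, the last lines of the log show which page was being processed just before the run stopped.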
From: Robert M. <mra...@gm...> - 2008-07-12 07:02:47
|
I too see the long preg_match regex error, even when just running runJobs.php to clear the work queue. Perhaps it has something to do with Unicode pages with lots of semantic data (the only data connection I can see between Temlakos' wiki and mine). Please let me know how I can help us all track down the source of this glitch.

-Robert

--
Roses are red, Violets are blue, I'm schizophrenic, and so am I.
From: Markus K. <ma...@se...> - 2008-07-12 15:13:08
|
Okay, then maybe the standard SMW regexps are the cause (at least they are dynamic, e.g. they sometimes include language-specific strings). Can you make a "minimal" test page that triggers the error, and publish it on some page at sandbox.semantic-mediawiki.org? I will then try to reproduce and debug.

Can someone give the exact error/warning messages that occur? Language settings and PHP version on the affected sites may also help.

Temlakos, given that the long regexp warning occurs elsewhere without the "Killed", maybe the two things are not really related after all. I am curious to see whether the -s option can help you to relate the whole problem to a single page, and maybe to some specific part of it.

-- Markus
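To illustrate why a dynamic regexp can behave differently from a static one, here is a small sketch of a pattern assembled from a list of strings. It is not SMW's actual code: buildLabelPattern() and the label lists are invented for the example. The point is only that the final pattern length depends on the input, so it can grow far beyond what the source file suggests.

```php
<?php
/**
 * Build a pattern that matches any of the given labels followed by a colon.
 * Hypothetical helper for illustration; not taken from SMW.
 */
function buildLabelPattern(array $labels) {
    $quoted = array();
    foreach ($labels as $label) {
        // preg_quote() escapes regex metacharacters and the '/' delimiter.
        $quoted[] = preg_quote($label, '/');
    }
    return '/^(?:' . implode('|', $quoted) . '):/u';
}

// A short list gives a short pattern ...
$short = buildLabelPattern(array('Property', 'Attribut'));

// ... while a long (here artificially repeated) list gives a long one.
$long = buildLabelPattern(array_fill(0, 1000, 'Eigenschaft'));

printf("short pattern: %d bytes\n", strlen($short));
printf("long pattern:  %d bytes\n", strlen($long));

var_dump(preg_match($short, 'Attribut:Geburtsdatum'));
```

Printing strlen() of such a pattern just before it is passed to preg_match is a cheap way to see whether its size varies between pages or language settings.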