#594 interwiki crashes with: regular expression code size limit e

closed-fixed
nobody
None
5
2008-01-16
2008-01-14
Anonymous
No

The bot crashes with a "regular expression code size limit exceeded" error on many pages.

Error report:

Updating links on page [[pl:10,000 Maniacs]].
No changes needed
Getting 37 pages from wikipedia:ru...
Dump pl (wikipedia) saved
Traceback (most recent call last):
File "C:\dw\pywikipedia\interwiki.py", line 1606, in <module>
bot.run()
File "C:\dw\pywikipedia\interwiki.py", line 1381, in run
self.queryStep()
File "C:\dw\pywikipedia\interwiki.py", line 1355, in queryStep
self.oneQuery()
File "C:\dw\pywikipedia\interwiki.py", line 1351, in oneQuery
subject.workDone(self)
File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
elif page.isEmpty() and not page.isCategory():
File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
txt = removeLanguageLinks(txt)
File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
% languageR, re.IGNORECASE)
File "C:\Python25\lib\re.py", line 180, in compile
return _compile(pattern, flags)
File "C:\Python25\lib\re.py", line 231, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Python25\lib\sre_compile.py", line 530, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded

Discussion

  • Rotem Liss

    Rotem Liss - 2008-01-15

    Logged In: YES
    user_id=1327030
    Originator: NO

    I can't reproduce the bug. What is the exact command?

     
  • masti

    masti - 2008-01-15

    Logged In: YES
    user_id=1974561
    Originator: NO

    The error seems to occur only when interwiki.py is run with the -autonomous switch. For example, running this command on pl.wiki:
    interwiki.py -start:100BASE-FX -autonomous
    causes the bot to run through multiple pages and eventually fail with the following error:

    [[100 dni Napoleona]]: [[ja:????]] gives new interwiki [[he:m'h hymym]]
    [[101 (liczba)]]: [[ja:101]] gives new interwiki [[ms:101 (nombor)]]
    ======Post-processing [[pl:10164 Akusekijima]]======
    Updating links on page [[pl:10164 Akusekijima]].
    No changes needed
    ======Post-processing [[pl:10163 Onomichi]]======
    Updating links on page [[pl:10163 Onomichi]].
    No changes needed
    ======Post-processing [[pl:10157 Asagiri]]======
    Updating links on page [[pl:10157 Asagiri]].
    No changes needed
    ======Post-processing [[pl:10143 Kamogawa]]======
    Updating links on page [[pl:10143 Kamogawa]].
    No changes needed
    ======Post-processing [[pl:10142 Sakka]]======
    Updating links on page [[pl:10142 Sakka]].
    No changes needed
    ======Post-processing [[pl:10117 Tanikawa]]======
    Updating links on page [[pl:10117 Tanikawa]].
    No changes needed
    Getting 23 pages from wikipedia:id...
    NOTE: [[id:100 (buku)]] is redirect to [[id:The 100]]
    Getting 21 pages from wikipedia:uk...
    Getting 18 pages from wikipedia:lt...
    Getting 16 pages from wikipedia:fr...
    Sleeping for 3.2 seconds, 2008-01-15 19:50:31
    Getting 15 pages from wikipedia:es...
    ======Post-processing [[pl:100BASE-FX]]======
    ERROR: Found link to [[pl:Fast Ethernet]]
    [[en:Fast Ethernet]]
    [[es:Fast Ethernet]]
    [[fr:100BASE-T4]]
    [[id:Fast Ethernet]]
    [[it:Fast Ethernet]]
    [[ja:100megabitto ihsanetto]]
    [[lt:Fast Ethernet]]
    [[pt:Fast Ethernet]]
    [[uk:Fast Ethernet]]
    ERROR: Found more than one link for wikipedia:es
    ERROR: Found more than one link for wikipedia:fr
    ======Aborted processing [[pl:100BASE-FX]]======
    Getting 42 pages from wikipedia:de...
    Getting 31 pages from wikipedia:sv...
    Getting 28 pages from wikipedia:nl...
    Dump pl (wikipedia) saved
    Traceback (most recent call last):
    File "C:\dw\pywikipedia\interwiki.py", line 1609, in <module>
    bot.run()
    File "C:\dw\pywikipedia\interwiki.py", line 1384, in run
    self.queryStep()
    File "C:\dw\pywikipedia\interwiki.py", line 1358, in queryStep
    self.oneQuery()
    File "C:\dw\pywikipedia\interwiki.py", line 1354, in oneQuery
    subject.workDone(self)
    File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
    elif page.isEmpty() and not page.isCategory():
    File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
    txt = removeLanguageLinks(txt)
    File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
    % languageR, re.IGNORECASE)
    File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
    File "C:\Python25\lib\re.py", line 231, in _compile
    p = sre_compile.compile(pattern, flags)
    File "C:\Python25\lib\sre_compile.py", line 530, in compile
    groupindex, indexgroup
    OverflowError: regular expression code size limit exceeded

    The above test was done with r4893.

     
  • André Malafaya Baptista

    Logged In: YES
    user_id=1037345
    Originator: NO

    As of r4893, I believe changing wikipedia.py line 2810 to:
    'source': re.compile(r'(?is)<source>.*?</source>'),
    would solve the problem.
    There was an unclosed '<' after 'source'.
    I'm not absolutely sure about this, as this problem doesn't seem easy to test. It has also happened to me, but I can't say precisely under which conditions.
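
For reference, a small sketch of what the pattern quoted above matches. The sample text and the names `source_r`, `sample`, and `stripped` are made up for illustration; only the regular expression itself comes from the comment.

```python
import re

# Pattern as quoted in the comment: (?i) ignores case, (?s) lets "." span newlines,
# and ".*?" is non-greedy so each <source>...</source> block is matched separately.
source_r = re.compile(r'(?is)<source>.*?</source>')

# Hypothetical wikitext with two source blocks, one of them in upper case.
sample = "intro <source>a = 1\nb = 2</source> middle <SOURCE>c</SOURCE> end"

blocks = source_r.findall(sample)   # the two matched blocks
stripped = source_r.sub("", sample)  # text with both blocks removed
```

Note that, as written, the pattern only matches a bare `<source>` tag; an opening tag carrying attributes would not be matched.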

     
  • André Malafaya Baptista

    • status: open --> open-remind
     
  • André Malafaya Baptista

    Logged In: YES
    user_id=1037345
    Originator: NO

    In fact, it was line 2836.
    I just committed those changes to SVN (r4894).
    This bug should be considered fixed if it does not re-occur.

     
  • masti

    masti - 2008-01-15

    Logged In: YES
    user_id=1974561
    Originator: NO

    Updated to r4894.
    Unfortunately, I can still reproduce the same error.

     
  • André Malafaya Baptista

    Logged In: YES
    user_id=1037345
    Originator: NO

    yep, so can I just now :(

     
  • Francesco Cosoleto

    Logged In: YES
    user_id=181280
    Originator: NO

    This bug is related to the removeLanguageLinks function in the wikipedia module: the languageR variable grows in length with each call until it triggers an overflow error in the re module.
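
The failure mode described here can be sketched roughly as follows. This is a minimal illustration, not the actual wikipedia.py code: `LANGUAGE_CODES`, `language_r`, `remove_language_links_buggy`, and `remove_language_links_fixed` are hypothetical names, and the link pattern is simplified. It only shows how a pattern fragment kept in shared state and extended on every call keeps growing, while one built locally stays the same size. The size at which `re` refuses to compile depends on the Python version, so the overflow itself is not reproduced.

```python
import re

# Hypothetical, tiny stand-in for the wiki family's language code list.
LANGUAGE_CODES = ["en", "pl", "ru"]

# Shared module-level state that the buggy variant keeps appending to.
language_r = ""


def remove_language_links_buggy(text):
    """Strip interwiki links, extending the shared alternation on every call."""
    global language_r
    # Bug: the alternation is appended to instead of rebuilt, so it grows per call.
    language_r += "|".join(LANGUAGE_CODES) + "|"
    pattern = re.compile(
        r"\[\[(%s)\s*:[^\]]*\]\]\s*" % language_r.rstrip("|"), re.IGNORECASE
    )
    return pattern.sub("", text), len(pattern.pattern)


def remove_language_links_fixed(text):
    """Strip interwiki links using an alternation built fresh on each call."""
    alternation = "|".join(re.escape(code) for code in LANGUAGE_CODES)
    pattern = re.compile(
        r"\[\[(%s)\s*:[^\]]*\]\]\s*" % alternation, re.IGNORECASE
    )
    return pattern.sub("", text), len(pattern.pattern)
```

Calling the buggy variant repeatedly returns a larger pattern length each time, which over enough pages would eventually hit the compiler's limit; the fixed variant returns a constant length. Building the pattern from a local value is one possible way to avoid the growth; this thread does not show what r4896 actually changed.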

     
  • Francesco Cosoleto

    Logged In: YES
    user_id=181280
    Originator: NO

    Fixed in r4896.

     
  • Francesco Cosoleto

    • status: open-remind --> closed-fixed
     
