
#1283 Template exception overworks in replace.py

closed-duplicate
xqt
None
5
2011-05-12
2011-01-15
Bináris
No

I correct spelling mistakes with replace.py and use the following exceptions:
'exceptions': {
    'inside-tags': [
        'hyperlink',
        'template',
    ],
etc. as shown at http://meta.wikimedia.org/wiki/Pywikipediabot/replace.py/it

This exception excludes a lot of text that should be replaced! After a long investigation I suspect that the problem occurs when the template is complicated, e.g. when the article begins with an infobox. The bot probably thinks it is still inside the template after the template has already been closed.

Examples:
In the last sentence of section http://hu.wikipedia.org/w/index.php?title=Nagyv%C3%A1rad&oldid=9085449#N.C3.A9pess.C3.A9ge the word "telepitettek" was not found. The article begins with an infobox.
In the middle of section http://hu.wikipedia.org/w/index.php?title=Opera_%28sz%C3%ADnm%C5%B1%29&oldid=8961154#Az_angol_nyelv.C5.B1_opera the word "Szenitávnéji" was not found. The article has no infobox, but the text is preceded by some templates with parameters, one of them at the very beginning.
In section http://hu.wikipedia.org/w/index.php?title=Tennessee&oldid=9028125#Megy.C3.A9k the word "alapitási" was not found. The article begins with an infobox.

But:
The bot made the replacement here: http://hu.wikipedia.org/w/index.php?title=Mozilla&diff=9106942&oldid=8920815
This text is also preceded by some templates that have parameters, but the one at the beginning of the article has no parameters. Does this make the difference?

The bot found all of the above-mentioned instances when I commented the word "template" out of the exceptions.
It is not clear whether the bug is in replace.py or in pagegenerators.

Discussion

  • Bináris

    Bináris - 2011-01-15

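    Hurray, I have caught it! The bugfix is easy. In pywikibot/textlib.py, line 83, the outer quantifier of the template pattern is greedy, so a match runs from the first opening braces to the last closing braces on the page. Changing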
    'template': re.compile(r'(?s){{(({{.*?}})|.)*}}'),
    to
    'template': re.compile(r'(?s){{(({{.*?}})|.)*?}}'),
    solved the problem for me.
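
The difference between the two patterns can be reproduced outside the bot. The following is a minimal sketch that uses only Python's `re` module and the two expressions quoted above. The sample text is invented for illustration and is not taken from any of the linked articles.

```python
import re

# The two patterns quoted in the comment above, copied verbatim.
GREEDY = re.compile(r'(?s){{(({{.*?}})|.)*}}')
LAZY = re.compile(r'(?s){{(({{.*?}})|.)*?}}')

# Hypothetical page text: a template at the start, ordinary prose,
# and another template at the end.
text = "{{Infobox|name=X}} Some text with telepitettek. {{stub}}"

# Collect the spans that each pattern treats as "inside a template".
greedy_spans = [m.group(0) for m in GREEDY.finditer(text)]
lazy_spans = [m.group(0) for m in LAZY.finditer(text)]

# With the greedy quantifier, the prose between the two templates
# is swallowed by a single match.
print(greedy_spans)
# With the lazy quantifier, each template is matched separately
# and the prose stays outside.
print(lazy_spans)
```

If the excluded spans behave like this in the bot, any misspelling located between the first and the last template of a page would be skipped by the greedy version, which would fit the examples in the report.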

     
  • Bináris

    Bináris - 2011-02-09

    Would anyone please correct this bug? One character only. TIA

     
  • Merlijn S. van Deen

    Well... this is why we desperately need unit tests. As a quick response: I'm afraid the suggested fix will break detection of nested templates. Or rather, a template like
    {{ blah | {{ yakk }} | more stuff }} will not be detected as a nested template, but as {{ blah | {{ yakk }}.
    I'm not 100% sure about this, but it should be tested before applying the suggested fix.
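
A quick check of the kind asked for here could look like the sketch below. It assumes only Python's `re` module and the lazy pattern from the proposed fix. The doubly nested sample string is a made-up case, not one from the report.

```python
import re

# Lazy variant from the proposed fix, copied verbatim.
LAZY = re.compile(r'(?s){{(({{.*?}})|.)*?}}')

# The example from the comment above: one level of nesting.
one_level = "{{ blah | {{ yakk }} | more stuff }}"
# Hypothetical harder case: two levels of nesting.
two_levels = "{{a|{{b|{{c}}}}|d}}"

one_level_match = LAZY.match(one_level).group(0)
two_levels_match = LAZY.match(two_levels).group(0)

# True if the whole template was recognised as one span.
print(one_level_match == one_level)
print(two_levels_match == two_levels)
# Show where the match stops in the doubly nested case.
print(two_levels_match)
```

When traced by hand, the inner alternative `{{.*?}}` absorbs a single nested template, so the one-level example seems to be matched in full. With two levels of nesting the match stops early and leaves the tail of the outer template outside the excluded span. A regular expression of this shape cannot balance braces to arbitrary depth, so a real unit test would need to cover several nesting depths.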

     
  • Bináris

    Bináris - 2011-02-09

    At least there is a comment now; thank you for looking into the problem.
    As far as I know, the pattern in its present form definitely works incorrectly.

     
  • xqt

    xqt - 2011-05-12

    Duplicate of bug #2819291

     
  • xqt

    xqt - 2011-05-12
    • assigned_to: nobody --> xqt
    • status: open --> closed-duplicate
     
  • xqt

    xqt - 2013-04-04

    fixed with r11333

     
