The interwiki bot removes valid links to pl and ru. I
have tried changing the encoding of the links, but no
matter how they are encoded, it removes them.
======Post-processing [[en:Bioinformatics]]======
==status==
Changes to be made: Removing:ru
- [[de:Bioinformatik]] [[es:Bioinformática]]
+
+
+
+
+
+
+
+ [[Category:Computer science]]
+ [[de:Bioinformatik]]
+ [[es:Bioinformática]]
- [[pl:Bioinformatyka]]
[[ru:Биоинформатика]]
+ [[pl:Bioinformatyka]]
- [[Category:Computer science]]
NOTE: Replace [[en:Bioinformatics]]
ERROR: removing: ru
NOTE: Performing a recursive query first to save time....
NOTE: Nothing left to do 2
NOTE: Updating live wikipedia...
Sleeping for 5.4 seconds
Changing page en:Bioinformatics
As you can see with the pl link, it only does this when
special characters are in the title.
Logged In: YES
user_id=880694
First, the pl: link was not removed.
Second, the ru: link is invalid.
Logged In: YES
user_id=843018
The Russian link is removed because the page does not exist.
Unless you find a better example, I intend to close this bug
report.
Logged In: YES
user_id=1087213
The links removed at
http://en.wikipedia.org/w/wiki.phtml?title=Book_of_Veles&diff=4745835&oldid=4745233
and
http://en.wikipedia.org/w/wiki.phtml?title=Boleslaus_III_of_Poland&diff=4748276&oldid=4745136
were valid.
Logged In: YES
user_id=843018
Yes, I see... Those are rather problematic. If I see it
correctly, what is going on is that (taking the example of
Boleslaus III of Poland since the other page has already
been corrected) the link to Polish contains "Å‚" to refer to
"ł". Apparently taking the Latin-1 encoding of "Å‚" and
reading that as Unicode (UTF-8) gives "ł".
I would like to ask help from Brion or someone else familiar
with Wikipedia's code - how is Wikipedia able to find these
links, and how can we copy this in the bot without having to
make two guesses each time this occurs?
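
The mismatch described above can be reproduced in a few lines of Python. This is only a minimal sketch, not the bot's actual code: `mojibake` and `repair_title` are hypothetical helpers, "Bolesław" is just a sample string, and Windows-1252 is assumed to stand in for what the comment calls Latin-1 (it is the codec in which byte 0x82 displays as a printable character).

```python
def mojibake(title: str) -> str:
    """Encode a title as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return title.encode("utf-8").decode("cp1252")


def repair_title(title: str) -> str:
    """Hypothetical helper: undo the wrong decoding when that is possible.

    If the title cannot be encoded as Windows-1252, or the resulting bytes
    are not valid UTF-8, the title is returned unchanged.
    """
    try:
        return title.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return title


broken = mojibake("Bolesław")   # "ł" is the two bytes C5 82 in UTF-8
print(broken)
print(repair_title(broken))
```

A title that is already correct, such as "Bolesław" itself, fails the first step (there is no Windows-1252 byte for "ł") and is returned as-is, so the helper only changes titles that round-trip cleanly.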
Logged In: YES
user_id=47476
Should be fixed now; it was broken by a utf-8ification.
This bug only affected interwiki links that were
url-encoded. There used to be a lot of these, but since the
robot never uses % encoding in the pages, many links in this
style have already been removed, which made this bug less
visible.
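
For illustration, here is how a url-encoded title behaves when the percent-escapes are decoded with the wrong charset. This is a sketch using Python's standard `urllib.parse.unquote`, not the bot's code; `decode_title` is a hypothetical name and "Boles%C5%82aw" is an assumed example of such a link.

```python
from urllib.parse import unquote


def decode_title(raw: str, charset: str) -> str:
    """Hypothetical helper: percent-decode a link title using the given charset."""
    return unquote(raw, encoding=charset)


raw = "Boles%C5%82aw"                   # assumed example of a url-encoded title

as_utf8 = decode_title(raw, "utf-8")     # the two bytes C5 82 become one character
as_latin1 = decode_title(raw, "latin-1") # each byte becomes its own character

print(as_utf8)
print(len(as_utf8), len(as_latin1))
```

The Latin-1 result is one character longer than the UTF-8 result, so it names a page that does not exist, which would explain why a link decoded this way looks invalid to the bot.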