pywikibot / Bugs / #1482 archivebot.py doesn't support unicode month names

Anonymous - 2012-06-30

Pywikipedia [http] trunk/pywikipedia (r10432, 2012/06/30, 15:47:55)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok

Anonymous - 2012-06-30

The command line I used was: archivebot.py -l turkish Archive/config

xqt - 2012-07-01

Could you give us a traceback or further information about this bug? The bot takes the month names from MediaWiki messages, and I don't know what significance the locale setting has. Could you try running the bot without the --locale=tr setting?
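
As a rough illustration of the approach xqt describes, the sketch below parses a signature timestamp by looking the month name up in a table of localized names instead of relying on the process locale. The `TR_MONTHS` table, the `parse_signature_timestamp` helper and the "HH:MM, D Month YYYY (UTC)" signature format are all assumptions for illustration; this is not archivebot.py's actual code.

```python
import re
from datetime import datetime

# Hypothetical month-name table. In archivebot.py the names would come
# from the wiki's MediaWiki messages rather than being hard-coded.
TR_MONTHS = ["Ocak", "Şubat", "Mart", "Nisan", "Mayıs", "Haziran",
             "Temmuz", "Ağustos", "Eylül", "Ekim", "Kasım", "Aralık"]


def parse_signature_timestamp(text, months=TR_MONTHS):
    """Return the first 'HH:MM, D Month YYYY (UTC)' stamp in text as a
    datetime, or None if there is none (the format is an assumption)."""
    # Month names are matched as literal strings, so non-ASCII letters
    # such as "ı" need no locale support.
    month_alt = "|".join(re.escape(m) for m in months)
    pattern = (r"(\d{1,2}):(\d{2}), (\d{1,2}) (" + month_alt +
               r") (\d{4}) \(UTC\)")
    match = re.search(pattern, text)
    if match is None:
        return None
    hour, minute, day, month_name, year = match.groups()
    month = months.index(month_name) + 1  # list position gives month number
    return datetime(int(year), month, int(day), int(hour), int(minute))


print(parse_signature_timestamp("Merhaba. 15:47, 30 Mayıs 2012 (UTC)"))
# 2012-05-30 15:47:00
```

Because the month names are data rather than locale state, the same function would work for any wiki language, given that language's table.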

Anonymous - 2012-07-01

Sure. There is no traceback to provide, though, since the code does run; it just ignores some threads.

Run1: archivebot.py -l turkish Archive/config
Fetching template transclusions...
Getting references to [[Sablon:Archive/config]] via API...
Processing [[tr:Kullanici mesaj:??????]]
3 Threads found on [[tr:Kullanici mesaj:??????]]
Looking for: {{Archive/config}} in [[tr:Kullanici mesaj:??????]]
Processing 3 threads
There are only 0 Threads. Skipping

Run2: archivebot.py Archive/config
Fetching template transclusions...
Getting references to [[Sablon:Archive/config]] via API...
Processing [[tr:Kullanici mesaj:??????]]
3 Threads found on [[tr:Kullanici mesaj:??????]]
Looking for: {{Archive/config}} in [[tr:Kullanici mesaj:??????]]
Processing 3 threads
There are only 0 Threads. Skipping

Note that the Turkish character ı is displayed as i in the CMD window (I run the bot on Windows). The ?????? stands for my user name in the title of my user talk page, http://tr.wikipedia.org/wiki/Kullan%C4%B1c%C4%B1_mesaj:%E3%81%A8%E3%81%82%E3%82%8B%E7%99%BD%E3%81%84%E7%8C%AB, which CMD cannot display because it cannot show Unicode.

Anonymous - 2012-07-01

Oh, and when I first ran the bot without -l turkish, it ignored all threads. It has since archived 3 of the original 6 threads, but it still reports 0 threads because it cannot see the ones that use the month name "Mayıs".

Last edit: Anonymous 2014-12-04

Legoktm - 2013-08-30

Looked into this a bit.

I've narrowed the problem down to around line 237, where all the txt2timestamp calls are. It seems that all of them raise ValueError.
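
To see how a Turkish month name could produce those ValueErrors, the sketch below tries a timestamp against a `%B` format with `time.strptime`. It assumes that txt2timestamp wraps `time.strptime` (not confirmed in this thread) and that the process runs in the default C locale, where `%B` only matches English month names. The `try_parse` helper and the format string are hypothetical.

```python
import time


def try_parse(txt, fmt="%H:%M, %d %B %Y (UTC)"):
    """Return a struct_time, or None if strptime rejects the text."""
    try:
        return time.strptime(txt, fmt)
    except ValueError:
        # In the C locale %B only knows English names, so "Mayıs" fails here.
        return None


# English month names parse in the C locale...
print(try_parse("15:47, 30 May 2012 (UTC)") is not None)  # True
# ...but the Turkish "Mayıs" does not, so such a thread would be skipped.
print(try_parse("15:47, 30 Mayıs 2012 (UTC)"))  # None
```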

mpaa - 2013-09-09

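Tried this:

    import unicodedata

    # At ~line 237: decompose the matched timestamp (NFD) and drop
    # nonspacing combining marks (Unicode category Mn)
    _TM = ''.join(c for c in unicodedata.normalize('NFD', TM.group(0))
                  if unicodedata.category(c) != 'Mn')

and then called txt2timestamp with _TM instead of TM.group(0).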
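
A self-contained Python 3 version of the normalization step above, for experimenting outside the bot (`strip_combining_marks` is a hypothetical helper name; under the thread's Python 2.7 the input would have to be a unicode object). It decomposes the text with NFD and drops characters in category Mn, so letters such as ş and ğ become s and g.

```python
import unicodedata


def strip_combining_marks(text):
    """Decompose text (NFD) and drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


print(strip_combining_marks("Şubat"))    # Subat
print(strip_combining_marks("Ağustos"))  # Agustos
print(strip_combining_marks("Mayıs"))    # Mayıs
```

The dotless ı (U+0131) in "Mayıs" has no canonical decomposition, so this step leaves that letter unchanged.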

mpaa - 2013-09-15

https://gerrit.wikimedia.org/r/#/c/84204/

Python MediaWiki Bot Framework
