Pywikipedia [http] trunk/pywikipedia (r10432, 2012/06/30, 15:47:55)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
The command line I used was: archivebot.py -l turkish Archive/config
Could you give us a traceback or further information about that bug? The bot uses the month names that come from the MediaWiki messages, and I don't know what role the locale setting plays. Could you try running the bot without the --locale=tr setting?
Sure. There is no traceback for me to provide, though, since the code does run; it just ignores some threads.
Run1: archivebot.py -l turkish Archive/config
Fetching template transclusions...
Getting references to [[Sablon:Archive/config]] via API...
Processing [[tr:Kullanici mesaj:??????]]
3 Threads found on [[tr:Kullanici mesaj:??????]]
Looking for: {{Archive/config}} in [[tr:Kullanici mesaj:??????]]
Processing 3 threads
There are only 0 Threads. Skipping
Run2: archivebot.py Archive/config
Fetching template transclusions...
Getting references to [[Sablon:Archive/config]] via API...
Processing [[tr:Kullanici mesaj:??????]]
3 Threads found on [[tr:Kullanici mesaj:??????]]
Looking for: {{Archive/config}} in [[tr:Kullanici mesaj:??????]]
Processing 3 threads
There are only 0 Threads. Skipping
Note that the Turkish character ı is displayed as i in the CMD window (I run the code on Windows). The ?????? stands for the name of my user talk page, http://tr.wikipedia.org/wiki/Kullan%C4%B1c%C4%B1_mesaj:%E3%81%A8%E3%81%82%E3%82%8B%E7%99%BD%E3%81%84%E7%8C%AB, which CMD cannot display because it is Unicode.
Oh, when I initially ran the bot without -l turkish, it ignored all threads. Since it has already archived 3 of the 6 initial threads, it still reports 0 threads, because it cannot see the ones with the "Mayıs" month name.
Last edit: Anonymous 2014-12-04
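For anyone following along, here is a minimal sketch of how an unrecognized month name could make threads "disappear" without any traceback. This is not archivebot.py's actual code: the month tables, the regular expression, and the names parse_stamp and datable_threads are all hypothetical, and the timestamp format is only an assumption about what a signature looks like. The point is just that a parser which raises ValueError on an unknown month, combined with a caller that skips such threads, would produce the behaviour described above.

```python
import re

# Hypothetical month tables. The real bot reads month names from the
# MediaWiki messages; these dicts only stand in for that lookup.
MONTHS_EN = {"January": 1, "February": 2, "March": 3, "April": 4,
             "May": 5, "June": 6, "July": 7, "August": 8,
             "September": 9, "October": 10, "November": 11, "December": 12}
MONTHS_TR = {"Ocak": 1, "Şubat": 2, "Mart": 3, "Nisan": 4,
             "Mayıs": 5, "Haziran": 6, "Temmuz": 7, "Ağustos": 8,
             "Eylül": 9, "Ekim": 10, "Kasım": 11, "Aralık": 12}

# Assumed signature timestamp shape: "10:15, 5 Mayıs 2012 (UTC)".
STAMP = re.compile(r"(\d{1,2}):(\d{2}), (\d{1,2}) (\w+) (\d{4})", re.UNICODE)


def parse_stamp(text, months):
    """Return (year, month, day, hour, minute) or raise ValueError."""
    match = STAMP.search(text)
    if match is None:
        raise ValueError("no timestamp found")
    hour, minute, day, name, year = match.groups()
    if name not in months:
        raise ValueError("unknown month name: %r" % name)
    return (int(year), months[name], int(day), int(hour), int(minute))


def datable_threads(threads, months):
    """Keep only threads whose timestamp can be parsed; skip the rest silently."""
    kept = []
    for thread in threads:
        try:
            parse_stamp(thread, months)
        except ValueError:
            continue  # no traceback, the thread is simply ignored
        kept.append(thread)
    return kept
```

With the English table, a thread signed in "Mayıs" is dropped silently, while the Turkish table parses it. That would match the "0 Threads" symptom, assuming the bot behaves roughly like this sketch.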
Looked into this a bit.
I've managed to isolate the problem to around line 237, where all the txt2timestamp calls are. It seems that all of them raise ValueError.
Tried this:
import unicodedata
# at line 237
_TM = ''.join(c for c in unicodedata.normalize('NFD', TM.group(0)) if unicodedata.category(c) != 'Mn')
and then called txt2timestamp with _TM instead of TM.group(0).
https://gerrit.wikimedia.org/r/#/c/84204/
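The accent-stripping step tried above can be checked on its own, outside the bot. Here is a small standalone sketch; the function name strip_combining_marks is made up for illustration, and it takes a plain string rather than the TM match object from archivebot.py. It decomposes the text with NFD and drops every character in Unicode category Mn (nonspacing marks).

```python
import unicodedata


def strip_combining_marks(text):
    """Decompose text (NFD) and drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


if __name__ == "__main__":
    # Turkish month names taken from this thread and the Turkish calendar.
    for name in ("Şubat", "Ağustos", "Eylül", "Mayıs"):
        print(name, "->", strip_combining_marks(name))
```

One thing worth keeping in mind: this turns "Şubat" into "Subat" and "Ağustos" into "Agustos", but the dotless ı (U+0131) has no canonical decomposition, so "Mayıs" comes out unchanged. Whether that matters depends on what txt2timestamp compares the string against, which I have not verified.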