f.readline() should be used instead of re.findall().
I think our list of options is already too crowded. It would probably be better to change -file so that it interprets each line as a page title when no [[title]] is found, instead of adding yet another option.
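A rough sketch of what I mean, in plain Python rather than the actual pywikipedia code. The helper name titles_from_lines and the regular expression are made up for illustration, and I am assuming the fallback is decided per line (it could just as well be decided once for the whole file):

```python
import re

# Hypothetical pattern: captures the target of [[Title]] or [[Title|label]].
LINK_RE = re.compile(r"\[\[(.+?)(?:\]\]|\|)")


def titles_from_lines(lines):
    """Yield page titles from an iterable of text lines.

    A line that contains [[title]] links yields each linked title.
    Any other non-blank line is treated as one page title.
    (Assumption: the fallback applies line by line.)
    """
    for line in lines:
        linked = LINK_RE.findall(line)
        if linked:
            for title in linked:
                yield title.strip()
        else:
            title = line.strip()
            if title:
                yield title
```

Since a file object is itself an iterable of lines, something like titles_from_lines(f) would read the file one line at a time without loading it all for a single re.findall() call.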
patch
Thanks, I updated it accordingly. Feel free to combine the two.
A preliminary patch to enhance the -file option as described is attached. Any comments?
It is better than two separate options. I tested it and it works with files in both formats.
Looks good to me. I was pretty sure that some generator already allowed one title per line, but I can't find it anymore. (?)
I think I will commit this patch tomorrow with a slightly changed docstring.
Applied in r6839.
Thanks. For some encodings/systems, the initial title would need to be stripped of any BOM (http://evanjones.ca/python-utf8.html#bom).
This should resolve the problem:
http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/config.py?r1=6836&r2=6854
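For anyone hitting the BOM issue in their own scripts, here is a minimal sketch of the general idea. It is not taken from the linked revision, and the function names strip_bom and decode_titles are hypothetical. It assumes the file is UTF-8 and uses Python's standard "utf-8-sig" codec, which drops a leading BOM while decoding:

```python
def strip_bom(title):
    """Remove a leading Unicode BOM (U+FEFF) from an already decoded title."""
    return title[1:] if title.startswith("\ufeff") else title


def decode_titles(raw):
    """Decode raw UTF-8 bytes into a list of non-blank, stripped titles.

    The 'utf-8-sig' codec skips a UTF-8 BOM at the start of the data
    and otherwise behaves like plain UTF-8.
    """
    text = raw.decode("utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Only the first title in a file can be affected, because the BOM appears once at the very start of the data.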