From: tchomby <tc...@go...> - 2009-02-26 19:17:06
Thanks.
I don't know what encoding the files are in. They're just files that I
created myself with a text editor, but text has often been copy-pasted
into them from various sources, e.g. websites, and that's where the
decoding problems occur. Presumably some non-UTF-8 characters get
pasted in.
I used the codecs.open trick both when reading files and when writing
the HTML from python-markdown: wherever I was using open, I replaced it
with codecs.open. This works for most of my files, but for some I get:
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa2 in position 2551: unexpected code byte
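This kind of failure can be reproduced in isolation. The following is only a sketch, and it assumes the offending byte came from Latin-1 or Windows-1252 text, which is a guess rather than something established in this thread. In both of those encodings byte 0xa2 is the cent sign, whereas in UTF-8 it is valid only as a continuation byte and cannot be decoded on its own. The sample data and variable names are hypothetical.

```python
# Hypothetical sample: the text "5¢" as it would be stored in Latin-1.
raw = b"5\xa2"

try:
    raw.decode("utf8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    bad_byte_position = exc.start  # index of the byte that could not be decoded

# The same bytes decode without error if Latin-1 is assumed instead.
as_latin1 = raw.decode("latin-1")
```

The position reported in the exception can be used to find the offending character in the file.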
The error happens when I call template.substitute in this function:
    import codecs
    import os
    from string import Template

    def render_template(template_filename, variables=None):
        if variables is None:
            variables = {}
        template_path = os.path.join('templates', template_filename)
        # Decode the template file as UTF-8 while reading it.
        template_text = codecs.open(template_path, mode='r', encoding='utf8').read()
        template_obj = Template(template_text)
        return template_obj.substitute(variables)
So the error is no longer coming from python-markdown but from the
standard library. There seems to be some conflict between using
codecs.open to get a string and using Template.
Fortunately this happened in few enough files that I was able to find
and remove the offending characters manually. Still, it would be good
to be able to read and write text from files in a robust way.
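One possible shape for such robust reading and writing is sketched below. It is not a definitive implementation: the helper names read_text and write_text are hypothetical, and the fallback order (UTF-8 first, then Windows-1252) is an assumption about where the pasted text came from. The file is read as raw bytes, each candidate encoding is tried in turn, and as a last resort undecodable bytes are replaced with U+FFFD rather than raising. Output is always written as UTF-8.

```python
import io

def read_text(path, encodings=("utf8", "cp1252")):
    """Return the file's contents as text, trying each encoding in turn."""
    with io.open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding, replacing bad bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace")

def write_text(path, text):
    """Write text to the file, always encoded as UTF-8."""
    with io.open(path, "w", encoding="utf8") as f:
        f.write(text)
```

Trying UTF-8 first is deliberate: valid UTF-8 rarely occurs by accident in text written in another encoding, while almost any byte sequence decodes "successfully" as a single-byte encoding, so the stricter codec has to go first.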