Re: [Python-markdown-discuss] UnicodeDecodeError

Thanks.

I don't know what encoding the files are in. They're just files that I
created myself with a text editor, but often text has been copy-pasted
into them from various sources, e.g. websites, and that's where the
decoding problems occur. Presumably some non-UTF-8 characters get
pasted in.
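A minimal sketch of one way to cope with files of unknown encoding. It assumes that anything which is not valid UTF-8 most likely came from Windows-1252 (cp1252), a common source of stray bytes in text pasted from web pages and Windows programs. The helper name `decode_with_fallback` is hypothetical; it takes raw bytes, e.g. from `open(path, 'rb').read()`:

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn and return (text, encoding_used).

    Assumption: bytes that are not valid UTF-8 are most likely cp1252.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not this encoding; try the next one
    # Last resort: decode with the first encoding, turning bad bytes into U+FFFD.
    return data.decode(encodings[0], "replace"), encodings[0]
```

Byte 0xa2, for instance, is not a valid start byte in UTF-8 but is the cent sign in cp1252 (and Latin-1). Once decoded, the text can be written back out as UTF-8 so the file is clean from then on. The guess can be wrong, so it is worth eyeballing the result.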

I used the codecs.open trick when reading files and again when writing
the HTML from python-markdown: wherever I was using open, I replaced it
with codecs.open. This works for most of my files, but for some I get:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa2 in position
2551: unexpected code byte
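To track down bytes like the 0xa2 at position 2551 without searching by hand, a small sketch that lists the offset and value of every span that fails to decode, using the `start` and `end` attributes that `UnicodeDecodeError` carries. The name `find_undecodable` is hypothetical:

```python
def find_undecodable(data, encoding="utf-8"):
    """Return (offset, bad_bytes) pairs for every span of `data` that fails to decode."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)
            break  # the rest decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice, so shift by pos.
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end]))
            pos = end  # resume decoding just after the bad span
    return problems
```

Usage would be something like `find_undecodable(open(path, 'rb').read())`, which gives the offsets to fix in the editor.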

The error happens when I call template.substitute in this function:

import codecs
import os
from string import Template

def render_template(template_filename, variables=None):
    if variables is None:
        variables = {}
    template_path = os.path.join('templates', template_filename)
    # codecs.open decodes the file as UTF-8, so template_text is a unicode string
    with codecs.open(template_path, mode='r', encoding='utf8') as f:
        template_text = f.read()
    return Template(template_text).substitute(variables)
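One plausible cause, and this is an assumption since the full traceback isn't shown: some values in `variables` are byte strings (for example file contents read with plain `open`), while the template is unicode. In Python 2, combining the two forces an implicit decode of the bytes inside `substitute`. A sketch that decodes any byte-string values explicitly before substitution; `to_text` and `substitute_text` are hypothetical names:

```python
from string import Template


def to_text(value, encoding="utf-8"):
    # Decode byte strings explicitly; undecodable bytes become U+FFFD.
    # Text and non-string values are passed through unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return value


def substitute_text(template_text, variables):
    # Normalise every value to text so substitute() never mixes bytes and unicode.
    text_vars = dict((key, to_text(value)) for key, value in variables.items())
    return Template(template_text).substitute(text_vars)
```

With this, `substitute_text(u"Price: $p", {"p": b"50\xc2\xa2"})` yields `u"Price: 50\xa2"`, and a stray non-UTF-8 byte shows up as a replacement character instead of an exception.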

So the error is no longer coming from python-markdown but from the
standard library. There seems to be some conflict between using
codecs.open to get a unicode string and using Template.

Fortunately this happened in few enough files that I was able to find
and remove the offending characters manually. Still, it would be good
to be able to read and write text files in a robust way.
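A minimal sketch of "robust" reading and writing, assuming it is acceptable to replace undecodable bytes with U+FFFD rather than fail. It uses `io.open`, which accepts `encoding` and `errors` arguments in both Python 2.6+ and Python 3; `read_text` and `write_text` are hypothetical helper names:

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8", errors="replace"):
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    with io.open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


def write_text(path, text, encoding="utf-8"):
    # Always write UTF-8, so files saved this way decode cleanly next time.
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)


# Demo: a file with a stray non-UTF-8 byte (0xa2) is read without raising.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "wb") as f:
    f.write(b"50\xa2 each")
recovered = read_text(demo_path)   # u"50\ufffd each"
write_text(demo_path, recovered)   # re-saved as valid UTF-8
round_trip = read_text(demo_path)
os.remove(demo_path)
```

The trade-off is that `errors="replace"` silently loses the original character. If that matters, decoding the raw bytes with a fallback encoding such as cp1252 keeps it.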