From: tchomby <tc...@go...> - 2009-02-26 19:17:06
Thanks. I don't know what encoding the files are in. They're just files that I created myself with a text editor, but text has often been copy-pasted into them from various sources, e.g. websites, and that's where the decoding problems occur. Presumably some non-UTF-8 characters get pasted in.

I used the codecs.open trick when reading files and again when writing the HTML from python-markdown: wherever I was using open, I replaced it with codecs.open. This works for most of my files, but for some I get:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa2 in position 2551: unexpected code byte

The error happens when I call template.substitute in this function:

    def render_template(template_filename, variables=None):
        if variables is None:
            variables = {}
        template_path = os.path.join('templates', template_filename)
        template_text = codecs.open(template_path, mode='r', encoding='utf8').read()
        template_obj = Template(template_text)
        return template_obj.substitute(variables)

So the error is no longer coming from python-markdown but from the standard library. There seems to be some conflict between using codecs.open to get a string and using Template.

Fortunately this happened in few enough files that I was able to find and remove the offending characters manually. Still, it would be good to be able to read and write text from files in a robust way.
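For the "robust way" wished for above, here is a minimal sketch rather than a definitive fix. It assumes the stray bytes come from Windows-1252 text pasted into otherwise UTF-8 files, which is only a guess (byte 0xa2 is the cent sign in that encoding). The helper names read_text and write_text are hypothetical, and io.open is swapped in for codecs.open. It does not address the Template.substitute error itself; it only makes sure the file contents are decoded without raising.

```python
import io


def read_text(path, encodings=("utf-8", "cp1252"), fallback="latin-1"):
    """Return the contents of `path` as text, trying each encoding in turn.

    The list of candidate encodings is an assumption; adjust it to match
    wherever the pasted text is likely to have come from.
    """
    # Read raw bytes first so decoding can be retried without reopening.
    with io.open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last step cannot fail.
    return raw.decode(fallback, "replace")


def write_text(path, text, encoding="utf-8"):
    """Write `text` to `path`, always encoded as UTF-8 by default."""
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)
```

Reading with read_text and writing back with write_text would normalize a mixed file to UTF-8 once, after which a plain UTF-8 read should work. A wrong guess at the source encoding produces wrong characters rather than an error, so it is worth spot-checking the result.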