From: Marco P. <mar...@gm...> - 2009-02-27 10:08:56
What about doing:

    fh = open(template_path, 'r')
    value = fh.read()
    fh.close()
    value = codecs.encode(value, 'utf-8', 'replace')

?

Cheers,
Marco

On Thu, Feb 26, 2009 at 8:17 PM, tchomby <tc...@go...> wrote:
> Thanks.
>
> I don't know what encoding the files are in. They're just files that I
> created myself with a text editor, but often text has been copy-pasted
> into them from various sources, e.g. websites, and that's where the
> decoding problems occur. Presumably some non-utf8 characters get
> pasted in.
>
> I used the codecs.open trick when reading files and again when writing
> the HTML from python-markdown, wherever I was using open I replaced it
> with codecs.open.
> This works for most of my files but for some I get:
>
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa2 in position
> 2551: unexpected code byte
>
> The error happens when I call template.substitute in this function:
>
> def render_template(template_filename,variables=None):
>     if variables is None: variables = {}
>     template_path = os.path.join('templates',template_filename)
>     template_text = codecs.open(template_path,mode='r',encoding='utf8').read()
>     template_obj = Template(template_text)
>     return template_obj.substitute(variables)
>
> So the error is no longer coming from python-markdown but from the
> standard library. Seems to be some conflict between using codecs.open
> to get a string and using Template.
>
> Fortunately this happened in few enough files that I was able to find
> and remove the offending characters manually. Still, it would be good
> to be able to read and write text from files in a robust way.
>
> _______________________________________________
> Python-markdown-discuss mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss

-- 
Marco Pantaleoni
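
For reference, here is a minimal sketch of one way to read such files without the UnicodeDecodeError. It assumes Python 3 and assumes the files are mostly UTF-8 with a few stray bytes pasted in. The helper name read_text_lenient and the reworked render_template signature are hypothetical, introduced only for illustration. The key point is that the bytes are decoded (not encoded) with the 'replace' error handler, so each invalid byte becomes U+FFFD instead of raising.

```python
import os
import tempfile
from string import Template


def read_text_lenient(path, encoding="utf-8"):
    """Read a file as text, replacing undecodable bytes with U+FFFD.

    Hypothetical helper; the name is not from the thread.
    """
    # Read raw bytes first so that no implicit decoding happens.
    with open(path, "rb") as fh:
        raw = fh.read()
    # errors="replace" substitutes U+FFFD for invalid byte sequences
    # instead of raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")


def render_template(template_path, variables=None):
    """Illustrative variant of the quoted function (takes a full path)."""
    if variables is None:
        variables = {}
    template_text = read_text_lenient(template_path)
    return Template(template_text).substitute(variables)


if __name__ == "__main__":
    # Demo: a template containing the stray byte 0xa2 from the traceback.
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
        tmp.write(b"Price: 5\xa2 for $name")
    try:
        print(render_template(tmp.name, {"name": "tea"}))
    finally:
        os.remove(tmp.name)
```

Replacement is lossy: the original character is gone once it has been swapped for U+FFFD. If the stray bytes turn out to come from a single-byte encoding such as Latin-1 (where 0xa2 is the cent sign), decoding with that encoding instead would preserve them. On Python 2, the rough equivalent of the decode step would be value.decode('utf-8', 'replace').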