Re: [Python-markdown-discuss] UnicodeDecodeError

What about doing:

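import codecs

fh = open(template_path, 'rb')  # read raw bytes, no implicit decoding
value = fh.read()
fh.close()
# decode to unicode; undecodable bytes become U+FFFD instead of raising
value = codecs.decode(value, 'utf-8', 'replace')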

?
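
If the stray bytes come from pasted Windows text, it may be nicer to try a second encoding before falling back to replacement characters. A rough sketch only: read_text and render_text_template are names I made up, cp1252 is just my guess at where a byte like 0xa2 comes from, and it assumes a Python with the io module and the with statement:

```python
import io
from string import Template


def read_text(path, encodings=("utf-8", "cp1252")):
    """Return the file's contents as text, trying each encoding in turn."""
    with io.open(path, "rb") as fh:
        raw = fh.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", "replace")


def render_text_template(template_path, variables=None):
    """Substitute variables into a template file read with read_text."""
    template = Template(read_text(template_path))
    return template.substitute(variables or {})
```

The order of the encodings matters: UTF-8 goes first because it rejects most text that is not really UTF-8, while cp1252 accepts almost any byte sequence and so only makes sense as a fallback.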

Cheers,
Marco

On Thu, Feb 26, 2009 at 8:17 PM, tchomby <tc...@go...> wrote:
> Thanks.
>
> I don't know what encoding the files are in. They're just files that I
> created myself with a text editor, but often text has been copy-pasted
> into them from various sources, e.g. websites, and that's where the
> decoding problems occur. Presumably some non-utf8 characters get
> pasted in.
>
> I used the codecs.open trick when reading files and again when writing
> the HTML from python-markdown, wherever I was using open I replaced it
> with codecs.open. This works for most of my files but for some I get:
>
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa2 in position
> 2551: unexpected code byte
>
> The error happens when I call template.substitute in this function:
>
> import codecs
> import os
> from string import Template
>
> def render_template(template_filename, variables=None):
>     if variables is None:
>         variables = {}
>     template_path = os.path.join('templates', template_filename)
>     template_text = codecs.open(template_path, mode='r', encoding='utf8').read()
>     template_obj = Template(template_text)
>     return template_obj.substitute(variables)
>
> So the error is no longer coming from python-markdown but from the
> standard library. Seems to be some conflict between using codecs.open
> to get a string and using Template.
>
> Fortunately this happened in few enough files that I was able to find
> and remove the offending characters manually. Still, it would be good
> to be able to read and write text from files in a robust way.
>

-- 
Marco Pantaleoni