Thread: [Python-markdown-discuss] Unicode errors with 1.6b rc2.

[Python-markdown-discuss] Unicode errors with 1.6b rc2.

From: Aaron G. <fl...@sh...> - 2007-05-16 01:42:41

Not entirely sure what is going on, it was just fine with 1.6a.

File "/www/htdocs/floam.sh.nu/html/markdown.py", line 1611, in convert
    self.source = removeBOM(self.source, self.encoding)
File "/www/htdocs/floam.sh.nu/html/markdown.py", line 74, in removeBOM
    if text.startswith(bom):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 253: ordinal not in range(128)

Aaron Gyes

[Python-markdown-discuss] Unicode errors with 1.6b rc2.

From: Anand <ana...@gm...> - 2007-06-15 05:55:21

Markdown 1.6b rc2 doesn't seem to like unicode strings.

The following code works fine in 1.5, but doesn't work with 1.6b rc2.

import markdown
text = u'\r\n\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41'
print markdown.markdown(text.encode('utf-8'))

Re: [Python-markdown-discuss] Unicode errors with 1.6b rc2.

From: Yuri T. <qar...@gm...> - 2007-06-15 13:18:20

1.6b rc2 expects to get actual Unicode strings as arguments.  So, you
should get rid of .encode('utf-8') in your example:

    text = u'\r\n\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41'
    print markdown.markdown(text)

If you read UTF8-encoded text from a file manually, you will need to
decode it.  Alternatively, there is a function markdownFromFile() that
accepts an input file path and the encoding, which will decode the
file for you.
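
A minimal sketch of the manual route, assuming the input file is UTF-8:
read it through a decoding reader so that the value handed to markdown
is already a Unicode string. The helper name read_unicode and the
temporary sample file are made up for illustration; io.open is used
here only because it behaves the same on Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def read_unicode(path, encoding="utf-8"):
    """Read a file and return its contents as a Unicode string."""
    with io.open(path, mode="r", encoding=encoding) as handle:
        return handle.read()


# Write a small UTF-8 sample file to stand in for a real document.
sample = u"\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41\n"
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, mode="w", encoding="utf-8") as handle:
    handle.write(sample)

# Decode on the way in; no .encode() or .decode() calls are needed afterwards.
text = read_unicode(sample_path)
os.remove(sample_path)

# `text` is now Unicode and could be passed straight to markdown.markdown(text).
```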

BTW, for languages written right-to-left 1.6b rc2 now inserts
dir="rtl" attributes where necessary.

  - yuri

-- 
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/

Re: [Python-markdown-discuss] Unicode errors with 1.6b rc2.

From: Yuri T. <qar...@gm...> - 2007-05-16 02:18:59

Can you send a file that causes the error?  Also, how are you calling it?

  - yuri



Re: [Python-markdown-discuss] Unicode errors with 1.6b rc2.

From: Aaron G. <fl...@sh...> - 2007-05-16 02:34:24

It seems to be "all of them". Oddly enough, it doesn't happen when I
run ./markdown.py on them from the command line.

One such file is http://aarongyes.com/pages/home.txt, another is
http://aarongyes.com/pages/resume.txt. These are the very files in
use, served from their real location, not copies.

I have wedged it into a home-made templating system, and am calling  
it with .convert(string).

You can look at my code here:
http://aarongyes.com/sitestruc.py_
http://aarongyes.com/site.py_

(the latter being relevant).

Temporarily making removeBOM do nothing but return text has it  
hobbling along.


Re: [Python-markdown-discuss] Unicode errors with 1.6b rc2.

From: Yuri T. <qar...@gm...> - 2007-05-16 03:32:19

It seems that you are loading your file as if it were ASCII (i.e., with
just plain open()) even though it's not (because of the smart quotes
saved as UTF-8).  When you do this, the non-ASCII characters sometimes
cause problems and sometimes don't.  So I am not sure why disabling
removeBOM makes the problem go away, and I wouldn't rely on it.

Instead, you should open your file using:

import codecs
input_file = codecs.open(input, mode="r", encoding="utf8")

(Or make it ASCII-compliant.)
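
A rough sketch of the likely mechanism, under the assumption that the
page text reaches removeBOM as undecoded bytes: Python 2 implicitly
decodes a byte string as ASCII whenever it is combined or compared with
a Unicode string, and the first byte of a UTF-8 smart quote is 0xe2.
The snippet below makes that decode explicit so it fails the same way
on any Python version; the sample phrase is invented for illustration.

```python
# UTF-8 bytes for a curly-quoted phrase; the opening quote encodes as e2 80 9c.
raw = u"\u201cHello\u201d".encode("utf-8")

try:
    # Python 2 performs this ASCII decode implicitly when a byte string
    # meets a Unicode string; here it is spelled out.
    raw.decode("ascii")
    failed_at = None
except UnicodeDecodeError as err:
    failed_at = err.start  # index of the first byte that is not ASCII

# Decoding with the codec the file was actually saved in succeeds.
decoded = raw.decode("utf-8")
```

Decoding once at the point where the file is read keeps everything
downstream working on Unicode strings only.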

  - yuri

