Thread: Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: <gr...@us...> - 2009-10-19 09:42:41

hello,

any comments to this behaviour

* why 'ASCII' sometimes works and sometimes not ?
* why 'ASCII' is in uppercase, python/encodings has all lowercase ?

cheers
-- 

---------- Forwarded message ----------
Date: Mon, 12 Oct 2009 19:05:30 +0300
From: Firat Ozgul <ozg...@gm...>
To: gr...@us...
Subject: Re: [Docutils-develop] Uppercase "ASCII" in nodes.py

Hello,

No, that code does not produce any errors on the Python 2.6.3 that I have...

I have Docutils 0.6 compiled from source on Ubuntu Karmic Koala. And when I
try to "make html" in Sphinx, I get the error I mentioned in my previous
e-mail, unless I manually edit nodes.py to change uppercase "ASCII" to
lowercase "ascii".

Thanks for the interest.

Firat Ozgul

2009/10/12 <gr...@us...>

> On Mon, 12 Oct 2009, Firat Ozgul wrote:
>
>  Hello,
>>
>> In Docutils 0.6, specifically in nodes.py, more specifically in the
>> make_id() function, the following line will cause an error:
>>
>> id = unicodedata.normalize('NFKD', id).encode('ASCII',
>> 'ignore').decode('ASCII')
>>
>> The encoding "ASCII" should be written in lowercase. So the line should
>> read:
>>
>> id = unicodedata.normalize('NFKD', id).encode('ascii',
>> 'ignore').decode('ascii')
>>
>> Otherwise the programs that use this function will produce this error:
>>
>>  File "/usr/local/lib/python2.6/dist-packages/docutils/nodes.py", line
>> 1845, in make_id
>>   encode('ASCII', 'ignore').decode('ASCII')
>> LookupError: unknown encoding: ASCII
>>
>
> does this ::
>
>  import unicodedata
>
>  id = u"abc def"
>
>  id = unicodedata.normalize('NFKD', id).encode('ASCII',
> 'ignore').decode('ASCII')
>  id = unicodedata.normalize('NFKD', id).encode('ascii',
> 'ignore').decode('ascii')
>
> produce an error at your place ?
>
> python2.6 on what platform ?
>
> cheers
>
> --
>

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: Firat O. <ozg...@gm...> - 2009-10-19 10:10:44

2009/10/19 <gr...@us...>

> hello,
>
> any comments to this behaviour
>
> * why 'ASCII' sometimes works and sometimes not ?
> * why 'ASCII' is in uppercase, python/encodings has all lowercase ?
>
> cheers
> --
>
> ---------- Forwarded message ----------
> Date: Mon, 12 Oct 2009 19:05:30 +0300
> From: Firat Ozgul <ozg...@gm...>
> To: gr...@us...
> Subject: Re: [Docutils-develop] Uppercase "ASCII" in nodes.py
>
> Hello,
>
> No, that code does not produce any errors on the Python 2.6.3 that I have...
>
> I have Docutils 0.6 compiled from source on Ubuntu Karmic Koala. And when I
> try to "make html" in Sphinx, I get the error I mentioned in my previous
> e-mail, unless I manually edit nodes.py to change uppercase "ASCII" to
> lowercase "ascii".
>
> Thanks for the interest.
>
> Firat Ozgul
>
> 2009/10/12 <gr...@us...>
>
> > On Mon, 12 Oct 2009, Firat Ozgul wrote:
> >
> >  Hello,
> >>
> >> In Docutils 0.6, specifically in nodes.py, more specifically in the
> >> make_id() function, the following line will cause an error:
> >>
> >> id = unicodedata.normalize('NFKD', id).encode('ASCII',
> >> 'ignore').decode('ASCII')
> >>
> >> The encoding "ASCII" should be written in lowercase. So the line should
> >> read:
> >>
> >> id = unicodedata.normalize('NFKD', id).encode('ascii',
> >> 'ignore').decode('ascii')
> >>
> >> Otherwise the programs that use this function will produce this error:
> >>
> >>  File "/usr/local/lib/python2.6/dist-packages/docutils/nodes.py", line
> >> 1845, in make_id
> >>   encode('ASCII', 'ignore').decode('ASCII')
> >> LookupError: unknown encoding: ASCII
> >>
> >
> > does this ::
> >
> >  import unicodedata
> >
> >  id = u"abc def"
> >
> >  id = unicodedata.normalize('NFKD', id).encode('ASCII',
> > 'ignore').decode('ASCII')
> >  id = unicodedata.normalize('NFKD', id).encode('ascii',
> > 'ignore').decode('ascii')
> >
> > produce an error at your place ?
> >
> > python2.6 on what platform ?
> >
> > cheers
> >
> > --
> >
>
>

Hello,

I have a desktop and a laptop computer, both of which run Ubuntu Karmic
Koala. The Python version they have now is 2.6.4rc1.

On my laptop, uppercase "ASCII" causes a crash in Sphinx, and Sphinx keeps
producing errors until I manually edit nodes.py. On the desktop, however,
Sphinx works fine regardless of the case of "ASCII". Note that uninstalling
and re-installing Sphinx and Docutils did not have any effect on the
problem.

Regards,
Firat

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: Georg B. <g.b...@gm...> - 2009-10-19 13:25:06

gr...@us... wrote:
> hello,
> 
> any comments to this behaviour
> 
> * why 'ASCII' sometimes works and sometimes not ?
> * why 'ASCII' is in uppercase, python/encodings has all lowercase ?

'ASCII' should work; codec names are normalized before lookup, which
includes a lowercasing step.

However, that lowercasing may not lead to the expected results when
a non-C locale is set and the locale's ASCII upper case letters do
not map to the ASCII lower case letters.

I remember that Turkish has a different upper case 'i' than other
Latin languages, this may well be the cause here.
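
To illustrate the normalization step described above, here is a minimal
sketch. It assumes a current Python 3 interpreter rather than the Python 2.6
under discussion, and canonical_codec_name is a hypothetical helper name, not
anything taken from Docutils or Sphinx.

```python
import codecs


def canonical_codec_name(name):
    """Return the canonical name Python registers for an encoding alias."""
    return codecs.lookup(name).name


# Different spellings of the same encoding resolve to one codec.
for alias in ("ASCII", "ascii", "Ascii", "US-ASCII"):
    print(alias, "->", canonical_codec_name(alias))
```

If the lookup ever depended on the active locale, as suspected in this
thread for Python 2.6, the uppercase spellings would be the ones to fail.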

Firat: Can you open a Python shell on both of your computers and
execute the following commands, showing us the output of each:

    import locale
    print 'I'.lower()
    print locale.setlocale(locale.LC_ALL, '')
    print 'I'.lower()
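
For anyone repeating this check on Python 3, a rough equivalent is sketched
below. The function name lower_before_and_after is hypothetical. Note the
assumption: on Python 3, str.lower() does not consult the locale, so this
version is not expected to reproduce the Python 2.6 behaviour in question.

```python
import locale


def lower_before_and_after(locale_name=""):
    """Lowercase 'I' before and after switching locale, then restore it."""
    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    before = "I".lower()
    try:
        active = locale.setlocale(locale.LC_ALL, locale_name)
    except locale.Error:
        active = None  # requested locale is not installed on this system
    try:
        after = "I".lower()
    finally:
        locale.setlocale(locale.LC_ALL, saved)
    return before, active, after


# "" means: use whatever locale the environment variables ask for.
print(lower_before_and_after(""))
```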

cheers,
Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: Michael F. <fuz...@vo...> - 2009-10-19 14:03:32

Georg Brandl wrote:
> gr...@us... wrote:
>   
>> hello,
>>
>> any comments to this behaviour
>>
>> * why 'ASCII' sometimes works and sometimes not ?
>> * why 'ASCII' is in uppercase, python/encodings has all lowercase ?
>>     
>
> 'ASCII' should work; codec names are normalized before lookup, which
> includes a lowercasing step.
>
> However, that lowercasing may not lead to the expected results when
> a non-C locale is set and the locale's ASCII upper case letters do
> not map to the ASCII lower case letters.
>
> I remember that Turkish has a different upper case 'i' than other
> Latin languages, this may well be the cause here.
>
> Firat: Can you open a Python shell on both of your computers and do
> execute the following commands, showing us the output of each:
>
>     import locale
>     print 'I'.lower()
>     print locale.setlocale(locale.LC_ALL, '')
>     print 'I'.lower()
>
>   

I thought that Python guaranteed that ascii characters would remain 
ascii when converted from lowercase to uppercase or vice-versa.

All the best,

Michael Foord

> cheers,
> Georg
>
>   


-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: <gr...@us...> - 2009-10-19 16:23:10

On Mon, 19 Oct 2009, Georg Brandl wrote:

> gr...@us... wrote:
>> hello,
>>
>> any comments to this behaviour
>>
>> * why 'ASCII' sometimes works and sometimes not ?
>> * why 'ASCII' is in uppercase, python/encodings has all lowercase ?
>
> 'ASCII' should work; codec names are normalized before lookup, which
> includes a lowercasing step.
>
> However, that lowercasing may not lead to the expected results when
> a non-C locale is set and the locale's ASCII upper case letters do
> not map to the ASCII lower case letters.
>
> I remember that Turkish has a different upper case 'i' than other
> Latin languages, this may well be the cause here.
>
> Firat: Can you open a Python shell on both of your computers and do
> execute the following commands, showing us the output of each:
>
>    import locale
>    print 'I'.lower()
>    print locale.setlocale(locale.LC_ALL, '')
>    print 'I'.lower()

seems to be a general problem, see: http://bugs.python.org/issue1813

for docutils i would go for using 'ascii' in the code to support
all turkish installations.
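
A minimal sketch of the lowercase variant follows. It is written for
Python 3, and ascii_fold is a hypothetical stand-alone name; the real line
lives inside make_id() in nodes.py and is only mirrored here, not copied
from the committed code.

```python
import unicodedata


def ascii_fold(text):
    """Decompose accented characters (NFKD), then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Lowercase 'ascii' needs no case normalization during codec lookup.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_fold("caf\u00e9 d\u00e9j\u00e0 vu"))
# The Turkish dotless i (U+0131) has no decomposition, so it is dropped.
print(repr(ascii_fold("\u0131")))
```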

cheers

--

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: Georg B. <g.b...@gm...> - 2009-10-19 13:52:04

Michael Foord wrote:
> Georg Brandl wrote:
>> gr...@us... wrote:
>>   
>>> hello,
>>>
>>> any comments to this behaviour
>>>
>>> * why 'ASCII' sometimes works and sometimes not ?
>>> * why 'ASCII' is in uppercase, python/encodings has all lowercase ?
>>>     
>>
>> 'ASCII' should work; codec names are normalized before lookup, which
>> includes a lowercasing step.
>>
>> However, that lowercasing may not lead to the expected results when
>> a non-C locale is set and the locale's ASCII upper case letters do
>> not map to the ASCII lower case letters.
>>
>> I remember that Turkish has a different upper case 'i' than other
>> Latin languages, this may well be the cause here.
>>
>> Firat: Can you open a Python shell on both of your computers and do
>> execute the following commands, showing us the output of each:
>>
>>     import locale
>>     print 'I'.lower()
>>     print locale.setlocale(locale.LC_ALL, '')
>>     print 'I'.lower()
>>
>>   
> 
> I thought that Python guaranteed that ascii characters would remain 
> ascii when converted from lowercase to uppercase or vice-versa.

Maybe it does (is it in the docs somewhere?) -- but that would be the
only explanation I can offer for the behavior observed.

(Even if Python does guarantee that, is it tested?  Most locales *do*
map ASCII upper/lowercase on themselves...)

Georg

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: Michael F. <fuz...@vo...> - 2009-10-19 14:02:37

Georg Brandl wrote:
> Michael Foord wrote:
>   
>> Georg Brandl wrote:
>>     
>>> gr...@us... wrote:
>>>   
>>>       
>>>> hello,
>>>>
>>>> any comments to this behaviour
>>>>
>>>> * why 'ASCII' sometimes works and sometimes not ?
>>>> * why 'ASCII' is in uppercase, python/encodings has all lowercase ?
>>>>     
>>>>         
>>> 'ASCII' should work; codec names are normalized before lookup, which
>>> includes a lowercasing step.
>>>
>>> However, that lowercasing may not lead to the expected results when
>>> a non-C locale is set and the locale's ASCII upper case letters do
>>> not map to the ASCII lower case letters.
>>>
>>> I remember that Turkish has a different upper case 'i' than other
>>> Latin languages, this may well be the cause here.
>>>
>>> Firat: Can you open a Python shell on both of your computers and do
>>> execute the following commands, showing us the output of each:
>>>
>>>     import locale
>>>     print 'I'.lower()
>>>     print locale.setlocale(locale.LC_ALL, '')
>>>     print 'I'.lower()
>>>
>>>   
>>>       
>> I thought that Python guaranteed that ascii characters would remain 
>> ascii when converted from lowercase to uppercase or vice-versa.
>>     
>
> Maybe it does (is it in the docs somewhere?) -- but that would be the
> only explanation I can offer for the behavior observed.
>
> (Even if Python does guarantee that, is it tested?  Most locales *do*
> map ASCII upper/lowercase on themselves...)
>   

Well, I know that on IronPython we had problems because it *didn't*
guarantee this (and the decimal module uses strings to look up global
variable names - yeeurgh).

If decimal can be imported in the Turkish locale (which I'm pretty sure
it can) then Python is maintaining this ascii-invariant.
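
A small sketch of what that invariant means, assuming Python 3. Both
function names are hypothetical, and rounding_constant only imitates the
kind of name-based lookup mentioned above; it is not decimal's actual code.

```python
import decimal
import string


def ascii_case_is_invariant():
    """True if every ASCII letter changes case to its ASCII counterpart."""
    return (string.ascii_uppercase.lower() == string.ascii_lowercase
            and string.ascii_lowercase.upper() == string.ascii_uppercase)


def rounding_constant(name):
    """Fetch a decimal rounding constant by upper-casing its name."""
    # If 'i' upper-cased to a dotted capital, this getattr would fail.
    return getattr(decimal, name.upper())


print(ascii_case_is_invariant())
print(rounding_constant("round_ceiling"))
```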

Michael Foord

> Georg
>
>   



Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: Firat O. <ozg...@gm...> - 2009-10-19 16:35:59

Georg, I think you are right in stating that this is a locale issue...

On my desktop computer where both lower- and uppercase "ASCII" work fine,
this is the output:

>>> print "ASCII".lower()
ascii
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'en_US.UTF-8'
>>> print "ASCII".lower()
ascii

And this is the output from the laptop computer where only lowercase "ascii"
works:

>>> print "ASCII".lower()
ascii
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'tr_TR.UTF-8'
>>> print "ASCII".lower()
ascII

In Turkish, the general rule is that letters do not lose their cedillas or
dots when changing case. So:

lowercase "i" turns into "I-with-a-dot-above" in uppercase
lowercase "i-without-a-dot" turns into "I" in uppercase
uppercase "I-with-a-dot-above" turns into "i" in lowercase
uppercase "I" turns into "i-without-a-dot" in lowercase
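
The four letters involved can be listed with their Unicode code points. This
is a minimal Python 3 sketch; TURKISH_I_LETTERS and describe are hypothetical
names used only for illustration.

```python
import unicodedata

# dotted small i, dotless small i, dotted capital I, dotless capital I
TURKISH_I_LETTERS = ("i", "\u0131", "\u0130", "I")


def describe(ch):
    """Format a character as 'U+XXXX OFFICIAL UNICODE NAME'."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


for ch in TURKISH_I_LETTERS:
    print(describe(ch))
```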

However, the letter "i" maps to "I"  in Unicode whatever locale you choose:

>>> print u"kitap".upper()
KITAP

The "I" letter should be "I-with-a-dot-above" in the output...
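
One possible workaround, sketched for Python 3, is to apply the Turkish
mappings explicitly before the generic case conversion. The helpers
turkish_upper and turkish_lower are hypothetical; nothing like them is
assumed to exist in Python's standard library or in Docutils.

```python
# Map the Turkish-specific pairs first, then let str.upper()/str.lower()
# handle every other letter.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})


def turkish_upper(text):
    """Uppercase text following the Turkish dotted/dotless i rules."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text):
    """Lowercase text following the Turkish dotted/dotless i rules."""
    return text.translate(_TR_LOWER).lower()


print("kitap".upper())         # generic Unicode mapping
print(turkish_upper("kitap"))  # Turkish mapping keeps the dot
```

Such helpers are only appropriate for text known to be Turkish. Identifiers
and encoding names such as "ASCII" should keep the locale-independent
mapping.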

There is an old bug report for this at http://bugs.python.org/msg55478

For example, Elisa, Gazpacho and PyPDF give "KeyError: 'ROUND_CEiLiNG'" when
run under a Turkish locale. Here the lowercase "i" letters in the output
(CEiLiNG) do indicate a locale failure. In Docutils, however, the letters
of the word "ASCII" in the error message do not indicate any problem:

"LookupError: unknown encoding: ASCII"

The "dotted I" in Turkish is a very big problem for Turkish and non-Turkish
programmers alike, because there is no way to uppercase the letter "i"
correctly in any locale...

Regards,
Firat

Georg Brandl wrote:
> >>> 'ASCII' should work; codec names are normalized before lookup, which
> >>> includes a lowercasing step.
> >>>
> >>> However, that lowercasing may not lead to the expected results when
> >>> a non-C locale is set and the locale's ASCII upper case letters do
> >>> not map to the ASCII lower case letters.
> >>>
> >>> I remember that Turkish has a different upper case 'i' than other
> >>> Latin languages, this may well be the cause here.
> >>>
> >>> Firat: Can you open a Python shell on both of your computers and do
> >>> execute the following commands, showing us the output of each:
> >>>
> >>>     import locale
> >>>     print 'I'.lower()
> >>>     print locale.setlocale(locale.LC_ALL, '')
> >>>     print 'I'.lower()
>
>

Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: Georg B. <g.b...@gm...> - 2009-10-19 18:18:40

gr...@us... wrote:
> On Mon, 19 Oct 2009, Georg Brandl wrote:
> 
>> gr...@us... wrote:
>>> hello,
>>>
>>> any comments to this behaviour
>>>
>>> * why 'ASCII' sometimes works and sometimes not ?
>>> * why 'ASCII' is in uppercase, python/encodings has all lowercase ?
>>
>> 'ASCII' should work; codec names are normalized before lookup, which
>> includes a lowercasing step.
>>
>> However, that lowercasing may not lead to the expected results when
>> a non-C locale is set and the locale's ASCII upper case letters do
>> not map to the ASCII lower case letters.
>>
>> I remember that Turkish has a different upper case 'i' than other
>> Latin languages, this may well be the cause here.
>>
>> Firat: Can you open a Python shell on both of your computers and do
>> execute the following commands, showing us the output of each:
>>
>>    import locale
>>    print 'I'.lower()
>>    print locale.setlocale(locale.LC_ALL, '')
>>    print 'I'.lower()
> 
> seems to be a general problem, see: http://bugs.python.org/issue1813

Ah, ok.  I didn't remember that report specifically.

> for docutils i would go for using 'ascii' in the code to support
> all turkish installations.

Yes, I'd recommend that.

Georg



Re: [Docutils-develop] Uppercase "ASCII" in nodes.py (fwd)

From: <gr...@us...> - 2009-10-20 14:31:15

On Mon, 19 Oct 2009, Georg Brandl wrote:

> gr...@us... wrote:

SNIP 8< ---

>> seems to be a general problem, see: http://bugs.python.org/issue1813
>
> Ah, ok.  I didn't remember that report specifically.
>
>> for docutils i would go for using 'ascii' in the code to support
>> all turkish installations.
>
> Yes, I'd recommend that.

committed

cheers
--