From: <gr...@us...> - 2009-10-19 09:42:41
hello,

any comments on this behaviour?

* why does 'ASCII' sometimes work and sometimes not?
* why is 'ASCII' in uppercase, when python/encodings has all lowercase?

cheers
--

---------- Forwarded message ----------
Date: Mon, 12 Oct 2009 19:05:30 +0300
From: Firat Ozgul <ozg...@gm...>
To: gr...@us...
Subject: Re: [Docutils-develop] Uppercase "ASCII" in nodes.py

Hello,

No, that code does not produce any errors on the Python 2.6.3 that I have...

I have Docutils 0.6 compiled from source on Ubuntu Karmic Koala. And when I
try to "make html" in Sphinx, I get the error I mentioned in my previous
e-mail, unless I manually edit nodes.py to change uppercase "ASCII" to
lowercase "ascii".

Thanks for the interest.

Firat Ozgul

2009/10/12 <gr...@us...>

> On Mon, 12 Oct 2009, Firat Ozgul wrote:
>
>> Hello,
>>
>> In Docutils 0.6, specifically in nodes.py, more specifically in the
>> make_id() function, the following line will cause an error:
>>
>>     id = unicodedata.normalize('NFKD', id).encode('ASCII', 'ignore').decode('ASCII')
>>
>> The encoding "ASCII" should be written in lowercase. So the line should
>> read:
>>
>>     id = unicodedata.normalize('NFKD', id).encode('ascii', 'ignore').decode('ascii')
>>
>> Otherwise the programs that use this function will produce this error:
>>
>>     File "/usr/local/lib/python2.6/dist-packages/docutils/nodes.py", line 1845, in make_id
>>       encode('ASCII', 'ignore').decode('ASCII')
>>     LookupError: unknown encoding: ASCII
>
> does this ::
>
>     import unicodedata
>
>     id = u"abc def"
>
>     id = unicodedata.normalize('NFKD', id).encode('ASCII', 'ignore').decode('ASCII')
>     id = unicodedata.normalize('NFKD', id).encode('ascii', 'ignore').decode('ascii')
>
> produce an error at your place ?
>
> python2.6 on what platform ?
>
> cheers
> --
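For readers who want to try the transliteration step on its own, here is a minimal sketch of what the quoted make_id() line does. It is not the Docutils code itself: the helper name `to_ascii` is made up for illustration, and the sketch targets Python 3 rather than the Python 2.6 discussed in this thread.

```python
import unicodedata


def to_ascii(text):
    """Hypothetical helper mirroring the quoted make_id() line."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' silently drops every character that has no ASCII form,
    # which removes the combining marks left over from the step above.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii("abc def"))
print(to_ascii("caf\u00e9"))
```

The codec name is spelled in lowercase here, which matches the change proposed in the thread and avoids relying on any case normalization during codec lookup.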
From: Firat O. <ozg...@gm...> - 2009-10-19 10:10:44
2009/10/19 <gr...@us...>

> any comments on this behaviour?
>
> * why does 'ASCII' sometimes work and sometimes not?
> * why is 'ASCII' in uppercase, when python/encodings has all lowercase?

Hello,

I have a desktop and a laptop computer, both of which run Ubuntu Karmic
Koala. The Python version on both is now 2.6.4rc1.

On my laptop, "ASCII" causes a crash in Sphinx, and until I manually
edit nodes.py Sphinx keeps spitting out errors...

However, on the desktop computer Sphinx works fine whatever the case
of "ASCII"...

Just a note: uninstalling and re-installing Sphinx and Docutils did not
have any effect on the problem...

Regards,
Firat
From: Georg B. <g.b...@gm...> - 2009-10-19 13:25:06
gr...@us... wrote:
> hello,
>
> any comments on this behaviour?
>
> * why does 'ASCII' sometimes work and sometimes not?
> * why is 'ASCII' in uppercase, when python/encodings has all lowercase?

'ASCII' should work; codec names are normalized before lookup, which
includes a lowercasing step.

However, that lowercasing may not lead to the expected results when
a non-C locale is set and the locale's ASCII upper case letters do
not map to the ASCII lower case letters.

I remember that Turkish has a different upper case 'i' than other
Latin languages; this may well be the cause here.

Firat: Can you open a Python shell on both of your computers and
execute the following commands, showing us the output of each:

    import locale
    print 'I'.lower()
    print locale.setlocale(locale.LC_ALL, '')
    print 'I'.lower()

cheers,
Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
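The normalization of codec names mentioned above can be observed through the standard `codecs.lookup()` call. This is a small sketch for current Python 3, not for the 2.6 interpreter in this thread; the wrapper name `canonical_codec_name` is hypothetical and only there to make the result easy to inspect.

```python
import codecs


def canonical_codec_name(name):
    """Hypothetical wrapper: return the name the codec registry settles on."""
    # codecs.lookup() normalizes the requested name before searching the
    # registry and raises LookupError if no codec matches.
    return codecs.lookup(name).name


for spelling in ("ASCII", "ascii", "Ascii"):
    print(spelling, "->", canonical_codec_name(spelling))
```

If the lowercasing step inside the lookup were locale-dependent, as suspected here for Python 2.6 under a Turkish locale, the uppercase spelling could fail with `LookupError` while the lowercase one keeps working.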
From: Michael F. <fuz...@vo...> - 2009-10-19 14:03:32
Georg Brandl wrote:
> 'ASCII' should work; codec names are normalized before lookup, which
> includes a lowercasing step.
>
> However, that lowercasing may not lead to the expected results when
> a non-C locale is set and the locale's ASCII upper case letters do
> not map to the ASCII lower case letters.
>
> I remember that Turkish has a different upper case 'i' than other
> Latin languages; this may well be the cause here.

I thought that Python guaranteed that ASCII characters would remain
ASCII when converted from lowercase to uppercase or vice versa.

All the best,

Michael Foord

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
From: <gr...@us...> - 2009-10-19 16:23:10
On Mon, 19 Oct 2009, Georg Brandl wrote:

> However, that lowercasing may not lead to the expected results when
> a non-C locale is set and the locale's ASCII upper case letters do
> not map to the ASCII lower case letters.
>
> I remember that Turkish has a different upper case 'i' than other
> Latin languages; this may well be the cause here.

seems to be a general problem, see: http://bugs.python.org/issue1813

for docutils i would go for using 'ascii' in the code to support
all turkish installations.

cheers
--
From: Georg B. <g.b...@gm...> - 2009-10-19 13:52:04
Attachments:
signature.asc
Michael Foord wrote:
> I thought that Python guaranteed that ascii characters would remain
> ascii when converted from lowercase to uppercase or vice-versa.

Maybe it does (is it in the docs somewhere?) -- but that would be the
only explanation I can offer for the behavior observed.

(Even if Python does guarantee that, is it tested? Most locales *do*
map ASCII upper/lowercase onto themselves...)

Georg
From: Michael F. <fuz...@vo...> - 2009-10-19 14:02:37
Georg Brandl wrote:
> Maybe it does (is it in the docs somewhere?) -- but that would be the
> only explanation I can offer for the behavior observed.
>
> (Even if Python does guarantee that, is it tested? Most locales *do*
> map ASCII upper/lowercase on themselves...)

Well, I know that on IronPython we had problems because it *didn't*
guarantee this (and the decimal module uses strings to look up global
variable names - yeeurgh).

If decimal can be imported in the Turkish locale (which I'm pretty sure
it can) then Python is maintaining this ASCII invariant.

Michael Foord

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
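One way to keep such an ASCII invariant regardless of the active locale is to lowercase through a fixed translation table instead of a locale-sensitive routine. The following is only a sketch in Python 3: the name `ascii_lower` is hypothetical, and nothing here is claimed to be how CPython or IronPython actually implement their guarantee.

```python
import string

# Fixed A-Z -> a-z table; it does not consult the locale at all.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text):
    """Hypothetical helper: lowercase ASCII letters only, leave the rest."""
    return text.translate(_ASCII_LOWER)


print(ascii_lower("ASCII"))
print(ascii_lower("ROUND_CEILING"))
```

Identifiers and codec names are ASCII by convention, so restricting the mapping to A-Z loses nothing for lookups like the ones discussed in this thread.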
From: Firat O. <ozg...@gm...> - 2009-10-19 16:35:59
Georg,

I think you are right in stating that this is a locale issue...

On my desktop computer, where both lower- and uppercase "ASCII" work fine,
this is the output:

    >>> print "ASCII".lower()
    ascii
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, "")
    'en_US.UTF-8'
    >>> print "ASCII".lower()
    ascii

And this is the output from the laptop computer, where only lowercase
"ascii" works:

    >>> print "ASCII".lower()
    ascii
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, "")
    'tr_TR.UTF-8'
    >>> print "ASCII".lower()
    ascII

In Turkish, the general rule is that letters do not lose their
cedillas or dots when changing case. So:

* lowercase "i" turns into "I-with-a-dot-above" in uppercase
* lowercase "i-without-a-dot" turns into "I" in uppercase
* uppercase "I-with-a-dot-above" turns into "i" in lowercase
* uppercase "I" turns into "i-without-a-dot" in lowercase

However, the letter "i" maps to "I" in Unicode whatever locale you choose:

    >>> print u"kitap".upper()
    KITAP

The "I" letter should be "I-with-a-dot-above" in the output...

There is an old bug report for this at http://bugs.python.org/msg55478

For example, Elisa, Gazpacho and PyPDF give "KeyError: 'ROUND_CEiLiNG'"
when run under a Turkish locale. Here the lowercase "i" letters in the
output (CEiLiNG) do indicate a locale failure... However, in Docutils,
the letters of the word "ASCII" in the error message do not indicate any
problems...:

    "LookupError: unknown encoding: ASCII"

"Dotted I" in Turkish is a very big problem for Turkish and non-Turkish
programmers, because there is no way to uppercase the letter "i" correctly
in any locale...

Regards,
Firat

Georg Brandl wrote:
> 'ASCII' should work; codec names are normalized before lookup, which
> includes a lowercasing step.
>
> However, that lowercasing may not lead to the expected results when
> a non-C locale is set and the locale's ASCII upper case letters do
> not map to the ASCII lower case letters.
>
> I remember that Turkish has a different upper case 'i' than other
> Latin languages; this may well be the cause here.
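The four Turkish case rules listed in this message can be written out explicitly. The sketch below is for Python 3, whose built-in `str.upper()` and `str.lower()` apply the default Unicode mappings and ignore the locale; the helper names `turkish_upper` and `turkish_lower` are hypothetical and not part of any library.

```python
DOTTED_CAPITAL_I = "\u0130"   # I-with-a-dot-above
DOTLESS_SMALL_I = "\u0131"    # i-without-a-dot


def turkish_upper(text):
    """Hypothetical helper: uppercase using the Turkish i/I rules."""
    # Handle the two Turkish-specific pairs first, then fall back to the
    # default Unicode mapping for every other letter.
    return (text.replace("i", DOTTED_CAPITAL_I)
                .replace(DOTLESS_SMALL_I, "I")
                .upper())


def turkish_lower(text):
    """Hypothetical helper: lowercase using the Turkish i/I rules."""
    return (text.replace("I", DOTLESS_SMALL_I)
                .replace(DOTTED_CAPITAL_I, "i")
                .lower())


print("kitap".upper())          # default Unicode mapping
print(turkish_upper("kitap"))   # Turkish mapping
print(turkish_lower("ASCII"))   # why codec names must not be cased this way
```

Applying the Turkish rules to an identifier such as "ASCII" yields dotless letters, which is the kind of mismatch suspected behind the failed codec lookup; text meant for Turkish readers needs these rules, while identifiers need the plain ASCII mapping.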
From: Georg B. <g.b...@gm...> - 2009-10-19 18:18:40
gr...@us... wrote:
> seems to be a general problem, see: http://bugs.python.org/issue1813

Ah, ok. I didn't remember that report specifically.

> for docutils i would go for using 'ascii' in the code to support
> all turkish installations.

Yes, I'd recommend that.

Georg
From: <gr...@us...> - 2009-10-20 14:31:15
On Mon, 19 Oct 2009, Georg Brandl wrote:
> gr...@us... wrote:

SNIP 8< ---

>> seems to be a general problem, see: http://bugs.python.org/issue1813
>
> Ah, ok. I didn't remember that report specifically.
>
>> for docutils i would go for using 'ascii' in the code to support
>> all turkish installations.
>
> Yes, I'd recommend that.

committed

cheers
--