From: A.M. K. <aku...@us...> - 2001-07-19 01:48:12
Update of /cvsroot/py-howto/pyhowto
In directory usw-pr-cvs1:/tmp/cvs-serv12586

Modified Files:
	python-22.tex
Log Message:
Fill out the Unicode section, somewhat uncertainly

Index: python-22.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/python-22.tex,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -r1.14 -r1.15
*** python-22.tex 2001/07/19 01:19:59 1.14
--- python-22.tex 2001/07/19 01:48:08 1.15
***************
*** 341,349 ****
  Python's Unicode support has been enhanced a bit in 2.2. Unicode
  strings are usually stored as UCS-2, as 16-bit unsigned integers.
! Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers
! by supplying \longprogramopt{enable-unicode=ucs4} to the configure script.

! XXX explain surrogates? I have to figure out what the changes mean to users.
!
  Since their introduction, Unicode strings have supported an
  \method{encode()} method to convert the string to a selected encoding
--- 341,359 ----
  Python's Unicode support has been enhanced a bit in 2.2. Unicode
  strings are usually stored as UCS-2, as 16-bit unsigned integers.
! Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
! integers, as its internal encoding by supplying
! \longprogramopt{enable-unicode=ucs4} to the configure script. When
! built to use UCS-4, in theory Python could handle Unicode characters
! from U-00000000 to U-7FFFFFFF. Being able to use UCS-4 internally is
! a necessary step to do that, but it's not the only step, and in Python
! 2.2alpha1 the work isn't complete yet. For example, the
! \function{unichr()} function still only accepts values from 0 to
! 65535, and there's no \code{\e U} notation for embedding characters
! greater than 65535 in a Unicode string literal. All this is the
! province of the still-unimplemented PEP 261, ``Support for `wide'
! Unicode characters''; consult it for further details, and please offer
! comments and suggestions on the proposal it describes.

! Another change is much simpler to explain.
  Since their introduction, Unicode strings have supported an
  \method{encode()} method to convert the string to a selected encoding
***************
*** 375,382 ****
  'furrfu'
  \end{verbatim}

! References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
! and following thread.


  %======================================================================
--- 385,399 ----
  'furrfu'
  \end{verbatim}
+
+ \method{encode()} and \method{decode()} were implemented by
+ Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
+ were implemented by Fredrik Lundh and Martin von L\"owis.
+
+ \begin{seealso}

! \seepep{261}{Support for `wide' Unicode characters}{PEP written by
! Paul Prescod. Not yet accepted or fully implemented.}

+ \end{seealso}

  %======================================================================
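
Note on the 16-bit versus 32-bit distinction in the new text: the removed "XXX explain surrogates?" placeholder points at the underlying issue, which is that a character above 65535 does not fit in one 16-bit unit and has to be represented as a pair of surrogate values. The sketch below illustrates this. It assumes a current Python 3 interpreter, not the 2.2alpha1 described in the diff, so it uses chr() and the \U escape, which the diff says were not yet available for such characters. The helper names utf16_code_units and surrogate_pair are made up for illustration and are not part of Python or of python-22.tex.

```python
# Sketch for a current Python 3 interpreter (an assumption; the commit above
# describes Python 2.2alpha1, where unichr() stopped at 65535).

def utf16_code_units(ch):
    """Return the 16-bit code units that encode one character in UTF-16."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point):
    """Compute by hand the surrogate pair for a code point above 0xFFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points 0x10000..0x10FFFF need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return [high, low]


bmp_char = chr(0x20AC)    # fits in a single 16-bit unit
wide_char = "\U00010400"  # above 65535, needs two 16-bit units

bmp_units = utf16_code_units(bmp_char)
wide_units = utf16_code_units(wide_char)

# encode()/decode() round trip, in the spirit of the methods the diff credits
round_trip = wide_char.encode("utf-8").decode("utf-8")
```

In this sketch wide_units holds two values while len(wide_char) is 1, which is roughly the gap between a 16-bit internal representation and a 32-bit one that the diff's new paragraph describes.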