[Py-howto-checkins] CVS: pyhowto python-22.tex,1.15,1.16

Update of /cvsroot/py-howto/pyhowto
In directory usw-pr-cvs1:/tmp/cvs-serv17509

Modified Files:
	python-22.tex 
Log Message:
Revise the Unicode section after getting comments from MAL, GvR, and others.
Add new low-level API for interpreter introspection.
Bump version number.

Index: python-22.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/python-22.tex,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -r1.15 -r1.16
*** python-22.tex	2001/07/19 01:48:08	1.15
--- python-22.tex	2001/07/19 14:59:53	1.16
***************
*** 4,8 ****

  \title{What's New in Python 2.2}
! \release{0.03}
  \author{A.M. Kuchling}
  \authoraddress{\email{aku...@me...}}
--- 4,8 ----

  \title{What's New in Python 2.2}
! \release{0.04}
  \author{A.M. Kuchling}
  \authoraddress{\email{aku...@me...}}
***************
*** 340,371 ****

  Python's Unicode support has been enhanced a bit in 2.2.  Unicode
! strings are usually stored as UCS-2, as 16-bit unsigned integers.
  Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
  integers, as its internal encoding by supplying
  \longprogramopt{enable-unicode=ucs4} to the configure script.  When
! built to use UCS-4, in theory Python could handle Unicode characters
! from U-00000000 to U-7FFFFFFF.  Being able to use UCS-4 internally is
! a necessary step to do that, but it's not the only step, and in Python
! 2.2alpha1 the work isn't complete yet.  For example, the
! \function{unichr()} function still only accepts values from 0 to
! 65535, and there's no \code{\e U} notation for embedding characters
! greater than 65535 in a Unicode string literal.  All this is the
! province of the still-unimplemented PEP 261, ``Support for `wide'
! Unicode characters''; consult it for further details, and please offer
! comments and suggestions on the proposal it describes.
! 
! Another change is much simpler to explain.
! Since their introduction, Unicode strings have supported an
! \method{encode()} method to convert the string to a selected encoding
! such as UTF-8 or Latin-1.  A symmetric
! \method{decode(\optional{\var{encoding}})} method has been added to
! both 8-bit and Unicode strings in 2.2, which assumes that the string
! is in the specified encoding and decodes it. This means that
! \method{encode()} and \method{decode()} can be called on both types of
! strings, and can be used for tasks not directly related to Unicode.
! For example, codecs have been added for UUencoding, MIME's base-64
! encoding, and compression with the \module{zlib} module.

  \begin{verbatim}
  >>> s = """Here is a lengthy piece of redundant, overly verbose,
  ... and repetitive text.
--- 340,385 ----

  Python's Unicode support has been enhanced a bit in 2.2.  Unicode
! strings are usually stored as UTF-16, as 16-bit unsigned integers.
  Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
  integers, as its internal encoding by supplying
  \longprogramopt{enable-unicode=ucs4} to the configure script.  When
! built to use UCS-4 (a ``wide Python''), the interpreter can natively
! handle Unicode characters from U+000000 to U+110000.  The range of
! legal values for the \function{unichr()} function has been expanded;
! it used to only accept values up to 65535, but in 2.2 will accept
! values from 0 to 0x110000.  Using a ``narrow Python'', an interpreter
! compiled to use UTF-16, values greater than 65535 will result in
! \function{unichr()} returning a string of length 2:

  \begin{verbatim}
+ >>> s = unichr(65536)
+ >>> s
+ u'\ud800\udc00'
+ >>> len(s)
+ 2
+ \end{verbatim}
+ 
+ This possibly-confusing behaviour, breaking the intuitive invariant
+ that \function{chr()} and \function{unichr()} always return strings of
+ length 1, may be changed later in 2.2 depending on public reaction.
+ 
+ All this is the province of the still-unimplemented PEP 261, ``Support
+ for `wide' Unicode characters''; consult it for further details, and
+ please offer comments and suggestions on the proposal it describes.
+ 
+ Another change is much simpler to explain. Since their introduction,
+ Unicode strings have supported an \method{encode()} method to convert
+ the string to a selected encoding such as UTF-8 or Latin-1.  A
+ symmetric \method{decode(\optional{\var{encoding}})} method has been
+ added to 8-bit strings (though not to Unicode strings) in 2.2.
+ \method{decode()} assumes that the string is in the specified encoding
+ and decodes it, returning whatever is returned by the codec. 
+ 
+ Using this new feature, codecs have been added for tasks not directly
+ related to Unicode.  For example, codecs have been added for
+ uu-encoding, MIME's base64 encoding, and compression with the
+ \module{zlib} module:
+ 
+ \begin{verbatim}
  >>> s = """Here is a lengthy piece of redundant, overly verbose,
  ... and repetitive text.
***************
*** 611,614 ****
--- 625,637 ----
    L. Drake, Jr.)

+   \item Another low-level API, primarily of interest to implementors
+   of Python debuggers and development tools, was added.
+   \cfunction{PyInterpreterState_Head()} and
+   \cfunction{PyInterpreterState_Next()} let a caller walk through all
+   the existing interpreter objects;
+   \cfunction{PyInterpreterState_ThreadHead()} and
+   \cfunction{PyThreadState_Next()} allow looping over all the thread
+   states for a given interpreter.  (Contributed by David Beazley.)
+ 
    % XXX is this explanation correct?  
    \item When presented with a Unicode filename on Windows, Python will
***************
*** 669,673 ****
  The author would like to thank the following people for offering
  suggestions and corrections to various drafts of this article: Fred
! Bremmer, Fred L. Drake, Jr., Tim Peters, Neil Schemenauer.  

  \end{document}
--- 692,697 ----
  The author would like to thank the following people for offering
  suggestions and corrections to various drafts of this article: Fred
! Bremmer, Fred L. Drake, Jr., Marc-Andr\'e Lemburg,
! Tim Peters, Neil Schemenauer, Guido van Rossum.  

  \end{document}
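
Note on the revised Unicode section: the two behaviours described in the hunk above can be sketched in present-day Python 3. This is only an illustration under stated assumptions, not the 2.2 implementation. The helper names to_surrogate_pair() and roundtrip() are hypothetical and do not appear in the commit. The sketch uses U+10FFFF, the highest code point defined by the Unicode standard, as its upper bound, and it calls codecs.encode() and codecs.decode() because Python 3 byte strings no longer expose the zlib and base64 codecs through their own methods.

```python
import codecs


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair.

    Hypothetical helper: it mirrors the arithmetic that makes a narrow
    build return a two-unit string for unichr(65536).
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def roundtrip(data, codec_name):
    """Encode bytes with a bytes-to-bytes codec, then decode them again.

    Hypothetical helper: returns both the encoded form and the decoded
    result so the caller can compare them with the input.
    """
    encoded = codecs.encode(data, codec_name)
    decoded = codecs.decode(encoded, codec_name)
    return encoded, decoded


if __name__ == "__main__":
    high, low = to_surrogate_pair(65536)
    print("%04x %04x" % (high, low))

    text = b"redundant, overly verbose, and repetitive text. " * 10
    for name in ("zlib", "base64"):
        encoded, decoded = roundtrip(text, name)
        print(name, len(text), len(encoded), decoded == text)
```

For code point 65536 the helper yields the same two units, d800 and dc00, that the verbatim example in the diff shows for a narrow build.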