[Py-howto-checkins] CVS: pyhowto regex.tex,1.8,1.9

Update of /cvsroot/py-howto/pyhowto
In directory usw-pr-cvs1:/tmp/cvs-serv29333

Modified Files:
	regex.tex 
Log Message:

Fix re.VERBOSE-modified RE; "#" as part of the pattern was not escaped.
Closes SF bug #416374.
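
To illustrate the bug: under re.VERBOSE, an unescaped "#" outside a
character class starts a comment that runs to the end of the line, so
"&#" contributes only "&" to the pattern.  The sketch below is a
cut-down version of the charref example (decimal form only; the names
"broken" and "fixed" are made up for this illustration, and the code is
written for a current Python rather than the release the HOWTO targets):

```python
import re

# Unescaped '#': in VERBOSE mode the '#' begins a comment, so this
# line contributes only '&' to the compiled pattern.
broken = re.compile(r"""
 &#                 # Start of a numeric entity reference
 (?P<char>
   [0-9]+[^0-9]     # Decimal form
 )
""", re.VERBOSE)

# Escaped '\#': the '#' is now a literal character in the pattern.
fixed = re.compile(r"""
 &\#                # Start of a numeric entity reference
 (?P<char>
   [0-9]+[^0-9]     # Decimal form
 )
""", re.VERBOSE)

text = "&#65;"
broken_match = broken.search(text)  # the pattern wants a digit right after '&'
fixed_match = fixed.search(text)    # '&', literal '#', digits, one non-digit

print(broken_match)
print(fixed_match.group("char"))
```

With the unescaped form the pattern never matches a real character
reference, and it would instead accept a string such as "&65;", which
is why the escape is needed.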

Wrap some wide paragraphs.

Remove extraneous "%" characters from otherwise blank lines after verbatim
environments, except in a couple of places where we needed to bow to
font-lock.  ;-(

Index: regex.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/regex.tex,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -r1.8 -r1.9
*** regex.tex	2000/07/28 02:06:27	1.8
--- regex.tex	2001/04/23 16:54:49	1.9
***************
*** 70,74 ****

  We'll start by learning about the simplest possible regular
! expressions.  Since regular expressions are used to operate on strings, we'll start with the most common task: matching characters.

  For a detailed explanation of the computer science underlying regular
--- 70,75 ----

  We'll start by learning about the simplest possible regular
! expressions.  Since regular expressions are used to operate on
! strings, we'll start with the most common task: matching characters.

  For a detailed explanation of the computer science underlying regular
***************
*** 90,100 ****
  devoted to discussing various metacharacters and what they do.

! Here's a complete list of the metacharacters; their meanings will be discussed 
! in the rest of this HOWTO.

  \begin{verbatim}
  . ^ $ * + ? { [ \ | ( )
  \end{verbatim}
! %
  The first metacharacter we'll look at is \samp{[}; it's used for
  specifying a character class, which is a set of characters that you
--- 91,102 ----
  devoted to discussing various metacharacters and what they do.

! Here's a complete list of the metacharacters; their meanings will be
! discussed in the rest of this HOWTO.

  \begin{verbatim}
  . ^ $ * + ? { [ \ | ( )
  \end{verbatim}
! % $
! 
  The first metacharacter we'll look at is \samp{[}; it's used for
  specifying a character class, which is a set of characters that you
***************
*** 107,114 ****
  RE would be \regexp{[a-z]}.

! Metacharacters are not active inside classes.  For example, \regexp{[akm\$]}
! will match any of the characters \character{a}, \character{k},
! \character{m}, or \character{\$}; \character{\$} is usually a metacharacter, but inside a character class it's stripped
! of its special nature.

  You can match the characters not within a range by \dfn{complementing}
--- 109,117 ----
  RE would be \regexp{[a-z]}.

! Metacharacters are not active inside classes.  For example,
! \regexp{[akm\$]} will match any of the characters \character{a},
! \character{k}, \character{m}, or \character{\$}; \character{\$} is
! usually a metacharacter, but inside a character class it's stripped of
! its special nature.

  You can match the characters not within a range by \dfn{complementing}
***************
*** 134,150 ****
  \item[\code{\e d}]Matches any decimal digit; this is
  equivalent to the class \regexp{[0-9]}.
! %
  \item[\code{\e D}]Matches any non-digit character; this is
  equivalent to the class \verb|[^0-9]|.
! %
  \item[\code{\e s}]Matches any whitespace character; this is
  equivalent to the class \regexp{[ \e t\e n\e r\e f\e v]}.
! %
  \item[\code{\e S}]Matches any non-whitespace character; this is
  equivalent to the class \verb|[^ \t\n\r\f\v]|.
! %
  \item[\code{\e w}]Matches any alphanumeric character; this is equivalent to the class
  \regexp{[a-zA-Z0-9_]}.  
! %
  \item[\code{\e W}]Matches any non-alphanumeric character; this is equivalent to the class
  \verb|[^a-zA-Z0-9_]|.   
--- 137,153 ----
  \item[\code{\e d}]Matches any decimal digit; this is
  equivalent to the class \regexp{[0-9]}.
! 
  \item[\code{\e D}]Matches any non-digit character; this is
  equivalent to the class \verb|[^0-9]|.
! 
  \item[\code{\e s}]Matches any whitespace character; this is
  equivalent to the class \regexp{[ \e t\e n\e r\e f\e v]}.
! 
  \item[\code{\e S}]Matches any non-whitespace character; this is
  equivalent to the class \verb|[^ \t\n\r\f\v]|.
! 
  \item[\code{\e w}]Matches any alphanumeric character; this is equivalent to the class
  \regexp{[a-zA-Z0-9_]}.  
! 
  \item[\code{\e W}]Matches any non-alphanumeric character; this is equivalent to the class
  \verb|[^a-zA-Z0-9_]|.   
***************
*** 272,276 ****
  <re.RegexObject instance at 80b4150>
  \end{verbatim}
! %
  \function{re.compile()} also accepts an optional \var{flags}
  argument, used to enable various special features and syntax
--- 275,279 ----
  <re.RegexObject instance at 80b4150>
  \end{verbatim}
! 
  \function{re.compile()} also accepts an optional \var{flags}
  argument, used to enable various special features and syntax
***************
*** 281,285 ****
  >>> p = re.compile('ab*', re.IGNORECASE)
  \end{verbatim}
! %
  The RE is passed to \function{re.compile()} as a string.
  REs are handled as strings because regular expressions aren't
--- 284,288 ----
  >>> p = re.compile('ab*', re.IGNORECASE)
  \end{verbatim}
! 
  The RE is passed to \function{re.compile()} as a string.
  REs are handled as strings because regular expressions aren't
***************
*** 379,383 ****
  <re.RegexObject instance at 80c3c28>
  \end{verbatim}
! %
  Now, you can try matching various strings against the RE
  \regexp{[a-z]+}.  An empty string shouldn't match at all, since
--- 382,386 ----
  <re.RegexObject instance at 80c3c28>
  \end{verbatim}
! 
  Now, you can try matching various strings against the RE
  \regexp{[a-z]+}.  An empty string shouldn't match at all, since
***************
*** 392,396 ****
  None
  \end{verbatim}
! %
  Now, let's try it on a string that it should match, such as
  \samp{tempo}.  In this case, \method{match()} will return a
--- 395,399 ----
  None
  \end{verbatim}
! 
  Now, let's try it on a string that it should match, such as
  \samp{tempo}.  In this case, \method{match()} will return a
***************
*** 403,407 ****
  <re.MatchObject instance at 80c4f68>
  \end{verbatim}
! %
  Now you can query the \class{MatchObject} for information about the
  matching string.   \class{MatchObject} instances also have several
--- 406,410 ----
  <re.MatchObject instance at 80c4f68>
  \end{verbatim}
! 
  Now you can query the \class{MatchObject} for information about the
  matching string.   \class{MatchObject} instances also have several
***************
*** 425,429 ****
  (0, 5)
  \end{verbatim}
! %
  \method{group()} returns the substring that was matched by the
  RE.  \method{start()} and \method{end()} return the starting and
--- 428,432 ----
  (0, 5)
  \end{verbatim}
! 
  \method{group()} returns the substring that was matched by the
  RE.  \method{start()} and \method{end()} return the starting and
***************
*** 445,449 ****
  (4, 11)
  \end{verbatim}
! %
  In actual programs, the most common style is to store the
  \class{MatchObject} in a variable, and then check if it was
--- 448,452 ----
  (4, 11)
  \end{verbatim}
! 
  In actual programs, the most common style is to store the
  \class{MatchObject} in a variable, and then check if it was
***************
*** 458,462 ****
      print 'No match'
  \end{verbatim}
! %
  \subsection{Module-Level Functions}

--- 461,465 ----
      print 'No match'
  \end{verbatim}
! 
  \subsection{Module-Level Functions}

***************
*** 475,479 ****
  <re.MatchObject instance at 80c5978>
  \end{verbatim}
! %
  Under the hood, these functions simply produce a \class{RegexObject}
  for you and call the appropriate method on it.  They also store the
--- 478,482 ----
  <re.MatchObject instance at 80c5978>
  \end{verbatim}
! 
  Under the hood, these functions simply produce a \class{RegexObject}
  for you and call the appropriate method on it.  They also store the
***************
*** 498,502 ****
  starttagopen = re.compile( ... )
  \end{verbatim}
! %
  (I generally prefer to work with the compiled object, even for
  one-time uses, but few people will be as much of a purist about this
--- 501,505 ----
  starttagopen = re.compile( ... )
  \end{verbatim}
! 
  (I generally prefer to work with the compiled object, even for
  one-time uses, but few people will be as much of a purist about this
***************
*** 594,598 ****
  \begin{verbatim}
  charref = re.compile(r"""
!  &#		     # Start of a numeric entity reference
   (?P<char>      
     [0-9]+[^0-9]      # Decimal form
--- 597,601 ----
  \begin{verbatim}
  charref = re.compile(r"""
!  &\#		     # Start of a numeric entity reference
   (?P<char>      
     [0-9]+[^0-9]      # Decimal form
***************
*** 602,606 ****
  """, re.VERBOSE)
  \end{verbatim}
! %
  Without the verbose setting, the RE would look like this:
  \begin{verbatim}
--- 605,609 ----
  """, re.VERBOSE)
  \end{verbatim}
! 
  Without the verbose setting, the RE would look like this:
  \begin{verbatim}
***************
*** 609,613 ****
                       "|x[0-9a-fA-F]+[^0-9a-fA-F])")
  \end{verbatim}
! %
  In the above example, Python's automatic concatenation of string literals has been used to
  break up the RE into smaller pieces, but it's still more difficult to
--- 612,616 ----
                       "|x[0-9a-fA-F]+[^0-9a-fA-F])")
  \end{verbatim}
! 
  In the above example, Python's automatic concatenation of string literals has been used to
  break up the RE into smaller pieces, but it's still more difficult to
***************
*** 639,643 ****

  \begin{list}{}{}
! %
  \item[\regexp{|}] 
  Alternation, or the ``or'' operator.  
--- 642,646 ----

  \begin{list}{}{}
! 
  \item[\regexp{|}] 
  Alternation, or the ``or'' operator.  
***************
*** 651,655 ****
  To match a literal \character{|},
  use \regexp{\e|}, or enclose it inside a character class, as in \regexp{[|]}.
! %
  \item[\regexp{\^}] Matches at the beginning of lines.  Unless the
  \constant{MULTILINE} flag has been set, this will only match at the
--- 654,658 ----
  To match a literal \character{|},
  use \regexp{\e|}, or enclose it inside a character class, as in \regexp{[|]}.
! 
  \item[\regexp{\^}] Matches at the beginning of lines.  Unless the
  \constant{MULTILINE} flag has been set, this will only match at the
***************
*** 670,674 ****
  use \regexp{\e\^}, or enclose it inside a character class, as in  
  \regexp{[{\e}\^]}.
! %
  \item[\regexp{\$}] Matches at the end of lines, which is defined as
  either the end of the string, or any location followed by a newline
--- 673,677 ----
  use \regexp{\e\^}, or enclose it inside a character class, as in  
  \regexp{[{\e}\^]}.
! 
  \item[\regexp{\$}] Matches at the end of lines, which is defined as
  either the end of the string, or any location followed by a newline
***************
*** 683,690 ****
  <re.MatchObject instance at 80adfa8>
  \end{verbatim}
! %
! To match a literal \character{\$},
! use \regexp{\e\$}, or enclose it inside a character class, as in  \regexp{[\$]}.
! %
  \item[\regexp{\e A}] Matches only at the start of the string.  When not
  in \constant{MULTILINE} mode, \regexp{\e A} and \regexp{\^} are effectively
--- 686,694 ----
  <re.MatchObject instance at 80adfa8>
  \end{verbatim}
! % $
! 
! To match a literal \character{\$}, use \regexp{\e\$}, or enclose it
! inside a character class, as in  \regexp{[\$]}.
! 
  \item[\regexp{\e A}] Matches only at the start of the string.  When not
  in \constant{MULTILINE} mode, \regexp{\e A} and \regexp{\^} are effectively
***************
*** 693,699 ****
  \regexp{\^} may match at several locations inside the string (anywhere
  following a newline character).
! %
  \item[\regexp{\e Z}]Matches only at the end of the string.  
! %
  \item[\regexp{\e b}] Word boundary.  
  This is a zero-width assertion that matches only at the
--- 697,703 ----
  \regexp{\^} may match at several locations inside the string (anywhere
  following a newline character).
! 
  \item[\regexp{\e Z}]Matches only at the end of the string.  
! 
  \item[\regexp{\e b}] Word boundary.  
  This is a zero-width assertion that matches only at the
***************
*** 714,718 ****
  None
  \end{verbatim}
! %
  There are two subtleties you should remember when using this special
  sequence.  First, this is the worst collision between Python's string
--- 718,722 ----
  None
  \end{verbatim}
! 
  There are two subtleties you should remember when using this special
  sequence.  First, this is the worst collision between Python's string
***************
*** 731,743 ****
  <re.MatchObject instance at 80c3ee0>
  \end{verbatim}
! %
  Second, inside a character class, where there's no use for this
  assertion, \regexp{\e b} represents the backspace character, for
  compatibility with Python's string literals.
! %
  \item[\regexp{\e B}] Another zero-width assertion, this is the
  opposite of \regexp{\e b}, only matching when the current
  position is not at a word boundary.
! %
  \end{list}

--- 735,747 ----
  <re.MatchObject instance at 80c3ee0>
  \end{verbatim}
! 
  Second, inside a character class, where there's no use for this
  assertion, \regexp{\e b} represents the backspace character, for
  compatibility with Python's string literals.
! 
  \item[\regexp{\e B}] Another zero-width assertion, this is the
  opposite of \regexp{\e b}, only matching when the current
  position is not at a word boundary.
! 
  \end{list}

***************
*** 927,931 ****
  'Lots'
  \end{verbatim}
! %
  Named groups are handy because they let you use easily-remembered
  names, instead of having to remember numbers.  Here's an example RE
--- 931,935 ----
  'Lots'
  \end{verbatim}
! 
  Named groups are handy because they let you use easily-remembered
  names, instead of having to remember numbers.  Here's an example RE
***************
*** 940,944 ****
          r'"')
  \end{verbatim}
! %
  It's obviously much easier to retrieve \code{m.group('zonem')},
  instead of having to remember to retrieve group 9.
--- 944,948 ----
          r'"')
  \end{verbatim}
! 
  It's obviously much easier to retrieve \code{m.group('zonem')},
  instead of having to remember to retrieve group 9.
***************
*** 997,1000 ****
--- 1001,1005 ----

  \verb|.*[.][^b].*$|
+ % $

  First attempt: Exclude \samp{bat} by requiring that the first
***************
*** 1007,1014 ****
  The expression gets messier when you try to patch up the first
  solution by requiring one of the following cases to match: the first
! character of the extension isn't
! \samp{b}; the second character isn't \samp{a}; or the third
! character isn't \samp{t}.  This accepts \samp{foo.bar} and rejects
! \samp{autoexec.bat}, but it requires a three-letter extension, and doesn't accept \samp{sendmail.cf}.  Another bug, so we'll complicate the pattern again in an effort to fix it.

  \regexp{.*[.]([\^b].?.?|.[\^a]?.?|..?[\^t]?)\$}
--- 1012,1021 ----
  The expression gets messier when you try to patch up the first
  solution by requiring one of the following cases to match: the first
! character of the extension isn't \samp{b}; the second character isn't
! \samp{a}; or the third character isn't \samp{t}.  This accepts
! \samp{foo.bar} and rejects \samp{autoexec.bat}, but it requires a
! three-letter extension, and doesn't accept \samp{sendmail.cf}.
! Another bug, so we'll complicate the pattern again in an effort to fix
! it.

  \regexp{.*[.]([\^b].?.?|.[\^a]?.?|..?[\^t]?)\$}
***************
*** 1068,1072 ****
  returned as the final element of the list.  In the following example,
  the delimiter will be any sequence of non-alphanumeric characters.
! %
  \begin{verbatim}
  >>> p = re.compile(r'\W+')
--- 1075,1079 ----
  returned as the final element of the list.  In the following example,
  the delimiter will be any sequence of non-alphanumeric characters.
! 
  \begin{verbatim}
  >>> p = re.compile(r'\W+')
***************
*** 1076,1080 ****
  ['This', 'is', 'a', 'test, short and sweet, of split().']
  \end{verbatim}
! %
  Sometimes you're not only interested in what the text between
  delimiters is, but also need to know what the delimiter was.  If
--- 1083,1087 ----
  ['This', 'is', 'a', 'test, short and sweet, of split().']
  \end{verbatim}
! 
  Sometimes you're not only interested in what the text between
  delimiters is, but also need to know what the delimiter was.  If
***************
*** 1090,1094 ****
  ['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
  \end{verbatim}
! %
  The module-level function \function{re.split()} adds the RE to be
  used as the first argument, but is otherwise the same.  
--- 1097,1101 ----
  ['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
  \end{verbatim}
! 
  The module-level function \function{re.split()} adds the RE to be
  used as the first argument, but is otherwise the same.  
***************
*** 1131,1135 ****
  'colour socks and red shoes'
  \end{verbatim}
! %
  Empty matches are replaced only when not they're not
  adjacent to a previous match.  
--- 1138,1142 ----
  'colour socks and red shoes'
  \end{verbatim}
! 
  Empty matches are replaced only when not they're not
  adjacent to a previous match.  
***************
*** 1140,1144 ****
  '-a-b-d-'
  \end{verbatim}
! %
  If \var{replacement} is a string, any backslash escapes in it are
  processed.  That is, \samp{\e n} is converted to a single newline
--- 1147,1151 ----
  '-a-b-d-'
  \end{verbatim}
! 
  If \var{replacement} is a string, any backslash escapes in it are
  processed.  That is, \samp{\e n} is converted to a single newline
***************
*** 1155,1159 ****
  'subsection{First} subsection{second}'
  \end{verbatim}
! %
  In addition to character escapes and backreferences as described
  above, \samp{\e g<name>} will use the substring matched by the group
--- 1162,1166 ----
  'subsection{First} subsection{second}'
  \end{verbatim}
! 
  In addition to character escapes and backreferences as described
  above, \samp{\e g<name>} will use the substring matched by the group
***************
*** 1176,1180 ****
  'subsection{First}'
  \end{verbatim}
! %
  \var{replacement} can also be a function, which gives you even more
  powerful control.  If \var{replacement} is a function, the function is
--- 1183,1187 ----
  'subsection{First}'
  \end{verbatim}
! 
  \var{replacement} can also be a function, which gives you even more
  powerful control.  If \var{replacement} is a function, the function is
***************
*** 1183,1187 ****
  information to compute the desired replacement string and return it.
  For example:
! %
  \begin{verbatim}
  >>> def hexrepl( match ):
--- 1190,1194 ----
  information to compute the desired replacement string and return it.
  For example:
! 
  \begin{verbatim}
  >>> def hexrepl( match ):
***************
*** 1194,1198 ****
  'Call 0xffd2 for printing, 0xc000 for user code.'
  \end{verbatim}
! %
  When using the module-level \function{re.sub()} function, the pattern
  is passed as the first argument.  The pattern may be a string or a
--- 1201,1205 ----
  'Call 0xffd2 for printing, 0xc000 for user code.'
  \end{verbatim}
! 
  When using the module-level \function{re.sub()} function, the pattern
  is passed as the first argument.  The pattern may be a string or a
***************
*** 1260,1264 ****
  None
  \end{verbatim}
! %
  On the other hand, \module{search()} will scan forward through the
  string, reporting the first match it finds.
--- 1267,1271 ----
  None
  \end{verbatim}
! 
  On the other hand, \module{search()} will scan forward through the
  string, reporting the first match it finds.
***************
*** 1270,1274 ****
  (2, 7)
  \end{verbatim}
! %
  Sometimes you'll be tempted to keep using \function{re.match()}, and
  just add \regexp{.*} to the front of your RE.  Resist this tempation,
--- 1277,1281 ----
  (2, 7)
  \end{verbatim}
! 
  Sometimes you'll be tempted to keep using \function{re.match()}, and
  just add \regexp{.*} to the front of your RE.  Resist this tempation,
***************
*** 1303,1307 ****
  <html><head><title>Title</title>
  \end{verbatim}
! %
  The RE matches the \character{<} in \samp{<html>}, and the
  \regexp{.*} consumes the rest of the string.  There's still more left
--- 1310,1314 ----
  <html><head><title>Title</title>
  \end{verbatim}
! 
  The RE matches the \character{<} in \samp{<html>}, and the
  \regexp{.*} consumes the rest of the string.  There's still more left
***************
*** 1324,1328 ****
  <html>
  \end{verbatim}
! %
  \subsection{Not using re.VERBOSE}

--- 1331,1335 ----
  <html>
  \end{verbatim}
! 
  \subsection{Not using re.VERBOSE}

***************
*** 1356,1360 ****
  """, re.VERBOSE)
  \end{verbatim}
! %
  This is far more readable than:

--- 1363,1368 ----
  """, re.VERBOSE)
  \end{verbatim}
! % $
! 
  This is far more readable than:

***************
*** 1362,1366 ****
  pat = re.compile(r"\s*(?P<header>[^:]+)\s*:(?P<value>.*?)\s*$")
  \end{verbatim}
! %
  \section{Feedback}

--- 1370,1375 ----
  pat = re.compile(r"\s*(?P<header>[^:]+)\s*:(?P<value>.*?)\s*$")
  \end{verbatim}
! % $
! 
  \section{Feedback}

***************
*** 1383,1387 ****
  substring matched by the group \emph{cannot} be retrieved after
  performing a match or referenced later in the pattern.
! %
  \item[\code{(?P<\var{name}>...)}] Similar to regular parentheses, but
  the substring matched by the group is accessible via the symbolic group
--- 1392,1396 ----
  substring matched by the group \emph{cannot} be retrieved after
  performing a match or referenced later in the pattern.
! 
  \item[\code{(?P<\var{name}>...)}] Similar to regular parentheses, but
  the substring matched by the group is accessible via the symbolic group
***************
*** 1396,1403 ****
  or \code{m.end('id')}, and also by name in pattern text
  (e.g. \regexp{(?P=id)}) and replacement text (e.g. \code{\e g<id>}).
! %
  \item[\code{(?P=\var{name})}] Matches whatever text was matched by the
  earlier group named \var{name}.
- %

  \item[\code{(?=...)}] Matches if \regexp{...} matches next, but doesn't
--- 1405,1411 ----
  or \code{m.end('id')}, and also by name in pattern text
  (e.g. \regexp{(?P=id)}) and replacement text (e.g. \code{\e g<id>}).
! 
  \item[\code{(?P=\var{name})}] Matches whatever text was matched by the
  earlier group named \var{name}.

  \item[\code{(?=...)}] Matches if \regexp{...} matches next, but doesn't
***************
*** 1405,1409 ****
  example, \regexp{Isaac (?=Asimov)} will match \code{'Isaac~'} only if it's
  followed by \code{'Asimov'}.
! %
  \item[\code{(?!...)}] Matches if \regexp{...} doesn't match next.  This
  is a negative lookahead assertion.  For example,
--- 1413,1417 ----
  example, \regexp{Isaac (?=Asimov)} will match \code{'Isaac~'} only if it's
  followed by \code{'Asimov'}.
! 
  \item[\code{(?!...)}] Matches if \regexp{...} doesn't match next.  This
  is a negative lookahead assertion.  For example,