From: A.M. K. <aku...@us...> - 2003-04-10 14:18:40
Update of /cvsroot/py-howto/pyhowto
In directory sc8-pr-cvs1:/tmp/cvs-serv11351
Modified Files:
regex.tex
Log Message:
[Patch #718809 from Jarno Virtanen] Various minor corrections to regex.tex; I've also made a few more minor rewrites.
Index: regex.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/regex.tex,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -r1.17 -r1.18
*** regex.tex 7 Apr 2003 19:51:23 -0000 1.17
--- regex.tex 10 Apr 2003 14:18:34 -0000 1.18
***************
*** 35,40 ****
Perl-style regular expression patterns. Earlier versions of Python
came with the \module{regex} module, which provides Emacs-style
! patterns. Emacs-style patterns are slightly less readable, and
! doesn't provide as many features, so there's not much reason to use
the \module{regex} module when writing new code, though you might
encounter old code that uses it.
--- 35,40 ----
Perl-style regular expression patterns. Earlier versions of Python
came with the \module{regex} module, which provides Emacs-style
! patterns. Emacs-style patterns are slightly less readable and
! don't provide as many features, so there's not much reason to use
the \module{regex} module when writing new code, though you might
encounter old code that uses it.
***************
*** 215,219 ****
it can at first, and if no match is found it will then progressively
back up and retry the rest of the RE again and again. It will back up
! until it's tried zero matches for \regexp{[bcd]*}, and if that
subsequently fails, the engine will conclude that the string doesn't
match the RE at all.
--- 215,219 ----
it can at first, and if no match is found it will then progressively
back up and retry the rest of the RE again and again. It will back up
! until it has tried zero matches for \regexp{[bcd]*}, and if that
subsequently fails, the engine will conclude that the string doesn't
match the RE at all.
***************
*** 245,249 ****
earlier, but that might as well be infinity.
! Readers of a reductionist bent may notice that the 3 other qualifiers
can all be expressed using this notation. \regexp{\{0,\}} is the same
as \regexp{*}, \regexp{\{1,\}} is equivalent to \regexp{+}, and
--- 245,249 ----
earlier, but that might as well be infinity.
! Readers of a reductionist bent may notice that the three other qualifiers
can all be expressed using this notation. \regexp{\{0,\}} is the same
as \regexp{*}, \regexp{\{1,\}} is equivalent to \regexp{+}, and
***************
*** 348,354 ****
\begin{tableii}{c|l}{code}{Method/Attribute}{Purpose}
! \lineii{match}{Determine if the RE matches at the beginning of
the string.}
! \lineii{search}{Scan through a string, looking for any location
where this RE matches.}
\lineii{findall()}{Find all substrings where the RE matches,
--- 348,354 ----
\begin{tableii}{c|l}{code}{Method/Attribute}{Purpose}
! \lineii{match()}{Determine if the RE matches at the beginning of
the string.}
! \lineii{search()}{Scan through a string, looking for any location
where this RE matches.}
\lineii{findall()}{Find all substrings where the RE matches,
***************
*** 997,1001 ****
\end{verbatim}
! \subsection{Other Assertions}
Another zero-width assertion is the lookahead assertion. Lookahead
--- 997,1001 ----
\end{verbatim}
! \subsection{Lookahead Assertions}
Another zero-width assertion is the lookahead assertion. Lookahead
***************
*** 1016,1020 ****
\end{itemize}
! An example will help make this concrete and will demonstrate a case
where a lookahead is useful. Consider a simple pattern to match a
filename and split it apart into a base name and an extension,
--- 1016,1020 ----
\end{itemize}
! An example will help make this concrete by demonstrating a case
where a lookahead is useful. Consider a simple pattern to match a
filename and split it apart into a base name and an extension,
***************
*** 1022,1030 ****
is the base name, and \samp{rc} is the filename's extension.
! The pattern to match this is quite simple: \regexp{.*[.].*\$}.
! (Notice that the \samp{.} needs to be treated specially because it's a
metacharacter; I've put it inside a character class. Also notice the
trailing \regexp{\$}; this is added to ensure that all the rest of the
! string must be included in the extension.) This regular expression
matches \samp{foo.bar} and \samp{autoexec.bat} and \samp{sendmail.cf} and
\samp{printers.conf}.
--- 1022,1033 ----
is the base name, and \samp{rc} is the filename's extension.
! The pattern to match this is quite simple:
!
! \regexp{.*[.].*\$}
!
! Notice that the \samp{.} needs to be treated specially because it's a
metacharacter; I've put it inside a character class. Also notice the
trailing \regexp{\$}; this is added to ensure that all the rest of the
! string must be included in the extension. This regular expression
matches \samp{foo.bar} and \samp{autoexec.bat} and \samp{sendmail.cf} and
\samp{printers.conf}.
***************
*** 1037,1045 ****
% $
! First attempt: Exclude \samp{bat} by requiring that the first
! character of the extension is not a \samp{b}. This is wrong, because it
! also doesn't match \samp{foo.bar}.
! \regexp{.*[.]([\^b]..|.[\^a].|..[\^t])\$}
The expression gets messier when you try to patch up the first
--- 1040,1049 ----
% $
! The first attempt above tries to exclude \samp{bat} by requiring that
! the first character of the extension is not a \samp{b}. This is
! wrong, because the pattern also doesn't match \samp{foo.bar}.
! % Messes up the HTML without the curly braces around \^
! \regexp{.*[.]([{\^}b]..|.[{\^}a].|..[{\^}t])\$}
The expression gets messier when you try to patch up the first
***************
*** 1048,1056 ****
\samp{a}; or the third character isn't \samp{t}. This accepts
\samp{foo.bar} and rejects \samp{autoexec.bat}, but it requires a
! three-letter extension, and doesn't accept \samp{sendmail.cf}.
! Another bug, so we'll complicate the pattern again in an effort to fix
! it.
! \regexp{.*[.]([\^b].?.?|.[\^a]?.?|..?[\^t]?)\$}
In the third attempt, the second and third letters are all made
--- 1052,1060 ----
\samp{a}; or the third character isn't \samp{t}. This accepts
\samp{foo.bar} and rejects \samp{autoexec.bat}, but it requires a
! three-letter extension and won't accept a filename with a two-letter
! extension such as \samp{sendmail.cf}. We'll complicate the pattern
! again in an effort to fix it.
! \regexp{.*[.]([{\^}b].?.?|.[{\^}a]?.?|..?[{\^}t]?)\$}
In the third attempt, the second and third letters are all made
***************
*** 1059,1081 ****
The pattern's getting really complicated now, which makes it hard to
! read and understand. Worse, this solution doesn't scale well; if the
! problem changes and you want to exclude both \samp{bat} and \samp{exe}
! as extensions, the pattern would get still more complicated and
! confusing.
!
! A negative lookahead cuts through all this. Go back to the original
! pattern, and, before the \regexp{.*} which matches the extension,
! insert \regexp{(?!bat\$)}. This means: if the expression \regexp{bat}
! doesn't match at this point, try the rest of the pattern; if
! \regexp{bat\$} does match, the whole pattern will fail. The trailing
! \regexp{\$} is required to ensure that something like
! \samp{sample.batch}, where the extension only starts with \samp{bat},
! will be allowed.
!
! After this modification, the whole pattern is
! \regexp{.*[.](?!bat\$).*\$}. Excluding another filename extension is
! now easy; simply add it as an alternative inside the assertion.
\regexp{.*[.](?!bat\$|exe\$).*\$}
! excludes both \samp{bat} and \samp{exe}.
--- 1063,1087 ----
The pattern's getting really complicated now, which makes it hard to
! read and understand. Worse, if the problem changes and you want to
! exclude both \samp{bat} and \samp{exe} as extensions, the pattern
! would get even more complicated and confusing.
!
! A negative lookahead cuts through all this:
!
! \regexp{.*[.](?!bat\$).*\$}
! % $
!
! The lookahead means: if the expression \regexp{bat} doesn't match at
! this point, try the rest of the pattern; if \regexp{bat\$} does match,
! the whole pattern will fail. The trailing \regexp{\$} is required to
! ensure that something like \samp{sample.batch}, where the extension
! only starts with \samp{bat}, will be allowed.
!
! Excluding another filename extension is now easy; simply add it as an
! alternative inside the assertion. The following pattern excludes
! filenames that end in either \samp{bat} or \samp{exe}:
!
\regexp{.*[.](?!bat\$|exe\$).*\$}
! % $
***************
*** 1087,1093 ****
\begin{tableii}{c|l}{code}{Method/Attribute}{Purpose}
! \lineii{split}{Split the string into a list, splitting it wherever the RE matches}
! \lineii{sub}{Find all substrings where the RE matches, and replace them with a different string}
! \lineii{subn}{Does the same thing as \method{sub()},
but returns the new string and the number of replacements}
\end{tableii}
--- 1093,1099 ----
\begin{tableii}{c|l}{code}{Method/Attribute}{Purpose}
! \lineii{split()}{Split the string into a list, splitting it wherever the RE matches}
! \lineii{sub()}{Find all substrings where the RE matches, and replace them with a different string}
! \lineii{subn()}{Does the same thing as \method{sub()},
but returns the new string and the number of replacements}
\end{tableii}
***************
*** 1193,1197 ****
\end{verbatim}
! Empty matches are replaced only when not they're not
adjacent to a previous match.
--- 1199,1203 ----
\end{verbatim}
! Empty matches are replaced only when they're not
adjacent to a previous match.
***************
*** 1223,1229 ****
There's also a syntax for referring to named groups as defined by the
\regexp{(?P<name>...)} syntax. \samp{\e g<name>} will use the
! substring matched by the group named \samp{name}, and \samp{\e
! g<\var{number}>} uses the corresponding group number. \samp{\e g<2>}
! is therefore equivalent to \samp{\e 2}, but isn't ambiguous in a
replacement string such as \samp{\e g<2>0}. (\samp{\e 20} would be
interpreted as a reference to group 20, not a reference to group 2
--- 1229,1237 ----
There's also a syntax for referring to named groups as defined by the
\regexp{(?P<name>...)} syntax. \samp{\e g<name>} will use the
! substring matched by the group named \samp{name}, and
! \samp{\e g<\var{number}>}
! uses the corresponding group number.
! \samp{\e g<2>} is therefore equivalent to \samp{\e 2},
! but isn't ambiguous in a
replacement string such as \samp{\e g<2>0}. (\samp{\e 20} would be
interpreted as a reference to group 20, not a reference to group 2
***************
*** 1303,1309 ****
from a string or replacing it with another single character. You
might do this with something like \code{re.sub('\e n', ' ', S)}, but
! \method{translate()} is capable of doing both these tasks,
! and will be much faster that any regular expression operation can ever
! be.
In short, before turning to the \module{re} module, consider whether
--- 1311,1316 ----
from a string or replacing it with another single character. You
might do this with something like \code{re.sub('\e n', ' ', S)}, but
! \method{translate()} is capable of doing both tasks
! and will be faster than any regular expression operation can be.
In short, before turning to the \module{re} module, consider whether
***************
*** 1347,1351 ****
starting character, only trying the full match if a \character{C} is found.
! Adding \regexp{.*} defeats this optimization, and requires scanning to
the end of the string and then backtracking to find a match for the
rest of the RE. Use \function{re.search()} instead.
--- 1354,1358 ----
starting character, only trying the full match if a \character{C} is found.
! Adding \regexp{.*} defeats this optimization, requiring scanning to
the end of the string and then backtracking to find a match for the
rest of the RE. Use \function{re.search()} instead.
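
For readers checking the rewritten lookahead section, here is a minimal Python sketch of the two final patterns from the hunk at 1063-1087. The sample filenames and the variable names (not_bat, not_bat_or_exe, accepted, accepted_strict) are illustrative assumptions and are not part of regex.tex. Each name is assumed to contain a single dot.

```python
import re

# Pattern from the patched text: reject names whose extension is exactly "bat".
not_bat = re.compile(r".*[.](?!bat$).*$")

# Same pattern with a second alternative inside the assertion,
# so that "exe" is rejected as well.
not_bat_or_exe = re.compile(r".*[.](?!bat$|exe$).*$")

# Hypothetical sample names, each containing exactly one dot.
names = ["foo.bar", "autoexec.bat", "sendmail.cf", "sample.batch", "setup.exe"]

accepted = [n for n in names if not_bat.match(n)]
accepted_strict = [n for n in names if not_bat_or_exe.match(n)]

print(accepted)         # ['foo.bar', 'sendmail.cf', 'sample.batch', 'setup.exe']
print(accepted_strict)  # ['foo.bar', 'sendmail.cf', 'sample.batch']
```

Note that sample.batch is accepted in both cases, which is the behaviour the trailing $ inside the assertion is meant to guarantee.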
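
The reflowed paragraph on \g<name> and \g<number> (hunk at 1229-1237) can be checked in the same way. The following is a rough sketch only. The patterns, input strings and variable names are made up for illustration, and it assumes a current Python 3 re module, where a reference to a nonexistent group 20 raises re.error.

```python
import re

# Two groups: a word, then a single digit.
pair = re.compile(r"(\w+)-(\d)")

# \g<2> is group 2, followed by a literal "0".
result = pair.sub(r"\g<2>0", "item-7")
print(result)  # 70

# \20 is read as a reference to group 20, which this pattern does not have.
try:
    pair.sub(r"\20", "item-7")
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# Named groups defined with (?P<name>...) are referenced as \g<name>.
named = re.compile(r"(?P<key>\w+)=(?P<value>\w+)")
swapped = named.sub(r"\g<value>=\g<key>", "color=red")
print(swapped)  # red=color
```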