If special entities such as &lt; (for "<") and &gt;
(for ">") are used in the HTML file, they are NOT
stripped for the text and screen modes (and now for the
DocBook XML mode as well).
As shown below, the issue is in the FixSpecialText
function:
----------------------------------------
from re import sub

def FixSpecialText(text):
    """This is where we strip and/or transform certain HTML tags
    for plain-text output formats"""
    # <br> becomes \n
    fixed = sub('<br>', '\n', text)
    # <a></a> gets stripped (use '?' qualifier for a non-greedy match)
    fixed = sub('<a href=".+?">', '', fixed)
    fixed = sub('</a>', '', fixed)
    # <i>foo</i> becomes *foo*
    fixed = sub('<i>', '*', fixed)
    fixed = sub('</i>', '*', fixed)
    fixed = sub('<em>', '*', fixed)
    fixed = sub('</em>', '*', fixed)
    # </li> becomes "; " (so at least we get a semicolon-separated list)
    fixed = sub('</li>', '; ', fixed)
    # Need to figure out how to strip out &lt; and &gt; and
    # replace them with < and >
    #fixed = sub('&lt;', '<', fixed)
    #fixed = sub('&gt;', '>', fixed)
    # strip everything else
    fixed = sub('<.+?>', '', fixed)
    return fixed
----------------------------------------
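One possible fix, sketched below, is to decode the entities as the very last step. Simply uncommenting the two lines above would not be enough: they run before the final '<.+?>' strip, so text such as "&lt;b&gt;" would first become "<b>" and then be removed as if it were a tag. This sketch assumes Python 3.4 or later, where html.unescape is in the standard library, and assumes it is acceptable to decode all named and numeric entities (including &amp;), not just &lt; and &gt;. It is a suggestion, not the project's actual patch.

----------------------------------------
import re
from html import unescape

def FixSpecialText(text):
    """Strip and/or transform certain HTML tags for plain-text
    output formats, then decode entities such as &lt; and &gt;."""
    # <br> becomes \n
    fixed = re.sub('<br>', '\n', text)
    # <a></a> gets stripped (non-greedy match on the href)
    fixed = re.sub('<a href=".+?">', '', fixed)
    fixed = re.sub('</a>', '', fixed)
    # <i>foo</i> and <em>foo</em> become *foo*
    fixed = re.sub('</?(i|em)>', '*', fixed)
    # </li> becomes "; " (semicolon-separated list)
    fixed = re.sub('</li>', '; ', fixed)
    # strip every remaining tag
    fixed = re.sub('<.+?>', '', fixed)
    # Decode entities only AFTER all tags are gone, so decoded
    # text like "<b>" is not mistaken for a tag and stripped.
    return unescape(fixed)
----------------------------------------

With this ordering, FixSpecialText('if a &lt; b') returns 'if a < b', and '&lt;b&gt;' in the source comes out as the literal text '<b>'.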
This bug was originally reported in version 2.0 in
April 2001.