Re: [Pyparsing] Remove dependency on xml.sax.saxutils?


Mike -

Thanks for this submission, I see no reason why I wouldn't just drop this
into the main pyparsing code - it seems to conditionalize around the
presence/absence of xml.sax.saxutils very nicely.

My question is more about just how minimal/lame xml.sax.saxutils.escape
actually seems to be.  In the list of common HTML entities defined later in
pyparsing.py, I also include a mapping for '"' to "&quot;", but xml...escape
does not handle that case.  There is also handling of an optional dict,
which, if provided, calls __dict_replace, which is not implemented.
I am less interested in a verbatim copy of xml...escape than in one that
does a decent job of escaping; I am probably pickier about this code since
it would actually become part of the pyparsing source.
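For comparison, this is how the standard-library function behaves in current Python 3 (checked against `xml.sax.saxutils.escape(data, entities={})`). The double quote is left alone unless you pass it in through the optional dict:

```python
from xml.sax.saxutils import escape

# By default only &, < and > are escaped; the double quote passes through.
print(escape('say "hi" & <bye>'))
# say "hi" &amp; &lt;bye&gt;

# Extra replacements can be supplied via the optional entities dict,
# which is applied after the &, <, > substitutions.
print(escape('say "hi"', {'"': "&quot;"}))
# say &quot;hi&quot;
```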

So I think I will just drop the import of xml.sax.saxutils.escape
altogether and replace it with xml_escape, implemented as:

    def xml_escape(data):
        """Escape &, <, >, and " in a string of data."""

        # ampersand must be replaced first, so that the '&' in the
        # other entities is not escaped a second time
        from_symbols = '&><"'
        to_symbols = ['&'+s+';' for s in "amp gt lt quot".split()]
        # pair each symbol with its entity: & -> &amp;, > -> &gt;, etc.
        for from_, to_ in zip(from_symbols, to_symbols):
            data = data.replace(from_, to_)
        return data

This handles the 4 special entities defined in HTML 2.0
(http://www.w3.org/MarkUp/html-spec/html-spec_9.html#SEC9.7).
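Here is a quick usage check that shows why the ampersand has to go first and confirms that all four entities decode back to the original characters. The sketch repeats xml_escape so it runs on its own. `bad_order_escape` is a hypothetical counter-example, and `html.unescape` comes from Python 3's standard library (pyparsing doesn't use it; it only serves as an independent check here).

```python
import html

def xml_escape(data):
    """Escape &, <, >, and " using the four HTML 2.0 entities."""
    # ampersand first, or the '&' in '&gt;' etc. would be re-escaped
    from_symbols = '&><"'
    to_symbols = ['&' + s + ';' for s in "amp gt lt quot".split()]
    for from_, to_ in zip(from_symbols, to_symbols):
        data = data.replace(from_, to_)
    return data

def bad_order_escape(data):
    """Hypothetical counter-example: '&' replaced last, so it double-escapes."""
    for from_, to_ in (('<', '&lt;'), ('&', '&amp;')):
        data = data.replace(from_, to_)
    return data

s = 'if a < b && c > "d":'
escaped = xml_escape(s)
print(escaped)                # if a &lt; b &amp;&amp; c &gt; &quot;d&quot;:
print(bad_order_escape('<'))  # &amp;lt;
# html.unescape reverses all four entities, so the round trip holds
assert html.unescape(escaped) == s
```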

-- Paul

(On further review, I see that I was erroneously mapping ' (apostrophe) to
&quot; instead of " (double quote); I'll post that fix to SVN shortly, along
with xml_escape.)