Thread: [Pyparsing] Remove dependency on xml.sax.saxutils?
From: Michael D. <md...@st...> - 2008-06-04 13:44:41
Attachments: pyparsing.py.diff
I'm porting matplotlib (which uses pyparsing) to the maemo platform, a Linux-based platform for the Nokia Internet Tablets (N770, N800, N810). One thing you often see on these smaller platforms is that the Python standard library has been modularized to save space. In particular on this platform, the Python xml package is distributed separately. It would be nice not to have to depend on the entire set of Python XML libraries just to use pyparsing.

pyparsing uses xml.sax.saxutils.escape, which is actually a very straightforward and self-contained function. I've attached a simple patch to include this function in pyparsing.py itself when xml.sax.saxutils can't be imported.

I realize this is a fairly uncommon use case, so I'll let you make the judgment call of whether it's worth including in the pyparsing trunk, but I thought it was worth bringing to your attention something that would improve the "portability" of pyparsing.

Cheers,
Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
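The attached diff is not reproduced in this archive. As a rough sketch of the approach described above (use the standard-library function when it is available, and fall back to a local copy when the xml package is missing), something like the following would work. The fallback body here is an assumption modeled on what the message describes, not the contents of pyparsing.py.diff, and it omits the optional entities dict that the standard-library version accepts.

```python
try:
    # Preferred: use the standard-library implementation when present.
    from xml.sax.saxutils import escape
except ImportError:
    # Fallback for stripped-down Python installs that ship without the
    # xml package. Assumed shape; the real patch is not shown in the thread.
    def escape(data):
        """Escape &, <, and > in a string of data."""
        # Ampersand must be replaced first so that the ampersands
        # introduced by the later replacements are not escaped again.
        data = data.replace("&", "&amp;")
        data = data.replace(">", "&gt;")
        data = data.replace("<", "&lt;")
        return data


print(escape("a < b & c > d"))  # a &lt; b &amp; c &gt; d
```

Either branch gives the same result for plain strings, so callers do not need to know which one was taken.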
From: Paul M. <pt...@au...> - 2008-06-04 17:41:59
Mike -

Thanks for this submission, I see no reason why I wouldn't just drop this into the main pyparsing code - it seems to conditionalize around the presence/absence of xml.sax.saxutils very nicely.

My question is more about just how minimal/lame xml.sax.saxutils.escape actually seems to be. In the list of common HTML entities defined later in pyparsing.py, I also include a mapping for '"' to "&quot;", but xml...escape does not handle that case. There is also handling of an optional dict, which if provided calls __dict_replace, which is not implemented. I think I am less interested in a verbatim copy of xml...escape than I am in having one that does a decent job of escaping - I think maybe I am more picky about this code since it would actually become part of the pyparsing source.

So I think I will just discard importing and using xml.sax.saxutils.escape altogether, and replace it with xml_escape, which will be implemented as:

    def xml_escape(data):
        """Escape &, <, >, ", etc. in a string of data."""

        # ampersand must be replaced first
        from_symbols = '&><"'
        to_symbols = ['&'+s+';' for s in "amp gt lt quot".split()]
        for from_,to_ in zip(from_symbols, to_symbols):
            data = data.replace(from_, to_)
        return data

This handles the 4 special entities defined in HTML 2.0 (http://www.w3.org/MarkUp/html-spec/html-spec_9.html#SEC9.7).

-- Paul

(On further review, I see that I was erroneously mapping ' to &quot; instead of " - I'll have that fix along with xml_escape posted to SVN shortly.)
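To illustrate the "ampersand must be replaced first" comment, here is a small sketch that runs xml_escape as proposed above next to a deliberately mis-ordered variant. The name xml_escape_wrong_order is hypothetical and exists only for this illustration; it is not part of pyparsing.

```python
def xml_escape(data):
    """Escape &, <, >, and " in a string of data (as proposed in the thread)."""
    # Ampersand must be replaced first.
    from_symbols = '&><"'
    to_symbols = ['&' + s + ';' for s in "amp gt lt quot".split()]
    for from_, to_ in zip(from_symbols, to_symbols):
        data = data.replace(from_, to_)
    return data


def xml_escape_wrong_order(data):
    """Hypothetical variant that handles '&' last, for illustration only."""
    for from_, to_ in [('<', '&lt;'), ('&', '&amp;')]:
        data = data.replace(from_, to_)
    return data


print(xml_escape('a < "b" & c'))        # a &lt; &quot;b&quot; &amp; c
print(xml_escape_wrong_order('a < b'))  # a &amp;lt; b
```

In the mis-ordered variant the ampersand introduced by `&lt;` is itself escaped on the next pass, which is why the ordering of from_symbols matters.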
From: Michael D. <md...@st...> - 2008-06-04 18:04:14
Looks fine to me. Certainly addresses my original issue, and then some.

Cheers,
Mike

Paul McGuire wrote:
> Mike -
>
> Thanks for this submission, I see no reason why I wouldn't just drop this
> into the main pyparsing code - it seems to conditionalize around the
> presence/absence of xml.sax.saxutils very nicely.
>
> My question is more about just how minimal/lame xml.sax.saxutils.escape
> actually seems to be. In the list of common HTML entities defined later in
> pyparsing.py, I also include a mapping for '"' to "&quot;", but xml...escape
> does not handle that case. There is also handling of an optional dict,
> which if provided calls __dict_replace, which is not implemented. I think I
> am less interested in a verbatim copy of xml...escape than I am in having
> one that does a decent job of escaping - I think maybe I am more picky about
> this code since it would actually become part of the pyparsing source.
>
> So I think I will just discard importing and using xml.sax.saxutils.escape
> altogether, and replace it with xml_escape, which will be implemented as:
>
>     def xml_escape(data):
>         """Escape &, <, >, ", etc. in a string of data."""
>
>         # ampersand must be replaced first
>         from_symbols = '&><"'
>         to_symbols = ['&'+s+';' for s in "amp gt lt quot".split()]
>         for from_,to_ in zip(from_symbols, to_symbols):
>             data = data.replace(from_, to_)
>         return data
>
> This handles the 4 special entities defined in HTML 2.0
> (http://www.w3.org/MarkUp/html-spec/html-spec_9.html#SEC9.7).
>
> -- Paul
>
> (On further review, I see that I was erroneously mapping ' to &quot; instead
> of " - I'll have that fix along with xml_escape posted to SVN shortly.)

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA