From: David W. <wo...@cs...> - 2008-07-03 20:30:13
On 3-Jul-08, at 4:57 PM, Waylan Limberg wrote:
> Just remember the c variation will not work on IronPython, Jython or
> other python implementations. Additionally, there may be (shared) web
> hosts which will allow a user to copy a pure python module to the
> server, but not compile a c module. So if we do switch, the import
> should be in a try...except block and import the non-c variation if c
> is not available. In such a situation, would ElementTree give us any
> advantage?
My marginally educated but entirely untested guess is that it would
still be a bit faster... And certainly within the same order of
magnitude.
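For what it's worth, the guarded import Waylan describes is cheap to write. A minimal sketch, assuming the standard-library module paths (xml.etree.cElementTree and xml.etree.ElementTree) rather than the standalone elementtree package used in the session further down; on interpreters that don't ship the C module, the except branch is the one that runs:

```python
# Prefer the C accelerated module when the interpreter ships it,
# otherwise fall back to the pure Python implementation.
# Assumption: stdlib module paths, not the standalone elementtree package.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Either way, the rest of the code only ever talks to the name ET.
root = ET.fromstring("<p>Hello <em>world</em></p>")
print(root.find("em").text)
```

Since both modules expose the same API, nothing after the import needs to know which one was picked.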
Now, the problem: ElementTree will get really, really upset about
invalid XHTML.
Really upset.
So you'll need to figure out how you'll handle bad input (unclosed
tags, improper quoting, etc):
>>> from elementtree import ElementTree as ET
>>> from StringIO import StringIO as S
>>> ET.parse(S("<a name=foo>bar</a>"))
...
ExpatError: not well-formed (invalid token): line 1, column 8
Not happy :(
As far as I can tell, the only way to get around that sort of problem
would be using BeautifulSoup to parse the input... But there goes any
hope of a significant speedup.
(Now, of course, something like this is always possible:
    try:
        doc = ET.parse(input)
    except ExpatError:
        doc = bs_to_et(BeautifulSoup(input))
But I can't speak to the advantages or disadvantages with any more
authority than you could.)
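To make the strict-first, lenient-second idea concrete, here is a rough sketch that runs on the standard library alone. Big assumption: I've swapped BeautifulSoup out for the stdlib's html.parser.HTMLParser feeding an ElementTree TreeBuilder, purely so the example has no third-party dependency. The names lenient_parse and parse_markup are made up for illustration, and this stand-in only copes with mild sloppiness such as unquoted attributes; unclosed or mismatched tags would need a real tag-soup parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _LenientBuilder(HTMLParser):
    """Forward HTMLParser events to an ElementTree TreeBuilder."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})

    def handle_endtag(self, tag):
        self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def lenient_parse(text):
    """Hypothetical stand-in for bs_to_et(BeautifulSoup(text))."""
    parser = _LenientBuilder()
    parser.feed(text)
    parser.close()
    return parser.builder.close()


def parse_markup(text):
    """Try the strict XML parser first; fall back only on a parse error."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return lenient_parse(text)


# The input from the session above: unquoted attribute value.
elem = parse_markup("<a name=foo>bar</a>")
print(elem.tag, elem.get("name"), elem.text)
```

Well-formed input never touches the slow path, so the cost of the fallback is only paid for documents the strict parser rejects.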