On 4-Jul-08, at 10:05 AM, Artem Yunusov wrote:
> Yuri Takhteyev wrote:
>> Interesting. It looks like lxml is way way faster than ElementTree.
>> Also, the website for lxml seems to suggest that ElementTree has some
>> serious problems in handling unicode
>> (http://codespeak.net/lxml/compatibility.html, third bullet). This
>> really worries me, more so than performance. This may not affect us,
>> but we need to make sure that ElementTree can handle unicode properly
>> if we would be using it. However, it looks like lxml is included with
>> nothing at this point, and would require building stuff from C, which
>> may raise the bar for using markdown...
> lxml supports ElementTree API, so we could write something like this:
> try:
>     from lxml import etree
>     print "running with lxml.etree"
> ...
> except ImportError:
>     print "Failed to import ElementTree from any known place"
>
> We can suggest to use lxml, but by default cElementTree will be used on
> python 2.5
I'd agree that this is the best way to go.
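
Spelled out, the fallback chain might look roughly like the sketch below. This is only an illustration, not what markdown actually ships: the candidate list, its order, and the helper name import_etree are my own assumptions, and it uses importlib, so it targets a newer Python than 2.5.

```python
import importlib

# Candidate modules, most preferred first. All of them expose the
# ElementTree API; the list and its order are assumptions for this sketch.
CANDIDATES = (
    "lxml.etree",               # fast, but needs a C build
    "xml.etree.cElementTree",   # C version bundled with Python 2.5+
    "xml.etree.ElementTree",    # pure-Python version in the stdlib
    "cElementTree",             # standalone installs for older Pythons
    "elementtree.ElementTree",
)


def import_etree(candidates=CANDIDATES):
    """Return (module, name) for the first candidate that imports."""
    for name in candidates:
        try:
            return importlib.import_module(name), name
        except ImportError:
            continue  # not installed here, try the next one
    raise ImportError("Failed to import ElementTree from any known place")


etree, backend = import_etree()
print("running with " + backend)

# Whichever backend was picked, the calling code stays the same.
root = etree.fromstring("<p>hello</p>")
print(root.tag)
```

The point of keeping the candidates in one tuple is that recommending lxml only changes the order of preference, not any of the code that uses etree afterwards.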
From what I've read and heard, lxml is faster/better, but it's also
not standard and I went through hell trying to install it about a
month ago (I don't think I succeeded either...)
Also, as for ElementTree's handling of Unicode: in the twelve
months I worked on DrProject, which uses ElementTree for all
sorts of things, I can't remember any problems giving it Unicode (and
part of my work was getting 100% Unicode support).
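
If we want something more solid than my memory, a quick round-trip check along these lines could go into the test suite. It is a minimal sketch against the standard-library ElementTree only; the sample string and the helper name roundtrip are made up for illustration, and it does not exercise the specific case the lxml page complains about.

```python
import xml.etree.ElementTree as ET


def roundtrip(text):
    """Put text into an element, serialize to UTF-8 bytes, parse it back."""
    elem = ET.Element("p")
    elem.text = text
    data = ET.tostring(elem, encoding="utf-8")
    return ET.fromstring(data).text


# Mixed Cyrillic and accented Latin, written with escapes so the source
# file itself stays plain ASCII.
sample = u"\u043f\u0440\u0438\u0432\u0435\u0442 caf\u00e9"
result = roundtrip(sample)
print(result == sample)
```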