From: Artem Y. <ne...@gm...> - 2008-07-04 13:05:23
Yuri Takhteyev wrote:
> Interesting. It looks like lxml is way way faster than ElementTree.
> Also, the website for lxml seems to suggest that ElementTree has some
> serious problems in handling unicode
> (http://codespeak.net/lxml/compatibility.html, third bullet). This
> really worries me, more so than performance. This may not affect us,
> but we need to make sure that ElementTree can handle unicode properly
> if we would be using it. However, it looks like lxml is included with
> nothing at this point, and would require building stuff from C, which
> may raise the bar for using markdown...
>

lxml supports the ElementTree API, so we could write something like this:

    try:
        from lxml import etree
        print "running with lxml.etree"
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.cElementTree as etree
            print "running with cElementTree on Python 2.5+"
        except ImportError:
            try:
                # Python 2.5
                import xml.etree.ElementTree as etree
                print "running with ElementTree on Python 2.5+"
            except ImportError:
                try:
                    # normal cElementTree install
                    import cElementTree as etree
                    print "running with cElementTree"
                except ImportError:
                    try:
                        # normal ElementTree install
                        import elementtree.ElementTree as etree
                        print "running with ElementTree"
                    except ImportError:
                        print "Failed to import ElementTree from any known place"

We can suggest using lxml, but by default cElementTree will be used on
Python 2.5.

I didn't get what the real problem with unicode is; there are only some
general remarks on the lxml site, and I think that if the problem were
really serious, ElementTree wouldn't have been included in the standard
Python library. I tried some tests with Russian unicode data and didn't
find any problems yet, but I think this issue needs more thorough
investigation.
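
A round-trip check of the kind mentioned above could be sketched roughly as follows. This is only an illustration, not the test that was actually run: the helper name `roundtrip` and the sample string are made up here, and it assumes the standard-library `xml.etree.ElementTree` module and Python 3 syntax.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as etree


def roundtrip(text):
    """Store `text` as element text and as an attribute value, serialize
    to UTF-8 bytes, parse the bytes back, and return both recovered values.

    Hypothetical helper, for illustration only.
    """
    root = etree.Element("p")
    root.text = text
    root.set("title", text)
    data = etree.tostring(root, encoding="utf-8")  # serialized as bytes
    parsed = etree.fromstring(data)
    return parsed.text, parsed.get("title")


sample = u"Привет, мир"  # arbitrary Russian sample text (an assumption)
recovered_text, recovered_attr = roundtrip(sample)
assert recovered_text == sample
assert recovered_attr == sample
```

If both assertions pass, the text survived serialization and parsing unchanged. That covers only the simple case of element text and attribute values; it says nothing about the specific problems the lxml page alludes to.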