python-markdown-discuss Mailing List for Python Markdown (Page 30)

> cElementTree is ~13% faster than NanoDOM and uses 4.5 times less memory.
> lxml is ~4% faster than cElementTree, but cElementTree wins on memory
> usage (two times less).
> ElementTree is a little bit faster than NanoDOM, and ElementTree also wins on
> memory usage (2.5 times less).

That's good!

> Concerning the HTML/XHTML output, I discovered that this option is supported
> only by newer versions of ElementTree (1.3) and lxml (2.0), so it won't be
> available for now with the standard Python 2.5 ElementTree. Maybe we can make
> it optional.

Again, I wouldn't worry too much about this.  If someone wants HTML
output, converting XHTML to HTML4 should be easy enough.

> There is one problem with lxml: the misc/boldlinks test causes this error:
>
>  File "etree.pyx", line 693, in etree._Element.text.__set__
>  File "apihelpers.pxi", line 344, in etree._setNodeText
>  File "apihelpers.pxi", line 648, in etree._utf8
> AssertionError: All strings must be XML compatible, either Unicode or ASCII
>
> I suppose that is because in this test we are trying to assign to el.text
> data that contains placeholders, and maybe for some reason lxml treats the
> placeholder values (u'\u0001' and u'\u0002') as neither Unicode nor ASCII.

We could rethink our choice of placeholders if we know that this is
the reason.  But it sounds like ElementTree is the way to go.
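
One way to check whether the placeholders are the culprit, without needing
lxml installed: XML 1.0 only allows a specific set of characters, and control
characters such as U+0001 and U+0002 are outside it, which would explain the
"must be XML compatible" assertion.  A minimal sketch (the helper name
is_xml_compatible is made up for illustration, it is not part of markdown.py
or lxml):

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_compatible(text):
    """Return True if every character in text is allowed in XML 1.0.

    Hypothetical helper for illustration only.
    """
    return _XML_ILLEGAL.search(text) is None


print(is_xml_compatible(u"some bold link text"))      # ordinary text
print(is_xml_compatible(u"\u0001placeholder\u0002"))  # control characters
```

If the second call reports False for our placeholders, then any placeholder
scheme built only from characters in the allowed ranges should get past that
assertion.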

A few minor things.  The current version in git fails on non-ASCII
files (e.g., tests/misc/russian.txt).  That's because we end up
encoding the content too early: line 1889 writes etree to xml, utf8
encoded, after which we try to run textPostProcessors on it.  That's
not good.  This seems to fix it:

        xml = codecs.decode(etree.tostring(root, encoding="utf8"), "utf8")

(I am assuming that standard etree doesn't have an option of
serializing to non-encoded unicode.  If it does, use that instead.)
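
For reference, here is a small sketch of both variants.  The second one
assumes a newer ElementTree (the one shipped with Python 3 accepts
encoding="unicode"); it is not available in the ElementTree that ships
with Python 2.5.  The element and its text are stand-ins, not taken from
the test suite:

```python
import xml.etree.ElementTree as etree

root = etree.Element("p")
root.text = u"Привет"  # stand-in for non-ASCII input

# Variant 1: serialize to UTF-8 bytes, then decode right away so that
# everything downstream (e.g. the text postprocessors) sees unicode.
xml_decoded = etree.tostring(root, encoding="utf-8").decode("utf-8")

# Variant 2 (newer ElementTree only): ask for a unicode string directly.
xml_text = etree.tostring(root, encoding="unicode")

print(xml_decoded)
print(xml_text)
```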

Note that in my experience there is only one way to use Unicode right
with Python: assume that all strings are unicode.  So, for this
reason, I've been following the policy of decoding data when it comes
into my world and encoding it only when it goes out, without _ever_
passing encoded strings around.  Encoded strings are evil.
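
In code, the policy looks roughly like this.  It is only a sketch: the
function name convert and the upper() call are placeholders for whatever
the real processing is.

```python
def convert(raw_bytes, encoding="utf-8"):
    """Decode at the boundary, work on unicode, encode on the way out.

    Hypothetical example; upper() stands in for the real processing.
    """
    text = raw_bytes.decode(encoding)   # decode once, on the way in
    result = text.upper()               # all processing happens on unicode
    return result.encode(encoding)      # encode once, on the way out


print(convert(u"привет".encode("utf-8")).decode("utf-8"))
```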

Another thing: lots of tests seem to fail now because of whitespace
differences.  I am guessing that the way to solve it is to first
extend test-markdown.py to add an option of reflowing XHTML before
diffing.  Then, once we know that all tests pass except for white
space differences, we can change the expected output.
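
A rough sketch of what such a reflow step could look like.  The function
names are made up, and this is a crude regex-based normalization, not
what test-markdown.py currently does.  Note that it would also squash
whitespace inside <pre> blocks, so it is only good for telling "whitespace
only" failures apart from real ones:

```python
import re


def reflow(xhtml):
    """Normalize whitespace so only non-whitespace differences remain."""
    # Drop whitespace between adjacent tags, then squeeze remaining runs.
    xhtml = re.sub(r">\s+<", "><", xhtml.strip())
    return re.sub(r"\s+", " ", xhtml)


def same_ignoring_whitespace(expected, actual):
    """Compare two XHTML strings after reflowing both."""
    return reflow(expected) == reflow(actual)


expected = "<p>a\n  b</p>\n\n<p>c</p>"
actual = "<p>a b</p><p>c</p>"
print(same_ignoring_whitespace(expected, actual))
```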

  - yuri

-- 
http://sputnik.freewisdom.org/


python-markdown-discuss — A general purpose discussion list