Re: [Python-markdown-discuss] XHTML or HTML?

On Sat, Jan 10, 2009 at 7:32 AM, Eric Abrahamsen <gi...@gm...> wrote:
>> We had talked at one point of having markdown import
>> lxml rather than ElementTree if it was available. Don't remember why
>> we decided not to. The list archives would answer that. However, if
>> you could provide a patch that works - I'll likely commit it.
>
> Looking through the mailing list archives, it looks like things stopped at
> "yes, that would be a good idea". As far as I can tell there wasn't any
> further action. I tried adding lxml to the import cascade in
> etree_loader.py, and it imports okay, but fails tests (I can provide more
> details if necessary).

Yeah, I seem to recall there being some differences between the two. I
wasn't the one who wrote that code, so the details aren't as clear to
me. Perhaps the final decision was made by the core devs off list.
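
For reference, an import cascade that tries lxml first might look
roughly like the sketch below. The function name ``load_etree`` is made
up for illustration, and this is not the actual contents of
etree_loader.py; it only shows the fallback pattern and does nothing
about whatever differences cause the test failures.

```python
def load_etree():
    """Return lxml.etree if it is installed, else the stdlib ElementTree."""
    try:
        # Preferred when available (assumption: lxml stays an optional extra).
        from lxml import etree
    except ImportError:
        # Standard-library fallback, always present.
        import xml.etree.ElementTree as etree
    return etree


etree = load_etree()
br = etree.Element("br")
# Serialize an empty element; the exact bytes depend on which
# implementation was picked up.
print(etree.tostring(br))
```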

> Then I realized that the lxml etree implementation is
> essentially unrelated to the lxml.html.xhtml_to_html function. Before
> messing with things further, I want to make sure this is the right way: I'm
> thinking of adding a "html" keyword argument to the Markdown class
> definition. If it's set to true we try to import lxml.html.xhtml_to_html. If
> that fails we log a warning and then ignore it. If it succeeds, run
> xhtml_to_html right after the treeprocessors (I guess, would have to
> experiment). Does this seem generally sound?

The general idea is good, except that all xhtml_to_html does (afaict)
is remove the xml namespace for each element - which we never set to
begin with. What we need is something to convert from ``<br />`` to
``<br>`` and the like. The only way I'm aware of would be to build up
the tree with lxml.html from the start. But that would require either
that everyone has lxml installed or that we maintain 2 versions of the code.
Neither is a practical option.

However, I would love to be proved wrong on that assessment.

As an aside, for a quick fix, you could write a postprocessor which
simply does something like ``text.replace(" />", ">")``. There are a
few edge cases where that won't quite cut it, which is why we don't
offer it as a builtin option, but it may be good enough for many
people's needs.
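
A rough sketch of that quick fix as a plain function (the name
``naive_xhtml_to_html`` is hypothetical; in a real postprocessor this
would presumably be the body of its ``run(text)`` method). The second
call shows one possible edge case, chosen here only as an illustration:
the same characters appearing inside an attribute value.

```python
def naive_xhtml_to_html(text):
    """Turn XHTML-style empty tags such as <br /> into <br>.

    This is a blind string replacement, not a parser, so it also
    rewrites " />" anywhere else it happens to occur.
    """
    return text.replace(" />", ">")


# The common case works as intended.
print(naive_xhtml_to_html('<p>one<br />two</p><hr />'))
# <p>one<br>two</p><hr>

# Illustrative edge case: " />" inside an attribute value is changed too.
print(naive_xhtml_to_html('<img alt="a /> b" src="x.png" />'))
# <img alt="a> b" src="x.png">
```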

-- 
----
Waylan Limberg
wa...@gm...