Re: [Python-markdown-discuss] XHTML or HTML?


On Sat, Jan 10, 2009 at 11:27 PM, Waylan Limberg <wa...@gm...> wrote:

> On Sat, Jan 10, 2009 at 7:32 AM, Eric Abrahamsen <gi...@gm...> wrote:
> >> We had talked at one point of having markdown import
> >> lxml rather than ElementTree if it was available. I don't remember why
> >> we decided not to. The list archives would answer that. However, if
> >> you could provide a patch that works - I'll likely commit it.
> >
> > Looking through the mailing list archives, it looks like things stopped at
> > "yes, that would be a good idea". As far as I can tell there wasn't any
> > further action. I tried adding lxml to the import cascade in
> > etree_loader.py, and it imports okay, but fails tests (I can provide more
> > details if necessary).
>
> Yeah, I seem to recall there being some differences between the two. I
> wasn't the one who wrote that code, so the details aren't as clear to
> me. Perhaps the final decision was made by the core devs off list.

As far as I remember, there were problems with several tests and no great
performance boost compared with cElementTree (only 4%), so we decided
to go with the standard cElementTree/ElementTree.
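
For anyone who wants to repeat Eric's experiment, here is a minimal sketch of
the kind of import cascade being discussed. It is not the actual contents of
etree_loader.py; the function name load_etree and the order of candidates are
assumptions made for illustration.

```python
import importlib


def load_etree():
    """Return the first ElementTree-compatible module that can be imported."""
    candidates = (
        "lxml.etree",              # optional third-party package
        "xml.etree.cElementTree",  # C accelerator on older Pythons
        "xml.etree.ElementTree",   # standard-library fallback
    )
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (or not present in this Python); try the next one.
            continue
    raise ImportError("no ElementTree implementation found")


etree = load_etree()
```

Putting lxml first only makes sense if the test suite passes with it, which
was not the case when we tried.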

Eric, if you want HTML output, you can use ElementTree 1.3 [1]:

    tree.write("out.html", method="html")

[1]: http://effbot.org/zone/elementtree-13-intro.htm
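
To show what the method argument changes, here is a small sketch that
serializes the same tree both ways. It assumes an ElementTree that accepts
method="html" (the xml.etree.ElementTree shipped with current Python versions
does), and uses tostring() instead of write() so no file is needed. The
element content is made up for the example.

```python
import xml.etree.ElementTree as ET

# Build <p>line one<br/>line two</p> by hand.
root = ET.Element("p")
root.text = "line one"
br = ET.SubElement(root, "br")
br.tail = "line two"

# XML serialization keeps the self-closing form of the empty element.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# HTML serialization writes void elements such as <br> without a close.
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)   # <p>line one<br />line two</p>
print(as_html)  # <p>line one<br>line two</p>
```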

>
>
> > Then I realized that the lxml etree implementation is
> > essentially unrelated to the lxml.html.xhtml_to_html function. Before
> > messing with things further, I want to make sure this is the right way: I'm
> > thinking of adding an "html" keyword argument to the Markdown class
> > definition. If it's set to true we try to import lxml.html.xhtml_to_html. If
> > that fails we log a warning and then ignore it. If it succeeds, run
> > xhtml_to_html right after the treeprocessors (I guess, would have to
> > experiment). Does this seem generally sound?
>
> The general idea is good, except that all xhtml_to_html does (afaict)
> is remove the xml namespace for each element - which we never set to
> begin with. What we need is something to convert from ``<br />`` to
> ``<br>`` and the like. The only way I'm aware of would be to build up
> the tree with lxml.html from the start. But that would require
> everyone have lxml installed or we maintain 2 versions of the code.
> Neither is a practical option.
>
> However, I would love to be proved wrong on that assessment.
>
> As an aside, for a quick-fix, you could write a postprocessor which
> simply does something like ``text.replace(" />", ">")``. However,
> there are a few edge cases where that won't quite cut it. Therefore,
> we don't offer it as a builtin option. However, it may be good enough
> for many people's needs.
>
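
A rough sketch of that quick fix as a plain function, together with one input
where it goes wrong. The function name strip_self_closing is hypothetical,
and wiring it into Markdown as a postprocessor is left out here.

```python
def strip_self_closing(text):
    """Naively turn XHTML-style ' />' endings into HTML-style '>'."""
    return text.replace(" />", ">")


# The common case works as intended.
simple = strip_self_closing("<p>one<br />two</p>")
print(simple)  # <p>one<br>two</p>

# Edge case: the same three characters inside an attribute value (for
# example in raw HTML passed through untouched) are rewritten as well.
broken = strip_self_closing('<img alt="x /> y" src="a.png" />')
print(broken)  # <img alt="x> y" src="a.png">
```

The second call changes the alt text, which is the sort of edge case that
keeps this from being a safe built-in option.
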
> --
> ----
> Waylan Limberg
> wa...@gm...
>
>