[Python-markdown-discuss] Two potentially conflicting custom InlineProcessors

Hi all,

I run a Django site about Chinese literature, and so have some special
processors in the Markdown instance we use to process all blog posts and
longer text fields. We're using markdown 3.1.1. Two of these processors
are fighting with each other.

One of them detects runs of Chinese characters in text, and wraps them
in <span class="char"> tags so they can be targeted for various things.
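
As a rough illustration of what "runs of Chinese characters" means here, the sketch below wraps each run in a span using plain `re`. The pattern is a hypothetical stand-in that covers only the CJK Unified Ideographs block (U+4E00 to U+9FFF); the real chinese_regexp is not shown in this message and may well be broader. `wrap_chinese` is likewise just an illustrative name, not part of the site's code.

```python
import re

# Hypothetical stand-in for chinese_regexp: one or more characters from the
# CJK Unified Ideographs block. The real pattern may cover more ranges.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def wrap_chinese(text):
    """Wrap each run of Chinese characters in <span class="char">."""
    return CHINESE_RUN.sub(
        lambda m: '<span class="char">%s</span>' % m.group(0),
        text,
    )


wrapped = wrap_chinese("Yu Hua (余华) wrote 活着.")
print(wrapped)
```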

The other provides very limited wiki-style links to other objects in our
database. For instance, if you want to create a link to an author's
page, you'd write [a:Ernest Hemingway]. The processor looks that up in
the database and adds an anchor tag with the appropriate href.
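
To make the link syntax concrete, here is a small sketch of how the pattern splits a link into its type letter and lookup string. It uses the same regular expression as the code pasted at the end of this message, but the `KINDS` table maps to plain strings as a stand-in for the Django models, and `parse_links` is a hypothetical helper with no database lookup.

```python
import re

# Same pattern as the LinkPattern registration: a type letter, a colon,
# then everything up to the closing bracket.
LINK_RE = re.compile(r"\[([awcp]):([^]]+)\]")

# Stand-in labels for the models (Author, Publisher, Collection, Work).
KINDS = {"a": "Author", "p": "Publisher", "c": "Collection", "w": "Work"}


def parse_links(text):
    """Return (kind, lookup string) pairs for every wiki-style link."""
    return [(KINDS[m.group(1)], m.group(2)) for m in LINK_RE.finditer(text)]


links = parse_links("See [a:Ernest Hemingway] and [a:余华].")
print(links)
```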

If I try to link to a Chinese author with, e.g., [a:余华], both
processors fire on the same text, and we get weird behavior.

What I'd like to see is:

<a href="/authors/yu-hua/"><span class="char">余华</span></a>

Instead, what I get depends on which processor is registered with the
higher priority (all the relevant code is pasted below). If the
span-char processor runs first, then the link processor receives the
"klzzwxh:0000" placeholder instead of the author name, and thus can't
look up the link. If the link processor runs first, then the span-char
processor is run twice, and the author name is wrapped in two sets of
span tags.

I read somewhere that you could call self.unescape() inside handleMatch()
to get the original text back from the placeholder, but that just fails
on this line:

    stash = self.md.treeprocessors['inline'].stashed_nodes

AttributeError: 'NoneType' object has no attribute 'treeprocessors'

What am I missing? How do I get the behavior I'm after?

Thanks very much,
Eric

----
In the following, "chinese_regexp" is a regular expression matching runs
of Chinese characters, and "find_a_thing" is a general function for
finding things in the database.

import markdown

safe_md = markdown.Markdown(
    safe_mode=False,
    output_format="html5",
    extensions=["def_list"])

class SpanPattern(markdown.inlinepatterns.InlineProcessor):
    def handleMatch(self, m, data):
        # Wrap the matched run of Chinese characters in <span class="char">.
        el = markdown.util.etree.Element('span')
        el.text = m.group(0)
        el.set('class', 'char')
        return el, m.start(0), m.end(0)

safe_md.inlinePatterns.register(
    SpanPattern(chinese_regexp),
    'Chinese',
    50)

class LinkPattern(markdown.inlinepatterns.InlineProcessor):
    def handleMatch(self, m, data):
        # Group 1 is the type letter, group 2 is the lookup string.
        mod = {"a": Author, "p": Publisher, "c": Collection,
               "w": Work}[m.group(1)]
        data_string = m.group(2)
        qs = find_a_thing(mod, data_string)
        # Only link when the lookup is unambiguous.
        inst = qs[0] if len(qs) == 1 else None
        if inst:
            el = markdown.util.etree.Element("a")
            el.text = data_string
            el.set("href", inst.get_absolute_url())
            return el, m.start(0), m.end(0)
        # No unique match: fall back to the bare text.
        return data_string, m.start(0), m.end(0)

safe_md.inlinePatterns.register(
    LinkPattern(r"\[([awcp]):([^]]+)\]"),
    "Dbase Links",
    100)