From: Waylan L. <wa...@gm...> - 2008-08-27 22:16:43
I haven't added to this discussion yet as I wasn't sure what position to take. Here are my thoughts, observations and an almost working solution:

Everyone seems to be going back and forth on which random string generator is better. Personally, I'm wondering what all the fuss is about. What we want is a unique string that identifies itself as a placeholder for a specific item in a stash. We have 2 stashes (rawHtml and inline), so we also need to identify which stash. The thing is, the "start" and "end" chars give us the uniqueness that identifies the string as a placeholder. If we only had one stash, all we would need is the id number.

So the question, then, is how do we identify which stash this placeholder is for? Currently, each stash's placeholder contains either the string "inline" or "html" (there are currently a couple of other subtle differences, but they're easily removable). Now, as the current wikilink bug demonstrates, using actual real words that could legitimately appear in the document, and perhaps even have patterns matching against them, causes problems. So, we need 2 strings that will never (or at least are very unlikely to) be matched by any other pattern.

The popular solutions in this discussion thus far seem to have a string of random chars generated at import time. Depending on the generation method used, there will be a 1-in-x chance of a collision with a real, valid string. Obviously, the higher x is, the better - or so it seems. Suppose I am serving a document via a cgi script, which will cause an import and a new, different random string on each page view. On average, I only have about x page views before there will be a collision. Whoops! Now try debugging that!

Therefore, I propose that we select 2 strings of random chars (using whatever method you desire) and **hardcode** those 2 strings into markdown.py. That way, on each import (each page view in the above scenario) the placeholder strings will be the same and debugging will be consistent.
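To make the proposal concrete, here is a minimal sketch of what hardcoded stash tags could look like. Everything named here is an assumption for illustration only: the `START`/`END` control characters, the `make_tag` and `placeholder` helpers, the two tag values and the `tag:id` layout are not taken from markdown.py.

```python
import random
import string

# Hypothetical "start" and "end" chars that mark a string as a placeholder.
START = "\u0002"
END = "\u0003"


def make_tag(n=8, seed=None):
    """Pick n random lowercase letters.

    Meant to be run once, by hand; the result is then pasted into the
    source as a constant rather than regenerated at import time.
    """
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))


# Hardcoded results of two earlier make_tag() runs (illustrative values,
# checked by eye to contain no real words or abbreviations).
INLINE_TAG = "qzkvjxwp"
HTML_TAG = "wvxqkzjg"


def placeholder(tag, item_id):
    """Build the placeholder for item number item_id in the stash named by tag."""
    return "%s%s:%04d%s" % (START, tag, item_id, END)
```

Because the tags are constants, `placeholder(INLINE_TAG, 3)` yields the same string on every import, which is the property that keeps debugging consistent. Using `random.choice` also sidesteps the off-by-one risk of indexing the alphabet with `random.randint(1, 26)`, whose upper bound is inclusive.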
What we really want is a string that will never be matched by another inline pattern's regex. We just need a string of all same-case chars between a-z of length n. As long as it does not contain any known words or abbreviations, it works for me. Additionally, if the string is consistent, that makes it easier for an extension author to write regexes for inline patterns that will not match the string in the placeholder.

I have committed an *almost* working branch [1] that has everything except the random strings (it still uses "inline" & "html"). I say "almost working" because the output includes a lot of extra, unnecessary whitespace. The problem is not creating the placeholder, but replacing the placeholder with the real content later - at least the way Artem's code works. Based upon the docstrings, I determined that I needed to refactor ``InlineStash.extractId``, which I did. However, it seems that Artem's code is jumping through an awful lot of hoops, and I haven't fully grokked what ``Markdown._processPlaceholders`` is doing when it calls ``InlineStash.extractId``. Wouldn't it be better if we simply used the indexes ``m.start()`` and ``m.end()`` from a regex match rather than the string manipulation hoops it's doing now? Once we get that worked out, I'll replace the strings "inline" & "html" with something more random.

Here's the output of a few simple tests:

    >>> markdown.markdown('foo *bar* baz')
    u'<p>foo <em>bar</em> baz</p>'
    >>> markdown.markdown('foo *bar __blah__* baz')
    u'<p>foo <em>bar <strong>blah</strong>\n </em> baz</p>'

What's up with the newline and space between the closing tags ``</strong>`` & ``</em>``? Is that from the (IMO unnecessary) ``IndentTree`` function or something in ``Markdown._processPlaceholders``? I'm not sure. Any thoughts?
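The regex-index idea can be sketched roughly as follows. This is not how ``Markdown._processPlaceholders`` or ``InlineStash.extractId`` work; the delimiters, the tag value, the `tag:id` layout and the `split_on_placeholders` helper are all hypothetical, and the stash is modelled as a plain list.

```python
import re

# Hypothetical delimiters and stash tag (illustrative values only).
START, END = "\u0002", "\u0003"
INLINE_TAG = "qzkvjxwp"

# Matches one placeholder and captures the stash index as group 1.
PLACEHOLDER_RE = re.compile(
    re.escape(START + INLINE_TAG) + r":(\d{4})" + re.escape(END)
)


def split_on_placeholders(text, stash):
    """Split text into plain strings and stashed items, in document order.

    The slicing relies only on m.start() and m.end() of each match,
    so no manual string searching is needed.
    """
    pieces = []
    pos = 0
    for m in PLACEHOLDER_RE.finditer(text):
        if m.start() > pos:
            pieces.append(text[pos:m.start()])   # text before the placeholder
        pieces.append(stash[int(m.group(1))])    # the stashed item itself
        pos = m.end()
    if pos < len(text):
        pieces.append(text[pos:])                # trailing text
    return pieces
```

With `stash = ["<em>bar</em>"]` and the text `"foo " + START + INLINE_TAG + ":0000" + END + " baz"`, the function returns the three pieces `"foo "`, `"<em>bar</em>"` and `" baz"`, with no whitespace added or removed along the way.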
[1]: http://gitorious.org/projects/python-markdown/repos/mainline/commits/ab57ff93b5b2750c082c87072ced774881190744

On Tue, Aug 26, 2008 at 9:22 AM, David Wolever <wo...@cs...> wrote:
> On 25-Aug-08, at 7:55 PM, Artem Yunusov wrote:
>> Yes, I agree, it's not necessary here. But thanks David, I didn't know
>> about it before, and used to use md5 for such things.
> Ah, well, I'm glad I could be helpful anyway :)
>
> And, re: more readable:
>> "abcdefghijklmnopqrstuvwxyz"[random.randint(1,26)]
> I agree, that's pretty nice :)

--
----
Waylan Limberg
wa...@gm...