From: Kirill L. <ki...@la...> - 2003-07-17 04:34:45
David,

I finally found some time to put together a simple ad hoc preprocessor
and played with it for a while.  I was also trying to wrap my mind
around TeX (for some other project).  The more I thought about text
processing, looked through the TeX docs, made a second pass through
the rst docs, and went over your comments in this discussion, the
better I understood your position.  I have to admit you were right
from the very beginning.  Sorry for making all that noise.

You are right, the problem is hard to formalize -- different users
will have different needs and requirements -- so the best we can do is
give the user an instrument to express such automatic conversions,
probably together with the most common sets of rules.

If we think about possible sets of rules, the most problematic area is
quote conversion.  It looks like there are 5 types of quotes:

1. Double (``foo'') (TeX: ``foo'')
2. Single (`foo') (TeX: `foo')
3. German (,,foo'') (TeX: "`foo"')
4. French (<<foo>>) (TeX: "<foo">, \og foo \fg{})
5. Single angle (<foo>) (TeX: \flq foo \frq)

Interestingly, Germans use French quotes too, but >>the other way
round<<.

The ultimate solution would be to come up with some easy ways to
explicitly enter all the different quotes (akin to TeX).  Coupled with
optional automatic conversion of regular quotes to one of the above
styles, it would make a sound solution.  However, I don't see 5 good
syntaxes for quotes, given that backquotes are already heavily
overloaded, even if we consider that single angle quotes are not
common and don't really need a short form.

What are the possibilities?  <<foo>>, "{foo"}, "<foo">, "-foo-",
"'foo'", "''foo''", "@foo@", "$foo$", "#foo#"?  Most of these are
ugly, others might be ambiguous.  The best I see so far is

"''double''", "'single'", "<french">, "{german"}

Or, alternatively, we can just pick two styles and provide different
sets of rules which will interpret them differently.
For instance, pick "<this"> and "{that"}; for the English ruleset one
will be double and the other single, while for the French ruleset one
will be French quotes and the other German.

> Please familiarize yourself with it more.  Before suggesting changes
> or additions to a system, shouldn't you understand the system well?  I
> suggest you read everything in <http://docutils.sf.net/spec/rst/>,
> skim <http://docutils.sf.net/spec/notes.html>, and go through the
> archives of the mailing lists (at least the threads referred to
> earlier).

Well, I did read all the documentation prior to suggesting.  But I
certainly missed lots of things.  Moreover, I have not used rst much.
It is a kind of catch-22 situation for me.  From what I have learnt so
far I like rst a lot, and I certainly want to use it.  I want to
replace a homegrown text-to-HTML converter.  But that converter
performs quoting automatically.  So replacing it will reduce the
quality of the HTML pages, hence I can't do that, hence I don't have a
chance to use rst.  In order to solve the problem I am volunteering to
help with adding this feature to rst.

> Backslashes are used to escape markup.  They could also escape text
> that is not to be converted.  It's tricky though, and would have to be
> accounted for in the pattern matching (regexps).

Right, it should not be hard.

> The only addition that I can foresee having any chance of being
> acceptable would be a data-driven text conversion mechanism, using
> regular expressions *at most* [1]_.  No text conversions should be
> built-in, for the same reason that the character entity definitions at
> <http://docutils.sf.net/tmp/charents/> are not built-in:
> extensibility, flexibility, and responsibility.
>
> Extensibility: Users can add their own text conversions without
> programming.  Arrows ("-->"), smileys (";-)") [2]_, sky's the limit.
>
> Flexibility: User A and user B can choose different sets of
> conversions, or no conversions at all.
>
> Responsibility (or "plausible deniability" ;-): It's impossible to
> construct text conversions with 100% accuracy.  There *will* be
> errors.  Any such errors must not be the responsibility of the tool,
> Docutils.  If the conversions are explicitly installed in a document
> then the responsibility lies with the author or user who did the
> installation.  And they'll have the power to uninstall.

Ok, I agree.

What is your current position on this functionality?  What are the
chances of adding it to docutils?  If you don't want to adopt it (or
not any time soon), I'll continue the preprocessor quest so that I can
use it later in my project.  Otherwise I'll try to patch the parser
instead.

> .. [1] Perhaps also allowing %-interpolations from a set of built-in
>    useful pattern fragments, like "%(start_string_prefix)s" (from
>    docutils.parsers.rst.states.Inliner.start_string_prefix).

This might be handy.

> .. [2] The definition directive ("text-replace") should either be
>    powerful enough to allow graphics as replacements, or the
>    conversions should be performed before substitution references.

What is the use case for this one?  When would regular inline
substitution not be sufficient?

--Kirill
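
For illustration only, the two-marker idea from this message (pick
"<this"> and "{that"} and let each rule set interpret them differently)
might be sketched in Python roughly as follows.  The names RULESETS and
convert are hypothetical, the regular expressions are just one possible
encoding of the rules, and none of this is existing Docutils
functionality.  A backslash before the opening marker is assumed to
mean "do not convert".

```python
import re

# Hypothetical, data-driven rule sets: each entry is (regexp, replacement).
# The negative lookbehind (?<!\\) skips markers escaped with a backslash.
RULESETS = {
    "english": [
        (r'(?<!\\)"<(.*?)">', "\u201c\\1\u201d"),    # "<foo"> -> double quotes
        (r'(?<!\\)"\{(.*?)"\}', "\u2018\\1\u2019"),  # "{foo"} -> single quotes
    ],
    "french": [
        (r'(?<!\\)"<(.*?)">', "\u00ab\\1\u00bb"),    # "<foo"> -> guillemets
        (r'(?<!\\)"\{(.*?)"\}', "\u201e\\1\u201c"),  # "{foo"} -> German quotes
    ],
}


def convert(text, ruleset):
    """Apply every (regexp, replacement) pair of the chosen rule set in order."""
    for pattern, replacement in RULESETS[ruleset]:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    sample = 'She said "<hello"> and then "{goodbye"}.'
    print(convert(sample, "english"))
    print(convert(sample, "french"))
    # Escaped marker: left untouched (the backslash is kept as-is here).
    print(convert('\\"<not converted">', "english"))
```

Because the rules are plain data, a user could swap in a different
list, or an empty one, without touching the conversion function.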