From: Guenter M. <mi...@us...> - 2023-12-09 09:42:14
Dear Jax,

On 2023-10-29, Jarret "Jax" Renker via Docutils-develop wrote:

> I'm sending you a draft on how to handle literal tab characters inside
> reStructuredText documents.

Thank you for the suggestion and patch set!
I finally found the time to "immerse myself" in the TAB problem.
Sorry for the late answer.

...

> Currently it is impossible for the *output* of a literal block to
> contain tabs. This breaks some listings, e.g. Makefile:

...

> This behaviour is also contrary to my expectation that a literal block
> leaves its content untouched.

This is indeed one of the "warts"¹ in Docutils.

¹ Things that are ugly but kept for "practical reasons".

...

> Currently, the tab character is only supported for included CSV tables
> but not for inline ones.

...

This is a valid complaint, too, but it is weaker than the
"literal-block" one, as it should not be too painful for authors to
switch to a different delimiter for *embedded* CSV values.
(The output uses a different delimiter anyway.)

Now we have to find out whether solving these issues without
introducing a bunch of new problems is possible and worth the effort.

> The problem is that all tab characters get replaced *prior* to parsing,
> namely in string2lines().

> An attempt for a solution
>=========================

> In an attempt to fix this, I toyed around with not replacing tab
> characters before parsing but handling them during parsing. The parts
> concerned with indention parsing are the methods trim_left() and
> get_indented() of StringList. Adjusting them lets things "just
> work(tm)". This is a draft, so some details and corner cases must
> still be sorted out.

The draft demonstrates a possible way forward and solves the issue in
"well behaved" cases. Unfortunately, there is a set of small issues,
some failures, and a general problem. Let's start with the last:

Generic Problem: shifting post-tab text.
----------------------------------------

A TAB expands to 1 … tab-width spaces, depending on its position::

  for i in range(8):
      s = f"{i*'·'}\t{(8-i)*'·'}"
      print((s, s.expandtabs(8)))

Therefore, shortening a line from the left affects the expansion of
the remainder::

  from docutils.statemachine import logical_rslice  # added in patch 4/9

  sample = 3 * '·\t·'
  print(f'{sample=}')
  print('i sample shortened by i and expanded')
  print('- ----------------------------------')
  for i in range(8):
      print(i, i*' ', repr(logical_rslice(sample, i, 8).expandtabs(8)))
  print()

This may break documents, e.g. if a table is aligned with TABs:

  table::

    ========== ==
    Mueller\t3
    Maier\t1
    Maierhuber\t5
    ========== ==

(Mind that in rST, you may nest a literal block in a table cell!)

Workaround:
  Require that the block indentation is a multiple of "tab-width" if
  there are literal TABs in an indented text block.

  `docutils.statemachine.logical_rslice()` may issue a warning and
  expand the remainder before returning if there are TABs in it.

...

> Obviously we should somehow use the docutils.conf option tab_width.

Indeed.

Unfortunately, I am not familiar with the parser either, and its author
David Goodger is less active on docutils-develop nowadays.

Test failure
------------

The test suite (alltests.py) fails after applying the patches.
The failing test is in
test/test_parsers/test_rst/test_directives/test_include.py
(custom TAB expansion with included code), so this may be an
aftereffect of not observing the "tab-width" setting.

I found some more problematic cases. Some of them relate to the
`generic problem` above, but there seems to be at least one more
problem with nested parsing...

~~~
from docutils.core import publish_parts

sample = """
This is a simple test

  a block
    with definition list (space indented)

  a block
\t with definition list (TAB + space indented)

\ta block
\t\twith definition list (TAB indented)

 A block
 \twith definition list (space + TAB indented)

.. note::
\tsomething
\t\tis up

a literal block::

\t less\tindented first line
\t\tdo we want leading spaces here?

A table using TABs for alignment

========== ==
Mueller\t\t3
Maier\t\t1
Maierhuber\t5
========== ==

A similar table in a block-quote

  ========== ==
  Mueller\t3
  Maier\t1
  Maierhuber\t5
  ========== ==
"""

print(publish_parts(sample)['whole'])
~~~

Smaller Problems
----------------

:1/9: parsers.rst.Parser.parse() and statemachine.string2lines() are
      part of the API: changes must be backwards compatible or
      announced in the RELEASE-NOTES 2 releases in advance.

      → e.g., tab_width < 0 → don't replace

:2/9: ditto.

      → Test for old and new behaviour.

:3/9: → We need more test cases for "mixed" indentation and different
      `tab-width`. (maybe later)

:4/9: term "logical": are there analogues, precedents?

      suggestion: logical → expanded | tab_expanded | ...?

      logical_rslice(): "rslice()" was proposed as a reverse variant
      of the "slice()" standard function.

      → shorten(s, by, tab_width) | lshorten(s, by, tab_width)

sincerely,

Günter
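
PS: The "shifting post-tab text" problem and the proposed workaround
can also be tried without the patch set. The following is only a
minimal sketch using plain `str.expandtabs()`. The helpers
`dedent_expanded()`, `dedent_raw()` and `shifts()` are hypothetical
names made up for this illustration. They are not part of Docutils and
are not the patch's `logical_rslice()`.

```python
def dedent_expanded(line, by, tab_width=8):
    """Expand all TABs first, then drop `by` columns from the left."""
    return line.expandtabs(tab_width)[by:]


def dedent_raw(line, by, tab_width=8):
    """Drop `by` columns of leading whitespace, keep later TABs literal.

    Hypothetical helper for illustration only.
    """
    col = 0  # column reached so far
    i = 0    # index into `line`
    while col < by and i < len(line):
        ch = line[i]
        if ch == '\t':
            col += tab_width - col % tab_width  # jump to next tab stop
        elif ch == ' ':
            col += 1
        else:
            raise ValueError('line is indented by less than `by` columns')
        i += 1
    # A leading TAB may overshoot `by`: give back the surplus as spaces.
    return ' ' * (col - by) + line[i:]


def shifts(line, by, tab_width=8):
    """Return True if keeping TABs literal changes the rendered layout."""
    raw = dedent_raw(line, by, tab_width).expandtabs(tab_width)
    return raw != dedent_expanded(line, by, tab_width)


# Indentation of 4 (not a multiple of tab-width 8): the "1" moves
# from column 12 to column 8 once the indentation is removed.
print(shifts('    Maier\t1', 4))       # True
# Indentation of 8 (a multiple of tab-width): the layout is preserved.
print(shifts(8 * ' ' + 'Maier\t1', 8))  # False
```

With an indentation that is a multiple of the tab width, the remainder
starts at a tab stop again, so its TABs expand exactly as they would at
column 0. This is why the workaround restricts the block indentation.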