From: Jarret "Jax" R. <jar...@pr...> - 2023-10-29 07:11:07
|
Dear docutils developers

I'm sending you a draft on how to handle literal tab characters inside
reStructuredText documents.

The Problem
===========

Maybe we start with two examples. From here on I'll denote the tab character
with its escape sequence ``\t`` (in case it gets replaced by spaces).

1. tabs in Makefile listings/literal blocks
-------------------------------------------

Currently it is impossible for the *output* of a literal block to contain tabs.
This breaks some listings, e.g. Makefile:

>This is a Makefile::
>
>   all:
>   \techo Hello

This gets rendered as something like

>This is a Makefile:
><pre>
>all:
>     echo Hello
></pre>

This is not a valid Makefile, as the rules require a tab and not 5 spaces.

This behaviour is also contrary to my expectation that a literal block leaves
its content untouched.

2. CSV with tabs
----------------

A common delimiter for a CSV-table (or should I say a CTV-table?) is the tab
character. Currently, the tab character is only supported for included CSV
tables but not for inline ones. The following does not work:

>.. csv-table:: good delimiter
>   :delim: tab
>
>   some\tcsv\tdata

The problem is that all tab characters get replaced *prior* to parsing, namely
in string2lines().

An attempt for a solution
=========================

In an attempt to fix this, I toyed around with not replacing tab characters
before parsing but handling them during parsing. The parts concerned with
indentation parsing are the methods trim_left() and get_indented() of
StringList. Adjusting them lets things "just work(tm)". This is a draft, so
some details and corner cases must still be sorted out.

The above-mentioned functions need the `tab_width` (currently at the default
value of 8). Since I don't have deep insight into the architecture of the
parser, what would be the best way to pass that information to their consumer?

Obviously we should somehow use the docutils.conf option tab_width.

Here is some rambling of what I (don't) know:

* The StringList methods are called by their StateMachineWS counterparts,
  which also do not have access to that information. However, we could
  make it a property of StateMachineWS.

* The StateMachineWS subclasses RSTStateMachine and NestedStateMachine do have
  access to the document and hence to the tab_width setting.

* We could also put the information into the state. In case we ever want
  to change the tab_width during parsing, e.g. inside a directive, this
  might be the better place?

  However, I don't know how long-lived/isolated these states are; changing
  the tab_width inside a directive should affect neither the next directive
  nor the rest of the document.

Regards
Jax
|
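The Makefile case above can be reproduced with a few lines of plain Python. This is only a sketch: it assumes that the pre-parsing step behaves roughly like calling `str.expandtabs()` on every input line, and the helpers `expand_before_parsing()` and `strip_block_indent()` are hypothetical stand-ins, not Docutils functions.

```python
def expand_before_parsing(text, tab_width=8):
    """Hypothetical stand-in for the pre-parsing step: expand every TAB."""
    return [line.expandtabs(tab_width).rstrip() for line in text.splitlines()]


def strip_block_indent(lines):
    """Remove the common leading-space indentation of an indented block."""
    indents = [len(l) - len(l.lstrip(' ')) for l in lines if l.strip()]
    common = min(indents) if indents else 0
    return [l[common:] for l in lines]


# Literal block body from the example: 3 spaces of block indentation,
# then a TAB in front of the recipe line.
source = "   all:\n   \techo Hello"

block = strip_block_indent(expand_before_parsing(source))
print(block)  # the recipe line now starts with spaces, the TAB is gone
```

With a tab width of 8 the TAB ends at column 8, so removing the 3 columns of block indentation leaves the 5 spaces mentioned in the message.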
From: engelbert g. <eng...@gm...> - 2023-10-31 13:50:04
|
Hello Jarret,

thinking about tabs

* Python discourages mixing hard tabs and spaces, because it becomes painful (IMHO)
* Python philosophy: explicit is better than implicit ... would an option
  :hard-tabs: keep be more in line?

cheers
|
From: Jarret "Jax" R. <jar...@pr...> - 2023-10-31 18:12:58
|
Hi,

------- Original Message -------
On Tuesday, October 31st, 2023 at 1:49 PM, engelbert gruber <eng...@gm...> wrote:

> thinking about tabs
>
> * python discourages mixing hard tabs and spaces, because it becomes painful (IMHO)

Sure, but other languages, e.g. Makefile, mandate tabs and so do some style guides.

> * python philosophy: explicit is better than implicit ... would an option
>   :hard-tabs: keep be more in line?

I'm fine with that. I planned on (re)using the tab_width option of
docutils.conf (setting it to -1 preserves tabs, like its cousin does for the
include directive), but introducing a new option is also ok.

Regards, Jax
|
From: engelbert g. <eng...@gm...> - 2023-10-31 18:23:34
|
mind you ... this is a discussion
|
From: Guenter M. <mi...@us...> - 2023-11-14 08:50:26
|
Dear Jarret, dear Docutils developers,

thank you for the TAB patch set. A review/test is on my TODO list.

On 2023-10-31, Jarret "Jax" Renker via Docutils-develop wrote:
> On Tuesday, October 31st, 2023 at 1:49 PM, engelbert gruber <eng...@gm...> wrote:

>> * python discourages mixing hard tabs and spaces, because it becomes
>>   painful (IMHO)

> Sure, but other languages, e.g. Makefile, mandate tabs and so do some
> style guides.

I agree that there is a case for preserving TAB characters in literal blocks
and code blocks (as an extension of what is already possible with included
files).

>> * python philosophy: explicit is better than implicit ... would an
>>   option :hard-tabs: keep be more in line?

> I'm fine with that. I planned (re)using the tab_width option of
> docutils.conf (setting it to -1 preserves tabs, like its cousin does
> for the include directive) but introducing a new option is also ok.

The "tab-width" setting is also needed to determine the "visible indentation"
of lines that start between tab stops:

"""\
* bullet list

\tThe nesting of this paragraph depends on the ``tab-width`` setting:
\tSecond paragraph of the first item with ``--tab-width 2``,
\tblock quote nested in first item with ``--tab-width`` > 2, but
\tblock quote following the list with ``--tab-width 1``.
"""

So we have the choice of

a) tab-width: <positive integer>
   expand-tabs: <boolean>

b) tab-width: <integer>

   Negative values preserve TAB characters in "code" and "literal" blocks.
   Other TABs are expanded to abs(tab_width).

I prefer b), because it does not increase the number of settings and is
compatible with the "include" directive's ``:tab-width:`` option.

Günter
|
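Both points of this message can be illustrated in plain Python. The sketch below is an assumption-laden illustration, not Docutils code: `visible_indent()` and `convert_tabs()` are hypothetical names, and `convert_tabs()` only mirrors the wording of option b).

```python
def visible_indent(line, tab_width):
    """Column at which the first non-blank character of `line` appears."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(' '))


def convert_tabs(line, tab_width, literal=False):
    """Option b): a negative tab_width keeps TABs in literal/code blocks."""
    if tab_width < 0 and literal:
        return line                         # preserve TAB characters
    return line.expandtabs(abs(tab_width))  # all other TABs are expanded


line = "\tThe nesting of this paragraph depends on the tab-width setting"

# The text of the bullet item "* bullet list" starts at column 2, so the
# TAB-indented line lands left of, at, or right of that column depending
# on the tab width.
indents = {width: visible_indent(line, width) for width in (1, 2, 8)}
print(indents)

print(repr(convert_tabs("\techo Hello", -8, literal=True)))  # TAB preserved
print(repr(convert_tabs("\techo Hello", -8)))                # expanded to abs(-8)
```

A single integer setting is enough to express both the tab stop distance and the "preserve" switch, which is the argument made for option b).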
From: Guenter M. <mi...@us...> - 2023-12-09 09:42:14
|
Dear Jax,

On 2023-10-29, Jarret "Jax" Renker via Docutils-develop wrote:

> I'm sending you a draft on how to handle literal tab characters inside
> reStructuredText documents.

Thank you for the suggestion and patch set!
I finally found the time to "immerse myself" into the TAB problem.
Sorry for the late answer.

...

> Currently it is impossible for the *output* of a literal block to
> contain tabs. This breaks some listings, e.g. Makefile:

...

> This behaviour is also contrary to my expectation that a literal block
> leaves its content untouched.

This is indeed one of the "warts"¹ in Docutils.

¹ Things that are ugly but kept for "practical reasons".

...

> Currently, the tab character is only supported for included CSV tables
> but not for inline ones.

...

This is a valid complaint, too, but it is less strong than the
"literal-block" one, as it should not be too much pain for authors to
change to a different delimiter for *embedded* CSV values.
(The output uses a different delimiter anyway.)

Now we have to find out whether solving these issues without introducing a
bunch of new problems is possible and worth the effort.

> The problem is that all tab characters get replaced *prior* to parsing,
> namely in string2lines().

> An attempt for a solution
>=========================

> In an attempt to fix this, I toyed around with not replacing tab
> characters before parsing but handling them during parsing. The parts
> concerned with indentation parsing are the methods trim_left() and
> get_indented() of StringList. Adjusting them lets things "just
> work(tm)". This is a draft, so some details and corner cases must
> still be sorted out.

The draft demonstrates a possible way forward and solves the issue in
"well behaved" cases.
Unfortunately, there is a set of small issues, some failures, and a general
problem. Let's start with the last:

Generic Problem: shifting post-tab text.
----------------------------------------

TAB expands to 1 … tab-width spaces, depending on position::

  for i in range(8):
      s = f"{i*'·'}\t{(8-i)*'·'}"
      print((s, s.expandtabs(8)))

Therefore, shortening from the left affects the expansion of the remainder::

  from docutils.statemachine import logical_rslice  # added in patch 4/9

  sample = 3 * '·\t·'
  print(f'{sample=}')
  print('i sample shortened by i and expanded')
  print('- ----------------------------------')
  for i in range(8):
      print(i, i*' ', repr(logical_rslice(sample, i, 8).expandtabs(8)))
  print()

This may break documents, e.g. if a table is aligned with TABs:

  table::

    ========== ==
    Mueller\t3
    Maier\t1
    Maierhuber\t5
    ========== ==

(Mind that in rST, you may nest a literal block in a table cell!)

Workaround:
  Request that block-indentation must be a multiple of "tab-width"
  if there are literal TABs in an indented text block.

  `docutils.statemachine.logical_rslice()` may issue a Warning and expand
  the remainder before returning if there are TABs in it.

...

> Obviously we should somehow use the docutils.conf option tab_width.

Indeed. Unfortunately, I am not familiar with the parser either, and its
author David Goodger is less active on docutils-develop nowadays.

Test failure
------------

The test suite (alltests.py) fails after applying the patches.
The failing test is in
test/test_parsers/test_rst/test_directives/test_include.py
(custom TAB expansion with included code), so this may be an aftereffect of
not observing the "tab-width" setting.

I found some more problematic cases. Some of them relate to the
`generic problem` above, but there seems to be at least one more problem
with nested parsing...

~~~
from docutils.core import publish_parts

sample = """
This is a simple test

a block
  with definition list (space indented)

a block
\t with definition list (TAB + space indented)

\ta block
\t\twith definition list (TAB indented)

A block
 \twith definition list (space + TAB indented)

.. note::
\tsomething
\t\tis up

a literal block::

\t less\tindented first line
\t\tdo we want leading spaces here?

A table using TABs for alignment

========== ==
Mueller\t\t3
Maier\t\t1
Maierhuber\t5
========== ==

A similar table in a block-quote

  ========== ==
  Mueller\t3
  Maier\t1
  Maierhuber\t5
  ========== ==
"""

print(publish_parts(sample)['whole'])
~~~

Smaller Problems
----------------

:1/9: parsers.rst.Parser.parse() and statemachine.string2lines() are part
      of the API: changes must be backwards compatible or announced in the
      RELEASE-NOTES 2 releases in advance.

      → e.g., tab_width < 0 → don't replace

:2/9: ditto.

      → Test for old and new behaviour.

:3/9: → We need more test cases for "mixed" indentation and different
      `tab-width`. (maybe later)

:4/9: term "logical": are there analogues, precedents?

      suggestion: logical → expanded | tab_expanded | ...?

      logical_rslice(): "rslice()" was proposed as a reverse variant of the
      "slice()" standard function.

      → shorten(s, by, tab_width) | lshorten(s, by, tab_width)

sincerely,

Günter
|
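The workaround from this message (keep TABs only when the removed indentation is a multiple of the tab width, otherwise warn and expand) can be sketched in plain Python. The function below borrows the suggested name `shorten(s, by, tab_width)`, but its body is an assumption made for illustration; it is not the `logical_rslice()` from the patch set.

```python
import warnings


def shorten(s, by, tab_width=8):
    """Drop `by` columns from the left of `s`, counting TABs as expanded.

    Hypothetical helper: TABs in the remainder are kept only if `by` is a
    multiple of `tab_width`. Otherwise the line is expanded first, so the
    visible layout of the remainder does not shift.
    """
    if '\t' not in s:
        return s[by:]
    if by % tab_width == 0:
        # Walk the original string until `by` columns are consumed. A TAB
        # always ends on a tab stop, so it cannot jump across column `by`.
        column = 0
        for index, char in enumerate(s):
            if column >= by:
                return s[index:]
            if char == '\t':
                column += tab_width - column % tab_width
            else:
                column += 1
        return ''
    warnings.warn('indentation is not a multiple of tab_width; expanding TABs')
    return s.expandtabs(tab_width)[by:]


print(repr(shorten('\t\techo Hello', 8)))   # one TAB removed, one TAB kept
print(repr(shorten('   \techo Hello', 3)))  # warns; remainder uses spaces
```

In this sketch, expanding the returned string always gives the same text as expanding the original line and then cutting `by` columns, which is the property that the "shifting post-tab text" problem violates.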