From: Tom R. <Tom...@po...> - 2016-03-31 22:54:47
summary
=======

Small reST document (linked and attached) has sections with unique names. When I use docutils/restview to convert it to HTML, all but one section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one* section id is "hashed" (i.e., created like a backref), which breaks an explicit internal anchor. Why not create all section IDs in the same way? More importantly, how to fix this (presuming as I do that it's a problem)?

details
=======

background
----------

I frequently generate HTML from reStructuredText, either directly or indirectly. I also frequently do internal linking: i.e., I create explicit links from text in one section of a document to another section. I'm currently working on a reST document which also exhibits the following problem (and have also experienced this previously), which I have "boiled down" to a relatively simple file name=problematic_naming_of_internal_anchors.rst, which I have mounted @

https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/HEAD/files/problematic_naming_of_internal_anchors.rst

That is linked in "raw mode" (i.e., no rendering by Bitbucket) so you should see just the characters in the file, as in a text editor. (If you can't follow the link, note I have also attached the contents of the file to this post, following my .sig.)

Please also note that the following is NOT about how Bitbucket renders reST (though BB reproduces the problem), since BB has its own problems with {section naming, internal anchors} as detailed here:

https://bitbucket.org/site/master/issues/11314/restructuredtext-link-fragments-require

However, presuming this problem is caused by docutils (as detailed below), fixing it would also improve the lives of everyone writing reST for display "in the cloud."

problem
-------

The problem I wish to raise here is exhibited by `restview <https://pypi.python.org/pypi/restview>`_, which I believe renders by just driving docutils. (Specifically, my version of `restview` renders with docutils-0.12, per header in generated HTML.) The document (problematic_naming_of_internal_anchors.rst) has the following section names, all of which are unique:

  for further processing
  integrate
  move
  short-term
  next hardware run
  short-term bodywear
  long-term
  long-term bodywear
  long-term house goods
  lighting

The problem can be illustrated by comparing the section IDs generated for the section names={long-term bodywear, short-term bodywear} and the success of hand-coded links and generated/TOC links to those sections in the text.

1. reST section name='short-term bodywear' generates HTML=

> <div class="section" id="short-term-bodywear">
> <h2><a class="toc-backref" href="#id8">short-term bodywear</a></h2>

Note the form of the div attribute='id': it is the section name with all spaces replaced by dashes, aka 's/ /-/g'. This is as I expect (therefore good :-)

1.1. My hand-coded internal link to that section

> .. |short-term bodywear| replace:: *short-term bodywear*
> .. _short-term bodywear: #short-term-bodywear
>
> *see also* |short-term bodywear|_

works as expected, since the following HTML is generated:

> <p><em>see also</em> <a class="reference external" href="#short-term-bodywear"><em>short-term bodywear</em></a></p>

(I dunno why 'class="reference external"', since this is an internal link, but that's a quibble.)

1.2. The generated TOC link to that section also works as expected:

> <li><a class="reference internal" href="#short-term-bodywear" id="id8">short-term bodywear</a></li>

2. reST section name='long-term bodywear' generates HTML=

> <div class="section" id="id1">
> <h2><a class="toc-backref" href="#id10">long-term bodywear</a></h2>

Note the form of the div attribute='id', which is NOT as I expect. I expect the generated ID to use the same rule (s/ /-/g) as was used to generate the ID from section name='short-term bodywear'; instead the div/section ID is "hashed" by appending a serial number to string='id'.

2.1. This unexpected behavior breaks my hand-coded internal reference to section name='long-term bodywear'

> .. |long-term bodywear| replace:: *long-term bodywear*
> .. _long-term bodywear: #long-term-bodywear
>
> *see also* |long-term bodywear|_

which generates (correctly) the following HTML:

> <p><em>see also</em> <a class="reference external" href="#long-term-bodywear"><em>long-term bodywear</em></a></p>

2.2. However the generated TOC link to that section works by reproducing the unexpected behavior:

> <li><a class="reference internal" href="#id1" id="id10">long-term bodywear</a></li>

solution/questions
------------------

ISTM docutils should _always_

1. for unique section names: generate `div id`s by `s/ /-/g`
2. for duplicate section names (and all backrefs): generate `div id`s by serial numbering, i.e. appending a serial number to string='id'

So my first question is, am I missing something? Is there a reason to *not* behave thusly? If not:

My second question is, is there any reason to believe that docutils is *not* producing the above behavior? If so, please lemme know and I'll put an `issue on restview <https://github.com/mgedmin/restview/issues>`_. If not:

My third question presumes this behavior is due to a problem with docutils: is there anything else I should do to help get this fixed? Do I need to make an issue in a tracker? or do something to further debug the problem? or Something Completely Different?

conclusion/attachment
---------------------

If possible, please reply to me (directly) as well as to the list, and TIA, Tom Roche <Tom...@po...>

-----problematic_naming_of_internal_anchors.rst follows to EOF

===
foo
===

.. contents:: **Table of Contents**

for further processing
======================

integrate
---------

move
----

short-term
==========

next hardware run
-----------------

short-term bodywear
-------------------

.. howto style a link (e.g., make it italic): see http://docutils.sourceforge.net/FAQ.html#is-nested-inline-markup-possible
.. |long-term bodywear| replace:: *long-term bodywear*
.. _long-term bodywear: #long-term-bodywear

*see also* |long-term bodywear|_

long-term
=========

long-term bodywear
------------------

.. |short-term bodywear| replace:: *short-term bodywear*
.. _short-term bodywear: #short-term-bodywear

*see also* |short-term bodywear|_

long-term house goods
---------------------

lighting
~~~~~~~~
From: Marc 'B. R. <ma...@ri...> - 2016-04-01 10:38:24
Attachments:
signature.asc
On 01/04/16 00:54, Tom Roche wrote:
> ISTM docutils should _always_
>
> 1. for unique section names: generate `div id`s by `s/ /-/g`
> 2. for duplicate section names (and all backrefs): generate `div
> id`s by serial numbering, i.e. appending a serial number to
> string='id'
>
> So my first question is, am I missing something? Is there a reason
> to *not* behave thusly? If not:
>
> My second question is, is there any reason to believe that docutils
> is *not* producing the above behavior? If so, please lemme know and I'll
> put an `issue on restview <https://github.com/mgedmin/restview/issues>`_.
> If not:
>
> My third question presumes this behavior is due to a problem with
> docutils: is there anything else I should do to help get this fixed? Do
> I need to make an issue in a tracker? or do something to further debug
> the problem? or Something Completely Different?

I would not rely on the way id attributes are generated at all. That's an implementation detail IMHO and it's also a HTML thing, so this breaks when generating a PDF via LaTeX anyway.

In the case presented you don't even have to do this because the link text matched the heading linked, so you can simply omit the link directives. Those are also responsible for the "reference external" instead of "reference internal" classes on the links.

A workaround would be to move the link directives after the headlines, then docutils sees the headlines first and generates the ID(s) as you expect them to be.

Ciao,
Marc 'BlackJack' Rintsch
--
“Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life.”
-- Terry Pratchett, Jingo
From: Tom R. <Tom...@po...> - 2016-04-01 23:17:55
[footnotes follow .sig]

Tom Roche[1]
>> Small reST document[2] has sections with unique names. When I use docutils/restview to convert it to HTML, all but one section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one* section id is "hashed" (i.e., created like a backref), which breaks an explicit internal anchor.

Marc 'BlackJack' Rintsch[3]
> Here is the problem: you define an external target with the URL "#long-term-bodywear" (which happens to point to the same document) therefore also: "reference-external".

Doh! Thanks for clarifying. I had assumed all links were created in the same way.

> The following works here as expected::

> .. howto style a link (e.g., make it italic): see http://docutils.sourcefor
> .. |long-term bodywear| replace:: *long-term bodywear*
> .. |short-term bodywear| replace:: *short-term bodywear*

> short-term bodywear
> -------------------

> *see also* |long-term bodywear|_

> long-term bodywear
> ------------------

> *see also* |short-term bodywear|_

... and that renders correctly locally via `restview` and remotely via Bitbucket[4].

thanks again, Tom Roche <Tom...@po...>

[1]: https://sourceforge.net/p/docutils/mailman/message/34982519/
[2]: flawed version @ https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/9277baf6d61904d1725c39dae4df8b7550192ebc/files/problematic_naming_of_internal_anchors.rst
[3]: https://sourceforge.net/p/docutils/mailman/message/34984893/
[4]: fixed version @ https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/HEAD/files/problematic_naming_of_internal_anchors.rst
From: Tom R. <Tom...@po...> - 2016-04-01 23:30:53
one more mistake to correct :-(

> Tom Roche[1]
> >> Small reST document[2] has sections with unique names. When I use docutils/restview to convert it to HTML, all but one section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one* section id is "hashed" (i.e., created like a backref), which breaks an explicit internal anchor.

- Marc 'BlackJack' Rintsch[3]
+ Günter Milde[3]

> > Here is the problem: you define an external target with the URL "#long-term-bodywear" (which happens to point to the same document) therefore also: "reference-external".

> Doh! Thanks for clarifying. I had assumed all links were created in the same way.

> > The following works here as expected::

> > .. howto style a link (e.g., make it italic): see http://docutils.sourcefor
> > .. |long-term bodywear| replace:: *long-term bodywear*
> > .. |short-term bodywear| replace:: *short-term bodywear*

> > short-term bodywear
> > -------------------

> > *see also* |long-term bodywear|_

> > long-term bodywear
> > ------------------

> > *see also* |short-term bodywear|_

> ... and that renders correctly locally via `restview` and remotely via Bitbucket[4].

> thanks again, Tom Roche <Tom...@po...>

> [1]: https://sourceforge.net/p/docutils/mailman/message/34982519/
> [2]: flawed version @ https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/9277baf6d61904d1725c39dae4df8b7550192ebc/files/problematic_naming_of_internal_anchors.rst
> [3]: https://sourceforge.net/p/docutils/mailman/message/34984893/
> [4]: fixed version @ https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/HEAD/files/problematic_naming_of_internal_anchors.rst
From: Guenter M. <mi...@us...> - 2016-04-01 16:30:06
On 2016-03-31, Tom Roche wrote:

> summary
>=======

> Small reST document (linked and attached) has sections with unique
> names. When I use docutils/restview to convert it to HTML, all but one
> section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one*
> section id is "hashed" (i.e., created like a backref), which breaks an
> explicit internal anchor. Why not create all section IDs in the same
> way?

...

> The problem can be illustrated by comparing the section IDs generated
> for the section names={long-term bodywear, short-term bodywear} and the
> success of hand-coded links and generated/TOC links to those sections
> in the text.

> 1. reST section name='short-term bodywear' generates HTML=

>> <div class="section" id="short-term-bodywear">
>> <h2><a class="toc-backref" href="#id8">short-term bodywear</a></h2>

> Note the form of the div attribute='id': it is the section name with
> all spaces replaced by dashes, aka 's/ /-/g'. This is as I expect
> (therefore good :-)

...

> 2. reST section name='long-term bodywear' generates HTML=

>> <div class="section" id="id1">
>> <h2><a class="toc-backref" href="#id10">long-term bodywear</a></h2>

> Note the form of the div attribute='id', which is NOT as I expect.

> 2.1. This unexpected behavior breaks my hand-coded internal reference
> to section name='long-term bodywear'

>> .. |long-term bodywear| replace:: *long-term bodywear*
>> .. _long-term bodywear: #long-term-bodywear

>> *see also* |long-term bodywear|_

> which generates (correctly) the following HTML:

>> <p><em>see also</em> <a class="reference external"
>> href="#long-term-bodywear"><em>long-term bodywear</em></a></p>

Here is the problem: you define an external target with the URL "#long-term-bodywear" (which happens to point to the same document) therefore also: "reference-external".

Moreover, you define this external link *before* the equally named section. Now, when Docutils reaches the section header, the name is already "used up" and the standard fallback naming (via id-number) kicks in.

> 2.2. However the generated TOC link to that section works by
> reproducing the unexpected behavior:

>> <li><a class="reference internal" href="#id1" id="id10">long-term bodywear</a></li>

Yes, as is common for name duplication.

> solution/questions
> ------------------

> ISTM docutils should _always_

> 1. for unique section names: generate `div id`s by `s/ /-/g`
> 2. for duplicate section names (and all backrefs): generate `div id`s
> by serial numbering, i.e. appending a serial number to string='id'

> So my first question is, am I missing something? Is there a reason to
> *not* behave thusly?

The point is, this could only be done for "link names", a common namespace for section names and other targets.

The following works here as expected::

  .. howto style a link (e.g., make it italic): see http://docutils.sourcefor
  .. |long-term bodywear| replace:: *long-term bodywear*
  .. |short-term bodywear| replace:: *short-term bodywear*

  short-term bodywear
  -------------------

  *see also* |long-term bodywear|_

  long-term bodywear
  ------------------

  *see also* |short-term bodywear|_

Günter
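The name collision Günter describes can be sketched as a toy model in Python. This is not Docutils code: `slugify`, `IdAllocator`, and `allocate_in_order` are hypothetical names, and the normalization is a simplified, ASCII-only approximation of what Docutils does. The model assumes only what the thread states: section names and explicit targets share one namespace, the first claimant of a name gets the readable ID, and a later claimant falls back to `id` plus a serial number.

```python
import re


def slugify(name):
    """Hypothetical, simplified stand-in for Docutils' ID normalization.

    Lowercases the name, turns every run of characters outside a-z0-9
    into a single hyphen, then trims leading hyphens/digits and
    trailing hyphens.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return re.sub(r"^[-0-9]+|-+$", "", slug)


class IdAllocator:
    """Toy model of a shared ID namespace (an assumption, not Docutils itself)."""

    def __init__(self):
        self.taken = set()   # IDs already handed out
        self.counter = 1     # serial number for fallback IDs

    def allocate(self, name):
        candidate = slugify(name)
        if candidate and candidate not in self.taken:
            # First claimant of this name gets the readable ID.
            self.taken.add(candidate)
            return candidate
        # Name already "used up": fall back to id1, id2, ...
        while True:
            fallback = "id%d" % self.counter
            self.counter += 1
            if fallback not in self.taken:
                self.taken.add(fallback)
                return fallback


def allocate_in_order(items):
    """Allocate IDs for (kind, name) pairs in document order."""
    allocator = IdAllocator()
    return [(kind, allocator.allocate(name)) for kind, name in items]


# Flawed layout: the explicit target appears before the section heading.
flawed = allocate_in_order([
    ("target", "long-term bodywear"),
    ("section", "long-term bodywear"),
])

# Reordered layout: the section heading appears first.
reordered = allocate_in_order([
    ("section", "long-term bodywear"),
    ("target", "long-term bodywear"),
])

print(flawed)
print(reordered)
```

In this model, defining the target first pushes the section to `id1`, which matches the HTML in the original report. Putting the section first lets it keep `long-term-bodywear`, which is the effect of moving the link definitions below the headings. Dropping the explicit targets altogether, as in the example above, avoids the collision entirely.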