
#51 Some bare URLs with multiple underscores must be escaped.

Milestone: Default
Status: open
Owner: nobody
Labels: None
Priority: 4
Updated: 2016-07-26
Created: 2016-03-03
Private: No

When generating standalone hyperlinks, some bare URLs with multiple underscores are handled just fine by rst2html, such as: http://www.huffingtonpost.com/2014/01/04/boiling-water-extreme-cold-water-gun-ice-crystals_n_4538522.html

However, some URLs with multiple underscores must be escaped, such as: http://en.wikipedia.org/wiki/Wavefront_.obj_file
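
Why one URL works and the other does not can be sketched in Python. Under the reST inline markup recognition rules, a trailing "_" ends a hyperlink reference only if two conditions hold. First, it must be followed by whitespace, the end of the text, or one of the ASCII characters - . , : ; ! ? \ / ' " ) ] } >. Second, the reference name must be preceded by whitespace, the start of the text, or one of - : / ' " < ( [ {. In the Wikipedia URL, "Wavefront_" sits between "/" and ".", so it qualifies. In the Huffington Post URL, every underscore is followed by a letter or digit. This sketch is an approximation and not Docutils' own code. The name find_simple_references is hypothetical, the reference-name pattern is simplified, and the non-ASCII punctuation that the rules also allow is ignored.

```python
import re

# ASCII characters allowed before an inline-markup start-string and after an
# end-string (simplified: the non-ASCII punctuation the rules also allow is omitted).
START_PREFIX = r"""\-:/'"<(\[{"""
END_SUFFIX = r"""\-.,:;!?\\/'")\]}>"""

# Simplified reST simple reference name: alphanumeric words joined by single
# internal "-", ".", "_", "+" or ":" characters.
SIMPLENAME = r"(?:(?!_)\w)+(?:[-._+:](?:(?!_)\w)+)*"

# A simple reference: a name followed by "_" (or "__" for an anonymous one),
# preceded by start-of-text, whitespace or START_PREFIX, and followed by
# end-of-text, whitespace or END_SUFFIX.
SIMPLE_REFERENCE = re.compile(
    rf"(?<![^\s{START_PREFIX}])(?P<name>{SIMPLENAME})__?(?=$|[\s{END_SUFFIX}])"
)


def find_simple_references(text):
    """Return the names these simplified rules would read as simple references."""
    return [m.group("name") for m in SIMPLE_REFERENCE.finditer(text)]
```

For example, the Wikipedia URL yields ['Wavefront']. The Huffington Post URL and the backslash-escaped Wikipedia URL both yield an empty list.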

This is mentioned briefly in the documentation:

Backslashes may be used in URIs to escape markup characters, specifically asterisks ("*") and underscores ("_") which are valid URI characters

So it can be fixed with a backslash thus: http://en.wikipedia.org/wiki/Wavefront\_.obj_file
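
Following the documentation quoted above, a preprocessing step could escape every unescaped "_" and "*" in a bare URL before the URL goes into reST source. The backslashes are escapes and are not part of the resulting URI. Escaping all of them, and not only the one that causes trouble, should therefore be harmless. escape_uri_markup is a hypothetical helper and is not part of Docutils.

```python
import re


def escape_uri_markup(uri):
    """Backslash-escape every "*" and "_" in a bare URI for use in reST source.

    Characters already preceded by a backslash are left alone, so running the
    function twice gives the same result.
    """
    # (?<!\\) skips characters that already carry a backslash escape;
    # the replacement inserts one backslash before the matched character.
    return re.sub(r"(?<!\\)([*_])", r"\\\1", uri)
```

Applied to the Wikipedia URL, this gives http://en.wikipedia.org/wiki/Wavefront\_.obj\_file.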

However, it would be preferable if the parser were altered so that all valid URIs are transformed into hyperlinks. Is this possible?

If it is not possible, could the parser issue a warning when URIs containing underscores or asterisks are not escaped?
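
Until the parser does this itself, a separate lint pass over the source could flag the risky cases. The sketch below assumes that only underscores matter. An asterisk would also need a matching start-string to form emphasis, so asterisks are left out. The check flags an unescaped "_" in a bare http(s) URL when it is followed by the end of the URL or by a character that may follow an inline-markup end-string. It may over-report, because it ignores the characters before the reference name. check_bare_urls and both regular expressions are hypothetical. The message imitates the format of Docutils system messages but is not produced by Docutils.

```python
import re

# Rough approximation of a bare http(s) URL; trailing punctuation such as a
# sentence-final period is excluded from the match.
BARE_URL = re.compile(r"""https?://[^\s<>"]*[^\s<>".,;:!?)']""")

# An unescaped "_" followed by the end of the URL or by an ASCII character
# allowed after an inline-markup end-string, i.e. one that may end a reference.
RISKY_UNDERSCORE = re.compile(r"""(?<!\\)_(?=$|[\-.,:;!?\\/'")\]}>])""")


def check_bare_urls(text, source="<string>"):
    """Return warning strings for bare URLs containing a risky unescaped "_"."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BARE_URL.finditer(line):
            url = match.group()
            if RISKY_UNDERSCORE.search(url):
                warnings.append(
                    f'{source}:{lineno}: (WARNING/2) Character "_" not escaped '
                    f'in standalone hyperlink "{url}"'
                )
    return warnings
```

With this heuristic, the Wikipedia URL is reported and the Huffington Post URL is not.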


Discussion

  • Günter Milde

    Günter Milde - 2016-05-25

    The difference between the "working" and "failing" URLs is the punctuation after the underscore.

    docutils.sf.net/docs/ref/rst/restructuredtext.html#recognition-order specifies that "Standalone hyperlinks are the last to be recognized."

    Underscores between alphanumeric characters do not interfere, because the inline markup recognition rules (docutils.sf.net/docs/ref/rst/restructuredtext.html#inline-markup-recognition-rules) do not treat them as markup.

    So, this is a side-effect of a feature.

    Your example shows that normally there is already a warning in these cases (unless you happen to define a target with a name matching the part of the URL that Docutils interprets as a hyperlink reference).

     
    • Nathaniel Beaver

      Standalone hyperlinks are the last to be recognized.

      Ok, I understand that the existing disambiguation rules rule out some valid URIs.

      Your example shows that normally there is already a warning in these cases

      Yes, and that is certainly better than nothing. However, might it be possible to issue a more specific warning in the case where the underscores are in a valid URL?

      For example, instead of

      example-urls.rst:19: (ERROR/3) Unknown target name: "wavefront".
      

      The warning might be

      example-urls.rst:19: (ERROR/3) Unknown target name: "wavefront".
      example-urls.rst:19: (ERROR/3) Character "_" not escaped in standalone hyperlink "http://en.wikipedia.org/wiki/Wavefront_.obj_file"
      

      Or something in that vein?

       
  • Günter Milde

    Günter Milde - 2016-05-26

    Ticket moved from /p/docutils/bugs/296/

     
  • Günter Milde

    Günter Milde - 2016-05-26

    This would still require recognizing standalone hyperlinks (URLs) before parsing for other inline markup, and special-casing inline markup characters inside them. (That would then also make a setting to favour standalone hyperlinks over other inline markup possible.)

    But this is an enhancement, not a bug fix.

     
  • Günter Milde

    Günter Milde - 2016-07-26
    • Priority: 5 --> 4
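
Günter's suggestion of recognizing standalone hyperlinks before other inline markup could be prototyped as a two-pass tokenizer. The first pass cuts out bare URLs, and the second looks for simple references only in the remaining text. The names and the simplified patterns below are hypothetical, and this is not how the Docutils inliner is structured. One further simplification is that the characters immediately around a URL are not considered when checking the start and end conditions for a neighbouring reference.

```python
import re

# First pass (hypothetical): bare http(s) URLs, excluding trailing punctuation.
BARE_URL = re.compile(r"""https?://[^\s<>"]*[^\s<>".,;:!?)']""")

# Second pass: simplified simple-reference pattern (name followed by "_" or "__",
# with the ASCII start/end conditions of the inline markup recognition rules).
SIMPLE_REFERENCE = re.compile(
    r"""(?<![^\s\-:/'"<(\[{])(?P<name>(?:(?!_)\w)+(?:[-._+:](?:(?!_)\w)+)*)__?(?=$|[\s\-.,:;!?\\/'")\]}>])"""
)


def tokenize(text):
    """Split text into ("url" | "reference" | "text", value) tokens, URLs first."""
    tokens = []
    pos = 0
    for url in BARE_URL.finditer(text):
        # Only the text between URLs is searched for references.
        tokens.extend(_plain_tokens(text[pos:url.start()]))
        tokens.append(("url", url.group()))
        pos = url.end()
    tokens.extend(_plain_tokens(text[pos:]))
    return tokens


def _plain_tokens(chunk):
    """Tokenize a chunk that contains no URL into references and plain text."""
    tokens = []
    pos = 0
    for ref in SIMPLE_REFERENCE.finditer(chunk):
        if ref.start() > pos:
            tokens.append(("text", chunk[pos:ref.start()]))
        tokens.append(("reference", ref.group("name")))
        pos = ref.end()
    if pos < len(chunk):
        tokens.append(("text", chunk[pos:]))
    return tokens
```

With this ordering, "Wavefront_" inside the URL is no longer read as a reference. "Python_" in ordinary text still is.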
     

