When generating standalone hyperlinks, rst2html handles some bare URLs with multiple underscores just fine, such as: http://www.huffingtonpost.com/2014/01/04/boiling-water-extreme-cold-water-gun-ice-crystals_n_4538522.html
However, some URLs with multiple underscores must be escaped, such as: http://en.wikipedia.org/wiki/Wavefront_.obj_file
This is mentioned briefly in the documentation:
Backslashes may be used in URIs to escape markup characters, specifically asterisks ("*") and underscores ("_") which are valid URI characters.
So it can be fixed with a backslash thus: http://en.wikipedia.org/wiki/Wavefront\_.obj_file
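For anyone who generates reStructuredText from a script, the workaround can be automated. The following is only a minimal sketch: the helper name escape_rst_url is made up for illustration and is not part of Docutils. It assumes that escaping every underscore and asterisk in the URL is acceptable, not just the one that triggers the problem.

```python
import re


def escape_rst_url(url: str) -> str:
    """Backslash-escape '*' and '_' in a URL (hypothetical helper).

    Characters that already have a backslash directly in front of them
    are left alone, so applying the function twice changes nothing.
    """
    return re.sub(r"(?<!\\)([*_])", r"\\\1", url)


if __name__ == "__main__":
    print(escape_rst_url("http://en.wikipedia.org/wiki/Wavefront_.obj_file"))
```

This escapes both underscores in the Wikipedia URL, which is more than strictly needed but keeps the helper free of any knowledge of the inline markup rules.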
However, it would be preferable if the parser were altered so that all valid URIs are transformed into hyperlinks. Is this possible?
If it is not possible, could the parser issue a warning when URIs containing underscores or asterisks are not escaped?
The difference between "working" and "failing" URLs is the punctuation after the underscore.
docutils.sf.net/docs/ref/rst/restructuredtext.html#recognition-order specifies that
Standalone hyperlinks are the last to be recognized.
Underscores between alphanumeric characters do not interfere, because these are ignored due to the inline markup recognition rules (docutils.sf.net/docs/ref/rst/restructuredtext.html#inline-markup-recognition-rules).
So, this is a side-effect of a feature.
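To make the distinction concrete, here is a rough heuristic in Python. It is a sketch of the rule of thumb described above (an underscore followed by something other than a letter or digit is at risk), not a reimplementation of the Docutils inline markup recognition rules. The function name has_risky_underscore is an assumption for illustration only.

```python
import re

# An unescaped underscore that is NOT followed by an ASCII letter or digit
# (that is, followed by punctuation or the end of the string).
# This only approximates the real recognition rules.
_RISKY_UNDERSCORE = re.compile(r"(?<!\\)_(?![0-9A-Za-z])")


def has_risky_underscore(url: str) -> bool:
    """Return True if the URL probably needs a backslash escape."""
    return _RISKY_UNDERSCORE.search(url) is not None


if __name__ == "__main__":
    working = ("http://www.huffingtonpost.com/2014/01/04/"
               "boiling-water-extreme-cold-water-gun-ice-crystals_n_4538522.html")
    failing = "http://en.wikipedia.org/wiki/Wavefront_.obj_file"
    print(has_risky_underscore(working), has_risky_underscore(failing))
```

In the first URL every underscore sits between alphanumeric characters, so nothing is flagged. In the second, "Wavefront_" is followed by a full stop, which is what lets it be read as a hyperlink reference.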
Your example shows that normally there is already a warning in these cases (unless you happen to define a target with a name matching the part of the URL that Docutils interprets as a hyperlink reference).
Ok, I understand that the existing disambiguation rules rule out some valid URIs.
Yes, and that is certainly better than nothing. However, might it be possible to issue a more specific warning in the case where the underscores are in a valid URL?
For example, instead of
The warning might be
Or something in that vein?
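Until something like that exists in the parser, a check of this kind could be run over the source text beforehand. The sketch below is a hypothetical stand-alone lint pass, not Docutils code: the function name lint_urls, the URL pattern, and the wording of the message are all assumptions for illustration. It only looks at underscores, and it does not know about literal blocks or inline literals.

```python
import re

# Naive URL matcher: scheme, "://", then everything up to whitespace or <>.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>]+")
# Unescaped underscore not followed by an ASCII letter or digit.
RISKY_RE = re.compile(r"(?<!\\)_(?![0-9A-Za-z])")


def lint_urls(text: str) -> list[str]:
    """Return one message per URL that contains a risky underscore."""
    messages = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            url = match.group(0)
            if RISKY_RE.search(url):
                messages.append(
                    f"line {lineno}: URL {url!r} contains an unescaped "
                    "underscore that may be parsed as inline markup"
                )
    return messages


if __name__ == "__main__":
    sample = (
        "Fine: http://example.org/a_b.html\n"
        "Problem: http://en.wikipedia.org/wiki/Wavefront_.obj_file\n"
    )
    for message in lint_urls(sample):
        print(message)
```

Because the pass works line by line on raw text, it can produce false positives inside literal blocks, so it is best treated as a hint rather than an error.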
Ticket moved from /p/docutils/bugs/296/
This would still require recognizing standalone hyperlinks (URLs) before parsing for other inline markup and special-casing for inline markup characters inside them. (A setting to favour standalone hyperlinks over other inline markup would then also be possible.)
But this is an enhancement, not a bug fix.