xref [r9785], [r9853], [r9855]
Dear @milde,
Thank you for the fix to my recent patch. It seems neither my patch nor the fix addressed the root cause of the test failures, as tests have resumed failing on Windows.
I believe the following demonstrates the problem:
>>> import sys; print(sys.platform)
win32
>>> import urllib.parse, urllib.request
>>> urllib.request.url2pathname('test/data/circle-broken.svg')
'test\\data\\circle-broken.svg'
>>> urllib.parse.unquote('test/data/circle-broken.svg')
'test/data/circle-broken.svg'
Currently, we use imagepath = urllib.request.url2pathname(uri_parts.path)
, which converts path separators to their platform-native format. On UNIX, url2pathname
simply calls unquote
, but on Windows it handles UNC paths (\\host\path\
) and escaped drive letters (///C|/users/
).
I don't know what led to using url2pathname()
, as it is quite specialised (the docstring notes "not recommended for general use"). Is it possible to use the simpler unquote()
here?
For local file paths (e.g. without a file:///
scheme), should we even be using URI parsing? Perhaps we should use proper path handling if there is no URI scheme (i.e. the user has provided a file-path).
A
Unfortunately, I don't know what the "test failure on Windows" actually is, could you provide the relevant test output, please?
If it works on Windows, using unquote() is a good idea. Could you test with
and adapt the test script if required?
Per definitionem, there is no way to provide a system file-path as image ressource, both rST and the Docutils Doctree explicitely expect an URI.
As URIs are capable of representing local file paths, I dont see the advantage.
Interpreting an image URI with undefined scheme as system path would be a complication of both, specification and implementation and a backwards-incompatible change. We would als have to think about images that are only read to get the size but kept external in HTML output.
See https://github.com/AA-Turner/docutils/actions/runs/10284219635/job/28459783099 for an online view (I advise collapsing the "alltests" tab and scrolling up to the pytest one, which has better formatted output). The full failure is very verbose, but it is because
if imagepath.startswith('/')
fails.Using
unquote()
does make the current tests pass on Windows, my question was more about what is the range of URIs that we deem valid? If we useunquote()
, then e.g.file:///C|users/john/mallard.png
will be improperly translated:Now it seems that (at least from wikipedia) this scheme of using
|
is not standardised/used, and might just be a Python quirk -- in which case I think just usingunquote()
is easier. Of note,pathlib
does not use the|
replacement mechanism.This seems to be a problem of imprecise language.
picture.png
as used in the example is not a URI, as it has no scheme. Nor is/data/blue square.png
or etc. Indeed, none of the.. image::
directives indocs/
have a scheme, and hence aren't URIs. The actual check that we do is inImage.run()
,reference = directives.uri(self.arguments[0])
, which just removes unescaped whitespace.This goes directly to my point, as the scheme is what makes a URI! (well, the scheme and the path).
I think that there is a strong argument based on historical practice in favour of documenting that we accept either a URI-with-scheme (
file:
orhttps:
etc) or a local file path. However, I might be missing something, and I agree we should avoid backwards-incompatible changes. My main motivation is simply to fix the tests, and it seems that usingunquote()
will work, unless you forsee any other problems.A
I am in favour of using unquote().
The "|" seems to be not Python specific, however, it is obsolete. Wiki says:
RFC 8089 says:
URI vs. Path
Actually not: by default, the "uri" attribute is written as-is to the HTML
<img>
element's "src" attribute which expects an URL and may fail for a number of valid system paths.It seems what Docutils expects is a URI-Reference¹:
In contrast to a system file path, a relative reference still requires forward slashes and quoting of some characters.
¹ NOTE: Previous specifications used the terms "partial URI" and
"relative URI" to denote a relative reference to a URI. [RFC 3989]
Last edit: Günter Milde 2024-08-11
Fixed in [r9866]. Documentation updated in [r9865].
Thank you for report and analysis.
Related
Commit: [r9865]
Commit: [r9866]
Sorry, another failure.
The root cause of this one seems to be the
uri_parts.scheme not in ('', 'file')
check. On Windows, we get the following, because drives in absolute paths are parsed as URI schemes:We could do something like
Path(uri).is_file()
, but that has the problems with using pathlib that you mentioned earlier...Full failure below, also on https://github.com/AA-Turner/docutils/actions/runs/10308701375/job/28536730749
A
The parsing is correct: The image argument is a "URI reference":
Hence, the example
~~~
.. image:: S:/Development/data/circle-broken.svg
~~~
describes an image accessible under scheme "S".
Our problem is, that we cannot easily use relative paths (to keep flexibility regarding where to run the tests from).
A simple solution would be using a "file" scheme in the offending examples.
If [r9880] fixes the problem, we can mark this bug as fixed.
Related
Commit: [r9880]
Last edit: Günter Milde 2024-08-11
The issue here now becomes why the tests were not failing on Linux, I believe. Both Windows and Linux would have had an absolute path (e.g.
.. image:: /home/user/development/docutils/docutils/test/data/circle-broken.svg
). On Windows, this failed due to misinterpreting the drive letter as a scheme. However, because the defaultroot_prefix
is/
, when an absolute path is used on Unix, the scheme is''
,imagepath.startswith('/')
isTrue
, androot_prefix/imagepath[1:]
just strips and restores the leading slash.I believe that if we are only accepting URI-refereneces, then we should reject absolute paths on UNIX, and the tests should fail.
--image-loading
was added in Docutils 0.18 and:loading:
was added in 0.21. Local testing shows that on Linux, absolute paths are allowed with--image-loading embed
or:loading: embed
in every released version and the current master branch. On Windows, 0.18-0.20 allowed absolute paths, but since 0.21 an error has been raised.The following table demonstrates the matrix.
image_loading
means using--image-loading embed
and:loading:
means using:loading: embed
. My test commands were:Absolute path support in the image directive with embed mode
image_loading
:loading:
image_loading
:loading:
I think we have two options. (1) support absolute paths on every platform, document as such. (2) make absolute paths return an error on Linux. The first restores backwards compatibility, the second is consistent with what we document.
A nit-pick here -- this commit leads to URIs like
file://s:/...
on Windows. From Wikipedia's page on file URIs "file://path
(i.e. two slashes, without a hostname) is never correct, but is often used.".I think we should use Path.as_uri() instead of
.as_posix()
.A
Related
Commit: [r9880]
Done in [r9882].
Related
Commit: [r9882]
(3) support URI-references: full URI | relative reference (relative to a "base URI"), e.g. an "absolute path reference" (starting with one slash) or a "relative path reference" (not starting with a slash).
the second is needlessly restrictive and hard to explain.
The third is consistent with what we document.
The different handling of Windows system paths is due to the syntax definition of URIs where the "path" part resembles a POSIX path.
Another reason to describe the image attribute as "URI reference" are paths with spaces ---
to get a path with spaces, percent encoding is required:
As spaces are not allowed in an URI anyway, they are dropped from a link block in reStructuredText to allow easy breaking of long URIs over several lines. For example:
is parsed into the native XML
This behaviour is surprising when you describe an image attribute as "URI or local path".
Last edit: Günter Milde 2024-09-11
Suppose a user has a directive
.. image:: /media/circle.svg
. By default this will load the file under/media/circle.svg
, as the user might expect if they think they are using filesystem paths, and don't realise the nuances of URI references. At some later date, this user setsroot_prefix
to/home/user/project/
. The image directive will stop working, as it now looks for a file under/home/user/project/media/circle.svg
.I think this behaviour is hard to explain and quite surprising.
I have a proposed fix at https://github.com/AA-Turner/docutils/pull/18 (https://github.com/AA-Turner/docutils/pull/18.patch). This does the following:
file:///
URIs to filesystem paths (using Python 3.13's Path.from_uri() where possible).file:
URIs are absolute pathsroot_prefix
to''
(the empty string), and only applies theroot_prefix
behaviour when it is non-empty (i.e. a user has configured the setting)The second commit in the patch goes further, by rejecting absolute paths when used in relative references. This is optional, but as I argue above I think allowing absolute paths in relative references leads to surprising behaviour, and the resolution is easy if you want to keep using absolute paths (just use
file:///
). I think this would be a straightforward end-state to explain to users: either the argument to.. image::
(and.. figure::
) is a URI or it is a relative filesystem path.A
@aa-turner: Could you create a patch that just
I agree that this is easier to explain (while the current implementation does not need a conditional). As the change is also backwards-compatible we can implement it without advance warning.
This part of the patch is applied in [r9991].
Related
Commit: [r9991]
A helper function that converts a URI-reference to a filesystem path for HTML and LaTeX writers based on this patch is introduced in [r9996]. The ODT writer will have to wait, as this is a backwards-incompatible change: : The ODT writer expects relative image paths to be relative to the source directory.
Related
Commit: [r9996]
Your analysis with "the
uri_parts.scheme not in ('', 'file')
check" beeing "the root cause" of the test failure is correct. This check was introduced in [r9501] "Refactor "_html_base" writer."However, even before (since introduction of image embedding in 0.18), the drive part was split from the path. To verify this, try
PS> echo ".. image:: S:/Development/docutils/docutils/test/data/circle.svg" | python rst2html5.py --link-stylesheet --image-loading embed
from a directory in a different drive.BTW: both, the introduction of the :loading: directive option and of direct embedding of SVG images did not change path handling (you should get the same behaviour with a PNG or JPG image).
Related
Commit: [r9501]
Using the "file" scheme does not work if you want to generate HTML and LaTeX/PDF output from the same source with the HTML to be served via https and linking to images.
The rejection of absolute paths in "relative references" to the image URI also prevents a use where you read the image size from the file but not embed the image (cf. [feature-requests:#102]):
Both use cases currently work fine in a configuration that uses the same directory (e.g. "/public/wwwfiles/") in "root-prefix" and the http-server's "DocumentRoot".
Last edit: Günter Milde 2024-08-12
@aa-turner: Do the tests still fail on Windows?
@aa-turner: Can we close this bug?