From: Günter M. <mi...@us...> - 2024-08-07 14:54:31
I am in favour of using unquote().

> Now it seems that (at least from wikipedia) this scheme of using | is not standardised/used, and might just be a Python quirk

The "|" does not seem to be Python-specific; however, it is obsolete. Wiki says:

> On Microsoft Windows systems, the normal colon (:) after a device letter has sometimes been replaced by a vertical bar (|) in file URLs. This reflected the original URL syntax, which made the colon a reserved character in a path part. Since Internet Explorer 4, file URIs have been standardized on Windows, and should follow the following scheme.

URI vs. Path

> This seems to be a problem of imprecise language. picture.png as used in the example is not a URI, as it has no scheme.
> ...
> we accept either a URI-with-scheme (file: or https: etc) or a local file path.

Actually not: by default, the "uri" attribute is written as-is to the "src" attribute of the HTML `<img>` element, which expects a URL and may fail for a number of valid system paths.

It seems that what Docutils expects is a [URI-Reference](https://datatracker.ietf.org/doc/html/rfc3986#section-4.1):

> A URI-reference is either a URI or a relative reference.
> If the URI-reference's prefix does not match the syntax of a scheme followed by its colon separator, then the URI-reference is a [relative reference](https://datatracker.ietf.org/doc/html/rfc3986#section-4.2).

In contrast to a system file path, a relative reference still requires forward slashes and quoting of some characters.

---

**[bugs:#493] Test failure on Windows with embedded images**

**Status:** open
**Created:** Wed Aug 07, 2024 02:25 AM UTC by Adam Turner
**Last Updated:** Wed Aug 07, 2024 01:12 PM UTC
**Owner:** nobody

xref [r9785], [r9853], [r9855]

Dear @milde,

Thank you for the fix to my recent patch. It seems neither my patch nor the fix addressed the root cause of the test failures, as tests have resumed failing on Windows.
I believe the following demonstrates the problem:

```pycon
>>> import sys; print(sys.platform)
win32
>>> import urllib.parse, urllib.request
>>> urllib.request.url2pathname('test/data/circle-broken.svg')
'test\\data\\circle-broken.svg'
>>> urllib.parse.unquote('test/data/circle-broken.svg')
'test/data/circle-broken.svg'
```

Currently, we use `imagepath = urllib.request.url2pathname(uri_parts.path)`, which converts path separators to their platform-native format. On UNIX, `url2pathname` simply calls `unquote`, but on Windows it handles UNC paths (``\\host\path\``) and escaped drive letters (``///C|/users/``).

I don't know what led to using `url2pathname()`, as it is quite specialised (the docstring notes "not recommended for general use"). Is it possible to use the simpler `unquote()` here?

For local file paths (e.g. without a ``file:///`` scheme), should we even be using URI parsing? Perhaps we should use proper path handling if there is no URI scheme (i.e. the user has provided a file-path).

A
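
To make the `unquote()` suggestion concrete, here is a minimal sketch of how a URI-reference could be mapped to a local path without `url2pathname()`. The helper name `uri_reference_to_path` is hypothetical and is not part of Docutils; the sketch also assumes that only references with no scheme or a `file:` scheme point at local files.

```python
import urllib.parse


def uri_reference_to_path(uri_reference):
    """Return a local path string for a URI-reference, or None if remote.

    Hypothetical helper for illustration only; not the Docutils code.
    """
    parts = urllib.parse.urlsplit(uri_reference)
    # Assumption: any scheme other than "file" means a non-local resource.
    if parts.scheme not in ('', 'file'):
        return None
    # unquote() only decodes %xx escapes. Unlike url2pathname(), it does
    # not rewrite forward slashes, so the result is the same on Windows
    # and on UNIX.
    return urllib.parse.unquote(parts.path)


print(uri_reference_to_path('test/data/circle-broken.svg'))
print(uri_reference_to_path('my%20picture.png'))
print(uri_reference_to_path('https://example.org/picture.png'))
```

The forward slashes are kept as-is, which should be fine for opening the file because Python's file functions accept them on Windows as well. Drive letters and UNC paths are deliberately not handled here.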