From: Adam T. <aa-...@us...> - 2022-06-09 22:48:24
---

**[bugs:#451] Deprecate PEP 263 coding slugs support**

**Status:** open
**Created:** Thu Jun 09, 2022 10:48 PM UTC by Adam Turner
**Last Updated:** Thu Jun 09, 2022 10:48 PM UTC
**Owner:** nobody
**Attachments:**

- [0001-Deprecate-PEP-263-coding-slugs.patch](https://sourceforge.net/p/docutils/bugs/451/attachment/0001-Deprecate-PEP-263-coding-slugs.patch) (5.5 kB; application/octet-stream)

Python 3 uses UTF-8 as the default encoding for Python source files, so there is no longer a compelling use case for this support, which adds complexity to the IO implementation. I propose deprecating support for removal in 1.0, but 2.0 might be a better option.

Support was added in [r4506].

A

---

Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
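For context, a PEP 263 style "coding slug" is a comment such as `.. -*- coding: latin-1 -*-` placed in the first or second line of a file. The following is a minimal sketch of how such a declaration can be detected, assuming the byte pattern `coding[:=]\s*([-\w.]+)` quoted later in this thread and a search limited to the first two lines, as in PEP 263. The function name `find_coding_slug` is hypothetical and not the Docutils API.

```python
import re
from typing import Optional

# Byte pattern for an encoding declaration (the one quoted later in this thread).
CODING_SLUG = re.compile(br"coding[:=]\s*([-\w.]+)")


def find_coding_slug(data: bytes) -> Optional[str]:
    """Return the declared encoding name, or None if there is no slug.

    Hypothetical helper: like PEP 263, only the first two lines are searched.
    """
    for line in data.splitlines()[:2]:
        match = CODING_SLUG.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


source = b".. -*- coding: latin-1 -*-\n\nCaf\xe9 au lait\n"
print(find_coding_slug(source))  # latin-1

late = b"Title\n=====\n\n.. coding: latin-1\n"
print(find_coding_slug(late))  # None: the declaration is not in lines 1-2
```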
From: Günter M. <mi...@us...> - 2022-06-10 09:02:13
> Python 3 uses utf-8 as the encoding for Python source files, there is
> no longer a compelling use-case for the support, which adds complexity
> to the IO implementation.

I still see a reason to keep (and properly document) a way to specify the encoding of an rST source in the document itself. Use cases:

* A collection of files where one file, for whatever reason, must be in a different encoding, compiled with "buildhtml.py".
* Documents in an 8-bit or 16-bit encoding intended for compilation anywhere. This avoids shipping a separate configuration file.

The "coding slug" might become obsolete with a more generic "in-document configuration" (cf. TODO item [misc.settings directive](https://docutils.sourceforge.io/docs/dev/todo.html#misc-settings)), but this is still a long way off.
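The first use case can be illustrated with a small sketch: a collection in which each file is decoded according to its own declaration, falling back to UTF-8 when none is present. The helper `decode_source` and the file names are hypothetical, and the sketch assumes the same slug pattern and two-line limit as above; the Latin-1 file cannot be read as UTF-8 without its declaration.

```python
import re
from typing import Dict

# Byte pattern for an in-document encoding declaration.
CODING_SLUG = re.compile(br"coding[:=]\s*([-\w.]+)")


def decode_source(data: bytes, default: str = "utf-8") -> str:
    """Decode *data* using an in-document declaration if present.

    Hypothetical helper: only the first two lines are searched, and
    *default* is used when no declaration is found.
    """
    for line in data.splitlines()[:2]:
        match = CODING_SLUG.search(line)
        if match:
            return data.decode(match.group(1).decode("ascii"))
    return data.decode(default)


# A collection in which one file must use a different encoding.
collection: Dict[str, bytes] = {
    "index.rst": "Index\n=====\n\nCafé\n".encode("utf-8"),
    "legacy.rst": ".. -*- coding: latin-1 -*-\n\nGrüße\n".encode("latin-1"),
}

for name, data in collection.items():
    print(name, decode_source(data).splitlines()[-1])
# index.rst Café
# legacy.rst Grüße
```

Decoding `collection["legacy.rst"]` directly as UTF-8 raises `UnicodeDecodeError`, because the Latin-1 byte for "ü" is not a valid UTF-8 start byte.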
From: Adam T. <aa-...@us...> - 2022-06-11 01:13:58
> I see still a reason to keep (and properly document) a way to specify the
> encoding of an rST source in the document itself.

The underlying thrust of my argument is that this is very fragile -- for any encoding that is not compatible with ASCII (e.g. UTF-16), the current coding slug test fails:

```pycon
>>> import re
>>> m = re.search(br"coding[:=]\s*([-\w.]+)", "coding: utf-16".encode("utf-16-le"))
>>> m is None
True
>>> m = re.search(br"coding[:=]\s*([-\w.]+)", "coding: latin-1".encode("latin-1"))
>>> m.group(1).decode("ascii")
'latin-1'
```

Similarly, any in-document metadata would suffer the same fate -- Unicode code points (which make up `str` objects) cannot be assumed to map to the same bytes on disk in every encoding. Better to fail loudly than to have silent data corruption.

> A collection of files, where one file for whatever reason must be in a different encoding. Compilation with "buildhtml.py".

If the need arises for this, we would accept a feature request for `buildhtml.py` to have some enumeration of files and their input encodings.

> Documents in an 8-bit or 16-bit encoding intended for compilation anywhere. Avoids shipping a separate configuration file.

Not sure I understand this one fully, but such a file would likely come with compilation instructions that include the input encoding.

Anecdotally, I looked through the ~70 results for the following search [1]_ on "grep.app": `coding[:=]( \t)*(([^u\W]|u[^t\W]|ut[^f\W]|utf-?[^8\W])[-\w.]+)` (lookaheads/lookbehinds aren't supported), and no file had a coding slug in the first two lines. Whilst obviously only a fraction of extant reST files are indexed by that provider, if it were a pattern in common usage I would expect to see more than 0.

One of my longer-term goals is to simplify `docutils.io` considerably, as I think much of its code duplicates functionality that the current (Python 3) standard library already provides.
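One example of such overlap (an assumption about the kind of duplication meant here, not a description of the attached patch): Python's built-in codecs already recognise byte order marks. The `utf-8-sig` codec strips a UTF-8 BOM when one is present and otherwise behaves like `utf-8`, and the `utf-16` codec uses the BOM to pick the byte order, so `open(path, encoding="utf-8-sig")` covers the common UTF-8 cases without custom detection code.

```python
import codecs

text = "Título\n"

# "utf-8-sig" strips a leading UTF-8 BOM if present...
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
# ...and decodes BOM-less input exactly like plain "utf-8".
no_bom = text.encode("utf-8")
print(with_bom.decode("utf-8-sig") == no_bom.decode("utf-8-sig") == text)  # True

# "utf-16" reads the BOM to choose the byte order, then strips it.
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
print(le.decode("utf-16") == be.decode("utf-16") == text)  # True
```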
Making our file parsing more vanilla/standard is a step towards this larger goal, although I do believe this change stands alone on its merits.

A

.. [1] https://grep.app/search?current=7&q=coding%5B%3A%3D%5D%28%20%5Ct%29%2A%28%28%5B%5Eu%5CW%5D%7Cu%5B%5Et%5CW%5D%7Cut%5B%5Ef%5CW%5D%7Cutf-%3F%5B%5E8%5CW%5D%29%5B-%5Cw.%5D%2B%29&regexp=true&filter[lang][0]=reStructuredText
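Because grep.app lacks lookaround, the search above spells out "any encoding name except `utf-8`" with alternations. Python's `re` module does support lookahead, so the same intent can be written as a negative lookahead. This is a minimal sketch, assuming the `( \t)*` group in the search was meant as the character class `[ \t]*` and restricting matches to the first two lines as in the survey; `non_utf8_slugs` is a hypothetical helper name.

```python
import re
from typing import List

# Negative lookahead: accept any declared encoding except "utf-8" / "utf8".
NON_UTF8_SLUG = re.compile(r"coding[:=][ \t]*(?!utf-?8\b)([-\w.]+)")


def non_utf8_slugs(text: str) -> List[str]:
    """Return non-UTF-8 encodings declared in the first two lines."""
    found = []
    for line in text.splitlines()[:2]:
        match = NON_UTF8_SLUG.search(line)
        if match:
            found.append(match.group(1))
    return found


print(non_utf8_slugs(".. -*- coding: utf-8 -*-"))  # []
print(non_utf8_slugs(".. coding: utf8"))            # []
print(non_utf8_slugs(".. coding: latin-1"))         # ['latin-1']
print(non_utf8_slugs(".. coding=utf-16"))           # ['utf-16']
```

Like the original search, this pattern is case-sensitive, so an uppercase `UTF-8` declaration would still be reported.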
From: Günter M. <mi...@us...> - 2022-06-12 19:44:14
|
> The underlying thrust of my argument is that this is very fragile --

True, this method only works with ASCII-compatible encodings. (This is one of the reasons why Docutils, like PEP 263, complements it with byte order mark (BOM) recognition.)

...

> If the need arises [...], we would accept a feature request for
> `buildhtml.py` to have some enumeration of files and their input
> encodings.

IMO, it is safer to keep "source code encoding both visible and changeable on a per-source file basis" [PEP 263].

Python 3 still supports the encoding slug. I vote to keep this option as well.
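To illustrate how BOM recognition complements the slug, here is a minimal sketch (the function name `guess_encoding` is hypothetical and this is not the Docutils implementation): a byte order mark is checked first, and only then is a slug searched for in the first two lines. For a UTF-16 file, the slug is invisible to the byte regex, but the BOM is not.

```python
import codecs
import re
from typing import Optional

CODING_SLUG = re.compile(br"coding[:=]\s*([-\w.]+)")

# Codec names chosen so that bytes.decode() also strips the BOM.
BYTE_ORDER_MARKS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
)


def guess_encoding(data: bytes) -> Optional[str]:
    """Sketch: BOM first, then a coding slug in the first two lines."""
    for bom, encoding in BYTE_ORDER_MARKS:
        if data.startswith(bom):
            return encoding
    for line in data.splitlines()[:2]:
        match = CODING_SLUG.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


text = ".. coding: utf-16\n\nBody\n"
no_bom = text.encode("utf-16-le")        # slug bytes are interleaved with NULs
with_bom = codecs.BOM_UTF16_LE + no_bom  # the BOM identifies the encoding

print(guess_encoding(no_bom))    # None
print(guess_encoding(with_bom))  # utf-16
print(with_bom.decode(guess_encoding(with_bom)) == text)  # True
```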
From: Adam T. <aa-...@us...> - 2022-06-12 20:07:13
|
Fair enough, I will put this on hold for now. [bugs:#450] is more important to resolve before the 0.19.0b1 release.

A