From: Guenter M. <mi...@us...> - 2021-05-04 17:52:56
On 2021-05-04, Mariusz Wasiluk wrote:
> Hello,
> I have the following snippet:
> from docutils.core import publish_doctree
> dom = publish_doctree(r'Foo\\bar').asdom()
> print(repr(dom.toxml()))
> with docutils>=0.16, I get:
> u'<?xml version="1.0" ?><document
> source="<string>"><paragraph>Foo\x00\\bar</paragraph></document>'
> with previous versions I get:
> u'<?xml version="1.0" ?><document
> source="<string>"><paragraph>Foo\\bar</paragraph></document>'
> Why am I getting \x00 in the output with the newest docutils versions?
This is an intended change:
Before 0.16, backslashes were removed prior to storing a Text string in the
document tree. Since 0.16 they are stored as NULL characters (\x00).
See the HISTORY.txt entry for 0.16:
- Keep `backslash escapes`__ in the document tree. Backslash characters in
text are represented by NULL characters in the ``text`` attribute of
Doctree nodes and removed in the writing stage by the node's
``astext()`` method.
__ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#escaping-mechanism
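To illustrate the two stages described in that entry, here is a minimal
pure-Python sketch. It is a toy model, not the docutils implementation: the
helper names ``to_null_escapes`` and ``strip_null_escapes`` are hypothetical,
and the sketch only assumes what the entry states (backslashes become NULL
characters in the stored text and are dropped when writing), plus the rST rule
that backslash-escaped whitespace is removed.

```python
NUL = "\x00"


def to_null_escapes(source):
    """Toy parsing stage: replace each escaping backslash with NUL.

    The character following the backslash is kept literally, so an
    escaped backslash becomes NUL followed by one backslash.
    """
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            # Keep the escaped character (if any) right after the marker.
            out.append(NUL + source[i + 1:i + 2])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def strip_null_escapes(text):
    """Toy writing stage: drop NUL markers and NUL-escaped whitespace."""
    for marker in (NUL + " ", NUL + "\n", NUL):
        text = text.replace(marker, "")
    return text


stored = to_null_escapes(r"Foo\\bar")
print(repr(stored))                      # 'Foo\x00\\bar'
print(repr(strip_null_escapes(stored)))  # 'Foo\\bar'
```

The first printed value matches the text seen in the DOM output with
docutils>=0.16, the second matches the output of earlier versions. With
docutils itself, the HISTORY entry suggests that calling ``astext()`` on the
nodes is the way to get the text without the NULL characters.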
This change was implemented so that "active characters" can also be escaped
in transforms. The RELEASE_NOTES list one example:
[...] This allows, e.g., escaping of author-separators in
`bibliographic fields`__.
__ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#escaping-mechanism
__ docs/ref/rst/restructuredtext.html#bibliographic-fields
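A rough sketch of why the kept escape marker helps a transform: as long as the
NULL character is still in the text, code that splits on a separator can tell
an escaped separator from a real one. The function ``split_authors`` below is
a hypothetical stand-in, not the docutils transform, and it only handles ";"
as separator.

```python
import re

NUL = "\x00"


def split_authors(text):
    """Split on ';' unless it is preceded by the NUL escape marker.

    The markers are removed afterwards, as they would be at write time.
    """
    parts = re.split(r"(?<!\x00);", text)
    return [part.replace(NUL, "").strip() for part in parts]


# Stored form of the rST source "Smith\; Sons; Jane Doe"
print(split_authors("Smith\x00; Sons; Jane Doe"))
# ['Smith; Sons', 'Jane Doe']
```

If the backslash had already been removed before the text reached this step,
as was the case before 0.16, the first author would be split in two.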
Another use is escaping characters that would otherwise be changed by
the smartquotes__ transform.
__ https://docutils.sourceforge.io/docs/user/config.html#smart-quotes
Günter