From: Guenter M. <mi...@us...> - 2021-05-04 17:52:56
On 2021-05-04, Mariusz Wasiluk wrote:

> Hello,
> I have following snippet:

> from docutils.core import publish_doctree
> dom = publish_doctree(r'Foo\\bar').asdom()
> print(repr(dom.toxml()))

> with docutils>=0.16, I get:

> u'<?xml version="1.0" ?><document
> source="<string>"><paragraph>Foo\x00\\bar</paragraph></document>'

> with previous versions I get:

> u'<?xml version="1.0" ?><document
> source="<string>"><paragraph>Foo\\bar</paragraph></document>'

> Why with the newest docutils versions I'm getting \x00 in the output?

This is an intended change: until 0.16, backslashes were removed prior to
storing a Text string in the document tree. Since 0.16 they are stored as
NULL characters.

See the HISTORY.txt entry for 0.16:

  - Keep `backslash escapes`__ in the document tree. Backslash characters in
    text are represented by NULL characters in the ``text`` attribute of
    Doctree nodes and removed in the writing stage by the node's
    ``astext()`` method.

    __ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#escaping-mechanism

This change was implemented in order to allow escaping "active characters"
also in transforms. The RELEASE_NOTES list one example:

  [...] This allows, e.g., escaping of author-separators in
  `bibliographic fields`__.

  __ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#escaping-mechanism
  __ docs/ref/rst/restructuredtext.html#bibliographic-fields

Another use is escaping characters that would otherwise be changed by the
smartquotes__ transform.

__ https://docutils.sourceforge.io/docs/user/config.html#smart-quotes

Günter
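
P.S. If the pre-0.16 text is needed from a string that still carries the NULL markers, one option is to drop them in user code. The following is only a minimal sketch of that idea, using the standard library alone: `strip_null_escapes` is a hypothetical helper (not part of Docutils), and the sample string is the paragraph text from the report above.

```python
# Hypothetical helper, not a Docutils API: it only mimics the
# "NULL characters are removed in the writing stage" behaviour
# described in the HISTORY.txt entry.
NULL = "\x00"


def strip_null_escapes(text):
    """Return `text` with the NULL backslash-escape markers removed."""
    return text.replace(NULL, "")


# Paragraph text as reported for docutils >= 0.16:
# the NULL stands for the escaping backslash of the source ``Foo\\bar``.
stored = "Foo\x00\\bar"

print(repr(strip_null_escapes(stored)))  # 'Foo\\bar'
```

Inside Docutils, calling ``astext()`` on the node is the supported way to get the text without the markers; the sketch only applies to strings that have already left the document tree.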