From: Mariusz W. <go...@gm...> - 2021-05-04 18:23:50
Thank you for the answer.

Yes, I saw the release notes; however, I don't understand how this behavior
is consistent with what is stated in
https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#escaping-mechanism

"The backslash is removed from the output" suggests that no character is left
in the output, not even NULL.

Currently, my example produces invalid XML (I get a parsing error from lxml).
So I wonder whether my use case is invalid. Should I remove \x00 from the XML
output before further processing?

On Tue, 4 May 2021 at 19:53, Guenter Milde via Docutils-users
<doc...@li...> wrote:

> On 2021-05-04, Mariusz Wasiluk wrote:
>
> > Hello,
>
> > I have the following snippet:
>
> > from docutils.core import publish_doctree
> > dom = publish_doctree(r'Foo\\bar').asdom()
> > print(repr(dom.toxml()))
>
> > with docutils>=0.16, I get:
> > u'<?xml version="1.0" ?><document
> > source="<string>"><paragraph>Foo\x00\\bar</paragraph></document>'
>
> > with previous versions I get:
> > u'<?xml version="1.0" ?><document
> > source="<string>"><paragraph>Foo\\bar</paragraph></document>'
>
> > Why am I getting \x00 in the output with the newest docutils versions?
>
> This is an intended change:
>
> Until 0.16, backslashes were removed prior to storing a Text string in the
> document tree. Since 0.16 they are stored as NULL.
>
> See the HISTORY.txt entry for 0.16:
>
>   - Keep `backslash escapes`__ in the document tree. Backslash characters in
>     text are represented by NULL characters in the ``text`` attribute of
>     Doctree nodes and removed in the writing stage by the node's
>     ``astext()`` method.
>
>   __ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#escaping-mechanism
>
> This change was implemented in order to allow escaping "active characters"
> also in transforms. The RELEASE_NOTES list one example:
>
>   [...] This allows, e.g., escaping of author-separators in
>   `bibliographic fields`__.
>
>   __ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#escaping-mechanism
>   __ docs/ref/rst/restructuredtext.html#bibliographic-fields
>
> Another usage is escaping of characters that would otherwise be transformed
> by the smartquotes__ transform.
>
> __ https://docutils.sourceforge.io/docs/user/config.html#smart-quotes
>
>
> Günter
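
For reference, the workaround asked about above (removing \x00 from the XML
output before further processing) could look roughly like the sketch below.
It uses only the Python standard library and is not an official docutils
recipe. The helper name strip_null_escapes is hypothetical, and the sample
string is the reported output with the attribute value XML-escaped. Dropping
a space or newline that directly follows a NULL is an assumption, based on
the escaping rules where escaped whitespace is removed from the output.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of docutils: a NULL placeholder, optionally
# followed by one escaped space or newline (assumption, see lead-in).
_NULL_ESCAPE = re.compile("\x00[ \n]?")


def strip_null_escapes(xml_text):
    """Return xml_text without the NULL placeholders left by backslash escapes."""
    return _NULL_ESCAPE.sub("", xml_text)


# Serialized doctree as reported for docutils >= 0.16 (attribute XML-escaped).
raw = (
    '<?xml version="1.0" ?><document source="&lt;string&gt;">'
    '<paragraph>Foo\x00\\bar</paragraph></document>'
)

# A NULL character is not allowed in XML, so the raw string fails to parse.
try:
    ET.fromstring(raw)
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# After stripping the placeholders the document parses normally.
cleaned = strip_null_escapes(raw)
root = ET.fromstring(cleaned)
paragraph_text = root.find("paragraph").text
print(repr(paragraph_text))
```

Since the HISTORY.txt entry quoted above says the NULL characters are removed
by the node's ``astext()`` method, reading text from the doctree nodes rather
than from the DOM export may make this post-processing unnecessary.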