#395 Nameless sources are stored in document["source"] as the string "None"

None
closed-fixed
nobody
None
2021-04-03
2020-07-11
No

When publishing from a file-like object that lacks a name attribute (e.g., an io.StringIO instance), the resulting document object has its "source" attribute set to the string "None" rather than to the value None.

You can test this for yourself by running the following code:

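from io import StringIO

from docutils.core import publish_doctree
from docutils.io import FileInput

# A StringIO object has no "name" attribute, and no source_path is given.
document = publish_doctree(
    source=StringIO('This is test text.'),
    source_path=None,
    source_class=FileInput,
)
# On affected versions this prints 'None' (a string), not None.
print(repr(document["source"]))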

This behavior can be fixed by wrapping the line source_path = decode_path(source_path) in new_document() in docutils/utils/__init__.py (line 444) inside an if source_path is not None: check.
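
As a rough illustration of the proposed guard, here is a minimal, self-contained sketch. It does not use the Docutils source: decode_path() below is a stand-in that only mimics the reported symptom (None turning into the string "None"), and resolve_source_path() is a hypothetical helper name standing for the relevant line in new_document().

```python
def decode_path(path):
    # Stand-in for docutils.utils.decode_path(), NOT the real implementation.
    # It only reproduces the reported symptom: None becomes the string "None".
    if isinstance(path, bytes):
        return path.decode('utf-8', 'replace')
    return str(path)


def resolve_source_path(source_path):
    # Hypothetical helper illustrating the proposed guard:
    # decode only when a path was actually given, so None stays None.
    if source_path is not None:
        source_path = decode_path(source_path)
    return source_path


print(repr(decode_path(None)))          # unguarded call: the string "None"
print(repr(resolve_source_path(None)))  # guarded call: the value None
```

With the guard in place, a real path is still decoded as before; only the missing-path case changes.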

Why this matters to me: I'm writing a command & library for rendering reStructuredText and splitting the result into a more powerful set of fields than offered by Docutils' built-in templating. One of these fields is the name of the document source, and having that field be a string "None" when it should be None is just wrong.

Discussion

  • John Thorvald Wodder II

    Although the tests pass with the suggested change, there are a couple more edits that will need to be made to fully support it:

    • The visit_document() functions in docutils/writers/_html_base.py and docutils/writers/html5_polyglot/__init__.py will need to guard against node['source'] being None before passing it to os.path.basename().
    • The self.encode(node['source']) call in visit_system_message() in docutils/writers/latex2e/__init__.py will need to be edited.
    • Possibly other edits? These are all I found with a quick search.
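
The kind of guard described in the list above might look roughly like the following sketch. It is only an illustration under stated assumptions: source_basename() is a hypothetical helper, not a function in the Docutils writers, and the real visit_document() methods do more than this. The guard is needed because os.path.basename(None) raises a TypeError.

```python
import os.path


def source_basename(source):
    # Hypothetical helper, not part of Docutils: return the base name of
    # the document source, or None when the document has no source.
    if source is None:
        return None
    return os.path.basename(source)


print(source_basename('/tmp/doc.rst'))
print(source_basename(None))
```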
     
  • Günter Milde

    Günter Milde - 2020-07-14

    The following patch changes the representation of missing source from
    "None" to the empty string "":

     diff --git a/docutils/docutils/utils/__init__.py b/docutils/docutils/utils/__init__.py
     index 519ee3dfe..425eb29e7 100644
     --- a/docutils/docutils/utils/__init__.py
     +++ b/docutils/docutils/utils/__init__.py
     @@ -350,7 +350,10 @@ def decode_path(path):
          try:
              path = path.decode(sys.getfilesystemencoding(), 'strict')
          except AttributeError: # default value None has no decode method
     -        return nodes.reprunicode(path)
     +        if not path:
     +            return nodes.reprunicode('')
     +        raise ValueError('`path` value must be a String or ``None``, not %r'
     +                         %path)
          except UnicodeDecodeError:
              try:
                  path = path.decode('utf-8', 'strict')
    

    This keeps the promise that utils.decode_path() returns a
    nodes.reprunicode object and would work with the HTML writer's
    visit_document().

    Alternatively, one may consider making '' a node's "source" default value.
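
To see why the empty string avoids the extra writer changes, the patched behavior can be approximated with the sketch below. This is a simplified stand-in, not the Docutils code: decode_path_sketch() is a hypothetical name, it returns plain str instead of nodes.reprunicode, it assumes that text paths pass through unchanged, and its UnicodeDecodeError fallback is shortened.

```python
import os.path
import sys


def decode_path_sketch(path):
    # Simplified stand-in for the patched decode_path(); not the Docutils source.
    if isinstance(path, str):
        # Assumption: text paths are returned unchanged.
        return path
    try:
        return path.decode(sys.getfilesystemencoding(), 'strict')
    except AttributeError:
        # None has no decode() method; a missing source is reported as ''.
        if not path:
            return ''
        raise ValueError('`path` value must be a String or ``None``, not %r'
                         % (path,))
    except UnicodeDecodeError:
        # Shortened fallback (the real function tries more than this).
        return path.decode('utf-8', 'replace')


# An empty string can be passed to os.path.basename() without a None check.
print(repr(os.path.basename(decode_path_sketch(None))))
```

Because os.path.basename('') simply returns '', code that passes the "source" value to it keeps working without a separate check for None.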

     
  • John Thorvald Wodder II

    That works too, thanks!

     
  • Günter Milde

    Günter Milde - 2020-07-14
    • status: open --> open-fixed
     
  • Günter Milde

    Günter Milde - 2020-07-14

    Fixed in r8527. Thank you for reporting and analysis.

     
  • Günter Milde

    Günter Milde - 2021-04-03
    • status: open-fixed --> closed-fixed
     
  • Günter Milde

    Günter Milde - 2021-04-03

    Fixed in Docutils 0.17.
    Thanks again for the report.

     
