[Docutils-develop] [ docutils-Bugs-3546533 ] unicode error with date directive


Bugs item #3546533, was opened at 2012-07-20 17:43
Message generated for change (Tracker Item Submitted) made by abadger1999
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3546533&group_id=38414

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Toshio Kuratomi (abadger1999)
Assigned to: Nobody/Anonymous (nobody)
Summary: unicode error with date directive

Initial Comment:
I received this bug report against our package in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=786867

Looking into it: under Python 2 the date directive passes a byte string to nodes.Text, and that class then tries to convert the byte string with unicode(). Because the output of time.strftime() is localized, the date directive can inject dates containing non-ASCII characters, and the implicit ASCII conversion then fails. So we need to fix that.

The following patch seems to work:
Index: docutils-0.9.1/docutils/parsers/rst/directives/misc.py
===================================================================

--- docutils-0.9.1.orig/docutils/parsers/rst/directives/misc.py
+++ docutils-0.9.1/docutils/parsers/rst/directives/misc.py
@@ -10,6 +10,7 @@ import sys
 import os.path
 import re
 import time
+import locale
 from docutils import io, nodes, statemachine, utils
 from docutils.error_reporting import SafeString, ErrorString
 from docutils.parsers.rst import Directive, convert_directive_function
@@ -474,6 +475,17 @@ class Date(Directive):
                 'a substitution definition.' % self.name)
         format = '\n'.join(self.content) or '%Y-%m-%d'
         text = time.strftime(format)
+        if sys.version_info < (3, 0):
+            try:
+                text = unicode(text, locale.getpreferredencoding())
+            except UnicodeError:
+                try:
+                    text = unicode(text, 'utf-8')
+                except UnicodeError:
+                    # Fallback to something that can decode all bytes to
+                    # something.  Alternative fallback would be to decode
+                    # with errors='replace'
+                    text = unicode(text, 'latin-1')
         return [nodes.Text(text)]
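
The fallback chain in the patch above can be sketched as a standalone helper. This is only an illustration, not part of docutils: the function name decode_with_fallback and its preferred argument are hypothetical, and the version check is replaced by a type check so the same code runs on Python 2 and Python 3.

```python
import locale


def decode_with_fallback(raw, preferred=None):
    """Return text for `raw`, trying progressively more lenient encodings."""
    if not isinstance(raw, bytes):
        # Already text (always the case for time.strftime() on Python 3).
        return raw
    # Hypothetical `preferred` argument: defaults to the locale's encoding.
    candidates = [preferred or locale.getpreferredencoding(), 'utf-8']
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # latin-1 maps every byte to a code point, so this step cannot fail.
    return raw.decode('latin-1')


# UTF-8 bytes are recovered by the second candidate.
print(repr(decode_with_fallback(b'f\xc3\xa9vrier', 'ascii')))
# Bytes that are not valid UTF-8 fall through to latin-1.
print(repr(decode_with_fallback(b'f\xe9vrier', 'ascii')))
```

As in the patch, the latin-1 step trades correctness for robustness: it never raises, but it may produce the wrong characters if the bytes were in some other encoding.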

Out of curiosity, I also checked how often nodes.Text() receives a byte str instead of unicode, using the following patch:

Index: docutils-0.9.1/docutils/nodes.py
===================================================================
--- docutils-0.9.1.orig/docutils/nodes.py
+++ docutils-0.9.1/docutils/nodes.py
@@ -329,6 +329,12 @@ class Text(Node, reprunicode):
     else:
         def __new__(cls, data, rawsource=None):
             """Prevent the rawsource argument from propagating to str."""
+            # Python2 is more lenient about mixing str and unicode than
+            # python3 mixing bytes and str but the danger is that our tests
+            # will give only ascii values to this function and be fine but in
+            # the real world someone will give it non-ascii and then crash
+            if isinstance(data, str):
+                raise TypeError('expecting unicode data, not str')
             return reprunicode.__new__(cls, data)
 
     def __init__(self, data, rawsource=''):

The results were quite bad:

[...]
  File "/srv/git/python-docutils/docutils-0.9.1/docutils/nodes.py", line 337, in __new__
    raise TypeError('expecting unicode data, not str')
TypeError: expecting unicode data, not str

----------------------------------------------------------------------
Ran 1192 tests in 8.961s

FAILED (errors=802)
[...]

These are all potential failure points -- whether they can fail in practice depends on whether the data being sent in can contain non-ASCII values or not.
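
A minimal sketch of why ASCII-only test data hides this class of bug: when Python 2 mixes str and unicode, it decodes the byte string with the default encoding, which is ASCII. The helper name survives_implicit_coercion is hypothetical and only mimics that implicit step with an explicit decode, so it also runs on Python 3.

```python
def survives_implicit_coercion(data):
    """Return True if `data` (bytes) would pass an implicit ASCII decode."""
    try:
        # Stand-in for Python 2's implicit str -> unicode conversion.
        data.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True


# A default-format date is pure ASCII, so tests using it pass.
print(survives_implicit_coercion(b'2012-07-20'))
# A localized month name encoded as UTF-8 is not, so real-world input can crash.
print(survives_implicit_coercion(b'ao\xc3\xbbt 2012'))
```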
