From: SourceForge.net <no...@so...> - 2012-07-21 00:43:41
Bugs item #3546533, was opened at 2012-07-20 17:43
Message generated for change (Tracker Item Submitted) made by abadger1999
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3546533&group_id=38414

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Toshio Kuratomi (abadger1999)
Assigned to: Nobody/Anonymous (nobody)
Summary: unicode error with date directive

Initial Comment:
I received this bug report against our package in Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=786867

Taking a look, the date directive is passing in a byte string in python2 but
the class is trying to feed that byte string to a unicode() method. When
localized, the date directive can inject dates with non-ascii characters. So
we need to fix that. Following patch seems to work:

Index: docutils-0.9.1/docutils/parsers/rst/directives/misc.py
===================================================================
--- docutils-0.9.1.orig/docutils/parsers/rst/directives/misc.py
+++ docutils-0.9.1/docutils/parsers/rst/directives/misc.py
@@ -10,6 +10,7 @@ import sys
 import os.path
 import re
 import time
+import locale
 from docutils import io, nodes, statemachine, utils
 from docutils.error_reporting import SafeString, ErrorString
 from docutils.parsers.rst import Directive, convert_directive_function
@@ -474,6 +475,17 @@ class Date(Directive):
                 'a substitution definition.' % self.name)
         format = '\n'.join(self.content) or '%Y-%m-%d'
         text = time.strftime(format)
+        if sys.version_info< (3, 0):
+            try:
+                text = unicode(text, locale.getpreferredencoding())
+            except UnicodeError:
+                try:
+                    text = unicode(text, 'utf-8')
+                except UnicodeError:
+                    # Fallback to something that can decode all bytes to
+                    # something. Alternative fallback would be to decode
+                    # with errors='replace'
+                    text = unicode(text, 'latin-1')
         return [nodes.Text(text)]

Note that out of curiosity, I took a look at how often nodes.Text() is getting
byte str type instead of unicode type using the following patch:

Index: docutils-0.9.1/docutils/nodes.py
===================================================================
--- docutils-0.9.1.orig/docutils/nodes.py
+++ docutils-0.9.1/docutils/nodes.py
@@ -329,6 +329,12 @@ class Text(Node, reprunicode):
     else:
         def __new__(cls, data, rawsource=None):
             """Prevent the rawsource argument from propagating to str."""
+            # Python2 is more lenient about mixing str and unicode than
+            # python3 mixing bytes and str but the danger is that our tests
+            # will give only ascii values to this function and be fine but in
+            # the real world someone will give it non-ascii and then crash
+            if isinstance(data, str):
+                raise TypeError('expecting unicode data, not str')
             return reprunicode.__new__(cls, data)

     def __init__(self, data, rawsource=''):

The results were quite bad:

[...]
  File "/srv/git/python-docutils/docutils-0.9.1/docutils/nodes.py", line 337, in __new__
    raise TypeError('expecting unicode data, not str')
TypeError: expecting unicode data, not str

----------------------------------------------------------------------
Ran 1192 tests in 8.961s

FAILED (errors=802)
[...]

These are all potential failure points -- whether they can fail in practice
depends on whether the data being sent in can contain non-ASCII values or not.
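
For illustration only, the decoding fallback used in the first patch (preferred
locale encoding, then UTF-8, then Latin-1) can be written as a standalone
helper. This is a sketch in Python 3 syntax, not the code proposed for
docutils: the name decode_with_fallback and its preferred parameter are made up
for the example, and catching LookupError for an unknown encoding name is an
addition that the patch does not make.

```python
import locale


def decode_with_fallback(raw, preferred=None):
    """Decode bytes using: preferred encoding, then UTF-8, then Latin-1.

    Hypothetical helper mirroring the order of attempts in the patch.
    """
    if isinstance(raw, str):
        # Already text; nothing to decode.
        return raw
    if preferred is None:
        # Same source of the "preferred" encoding as in the patch.
        preferred = locale.getpreferredencoding()
    for encoding in (preferred, "utf-8"):
        try:
            return raw.decode(encoding)
        except (UnicodeError, LookupError):
            # UnicodeError: the bytes are not valid in this encoding.
            # LookupError: the encoding name is unknown (assumption: we
            # would rather keep trying than fail here).
            continue
    # Latin-1 maps every byte value to a code point, so this cannot fail,
    # although the resulting text may be wrong for other encodings.
    return raw.decode("latin-1")


if __name__ == "__main__":
    # A localized month name encoded as UTF-8, decoded under an ASCII locale.
    print(decode_with_fallback("août".encode("utf-8"), preferred="ascii"))
```

The last step trades correctness for robustness: Latin-1 never raises, but it
can silently produce the wrong characters, which is why the patch comment
mentions decoding with errors='replace' as an alternative.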