Right now, JsonWriter simply copies every string
verbatim into the written-out string. However, it
should escape all characters outside the ASCII
range (ord(character) >= 128).
An example of the problem (\u20ac is a single
character). The following reads properly (in the input
string, the Unicode character is encoded as 6 characters):
t = json.read(r'{"\u20ac_bef":27}')
Writing this back out does not re-escape the Unicode character:
>>> json.write(t)
u'{"\u20ac_bef":27}'
which is wrong; check the 3rd character:
>>> json.write(t)[2]
u'\u20ac'
So this character should be escaped. Before line 289
(self._append(obj)), insert the following fix
(of course, fix and fixFunction only need to be
defined once, somewhere else):
import
fix = re.compile(u"([\u0080-\uffff])")
fixFunction = lambda x: r"\u%04x" % ord(x.group(1))
# sub() returns a new string, so the result must be assigned back
obj = fix.sub(fixFunction, obj)
This fixes it and should have no side effects (besides
the introduction of a regular expression).
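For reference, here is a minimal standalone sketch of the same escaping idea, written so that it also runs on Python 3. The names _NON_ASCII and escape_non_ascii are hypothetical and are not part of json.py. Note that the character range only covers U+0080 to U+FFFF, so characters outside the Basic Multilingual Plane are left untouched here:

```python
import re

# Hypothetical helper names; not part of json.py.
# Matches any single character from U+0080 through U+FFFF.
_NON_ASCII = re.compile(u"([\u0080-\uffff])")

def escape_non_ascii(text):
    """Replace each matched character with its 6-character \\uXXXX escape."""
    return _NON_ASCII.sub(lambda m: "\\u%04x" % ord(m.group(1)), text)

escaped = escape_non_ascii(u'{"\u20ac_bef":27}')
print(escaped)  # the euro sign is now spelled out as backslash, u, 2, 0, a, c
```

Pure ASCII input passes through unchanged, since nothing in it matches the pattern.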
Logged In: YES
user_id=270334
A quick note: The import statement above should of course
read "import re".
Logged In: YES
user_id=270334
Another addendum: I had to add the Unicode type to line 216/217:
if type(key) is not types.StringType and \
   type(key) is not types.UnicodeType:
    raise ReadException, "Not a valid JSON object key (should be a string): %s" % key
Otherwise my testcase would not be read at all, even though
it is valid JSON (I think, anyway).
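The key check above could also be expressed with isinstance, which accepts both string types in one test. This is only a sketch: is_valid_key and string_types are hypothetical names, not part of json.py, and the try/except is there so the snippet runs on Python 2 and Python 3 alike:

```python
try:
    # Python 2: accept both byte strings and Unicode strings.
    string_types = (str, unicode)
except NameError:
    # Python 3: str is already a Unicode string.
    string_types = (str,)

def is_valid_key(key):
    """Return True if key is a string type usable as a JSON object key."""
    return isinstance(key, string_types)

print(is_valid_key(u"\u20ac_bef"))  # a Unicode key is accepted
print(is_valid_key(27))             # a number is rejected
```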
Related to this bug is the test case
"testWriteEscapedHexCharacter", whose result is wrong. The
test case should be:
def testWriteEscapedHexCharacter(self):
    s = json.write(u'\u1001')
    self.assertEqual(r'"\u1001"', _removeWhitespace(s))
(in the assertion it should be a raw string instead of a
Unicode string).
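To see why the raw string matters in that assertion, compare the two literals directly. This small sketch (variable names are illustrative only) shows that the raw string keeps the escape as six separate characters, while the Unicode literal collapses it into one:

```python
# Raw string: the backslash escape is NOT interpreted.
raw = r'"\u1001"'
# Unicode literal: \u1001 becomes a single character.
cooked = u'"\u1001"'

print(len(raw))       # 8: two quotes plus the 6-character escape
print(len(cooked))    # 3: two quotes plus one character
print(raw == cooked)  # False
```

So the expected output of a correctly escaping writer is the 8-character form, which is what the raw string expresses.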
That should make this report complete and accurate.
My fixes applied to json.py
Corrected the testWriteEscapedHexCharacter test case
Logged In: YES
user_id=270334
All line numbers are against the 3.2 version.