#9 JsonWriter should encode Unicode characters

open
nobody
None
5
2005-08-07
2005-08-07
No

Right now, JsonWriter simply copies every string
unchanged into the written-out string. However, it
should escape all characters outside the ASCII
range (ord(character) >= 128).

An example of the problem (\u20ac is a single
character). The following is read properly (in the
input string, the Unicode character is encoded as 6 characters):

t = json.read(r'{"\u20ac_bef":27}')

Writing this out does not encode Unicode characters back:

json.write(t)

gives

>>> json.write(t)
u'{"\u20ac_bef":27}'

which is wrong; look at the 3rd character:

>>> json.write(t)[2]
u'\u20ac'

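So, this should be encoded. Before line 289
(self._append(obj)), you should insert the following
fix (of course, the import and the fix and fixFunction
declarations only need to be executed once somewhere else):

import re
# Match any character from U+0080 to U+FFFF (outside ASCII).
fix = re.compile(u"([\u0080-\uffff])")
# Replace the matched character with its 6-character \uXXXX escape.
fixFunction = lambda x: r"\u%04x" % ord(x.group(1))
# sub() returns a new string, so the result must be assigned back.
obj = fix.sub(fixFunction, obj)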

This fixes it and should have no side effects (besides
the introduction of a regular expression).
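
The fix above can be sketched as a standalone function. This is
a minimal sketch in Python 3 syntax, not the code in json.py; the
names NON_ASCII and escape_non_ascii are hypothetical and exist
only for illustration.

```python
import re

# Hypothetical name. Matches one character from U+0080 to U+FFFF.
# Assumption: characters above U+FFFF are out of scope here and
# pass through unescaped, just as with the pattern in the report.
NON_ASCII = re.compile("([\u0080-\uffff])")


def escape_non_ascii(text):
    """Return text with each matched character replaced by \\uXXXX."""
    return NON_ASCII.sub(lambda m: "\\u%04x" % ord(m.group(1)), text)


# The euro sign is one character going in and six ASCII characters
# coming out.
escaped = escape_non_ascii('{"\u20ac_bef":27}')
print(escaped)
```

Compiling the pattern once at module level matches the remark that
the declarations only need to be initialized once.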

Discussion

  • Koen van de Sande

    A quick note: The import statement above should of course
    read "import re".

     
  • Koen van de Sande

    Another addendum: I had to add the Unicode type to line 216/217:

        if (type(key) is not types.StringType and
                type(key) is not types.UnicodeType):
            raise ReadException, "Not a valid JSON object key (should be a string): %s" % key

    Otherwise my testcase would not be read at all, even though
    it is valid JSON (I think, anyway).

    Related to this bug is the test case
    "testWriteEscapedHexCharacter", whose result is wrong. The
    test case should be:

        def testWriteEscapedHexCharacter(self):
            s = json.write(u'\u1001')
            self.assertEqual(r'"\u1001"', _removeWhitespace(s))

    (in the assertion it should be a raw string instead of a
    Unicode string).

    That should make this report complete and accurate.

     
  • Koen van de Sande

    My fixes applied to json.py

     
  • Koen van de Sande

    Corrected the testWriteEscapedHexCharacter test case

     
  • Koen van de Sande

    All line numbers are against the 3.2 version.
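
The corrected test case discussed above can be tried in isolation
with the sketch below. It is written in Python 3 syntax and does
not use json.py: write_string is a hypothetical stand-in for
json.write applied to a single string, and it deliberately ignores
the escaping of quotes and backslashes.

```python
import re
import unittest

# Hypothetical stand-in for json.write on a string value.
# Assumption: only non-ASCII escaping is modelled here.
_NON_ASCII = re.compile("([\u0080-\uffff])")


def write_string(value):
    escaped = _NON_ASCII.sub(lambda m: "\\u%04x" % ord(m.group(1)), value)
    return '"%s"' % escaped


class WriteEscapedHexCharacterTest(unittest.TestCase):
    def test_write_escaped_hex_character(self):
        s = write_string("\u1001")
        # The expected value is a raw string: a backslash, "u" and
        # four hex digits between the quotes, not one character.
        self.assertEqual(r'"\u1001"', s)
        self.assertEqual(8, len(s))

    def test_ascii_is_unchanged(self):
        self.assertEqual('"abc"', write_string("abc"))
```

Saved as a module, the tests can be run with python -m unittest.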

     
