new json_string doesn't escape certain characters that it should
I have a string that contains the following text: "• The Penguins continued their sloppy play". When applying the FreeMarker string built-in ?json_string, the • character should be encoded as \u2022, but it is not.
I have also seen this with the character — which should be encoded as \u2014, but it is not.
I am running the latest version of FreeMarker, 2.3.19.
For reference here is the relevant portion of the documentation provided on json_string:
All characters under UCS code point 0x20 will be escaped. When they have no dedicated escape sequence in JSON (like \n, \t, etc.), they will be replaced with a UNICODE escape (\uXXXX).
Yes, and the same should be fixed in ?js_string as well. I will try to fix it in 2.3.20. (I hope it will be released in a few days...)
Will the fix also be applied to j_string?
Yes, of course.
Actually, I haven't yet found anything that says that u2022 should be escaped... However, I know from experience that JavaScript parsers used to die on some high-code characters. I'm still looking for references that tell which characters those are. Both the JSON RFC and the ECMAScript 5 spec allow these unescaped.
Last edit: Dániel Dékány 2013-06-10
So, I did find some bugs. u007f-u009f are UNICODE "control" characters, so possibly they should be escaped in JSON (though not according to RFC 4627). Also, u2028 and u2029 should be escaped, because in JavaScript (not in JSON) they count as line terminators, so they break string literals. So far, that's all I will fix.
What JSON parser breaks on u2022 or on u2014 (and why)?
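To make the rules in this comment concrete, here is a minimal Java sketch of an escaper that follows them. It is not FreeMarker's actual implementation; the class name `JsonEscapeSketch` and the method `escape` are hypothetical, and the exact set of escaped ranges is an assumption taken from the comment above (below 0x20, 0x7F-0x9F, 0x2028 and 0x2029).

```java
final class JsonEscapeSketch {

    /** Hypothetical helper that escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                // Characters that have a dedicated short escape sequence in JSON.
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                default:
                    // Assumed ranges: C0 controls, DEL and C1 controls, and the two
                    // code points that JavaScript treats as line terminators.
                    boolean needsEscape = c < 0x20
                            || (c >= 0x7F && c <= 0x9F)
                            || c == 0x2028
                            || c == 0x2029;
                    if (needsEscape) {
                        // Six-character escape with four uppercase hex digits.
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        // Everything else, including the bullet (0x2022), passes through.
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u2022 The Penguins"));
        System.out.println(escape("a" + (char) 0x2028 + "b" + (char) 0x85 + "c"));
    }
}
```

With these rules the bullet character from the original report is left as-is, while the line separator and the C1 control character come out as \uXXXX escapes.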
BTW, I haven't found a JSON specification anywhere. It has two home pages dedicated to it, but no specification. I have found an RFC whose grammar tells me that 0x7F-0x9F need not be escaped, while "/" must be (everybody else says that escaping "/" is optional, and is only a guard against the "</script>" hack). json.org says "control character" must be escaped, but doesn't say what counts as a control character...
Last edit: Dániel Dékány 2013-06-10
We found this when using freemarker to produce some data to send to a
customer of ours and they weren't accepting the u2022 character. When I
read the documentation on what should be escaped and what shouldn't it
wasn't clear exactly what was meant. I believe it says that characters
under u20 should be escaped. I'm not sure if that is inclusive and I wasn't
sure if that meant that characters that start with u20 (which the 2 I
reported do) should be included in that.
Since they are not meant to be escaped then I guess this is not really an
issue, although it does sound as though the work was not completely wasted.
Thanks for looking into this and for communicating what you found with me.
On Mon, Jun 10, 2013 at 11:17 AM, "Dániel Dékány" ddekany@users.sf.net wrote:
0x2022 and the like are clearly not "under UCS code point 0x20". Could it be that the client uses the wrong charset to decode the received data before JSON parsing? For example, if you send the data as UTF-8 but it is then decoded with some 8-bit charset, it will be seen as illegal JSON, because it will appear to contain unescaped control characters. The UTF-8 encoding of u2022 is E2 80 A2, and if the client sees that as 3 characters, then 80 is an unescaped control character.
Last edit: Dániel Dékány 2013-06-10
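The charset mismatch described in the comment above can be reproduced with a short Java sketch. The thread does not say which 8-bit charset the client might use; ISO-8859-1 is assumed here purely for illustration, and the class name `CharsetMismatchSketch` and the method `misdecode` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class CharsetMismatchSketch {

    /** Encodes the text as UTF-8, then decodes the bytes with the wrong (8-bit) charset. */
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // Assumption: the receiving side wrongly treats the bytes as ISO-8859-1.
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String wrong = misdecode("\u2022");
        // Prints one code point per line: U+00E2, U+0080, U+00A2.
        for (char c : wrong.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The middle character, U+0080, is a C1 control character, so a strict parser on the receiving side could reject the input even though the JSON was valid as sent. Other 8-bit charsets map byte 80 differently, so the exact symptom depends on the client.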
Fixed the escaping issues that I have found (see earlier). I don't think u2022 etc. is a problem so it's not escaped. Will be released with 2.3.20.
Fixed in 2.3.20.