
#47 a problem with \u0000 character

0.5.0
closed-fixed
Reader (16)
7
2015-03-06
2012-07-06
wojciech
No

It seems that either Reader or Value has a problem interpreting the \u0000 character in a JSON string.

A sample code snippet to reproduce the problem:

#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <string>
#include <json/json.h>

int main()
{
    std::string jsonStr("[\"\\u0040\\u0040\\u0000\\u0011\"]");

    std::cout << "jsonStr: [" << jsonStr << "]" << std::endl;

    Json::Value root; // will contain the root value after parsing
    Json::Reader reader;

    // Parse the string declared above (the snippet originally passed an undeclared `input`).
    if ( ! reader.parse(jsonStr, root, false) )
    {
        // Report the failure and its location in the document to the user.
        std::cerr << "Error: Failed to parse input string:" << std::endl
                  << reader.getFormatedErrorMessages();

        exit(EXIT_FAILURE);
    }

    if ( ! root.isArray() )
    {
        std::cerr << "Error: Parsed JSON is not an array." << std::endl;
        exit(EXIT_FAILURE);
    }

    Json::Value::iterator valueIter = root.begin();
    while ( root.end() != valueIter )
    {
        std::string value = (*valueIter).asString();

        std::cout << "val: [" << value << "]" << std::endl;

        std::cout << "value.length(): " << value.length() << std::endl;

        // Dump every byte of the string as two hex digits.
        std::cout << "hex-dump: [";
        std::string::iterator charIter = value.begin();
        while ( value.end() != charIter )
        {
            // Go through unsigned char so bytes above 0x7f are not sign-extended.
            unsigned int val = static_cast<unsigned char>(*charIter);

            std::cout << std::hex << std::setw(2) << std::setfill('0') << val << " ";

            ++charIter;
        }
        // Restore decimal output for the next value.length() line.
        std::cout << std::dec << "]" << std::endl;

        ++valueIter;
    }

    return 0;
}

Here is the output I receive in the console:

jsonStr: [["\u0040\u0040\u0000\u0011"]]
val: [@@]
value.length(): 2
hex-dump: [40 40 ]

As you can see, the parsed JSON string containing Unicode escapes has invalid content once it is converted to std::string. It looks like Json::Value::asString() treats the \u0000 character as a string terminator, which is incorrect. As a result, the converted string has length 2 instead of 4. A C++ std::string object stores its own length, so, unlike a C string, it does not rely on the \0 character as a terminator.

I tested the same input in other JSON parser implementations, for example the JavaScript engine in Firefox (Gecko):
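To illustrate the std::string point, here is a minimal sketch of one way such a truncation can arise. It is an assumption about the general mechanism, not a confirmed description of what jsoncpp does internally. The helper names fromCString and fromBuffer are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: uses the const char* constructor, which scans for
// the first '\0' and therefore drops everything from that byte onward.
std::string fromCString(const char* data)
{
    return std::string(data);
}

// Hypothetical helper: uses the (pointer, length) constructor, which copies
// exactly `length` bytes, embedded '\0' bytes included.
std::string fromBuffer(const char* data, std::size_t length)
{
    return std::string(data, length);
}
```

For the four decoded bytes '@', '@', '\0', '\x11', fromCString yields a string of length 2, while fromBuffer called with a length of 4 yields a string of length 4. Any code path that hands the decoded text around as a bare const char* without its length would show the behaviour reported here.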

var json = '["\\u0040\\u0040\\u0000\\u0011\\u0069"]';

console.log("json:", json);

var o = JSON.parse(json);

console.log("o[0].length:", o[0].length);
console.log("o[0]:", o[0]);

The output is correct:

json: ["\u0040\u0040\u0000\u0011\u0069"]
o[0].length: 5
o[0]: @@i

Discussion

  • wojciech

    wojciech - 2012-07-06
    • labels: 1355415 --> Reader
    • priority: 5 --> 7
    • assigned_to: nobody --> blep
     
  • Stefan Wehner

    Stefan Wehner - 2013-04-10

    We've also come across this problem lately and I've added the patch 3610134
    for 0.6.0-rc2 which should add support for \0 characters.

    https://sourceforge.net/tracker/?func=detail&aid=3610134&group_id=144446&atid=758828

    If you want to test/try it, I'm sure the feedback will be appreciated!

     
  • Christopher Dunn

    UTF-8 with embedded zeroes is supported now at: https://github.com/open-source-parsers/jsoncpp/

     
  • Christopher Dunn

    • status: open --> closed-fixed
    • Group: --> 0.5.0
     
