#47 a problem with \u0000 character

Milestone: 0.5.0
Status: closed-fixed
Labels: Reader (16)
Priority: 7
Updated: 2015-03-06
Created: 2012-07-06
Creator: wojciech
Private: No

It seems that either Reader or Value has a problem interpreting the \u0000 character in a JSON string.

A sample code snippet that reproduces the problem:

#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <string>
#include <json/json.h>

int main()
{
    std::string jsonStr("[\"\\u0040\\u0040\\u0000\\u0011\"]");

    std::cout << "jsonStr: [" << jsonStr << "]" << std::endl;

    Json::Value root; // will contain the root value after parsing.
    Json::Reader reader;

    // Parse the string defined above (comments are not collected).
    if ( ! reader.parse(jsonStr, root, false) )
    {
        // Report the failure and its location in the document to the user.
        std::cerr << "Error: Failed to parse input string:" << std::endl
                  << reader.getFormatedErrorMessages();

        exit(EXIT_FAILURE);
    }

    if ( ! root.isArray() )
    {
        std::cerr << "Error: Parsed JSON is not an array." << std::endl;
        exit(EXIT_FAILURE);
    }

    Json::Value::iterator valueIter = root.begin();
    while ( root.end() != valueIter )
    {
        std::string value = (*valueIter).asString();

        std::cout << "val: [" << value << "]" << std::endl;

        std::cout << "value.length(): " << value.length() << std::endl;

        std::cout << "hex-string: [";
        std::string::iterator charIter = value.begin();
        while ( value.end() != charIter )
        {
            // Go through unsigned char so bytes >= 0x80 are not sign-extended.
            unsigned int val = static_cast<unsigned char>(*charIter);

            std::cout << std::hex << std::setw(2) << std::setfill('0') << val << " ";

            ++charIter;
        }
        // std::hex is sticky; switch back to decimal for the next length line.
        std::cout << std::dec << "]" << std::endl;

        ++valueIter;
    }

    return EXIT_SUCCESS;
}

Here is the output I receive in the console:

jsonStr: [["\u0040\u0040\u0000\u0011"]]
val: [@@]
value.length(): 2
hex-string: [40 40 ]

As you can see, the parsed JSON string, which contains Unicode escapes, has invalid content once it is converted to std::string. It looks like Json::Value::asString() treats the \u0000 character as a string terminator, which is incorrect. As a result, the converted string has length 2 instead of 4. A C++ std::string object stores its own length, so it does not rely on a \0 terminator the way C strings do.
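
To illustrate the point about std::string and the \0 terminator, here is a minimal sketch in plain standard C++ (no JsonCpp involved). The names fromCString, fromBuffer and kDecoded are hypothetical and exist only for this illustration. The sketch assumes the truncation happens because the decoded bytes pass through a bare const char* somewhere; I have not confirmed that this is what the library does internally.

```cpp
#include <cstddef>
#include <string>

// The four decoded bytes from the report: '@', '@', NUL, 0x11.
const char kDecoded[] = { 0x40, 0x40, 0x00, 0x11 };

// Hypothetical helper: builds the string from a bare const char*.
// The constructor scans for the first NUL byte and stops there.
std::string fromCString(const char* data)
{
    return std::string(data);
}

// Hypothetical helper: builds the string from a pointer plus an explicit
// length, so every byte is kept, including embedded NUL bytes.
std::string fromBuffer(const char* data, std::size_t length)
{
    return std::string(data, length);
}
```

Calling fromCString(kDecoded) gives a string of length 2, which matches the output above, while fromBuffer(kDecoded, sizeof(kDecoded)) gives the expected length of 4.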

I tested the same input in another JSON parser implementation, the JavaScript engine in Firefox (Gecko).

var json = '["\\u0040\\u0040\\u0000\\u0011\\u0069"]';

console.log("json:", json);

var o = JSON.parse(json);

console.log("o[0].length:", o[0].length);
console.log("o[0]:", o[0]);

The output is correct:

json: ["\u0040\u0040\u0000\u0011\u0069"]
o[0].length: 5
o[0]: @@i
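
For comparison, the same length can be obtained in C++ when the decoded characters are appended with length-aware std::string operations. The following is only a rough sketch of that idea, not JsonCpp code: hexValue, appendUtf8 and decodeUnicodeEscapes are hypothetical helpers, and the decoder assumes the input contains only plain characters and \uXXXX escapes in the Basic Multilingual Plane (no surrogate pairs, no other escape sequences).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of one hexadecimal digit.
unsigned int hexValue(char c)
{
    if (c >= '0' && c <= '9') return static_cast<unsigned int>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned int>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned int>(c - 'A' + 10);
    throw std::invalid_argument("bad hex digit in \\u escape");
}

// Hypothetical helper: append a BMP code point to out as UTF-8.
// push_back is length-aware, so U+0000 is stored like any other byte.
void appendUtf8(std::string& out, unsigned int cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: decode the \uXXXX escapes in the body of a JSON string.
std::string decodeUnicodeEscapes(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 5 < in.size() && in[i + 1] == 'u') {
            unsigned int cp = 0;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                cp = cp * 16 + hexValue(in[k]);
            }
            appendUtf8(out, cp);
            i += 5; // skip the rest of the six-character escape
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

With the input from the JavaScript test, decodeUnicodeEscapes("\\u0040\\u0040\\u0000\\u0011\\u0069") returns a std::string of length 5 whose third byte is \0.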

Discussion

  • wojciech
    2012-07-06

    • labels: 1355415 --> Reader
    • priority: 5 --> 7
    • assigned_to: nobody --> blep
     
    • status: open --> closed-fixed
    • Group: --> 0.5.0