It seems that either Reader or Value has a problem with \u0000 character interpretation in JSON string.
A sample code snippet to introduce the problem:
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <string>
#include <json/json.h>

std::string jsonStr("[\"\\u0040\\u0040\\u0000\\u0011\"]");
std::cout << "jsonStr: [" << jsonStr << "]" << std::endl;
Json::Value root; // will contain the root value after parsing.
Json::Reader reader;
if ( ! reader.parse(jsonStr, root, false) )
{
// Report to the user the failure and their locations in the document.
std::cerr << "Error: Failed to parse input string:" << std::endl
<< reader.getFormatedErrorMessages();
(void) exit(EXIT_FAILURE);
}
if ( ! root.isArray() )
{
std::cerr << "Error: Parsed JSON is not an array." << std::endl;
exit(EXIT_FAILURE);
}
Json::Value::iterator valueIter = root.begin();
while ( root.end() != valueIter )
{
std::string value = (*valueIter).asString();
std::cout << "val: [" << value << "]" << std::endl;
std::cout << "value.length(): " << value.length() << std::endl;
std::cout << "hex-dump: [";
std::string::iterator charIter = value.begin();
while ( value.end() != charIter )
{
unsigned int val = static_cast<unsigned char>(*charIter); // avoid sign extension
std::cout << std::hex << std::setw(2) << std::setfill('0') << val << " ";
charIter++;
}
std::cout << "]" << std::endl;
valueIter++;
}
Here is the output I receive in the console:
jsonStr: [["\u0040\u0040\u0000\u0011"]]
val: [@@]
value.length(): 2
hex-dump: [40 40 ]
As you can see, when the parsed JSON string containing Unicode escapes is converted to std::string, the content is wrong. It looks like Json::Value::asString() treats the \u0000 character as a string terminator, which it is not. As a result, the converted string has length 2 instead of 4. A C++ std::string object tracks the length of the data it holds, so it does not rely on a '\0' terminator the way C strings do.
I tested the reported problem in other JSON parser implementations, for example the JavaScript engine in Firefox.
var json = '["\\u0040\\u0040\\u0000\\u0011\\u0069"]';
console.log("json:", json);
var o = JSON.parse(json);
console.log("o[0].length:", o[0].length);
console.log("o[0]:", o[0]);
The output is correct:
json: ["\u0040\u0040\u0000\u0011\u0069"]
o[0].length: 5
o[0]: @@i
We've also come across this problem lately, and I've added patch 3610134
for 0.6.0-rc2, which should add support for \0 characters:
https://sourceforge.net/tracker/?func=detail&aid=3610134&group_id=144446&atid=758828
If you want to test/try it, I'm sure the feedback will be appreciated!
UTF-8 with embedded zeroes is now supported in the current repository: https://github.com/open-source-parsers/jsoncpp/