C++ JSON parser (Now in GitHub)

#47 a problem with \u0000 character

Milestone: 0.5.0

Status: closed-fixed

Owner: Baptiste Lepilleur

Labels: Reader (16)

Priority: 7

Updated: 2015-03-06

Created: 2012-07-06

Creator: wojciech

Private: No

It seems that either Reader or Value mishandles the \u0000 character in a JSON string.

Here is a sample code snippet that reproduces the problem:

#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <string>
#include <json/json.h>

int main()
{
    // JSON array holding one string: '@', '@', U+0000, U+0011.
    std::string jsonStr("[\"\\u0040\\u0040\\u0000\\u0011\"]");

    std::cout << "jsonStr: [" << jsonStr << "]" << std::endl;

    Json::Value root; // will contain the root value after parsing
    Json::Reader reader;

    if ( ! reader.parse(jsonStr, root, false) )
    {
        // Report the failure and its location in the document to the user.
        std::cerr << "Error: Failed to parse input string:" << std::endl
                  << reader.getFormatedErrorMessages();

        exit(EXIT_FAILURE);
    }

    if ( ! root.isArray() )
    {
        std::cerr << "Error: Parsed JSON is not an array." << std::endl;
        exit(EXIT_FAILURE);
    }

    Json::Value::iterator valueIter = root.begin();
    while ( root.end() != valueIter )
    {
        std::string value = (*valueIter).asString();

        std::cout << "val: [" << value << "]" << std::endl;

        // std::hex below is sticky, so switch back to decimal for the length.
        std::cout << "value.length(): " << std::dec << value.length() << std::endl;

        std::cout << "hex-dump: [";
        std::string::iterator charIter = value.begin();
        while ( value.end() != charIter )
        {
            // Go through unsigned char so bytes above 0x7f are not sign-extended.
            unsigned int val = static_cast<unsigned char>(*charIter);

            std::cout << std::hex << std::setw(2) << std::setfill('0') << val << " ";

            ++charIter;
        }
        std::cout << "]" << std::endl;

        ++valueIter;
    }

    return EXIT_SUCCESS;
}

Here is the output I receive in the console:

jsonStr: [["\u0040\u0040\u0000\u0011"]]
val: [@@]
value.length(): 2
hex-dump: [40 40 ]

As you can see, the parsed JSON string has the wrong content once it is converted to std::string. It looks like Json::Value::asString() treats the \u0000 character as a string terminator, which is incorrect. As a result, the converted string has length 2 instead of 4. A C++ std::string object stores its own length, so it does not rely on a \0 terminator the way C strings do.
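To illustrate the point about std::string and \0, here is a small standalone sketch that does not use the library at all. The helper names fromCString and fromBuffer and the kDecoded array are hypothetical and exist only for this example. It shows that building a std::string from a bare char pointer stops at the first \0, while passing an explicit length keeps all four characters. Whether this is exactly what happens inside asString() is an assumption on my part, but it would produce the truncation seen above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string the C way.
// The constructor scans for the first '\0' and stops there.
std::string fromCString(const char* data)
{
    assert(data != nullptr);
    return std::string(data);
}

// Hypothetical helper: builds a std::string from an explicit length,
// so embedded '\0' bytes are copied like any other byte.
std::string fromBuffer(const char* data, std::size_t length)
{
    assert(data != nullptr);
    return std::string(data, length);
}

// The four decoded characters from the report: '@', '@', U+0000, U+0011.
const char kDecoded[] = { '@', '@', '\0', '\x11' };
```

With these definitions, fromCString(kDecoded) yields a string of length 2, and fromBuffer(kDecoded, sizeof(kDecoded)) yields a string of length 4.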

I tested the same input in another JSON parser implementation, the JavaScript engine in Firefox (Gecko).

var json = '["\\u0040\\u0040\\u0000\\u0011\\u0069"]';

console.log("json:", json);

var o = JSON.parse(json);

console.log("o[0].length:", o[0].length);
console.log("o[0]:", o[0]);

The output is correct:

json: ["\u0040\u0040\u0000\u0011\u0069"]
o[0].length: 5
o[0]: @@i
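For comparison, here is a rough sketch of how \uXXXX escapes could be decoded into a std::string without losing \u0000. This is not the library's code. The function names appendUtf8, hexValue and decodeUnicodeEscapes are made up for this example, and the sketch is simplified: it handles only \uXXXX escapes in the Basic Multilingual Plane, with no surrogate pairs and no other backslash escapes. The key point is that every decoded byte is appended with push_back, so the string's length is tracked explicitly and a zero byte is kept.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Appends the UTF-8 encoding of a code point in U+0000..U+FFFF to out.
// Surrogate values are not treated specially in this sketch.
void appendUtf8(std::string& out, unsigned int cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp)); // U+0000 becomes a real zero byte
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts one hexadecimal digit to its numeric value.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("bad hex digit in \\u escape");
}

// Decodes the body of a JSON string (the text between the quotes).
// Only \uXXXX escapes are recognised; everything else is copied as-is.
std::string decodeUnicodeEscapes(const std::string& body)
{
    std::string out;
    for (std::size_t i = 0; i < body.size(); ) {
        if (body.compare(i, 2, "\\u") == 0) {
            if (i + 6 > body.size())
                throw std::invalid_argument("truncated \\u escape");
            unsigned int cp = 0;
            for (std::size_t k = i + 2; k < i + 6; ++k)
                cp = cp * 16 + hexValue(body[k]);
            appendUtf8(out, cp);
            i += 6;
        } else {
            out.push_back(body[i]);
            ++i;
        }
    }
    return out;
}
```

Calling decodeUnicodeEscapes("\\u0040\\u0040\\u0000\\u0011") gives a string of length 4, which is the result I would expect from asString() as well.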

Discussion

wojciech - 2012-07-06

labels: 1355415 --> Reader

priority: 5 --> 7

assigned_to: nobody --> blep

Stefan Wehner - 2013-04-10

We've also come across this problem lately and I've added the patch 3610134
for 0.6.0-rc2 which should add support for \0 characters.

https://sourceforge.net/tracker/?func=detail&aid=3610134&group_id=144446&atid=758828

If you want to test/try it, I'm sure the feedback will be appreciated!


Christopher Dunn - 2015-03-06

UTF-8 with embedded zeroes is supported now at: https://github.com/open-source-parsers/jsoncpp/


Christopher Dunn - 2015-03-06

status: open --> closed-fixed

Group: --> 0.5.0