The current way to handle Unicode data is to convert it to UTF-8 before putting it in Json::Value.
wchar_t support is probably worth adding to the roadmap, but designing the API and implementation for it would likely be tricky...
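As an illustration of "convert to UTF-8 first", here is a minimal standalone sketch that encodes a UTF-32 string as UTF-8. It is not part of jsoncpp; `appendUtf8` and `toUtf8` are hypothetical helper names, and replacing invalid code points with U+FFFD is an assumption about how errors should be handled.

```cpp
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Returns false (and appends nothing) for surrogates and values above U+10FFFF.
bool appendUtf8(char32_t cp, std::string& out) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (cp > 0x10FFFF) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Convert a UTF-32 string to UTF-8; invalid code points become U+FFFD.
std::string toUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (!appendUtf8(cp, out)) appendUtf8(0xFFFD, out);
    }
    return out;
}
```

The resulting `std::string` is what you would then store in a Json::Value.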
@Edward0,
There are 3 related questions:
1. What input/output encodings are supported?
2. How are strings stored in jsoncpp?
3. What encodings are supported in the API?
Answers:
#1
On input, only ASCII. If you want Unicode strings, you must use the "\u..." escape sequence specified in the JSON standard. Not even UTF-8 is supported.
On output, only UTF-8. I believe we currently fail to convert the UTF-8 back to ASCII; only ASCII control characters are escaped.
As a result, I believe that a proper ASCII file can be read by jsoncpp and then written as a valid JSON file that can no longer be read by jsoncpp. That's a bug.
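To make the round-trip problem concrete: one way a writer could keep its output ASCII-only is to re-escape every non-ASCII code point as a "\u..." sequence, using a surrogate pair above U+FFFF. The following is a minimal standalone sketch, not jsoncpp code. `decodeUtf8`, `appendEscape` and `escapeNonAscii` are hypothetical names, and the sketch deliberately ignores quotes, backslashes and control characters.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Returns U+FFFD on malformed input (simplified: overlong forms are not rejected).
char32_t decodeUtf8(const std::string& s, std::size_t& i) {
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;
    int extra;
    char32_t cp;
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else return 0xFFFD;  // stray continuation byte or invalid lead byte
    for (int k = 0; k < extra; ++k) {
        if (i >= s.size()) return 0xFFFD;
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (c & 0x3F);
        ++i;
    }
    return cp;
}

// Append one UTF-16 code unit as a JSON \uXXXX escape.
void appendEscape(unsigned unit, std::string& out) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\u%04X", unit);
    out += buf;
}

// Rewrite a UTF-8 string so that all non-ASCII code points are \u escapes.
std::string escapeNonAscii(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        char32_t cp = decodeUtf8(utf8, i);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x10000) {
            appendEscape(cp, out);
        } else {
            cp -= 0x10000;  // split into a UTF-16 surrogate pair
            appendEscape(0xD800 + (cp >> 10), out);
            appendEscape(0xDC00 + (cp & 0x3FF), out);
        }
    }
    return out;
}
```

Output produced this way stays within what an ASCII-only reader accepts.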
#2
String data are stored as UTF-8.
#3
Strings are also returned as UTF-8.
@edward0, If you want some other encoding, you will have to use your own codec. On Linux, you can use `iconv()`.
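For reference, a minimal sketch of the `iconv()` route, assuming a POSIX system where `iconv()` takes a `char**` input pointer (as glibc does). `convertEncoding` is a hypothetical helper name, and the sketch skips the final flush that stateful encodings would need.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert `input` from encoding `from` to encoding `to` using iconv().
std::string convertEncoding(const std::string& input,
                            const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: target encoding first
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string in = input;             // iconv() needs a non-const buffer
    char* inPtr = &in[0];
    std::size_t inLeft = in.size();
    std::string out;
    char buf[256];

    while (inLeft > 0) {
        char* outPtr = buf;
        std::size_t outLeft = sizeof buf;
        std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        int err = errno;                // save before anything can overwrite it
        out.append(buf, sizeof buf - outLeft);
        // E2BIG only means the output buffer filled up; loop and continue.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }
    iconv_close(cd);
    return out;
}
```

For example, `convertEncoding(latin1Text, "ISO-8859-1", "UTF-8")` would give you a string suitable for storing in jsoncpp, and the reverse call converts a returned string back.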
I don't like wide characters because they do not specify an encoding. Any function which returns wchar_t would have to accept an argument for the encoding, or assume one. (UTF-16? UTF-32?) Since we store in UTF-8, the extra API functions would not improve efficiency at all.
@blep, Should I submit the problem identified in #1 as a bug?
UTF-8 round-trip is fixed at: https://github.com/open-source-parsers/jsoncpp/
UTF-16 will never be supported. (Surrogate pairs are a mess.)
UTF-32 is on the distant wishlist.