Wide chars

C++ JSON parser (Now in GitHub)

Status: Beta

Brought to you by: aaronjacobs, blep, christopherdunn

#14 Wide chars

Milestone: Next Release (example)

Status: closed

Owner: nobody

Labels: None

Priority: 5

Updated: 2015-03-06

Created: 2010-10-10

Creator: Anonymous

Private: No

Please support wchars in next versions for Cyrillic!;)

Discussion

Baptiste Lepilleur - 2011-05-01

The current way to address unicode data is to convert them to UTF-8 before putting them in Json::Value.
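As a rough illustration of that conversion step, here is a minimal sketch that encodes UTF-32 code points as UTF-8. The helper names `appendUtf8` and `toUtf8` are hypothetical and not part of jsoncpp; the sketch only assumes the input is available as a `std::u32string`.

```cpp
#include <string>

// Hypothetical helper (not part of jsoncpp): encode one Unicode code point
// as UTF-8 and append it to `out`. Returns false for surrogates and for
// values above U+10FFFF, which are not valid scalar values.
bool appendUtf8(char32_t cp, std::string& out) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (cp > 0x10FFFF) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Convert a UTF-32 string to UTF-8. Invalid code points are replaced
// with U+FFFD (the Unicode replacement character).
std::string toUtf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (!appendUtf8(cp, out)) appendUtf8(0xFFFD, out);
    }
    return out;
}
```

The resulting `std::string` could then be stored in a `Json::Value`, for example `Json::Value v(toUtf8(U"\u041f\u0440\u0438\u0432\u0435\u0442"));`.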

wchar_t support is probably something worth adding on the roadmap, but designing the API/implementation for this would likely be tricky...

Christopher Dunn - 2011-06-23

@Edward0,

There are 3 related questions:

1. What input/output encodings are supported?
2. How are strings stored in jsoncpp?
3. What encodings are supported in the API?

Answers:

#1
On input, only ASCII. If you want Unicode strings, you must use the "\u..." escape sequence specified in the JSON standard. Not even UTF-8 is supported.

On output, only UTF-8. I believe we currently fail to convert the UTF-8 back to ASCII; only ASCII control characters are escaped.

As a result, I believe that a proper ASCII file can be read by jsoncpp and then written as a valid JSON file that can no longer be read by jsoncpp. That's a bug.
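One way to produce ASCII-only JSON text that an ASCII-only reader could accept is to escape every non-ASCII code point as "\u...". The sketch below is not jsoncpp's writer; `nextCodePoint`, `appendU`, and `toAsciiJsonString` are hypothetical names, and the input is assumed to be well-formed UTF-8. Code points above U+FFFF are written as a surrogate pair, as the JSON standard requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Decode the UTF-8 sequence starting at s[i] and advance i past it.
// Assumption: `s` is well-formed UTF-8; no validation is done here.
static std::uint32_t nextCodePoint(const std::string& s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;
    int extra = (b0 >= 0xF0) ? 3 : (b0 >= 0xE0) ? 2 : 1;
    std::uint32_t cp = b0 & (0x3F >> extra);  // payload bits of the lead byte
    while (extra-- > 0 && i < s.size()) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Append one \uXXXX escape for a 16-bit code unit.
static void appendU(std::string& out, unsigned unit) {
    char buf[7];
    std::snprintf(buf, sizeof buf, "\\u%04x", unit);
    out += buf;
}

// Return `utf8` as a quoted JSON string literal containing only ASCII.
std::string toAsciiJsonString(const std::string& utf8) {
    std::string out = "\"";
    std::size_t i = 0;
    while (i < utf8.size()) {
        std::uint32_t cp = nextCodePoint(utf8, i);
        if (cp == '"' || cp == '\\') {
            out += '\\';
            out += static_cast<char>(cp);
        } else if (cp < 0x20) {
            appendU(out, cp);                 // ASCII control character
        } else if (cp < 0x80) {
            out += static_cast<char>(cp);     // printable ASCII, kept as-is
        } else if (cp < 0x10000) {
            appendU(out, cp);                 // BMP: a single escape
        } else {
            cp -= 0x10000;                    // above BMP: surrogate pair
            appendU(out, 0xD800 + (cp >> 10));
            appendU(out, 0xDC00 + (cp & 0x3FF));
        }
    }
    out += '"';
    return out;
}
```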

#2
String data are stored as UTF-8.

#3
Strings are also returned as UTF-8.

@edward0, If you want some other encoding, you will have to use your own codec. On Linux, you can use `iconv()`.
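For reference, a minimal sketch of such a codec built on POSIX `iconv()`. The wrapper name `convertEncoding` is hypothetical, and which encoding names are accepted (for example "UTF-16LE", "KOI8-R" or "CP1251") depends on the iconv implementation installed on the system.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: convert the bytes in `input` from encoding `from`
// to encoding `to`. Throws std::runtime_error if the encoding pair is not
// available or the input cannot be converted.
std::string convertEncoding(const std::string& input,
                            const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the argument order: to, from
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string out;
    char buf[256];
    char* src = const_cast<char*>(input.data());  // iconv does not modify it
    std::size_t srcLeft = input.size();

    while (srcLeft > 0) {
        char* dst = buf;
        std::size_t dstLeft = sizeof buf;
        std::size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
        out.append(buf, sizeof buf - dstLeft);
        // E2BIG only means the output buffer filled up; loop and continue.
        if (rc == (std::size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }
    // A stateful target encoding would also need a final flush call;
    // UTF-8 is stateless, so it is omitted in this sketch.
    iconv_close(cd);
    return out;
}
```

With this, text in another encoding could be converted with something like `convertEncoding(bytes, "UTF-16LE", "UTF-8")` before it is stored, and converted back after it is read out.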

I don't like wide characters because they do not specify an encoding. Any function which returns wchar_t would have to accept an argument for the encoding, or assume one. (UTF-16? UTF-32?) Since we store in UTF-8, the extra API functions would not improve efficiency at all.
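To make the ambiguity concrete, here is a sketch of the "assume one" option: it guesses the encoding from the width of `wchar_t` (2 bytes treated as UTF-16, anything wider as UTF-32). The function name `wideToCodePoints` is hypothetical, and the width-based guess is an assumption of this sketch, not something the C++ standard guarantees.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode a wide string into Unicode code points.
// Assumption: a 2-byte wchar_t holds UTF-16 code units (as on Windows),
// and a wider wchar_t holds UTF-32 code points (as on Linux).
std::u32string wideToCodePoints(const std::wstring& w) {
    std::u32string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        char32_t c = static_cast<char32_t>(w[i]);
        if (sizeof(wchar_t) == 2) {
            c &= 0xFFFF;
            // Combine a high surrogate with the low surrogate that follows.
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < w.size()) {
                char32_t lo = static_cast<char32_t>(w[i + 1]) & 0xFFFF;
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                    ++i;
                }
            }
        }
        out += c;
    }
    return out;
}
```

Even this small helper has to pick an encoding on the caller's behalf, which is the design problem described above.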

@blep, Should I submit the problem identified in #1 as a bug?

Christopher Dunn - 2015-03-06

UTF-8 round-trip is fixed at: https://github.com/open-source-parsers/jsoncpp/

UTF-16 will never be supported. (Surrogate pairs are a mess.)

UTF-32 is on the distant wishlist.

Christopher Dunn - 2015-03-06

status: open --> closed

Group: --> Next Release (example)