From: <ma...@al...> - 2010-04-19 14:50:26
Hello Baptiste!

I decided to use jsoncpp in my application, but faced an issue!

The application uses std::wstring for JSON message values for Unicode
support, but I don't see how I can form a JSON message through Json::Value.

So, why doesn't it support std::wstring, and how could this be solved?

Regards.

Maxim Aga
From: Stephan B. <sg...@go...> - 2010-04-19 16:13:47
On Mon, Apr 19, 2010 at 4:37 PM, <ma...@al...> wrote:
> I decided to use jsoncpp in my application, but faced an issue!
>
> The application uses std::wstring for JSON message values for Unicode
> support,
>
> but I don't see how I can form a JSON message through Json::Value.
>
> So, why doesn't it support std::wstring, and how could this be solved?

std::wstring uses an unspecified character size and (AFAIK) byte ordering;
e.g. gcc's wstring on Linux uses 4-byte characters. I recently implemented a
conversion from wstring to std::basic_string<uint16_t> for use with utf8cpp,
and it looks like this (ValueType == std::basic_string<uint16_t>):

    void Utf16String::init( wchar_t const * v, size_t len )
    {
        size_t const sl = (v && len) ? len : 0;
        if( ! sl ) { this->init( ValueType(), 0 ); return; }
        ValueType vec;
        vec.reserve( sl );
        // Each wchar_t is narrowed to a single uint16_t, so with a 4-byte
        // wchar_t, code points above U+FFFF are truncated, not split into
        // surrogate pairs.
        vec.assign( v, v + sl );
        this->init( vec, 0 );
    }

    Utf16String::Utf16String( std::wstring const & v )
    {
        this->init( v.empty() ? 0 : v.c_str(), v.size() );
    }

However, I don't know if that will work as-is for both big- and
little-endian. I hope this helps a little bit.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
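For comparison, here is a minimal sketch of a wstring-to-UTF-16 conversion that does not truncate on platforms with a 4-byte wchar_t: code points above U+FFFF are split into surrogate pairs instead of being narrowed into one uint16_t. It assumes wchar_t holds UTF-16 when it is 2 bytes (MSVC) and UTF-32 when it is 4 bytes (gcc on Linux). It returns std::vector<std::uint16_t> rather than std::basic_string<uint16_t>, since the standard only specializes std::char_traits for the built-in character types. WideToUtf16 is a hypothetical name, not part of jsoncpp or utf8cpp. Byte order does not enter into it: the code units are plain integers in memory, and endianness only matters once they are serialized to bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of jsoncpp or utf8cpp): converts a
// std::wstring to UTF-16 code units. Assumes wchar_t holds UTF-16 when it
// is 2 bytes wide (MSVC) and UTF-32 when it is 4 bytes wide (gcc on Linux).
std::vector<std::uint16_t> WideToUtf16(const std::wstring& in) {
  std::vector<std::uint16_t> out;
  out.reserve(in.size());
  for (wchar_t wc : in) {
    std::uint32_t cp = static_cast<std::uint32_t>(wc);
    if (cp < 0x10000) {
      // One code unit. With a 2-byte wchar_t every element lands here, so
      // surrogate pairs already present in the input are copied unchanged.
      out.push_back(static_cast<std::uint16_t>(cp));
    } else {
      // Only reachable with a 4-byte wchar_t: split into a surrogate pair
      // instead of truncating to the low 16 bits.
      assert(cp <= 0x10FFFF && "not a Unicode code point");
      cp -= 0x10000;
      out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

For example, WideToUtf16(L"\U0001F600") yields the two units 0xD83D 0xDE00 on both kinds of platform.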
From: Baptiste L. <bap...@gm...> - 2010-04-20 15:26:02
2010/4/19 <ma...@al...>
> Hello Baptiste!
>
> I decided to use jsoncpp in my application, but faced an issue!
>
> The application uses std::wstring for JSON message values for Unicode
> support,
>
> but I don't see how I can form a JSON message through Json::Value.
>
> So, why doesn't it support std::wstring, and how could this be solved?

Unicode awareness in JsonCpp is fairly recent (e.g. handling of Unicode
escape sequences). And IMHO we still need more tests (such as testing whether
we correctly handle surrogate escape sequences). You are the first to ask
for this feature.

That said, on Windows it is fairly common to represent UTF-16 encoded strings
as wstring, as it is the native encoding of the OS. But as Stephan pointed
out, the C++ standard does not specify the size of wchar_t (and in practice
implementations vary: typically 2 bytes on MSVC, and 4 bytes with gcc).
Side note: the next C++ standard introduces the char16_t and char32_t types
to make this explicit, but those are not yet widely available in the
industry, so I'd rather we do not rely on them yet.

We could fairly easily add support for setting a Json::Value from a
std::wstring, since the UTF-16 and UTF-32 encodings do not conflict (please
correct me if I'm wrong). This raises a few questions:
- Should we store the wstring as a sequence of char after conversion to
  UTF-8, or should it be stored as is?
- What should asString() and asCString() return when initialized with a
  std::wstring? The string converted to UTF-8?
- Should we add asWString() and asCWString()? If so, should they return the
  string encoded in UTF-16 or UTF-32 on platforms where wchar_t is 32 bits?
  The string that was passed at initialization? Or should we make the
  encoding explicit: asUTF16WString(), asUTF32WString()? Concerning the last
  option, as UTF-32 would not be possible on platforms where wchar_t is 2
  bytes, this would introduce portability issues...

Baptiste.
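A minimal sketch of the first option (convert to UTF-8 on the way in, so that asString() keeps returning the same kind of string), assuming wchar_t holds UTF-16 when it is 2 bytes and UTF-32 otherwise. WideToUtf8 and AppendUtf8 are hypothetical helpers, not jsoncpp functions, and unpaired surrogates are simply passed through rather than rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point.
static void AppendUtf8(std::uint32_t cp, std::string& out) {
  assert(cp <= 0x10FFFF && "not a Unicode code point");
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: converts a std::wstring to UTF-8, reading it as
// UTF-16 when wchar_t is 2 bytes and as UTF-32 otherwise. Unpaired
// surrogates are passed through (producing ill-formed UTF-8); a real
// implementation would reject or replace them.
std::string WideToUtf8(const std::wstring& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
    if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
        i + 1 < in.size()) {
      const std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        // Combine a high/low surrogate pair into one code point.
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        ++i;
      }
    }
    AppendUtf8(cp, out);
  }
  return out;
}
```

The result could then be handed to Json::Value's existing std::string constructor, e.g. Json::Value v( WideToUtf8( text ) );, so Json::Value itself would not need to store wide strings.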
From: Stephan B. <sg...@go...> - 2010-04-20 15:45:24
On Tue, Apr 20, 2010 at 5:25 PM, Baptiste Lepilleur <bap...@gm...> wrote:
> 2010/4/19 <ma...@al...>
>
>> The application uses std::wstring for JSON message values for Unicode
>> support,

Sorry, one more point: as far as I'm aware, JSON uses only UTF-8, not UTF-16
(though most JS interpreters accept UTF-16 script input and/or use UTF-16
internally as their native string type). If that is indeed true, I believe
that UTF-16 support in jsoncpp is a bit superfluous, except to let clients
import/export their UTF-16 data into jsoncpp's internal format (ASCII/UTF-8?).

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
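Whatever the encoding of the JSON text itself, its \uXXXX escapes are defined in terms of UTF-16 code units: RFC 4627 represents a character outside the Basic Multilingual Plane as a twelve-character escape holding its surrogate pair. So a writer that escapes non-ASCII characters, and the surrogate escape tests Baptiste mentioned, need some UTF-16 logic regardless. A minimal sketch; JsonEscapeCodePoint is a hypothetical helper, and lowercase hex is an arbitrary choice (JSON accepts either case).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: writes one code point as a JSON \uXXXX escape,
// using a UTF-16 surrogate pair for code points above U+FFFF.
std::string JsonEscapeCodePoint(std::uint32_t cp) {
  assert(cp <= 0x10FFFF && "not a Unicode code point");
  char buf[16];  // longest output is 12 chars plus the terminating NUL
  if (cp < 0x10000) {
    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(cp));
  } else {
    const std::uint32_t v = cp - 0x10000;
    std::snprintf(buf, sizeof buf, "\\u%04x\\u%04x",
                  static_cast<unsigned>(0xD800 + (v >> 10)),
                  static_cast<unsigned>(0xDC00 + (v & 0x3FF)));
  }
  return std::string(buf);
}
```

A reader has to do the inverse: recognize a high-surrogate escape, require a low-surrogate escape right after it, and combine the two into a single code point before converting to UTF-8.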
From: Stephan B. <sg...@go...> - 2010-04-20 15:39:33
On Tue, Apr 20, 2010 at 5:25 PM, Baptiste Lepilleur <bap...@gm...> wrote:
> Unicode awareness in JsonCpp is fairly recent (e.g. handling of Unicode
> escape sequences). And IMHO we still need more tests (such as testing
> whether we correctly handle surrogate escape sequences).

If I may recommend: http://utfcpp.sourceforge.net/ is VERY easy to use and
is Public Domain. It can convert/verify UTF-8/16/32 and has a very handy
iterator class which lets you iterate over UTF-8/16/32 strings in a sane
manner (each iteration returns one logical character, regardless of its
real length).

> implementation: typically 2 bytes on MSVC, and 4 bytes with gcc). Side
> note: the next C++ standard introduces the char16_t and char32_t types to
> make this explicit,

Yeah!!!!

> but those are not yet widely available in the industry, so I'd rather we
> do not rely on them yet.

:(

> - What should asString() and asCString() return when initialized with a
> std::wstring? The string converted to UTF-8?

utfcpp makes the conversion to UTF-8 trivial:

    utf8::utf16to8( inputIteratorBegin, inputIteratorEnd, outputIterator )

e.g., something like:

    #include <iterator>   // std::back_inserter
    #include <string>
    #include "utf8.h"     // utfcpp

    std::string u8;
    utf8::utf16to8( wstr.begin(), wstr.end(), std::back_inserter( u8 ) );
    // With a 4-byte wchar_t (e.g. gcc on Linux), utf8::utf32to8 is the
    // matching call.

I only recently started using utfcpp, but I'm very impressed with how easy
it is to use (I'm no Unicode expert, so I need tools like this to help me :).

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
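As a rough illustration of what "one logical character per iteration" means, here is a hand-rolled sketch that decodes well-formed UTF-8 into one code point per character, whatever its byte length. DecodeUtf8 is a hypothetical stand-in, not the utfcpp API; it does no real validation, which is exactly what the library's checked functions add.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a UTF-8 iterator: decodes well-formed UTF-8
// into one code point per logical character, whatever its byte length.
std::vector<std::uint32_t> DecodeUtf8(const std::string& s) {
  std::vector<std::uint32_t> cps;
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    assert((lead & 0xC0) != 0x80 && "sequence starts with a continuation byte");
    // Number of continuation bytes announced by the lead byte.
    const std::size_t extra =
        lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    if (i + extra >= s.size()) break;  // truncated sequence: stop (sketch only)
    // Payload bits carried by the lead byte: 7, 5, 4 or 3.
    std::uint32_t cp = extra == 0   ? lead
                       : extra == 1 ? (lead & 0x1Fu)
                       : extra == 2 ? (lead & 0x0Fu)
                                    : (lead & 0x07u);
    // Each continuation byte contributes its low 6 bits.
    for (std::size_t k = 1; k <= extra; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3Fu);
    }
    cps.push_back(cp);
    i += extra + 1;
  }
  return cps;
}
```

For instance, the 7-byte string "a\xC3\xA9\xF0\x9F\x98\x80" decodes to three code points: U+0061, U+00E9 and U+1F600.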
From: Stephan B. <sg...@go...> - 2010-04-20 15:39:59
On Tue, Apr 20, 2010 at 5:39 PM, Stephan Beal <sg...@go...> wrote:
> is VERY easy to use and is Public Domain. It can convert/very utf8/16/32
> and has a very handy

convert/very == convert/verIFy

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
From: Baptiste L. <bap...@gm...> - 2010-04-21 16:22:33
2010/4/21 <ma...@al...>
> [...]
>
>> Should we add asWString() and asCWString()? If so, should they return
>> the string encoded in UTF-16 or UTF-32 on platforms where wchar_t is 32
>> bits? The string that was passed at initialization? Or should we make the
>> encoding explicit: asUTF16WString(), asUTF32WString()? Concerning the last
>> option, as UTF-32 would not be possible on platforms where wchar_t is 2
>> bytes, this would introduce portability issues...
>
> Yep, better to have asWString() and asCWString(), and they should always
> return strings encoded in UTF-16.

What is your rationale for always returning strings encoded in UTF-16? I'm
afraid that if we do so on a system where wchar_t is encoded using UTF-32,
the resulting string would not be compatible with system w* APIs such as
towlower, listed in part here:
http://www.unix.org/version2/whatsnew/login_mse.html
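To make the compatibility concern concrete, here is a minimal sketch of the alternative: building a wide string in whatever encoding wchar_t natively uses on the platform, i.e. UTF-32 with a 4-byte wchar_t and UTF-16 surrogate pairs with a 2-byte one. It assumes those two conventions (gcc on Linux and MSVC respectively); AppendNativeWide is a hypothetical helper, not a proposed jsoncpp API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: appends one code point to a wide string using the
// platform's native wchar_t encoding: UTF-32 when wchar_t is 4 bytes
// (gcc on Linux), UTF-16 surrogate pairs when it is 2 bytes (MSVC).
void AppendNativeWide(std::uint32_t cp, std::wstring& out) {
  assert(cp <= 0x10FFFF && "not a Unicode code point");
  if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
    // Fits in a single wchar_t on this platform.
    out.push_back(static_cast<wchar_t>(cp));
  } else {
    // 2-byte wchar_t and a code point outside the BMP: surrogate pair.
    cp -= 0x10000;
    out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
  }
}
```

Under those assumptions the compiler encodes L"..." literals the same way, so appending U+0041, U+00E9 and U+1F600 yields a string equal to L"A\u00e9\U0001F600" on both kinds of platform (3 elements with gcc, 4 with MSVC). An always-UTF-16 result would instead, with a 4-byte wchar_t, put each surrogate half in its own wchar_t, which towlower and the other w* functions would see as two separate, meaningless code points.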