Re: [Swig-user] Default typemaps for C++/Java string types and I18N

On 23/03/11 16:04, Soren Soe wrote:
> William S Fulton wrote:
>> On 18/03/11 19:37, Soren Soe wrote:
>>> Hi,
>>>
>>> I am curious about the default typemaps for, e.g., std::string
>>> conversion to and from Java String. A Java String is Unicode, and on the
>>> C++ side std::string is constructed from 8-bit char. The C++ strings
>>> are not necessarily UTF-8 encoded in the current language locale, so when
>>> the SWIG typemap (std_string.i) uses GetStringUTFChars on the jstring,
>>> the resulting 8-bit string may be garbage depending on the locale. I am
>>> working on an application that runs under both *nix and Windows.
>>>
>>> The UTF-8 conversion works fine on Linux, but not so on Windows, where I
>>> am trying to get my application running on a Japanese OS with locale set
>>> to Japanese_Japan.932. The string encoding on the C++ side uses a
>>> multi-byte representation for the native characters, but the encoding is
>>> not UTF-8.
>>>
>>> My question is why the default string typemaps are coded to use
>>> GetStringUTFChars and NewStringUTF? Shouldn't they be written to use
>>> the std::codecvt facet from the standard C++ library? The codecvt
>>> facet will convert between wchar_t and char according to the current
>>> locale.
>>>
>>> I have written my own typemaps for std::string and char* to use the
>>> std::codecvt facet and my application is now behaving as expected on the
>>> Japanese OS. However, I am worried that I am missing something
>>> fundamental here; I find it hard to imagine that the default swig
>>> typemaps are not I18N compatible.
>>>
>>> Any help/comments would be greatly appreciated.
>>>
>>
>> I think it is simply that they were written with ASCII in mind
>> and no-one has used them for anything outside of that. I don't recall
>> this issue being brought up before.
>>
>> I suggest you put a patch to the current typemaps on the SourceForge
>> patch tracker. A simple test using UTF would be much appreciated for
>> the US locale for regression testing.
>>
>> How does this work for char * in C only mode?
>>
>> William
>>
>>
> I would be happy to patch the typemaps as you suggest. However, I am not
> an expert on writing typemaps so I am sure the ones I wrote are *not* up
> to the required standard.
>
Modifications to the current typemaps would surely be okay.

> What's worse is that writing code to use the std::codecvt is not exactly
> straightforward. I had to write some support code to interface with the
> codecvt facet. There was no way I would litter the typemaps with the raw
> code, plus the support code is used in other places too for string
> conversions, so currently the typemaps make use of the support code. If
> I knew more about the proper way of organizing typemaps and sharing code
> between typemaps, I could attempt a patch as suggested that wouldn't
> rely on my support code. I will look into doing this, but if you have
> any pointers or examples to get me started in the right direction please
> let me know.
>
Support code can be put into a function that is only generated when a
typemap needs it if you use fragments, as described here:
http://www.swig.org/Doc2.0/Typemaps.html#Typemaps_fragments
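
As a rough illustration of the kind of support function that could live in a %fragment and be shared by several typemaps, here is a minimal sketch of a wide-to-narrow conversion through the std::codecvt facet. The name narrow_with_codecvt is hypothetical and is not taken from Soren's code or from the SWIG library; the sketch assumes the string has already been turned into a std::wstring and does not deal with jchar (UTF-16) to wchar_t widening or with error recovery.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to the multi-byte encoding
// of the given locale (the global locale by default) using std::codecvt.
std::string narrow_with_codecvt(const std::wstring& wide,
                                const std::locale& loc = std::locale()) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    if (wide.empty()) return std::string();

    const Facet& facet = std::use_facet<Facet>(loc);
    std::mbstate_t state = std::mbstate_t();

    // max_length() is the most bytes a single wide character can need.
    std::string out(wide.size() * facet.max_length(), '\0');

    const wchar_t* from_next = 0;
    char* to_next = 0;
    std::codecvt_base::result r = facet.out(
        state,
        wide.data(), wide.data() + wide.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("wide to multi-byte conversion failed");

    // Trim the buffer to the number of bytes actually written.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

A typemap would then call the helper instead of repeating the facet boilerplate inline, and the fragment mechanism would emit the function only in wrappers that actually use it.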

> As for char* and C: these typemaps must be written to use the C
> native wide character to/from multi-byte conversion routines. In fact,
> maybe the C native conversions should be used in the std::string
> typemaps? Since my project deals with C++/Java only, I was able to add
> specific typemaps for char* that use the codecvt facet, i.e. the same
> conversions as for std::string.
>
Yes, we need to accommodate the lowest common denominator, and that often
means C code even though C++ might be used.
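
For comparison, a sketch of the same conversions written only in terms of the C library routines (wcstombs and mbstowcs), which follow the current C locale as set by setlocale. The function names are hypothetical, std::string and std::vector are used only for buffer management, and the sketch assumes strings without embedded null characters.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: wide string -> multi-byte string in the current C locale.
std::string wide_to_locale(const std::wstring& wide) {
    // MB_CUR_MAX is the most bytes one character can need in this locale.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");
    return std::string(buf.data(), n);
}

// Hypothetical helper: multi-byte string in the current C locale -> wide string.
std::wstring locale_to_wide(const std::string& narrow) {
    // A multi-byte string never decodes to more wide characters than it has bytes.
    std::vector<wchar_t> buf(narrow.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multi-byte sequence for current locale");
    return std::wstring(buf.data(), n);
}
```

The bodies use nothing beyond what plain C offers apart from the buffer handling, so a C-only typemap could follow the same pattern with malloc'ed buffers.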

William
