[Swig-devel] [ swig-Bugs-1797418 ] C#: wchar_t should be marshalled as UnmanagedType.U2

Bugs item #1797418, was opened at 2007-09-18 14:47
Message generated for change (Comment added) made by qwertie
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=101645&aid=1797418&group_id=1645

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: csharp
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Piepgrass (qwertie)
Assigned to: William Fulton (wsfulton)
Summary: C#: wchar_t should be marshalled as UnmanagedType.U2

Initial Comment:
wchar_t should be marshalled as UnmanagedType.U2, not the default of 1 byte.

----------------------------------------------------------------------

>Comment By: David Piepgrass (qwertie)
Date: 2007-09-20 13:38

Message:
Logged In: YES 
user_id=171344
Originator: YES

Unicode as 32-bit ints is inevitably wasteful even if you use the full
Unicode range, which needs only 21 bits. Further, AFAIK all "living language"
characters fit in 16 bits. If wchar_t were only used to represent single
characters it would be fine, but for strings (wstring) it's very wasteful.
I guess it's better to use std::basic_string<unsigned short> rather than
std::wstring, although on Windows wchar_t may be better because a debugger
understands that it represents characters.

As for whether to use 16-bit or 8-bit strings, it's a no-win situation.
UTF-8 is inefficient for representing languages like Chinese, while UTF-16
is inefficient for European languages. And the minimum addressing boundary
of our computers is 8 bits, so I'm afraid 5-bit character strings are out
:P
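
To make the size trade-off above concrete, here is a minimal sketch that computes how many bytes a single code point occupies in each encoding form. The function names are hypothetical and appear nowhere in SWIG; the byte counts follow from the standard UTF-8 and UTF-16 encoding rules.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one code point in UTF-8 (1 to 4 bytes).
constexpr std::size_t utf8_bytes(std::uint32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed in UTF-16: one 16-bit unit inside the BMP,
// a surrogate pair (two units) outside it.
constexpr std::size_t utf16_bytes(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// UTF-32 (e.g. a 4-byte wchar_t) always uses 4 bytes.
constexpr std::size_t utf32_bytes(std::uint32_t) {
    return 4;
}

// 'A' (U+0041): UTF-8 wins for English text.
static_assert(utf8_bytes(0x41) == 1 && utf16_bytes(0x41) == 2, "ASCII");
// U+4E2D (a common Chinese character): UTF-16 wins.
static_assert(utf8_bytes(0x4E2D) == 3 && utf16_bytes(0x4E2D) == 2, "CJK");
// U+1F600 (outside the BMP): all three forms need 4 bytes.
static_assert(utf8_bytes(0x1F600) == 4 && utf16_bytes(0x1F600) == 4, "astral");
```

Neither 8-bit nor 16-bit units win for every kind of text, and the 32-bit form is never smaller than either.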

----------------------------------------------------------------------

Comment By: Olly Betts (olly)
Date: 2007-09-19 17:15

Message:
Logged In: YES 
user_id=14972
Originator: NO

A 32 bit type is the narrowest available integer type which can hold the
full Unicode range, so I guess that's why it was chosen.  Unicode as wide
characters inevitably is wasteful if you don't actually use that range. 
Restricting to the BMP and using a 16 bit type is wasteful if you only have
English text.  If you only want upper case letters and 6 other characters,
you only need 5 bits, so 8 bits per character is wasteful!

Anyway, using a plain int sounds reasonable to me, but I don't really know
the innards of C# - William's your man for that.

----------------------------------------------------------------------

Comment By: David Piepgrass (qwertie)
Date: 2007-09-19 09:15

Message:
Logged In: YES 
user_id=171344
Originator: YES

Here's an idea: perhaps wchar_t should be marshalled as a plain int in the
PINVOKE class, and the two wrappers can convert between char and wchar_t on
each end.
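
A rough sketch of what the C++ side of that idea could look like: the exported wrapper takes and returns a plain int, and converts to and from wchar_t internally. Both function names are made up for illustration, and this is not the code SWIG actually generates. The C# proxy would do the matching cast between char and int on its side.

```cpp
#include <cwctype>

// Hypothetical native function that the binding wants to expose.
wchar_t to_upper_wide(wchar_t c) {
    return static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(c)));
}

// Hypothetical exported wrapper: only a plain int crosses the
// managed/native boundary, so sizeof(wchar_t) on the native side
// never has to match the marshalled width of a C# char.
extern "C" int wrap_to_upper_wide(int c) {
    return static_cast<int>(to_upper_wide(static_cast<wchar_t>(c)));
}
```

The casts are the only place where the platform's wchar_t width matters, so the same PINVOKE signature would work whether wchar_t is 2 or 4 bytes.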

----------------------------------------------------------------------

Comment By: David Piepgrass (qwertie)
Date: 2007-09-19 09:13

Message:
Logged In: YES 
user_id=171344
Originator: YES

sizeof(wchar_t) is 4??? That's amazing to me. What a waste of memory. I'll
be sure not to call my wide characters "wchar_t" if I get around to coding
on Linux.

Unfortunately, U4 doesn't work on Win32; the .NET framework throws an
exception with a message saying 'char' can only marshal as U1, U2, I1 or
I2.

----------------------------------------------------------------------

Comment By: Olly Betts (olly)
Date: 2007-09-19 08:25

Message:
Logged In: YES 
user_id=14972
Originator: NO

On Linux at least, sizeof(wchar_t) is 4, so U2 will truncate characters
outside the BMP.  That's better than the current situation, but should this
actually be U4?
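
The truncation concern can be illustrated with a small sketch that models squeezing a 32-bit wide character through a 16-bit slot. The helper names are hypothetical; the code only mimics the narrowing arithmetic and is not the .NET marshaller itself.

```cpp
#include <cstdint>

// Hypothetical model of passing a wide character through a
// 16-bit (U2-sized) slot: only the low 16 bits are kept.
constexpr std::uint16_t truncate_to_u2(std::uint32_t code_point) {
    return static_cast<std::uint16_t>(code_point & 0xFFFFu);
}

// True when the code point comes out of the 16-bit slot unchanged.
constexpr bool survives_u2(std::uint32_t code_point) {
    return truncate_to_u2(code_point) == code_point;
}

// U+20AC (EURO SIGN) is inside the BMP and is preserved.
static_assert(survives_u2(0x20AC), "BMP character fits in 16 bits");
// U+1F600 is outside the BMP; its top bits are lost, leaving 0xF600.
static_assert(truncate_to_u2(0x1F600) == 0xF600, "high bits dropped");
static_assert(!survives_u2(0x1F600), "non-BMP character is corrupted");
```

On a platform where sizeof(wchar_t) is 2, such as Windows, values above 0xFFFF cannot occur in a single wchar_t in the first place, so the loss only shows up where wchar_t is 4 bytes.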

----------------------------------------------------------------------
