Re: [icu-design] ICU4C API proposal: UnicodeString<->UTF-32


On Tue, Feb 17, 2009 at 10:18 AM, Andy Heninger <and...@gm...> wrote:

> On Mon, Feb 16, 2009 at 5:27 PM, Markus Scherer <mar...@gm...> wrote:
> > NB:
> > I could just as well add such functions outside the UnicodeString class, as
> > UnicodeString UnicodeStringFromUTF32(const UChar32 *utf32, int32_t length);
> > and
> > int32_t UnicodeStringToUTF32(
> >     const UnicodeString &s,
> >     UChar32 *utf32, int32_t capacity, UErrorCode *pErrorCode);
> > Would that be better?
>
> I think we should follow the conventions and style of the existing
> UnicodeString class as closely as possible, which would suggest
> constructors and/or member functions.  Seems like it would be less
> confusing overall.
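The quoted `UnicodeStringToUTF32()` signature follows ICU's usual capacity/preflighting pattern. A minimal sketch of that pattern is below; it is not ICU code. `std::u16string` stands in for `UnicodeString`, `char32_t` for `UChar32`, and the `ErrorCode` enum and `StringToUTF32` name are hypothetical simplifications. The sketch always returns the number of code points required, writes as many as fit, and flags an overflow when `capacity` is too small, so a caller can pass `nullptr` and `0` to learn the size before allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for ICU's UErrorCode (only the values used here).
enum class ErrorCode { kOk, kIllegalArgument, kBufferOverflow };

// Hypothetical non-member UTF-16 -> UTF-32 conversion modeled on the quoted
// signature. Returns the number of code points required and writes at most
// `capacity` of them to `utf32`; sets kBufferOverflow if they did not all fit.
int32_t StringToUTF32(const std::u16string &s,
                      char32_t *utf32, int32_t capacity,
                      ErrorCode *pErrorCode) {
    assert(pErrorCode != nullptr);
    if (*pErrorCode != ErrorCode::kOk) {
        return 0;  // an earlier call already failed: do nothing
    }
    if (capacity < 0 || (utf32 == nullptr && capacity > 0)) {
        *pErrorCode = ErrorCode::kIllegalArgument;
        return 0;
    }
    int32_t length = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        // A lead surrogate followed by a trail surrogate forms one code point.
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        }
        // Unpaired surrogates are copied through unchanged in this sketch.
        if (length < capacity) {
            utf32[length] = c;
        }
        ++length;
    }
    if (length > capacity) {
        *pErrorCode = ErrorCode::kBufferOverflow;
    }
    return length;
}
```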

Note: Sometime soon I plan to propose additional functions, for creating a
UnicodeString from an STL string (UTF-8) and vice versa. It seems cleaner to
add all of these as non-member functions.

It might also be nice to have dedicated functions for UTF-8 char* (not
taking a charset name parameter), and those would not work well as
constructor/setTo overloads. We already got into that problem with the
dedicated from-invariant-characters constructor/setTo for which we had to
invent a weird signature with a special enum type.
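The overload clash described above can be illustrated with a toy class (hypothetical, not ICU's `UnicodeString`). Because a plain `const char *` constructor already means "default codepage", a second constructor for invariant characters or UTF-8 with the same parameter list would be a redeclaration, so it needs a tag parameter such as the `EInvariant` enum used here (ICU's own invariant-character constructor uses a similar `US_INV` tag). A named non-member factory, here the hypothetical `TextFromUTF8()`, needs no tag and makes the conversion visible at the call site.

```cpp
#include <string>
#include <utility>

// Toy stand-in for a string class, not ICU's UnicodeString. It only records
// which code path built it, to show how each call site reads.
class Text {
public:
    // The plain char* overload is already taken: "bytes in the default codepage".
    explicit Text(const char *codepageBytes) : Text("codepage", codepageBytes) {}

    // Declaring another Text(const char *) for UTF-8 or invariant characters
    // would be a redeclaration error, so a dedicated variant needs a tag
    // parameter that exists only to select this overload.
    enum EInvariant { kInvariant };
    Text(const char *invariantChars, EInvariant) : Text("invariant", invariantChars) {}

    const std::string &origin() const { return origin_; }
    const std::string &bytes() const { return bytes_; }

private:
    friend Text TextFromUTF8(const char *utf8);
    Text(std::string origin, const char *bytes)
        : origin_(std::move(origin)), bytes_(bytes) {}

    std::string origin_;
    std::string bytes_;  // a real class would store decoded UTF-16 instead
};

// A named non-member function needs no tag: the name states the source
// encoding and makes the conversion easy to spot when reading code.
Text TextFromUTF8(const char *utf8) {
    return Text("utf-8", utf8);
}
```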

In terms of performance, I don't think there is much of a difference.
Whichever way the API is designed, a conversion from UTF-32 to UTF-16 has to be
done. It's fast, but not as fast as the existing setTo() functions, which either
just do a memcpy() or alias the UnicodeString's internal pointer to a buffer. In
fact, if we don't provide constructor/setTo functions for "expensive"
operations, those operations stand out better to someone reading the code. But
you are right that we didn't follow this model with our existing constructors
(only with the setTo() functions).

Ok, there is setTo(UChar32) -- but that's not much of a conversion, it's
just a U16_APPEND() macro wrapper :-)
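As a rough illustration of the performance point, the sketch below shows the per-code-point work a UTF-32 to UTF-16 conversion needs, in the spirit of `U16_APPEND()` but not the ICU macro itself; `appendCodePoint`, `fromUTF32`, `fromUTF16`, and the use of `std::u16string` are illustrative assumptions. A supplementary code point becomes a lead/trail surrogate pair, which costs a branch and some arithmetic per element, while UTF-16 input can be copied in bulk with `std::memcpy`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Append one code point as one or two UTF-16 code units, roughly what a
// U16_APPEND-style helper does. Assumes c is a valid code point (<= 0x10FFFF).
void appendCodePoint(std::u16string &dest, char32_t c) {
    if (c <= 0xFFFF) {
        dest.push_back(static_cast<char16_t>(c));  // BMP: one code unit
    } else {
        c -= 0x10000;  // 20 significant bits remain
        dest.push_back(static_cast<char16_t>(0xD800 + (c >> 10)));    // lead surrogate
        dest.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF)));  // trail surrogate
    }
}

// UTF-32 input needs a per-code-point loop with a branch.
// Assumes length >= 0.
std::u16string fromUTF32(const char32_t *utf32, int32_t length) {
    std::u16string result;
    result.reserve(static_cast<std::size_t>(length));  // at least one unit each
    for (int32_t i = 0; i < length; ++i) {
        appendCodePoint(result, utf32[i]);
    }
    return result;
}

// UTF-16 input can be copied in one block. Assumes length >= 0.
std::u16string fromUTF16(const char16_t *utf16, int32_t length) {
    std::u16string result(static_cast<std::size_t>(length), u'\0');
    std::memcpy(&result[0], utf16,
                static_cast<std::size_t>(length) * sizeof(char16_t));
    return result;
}
```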

markus
