[icu-support] Test results: UnicodeString constructor vs u_strFromUtf8

This is a follow-up to my question from a few days ago, when I asked whether
it would be faster to special-case UTF-8 construction in UnicodeString by
using the u_strFromUTF8* functions.

The results suggest that constructing a UnicodeString manually with the
u_strFromUTF8* functions is roughly 1.6 to 1.7 times faster than using the
UnicodeString constructor for valid input, and about 2 to 3.3 times faster
for invalid input.

Constructor method is:
  UnicodeString s(data, dataLen, "UTF-8");

Manual method is roughly:
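  UErrorCode err = U_ZERO_ERROR;
  int32_t actualLen = 0;
  UnicodeString s;
  // UTF-16 never needs more code units than the UTF-8 input has bytes
  UChar * pBuf = s.getBuffer(dataLen);
  u_strFromUTF8(pBuf, dataLen, &actualLen, data, dataLen, &err);
  s.releaseBuffer(U_SUCCESS(err) ? actualLen : 0);  // empty string on error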
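For illustration, here is a standalone sketch of the same reserve, decode, shrink pattern in plain C++ with no ICU dependency. It is not ICU's implementation, and the name utf8ToUtf16Strict is hypothetical. Reserving dataLen UTF-16 units is always enough, because each UTF-8 sequence of 1 to 4 bytes produces at most as many UTF-16 units (1 or 2) as it has bytes. Like u_strFromUTF8, it rejects malformed input rather than substituting for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical strict UTF-8 -> UTF-16 decoder (a sketch, not ICU code).
// Returns false on malformed input and leaves 'out' empty.
bool utf8ToUtf16Strict(const char* data, std::size_t dataLen, std::u16string& out) {
    out.assign(dataLen, u'\0');  // like getBuffer(dataLen): one unit per byte is the worst case
    std::size_t o = 0;
    const auto* s = reinterpret_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < dataLen;) {
        std::uint32_t c = s[i];
        std::size_t n;         // number of continuation bytes
        std::uint32_t minCp;   // smallest code point allowed for this length (rejects overlongs)
        if (c < 0x80)                    { n = 0; minCp = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; c &= 0x1F; minCp = 0x80; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; c &= 0x0F; minCp = 0x800; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; c &= 0x07; minCp = 0x10000; }
        else { out.clear(); return false; }                     // invalid lead byte
        if (n > dataLen - i - 1) { out.clear(); return false; }  // truncated sequence
        for (std::size_t k = 1; k <= n; ++k) {
            std::uint32_t b = s[i + k];
            if ((b & 0xC0) != 0x80) { out.clear(); return false; }  // not a continuation byte
            c = (c << 6) | (b & 0x3F);
        }
        if (c < minCp || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            out.clear();
            return false;  // overlong form, out of range, or encoded surrogate
        }
        i += n + 1;
        if (c < 0x10000) {
            out[o++] = static_cast<char16_t>(c);
        } else {  // supplementary code point -> surrogate pair
            c -= 0x10000;
            out[o++] = static_cast<char16_t>(0xD800 + (c >> 10));
            out[o++] = static_cast<char16_t>(0xDC00 + (c & 0x3FF));
        }
    }
    out.resize(o);  // like releaseBuffer(actualLen)
    return true;
}
```

The single upfront allocation and single pass are what the manual method buys over a generic converter; a substituting variant (in the spirit of u_strFromUTF8WithSub) would write U+FFFD instead of returning false.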

My results, running on WinXP with a release build optimized for speed:

viaUnicodeStringConstructor (1000000 times): 1853 ms
viaUnicodeStringConstructor invalid (1000000 times): 2604 ms
viaStrFromUTF8WithSub (1000000 times): 1141 ms
viaStrFromUTF8WithSub invalid (1000000 times): 1312 ms
  -- detected invalid input
viaStrFromUTF8Lenient (1000000 times): 1082 ms
viaStrFromUTF8Lenient invalid (1000000 times): 921 ms
viaStrFromUTF8 (1000000 times): 1132 ms
viaStrFromUTF8 invalid (1000000 times): 791 ms
  -- detected invalid input

The project used to generate these results is available at
http://jellycan.com/etc/UnicodeString.zip
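The linked project is the real harness. As a rough, hypothetical sketch of how timings of this shape can be collected, a helper like timeIt runs a callable a fixed number of times and prints a line in the same format as the results above. It assumes C++11 &lt;chrono&gt;, which postdates this post, and the std::function call adds the same small overhead to every variant being compared.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <functional>

// Hypothetical helper: runs 'work' 'iterations' times and returns elapsed wall-clock ms.
// Each result is folded into a volatile sink so the optimizer cannot drop the loop.
long long timeIt(const char* label, long iterations, const std::function<std::size_t()>& work) {
    volatile std::size_t sink = 0;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        sink = sink + work();
    }
    auto stop = std::chrono::steady_clock::now();
    long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::printf("%s (%ld times): %lld ms\n", label, iterations, ms);
    return ms;
}
```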

It would seem from these results that UnicodeString should special-case
UTF-8 in its codepage constructor. It would also be nice to have a setTo
method for setting the contents of the string from data in a given
codepage.
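One detail such a special case has to get right is recognizing the codepage name, since callers may pass "UTF-8", "utf8" or similar spellings. Below is a hypothetical sketch (isUtf8Name and convertFromCodepage are illustrative names, not ICU API) that compares only letters and digits, case-insensitively, and then dispatches to a fast path or a generic converter path. ICU's own alias lookup does a similar but more thorough fuzzy comparison.

```cpp
#include <cctype>
#include <string>

// Hypothetical: loose check for UTF-8 codepage names ("UTF-8", "utf8", "Utf_8").
// Null and "" are not UTF-8: in ICU's constructor they select the default
// codepage and invariant-character conversion respectively.
bool isUtf8Name(const char* codepage) {
    if (codepage == nullptr) return false;
    std::string norm;
    for (const char* p = codepage; *p; ++p) {
        unsigned char ch = static_cast<unsigned char>(*p);
        if (std::isalnum(ch)) norm += static_cast<char>(std::tolower(ch));
    }
    return norm == "utf8";
}

// Hypothetical dispatch for a codepage-taking constructor or setTo():
// take the direct UTF-8 path when the name matches, otherwise the generic path.
template <typename Result, typename Fast, typename Generic>
Result convertFromCodepage(const char* codepage, Fast fast, Generic generic) {
    return isUtf8Name(codepage) ? fast() : generic();
}
```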

Regards,
Brodie