From: Brodie T. <bro...@je...> - 2006-12-14 20:49:44
This is a followup to my question from a few days ago, when I asked whether it wouldn't be faster to special-case UTF-8 construction in UnicodeString using the u_strFromUTF8* functions. The results show that it appears to be up to 2x faster to construct a UnicodeString manually using the u_strFromUTF8* functions than by using the UnicodeString constructor.

Constructor method is:

    UnicodeString s(data, dataLen, "UTF-8");

Manual method is roughly:

    UnicodeString s;
    UChar * pBuf = s.getBuffer(dataLen);
    u_strFromUTF8(pBuf, dataLen, &actualLen, data, dataLen, &err);
    s.releaseBuffer(actualLen);

My results, running on WinXP with a release build optimized for speed:

    viaUnicodeStringConstructor (1000000 times): 1853 ms
    viaUnicodeStringConstructor invalid (1000000 times): 2604 ms
    viaStrFromUTF8WithSub (1000000 times): 1141 ms
    viaStrFromUTF8WithSub invalid (1000000 times): 1312 ms -- detected invalid input
    viaStrFromUTF8Lenient (1000000 times): 1082 ms
    viaStrFromUTF8Lenient invalid (1000000 times): 921 ms
    viaStrFromUTF8 (1000000 times): 1132 ms
    viaStrFromUTF8 invalid (1000000 times): 791 ms -- detected invalid input

The project used to generate the results is at http://jellycan.com/etc/UnicodeString.zip

It would seem from these results that the UnicodeString constructor should be special-cased for creating a string from UTF-8. Note that it would also be nice to have a setTo method for setting the contents of the string using a codepage.

Regards,
Brodie
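
For anyone who wants to reproduce figures of this shape without downloading the project, here is a minimal sketch of a timing loop. It is not taken from the linked zip: the names timeRuns, speedup and report are hypothetical helpers made up for illustration, and it only assumes a C++11 compiler with std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical helper (not from the linked project): calls fn() `iterations`
// times and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
long long timeRuns(long iterations, Fn fn) {
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (long i = 0; i < iterations; ++i) {
        fn();
    }
    const Clock::time_point stop = Clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Ratio of two timings; a value above 1.0 means the candidate is faster
// than the baseline.
inline double speedup(long long baselineMs, long long candidateMs) {
    assert(candidateMs > 0);  // avoid dividing by zero on very short runs
    return static_cast<double>(baselineMs) / static_cast<double>(candidateMs);
}

// Prints one result line in the same shape as the figures quoted above.
inline void report(const char *label, long iterations, long long ms) {
    std::printf("%s (%ld times): %lld ms\n", label, iterations, ms);
}
```

The callable passed to timeRuns would wrap one of the two construction methods. It should leave some observable side effect (for example, accumulating the resulting string length) so that an optimizing build cannot discard the work being measured. Applied to the valid-input figures above, speedup(1853, 1141) comes out at a little over 1.6.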