2009-09-05 11:06:16 UTC
From what I've seen, I don't understand why you're saying that Unicode isn't supported.
I can store UTF-8 in a bstring correctly (even though a[i] gives me the i-th byte of the string, not the i-th character).
Converting between UCS-2/UCS-4 and UTF-8 is straightforward (about 10 lines of code), so it isn't hard to do when one needs to manipulate native Unicode strings.
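To make the conversion concrete, here is a rough sketch of encoding and decoding a single code point. It works on plain byte buffers rather than on bstring itself, and the names utf8_encode and utf8_decode are made up for this post, not part of any library. The decoder is deliberately simple: it assumes it won't be handed overlong forms or encoded surrogates, which a strict decoder would reject.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out
   (room for at least 4 bytes). Returns the number of bytes written,
   or 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points to encode */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* beyond the Unicode range */
}

/* Hypothetical helper: decode one code point from s (len bytes available).
   Returns the number of bytes consumed, or 0 on truncated or malformed
   input. Does not reject overlong forms or encoded surrogates. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { *cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0; /* stray continuation byte or invalid lead byte */
    if (n > len)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0; /* expected a 10xxxxxx continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}
```

Converting a whole UCS-4 buffer is then just a loop calling utf8_encode on each element and appending the bytes to the output string.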
Actually, making [i] return the i-th character isn't that hard either, as it only requires decoding the high bits of each byte, but it's an O(N) operation instead of O(1).
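Something along these lines would do for the indexed access. Again this is only a sketch on a raw byte buffer; utf8_offset is a name I made up, and it assumes the buffer already holds valid UTF-8. It counts every byte that is not a continuation byte (10xxxxxx) as the start of a character, which is where the O(N) scan comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the byte offset of the i-th character
   (0-based) in the UTF-8 buffer s of len bytes, or len if the buffer
   holds fewer than i + 1 characters. Assumes s is valid UTF-8. */
size_t utf8_offset(const unsigned char *s, size_t len, size_t i)
{
    size_t pos, seen = 0;
    assert(s != NULL);
    for (pos = 0; pos < len; pos++) {
        /* Any byte that is not 10xxxxxx starts a new character. */
        if ((s[pos] & 0xC0) != 0x80) {
            if (seen == i)
                return pos;
            seen++;
        }
    }
    return len;
}
```

The offset it returns can be handed to a decoder to get the actual code point, or used as the start of a byte-level substring.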
Overall, using UTF-8 doesn't really increase the required string length (in a UCS-4 string most of the bytes are zero anyway, which is not the case in UTF-8), and character access by index is quite rare anyway (so the O(N) penalty isn't that bad).
Cyril