
better string library


UTF-8


  1. 2009-09-05 11:06:16 UTC
    From what I've seen, I don't understand why you say that Unicode isn't supported.

    I can store UTF-8 in a bstring correctly (even though a[i] gives me the i-th byte of the string, not the i-th character).
    Converting between UTF-8 and UCS-2/UCS-4 is straightforward (about 10 lines of code), so it isn't hard to do when one needs to manipulate native Unicode strings.
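
A minimal sketch of what such a conversion could look like, in plain C and independent of bstring. The function names `utf8_decode` and `utf8_encode` are hypothetical, chosen only for this illustration. The decoder reads one UTF-8 sequence into a UCS-4 code point and rejects truncated, overlong, and surrogate sequences; the encoder does the reverse.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the input is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len > n) return 0;          /* sequence is cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, out-of-range values, and UTF-16 surrogates. */
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Hypothetical helper: encode one code point as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is invalid. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_decode` in a loop and advancing by its return value converts a whole buffer to UCS-4. For UCS-2, code points above 0xFFFF would have to be rejected, or split into surrogate pairs if UTF-16 is what is actually wanted.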

    Getting [i] to return the i-th character is not that hard either, as it only requires decoding the high bits of each byte, but it is an O(N) operation instead of O(1).
    Overall, using UTF-8 doesn't really increase the required string length (in a UCS-4 string most of the bytes are zero anyway, which is not the case in UTF-8), and per-character access in a string is quite rare anyway (so the O(N) penalty isn't that bad).
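
The O(N) lookup described above can be sketched as follows. The function name `utf8_char_offset` is hypothetical, and the sketch assumes the buffer already holds valid UTF-8: it counts every byte that is not a continuation byte (bit pattern 10xxxxxx) as the start of a character.

```c
#include <stddef.h>

/* Hypothetical helper: return the byte offset of the i-th character
   (0-based code point index) in a UTF-8 buffer of nbytes bytes,
   or -1 if the buffer holds fewer than i + 1 characters.
   Assumes the buffer is valid UTF-8; no validation is done here. */
long utf8_char_offset(const unsigned char *s, size_t nbytes, size_t i)
{
    size_t pos, count = 0;

    for (pos = 0; pos < nbytes; pos++) {
        /* Only the high two bits are inspected: anything other than
           10xxxxxx is an ASCII byte or the lead byte of a sequence. */
        if ((s[pos] & 0xC0) != 0x80) {
            if (count == i)
                return (long)pos;
            count++;
        }
    }
    return -1;   /* ran off the end: index out of range */
}
```

With a bstring `b`, the call would presumably be `utf8_char_offset(b->data, (size_t)b->slen, i)`, assuming the usual `data` and `slen` fields of `struct tagbstring`. The scan always starts at byte 0, which is where the linear cost comes from.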

    Cyril
