wcs
Brought to you by:
rparlett
Originally created by: grshiplett
What steps will reproduce the problem?
1. clients decide to use utf-16 in a project
2. s/w will emit utf-8 for webpages only
3. wcs can be efficient (utf-16)
What is the expected output? What do you see instead?
a Unicode format which is randomly accessible
What version of the product are you using? On what operating system?
What C compiler are you using, and what version?
Please provide any additional information below.
Perhaps wcs() was rejected early on, in the decision to use ucs()?
Originally posted by: grshiplett
For a defense of wide-char types as in utf-16, see various from India; output in utf-8 is often seen as a different/separate issue from processing internally as utf-16, where storage as utf-16 is not an issue.
There is of course resistance: an example might be the excellent Notepad++ for Windows, which remains ANSI/utf-8 and endian-oriented only.
Originally posted by: r.parl...@gmail.com
Hello Robert
Thanks for the suggestion. I think utf-16 would be a poor choice for an internal format, since it can't represent the entire unicode range (at least not without giving up random access, which is the only thing going for it). Also, the nice thing about utf-8 is that it provides a bridge between unicode and conventional string types - string(u), for ucs u, is a more-or-less instant operation.
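To make the random-access point concrete, here is a minimal sketch in Python (not Object Icon, purely for illustration; `utf16_units` is a made-up helper name). It shows that a character outside the Basic Multilingual Plane takes two 16-bit code units in utf-16, so code unit index i no longer corresponds to character index i.

```python
import struct

def utf16_units(text: str) -> list[int]:
    """Return the utf-16 code units of text (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))

# Three code points; the middle one (U+1F600) lies outside the BMP.
sample = "a\U0001F600b"
units = utf16_units(sample)

# The string has 3 characters but 4 code units, because U+1F600 is
# stored as a surrogate pair. Indexing code unit 2 therefore lands on
# the low surrogate, not on the third character "b".
print(len(sample), len(units))
print([hex(u) for u in units])
```

Restricting utf-16 to single code units would restore random access, but only for code points up to U+FFFF.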
Could one not simply write a ucs string to utf-16 conversion procedure? There are some conversion procs in the file lib/main/text.icn which may show the correct outline to follow - they have to be done quite carefully in order to be efficient for long strings.
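As a rough outline of what such a conversion has to do, here is a sketch in Python rather than Object Icon. It is not taken from lib/main/text.icn, and the name `code_points_to_utf16` and its parameters are hypothetical. It walks a sequence of code points once and appends to a single growing buffer, which is one way to stay linear for long strings.

```python
def code_points_to_utf16(code_points, byteorder: str = "big") -> bytes:
    """Encode an iterable of Unicode code points as utf-16 bytes (no BOM)."""
    out = bytearray()  # one growing buffer avoids repeated string concatenation
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a valid Unicode scalar value: {cp!r}")
        if cp < 0x10000:
            # BMP code point: a single 16-bit code unit.
            out += cp.to_bytes(2, byteorder)
        else:
            # Supplementary code point: split the 20-bit offset into a
            # high (lead) and low (trail) surrogate.
            v = cp - 0x10000
            out += (0xD800 | (v >> 10)).to_bytes(2, byteorder)
            out += (0xDC00 | (v & 0x3FF)).to_bytes(2, byteorder)
    return bytes(out)

text = "a\U0001F600b"
encoded = code_points_to_utf16(ord(c) for c in text)
# Cross-check against Python's built-in encoder.
assert encoded == text.encode("utf-16-be")
```

An Object Icon version would presumably follow the same shape, iterating over the ucs value and building the result in one pass.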
Kind regards
R
Originally posted by: grshiplett
Hi,
Thanks for the suggestion ... I'll take a look.
I use a few languages that do use utf-16, but I don't know what the future
will bring - one of them accepts a very wide variety of source encodings
(some programmers in India were not happy to use UTF-32 for their local
languages, and I recall their enthusiasm for utf-8).