
#18 wcs

Status: New
Owner: nobody
Labels: None
Priority: Medium
Type: Defect
Updated: 2014-04-03
Created: 2014-03-31
Creator: Anonymous
Private: No

Originally created by: grshiplett

What steps will reproduce the problem?
1. Clients decide to use UTF-16 in a project.
2. The software will emit UTF-8 for web pages only.
3. A wcs (UTF-16) type can be efficient.

What is the expected output? What do you see instead?
A Unicode string format that is randomly accessible.

What version of the product are you using? On what operating system?

What C compiler are you using, and what version?

Please provide any additional information below.
Perhaps a wcs() type was rejected early on, in the decision to use ucs()?

Discussion

  • Anonymous

    Anonymous - 2014-03-31

    Originally posted by: grshiplett

    For a defense of wide-character types such as UTF-16, see various sources from India. Output in UTF-8 is often seen as a separate issue from internal processing in UTF-16, in settings where storing text as UTF-16 is not a concern.
    There is, of course, resistance: one example is the excellent Notepad++ for Windows, which remains ANSI/UTF-8 and endian-oriented only.

     
  • Anonymous

    Anonymous - 2014-04-02

    Originally posted by: r.parl...@gmail.com

    Hello Robert

    Thanks for the suggestion.  I think utf-16 would be a poor choice for an internal format, since it can't represent the entire unicode range (at least not without giving up random access, which is the only thing going for it).  Also, the nice thing about utf-8 is that it provides a bridge between unicode and conventional string types - string(u), for ucs u, is a more-or-less instant operation.
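    A minimal sketch of the random-access point, written in Python purely for illustration (it is not Icon, and the helper names below are hypothetical). Any code point above U+FFFF becomes a surrogate pair of two UTF-16 code units. Once such a pair appears, finding the n-th character means scanning from the start instead of indexing directly. UTF-16LE is an arbitrary choice here.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (hypothetical helper, via UTF-16LE bytes)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit: int) -> bool:
    """True if unit is the first half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDBFF


def code_point_index_to_unit_index(units: list[int], cp_index: int) -> int:
    """Map a code-point index to a code-unit index by a linear scan.

    A scan is needed because a surrogate pair occupies two units,
    so units[n] is not necessarily the n-th character.
    """
    i = 0
    for _ in range(cp_index):
        i += 2 if is_high_surrogate(units[i]) else 1
    return i


text = "a\U0001D11Eb"  # 'a', U+1D11E MUSICAL SYMBOL G CLEF, 'b'
units = utf16_units(text)
print(len(text), len(units))                     # 3 4
print([hex(u) for u in units])                   # ['0x61', '0xd834', '0xdd1e', '0x62']
print(code_point_index_to_unit_index(units, 2))  # 3: 'b' sits at unit 3, not unit 2
```

    Python's own str is indexed by code point, so len(text) counts three characters while the UTF-16 form holds four units.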

    Could one not simply write a ucs string to utf-16 conversion procedure?  There are some conversion procs in the file lib/main/text.icn which may show the correct outline to follow - they have to be done quite carefully in order to be efficient for long strings.
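    One possible outline for such a conversion procedure is sketched below in Python. It is not the code in lib/main/text.icn, and the name to_utf16_le is hypothetical. The sketch treats the input as a sequence of integer code points. It makes one pass to size the output and a second pass to fill a preallocated buffer, which avoids repeated concatenation on long inputs.

```python
def to_utf16_le(code_points: list[int]) -> bytes:
    """Encode Unicode code points as UTF-16LE without a BOM (hypothetical helper)."""
    # Pass 1: validate and compute the exact output size in bytes.
    size = 0
    for cp in code_points:
        if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        size += 4 if cp > 0xFFFF else 2

    # Pass 2: fill a preallocated buffer in place.
    out = bytearray(size)
    pos = 0
    for cp in code_points:
        if cp > 0xFFFF:
            v = cp - 0x10000
            units = (0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF))  # surrogate pair
        else:
            units = (cp,)
        for u in units:
            out[pos] = u & 0xFF      # low byte first (little-endian)
            out[pos + 1] = u >> 8
            pos += 2
    return bytes(out)


sample = "a\U0001D11Eb"
# Cross-check against Python's built-in codec.
assert to_utf16_le([ord(c) for c in sample]) == sample.encode("utf-16-le")
```

    Sizing the buffer first costs an extra pass but keeps the fill loop free of reallocations. That is the same concern about efficiency for long strings raised above.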

    Kind regards
    R

     
  • Anonymous

    Anonymous - 2014-04-03

    Originally posted by: grshiplett

    Hi,

    Thanks for the suggestion; I'll take a look.

    I use a few languages that do use UTF-16, but I don't know what the future
    will bring. One of them accepts a very wide variety of source encodings
    (some programmers in India were not happy to use UTF-32 for their local
    languages, and I recall their enthusiasm for UTF-8).

     
