
#6 unicode support in char and string procedures

Status: open
Owner: nobody
Labels: None
Priority: 4
Updated: 2005-01-16
Created: 2001-12-16
Private: No

Rather than numeric comparison, Character's methods should be used
for character comparison, and Collator's for string
comparison.
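
A minimal Java sketch of the difference (the class and method names are hypothetical; only String.compareTo, Collator.getInstance and Collator.compare are standard API). String.compareTo compares UTF-16 code unit values, whereas a Collator applies locale-specific ordering rules:

```java
import java.text.Collator;
import java.util.Locale;

class CollatorComparison {
    // Numeric comparison: String.compareTo orders by UTF-16 code unit
    // values, so every uppercase ASCII letter sorts before every lowercase one.
    static int numericCompare(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-sensitive comparison delegated to java.text.Collator.
    static int collatedCompare(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // 'a' (97) > 'B' (66), so numerically "apple" sorts after "Banana".
        System.out.println(numericCompare("apple", "Banana") > 0);                   // true
        // An English collator puts "apple" first, as a dictionary would.
        System.out.println(collatedCompare("apple", "Banana", Locale.ENGLISH) < 0);  // true
        // LATIN SMALL LETTER E WITH ACUTE (U+00E9) is numerically greater than 'f' ...
        System.out.println(numericCompare("\u00e9", "f") > 0);                       // true
        // ... but the collator sorts it with 'e', i.e. before 'f'.
        System.out.println(collatedCompare("\u00e9", "f", Locale.ENGLISH) < 0);      // true
    }
}
```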

Discussion

  • Matthias Radestock

    Logged In: YES
    user_id=110070

    Same goes for upcase/downcase.
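
    For illustration, a minimal Java sketch (hypothetical helper names) contrasting char-level and string-level case mapping. Character.toUpperCase maps one char to one char with no locale, while String.toUpperCase(Locale) can change the string's length and depends on the locale:

```java
import java.util.Locale;

class CaseMapping {
    // Char-level mapping: one char in, one char out, no locale involved.
    static char charUpcase(char c) {
        return Character.toUpperCase(c);
    }

    // String-level mapping: may change the length and depends on the locale.
    static String stringUpcase(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // German sharp s (U+00DF) has no single-char uppercase form ...
        System.out.println(charUpcase('\u00df') == '\u00df');                // true
        // ... but uppercases to "SS" at the string level.
        System.out.println(stringUpcase("stra\u00dfe", Locale.ROOT));        // STRASSE
        // Dotted/dotless i: the result depends on the locale.
        System.out.println(stringUpcase("i", Locale.ROOT));                  // I
        System.out.println(stringUpcase("i", Locale.forLanguageTag("tr"))
                .equals("\u0130"));                                          // true
    }
}
```

    If this holds for SISC as well, a char-based upcase cannot express every mapping that a string-based one can.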

    Also, I'm not sure whether Character's comparison methods respect the locale. I suspect we might have to convert
    to a String and use a Collator for that too :(
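
    As far as the Java API goes, Character.compare and Character.compareTo order chars by numeric value only, so the workaround suggested here would look roughly like the following sketch (compareChars is a hypothetical helper, not SISC code):

```java
import java.text.Collator;
import java.util.Locale;

class CharCollation {
    // Hypothetical helper: compare two chars by wrapping each in a
    // one-character String and delegating to a Collator.
    static int compareChars(char a, char b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(String.valueOf(a), String.valueOf(b));
    }

    public static void main(String[] args) {
        // Numerically, 'Z' (90) comes before 'a' (97) ...
        System.out.println(Character.compare('Z', 'a') < 0);               // true
        // ... but an English collator orders 'a' before 'Z'.
        System.out.println(compareChars('a', 'Z', Locale.ENGLISH) < 0);    // true
    }
}
```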

     
  • Matthias Radestock

    Logged In: YES
    user_id=110070

    Also, let's not forget the various character class tests, i.e.:
    char-alphabetic?
    char-numeric?
    char-whitespace?
    char-upper-case?
    char-lower-case?
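
    One possible mapping of these predicates onto java.lang.Character, sketched in Java. The method names are hypothetical, and mapping char-numeric? to Character.isDigit (general category Nd only) is an assumption rather than a settled choice. The sketch takes int code points so that characters outside the Basic Multilingual Plane are covered too:

```java
class CharClassTests {
    // Hypothetical Java counterparts of the Scheme character class tests.
    static boolean charAlphabetic(int cp) { return Character.isAlphabetic(cp); }
    static boolean charNumeric(int cp)    { return Character.isDigit(cp); }      // Nd only (assumption)
    static boolean charWhitespace(int cp) { return Character.isWhitespace(cp); }
    static boolean charUpperCase(int cp)  { return Character.isUpperCase(cp); }
    static boolean charLowerCase(int cp)  { return Character.isLowerCase(cp); }

    public static void main(String[] args) {
        System.out.println(charAlphabetic(0x00E9)); // LATIN SMALL LETTER E WITH ACUTE -> true
        System.out.println(charNumeric(0x0663));    // ARABIC-INDIC DIGIT THREE -> true
        System.out.println(charWhitespace(0x2003)); // EM SPACE -> true
        System.out.println(charUpperCase(0x03A3));  // GREEK CAPITAL LETTER SIGMA -> true
        System.out.println(charLowerCase(0x00DF));  // LATIN SMALL LETTER SHARP S -> true
    }
}
```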

    Using Collators for string comparison, and the locale-aware String methods for case conversion, will be a significant
    performance hit in SISC, since strings are represented as char arrays and must be converted to and from
    java.lang.String to carry out these operations.
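
    A rough sketch of the overhead described here, assuming strings are stored as char[] (ArrayStringCollation is a hypothetical name, not SISC's class): because Collator and the locale-aware case methods only accept String, each operation first copies the array into a new String:

```java
import java.text.Collator;
import java.util.Locale;

class ArrayStringCollation {
    // Assumes a Scheme string is stored as a char[]. Collator only accepts
    // Strings, so every comparison copies both arrays into new String objects.
    static int compare(char[] a, char[] b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(new String(a), new String(b));
    }

    // Case conversion has the same shape: copy in, convert, copy back out.
    static char[] upcase(char[] s, Locale locale) {
        return new String(s).toUpperCase(locale).toCharArray();
    }

    public static void main(String[] args) {
        char[] x = "apple".toCharArray();
        char[] y = "Banana".toCharArray();
        System.out.println(compare(x, y, Locale.ENGLISH) < 0);        // true
        System.out.println(new String(upcase(x, Locale.ROOT)));       // APPLE
    }
}
```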

    Perhaps SISC's string representation should be changed to java.lang.String? This should speed up all string
    operations except string-set!, which would need to convert the immutable String to a mutable representation and back.
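
    A sketch of the trade-off, assuming a hypothetical Scheme string object that holds a java.lang.String in a mutable field: reads can pass the String straight to a Collator or to case conversion, but string-set! must copy the whole string, because String is immutable:

```java
// Hypothetical Scheme string object backed by an immutable java.lang.String.
class StringBackedSchemeString {
    private String value;

    StringBackedSchemeString(String value) {
        this.value = value;
    }

    // Comparison and case operations can use the String directly, no copy.
    String asJavaString() {
        return value;
    }

    // string-set!: the contents are copied out, modified, and copied back,
    // so a single-char update costs O(n) instead of O(1).
    void stringSet(int k, char c) {
        char[] chars = value.toCharArray();
        chars[k] = c;
        value = new String(chars);
    }

    public static void main(String[] args) {
        StringBackedSchemeString s = new StringBackedSchemeString("hello");
        s.stringSet(0, 'j');
        System.out.println(s.asJavaString()); // jello
    }
}
```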

     
  • Matthias Radestock

    Logged In: YES
    user_id=110070

    The consensus seems to be that making the existing
    string/char functions Unicode-compliant is just not going to
    work.

    A separate set of functions, or even a different data type,
    would be a better solution.

     
  • Matthias Radestock

    • priority: 5 --> 4
    • summary: Comparison not locale sensitive --> unicode support
     
  • Matthias Radestock

    • summary: unicode support --> unicode support in char and string procedures
     
  • Denys Rtveliashvili

    Logged In: YES
    user_id=1416496

    Hm... It seems to me that using a separate data type or a
    separate set of functions for Unicode-aware operations would
    cause a problem: users who call the ordinary string- and
    character-related functions expecting them to work with
    Unicode could be seriously confused.
    Also, converting ordinary strings to Unicode-aware strings
    and back would be a performance hit too.

    I propose that we discuss making ordinary strings Unicode-aware
    a little further, to make sure there really is no way to do it.

     
  • Matthias Radestock

    Logged In: YES
    user_id=110070

    We are not going to do anything until the R6RS committee have made up their minds on what to do about Unicode.

     
  • Nobody/Anonymous

    Logged In: NO

    That is reasonable.

    I wonder how long it will take the committee to
    finalize R6RS. Currently I see only a proposal on
    Unicode support, which is almost a year old:

    ----------------------------------------------

    Unicode support
    ---------------

    We have written up a proposal for Unicode support that defines the
    notion of "char" to be a Unicode scalar value---strings are simply
    vectors of these scalar values. This allows Unicode support to be
    largely a conservative extension of the character and string processing
    in R5RS, and avoids the API problems inherent in using a UTF-16-based
    representation. Moreover, this approach has already been successfully
    implemented by several Scheme implementations.

    Along with Unicode support, we are also considering extensions to the
    character and string literal syntax. Details are still under discussion.

    ----------------------------------------------
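
    To illustrate the UTF-16 problem the proposal mentions, here is a small Java sketch (class and method names are hypothetical). java.lang.String counts UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two positions, while a scalar-value view counts it once. A scalar value is any code point from 0 to 0x10FFFF outside the surrogate range 0xD800 to 0xDFFF:

```java
import java.util.Arrays;

class ScalarValues {
    // A Unicode scalar value: a code point in 0..0x10FFFF that is not
    // in the surrogate range 0xD800..0xDFFF.
    static boolean isScalarValue(int cp) {
        return cp >= 0 && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
    }

    // View a UTF-16 Java string as a "vector" of code points. Note: a lone
    // surrogate in s would be passed through as-is, and is not a scalar value.
    static int[] toScalarValues(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // MUSICAL SYMBOL G CLEF, U+1D11E, lies outside the BMP.
        String s = "a" + new String(Character.toChars(0x1D11E));
        System.out.println(s.length());                          // 3 (UTF-16 code units)
        System.out.println(toScalarValues(s).length);            // 2 (scalar values)
        System.out.println(Arrays.toString(toScalarValues(s)));  // [97, 119070]
        System.out.println(isScalarValue(0xD834));               // false (surrogate)
    }
}
```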

     
