Logged In: YES
user_id=110070
same goes for upcase/downcase.
Also, I'm not sure whether char's comparison methods respect the locale. I suspect we might have to do a
conversion to string and use a Collator for that too :(
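To make the concern above concrete, here is a minimal Java sketch (SISC runs on the JVM, and Collator here is taken to mean java.text.Collator). The class and method names are hypothetical and are not part of SISC; only the java.* calls are real APIs. It shows that plain char comparison orders by UTF-16 code unit, while a Collator orders by locale rules, and that locale-aware upcasing has to go through a String because the result may not fit in one char.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper names for illustration; not SISC code.
final class LocaleSketch {
    // Plain char comparison: orders by UTF-16 code unit value, ignoring locale.
    static boolean charLess(char a, char b) {
        return Character.compare(a, b) < 0;
    }

    // Locale-aware comparison: convert each char to a String and ask a Collator.
    static boolean collatedLess(char a, char b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(String.valueOf(a), String.valueOf(b)) < 0;
    }

    // Locale-aware upcase; the result can be longer than a single char.
    static String upcase(char c, Locale locale) {
        return String.valueOf(c).toUpperCase(locale);
    }

    public static void main(String[] args) {
        // 'Z' (90) sorts before 'a' (97) by code unit value.
        System.out.println(charLess('Z', 'a'));
        // An English collator sorts "a" before "Z".
        System.out.println(collatedLess('a', 'Z', Locale.ENGLISH));
        // U+00DF (sharp s) upcases to the two-character string "SS".
        System.out.println(upcase('\u00df', Locale.ROOT));
    }
}
```

Creating a Collator and two temporary Strings per comparison is the cost being worried about here; a real implementation would at least cache the Collator.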
Logged In: YES
user_id=110070
Also, let's not forget the various character class tests, i.e.:
char-alphabetic?
char-numeric?
char-whitespace?
char-upper-case?
char-lower-case?
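For what it's worth, these predicates may be easier than comparison: java.lang.Character already classifies characters using Unicode properties, independent of locale, so no Collator is involved. The sketch below is one possible mapping, not SISC's implementation; the method names are hypothetical, and choosing isLetter for char-alphabetic? is an assumption (Character.isAlphabetic is a broader alternative).

```java
// Hypothetical mapping from the Scheme predicates to java.lang.Character.
// Each method takes a code point so supplementary characters also work.
final class CharClassSketch {
    static boolean charAlphabetic(int cp) { return Character.isLetter(cp); }
    static boolean charNumeric(int cp)    { return Character.isDigit(cp); }
    static boolean charWhitespace(int cp) { return Character.isWhitespace(cp); }
    static boolean charUpperCase(int cp)  { return Character.isUpperCase(cp); }
    static boolean charLowerCase(int cp)  { return Character.isLowerCase(cp); }

    public static void main(String[] args) {
        // U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
        System.out.println(charAlphabetic('\u00e9'));
        // U+0663 is ARABIC-INDIC DIGIT THREE.
        System.out.println(charNumeric('\u0663'));
        // U+00A0 (no-break space) is deliberately excluded by isWhitespace.
        System.out.println(charWhitespace('\u00a0'));
    }
}
```

One thing to watch: Character.isWhitespace excludes the no-break space, so whether that matches what char-whitespace? should return is a decision that would still have to be made.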
Using collators for string comparison and conversion will be a significant performance hit in SISC, since strings
are represented as arrays and need to be converted to/from Java Strings to carry out these operations.
Perhaps SISC's string representation should be changed to String? This should speed up all string operations
except string-set!, which would need to convert the String to a mutable representation and back.
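The trade-off can be sketched with two toy classes. Both are hypothetical stand-ins written for this comment, not SISC's actual string class: one keeps a char array (cheap string-set!, copying on every collation), the other keeps a String (direct collation, copying on every string-set!).

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical array-backed string: mutation is cheap, collation copies.
final class ArrayString {
    final char[] chars;

    ArrayString(String s) { chars = s.toCharArray(); }

    // string-set! is a single in-place write.
    void set(int k, char c) { chars[k] = c; }

    // Collator only accepts Strings, so both arrays are copied per call.
    int compare(ArrayString other, Collator collator) {
        return collator.compare(new String(chars), new String(other.chars));
    }
}

// Hypothetical String-backed string: collation is direct, mutation copies.
final class BackedString {
    String value;

    BackedString(String s) { value = s; }

    // string-set! converts to a mutable form and back again.
    void set(int k, char c) {
        StringBuilder sb = new StringBuilder(value);
        sb.setCharAt(k, c);
        value = sb.toString();
    }

    int compare(BackedString other, Collator collator) {
        return collator.compare(value, other.value);
    }
}

final class RepresentationDemo {
    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        ArrayString a = new ArrayString("apple");
        a.set(0, 'A');
        System.out.println(a.compare(new ArrayString("banana"), collator) < 0);
        BackedString b = new BackedString("apple");
        b.set(0, 'A');
        System.out.println(b.compare(new BackedString("banana"), collator) < 0);
    }
}
```

Which representation wins depends on how often programs mutate strings compared with how often they compare or convert them; the sketch only shows where the copies happen, not how much they cost.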
Logged In: YES
user_id=110070
The consensus seems to be that making the existing
string/char functions unicode-compliant is just not going to
work.
A separate set of functions or even a different data type
are better solutions.
Logged In: YES
user_id=1416496
Hm... It seems to me that there will be a problem if a
separate datatype or a separate set of functions is used for
Unicode-aware operations: it could seriously confuse those
who use the ordinary string and character functions
expecting them to work with Unicode.
Also, converting from ordinary strings to Unicode-aware
strings and back would be a performance hit too.
I propose that we discuss making ordinary strings
Unicode-aware a little further, to make sure there really is
no way to do it.
Logged In: YES
user_id=110070
We are not going to do anything until the R6RS committee have made up their minds on what to do about Unicode.
Logged In: NO
That is reasonable.
I wonder how long it will take for the committee to
finalize the R6RS? Currently I see only a proposal on
Unicode support, which is almost a year old:
----------------------------------------------
Unicode support
---------------
We have written up a proposal for Unicode support that defines the
notion of "char" to be a Unicode scalar value---strings are simply
vectors of these scalar values. This allows Unicode support to be
largely a conservative extension of the character and string processing
in R5RS, and avoids the API problems inherent in using a UTF-16-based
representation. Moreover, this approach has already been successfully
implemented by several Scheme implementations.

Along with Unicode support, we are also considering extensions to the
character and string literal syntax. Details are still under discussion.
----------------------------------------------
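The "API problems inherent in using a UTF-16-based representation" mentioned in the quoted proposal can be seen directly in Java, whose String is UTF-16. The sketch below is only an illustration with hypothetical helper names; it assumes well-formed input (no unpaired surrogates) and shows a string as a vector of scalar values held in an int array.

```java
// Hypothetical helpers contrasting UTF-16 code units with scalar values.
final class ScalarSketch {
    // View a String as a vector of code points (scalar values for
    // well-formed UTF-16 input).
    static int[] toScalars(String s) {
        return s.codePoints().toArray();
    }

    // Rebuild a String from a vector of code points.
    static String fromScalars(int[] scalars) {
        return new String(scalars, 0, scalars.length);
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) needs a surrogate pair in UTF-16.
        String clef = "\uD834\uDD1E";
        System.out.println(clef.length());            // UTF-16 code units
        System.out.println(toScalars(clef).length);   // scalar values
    }
}
```

With a UTF-16 representation, string-length and string-ref would count and index code units, so a single character outside the Basic Multilingual Plane occupies two positions; with the scalar-value vector each character occupies exactly one.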