Would it be possible to get access to the sources of your Unicode version?

In sandbox/jlf/_readme.odt, there is a chapter "Experience report" where I describe how we migrated an application to wide-char Unicode.
We kept the ability to build both a byte-char and a wide-char version from the same sources.
If we keep this ability for ooRexx, maybe it would let us deliver a fully compatible byte-char version, while leaving the door open for the necessary evolutions in the wide-char version?


2010/8/30 Rick McGuire <>
On Mon, Aug 30, 2010 at 4:51 PM, Jean-Louis Faucher
> Hi Rick, Mark
>> >> I've looked in the past at adding better character set support, and my
>> >> personal preference would be to make the interpreter do everything
>> >> internally using unicode so that it is not necessary to convert
>> >> strings to a common format to perform language operations.
> ok, so my assumption was wrong :-)
> I thought that Unicode as the sole format was not good for ooRexx, because
> there may be platforms where it's not the default, and because of the
> potential risk of breaking legacy applications. But why not; after all, it's
> a matter of compromise.
>>  ooRexx
>> >> performance is very much tied to string object performance, so changes
>> >> here should be handled with care.
> With Unicode as the sole internal charset, I assume you also want a single
> encoding. Do you have a preference?
> From a performance point of view, utf-8 is not a good candidate. But it is
> the encoding that will not break the C/C++ API, is natively supported
> under Unix, and is used internally by Gtk.
> utf-16 and utf-32 are the most efficient, and are identical in terms of
> impact on the C/C++ API. Python supports both; it's a compile-time decision.
> ICU uses utf-16 internally.
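
To make the trade-off between the three encodings concrete, here is a small Python sketch (not ooRexx code; the helper name `encoded_sizes` and the sample characters are made up for illustration). It only shows how many bytes one character occupies in each encoding, which is presumably what sits behind the performance concern: with a variable-width encoding, finding the n-th character means scanning the string.

```python
# Illustrative only: byte length of a single character in each encoding.
# The "-le" variants are used so that no byte-order mark is added.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(ch):
    """Return {encoding: number of bytes} for one character (hypothetical helper)."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

for ch in ["A", "é", "€", "😀"]:
    print(repr(ch), encoded_sizes(ch))
```

Note that utf-16 is also variable-width for characters outside the Basic Multilingual Plane (the last sample needs a surrogate pair), so only utf-32 gives a strictly fixed width per code point.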

By Unicode used internally, I really meant that utf-16 would be the
internal encoding for everything.  All of the portable code will use
utf-16 strings, but conversions to native encodings will be required
at all of the places where 8-bit encodings are used today, and we will
need to decide what strategy to use when the utf-16 cannot be mapped
down to an 8-bit version.  Unfortunately, as we learned with 4.0,
people scream bloody murder at the slightest hint of an
incompatibility.
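
As a rough illustration of the "what strategy" question, here is a Python sketch (not a proposal for the interpreter; `to_native`, its parameters, and the choice of latin-1 as the "native" 8-bit encoding are assumptions made for the example). It converts a utf-16 string to an 8-bit encoding under several standard Python codec error handlers, each of which corresponds to one possible policy for unmappable characters.

```python
def to_native(utf16_bytes, native_encoding="latin-1", on_unmappable="strict"):
    """Convert utf-16-le bytes to an 8-bit encoding (hypothetical helper).

    on_unmappable is passed to Python's codec error handling:
    "strict" raises, "replace" substitutes "?", "ignore" drops the character,
    "backslashreplace" emits an escape sequence.
    """
    text = utf16_bytes.decode("utf-16-le")
    return text.encode(native_encoding, errors=on_unmappable)

# "€" has no latin-1 equivalent, "é" does.
internal = "café €5".encode("utf-16-le")

for policy in ("strict", "replace", "ignore", "backslashreplace"):
    try:
        print(policy, to_native(internal, on_unmappable=policy))
    except UnicodeEncodeError as err:
        print(policy, "failed:", err.reason)
```

Each policy has a compatibility cost: raising breaks programs that used to run, while replacing or dropping characters silently changes the data.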


>> >
>> > I'd like to see Unicode support added, and having the interpreter do
>> > everything internally sounds best.
>> >
>> > Unfortunately, I don't think I have the skill set to design a good
>> > implementation.  I was hoping Rick would add to the discussion,
>> > because I know he could design a clean implementation.  Both the
>> > transition to 64-bit and the C++ native  APIs have demonstrated that.
>> >
>> > So, I'd like to see Unicode support added, it would be great if it was
>> > something that could interest Rick.  I'd like to see it done with his
>> > guidance.
> Totally agree.
>> Frankly, if I wasn't stuck on the language compatibility issues, this
>> would have been done a long time ago!  Shortly after ooRexx was open
>> sourced I even started working on a version that was unicode
>> internally using ICU to help with the implementation.  I shortly ran
>> into a lot of language incompatibility issues that I still have no
>> solution for.  At that year's Rexx Symposium, almost every single
>> speaker (except me :-) ) showed some sample code that was at risk of
>> breaking once everything was redefined as unicode.
>> My initial thought was to follow the trail blazed by NetRexx, but Mike
>> pretty much punted on the issues as well.  The I/O model is still
>> largely the Java one, which has a lot of support in it to deal with
>> encoding issues.  Things like c2x(), etc. only work on single
>> characters rather than strings, which basically punts on that issue.
>> And the language literal strings were defined from the outset with
>> unicode in mind, so that compatibility issue didn't exist either.  And
>> there are no native APIs in NetRexx, so that was no help.
> This experiment is above all a way for me to get my hands into the
> internals, with a concrete goal in mind. So I'm ready to follow the
> direction that will be defined by the team. In the meantime, I will probably
> continue to play with m17n in my sandbox, unless we can start working on
> Unicode right now :-)
> Jean-Louis

Oorexx-devel mailing list