Thread: [Squeak-VMdev] Proposal: VM string encoding

Folks,

I discussed this with Michael a bit and we came up with this proposal:

	* Add a new system attribute 1006 returning a string describing the
expected VM string encoding (http://minnow.cc.gatech.edu/squeak/314).
	* Values are "UTF-8", "macintosh", "ISO-8859-1", etc. (exact spelling
as in http://www.iana.org/assignments/character-sets).
	* If attribute 1006 is not supported, assume "macintosh".
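
[Editor's sketch] The fallback rule in the last bullet could look roughly like this in C. The function name vm_string_encoding and the convention that an unsupported attribute shows up as NULL (or an empty string) are assumptions made for illustration; the proposal itself only fixes the attribute number and the "macintosh" default.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any existing VM source.
   `attr` stands for whatever the VM answered for system attribute 1006.
   Assumption: an unsupported attribute is reported as NULL or "". */
const char *vm_string_encoding(const char *attr)
{
    if (attr == NULL || attr[0] == '\0')
        return "macintosh";   /* default named in the proposal */
    return attr;              /* e.g. "UTF-8" or "ISO-8859-1" */
}
```

An older VM that does not know attribute 1006 would then be treated as "macintosh", while a newer one could answer "UTF-8" and be taken at its word.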

Using an attribute seems simpler than adding a new primitive.

I'm not sure, but I think it's possible to have filesystem-specific file
name encodings under Unix. If necessary, we should add a primitive to
FilePlugin to answer the encoding for a particular directory, using the
same values as described above. Anyway, we can probably leave that for
later.

- Bert -

On 27.09.2004, at 02:54, Andreas Raab wrote:

> John,
>
> We talked about this in the past - we need to do something to figure
> out what the primitives expect in their stringy interfaces (file
> names, clipboard, etc). I'm still in favour of having a primitive
> which answers the VM's expected encoding and defaults to MacRoman
> (which is indeed what I think most VMs actually use). After which we
> can start playing with using UTF8 or Latin1 or whatever else (I could
> easily imagine that an Eastern European VM uses a different encoding
> than a Far East VM).
>
> Cheers,
>  - Andreas
>
> PS. I bought a Really, Really Good (tm) coffee maker today. It's just
> unbelievable how good coffee can be (heh, heh).
>
> ----- Original Message -----
> From: "John M McIntosh" <jo...@sm...>
> To: "The general-purpose Squeak developers list" <squ...@li...>
> Sent: Sunday, September 26, 2004 3:43 PM
> Subject: Re: umlaute in squeak?
>
>
> I think for this we need cut/copy/paste primitives that understand
> Unicode.
> Yell louder; I'm sure it exists in the OS API, just no one has looked
> at it yet...
> This would imply that m17n would need to handle things. I'd think one
> could change the methods to figure out if the VM supports Unicode
> cut/copy/paste and do the right thing...
>
> Perhaps one could even be convinced to allow for other types of data on
> the clipboard (pictures?)
>
> On Sep 26, 2004, at 11:18 AM, Bert Freudenberg wrote:
>
>> Yep, there are still some open ends in m17n, mostly VM related. For
>> example, cut and paste from external sources shreds umlauts
>> (tested on Win and Mac), and file names in the file list do not look
>> right (although on the Mac, at least, they can still be accessed).
>>
>> - Bert -
>>
>> On 26.09.2004, at 19:26, Martin Kuball wrote:
>>
>>> Hi!
>>>
>>> After some digging in the source code I found the problem. I'm using
>>> a UTF-8 locale, and that produces 2-byte characters for the special
>>> German characters. But the VM uses only the 1st byte. This explains
>>> why I always see the same character for different umlaut characters.
>>> They always have the same 1st byte and differ only in the 2nd byte.
>>>
>>> I will try to work out a solution (other than changing the locale,
>>> because I think it should work out of the box in as many environments
>>> as possible)
>>>
>>> Martin
>>>
>>>
>>> On Monday 20 September 2004, at 11:12, danil a. osipchuk wrote:
>>>> Hi, Martin
>>>>
>>>> It seems that you are using unix vm. I've solved the issue by
>>>> editing sqUnixX11.c and setting there:
>>>> static x2sqKey_t x2sqKey= x2sqKeyInput;
>>>> (it's x2sqKeyPlain by default in sources on Ian site). I also have
>>>> built some fonts from TTF  (russian in my case).
>>>> After rebuilding vm I've got squeak with russian fonts.
>>>> I hope that things will be less complicated when the m17n project is
>>>> included in core Squeak. Also, there are plenty of German
>>>> squeakers here - maybe they will point out the shortest path.
>>>>  Danil
>>>>
>>>>> On Saturday 18 September 2004, at 23:13, Bernhard Pieber wrote:
>>>>>> Martin Kuball <Mar...@we...> wrote:
>>>>>>> Is it possible to enter non 7bit characters like german umlaute
>>>>>>> into squeak text fields? When I type one of these (s=DDS...) I
>>>>>>> only get an A with a ~ above it.
>>>>>>
>>>>>> What do you mean by squeak text fields? I just tried it in a
>>>>>> workspace in 3.7 and 3.8alpha and there it works. Which version
>>>>>> of Squeak and which font did you use?
>>>>>
>>>>> With text field I mean any morph where you can enter text. I tried
>>>>> with the new 3.7full and the standard font. By the way it has
>>>>> never worked for me. I even tried the Windows version once but it
>>>>> showed the same behaviour.
>>>>>
>>>>> Martin
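
[Editor's sketch] Martin's diagnosis earlier in the thread (a UTF-8 locale yields 2-byte umlauts, and the VM keeps only the first byte) can be checked with a few lines of C. The function names below are made up for illustration and are not taken from the VM sources. In UTF-8, ä, ö and ü are encoded as C3 A4, C3 B6 and C3 BC, so all three share the first byte 0xC3, which read as ISO-8859-1 is "Ã", the "A with a ~ above it" from the original report.

```c
#include <stddef.h>

/* Illustrative helper: decode one 2-byte UTF-8 sequence (the form used
   for U+0080..U+07FF) into its code point. Returns 0 if `s` does not
   start with a lead byte 110xxxxx followed by a continuation 10xxxxxx.
   Overlong forms are not rejected; this is a sketch, not a validator. */
unsigned decode_utf8_2byte(const unsigned char *s)
{
    if ((s[0] & 0xE0) != 0xC0 || (s[1] & 0xC0) != 0x80)
        return 0;
    return ((unsigned)(s[0] & 0x1F) << 6) | (unsigned)(s[1] & 0x3F);
}

/* What a byte-oriented reader ends up with if it keeps only the first
   byte of the sequence, as described in the thread. */
unsigned first_byte_only(const unsigned char *s)
{
    return s[0];
}
```

Decoding both bytes gives three distinct code points (0xE4, 0xF6, 0xFC), whereas first_byte_only answers 0xC3 for all of them, which matches the symptom of seeing the same character for every umlaut.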
