Re: [Mingw-users] MSYS code page


On 14-2-2013 22:55, John Brown wrote:
>> 2013/2/13 John Brown
>> I did not fully understand your test cases, but I do not see where your
>> tests show that MSYS does not convert characters.
>>
>>   Renato Silva
>> Hopefully you did not understand because I didn't either @@. But I have
>> corrected it and I hope you do now.
> I was wondering if something was wrong with me. I am glad to see that I
> am OK.
>
>> John, as for your file format, ok I understand it better
>> now, but it's still just crazy to me. I can't see much sense in their
>> triple-digit thing, for example.
> Neither do I, but I am sure of their method. I observed the pattern
> when I calculated the total that I could not see based on other
> numbers that were always clear.
>
> As for the permille, it does not seem to have any numerical value.
> It may be there just for decoration. In any case, other than the
> triple-digits, the portions of the file that I need can be easily
> extracted using regular expressions. I can search-and-replace the
> triple-digits, again using regular expressions.
>
>> Actually, iconv gives an error instead, due to the permille.
> ...
>>   
>>   >iconv -f latin1 -t cp850 original_bytes.txt
>> iconv: test.txt:1:1: cannot convert
>>   
> I got that too. I did not bother to report it because I was satisfied
> - I managed to make MSYS display the same output as Notepad, thanks to
> Erwin Waterlander. I was also tired.
>
> And thanks for the tip about ls --show-control-chars.
>

Hi,

Remember that MSYS is derived from a very old Cygwin 1.3. Cygwin only 
started to support locales (and Unicode) properly with version 1.7.

MSYS 2.0 will be based on Cygwin 1.7. Only then will we be freed from the 
code page annoyance.

The OEM code pages are an annoyance to all non-Unicode Windows command-line 
programs. English-speaking people don't notice it so much, because 
English doesn't use many diacritical marks like accents or umlauts. 
ASCII is usually sufficient for English, with a few exceptions like 
naïve or passé.
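
To make the annoyance concrete, here is a small Python sketch (my own 
illustration, not something from this thread) of what happens when a 
program writes text in the ANSI code page and the console interprets 
the bytes in the OEM code page. It assumes a Western European setup, 
where the ANSI code page is 1252 and the OEM code page is 850; the 
function name show_in_console is made up.

```python
# Sketch: the same bytes mean different characters in the ANSI (cp1252)
# and OEM (cp850) code pages. Assumes a Western European Windows setup.

def show_in_console(text: str, ansi: str = "cp1252", oem: str = "cp850") -> str:
    """Encode text as an ANSI program would, then decode the bytes as an OEM console would."""
    raw = text.encode(ansi)   # bytes produced by the non-Unicode program
    return raw.decode(oem)    # characters the console actually shows

for word in ["naive", "naïve", "passé"]:
    print(f"{word} -> {show_in_console(word)}")
# naive -> naive
# naïve -> na´ve
# passé -> passÚ
```

Plain ASCII survives unchanged, but every accented letter is garbled. 
Calling show_in_console("passé", oem="cp1252") returns "passé" again, 
which is exactly what you get when the OEM code page equals the ANSI 
code page.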

I don't understand why Microsoft didn't make the default OEM code page 
equal to the ANSI code page a long time ago. A good moment would have 
been Vista, which was also available to the public as 64-bit. OEM 
code pages exist for backwards compatibility with real DOS programs. How 
many people still run DOS command-line programs on Windows? I think the 
majority of command-line programs are Windows programs by now. I 
think they should have switched the default code page in cmd.exe and 
PowerShell to ANSI, and the few people who run DOS programs could switch 
to CP850 or whatever they need.

What's the point of PowerShell being backwards compatible with DOS 
programs with respect to the code page?
And what really surprises me is that even on 64-bit Windows the DOS OEM 
code pages are the default, while it is not even possible to run a DOS 
program in cmd.exe on 64-bit Windows (because NTVDM is not available there).

Microsoft's advice is to write Unicode programs and to use the Windows 
API for that (try WriteConsoleW). A Windows Unicode command-line program 
will produce consistent output, independent of the active code page. 
Then the only limitation is the font.

But programs ported from Unix typically don't use the Windows API. That 
is why Cygwin 1.7 had to build a UTF-8 layer that translates to 
and from the Windows internal UTF-16 format.
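
Conceptually, such a layer converts the UTF-8 byte strings a Unix 
program works with into the UTF-16LE strings the Windows wide-character 
("W") API expects, and back again. The Python sketch below only 
illustrates that conversion; the function names are hypothetical and 
this is not how Cygwin actually implements it.

```python
# Hypothetical sketch of a UTF-8 <-> UTF-16LE translation layer, in the
# spirit of what a POSIX emulation layer does around Windows "W" APIs.
# Not Cygwin's code; the function names are made up for illustration.

def to_windows(utf8_bytes: bytes) -> bytes:
    """Convert a UTF-8 byte string (Unix side) to UTF-16LE (Windows internal format)."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")

def from_windows(utf16_bytes: bytes) -> bytes:
    """Convert UTF-16LE (Windows side) back to UTF-8 for the Unix program."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")

name = "naïve ‰".encode("utf-8")   # 7 characters, 10 bytes in UTF-8
wide = to_windows(name)            # 7 UTF-16 code units, 14 bytes
print(len(name), len(wide))        # prints: 10 14

# The round trip is lossless, independent of any code page setting.
assert from_windows(wide) == name
```

Because neither direction goes through an OEM or ANSI code page, 
characters such as ï or the permille sign cannot be mangled on the way.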

So what can you do now?
For yourself, if you don't run real DOS programs, it's fine to switch 
the default OEM code page (OEMCP) permanently to 1252.
If you distribute software for Windows, there is no escape: you have to 
use Microsoft's Unicode functions if you want to get out of the code 
page trouble.

regards,

-- 
Erwin Waterlander
http://waterlan.home.xs4all.nl/
