From: John B. <joh...@ho...> - 2013-02-12 18:31:24
On Tue, 12 Feb 2013 14:07:22 -0200, Renato Silva wrote:
> 2013/2/12 John Brown
> > Which codepage does an MSYS terminal use?
> >
> > I have a file output.dat which seems to be mostly text with 0x0A line
> > endings, but there are a few bytes that are out of the 7-bit ASCII
> > range. As a result, the file looks different depending on the tool
> > used to view it. [examples of different output snipped]
> >
> > I believe that the MSYS output is closest to what the creators
> > of the file had in mind. I would like to know what `cat' and/or
> > MSYS did to produce that output.
>
> I don't think knowing what encoding is used by the "MSYS terminal" will
> help with your problem. You need to rather find out if the original
> file is really supposed to be read as text, and if so, what encoding
> was used to generate it.

I did not give the full story. The line that I showed was a single line
from the file, which represents the total on an invoice. Actually, I
understand the data well enough to do what I need to do. In the MSYS
example

%TOTAL O¸'.23w

if we consider the two bytes to the left of the decimal point, the
first byte (the one that looks like a speck of dust on your monitor)
means the digit 8, and the next byte (looks like a single quote) means
that the character is repeated 3 times. Therefore the TOTAL is 888.23.

I am just curious about what they were thinking when they implemented
this unnecessary shorthand, and it *is* unnecessary because if the TOTAL
is 12345.67, they write it out as 12,345.67 with the comma to separate
thousands. Clearly there is no shortage of space.

So it is my opinion that:
1) The original file is supposed to be read as text, and
2) The encoding that was used to generate the file is similar to the one
   that is in effect when I run `cat' on the file in an MSYS window.

I know the numeric values of those bytes, and one way or another that
knowledge should be enough. I am just curious about why they chose those
bytes for that purpose.
For example, I see a speck, but maybe they see a symbol that looks like
an 8, e.g. the infinity symbol.

Also, now that I know that `$ cat <file>' can give me different results
from `C:\> type <file>' I am curious about that too.

I could ask the vendor about their file, but they probably would not
tell me.

Regards,
John Brown.
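
One way to explore why `cat' and `type' can show the same byte differently
is to decode it under several candidate codepages and compare the
resulting characters. The sketch below is only an illustration: the byte
value 0xB8 is an assumption (the message does not state the actual numeric
values), and the list of codepages is a guess at what a Windows console or
an MSYS window might plausibly use, not a statement of what either tool
really does.

```python
import unicodedata

# ASSUMPTION: 0xB8 stands in for the "speck" byte; the real value is not
# given in the message, so substitute the byte actually found in the file.
SAMPLE = bytes([0xB8])

# ASSUMPTION: candidate codepages to compare (two OEM, two ANSI-style).
CODEPAGES = ["cp437", "cp850", "cp1252", "latin-1"]


def render(data, codepages=CODEPAGES):
    """Decode the same bytes under each codepage and return a dict."""
    return {cp: data.decode(cp, errors="replace") for cp in codepages}


if __name__ == "__main__":
    for cp, text in render(SAMPLE).items():
        # Print code point and Unicode name so the output stays ASCII-only.
        ch = text[0]
        name = unicodedata.name(ch, "<unnamed>")
        print(f"{cp:8} 0x{SAMPLE[0]:02X} -> U+{ord(ch):04X} {name}")
```

If the glyph seen in one tool matches one of these decodings and the glyph
seen in the other tool matches a different one, that would suggest the two
tools simply interpret the same byte under different codepages.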