From: Thomas W. <wi...@ac...> - 2005-12-29 21:27:10
Sean,

Sean Parent wrote:
>
> On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:
>
>> Sean,
>>
>> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>>
> The Begin app has to convert to platform line endings at some point
> because the OS controls deal with it that way.

I think you are wrong here. I am pretty sure Windows deals in '\n' only.
Here is an excerpt from the MSDN docs regarding fopen.

...
t
Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
an end-of-file character on input. In files opened for reading/writing
with "a+", fopen checks for a CTRL+Z at the end of the file and removes
it, if possible. This is done because using fseek and ftell to move
within a file that ends with a CTRL+Z, may cause fseek to behave
improperly near the end of the file.

Also, in text mode, carriage return–linefeed combinations are translated
into single linefeeds on input, and linefeed characters are translated
to carriage return–linefeed combinations on output. When a Unicode
stream-I/O function operates in text mode (the default), the source or
destination stream is assumed to be a sequence of multibyte characters.
Therefore, the Unicode stream-input functions convert multibyte
characters to wide characters (as if by a call to the mbtowc function).
For the same reason, the Unicode stream-output functions convert wide
characters to multibyte characters (as if by a call to the wctomb
function).

b
Open in binary (untranslated) mode; translations involving
carriage-return and linefeed characters are suppressed.

If t or b is not given in mode, the default translation mode is defined
by the global variable _fmode. If t or b is prefixed to the argument,
the function fails and returns NULL.

For more information about using text and binary modes in Unicode and
multibyte stream-I/O, see Text and Binary Mode File I/O and Unicode
Stream I/O in Text and Binary Modes.
...

with text mode I/O being the default.

Looking at the libstdc++v3 FAQ in Apple's developer documentation you'll
see the same thing, i.e. text mode I/O does convert native line endings
to '\n'.

I am aware of the fact that this is no proof that the controls or any
other part of the API will deal in '\n' only, but it's at least a strong
indication. Having the default file I/O mode be incompatible with the
rest of the API is hard to imagine even for a seasoned Windows programmer.

>> IIUC this is what every std conforming iostream library does by
>> default. Well, to be precise, it is what it asks the OS to do. I.e.
>> the OS is responsible for the mapping between in memory
>> representation and on disk representation of file contents. On
>> Windows this will mean mapping "\r\n" to "\n". I haven't seen a
>> platform where the in memory representation uses anything but a
>> single '\n' for line endings. Come to think of this, does the C
>> standard actually specify this?
>
>
> There is no such requirement for C to map line endings (or C++) in the
> standard.

AFAICS the standard (either one) is mostly silent on what text mode and
binary mode mean. I wonder if that should be considered a defect.

My statement seems to correspond to existing practice though.

> We recently reviewed the definition of a line ending in the
> standard committee because GCC treats the following as a single
> comment line where [sp] is any sequence of white space characters:
>
> // some comment \[sp]
> a = b;
>
> The \ character quotes an "end of line" which gcc defines to be any
> white space followed by \n. I don't recall what the Windows iostream
> library actually does here - I think they convert the other way on the
> way in (\n -> \r\n) but I'd have to go look again.

I have to reread the thread but I think this is actually a different issue.
It involves the interpretation of source files by the compiler,
and the compiler is free to use any I/O scheme to read them.

>
>>
>> Given this there seem to be two strategies to deal with this.
>>
>> a) Require every file to be in the platform's native encoding and let
>> the OS do the work. The program logic will always assume line endings
>> to be '\n'.
>
>
> This isn't workable because as noted above the "in memory"
> representation at some point will have to be for the platform (\r\n on
> Windows) for the OS controls to function properly.

As said above I think you are wrong here.

> The on disk
> representation can be anything because files get moved from platform to
> platform.

Agreed, if you want to tolerate this a) is not workable.

>
>>
>> b) Try to detect the line ending encoding on input and do the mapping
>> "native-encoding" -> '\n' yourself on input (Experience shows that
>> this is complicated and a frequent source of bugs). The rest of the
>> program logic will still assume line endings to be '\n'. When writing
>> the file you can either write to the original encoding, that is only
>> if you memorized that, or to the native encoding.
>
>
> In general I'd rather simply be agnostic as much as possible and
> convert only when necessary

Let's say you modify the file contents and then write them to disk: what
line endings do you use in your modifications?

> I can see three valid behaviors for the begin application - first it
> should properly read _any_ line endings then it should:

That is a reasonable requirement. As a user I would not ask for that,
but it is nice to have.

>
> 1. Always write platform line endings
> 2. Always write Unix line endings as a canonical form (probably a bad
> thing because many Windows apps can't read this correctly)
> 3. Always preserve line endings styles of the file
>
> So I'd vote for 1 with the caveat that we
> define \n to be the "platform line endings" for Mac instead of the
> classic Mac OS \r.

Agreed.

>
> Can you send me a file where the line endings have been "doubled" - I'd
> like to see what they actually are (please ZIP the file or the email
> system will attempt to convert line endings...). The Adobe e-mail
> server is picky about letting in zip files so change the extension to
> .zap or send it to sea...@ma....

Will do.

Thomas

--
Thomas Witt
wi...@ac...
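
For reference, strategy b) discussed above (accept any line ending on
input, keep a single '\n' in memory, pick the ending again on output)
can be sketched roughly as below. This is only an illustration and not
code from the thread or from the Begin application: the function names
normalize_line_endings and to_line_endings are hypothetical, and the
sketch assumes the bytes were read in binary mode so that the runtime
has not already translated anything.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: map "\r\n", a lone "\r" (classic Mac OS) and "\n"
// to a single '\n', so the rest of the program sees only '\n'.
std::string normalize_line_endings(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += '\n';
            // Swallow the '\n' of a "\r\n" pair so it is not counted twice.
            if (i + 1 < in.size() && in[i + 1] == '\n') ++i;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Hypothetical helper: expand every in-memory '\n' to the ending chosen
// for output (for example "\r\n" or "\n"). Assumes `in` is already
// normalized; the result should be written in binary mode, otherwise a
// text-mode stream on Windows would translate the '\n' a second time.
std::string to_line_endings(const std::string& in, const std::string& eol) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        if (c == '\n') {
            out += eol;
        } else {
            out += c;
        }
    }
    return out;
}
```

Under these assumptions, passing "\r\n" as eol on Windows corresponds to
option 1 in the list above and passing "\n" everywhere to option 2.
Option 3 would additionally require remembering which ending the input
used, which this sketch does not do.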