From: Sean P. <sp...@ad...> - 2005-12-29 23:15:34
New hypothesis: after looking at the code, I found the following line in
express_viewer.hpp:

---
adobe::replace( filtered, '\r', '\n' );
---

My guess is that this is there to handle \r to \n conversion on the
Mac, but it is also getting applied on Windows.

The result is that a file with \r\n Windows line endings now has \n\n
line endings, which the Windows stream code then converts to
\r\n\r\n. That is what I'm seeing in the final file.

Since I'm not clear on what the requirements are for the intermediate
formats, I'm not going to attempt a quick fix. I'll let Foster noodle
it around when he's back from vacation :-)

Sean

On Dec 29, 2005, at 1:26 PM, Thomas Witt wrote:

>
> Sean,
>
> Sean Parent wrote:
>> On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:
>>> Sean,
>>>
>>> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>>>
>
>> The Begin app has to convert to platform line endings at some
>> point because the OS controls deal with it that way.
>
> I think you are wrong here. I am pretty sure Windows deals in '\n'
> only. Here is an excerpt from the MSDN docs regarding fopen.
>
> ...
>
> t
>
> Open in text (translated) mode. In this mode, CTRL+Z is interpreted
> as an end-of-file character on input. In files opened for
> reading/writing with "a+", fopen checks for a CTRL+Z at the end of
> the file and removes it, if possible. This is done because using
> fseek and ftell to move within a file that ends with a CTRL+Z, may
> cause fseek to behave improperly near the end of the file.
>
> Also, in text mode, carriage return-linefeed combinations are
> translated into single linefeeds on input, and linefeed characters
> are translated to carriage return-linefeed combinations on output.
> When a Unicode stream-I/O function operates in text mode (the
> default), the source or destination stream is assumed to be a
> sequence of multibyte characters.
> Therefore, the Unicode stream-input functions convert multibyte
> characters to wide characters (as if by a call to the mbtowc
> function). For the same reason, the Unicode stream-output functions
> convert wide characters to multibyte characters (as if by a call to
> the wctomb function).
>
> b
>
> Open in binary (untranslated) mode; translations involving
> carriage-return and linefeed characters are suppressed.
>
> If t or b is not given in mode, the default translation mode is
> defined by the global variable _fmode. If t or b is prefixed to the
> argument, the function fails and returns NULL.
>
> For more information about using text and binary modes in Unicode
> and multibyte stream-I/O, see Text and Binary Mode File I/O and
> Unicode Stream I/O in Text and Binary Modes.
>
> ...
>
> with text mode I/O being the default.
>
> Looking at the libstdc++v3 FAQ in Apple's developer documentation,
> you'll see the same thing, i.e. text mode I/O does convert native
> line endings to '\n'.
>
> I am aware that this is no proof that the controls or any other
> part of the API will deal in '\n' only, but it's at least a strong
> indication. A default file I/O mode that is incompatible with the
> rest of the API is hard to imagine, even for a seasoned Windows
> programmer.
>
>>> IIUC this is what every std-conforming iostream library does by
>>> default. Well, to be precise, it is what it asks the OS to do,
>>> i.e. the OS is responsible for the mapping between the in-memory
>>> representation and the on-disk representation of file contents. On
>>> Windows this will mean mapping "\r\n" to "\n". I haven't seen a
>>> platform where the in-memory representation uses anything but a
>>> single '\n' for line endings. Come to think of it, does the C
>>> standard actually specify this?
>> There is no such requirement for C (or C++) to map line endings
>> in the standard.
>
> AFAICS the standard (either one) is mostly silent on what text mode
> and binary mode mean. I wonder if that should be considered a defect.
>
> My statement seems to correspond to existing practice, though.
>
>
>> We recently reviewed the definition of a line ending in the
>> standard committee because GCC treats the following as a
>> single comment line, where [sp] is any sequence of white space
>> characters:
>> // some comment \[sp]
>> a = b;
>> The \ character quotes an "end of line", which gcc defines to be
>> any white space followed by \n. I don't recall what the Windows
>> iostream library actually does here. I think they convert the
>> other way on the way in (\n -> \r\n), but I'd have to go look again.
>
> I have to reread the thread, but I think this is actually a
> different issue. It involves the interpretation of source files by
> the compiler, and the compiler is free to use any I/O scheme to read
> them.
>
>
>>>
>>> Given this, there seem to be two strategies to deal with it.
>>>
>>> a) Require every file to be in the platform's native encoding
>>> and let the OS do the work. The program logic will always assume
>>> line endings to be '\n'.
>> This isn't workable because, as noted above, the "in memory"
>> representation at some point will have to be for the platform (\r\n
>> on Windows) for the OS controls to function properly.
>
> As said above, I think you are wrong here.
>
>> The on-disk representation can be anything because files get
>> moved from platform to platform.
>
> Agreed: if you want to tolerate this, a) is not workable.
>
>>>
>>> b) Try to detect the line ending encoding on input and do the
>>> mapping "native-encoding" -> '\n' yourself on input. (Experience
>>> shows that this is complicated and a frequent source of bugs.)
>>> The rest of the program logic will still assume line endings to
>>> be '\n'.
>>> When writing the file you can either write to the
>>> original encoding (that is, only if you memorized it) or to the
>>> native encoding.
>> In general I'd rather simply be agnostic as much as possible and
>> convert only when necessary.
>
> Let's say you modify the file contents and then write them to disk:
> what line endings do you use in your modifications?
>
>> I can see three valid behaviors for the Begin application. First,
>> it should properly read _any_ line endings; then it should:
>
> That is a reasonable requirement. As a user I would not ask for
> that, but it is nice to have.
>
>> 1. Always write platform line endings
>> 2. Always write Unix line endings as a canonical form (probably a
>> bad thing because many Windows apps can't read this correctly)
>> 3. Always preserve the line ending style of the file
>> So I'd vote for 1, with the caveat that we define \n to be the
>> "platform line endings" for Mac instead of the classic Mac OS \r.
>
> Agreed.
>
>> Can you send me a file where the line endings have been "doubled"?
>> I'd like to see what they actually are (please ZIP the file or
>> the email system will attempt to convert line endings...). The
>> Adobe e-mail server is picky about letting in zip files, so change
>> the extension to .zap or send it to sea...@ma....
>
> Will do.
>
> Thomas
>
> --
> Thomas Witt
> wi...@ac...
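A minimal C++ sketch of the \r\n doubling described at the top of this message, and of the "properly read _any_ line endings" requirement. It assumes adobe::replace behaves like std::replace over the whole buffer (an assumption, not checked against the Adobe source), and the helper names naive_cr_to_lf, normalize_line_endings and detect_line_ending are hypothetical, not part of the Adobe code. normalize_line_endings maps \r\n and a lone \r to a single '\n', so the in-memory text always uses '\n' (strategy b); detect_line_ending records the file's original style in case behavior 3 is wanted.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for adobe::replace( filtered, '\r', '\n' ),
// ASSUMING adobe::replace behaves like std::replace. Every '\r' becomes
// '\n', so each Windows "\r\n" pair turns into "\n\n".
inline std::string naive_cr_to_lf(std::string text)
{
    std::replace(text.begin(), text.end(), '\r', '\n');
    return text;
}

// Hypothetical helper: accept any line-ending convention and produce the
// in-memory convention of exactly one '\n' per line break.
//   "\r\n" (Windows)         -> "\n"
//   "\r"   (classic Mac OS)  -> "\n"
//   "\n"   (Unix, Mac OS X)  -> unchanged
inline std::string normalize_line_endings(const std::string& text)
{
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i != text.size(); ++i) {
        if (text[i] == '\r') {
            result += '\n';
            // Skip the '\n' of a "\r\n" pair so the break is not doubled.
            if (i + 1 != text.size() && text[i + 1] == '\n') ++i;
        } else {
            result += text[i];
        }
    }
    return result;
}

// Hypothetical helper for behavior 3 (preserve the file's style): report
// the first line ending found so it can be restored on output.
// Returns an empty string if the text contains no line break.
inline std::string detect_line_ending(const std::string& text)
{
    const std::string::size_type pos = text.find_first_of("\r\n");
    if (pos == std::string::npos) return std::string();
    if (text[pos] == '\n') return "\n";
    if (pos + 1 != text.size() && text[pos + 1] == '\n') return "\r\n";
    return "\r";
}
```

With normalized text, behavior 1 needs no extra step on Windows: a text-mode stream (the default) translates each '\n' to \r\n on output, as the MSDN excerpt says. Converting to \r\n by hand before a text-mode write would instead put \r\r\n on disk, the same kind of doubling as the \n\n -> \r\n\r\n case reported above.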