Re: [Adobe-source-devel] Begin line ending handling

New hypothesis,

After looking at the code, I found the following line in
express_viewer.hpp:
---
		adobe::replace( filtered, '\r', '\n' );
---

My guess is that this is there to handle \r to \n conversion on the
Mac - but this is also getting applied on Windows.

The result is that a file which had \r\n Windows line endings now has
\n\n line endings, which the Windows stream code then converts to
\r\n\r\n. That is what I'm seeing in the final file.
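
A minimal sketch of the doubling, assuming adobe::replace behaves like
std::replace over the whole range (an assumption; the ASL implementation
is not shown here). The names replace_cr_with_lf and
simulate_text_mode_write are hypothetical, and the second one only
mimics the \n -> \r\n translation a Windows text-mode stream applies on
output:

```cpp
#include <algorithm>
#include <string>

// Stand-in for the line in express_viewer.hpp. std::replace is assumed
// to do what adobe::replace does to the range.
std::string replace_cr_with_lf(std::string text) {
    std::replace(text.begin(), text.end(), '\r', '\n');
    return text;
}

// Hypothetical helper: mimics a Windows text-mode stream on output,
// which expands every '\n' into "\r\n".
std::string simulate_text_mode_write(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '\n') {
            out += "\r\n";
        } else {
            out += c;
        }
    }
    return out;
}
```

Passing "a\r\nb" through both steps gives "a\n\nb" in memory and
"a\r\n\r\nb" on disk, i.e. one extra empty line after every original
line.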

Since I'm not clear on what the requirements are for the intermediate
formats I'm not going to attempt a quick fix. I'll let Foster noodle
it around when he's back from vacation :-)
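
For illustration only, and not a fix proposed for express_viewer.hpp: if
the intermediate formats turn out to want a bare \n, one way to accept
all three conventions is to collapse "\r\n" and a lone '\r' to '\n' in a
single pass instead of replacing each '\r' on its own. The name
normalize_line_endings is hypothetical:

```cpp
#include <string>

// Hypothetical helper: maps "\r\n", a lone '\r', and '\n' to a single
// '\n' so that a "\r\n" pair is never counted as two line endings.
std::string normalize_line_endings(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            out += '\n';
            // Skip the '\n' half of a "\r\n" pair.
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
        } else {
            out += text[i];
        }
    }
    return out;
}
```

With this, "a\r\nb\rc\nd" becomes "a\nb\nc\nd", so a later text-mode
write on Windows would produce one "\r\n" per line rather than two.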

Sean

On Dec 29, 2005, at 1:26 PM, Thomas Witt wrote:

>
> Sean,
>
> Sean Parent wrote:
>> On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:
>>> Sean,
>>>
>>> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>>>
>
>> The Begin app has to convert to platform line endings at some
>> point because the OS controls deal with it that way.
>
> I think you are wrong here. I am pretty sure Windows deals in '\n'
> only. Here is an excerpt from the MSDN docs regarding fopen.
>
> ...
>
> t
>
> Open in text (translated) mode. In this mode, CTRL+Z is interpreted
> as an end-of-file character on input. In files opened for
> reading/writing with "a+", fopen checks for a CTRL+Z at the end of the
> file and removes it, if possible. This is done because using fseek and
> ftell to move within a file that ends with a CTRL+Z, may cause
> fseek to behave improperly near the end of the file.
>
> Also, in text mode, carriage return–linefeed combinations are
> translated into single linefeeds on input, and linefeed characters
> are translated to carriage return–linefeed combinations on output.
> When a Unicode stream-I/O function operates in text mode (the
> default), the source or destination stream is assumed to be a
> sequence of multibyte characters. Therefore, the Unicode
> stream-input functions convert multibyte characters to wide
> characters (as if by a call to the mbtowc function). For the same
> reason, the Unicode stream-output functions convert wide characters
> to multibyte characters (as if by a call to the wctomb function).
>
> b
>
> Open in binary (untranslated) mode; translations involving
> carriage-return and linefeed characters are suppressed.
>
> If t or b is not given in mode, the default translation mode is
> defined by the global variable _fmode. If t or b is prefixed to the
> argument, the function fails and returns NULL.
>
> For more information about using text and binary modes in Unicode
> and multibyte stream-I/O, see Text and Binary Mode File I/O and
> Unicode Stream I/O in Text and Binary Modes.
>
> ...
>
> with text mode I/O being the default.
>
> Looking at the libstdc++v3 FAQ in Apple's developer documentation
> you'll see the same thing, i.e. text mode I/O does convert native
> line endings to '\n'.
>
> I am aware of the fact that this is no proof that the controls or
> any other part of the API will deal in '\n' only, but it's at least
> a strong indication. Having the default file I/O mode being
> incompatible with the rest of the API is hard to imagine even for
> a seasoned Windows programmer.
>
>>> IIUC this is what every std conforming iostream library does by
>>> default. Well, to be precise, it is what it asks the OS to do.
>>> I.e. the OS is responsible for the mapping between in memory
>>> representation and on disk representation of file contents. On
>>> Windows this will mean mapping "\r\n" to "\n". I haven't seen a
>>> platform where the in memory representation uses anything but a
>>> single '\n' for line endings. Come to think of it, does the C
>>> standard actually specify this?
>> There is no such requirement for C (or C++) to map line endings
>> in the standard.
>
> AFAICS the standard (either one) is mostly silent on what text mode
> and binary mode mean. I wonder if that should be considered a defect.
>
> My statement seems to correspond to existing practice though.
>
>> We recently reviewed the definition of a line ending in the
>> standard committee because GCC treats the following as a
>> single comment line where [sp] is any sequence of white space
>> characters:
>> // some comment \[sp]
>> a = b;
>> The \ character quotes an "end of line" which gcc defines to be
>> any white space followed by \n. I don't recall what the Windows
>> iostream library actually does here - I think they convert the
>> other way on the way in (\n -> \r\n) but I'd have to go look again.
>
> I have to reread the thread but I think this is actually a
> different issue. It involves the interpretation of source files by
> the compiler and the compiler is free to use any I/O scheme to read
> them.
>
>
>>>
>>> Given this there seem to be two strategies to deal with this.
>>>
>>> a) Require every file to be in the platform's native encoding
>>> and let the OS do the work. The program logic will always assume
>>> line endings to be '\n'.
>> This isn't workable because as noted above the "in memory"
>> representation at some point will have to be for the platform
>> (\r\n on Windows) for the OS controls to function properly.
>
> As said above I think you are wrong here.
>
>> The on disk representation can be anything because files get
>> moved from platform to platform.
>
> Agreed, if you want to tolerate this, a) is not workable.
>
>>>
>>> b) Try to detect the line ending encoding on input and do the
>>> mapping "native-encoding" -> '\n' yourself on input (Experience
>>> shows that this is complicated and a frequent source of bugs).
>>> The rest of the program logic will still assume line endings to
>>> be '\n'. When writing the file you can either write to the
>>> original encoding, that is only if you memorized that, or to the
>>> native encoding.
>> In general I'd rather simply be agnostic as much as possible and
>> convert only when necessary
>
> Let's say you modify the file contents and then write them to disk:
> what line endings do you use in your modifications?
>
>> I can see three valid behaviors for the Begin application - first
>> it should properly read _any_ line endings, then it should:
>
> That is a reasonable requirement. As a user I would not ask for
> that, but it is nice to have.
>
>> 1. Always write platform line endings
>> 2. Always write Unix line endings as a canonical form (probably a
>> bad thing because many Windows apps can't read this correctly)
>> 3. Always preserve line endings styles of the file
>> So I'd vote for 1 with the caveat that we define \n to be the
>> "platform line endings" for Mac instead of the classic Mac OS \r.
>
> Agreed.
>
>> Can you send me a file where the line endings have been "doubled"
>> - I'd like to see what they actually are (please ZIP the file or
>> the email system will attempt to convert line endings...). The
>> Adobe e-mail server is picky about letting in zip files so change
>> the extension to .zap or send it to sea...@ma....
>
> Will do.
>
> Thomas
>
> --
> Thomas Witt
> wi...@ac...