Re: [Adobe-source-devel] Begin line ending handling


On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:

> Sean,
>
> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>
>> The parsers are all line ending agnostic. For the Begin app - I'd
>> assume (I haven't looked at that part of the code) it converts
>> into platform line endings on the way in - and probably leaves
>> them that way.
>
> Hmm... to me this seems like a complicated and brittle way to deal
> with this. Let me explain...

The parsers being line ending agnostic (I happen to think all code
dealing with text files should be) is the correct thing. The Begin
app has to convert to platform line endings at some point because the
OS controls deal with it that way.
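
As a rough illustration of what "line ending agnostic" can mean in practice, here is a minimal C++ sketch. The helper name split_lines_agnostic is hypothetical and is not taken from the parsers or the Begin sources; it only assumes that "\r\n", "\n", and a lone "\r" should each count as exactly one line ending.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the Begin sources): split text into lines,
// treating "\r\n", "\n", and a lone "\r" each as a single line ending.
std::vector<std::string> split_lines_agnostic(const std::string& text)
{
    std::vector<std::string> lines;
    std::string current;

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // A "\r\n" pair is one ending, so consume the '\n' as well.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }

    // Keep a final line that has no terminating line ending.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Code written this way never needs to know which platform produced the file, which is the property being argued for above.
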
>
>> On Windows this would convert \n to \r\n - which may be what you
>> are seeing?
>
>> I _really_ dislike any iostream libraries which try to auto-convert
>> line endings
>
> IIUC this is what every std conforming iostream library does by
> default. Well, to be precise, it is what it asks the OS to do. I.e.
> the OS is responsible for the mapping between the in-memory
> representation and the on-disk representation of file contents. On
> Windows this will mean mapping "\r\n" to "\n". I haven't seen a
> platform where the in-memory representation uses anything but a
> single '\n' for line endings. Come to think of it, does the C
> standard actually specify this?

There is no such requirement for C (or C++) to map line endings in
the standard. We recently reviewed the definition of a line ending in
the standard committee because GCC treats the following as a
single comment line, where [sp] is any sequence of white space
characters:

// some comment \[sp]
a = b;

The \ character quotes an "end of line", which gcc defines to be any
white space followed by \n. I don't recall what the Windows iostream
library actually does here - I think they convert the other way on
the way in (\n -> \r\n) but I'd have to go look again.

>
> Given this there seem to be two strategies to deal with this.
>
> a) Require every file to be in the platform's native encoding and
> let the OS do the work. The program logic will always assume line
> endings to be '\n'.

This isn't workable because, as noted above, the "in memory"
representation at some point will have to be for the platform (\r\n
on Windows) for the OS controls to function properly. The on disk
representation can be anything because files get moved from platform
to platform.

>
> b) Try to detect the line ending encoding on input and do the
> mapping "native-encoding" -> '\n' yourself on input (experience
> shows that this is complicated and a frequent source of bugs). The
> rest of the program logic will still assume line endings to be
> '\n'. When writing the file you can either write the original
> encoding, but only if you remembered it, or the native
> encoding.

In general I'd rather simply be agnostic as much as possible and
convert only when necessary. If our current logic to convert on
reading for Windows is correct (it may not be, I have not reviewed it)
and what you are seeing is simply \n getting converted to \r\n, then I
don't have any issues. If you are actually seeing extra blank lines
in the Begin app, then we have something wrong.
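
To make the "extra blank lines" case concrete, here is one way it could arise. This is an illustrative guess, not a diagnosis of the Begin code: a conversion that maps every '\r' to '\n' one character at a time turns each "\r\n" pair into "\n\n". The function names below are made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Deliberately naive: replaces every '\r' with '\n' without checking
// whether the '\r' is already part of a "\r\n" pair.
std::string naive_cr_to_lf(std::string text)
{
    std::replace(text.begin(), text.end(), '\r', '\n');
    return text;
}

// Counts '\n' characters so the doubling is easy to observe.
std::size_t count_lf(const std::string& text)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), '\n'));
}
```

With the input "a\r\nb", naive_cr_to_lf returns "a\n\nb", which a text control would show as a blank line between a and b.
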

I can see three valid behaviors for the Begin application. First, it
should properly read _any_ line endings; then it should do one of:

1. Always write platform line endings
2. Always write Unix line endings as a canonical form (probably a bad
   thing because many Windows apps can't read this correctly)
3. Always preserve the line ending style of the file

On the Mac I tend to prefer answer 2 or 3 because the tool set is
agnostic - but if you consider the Unix side now to be the "native"
representation on the Mac then 2 and 1 become the same. On Windows I
prefer 1 because this allows you to use other tools on the platform
with files saved from Begin. So I'd vote for 1 with the caveat that
we define \n to be the "platform line endings" for Mac instead of the
classic Mac OS \r.
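
The three write behaviors could be sketched in C++ roughly as follows. Everything here is an assumption made for illustration: the names ending_policy, platform_ending, convert_endings, and prepare_for_write are invented, and "platform" is taken to mean "\r\n" on Windows and "\n" everywhere else (including the Mac, per the caveat above).

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for the three options: 1 = platform, 2 = unix_lf, 3 = preserve.
enum class ending_policy { platform, unix_lf, preserve };

// Assumption: "\r\n" on Windows, "\n" on every other platform.
inline const char* platform_ending()
{
#ifdef _WIN32
    return "\r\n";
#else
    return "\n";
#endif
}

// Rewrite any mix of "\r\n", "\n", and lone "\r" to the requested ending.
std::string convert_endings(const std::string& text, const std::string& ending)
{
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat a "\r\n" pair as one ending so it is not doubled.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') ++i;
            out += ending;
        } else {
            out += c;
        }
    }
    return out;
}

// Produce the bytes to save for a given policy.
std::string prepare_for_write(const std::string& text, ending_policy policy)
{
    switch (policy) {
        case ending_policy::platform: return convert_endings(text, platform_ending());
        case ending_policy::unix_lf:  return convert_endings(text, "\n");
        case ending_policy::preserve: return text; // leave the file's own style alone
    }
    return text;
}
```

One caution if something like this were used: the result should be written through a stream opened with std::ios::binary, because a Windows text-mode stream translates '\n' to "\r\n" again on output and would turn "\r\n" into "\r\r\n".
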

Sean

>
> In my experience b) is rarely worth the added complexity, but this
> pretty much depends on your use case. I've frequently used a)
> in cross-platform projects without problems, even with users
> working on two platforms simultaneously. To get back to my first
> sentence: to me the key part of both solutions is to only use the
> "native" C++ encoding '\n' internally, as it simplifies the code and
> ensures interoperability with third party code like widget libraries.
>
> What do you think?
>
>> - Often these will just do things like convert \r to \n... which
>> would double your line endings. We'll look into it.
>>
>> I did fix some code related to line endings a while ago - are you
>> running the 1.0.11 release?
>
> Unless I am missing something it's CVS Head.

Can you send me a file where the line endings have been "doubled"?
I'd like to see what they actually are (please ZIP the file or the
email system will attempt to convert line endings...). The Adobe
e-mail server is picky about letting in zip files, so change the
extension to .zap or send it to sea...@ma....

Sean

>
> Regards
>
> Thomas
>
> Thomas Witt
> wi...@ac...
>