From: Thomas W. <wi...@ac...> - 2005-12-29 03:03:46
Hi,

what's the rationale for Begin dealing with the different platform-specific line endings itself? In my understanding, iostreams handle this transparently by default. Whatever the rationale is, the current Windows logic seems broken: it doubles the newlines on save.

Regards

Thomas

--
Thomas Witt
wi...@ac...
From: Sean P. <sea...@ma...> - 2005-12-29 06:44:04
The parsers are all line-ending agnostic. For the Begin app, I'd assume (I haven't looked at that part of the code) that it converts to platform line endings on the way in and probably leaves them that way. On Windows this would convert \n to \r\n. Could that be what you are seeing?

I _really_ dislike iostream libraries that try to auto-convert line endings. Often these will just do things like convert \r to \n... which would double your line endings. We'll look into it.

I did fix some code related to line endings a while ago. Are you running the 1.0.11 release?

Sean
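What "line-ending agnostic" means for a parser can be shown with a minimal C++ sketch. It is not code from the Adobe Source Libraries; split_lines is a hypothetical helper that accepts "\n", "\r\n" and a lone "\r" as line terminators, so the caller never needs to know which platform produced the file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into lines, accepting "\n" (Unix,
// Mac OS X), "\r\n" (Windows) and a lone "\r" (classic Mac OS) as line
// terminators. The terminators are not included in the returned lines.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r') {
            // Treat "\r\n" as one terminator by skipping the following '\n'.
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // A final line without a terminator still counts as a line.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

A file that mixes conventions, such as "a\r\nb\nc\rd", yields the same four lines as one that uses a single convention throughout.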
From: Thomas W. <wi...@ac...> - 2005-12-29 17:15:11
Sean,

On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:

> The parsers are all line ending agnostic. For the Begin app - I'd
> assume (I haven't looked at that part of the code) it converts into
> platform line endings on the way in - and probably leaves them that
> way.

Hmm... to me this seems like a complicated and brittle way to deal with this. Let me explain...

> On Windows this would convert \n to \r\n - which may be what you
> are seeing?

> I _really_ dislike any iostream libraries which try to auto-convert
> line endings

IIUC this is what every standard-conforming iostream library does by default. Well, to be precise, it is what it asks the OS to do, i.e. the OS is responsible for the mapping between the in-memory representation and the on-disk representation of file contents. On Windows this means mapping "\r\n" to "\n". I haven't seen a platform where the in-memory representation uses anything but a single '\n' for line endings. Come to think of it, does the C standard actually specify this?

Given this, there seem to be two strategies for dealing with it:

a) Require every file to be in the platform's native encoding and let the OS do the work. The program logic will always assume line endings to be '\n'.

b) Try to detect the line-ending encoding on input and do the mapping "native encoding" -> '\n' yourself on input (experience shows that this is complicated and a frequent source of bugs). The rest of the program logic will still assume line endings to be '\n'. When writing the file, you can write either to the original encoding (only if you memorized it) or to the native encoding.

In my experience b) is rarely worth the added complexity, but this pretty much depends on your use case. I've frequently used a) in cross-platform projects without problems, even with users working on two platforms simultaneously.

To get back to my first sentence: to me the key part of both solutions is to use only the "native" C++ encoding '\n' internally, as it simplifies the code and ensures interoperability with third-party code like widget libraries.

What do you think?

> - Often these will just do things like convert \r to \n... which
> would double your line endings. We'll look into it.
>
> I did fix some code related to line endings a while ago - are you
> running the 1.0.11 release?

Unless I am missing something, it's CVS HEAD.

Regards

Thomas

Thomas Witt
wi...@ac...
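Strategy b) can be sketched in C++ as follows. This is an illustration under stated assumptions, not Begin's code: the raw bytes are assumed to have been read in binary mode, detect_line_ending is a hypothetical helper that guesses the style from the first terminator it sees (so a mixed-ending file is classified by its first line only), and normalize_to_lf produces the single '\n' internal representation Thomas recommends.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of a file's line-ending convention.
enum class LineEnding { LF, CRLF, CR, None };

// Guess the convention from the first terminator found in the raw bytes.
// Assumes the text was read in binary mode, so '\r' bytes are still present.
LineEnding detect_line_ending(const std::string& raw) {
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\n') return LineEnding::LF;
        if (raw[i] == '\r')
            return (i + 1 < raw.size() && raw[i + 1] == '\n') ? LineEnding::CRLF
                                                              : LineEnding::CR;
    }
    return LineEnding::None;  // no terminator at all
}

// Map every terminator ("\r\n", lone '\r', '\n') to a single '\n'.
std::string normalize_to_lf(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r') {
            out += '\n';
            // Swallow the '\n' of a "\r\n" pair so it is not emitted twice.
            if (i + 1 < raw.size() && raw[i + 1] == '\n') ++i;
        } else {
            out += raw[i];
        }
    }
    return out;
}
```

Keeping the detected value around is what makes "write back in the original encoding" possible; it is also where the bugs Thomas warns about tend to creep in, for example with files whose endings are mixed.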
From: Sean P. <sp...@ad...> - 2005-12-29 19:18:08
On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:

> Sean,
>
> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>
>> The parsers are all line ending agnostic. For the Begin app - I'd
>> assume (I haven't looked at that part of the code) it converts
>> into platform line endings on the way in - and probably leaves
>> them that way.
>
> Hmm... to me this seems like a complicated and brittle way to deal
> with this. Let me explain...

The parsers being line-ending agnostic is the correct thing (I happen to think all code dealing with text files should be). The Begin app has to convert to platform line endings at some point because the OS controls deal with them that way.

>> On Windows this would convert \n to \r\n - which may be what you
>> are seeing?
>
>> I _really_ dislike any iostream libraries which try to auto-
>> convert line endings
>
> IIUC this is what every std conforming iostream library does by
> default. Well to be precise, it is what it asks the OS to do. I.e.
> the OS is responsible for the mapping between in memory
> representation and on disk representation of file contents. On
> Windows this will mean mapping "\r\n" to "\n". I haven't seen a
> platform where the in memory representation uses anything but a
> single '\n' for line endings. Come to think of this does the C
> standard actually specify this?

There is no requirement in the C (or C++) standard to map line endings. We recently reviewed the definition of a line ending in the standards committee, because GCC treats the following as a single comment line, where [sp] is any sequence of white-space characters:

    // some comment \[sp]
    a = b;

The \ character quotes an "end of line", which GCC defines to be any white space followed by \n. I don't recall what the Windows iostream library actually does here. I think they convert the other way on the way in (\n -> \r\n), but I'd have to go look again.

> Given this there seem to be two strategies to deal with this.
>
> a) Require every file to be in the platforms native encoding and
> let the OS do the work. The program logic will always assume line
> endings to be '\n'.

This isn't workable because, as noted above, the in-memory representation will at some point have to be the platform's (\r\n on Windows) for the OS controls to function properly. The on-disk representation can be anything, because files get moved from platform to platform.

> b) Try to detect the line ending encoding on input and do the
> mapping "native-encoding" -> '\n' yourself on input (Experience
> shows that this is complicated and a frequent source of bugs). The
> rest of the program logic will still assume line endings to be
> '\n'. When writing the file you can either write to the original
> encoding, that is only if you memorized that, or to the native
> encoding.

In general I'd rather simply be agnostic as much as possible and convert only when necessary. If our current logic to convert on reading for Windows is correct (it may not be; I have not reviewed it) and what you are seeing is simply \n getting converted to \r\n, then I don't have any issues. If you are actually seeing extra blank lines in the Begin app, then we have something wrong.

I can see three valid behaviors for the Begin application. First, it should properly read _any_ line endings; then it should:

1. Always write platform line endings
2. Always write Unix line endings as a canonical form (probably a bad thing, because many Windows apps can't read this correctly)
3. Always preserve the line-ending style of the file

On the Mac I tend to prefer 2 or 3 because the tool set is agnostic, but if you now consider the Unix side to be the "native" representation on the Mac, then 2 and 1 become the same. On Windows I prefer 1, because this allows you to use other tools on the platform with files saved from Begin. So I'd vote for 1, with the caveat that we define \n to be the "platform line ending" for the Mac instead of the classic Mac OS \r.

Sean

> In my experience b) is rarely worth the added complexity, but this
> pretty much depends on your use case. I've frequently used a)
> in cross platform projects without problems even with users
> working on two platforms simultaneously. To get back to my first
> sentence. To me the key part of both solutions is to only use the
> "native" C++ encoding '\n' internally as it simplifies the code and
> ensures interoperability with third party code like widget libraries.
>
> What do you think?
>
>> - Often these will just do things like convert \r to \n... which
>> would double your line endings. We'll look into it.
>>
>> I did fix some code related to line endings a while ago - are you
>> running the 1.0.11 release?
>
> Unless I am missing something it's CVS Head.

Can you send me a file where the line endings have been "doubled"? I'd like to see what they actually are (please ZIP the file, or the email system will attempt to convert the line endings...). The Adobe e-mail server is picky about letting in zip files, so change the extension to .zap or send it to sea...@ma....

Sean
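Sean's three save behaviors could be expressed as a single conversion step applied just before writing, assuming the in-memory text has already been normalized to '\n'. The names SavePolicy, LineEnding, platform_line_ending and denormalize are hypothetical, and the platform choice follows Sean's caveat of treating '\n' rather than the classic Mac OS '\r' as the Mac's platform line ending.

```cpp
#include <string>

// Hypothetical names for the three candidate save behaviors.
enum class SavePolicy { Platform, Unix, Preserve };
enum class LineEnding { LF, CRLF, CR };

// Assumed platform convention: CRLF on Windows, LF everywhere else
// (Mac OS X counts as LF, not classic Mac OS CR).
LineEnding platform_line_ending() {
#ifdef _WIN32
    return LineEnding::CRLF;
#else
    return LineEnding::LF;
#endif
}

// Expand '\n'-normalized text to the terminator chosen by the policy.
// `original` is the style detected when the file was read.
std::string denormalize(const std::string& text, SavePolicy policy,
                        LineEnding original) {
    LineEnding target = LineEnding::LF;
    switch (policy) {
        case SavePolicy::Platform: target = platform_line_ending(); break;
        case SavePolicy::Unix:     target = LineEnding::LF;         break;
        case SavePolicy::Preserve: target = original;               break;
    }
    const char* eol = target == LineEnding::CRLF ? "\r\n"
                    : target == LineEnding::CR   ? "\r"
                                                 : "\n";
    std::string out;
    for (char c : text) {
        if (c == '\n') out += eol;  // replace each internal newline
        else           out += c;
    }
    return out;
}
```

Because the result already contains the final terminators, it would have to be written in binary mode. Writing it in text mode on Windows would turn each "\r\n" into "\r\r\n".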
From: Thomas W. <wi...@ac...> - 2005-12-29 21:27:10
Sean,

Sean Parent wrote:
>
> On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:
>
>> Sean,
>>
>> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>>
> The Begin app has to convert to platform line endings at some point
> because the OS controls deal with it that way.

I think you are wrong here. I am pretty sure Windows deals in '\n' only. Here is an excerpt from the MSDN docs regarding fopen:

...

t
Open in text (translated) mode. In this mode, CTRL+Z is interpreted as an end-of-file character on input. In files opened for reading/writing with "a+", fopen checks for a CTRL+Z at the end of the file and removes it, if possible. This is done because using fseek and ftell to move within a file that ends with a CTRL+Z, may cause fseek to behave improperly near the end of the file.

Also, in text mode, carriage return–linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return–linefeed combinations on output. When a Unicode stream-I/O function operates in text mode (the default), the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters (as if by a call to the mbtowc function). For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters (as if by a call to the wctomb function).

b
Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.

If t or b is not given in mode, the default translation mode is defined by the global variable _fmode. If t or b is prefixed to the argument, the function fails and returns NULL.

For more information about using text and binary modes in Unicode and multibyte stream-I/O, see Text and Binary Mode File I/O and Unicode Stream I/O in Text and Binary Modes.

...

with text-mode I/O being the default.

If you look at the libstdc++-v3 FAQ in Apple's developer documentation, you'll see the same thing, i.e. text-mode I/O does convert native line endings to '\n'.

I am aware that this is no proof that the controls or any other part of the API will deal in '\n' only, but it's at least a strong indication. Having the default file I/O mode be incompatible with the rest of the API is hard to imagine, even for a seasoned Windows programmer.

>> IIUC this is what every std conforming iostream library does by
>> default. Well to be precise, it is what it asks the OS to do. I.e.
>> the OS is responsible for the mapping between in memory
>> representation and on disk representation of file contents. On
>> Windows this will mean mapping "\r\n" to "\n". I haven't seen a
>> platform where the in memory representation uses anything but a
>> single '\n' for line endings. Come to think of this does the C
>> standard actually specify this?
>
> There is no such requirement for C to map line endings (or C++) in the
> standard.

AFAICS the standard (either one) is mostly silent on what text mode and binary mode mean. I wonder if that should be considered a defect. My statement does seem to correspond to existing practice, though.

> We recently reviewed the definition of a line ending in the
> standard committee because GCC treats the following as a single
> comment line where [sp] is any sequence of white space characters:
>
> // some comment \[sp]
> a = b;
>
> The \ character quotes an "end of line" which gcc defines to be any
> white space followed by \n. I don't recall what the Windows iostream
> library actually does here - I think they convert the other way on the
> way in (\n -> \r\n) but I'd have to go look again.

I have to reread the thread, but I think this is actually a different issue. It involves the interpretation of source files by the compiler, and the compiler is free to use any I/O scheme to read them.

>> Given this there seem to be two strategies to deal with this.
>>
>> a) Require every file to be in the platforms native encoding and let
>> the OS do the work. The program logic will always assume line endings
>> to be '\n'.
>
> This isn't workable because as noted above the "in memory"
> representation at some point will have to be for the platform (\r\n on
> Windows) for the OS controls to function properly.

As said above, I think you are wrong here.

> The on disk representation can be anything because files get moved
> from platform to platform.

Agreed: if you want to tolerate this, a) is not workable.

>> b) Try to detect the line ending encoding on input and do the mapping
>> "native-encoding" -> '\n' yourself on input (Experience shows that
>> this is complicated and a frequent source of bugs). The rest of the
>> program logic will still assume line endings to be '\n'. When writing
>> the file you can either write to the original encoding, that is only
>> if you memorized that, or to the native encoding.
>
> In general I'd rather simply be agnostic as much as possible and
> convert only when necessary

Let's say you modify the file contents and then write them to disk. What line endings do you use in your modifications?

> I can see three valid behaviors for the begin application - first it
> should properly read _any_ line endings then it should:

That is a reasonable requirement. As a user I would not ask for it, but it is nice to have.

> 1. Always write platform line endings
> 2. Always write Unix line endings as a canonical form (probably a bad
> thing because many Windows apps can't read this correctly)
> 3. Always preserve line endings styles of the file
>
> So I'd vote for 1 with the caveat that we
> define \n to be the "platform line endings" for Mac instead of the
> classic Mac OS \r.

Agreed.

> Can you send me a file where the line endings have been "doubled" - I'd
> like to see what they actually are (please ZIP the file or the email
> system will attempt to convert line endings...). The Adobe e-mail
> server is picky about letting in zip files so change the extension to
> .zap or send it to sea...@ma....

Will do.

Thomas

--
Thomas Witt
wi...@ac...
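The text/binary distinction in the MSDN excerpt corresponds to std::ios::binary in C++ iostreams. A minimal sketch with hypothetical helper names: reading in binary mode keeps any '\r' bytes, so the program sees (and can detect) the file's real line endings, and writing in binary mode stores exactly the terminators the program chose. On POSIX systems text and binary mode behave identically, so the difference only shows up on platforms such as Windows.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Read a file's bytes exactly as stored on disk. std::ios::binary
// suppresses text-mode translation (on Windows, "\r\n" -> '\n'),
// so '\r' bytes reach the program.
std::string read_raw(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Write bytes without translation: whatever terminators the caller put
// in `data` are exactly what ends up in the file.
void write_raw(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}
```

Opening the same streams without std::ios::binary gives the default text mode Thomas describes, which on Windows would hide the '\r' from the program on input and add one before every '\n' on output.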
From: Sean P. <sp...@ad...> - 2005-12-29 23:15:34
New hypothesis:

After looking at the code, there is the following line in express_viewer.hpp:

---
adobe::replace( filtered, '\r', '\n' );
---

My guess is that this is there to handle \r to \n conversion on the Mac, but it is also getting applied on Windows.

The result is that a file that had \r\n Windows line endings now has \n\n line endings, which the Windows stream code then converts to \r\n\r\n. That is what I'm seeing in the final file.

Since I'm not clear on what the requirements are for the intermediate formats, I'm not going to attempt a quick fix. I'll let Foster noodle it around when he's back from vacation :-)

Sean
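Sean's hypothesis can be reproduced in a few lines. This sketch uses std::replace in place of adobe::replace (assumed here to be a range-based wrapper performing the same element-wise substitution) and models Windows text-mode output with an explicit function instead of relying on the platform, so it behaves the same everywhere. fixed_filter is one possible correction, not the fix that was eventually adopted.

```cpp
#include <algorithm>
#include <string>

// What the express_viewer.hpp line effectively does: every '\r' becomes
// '\n', so a Windows "\r\n" turns into "\n\n".
std::string buggy_filter(std::string text) {
    std::replace(text.begin(), text.end(), '\r', '\n');
    return text;
}

// Model of Windows text-mode output: each '\n' is written as "\r\n".
std::string windows_text_mode_write(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '\n') out += "\r\n";
        else           out += c;
    }
    return out;
}

// A safer filter: fold "\r\n" and a lone '\r' into one '\n' each.
std::string fixed_filter(const std::string& text) {
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            out += '\n';
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;  // skip LF of CRLF
        } else {
            out += text[i];
        }
    }
    return out;
}
```

Feeding "a\r\nb\r\n" through buggy_filter and then windows_text_mode_write gives "a\r\n\r\nb\r\n\r\n", the doubled endings seen in the saved file. Using fixed_filter instead round-trips to "a\r\nb\r\n" while still converting classic Mac "\r" files.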
From: Thomas W. <wi...@ac...> - 2005-12-30 00:15:25
Sean,

Sean Parent wrote:
> New hypothesis,
>
> After looking at the code there is the following line in
> express_viewer.hpp
> ---
> adobe::replace( filtered, '\r', '\n' );
> ---
>
> My guess is that this is there to handle \r to \n conversion on the Mac
> - but this is also getting applied on Windows.
>
> The result is that the file which was \r\n windows line endings now has
> \n\n line endings which are then getting converted by the windows
> stream code to \r\n\r\n which is what I'm seeing in the final file -

Yup, I think that's what happens. In my local copy I use text mode for reading, and I've removed that line. It works as long as you don't have files from other platforms.

Thomas

--
Thomas Witt
wi...@ac...
From: Thomas W. <wi...@ac...> - 2006-01-03 22:20:33
Sean,

On Dec 29, 2005, at 1:26 PM, Thomas Witt wrote:

> Sean,
>
> Sean Parent wrote:
>> On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:
>>> Sean,
>>>
>>> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>>>
>> The Begin app has to convert to platform line endings at some
>> point because the OS controls deal with it that way.
>
> I think you are wrong here. I am pretty sure Windows deals in '\n'
> only. Here is an excerpt from the MSDN docs regarding fopen.

Well, I don't quite know how to put it, but it seems that in the end you are right. Even .NET controls seem to require CRLF in order to function correctly. This is puzzling on so many levels. I'll be hiding behind a copy of the "Unix Haters Handbook" for a while.

Humbly yours

Thomas

Thomas Witt
wi...@ac...
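Thomas's conclusion implies one more conversion at the point where text is handed to a native control. A minimal sketch, assuming the in-memory text uses '\n' and that the target (for example a classic multi-line Win32 edit control filled via SetWindowText) expects CRLF; to_crlf is a hypothetical helper, not part of any library mentioned in this thread.

```cpp
#include <string>

// Convert '\n'-terminated text to CRLF before handing it to a control
// that expects Windows line endings. A '\n' already preceded by '\r' is
// left alone, so the conversion is idempotent: applying it twice cannot
// produce "\r\r\n".
std::string to_crlf(const std::string& text) {
    std::string out;
    out.reserve(text.size() + text.size() / 8);  // rough room for added '\r's
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\n' && (i == 0 || text[i - 1] != '\r'))
            out += '\r';
        out += text[i];
    }
    return out;
}
```

Making the conversion idempotent is a deliberate choice here: it guards against exactly the kind of double translation that produced the doubled line endings discussed in this thread.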