From: Sean P. <sp...@ad...> - 2005-12-29 19:18:08
On Dec 29, 2005, at 9:14 AM, Thomas Witt wrote:

> Sean,
>
> On Dec 28, 2005, at 10:44 PM, Sean Parent wrote:
>
>> The parsers are all line ending agnostic. For the Begin app - I'd
>> assume (I haven't looked at that part of the code) it converts
>> into platform line endings on the way in - and probably leaves
>> them that way.
>
> Hmm... to me this seems like a complicated and brittle way to deal
> with this. Let me explain...

The parsers being line ending agnostic is the correct thing (I happen
to think all code dealing with text files should be). The Begin app
has to convert to platform line endings at some point because the OS
controls deal with them that way.

>
>> On Windows this would convert \n to \r\n - which may be what you
>> are seeing?
>
>> I _really_ dislike any iostream libraries which try to auto-convert
>> line endings
>
> IIUC this is what every standard-conforming iostream library does by
> default. Well, to be precise, it is what it asks the OS to do, i.e.
> the OS is responsible for the mapping between the in-memory
> representation and the on-disk representation of file contents. On
> Windows this means mapping "\r\n" to "\n". I haven't seen a
> platform where the in-memory representation uses anything but a
> single '\n' for line endings. Come to think of it, does the C
> standard actually specify this?

The standard has no requirement for C (or C++) to map line endings.
We recently reviewed the definition of a line ending in the standard
committee because GCC treats the following as a single comment line,
where [sp] is any sequence of white space characters:

    // some comment \[sp]
    a = b;

The \ character quotes an "end of line", which GCC defines to be any
white space followed by \n.
I don't recall what the Windows iostream library actually does with
line endings - I think they convert the other way on the way in
(\n -> \r\n) but I'd have to go look again.

>
> Given this there seem to be two strategies to deal with this.
>
> a) Require every file to be in the platform's native encoding and
> let the OS do the work. The program logic will always assume line
> endings to be '\n'.

This isn't workable because, as noted above, the "in memory"
representation will at some point have to be the platform's (\r\n
on Windows) for the OS controls to function properly. The on-disk
representation can be anything because files get moved from platform
to platform.

>
> b) Try to detect the line ending encoding on input and do the
> mapping "native-encoding" -> '\n' yourself on input (experience
> shows that this is complicated and a frequent source of bugs). The
> rest of the program logic will still assume line endings to be
> '\n'. When writing the file you can either write to the original
> encoding (only if you memorized it) or to the native encoding.

In general I'd rather simply be agnostic as much as possible and
convert only when necessary. If our current logic to convert on
reading for Windows is correct (it may not be, I have not reviewed it)
and what you are seeing is simply \n getting converted to \r\n, then I
don't have any issues. If you are actually seeing extra blank lines
in the Begin app, then we have something wrong.

I can see three valid behaviors for the Begin application. First it
should properly read _any_ line endings, then it should:

1. Always write platform line endings
2. Always write Unix line endings as a canonical form (probably a bad
   thing because many Windows apps can't read this correctly)
3. Always preserve the line ending style of the file

On the Mac I tend to prefer answer 2 or 3 because the tool set is
agnostic - but if you consider the Unix side now to be the "native"
representation on the Mac then 2 and 1 become the same. On Windows I
prefer 1 because this allows you to use other tools on the platform
with files saved from Begin. So I'd vote for 1, with the caveat that
we define \n to be the "platform line endings" for the Mac instead of
the classic Mac OS \r.

Sean

>
> In my experience b) is rarely worth the added complexity, but this
> pretty much depends on your use case. I've frequently used a)
> in cross-platform projects without problems, even with users
> working on two platforms simultaneously. To get back to my first
> sentence: to me the key part of both solutions is to only use the
> "native" C++ encoding '\n' internally, as it simplifies the code and
> ensures interoperability with third party code like widget libraries.
>
> What do you think?
>
>> - Often these will just do things like convert \r to \n... which
>> would double your line endings. We'll look into it.
>>
>> I did fix some code related to line endings a while ago - are you
>> running the 1.0.11 release?
>
> Unless I am missing something it's CVS Head.

Can you send me a file where the line endings have been "doubled"?
I'd like to see what they actually are (please ZIP the file or the
email system will attempt to convert line endings...). The Adobe
e-mail server is picky about letting in zip files, so change the
extension to .zap or send it to sea...@ma....

Sean

>
> Regards
>
> Thomas
>
> Thomas Witt
> wi...@ac...
>
> _______________________________________________
> Adobe-source-devel mailing list
> Ado...@li...
> https://lists.sourceforge.net/lists/listinfo/adobe-source-devel