|
From: Paul L. <sa2...@cy...> - 2011-03-20 17:03:03
|
I'd like to handle Unix-style as well as DOS-style line endings on the MinGW version of my code. Does anyone have any thoughts on this?

The problem is that I'd prefer to open input files in text mode. When you open a NL-only file in text mode, however, fpos/fseek don't work on the file. If you get a file position, and then seek to it, you end up somewhere else in the file. This works for CR/NL files, of course, but not NL-only files.

I've also tried fgetpos/fsetpos, but they have exactly the same problem.

I've also tried _setmode, to set the file mode to binary before the fpos and back to text after the fseek. This works for NL-only files, but fails for CR/NL files. MSDN also says that you should call _setmode before doing any file I/O, so I wouldn't want to do this anyway.

AFAICT, if I want to deal with "text" files that are NL-only, and I want to fpos/fseek on those files, then I must open them in binary mode. Unfortunately, when I do this, I get other errors in my code, which I haven't looked at yet. I guess I may just have to fix these instead.

I suppose one point of view is that you shouldn't open NL-only files in "text" mode on Windows, because they're not what Windows considers to be "text". Now that I've written it down, I suppose that's the only logical answer.

Thoughts?

Thanks -

Paul |
|
From: LRN <lr...@gm...> - 2011-03-20 17:38:08
|
On 20.03.2011 20:02, Paul Leder wrote:
> I'd like to handle Unix-style as well as DOS-style line endings on the
> MinGW version of my code. Does anyone have any thoughts on this?
>
> The problem is that I'd prefer to open input files in text mode. When
> you open a NL-only file in text mode, however, fpos/fseek don't work on
> the file. If you get a file position, and then seek to it, you end up
> somewhere else in the file. This works for CR/NL files, of course, but
> not NL-only files.
>
> I've also tried fgetpos/fsetpos, but they have exactly the same problem.
>
> I've also tried _setmode, to set the file mode to binary before the fpos
> and back to text after the fseek. This works for NL-only files, but
> fails for CR/NL files. MSDN also says that you should call _setmode
> before doing any file I/O, so I wouldn't want to do this anyway.
>
> AFAICT, if I want to deal with "text" files that are NL-only, and I want
> to fpos/fseek on those files, then I must open them in binary mode.
> Unfortunately, when I do this, I get other errors in my code, which I
> haven't looked at yet. I guess I may just have to fix these instead.
>
> I suppose one point of view is that you shouldn't open NL-only files in
> "text" mode on Windows, because they're not what Windows considers to be
> "text". Now that I've written it down, I suppose that's the only logical
> answer.
>
> Thoughts?
>
> Thanks -
>
> Paul
>

A) Convert all files to CRLF before using them (unix2dos)
B) Write an IO layer that abstracts your code from line ending specifics (i.e. does the same thing "text" mode ought to do, but for both CRLF and LF files)
C) Fix your code to work with text files of both types.

Answer = A || B || C; |
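Option B could be sketched roughly as below. This is only an illustration, not code from the thread: the name `text_getc` is made up, and the sketch assumes the stream was opened in binary mode ("rb") so that the C runtime does no translation of its own. It folds a CR/NL pair into a single NL and passes everything else through unchanged.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): read one character from a
 * stream opened in binary mode, returning '\n' for either a bare NL or a
 * CR/NL pair. A CR that is not followed by NL is returned as-is. */
int text_getc(FILE *fp)
{
    int c = getc(fp);

    if (c == '\r') {
        int next = getc(fp);

        if (next == '\n')
            return '\n';        /* CR/NL pair collapses to one NL */
        if (next != EOF)
            ungetc(next, fp);   /* lone CR: push the next byte back */
        return '\r';
    }
    return c;                   /* ordinary byte, bare NL, or EOF */
}
```

Code that currently calls getc() on a text-mode stream could call this instead. Note that the characters returned no longer correspond one-to-one with bytes in the file, so any saved position should still come from ftell() on the underlying binary stream, not from counting returned characters.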
|
From: Earnie <ea...@us...> - 2011-03-21 16:14:18
|
Paul Leder wrote:
> I'd like to handle Unix-style as well as DOS-style line endings on the
> MinGW version of my code. Does anyone have any thoughts on this?
>
> The problem is that I'd prefer to open input files in text mode. When
> you open a NL-only file in text mode, however, fpos/fseek don't work on
> the file. If you get a file position, and then seek to it, you end up
> somewhere else in the file. This works for CR/NL files, of course, but
> not NL-only files.
>

I think you're backwards here, or I'm understanding you backwards. The text mode I/O will remove the CR, making fpos/fseek worthless. You want to open the files in binary mode and handle the dangling CR at the end of the strings.

> I've also tried fgetpos/fsetpos, but they have exactly the same problem.
>
> I've also tried _setmode, to set the file mode to binary before the fpos
> and back to text after the fseek. This works for NL-only files, but
> fails for CR/NL files. MSDN also says that you should call _setmode
> before doing any file I/O, so I wouldn't want to do this anyway.
>

Yea, you need to leave them in binary mode.

> AFAICT, if I want to deal with "text" files that are NL-only, and I want
> to fpos/fseek on those files, then I must open them in binary mode.
> Unfortunately, when I do this, I get other errors in my code, which I
> haven't looked at yet. I guess I may just have to fix these instead.
>

Yes, you should fix the other errors. It is a long-known issue. The other errors may be related to the dangling CR, which will remain in the strings when read.

> I suppose one point of view is that you shouldn't open NL-only files in
> "text" mode on Windows, because they're not what Windows considers to be
> "text". Now that I've written it down, I suppose that's the only logical
> answer.
>

The only problems I know of with NL-only files are Notepad, because it reads the data in binary mode and expects to find a CR before putting the data on the next line, and some versions of the MSVC editor.

I've never heard of fpos/fseek having an issue with NL-only files, rather the reverse. But things may have changed for newer versions of the OS and/or MSVCRT?

--
Earnie
-- http://www.for-my-kids.com |
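The "dangling CR" handling suggested above might look something like the following. It is a minimal sketch under stated assumptions: the name `strip_eol` is hypothetical, and the line is assumed to have been read with fgets() from a stream opened in binary mode, so a CR/NL file yields strings ending in "\r\n" and an NL-only file yields strings ending in "\n".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): remove a trailing "\n" or
 * "\r\n" from a line in place and return the new length. A CR that is
 * not followed by NL is left alone, since it is not a line ending. */
size_t strip_eol(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';             /* drop the NL */
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';         /* drop the dangling CR */
    }
    return len;
}
```

Calling this right after each fgets() gives the rest of the code the same strings for both kinds of file, which may be enough to clear up the "other errors" seen after switching to binary mode.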
|
From: Keith M. <kei...@us...> - 2011-03-21 21:25:31
|
On 20/03/11 17:02, Paul Leder wrote:
> I'd like to handle Unix-style as well as DOS-style line endings on the
> MinGW version of my code. Does anyone have any thoughts on this?

FWIW, my 2p.

> The problem is that I'd prefer to open input files in text mode.

From a POSIX perspective, the MS distinction between _O_BINARY and _O_TEXT is an aberration; indeed these I/O modes aren't even defined for POSIX versions of GCC. If you value portability of your code (and your desire to handle POSIX line endings seamlessly, alongside MS endings, suggests that you may), don't rely on MS text mode. Ever. Write your code to explicitly handle CRLF as equivalent to NL instead.

> When you open a NL-only file in text mode, however, fpos/fseek
> don't work on the file. If you get a file position, and then seek to
> it, you end up somewhere else in the file. This works for CR/NL
> files, of course, but not NL-only files.

I'm confused. It has always been my understanding that fpos/fseek cannot be used reliably with any file opened in text mode, because your application never sees the file content as it really is; fseek sets position in terms of raw byte count within the physical file, but because not all of those bytes are actually seen by the application, it is uncertain if fpos computes position on a consistent basis.

> I've also tried fgetpos/fsetpos, but they have exactly the same
> problem.

Since these presumably rely on the same underlying system code, that's hardly surprising.

> I've also tried _setmode, to set the file mode to binary before the fpos
> and back to text after the fseek. This works for NL-only files, but
> fails for CR/NL files. MSDN also says that you should call _setmode
> before doing any file I/O, so I wouldn't want to do this anyway.

No, you shouldn't do this.

> AFAICT, if I want to deal with "text" files that are NL-only, and I
> want to fpos/fseek on those files, then I must open them in binary
> mode. Unfortunately, when I do this, I get other errors in my code,
> which I haven't looked at yet. I guess I may just have to fix these
> instead.

I guess you should probably fix them regardless.

> I suppose one point of view is that you shouldn't open NL-only files
> in "text" mode on Windows, because they're not what Windows considers
> to be "text". Now that I've written it down, I suppose that's the
> only logical answer.
>
> Thoughts?

My POV: don't ever rely on MS text mode. Write your code as if all files are "binary", then you won't have to rewrite it, if/when you want to use it on a POSIX system, where MS text mode is meaningless.

--
Regards,
Keith. |
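To illustrate the "treat every file as binary" advice: on a binary-mode stream, ftell() returns a plain byte offset, so a saved position can be passed back to fseek() regardless of which line-ending style the file uses. The sketch below is an assumption-laden example rather than anything posted in the thread; the name `index_line_starts` is hypothetical, and the stream is assumed to have been opened with "rb".

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): record the byte offset of the
 * start of each line in a stream opened in binary mode. Stores at most
 * `max` offsets and returns how many were stored. A line ends at NL; any
 * CR before it is just another byte as far as offsets are concerned. */
size_t index_line_starts(FILE *fp, long *offsets, size_t max)
{
    size_t n = 0;
    int at_line_start = 1;

    rewind(fp);
    while (n < max) {
        /* Only ask for the position when a new line is about to begin. */
        long pos = at_line_start ? ftell(fp) : 0L;
        int c;

        if (pos < 0)
            break;                      /* ftell failed */
        c = getc(fp);
        if (c == EOF)
            break;
        if (at_line_start)
            offsets[n++] = pos;
        at_line_start = (c == '\n');
    }
    return n;
}
```

After indexing, `fseek(fp, offsets[i], SEEK_SET)` positions the stream at the first byte of line i for both CR/NL and NL-only files, because no bytes were hidden from the application while the offsets were being collected.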