From: <rob...@dw...> - 2010-09-29 13:50:56
In my program I use fread and fsetpos to speed up access to data in a file (I record a set of file positions). I have some problems with non-ASCII file names.

To open a file I use fopen, which needs an ASCII file name. So for a Unicode file name I tried _wfopen: I can read line by line, but I am not able to use the fread function (it's slow).

In the documentation (also glib) I read that there is a difference in the FILE struct used by _wfopen. Is there a way to memorize positions in a file, to seek to those positions, and to read a buffer (like fread)?

Thanks

Roberto
From: Manuel M. <mm...@ce...> - 2010-09-30 20:13:26
> In my program I use fread and fsetpos to increase the speed to access data
> in file (I record a set of file positions).
> I have some problems with not ascii filename.
>
> To open file I use fopen that needs an ascii file name.
> So for unicode file name I try _wfopen: I can read line by line, but I am
> not able to use fread function (it's slow)
>
> In the documentation (also glib) I read that there is a difference in FILE
> struct used by _wfopen.
> Is there a way to memorize positions in a file, to seek to positions and to
> read a buffer (like fread)?
>
> Thanks
>
> Roberto

Here you have a list of all possible stream I/O functions in MS Windows:
http://msdn.microsoft.com/en-us/library/c565h7xx%28v=vs.71%29.aspx

The most remarkable thing is that fread reads _unformatted_ data from a stream, and Unicode IS _formatted_ data (well, really 'encoded' data).

If you know that the data is UTF-16 encoded, each char uses exactly two bytes. And so, you can calculate the right positions.

But be aware that UTF-8 encoding is a variable-num-of-bytes-for-each-char (between 1 and 5). So positions (in bytes) for data are not available until you read and 'understand' all the bytes for that char.

Perhaps storing just line-end positions is enough for you?

Manolo
From: Martin M. <vi...@gm...> - 2010-09-30 21:18:48
2010/9/30 Manuel Martín <mm...@ce...>

> If you know that the data is UTF-16 encoded, each char uses exactly two
> bytes. And so, you can calculate the right positions.

Not really. One 16-bit value is a "code unit". One char is called a "code point" (not exactly, see later). A code point is a 32-bit value. One code unit can encode 65536 characters, called the "Basic Multilingual Plane", but there are also other characters whose code point value is higher. For those you need two code units, 32 bits, called a "surrogate pair".

And it is even more complicated than that. Not all code points represent characters. Some of them represent diacritical marks (like ´ in á, ˇ in č, etc.), some of them change the direction of text flow (e.g. a right-to-left Hebrew word cited in left-to-right English text), etc.

For details and an unsimplified description, see:
http://en.wikipedia.org/wiki/UTF-16

> In the documentation (also glib) I read that there is a difference in FILE
> struct used by _wfopen.

Aren't you by any chance mixing fopen() from glibc with _wfopen() from Win32API? According to MSDN, they both operate on the same structure, readable by Win32API fread(). See:
http://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.71).aspx
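To make the code unit versus code point distinction concrete, here is a minimal sketch in C. The function names utf16_count_code_points and utf16_combine are hypothetical, chosen only for illustration, and the sketch assumes the data has already been read into an array of 16-bit code units in native byte order. It does not deal with combining marks or directional controls, which are a separate layer of complexity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count Unicode code points in a buffer of UTF-16
   code units. A high surrogate (0xD800..0xDBFF) followed by a low
   surrogate (0xDC00..0xDFFF) is one code point stored in two code units;
   an unpaired surrogate is simply counted as one unit here. */
static size_t utf16_count_code_points(const uint16_t *units, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* skip the low half of the surrogate pair */
        }
        count++;
    }
    return count;
}

/* Hypothetical helper: combine a valid surrogate pair into the code
   point it encodes (a value above 0xFFFF). */
static uint32_t utf16_combine(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                    + ((uint32_t)lo - 0xDC00u);
}
```

For example, the four code units 0x0041, 0xD83D, 0xDE00, 0x00E1 occupy eight bytes but hold only three code points, so a byte offset cannot be derived from a character count by multiplying by two.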
From: Tor L. <tm...@ik...> - 2010-09-30 21:27:52
> The most remarkable is that fread reads _unformatted_ data from stream
> and Unicode IS _formatted_ data (well, really 'encoded' data).

But did the original poster say he would be reading any kind of Unicode data? He just mentioned needing to provide the file *name* in Unicode.

> If you know that the data is UTF-16 encoded, each char uses exactly two
> bytes. And so, you can calculate the right positions.

Be careful with your terminology here. "char" means (at least in the context of this list) a C or C++ "char", i.e. a byte. In UTF-16, a Unicode *character* (code point) uses two or four bytes.

> But be aware that UTF-8 encoding is a variable-num-of-bytes-for-each-char (between 1 and 5).

There has been no mention at all of UTF-8 in this (short) thread so far, why do you think it would be relevant?

> So positions (in bytes) for data are not available until you read and 'understand' all
> the bytes for that char.

Again, don't say "char" if you mean "character".

--tml
From: Tor L. <tm...@ik...> - 2010-09-30 21:31:48
> Aren't you by any chance mixing fopen() from glibc with _wfopen() from
> Win32API? According to MSDN, they both operate on same structure, readable
> by Win32API fread().

What does glibc, the GNU C library, nowadays supported (as far as I know) only for Linux, have to do with this thread or MinGW in general?

Both fopen() and _wfopen() are in the Microsoft C libraries. (Of which MinGW-compiled code uses the msvcrt.dll variant, the "system" C library.) With Win32 API one usually means the lower-level functions, like CreateFile() and ReadFile().

--tml
From: Martin M. <vi...@gm...> - 2010-09-30 22:20:03
On Thu, Sep 30, 2010 at 11:31 PM, Tor Lillqvist <tm...@ik...> wrote:

> > Aren't you by any chance mixing fopen() from glibc with _wfopen() from
> > Win32API? According to MSDN, they both operate on same structure,
> > readable by Win32API fread().
>
> What does glibc, the GNU C library, nowadays supported (as far as I
> know) only for Linux, have to do with this thread or MinGW in general?

Correct, my bad. Because of seeing MinGW supporting POSIX stuff not supported by MSVC, I got this wrong impression somehow.

Anyway, in this case it seems that it's okay to use fsetpos() / fgetpos() / fread() on files opened by either fopen() or _wfopen(), and fread() should do just what you want: read unformatted bytes into a buffer. Are you sure it is fread() that slows you down?
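As a rough illustration of the point above, the following C sketch remembers a position with fgetpos() and later returns to it with fsetpos() before calling fread(). The helper names open_binary_w and read_at_pos are hypothetical and not from this thread; the _wfopen() wrapper is compiled only on Windows, and the rest works on any FILE * regardless of how the stream was opened. Binary mode ("rb") is assumed so that no newline translation happens.

```c
#include <stdio.h>
#ifdef _WIN32
#include <wchar.h>

/* Hypothetical helper: open a file by wide-character name in binary
   mode. The returned FILE * is an ordinary stream. */
static FILE *open_binary_w(const wchar_t *name)
{
    return _wfopen(name, L"rb");
}
#endif

/* Hypothetical helper: jump to a position previously saved with
   fgetpos() and read up to `len` bytes from there. Returns the number
   of bytes read, or 0 if the seek failed. */
static size_t read_at_pos(FILE *fp, const fpos_t *pos, void *buf, size_t len)
{
    if (fsetpos(fp, pos) != 0)
        return 0;
    return fread(buf, 1, len, fp);
}
```

A caller would save a position with `fgetpos(fp, &mark)` at the point of interest, carry on reading, and later call `read_at_pos(fp, &mark, buf, sizeof buf)` to come back to it.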
From: Tor L. <tm...@ik...> - 2010-09-29 14:11:27
> To open file I use fopen that needs an ascii file name.

Actually, system codepage, which is more (for double-byte code pages, a *lot* more) than just ASCII.

> So for unicode file name I try _wfopen: I can read line by line, but I am
> not able to use fread function (it's slow)

For fread() it doesn't matter at all whether the stream has been opened with fopen() or _wfopen().

> In the documentation (also glib) I read that there is a difference in FILE
> struct used by _wfopen.

What documentation, what does GLib have to do with fopen(), _wfopen() and fread()?

> Is there a way to memorize positions in a file, to seek to positions and to
> read a buffer (like fread)?

Well, ftell() and fseek()? (Or, if you want to be able to handle files over 2 GB, _ftelli64() and _fseeki64().)

Perhaps you should show us some minimal code sample that exhibits your actual problem.

--tml
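The ftell()/fseek() suggestion could look roughly like the C sketch below: one pass records the byte offset of each line start, and later reads jump straight to a recorded offset. The names index_lines and read_at are hypothetical, the 256-byte line buffer is an arbitrary choice, and the stream is assumed to be opened in binary mode so that offsets are plain byte counts. For files over 2 GB the long offsets would need to be replaced by the 64-bit variants mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store the byte offset of each line start in
   `offsets` (at most `max` entries) and return how many were stored. */
static size_t index_lines(FILE *fp, long *offsets, size_t max)
{
    char line[256];
    size_t n = 0;
    int at_line_start = 1;

    rewind(fp);
    while (n < max) {
        long pos = ftell(fp);
        if (pos < 0 || fgets(line, sizeof line, fp) == NULL)
            break;
        if (at_line_start)
            offsets[n++] = pos;
        /* fgets returns a partial line when the line exceeds the buffer,
           so the next chunk starts a new line only after a '\n'. */
        size_t len = strlen(line);
        at_line_start = (len > 0 && line[len - 1] == '\n');
    }
    return n;
}

/* Hypothetical helper: seek to a recorded offset and read up to `len`
   bytes. Returns the number of bytes read, or 0 if the seek failed. */
static size_t read_at(FILE *fp, long offset, void *buf, size_t len)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, len, fp);
}
```

Unlike the opaque fpos_t used by fgetpos()/fsetpos(), the offsets here are ordinary integers, which makes them easy to store in an array or write to an index file.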
From: <rob...@dw...> - 2010-10-04 07:43:52
Thanks for your suggestions.

I found my error: I opened the file with _wfopen and then reopened it with fopen.

Sorry for the mistake.

Roberto

-----Original Message-----
From: Tor Lillqvist [mailto:tm...@ik...]
Sent: Wednesday, 29 September 2010 16:11
To: MinGW Users List
Subject: Re: [Mingw-users] fread and _wfopen

> To open file I use fopen that needs an ascii file name.

Actually, system codepage, which is more (for double-byte code pages, a *lot* more) than just ASCII.

> So for unicode file name I try _wfopen: I can read line by line, but I
> am not able to use fread function (it's slow)

For fread() it doesn't matter at all whether the stream has been opened with fopen() or _wfopen().

> In the documentation (also glib) I read that there is a difference in
> FILE struct used by _wfopen.

What documentation, what does GLib have to do with fopen(), _wfopen() and fread()?

> Is there a way to memorize positions in a file, to seek to positions
> and to read a buffer (like fread)?

Well, ftell() and fseek()? (Or, if you want to be able to handle files over 2 GB, _ftelli64() and _fseeki64().)

Perhaps you should show us some minimal code sample that exhibits your actual problem.

--tml