#855 Possible fread() malfunction of GCC 7.3.0

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2020-10-03
Created: 2020-09-27
Creator: Sanmayce
Private: No

Hi,
I first reported this issue to GCC by mistake:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215

It concerns a C program that loads an entire 3.3GB file and checksums it.

I use Intel v15.0 and GCC v7.3.0 on 64-bit Windows.
To my dismay, Intel's binary loads the file and reports the correct checksum, whereas GCC's binary fails. After comparing the loaded content, I saw that GCC loads the whole file into a malloc-ed pool except for the last ~860 bytes?!

If you need to reproduce the issue, the two binaries (GCC and Intel) and the C source are here:
www.sanmayce.com/Nakamichi/Satanichi_aka_Nakamichi_2020-Jun-09_BUG_ZEROED-END.zip

The file being loaded is the Human Genome:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.28_GRCh38.p13/GCA_000001405.28_GRCh38.p13_genomic.fna.gz

This bug has never appeared with files of 1GB or less; my guess is that this is a clue.

These are the files:

06/11/2020  09:16 AM         1,316,439 Nakamichi_Ryuugan-ditto-1TB_btree.c
06/15/2019  02:37 AM     3,313,087,324 NCBI_FTP_Homo_sapiens_(human)_GCA_000001405.28_GRCh38.p13_genomic.fna
06/15/2019  02:37 AM     3,313,087,324 q
01/07/2018  05:26 PM           191,644 Satanichi_GCC730_64bit.exe
06/11/2020  09:16 AM           198,144 Satanichi_ICL150_64bit.exe

As you can see below, the same file is loaded differently into the malloc-ed pool:

D:\Satanichi_aka_Nakamichi_2020-Jun-09>Satanichi_GCC730_64bit.exe q w 20 888 i
...
Allocating Source-Buffer 3,159 MB ...
Allocating Target-Buffer 3,191 MB ...
Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0xc1d4,3f7f
...
D:\Satanichi_aka_Nakamichi_2020-Jun-09>

D:\Satanichi_aka_Nakamichi_2020-Jun-09>Satanichi_ICL150_64bit.exe q w 20 888 i
Allocating Source-Buffer 3,159 MB ...
Allocating Target-Buffer 3,191 MB ...
Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x81bd,fe4b
...
D:\Satanichi_aka_Nakamichi_2020-Jun-09>

If you need more info, I will add it...
I would very much like to know what causes this anomaly/bug.

Georgi

Discussion

  • Kai Tietz

    Kai Tietz - 2020-09-28

    Hello,

    I am pretty sure that our fread/fwrite routines work pretty well. The
    underlying issue here seems to be the missing use of "binary"-mode on
    the fopen-call. Not sure what Intel's runtime assumes as default mode
    on open for fopen, but MS' variant assumes text-mode, if no "b" is
    within the mode-option.

    Cheers,
    Kai
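
To make the text-mode versus binary-mode point concrete, here is a minimal sketch; it is not taken from the reporter's source. The helper names `write_bytes` and `read_with_mode` are made up for illustration. It writes raw bytes to a file and reports how many bytes fread() delivers under a given fopen mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `len` bytes to `path` in binary mode.
 * Returns 0 on success, -1 on failure. */
int write_bytes(const char *path, const void *data, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, fp);
    int close_failed = fclose(fp);
    return (written == len && close_failed == 0) ? 0 : -1;
}

/* Hypothetical helper: reads up to `cap` bytes from `path` using the given
 * fopen mode ("r" or "rb") and returns how many bytes fread() delivered. */
size_t read_with_mode(const char *path, const char *mode,
                      void *buf, size_t cap)
{
    assert(buf != NULL);
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return 0;
    size_t got = fread(buf, 1, cap, fp);
    fclose(fp);
    return got;
}
```

For a 9-byte file holding `a\r\nb\r\nc\r\n`, reading with `"rb"` should deliver all 9 bytes on any platform. With `"r"` on Microsoft's runtime the count would be expected to be smaller, because each \r\n pair is collapsed to \n; on Linux both modes behave the same.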

    • Sanmayce

      Sanmayce - 2020-09-28

      The GCC developer in the link above said the same:
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215#c4

      (In reply to Andrew Pinski from comment #2)

      You need b if you don't want \r\n to be turned into just \n.

      At line 11,945 I use:

          if ((fp = fopen(argv[1], "rb")) == NULL) {
              printf("Nakamichi: Can't open '%s' file.\n", argv[1]); exit(13);
          }
      

      1] As far as I can tell, the problem is that fread() reads around 860 bytes fewer than requested.

      2] Also, as I wrote:

      This bug has never appeared with files of 1GB or less; my guess is that this is a clue.

      These two facts suggest that the problem is not merely a binary-mode issue; something weird is happening...

       

      Last edit: Sanmayce 2020-09-28
    • Sanmayce

      Sanmayce - 2020-09-28

      For what it's worth, the OS is Windows 7 Pro. Also, a quick check with a text file shows that the GCC binary works "PROPERLY":

      D:\Satanichi_aka_Nakamichi_2020-Jun-09>dir
      
      06/11/2020  09:16 AM            55,397 lzsse2.cpp
      06/11/2020  09:16 AM         1,316,439 Nakamichi_Ryuugan-ditto-1TB_btree.c
      09/06/2019  11:50 AM       969,050,128 NCBI_FTP_Homo_sapiens_(human)_GCA_000001405.28_GRCh38.p13_genomic.fna.Nakamichi
      06/15/2019  02:37 AM     3,313,087,324 q
      01/07/2018  05:26 PM           191,644 Satanichi_GCC730_64bit.exe
      06/11/2020  09:16 AM           198,144 Satanichi_ICL150_64bit.exe
      
      D:\Satanichi_aka_Nakamichi_2020-Jun-09>Satanichi_gcc730_64bit.exe lzsse2.cpp w 888 i
      
      Allocating Source-Buffer 0 MB ...
      Allocating Target-Buffer 32 MB ...
      Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x4d5d,b103
      Leprechaun: Size of input file: 55,397
      
      D:\Satanichi_aka_Nakamichi_2020-Jun-09>Satanichi_icl150_64bit.exe lzsse2.cpp w 888 i
      
      Allocating Source-Buffer 0 MB ...
      Allocating Target-Buffer 32 MB ...
      Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x4d5d,b103
      Leprechaun: Size of input file: 55,397
      
      D:\Satanichi_aka_Nakamichi_2020-Jun-09>
      
       
    • Sanmayce

      Sanmayce - 2020-10-02

      I am pretty sure that our fread/fwrite routines work pretty well.

      Kai, I'm using MinGW packages that certainly have this problem. Are you 100% sure yours works with e.g. 2+GB in one go?!

      I still cannot figure out where the problem is, so I added the return value:

          if ((fp = fopen(argv[1], "rb")) == NULL) {
              printf("Nakamichi: Can't open '%s' file.\n", argv[1]); exit(13);
          }
      ...
          BUG_totalREAD = fread(SourceBlock, 1, SourceSize, fp); // Bug in MinGW, 2020-Oct-01
          fclose(fp);
      

      According to https://man7.org/linux/man-pages/man3/fread.3.html fread() returns the number of items read, and a short count (or zero) signals an error or end of file.
      It returns 0 here, yet in fact it reads 3,313,087,324 - (~860) bytes into the pool?!
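
A short count from fread() does not say on its own whether the stream hit end of file or an error; feof() and ferror() are the standard way to tell them apart. The sketch below is illustrative only: `checked_fread` and the `read_status` names are invented here and do not appear in the reporter's program.

```c
#include <assert.h>
#include <stdio.h>

/* Outcome of one fread() call, classified with feof()/ferror().
 * These names are hypothetical, chosen for this sketch. */
enum read_status {
    READ_COMPLETE,      /* fread() delivered everything that was requested */
    READ_SHORT_EOF,     /* short count, end-of-file indicator is set */
    READ_SHORT_ERROR,   /* short count, error indicator is set */
    READ_SHORT_UNKNOWN  /* short count, neither indicator is set */
};

/* Requests `want` bytes, stores the delivered count in `*got`,
 * and reports why the count was short, if it was. */
enum read_status checked_fread(void *buf, size_t want, FILE *fp, size_t *got)
{
    assert(buf != NULL && fp != NULL && got != NULL);
    *got = fread(buf, 1, want, fp);
    if (*got == want)
        return READ_COMPLETE;
    if (ferror(fp))
        return READ_SHORT_ERROR;
    if (feof(fp))
        return READ_SHORT_EOF;
    return READ_SHORT_UNKNOWN;
}
```

Printing both the returned count and the status next to the expected file size would show whether the runtime itself believes the read failed.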

      I tried the latest MinGW package from here: https://nuwen.net/mingw.html
      It behaves the same way as the GCC730 one:

      D:\Satanichi_aka_Nakamichi_2020-Jun-09>TIMER64 Satanichi_GCC920_64bit.exe "q" "q.Nakamichi" 20 888 i
      Allocating Source-Buffer 3,159 MB ...
      Allocating Target-Buffer 3,191 MB ...
      Read by fread() 0 bytes ...
      Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0xc1d4,3f7f
      Leprechaun: Size of input file: 3,313,087,324
      
      D:\Satanichi_aka_Nakamichi_2020-Jun-09>TIMER64 Satanichi_ICL150_64bit.exe "q" "q.Nakamichi" 20 888 i
      Allocating Source-Buffer 3,159 MB ...
      Allocating Target-Buffer 3,191 MB ...
      Read by fread() 3,313,087,324 bytes ...
      Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x81bd,fe4b
      Leprechaun: Size of input file: 3,313,087,324
      

      The files:

      D:\Satanichi_aka_Nakamichi_2020-Jun-09>dir
      06/11/2020  09:16 AM            55,397 lzsse2.cpp
      09/06/2019  11:50 AM       969,050,128 NCBI_FTP_Homo_sapiens_(human)_GCA_000001405.28_GRCh38.p13_genomic.fna.Nakamichi
      06/15/2019  02:37 AM     3,313,087,324 q
      09/13/2019  04:31 AM           400,076 Satanichi_GCC920_64bit.exe
      10/02/2020  05:22 AM           198,144 Satanichi_ICL150_64bit.exe
      

      Again, it works with a file smaller than 2GB (with lots of \r\n):

      D:\Satanichi_aka_Nakamichi_2020-Jun-09>TIMER64 Satanichi_ICL150_64bit.exe "lzsse2.cpp" "lzsse2.cpp.Nakamichi" 20 888 i
      Allocating Source-Buffer 0 MB ...
      Allocating Target-Buffer 32 MB ...
      Read by fread() 55,397 bytes ...
      Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x4d5d,b103
      Leprechaun: Size of input file: 55,397
      
      D:\Satanichi_aka_Nakamichi_2020-Jun-09>TIMER64 Satanichi_GCC920_64bit.exe "lzsse2.cpp" "lzsse2.cpp.Nakamichi" 20 888 i
      Allocating Source-Buffer 0 MB ...
      Allocating Target-Buffer 32 MB ...
      Read by fread() 55,397 bytes ...
      Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x4d5d,b103
      Leprechaun: Size of input file: 55,397
      

      I don't know who else to ask. Anyone?!

       
      • Kai Tietz

        Kai Tietz - 2020-10-02

        Hello,

        fread actually returns a size_t. So maximal size of read per attempt
        is on 32-bit 2^32-1, and on 64-bit 2^64-1 bytes.

        If you want to process bigger data, I would recommend operating
        in chunks here. The maximum number of bytes the streaming API can
        read depends on the open() variant used for the file descriptor.
        That is indeed a bit annoying, so we provide an alternative to
        fopen(): you could try replacing the fopen call with a fopen64()
        call. There is also the fsetpos64, fgetpos64, _fseeki64, _ftelli64,
        fseeko64, ftello64, ... API, which can be pretty helpful for getting
        proper support for files >=2GB on Windows.

        Hope this helps
        Kai
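
The "operate in chunks" advice can be sketched as a loop that never asks fread() for more than a fixed amount per call. This is only an illustration under assumptions: `read_in_chunks` is a made-up name and the chunk size is left to the caller; nothing here is taken from the mingw-w64 sources or the reporter's program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads up to `total` bytes from `fp` into `buf`,
 * requesting at most `chunk` bytes per fread() call.
 * Returns the number of bytes actually stored in `buf`. */
size_t read_in_chunks(FILE *fp, unsigned char *buf, size_t total, size_t chunk)
{
    assert(fp != NULL && buf != NULL && chunk > 0);
    size_t done = 0;
    while (done < total) {
        size_t want = total - done;
        if (want > chunk)
            want = chunk;
        size_t got = fread(buf + done, 1, want, fp);
        done += got;
        if (got < want)
            break; /* end of file or error; caller can check feof()/ferror() */
    }
    return done;
}
```

In a program like the one discussed here, the single large fread() call could be replaced by something along the lines of `read_in_chunks(fp, SourceBlock, SourceSize, (size_t)1 << 30)`, assuming `SourceBlock` is a byte buffer of at least `SourceSize` bytes.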

        • Sanmayce

          Sanmayce - 2020-10-03

          Thanks for the fopen64 workaround, but the issue remains unaddressed!

          So maximal size of read per attempt is on 32-bit 2^32-1, and on 64-bit 2^64-1 bytes

          Obviously not in practice, so to avoid problems the documentation should state this limitation.
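
The thread does not establish where the practical limit sits. One fact that can be checked is that the reported file size, 3,313,087,324 bytes, is larger than 2^31-1 but smaller than 2^32-1, so a signed 32-bit count or offset cannot hold it (and `long` is 32 bits wide on 64-bit Windows). Whether that is the actual cause is an assumption, not something confirmed above. The helper names below are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* File size taken from the directory listing in this report. */
#define REPORTED_FILE_SIZE UINT64_C(3313087324)

/* Compile-time checks of the two range facts stated in the lead-in. */
static_assert(REPORTED_FILE_SIZE > (uint64_t)INT32_MAX,
              "size exceeds the signed 32-bit range");
static_assert(REPORTED_FILE_SIZE <= (uint64_t)UINT32_MAX,
              "size fits in the unsigned 32-bit range");

/* Hypothetical helpers: each returns 1 if `n` is representable in the type. */
int fits_in_int32(uint64_t n)  { return n <= (uint64_t)INT32_MAX; }
int fits_in_uint32(uint64_t n) { return n <= (uint64_t)UINT32_MAX; }
int fits_in_long(uint64_t n)   { return n <= (uint64_t)LONG_MAX; }
```

`fits_in_long(REPORTED_FILE_SIZE)` depends on the platform: it is true where `long` is 64 bits wide and false where it is 32 bits wide, which is one reason the 64-bit variants such as _ftelli64 exist.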

           
