Menu

#212 autogen fails with -D_FORTIFY_SOURCE3

autogen
closed
nobody
None
1
2022-04-19
2022-04-01
No

Fails here:

[  259s] >stress-run_ag> MALLOC_CHECK_=3
[  259s] >stress-run_ag> /home/abuild/rpmbuild/BUILD/autogen-5.18.16/agen5/autogen --trace=every '--trace-out=>>stress-aglog-x-30963.log' -L/home/abuild/rpmbuild/BUILD/autogen-5.18.16/autoopts/tpl -b stress -
[  259s] *** buffer overflow detected ***: terminated
[  259s] ./defs: line 200: 31075 Aborted                 MALLOC_CHECK_=3 ${AGexe} ${opts} "$@"
[  259s] >stress-> test 134 -eq 0
[  259s] >stress-> failure autogen failed

fails due to defLoad.c:

    rem_sz = data_sz+4+sizeof(*base_ctx);
    base_ctx = (scan_ctx_t *)AGALOC(rem_sz, "file buf");
    memset(VOIDP(base_ctx), 0, rem_sz);
    base_ctx->scx_line = 1;
    rem_sz = data_sz;

    /*

     *  Our base context will have its currency pointer set to this
     *  input.  It is also a scanning pointer, but since this buffer
     *  is never deallocated, we do not have to remember the initial
     *  value.  (It may get reallocated here in this routine, tho...)
     */
    data =
        base_ctx->scx_scan =
        base_ctx->scx_data = (char *)(base_ctx + 1);
    base_ctx->scx_next     = NULL;

...    for (;;) {
        size_t rdct = fread(VOIDP(data), (size_t)1, rem_sz, fp);

as seen base_ctx is allocated with size rem_sz , but then fread is called for data (which is base_ctx + 1). Thus it's off by one error.

Discussion

  • Siddhesh Poyarekar

    That use definitely looks a bit odd; (char *)(base_ctx + 1) is actually not just 1 byte off, it's sizeof (scan_ctx_t) off, leading to fread going beyond bounds before sizeLeft is 0.

    As an aside, it seems like the code is looking to use scan_ctx_t with a flex array at the end, so it might make sense to simply add a char data[]; at the end of that struct.

     

    Last edit: Siddhesh Poyarekar 2022-04-01
  • Martin Liška

    Martin Liška - 2022-04-11

    Our analysis is not correct:

        rem_sz = data_sz+4+sizeof(*base_ctx);
        base_ctx = (scan_ctx_t *)AGALOC(rem_sz, "file buf");
        memset(VOIDP(base_ctx), 0, rem_sz);
        base_ctx->scx_line = 1;
        rem_sz = data_sz;
    

    Note rem_sz is assigned data_sz, then data = (base_ctx + 1) means that data has room for rem_sz + 4. Thus fread(VOIDP(data), (size_t)1, rem_sz, fp); should be fine. Let me isolate a test-case.

     
  • Martin Liška

    Martin Liška - 2022-04-11

    Please close this issue.

     
  • Bruce Korb

    Bruce Korb - 2022-04-11
    • status: open --> closed
     
  • Bruce Korb

    Bruce Korb - 2022-04-11

    This exposes a GCC bug

     
  • Bruce Korb

    Bruce Korb - 2022-04-11

    P.S. the GCC bug's cause might be the temporary use of rem_sz for the allocation size. In the GIT sources, I've changed those 3 lines to 5, but I've not compiled or tested yet.

        {
            size_t sz = data_sz+sizeof(long)+sizeof(*base_ctx);
            base_ctx = (scan_ctx_t *)AGALOC(sz, "file buf");
            memset(VOIDP(base_ctx), 0, sz);
        }
    
     

    Last edit: Bruce Korb 2022-04-11
  • Siddhesh Poyarekar

    Thanks to Martin's reduced reproducer in the gcc bug[1], I can now see what's going on. The trouble is that gcc is unable to associate the reallocation in:

    p = AGREALOC(VOIDP(base_ctx), dataSize + 4 + sizeof(*base_ctx),
                 "expand f buf");                                  
    

    with pzData when p == base_ctx because the pointer is not explicitly updated. This happens to work with the glibc malloc implementation but strictly speaking, the comparison if (p != base_ctx) { is invoking undefined behaviour as the C standard states that the pointer to an object (in this case, base_ctx) has an indeterminate value beyond its lifetime and hence using it (even its numeric value) may invoke undefined behaviour.

    The most correct fix for this should be this:

    diff --git a/agen5/defLoad.c b/agen5/defLoad.c
    index 594f609f..4fb7ff76 100644
    --- a/agen5/defLoad.c
    +++ b/agen5/defLoad.c
    @@ -527,16 +527,12 @@ read_defs(void)
                              "expand f buf");
    
                 /*
    
    -             *  The buffer may have moved.  Set the data pointer at an
    -             *  offset within the new buffer and make sure our base pointer
    -             *  has been corrected as well.
    +        *  Set the data pointer at an offset within the new buffer and
    +        *  make sure our base pointer has been corrected as well.
                  */
    -            if (p != base_ctx) {
    -                p->scx_scan = \
    -                    p->scx_data = (char *)(p + 1);
    -                pzData = p->scx_data + dataOff;
    -                base_ctx = p;
    -            }
    +       p->scx_scan = p->scx_data = (char *)(p + 1);
    +       pzData = p->scx_data + dataOff;
    +       base_ctx = p;
             }
         }
    

    since it avoids the undefined behaviour and reliably updates pzData and base_ctx but if this code is performance-sensitive, minimally updating base_ctx and pzData unconditionally should be sufficient. The undefined behaviour would continue to exist though and there's no guarantee that future changes to compilers won't stumble on this.

    All that said though, I know that this is a fairly common micro-optimization idiom to avoid updating pointers when the reallocated block hasn't moved. I'll probably try adding in some more diagnostics to gcc 13 to make at least a small subset of this kind of idiom work but I don't know how far I can take it.

    [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105217

     
  • Bruce Korb

    Bruce Korb - 2022-04-19

    If the putative undefined behavior actually becomes programmatically incorrect, then the compiler/OS implementation will be grossly incorrect. Meanwhile, this is only ever used if the defs are being read from a pipe or other source wherein you cannot determine its size. It should be pretty rare that it's ever used and when it's used, it will likely squander several microseconds. IOW, I'll make your suggested change. :) thank you.

     
  • Martin Liška

    Martin Liška - 2022-04-19

    Thanks for the applied commit.

    I think it contains a small typo:

    diff --git a/agen5/defLoad.c b/agen5/defLoad.c
    index 99258cd6..c3f13254 100644
    --- a/agen5/defLoad.c
    +++ b/agen5/defLoad.c
    @@ -470,7 +470,7 @@ read_defs(void)
          */
         {
             size_t sz = data_sz + sizeof(long) + sizeof(*base_ctx);
    
    -        base_ctx = (scan_ctx_t *)AGALOC(rsz, "file buf");
    +        base_ctx = (scan_ctx_t *)AGALOC(sz, "file buf");
             memset(VOIDP(base_ctx), 0, sz);
         }
         base_ctx->scx_line = 1;
    

    as reported by:

    [   15s] defLoad.c:468:41: error: 'rsz' undeclared (first use in this function); did you mean 'sz'?
    [   15s]   468 |         base_ctx = (scan_ctx_t *)AGALOC(rsz, "file buf");
    [   15s]       |                                         ^~~
    
     
  • Bruce Korb

    Bruce Korb - 2022-04-19

    Thank you. I can't get it to build myself. The underlying infrastructure has been "improved" to the point where I have to adapt to the changes. Fixed now.

     

Log in to post a comment.

MongoDB Logo MongoDB