gnuplot / Bugs / #2797 function block: SEGV occurs due to invalid context call in a function block

Ethan Merritt - 2025-05-15

Cases 1, 2, and 5 are dumb mistakes on my part. In each case the code sets the interlock flag inside_plot_command correctly. But it stupidly forgets to check whether it was already set, which was the whole point of the interlock. So as written it only protected in one direction; e.g. it would catch an attempt to plot inside set palette but didn't catch the converse set palette inside plot.

Case 3 - I'm still looking at this one.

Case 4
On your machine that segfaults from exit error, does it also segfault if that command is issued directly? I suspect that has nothing to do with function blocks, but rather a failure to handle a null error message string.

I don't get a segfault here, nor do I catch any problem in valgrind or gdb. However passing a NULL message string to vsnprintf() is very suspect. I'm not sure whether it is sufficient to check for a missing error message in exit_command() or whether it is better to worry about other call sites as well and put an explicit test for NULL in int_error().

Last edit: Ethan Merritt 2025-05-15

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Hiroki Motoyoshi - 2025-05-15
  
  Indeed, Case 4 caused a SEGV independently of the function block, simply due to exit error. It seems the issue is that a NULL value is being passed to "int_error(NO_CARET, try_to_get_string());". I think it would be better if "exit_command()" checked the return value of "try_to_get_string()" and for example, passed an empty string ("") if it is NULL. Also, since the second argument of int_error() is a format string, I believe it should be restricted to a string format such as "%s". Otherwise, unexpected behavior may occur if the message string contains format specifiers—for example, exit error "%p %g %i %lf".
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Ethan Merritt - 2025-05-15
    
    It is intentional that the message string can consist of a format specifier with passed paramenters . That's the point of using stdarg and vsnprintf. Here is an example from internal.c
    
    int_error(NO_CARET, "function %s requires %d variables", udf->udf_name, udf->dummy_num);
    
    On my machine the va_start() implementation seems to accept a NULL format with no complaint, but I don't see an explicit statement about that in the man page. So yeah, it's probably best to check in both the caller and in int_error.
    
    Last edit: Ethan Merritt 2025-05-16
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Hiroki Motoyoshi - 2025-05-15
      
      The warn command has also uncontrolled format string problem.
      For example, warn "%p %g %i %lf". This also comes from giving input from the user to the format string in "warn_command()" and should be corrected.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Ethan Merritt - 2025-05-16
        
        I think I understand your point now. The problem is not with int_error() or int_warn() per se, but the commands warn "message" and exit error "message" should act via int_error(caret, "%s", message) rather than int_error(caret, message). Right?
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Hiroki Motoyoshi - 2025-05-16
        
        You're right!
        
        There is another place where the format string for the standard C "*printf()" functions is directly specified by the user:
        
        set mouse mouseformat
        
        For example, on my macOS environment, the following script results in a "Bus error: 10". Since the implementation of "*printf()" may differ between operating systems or development environments, this may not reproduce on other systems.
        
        NARGS=10000 set term wxt fmt = ""; do for [k=1:NARGS] {fmt = fmt."%x" } set mouse mouseformat fmt plot 0 pause -1
        
        Letting the user provide raw format strings for "*printf()" introduces a potential vulnerability and should be addressed.
        
        In the case of set mouse mouseformat, I believe it would be safer — if possible — to internally utilize the f_sprintf() routine. This function is aware of the number of arguments and is more robust against malformed format strings, since the formatting process is kept under control.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Hiroki Motoyoshi - 2025-05-15

Here is the LLDB output from the PC (macOS on arm64) where a SEGV occurs in Case 3.

Note: On other PC (macOS on intel x86_64 built with MacPort library), case 3 doesn't cause SEGV.

case3_lldb_results.txt

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Hiroki Motoyoshi - 2025-05-16
  
  In conclusion, Case 3 is also a bug (accessing freed memory).
  
  This occurs because f(x) is redefined during its own evaluation, causing the original action table (AT) of f(x) to be deallocated via "free_at()".
  
  With typical "free(3)" behavior, the memory block is often left unchanged immediately after being freed. As a result, the evaluation of the original f(x)'s AT may continue without immediate issues, even though it has already been deallocated.
  
  The reason this bug becomes apparent on macOS is:
  
  The "free(3)" function fully zeroes many blocks immediately after in macOS 13.
  
  Here is the man page for "free(3)" on macOS 15.4.1:
  https://man.freebsd.org/cgi/man.cgi?query=malloc\&sektion=3\&manpath=macOS+15.4.1
  See the description of "MallocZeroOnFree" in the ENVIRONMENT section.
  
  You can force this bug to surface (even on platforms other than macOS) by modifying "real_free_at()" in "eval.c" as follows:
  
  diff --git a/src/eval.c b/src/eval.c index cf096447d..0f02641d5 100644 --- a/src/eval.c +++ b/src/eval.c @@ -848,6 +848,8 @@ real_free_at(struct at_type *at_ptr) for (i=0; i<at_ptr->a_count; i++) { struct at_entry *a = &(at_ptr->actions[i]); free_action_entry(a); + /* Let's try assigning 0 (PUSH) to a->index, assuming that free(at_ptr) will be called later. */ + a->index = 0; } free(at_ptr);
  
  Although direct function redefinition during evaluation is not normally possible, it becomes feasible when going through a function block. Some form of restriction is needed to prevent this.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Ethan Merritt - 2025-05-16
    
    The thing that is confusing me is that the plot does switch over to the new function for points with x>0 (on a system where it doesn't segfault of course). That means the program is calling the new function, not the old function, once it gets past the point where the substitution occurs. How does that happen? So far as I can see the plot structure stores the function pointer only once at the beginning of the plot (plot2d.c:3762). So it should be using that old function throughout, not the new one.
    
    /* Used to generate samples and to optimize watchpoints */ this_plot->plot_function.at = at_ptr;
    
    As to whether this really needs to be fixed, I am not sure. I don't think the user can reasonably expect it to be possible to redefine a function while inside evaluation of that same function. I do think it is useful and reasonable to define functions inside a function block, however, so I don't like the alternative of forbidding that altogether.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Hiroki Motoyoshi - 2025-05-16
      
      Case 3 may have caused some confusion as an example.
      I believe the following simpler reproduction script is easier to understand:
      
      (3')
      
      function $setup() <<end f(x) = 1 return 0 end f(x) = 1 + $setup() print f(1)
      
      I hardly imagine anyone else would write such a script, but it is a new behavior brought about by the introduction of the function block.
      
      It is not necessary to prohibit all function definitions inside a function block.
      What should be prohibited, I believe, is the redefinition of a function with the same name while that function is being executed.
      
      Last edit: Hiroki Motoyoshi 2025-05-16
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Ethan Merritt - 2025-05-16
        
        I can think of two conceptual approaches to deal with this if it really is necessary, but they are both non-trivial.
        
        option 1) Instead of really freeing the at_type action table structure of a function in routine free_at(), it would save the entire structure somewhere for actual deletion later. That would mean adding a bunch of bookkeeping routines to manage this collection of dead function structures, all for the extremely rare case of a redefinition that could lead to use-after-free.
        
        option 2) When evaluate_at() is called to execute a function it could walk recursively through the entire list of actions to be performed looking for CALL or CALLN. It could then inspect the argument to that call to find the function name. A model for this is the code in disp_at(). Then what? Set a flag (might have to add a new field somewhere) to show that function must not be redefined will the evaluation proceeds? Then somehow clear all those flags afterwards.
        
        Both options seem expensive. The cost would be incurred for every function evaluation while the benefit is seen only in handling the error path for something that can be documented as not permitted. Can you think of a simpler approach?
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Hiroki Motoyoshi - 2025-05-17
        
        Option 3)
        
        Although this would expand the scope of restriction, I would like to propose disallowing function definitions inside a function block if that block may be called from within a function.
        
        To implement this behavior, I suggest introducing two flags (tentative names):
        
        A flag in the fblock structure: called_from_function
        
        A global variable: prohibit_function_definition
        
        These names are provisional — please feel free to suggest better, more appropriate names.
        
        The processing logic using these flags would be as follows:
        
        During function definition
        If a function block is encountered, set the called_from_function flag of that function block.
        
        During function block evaluation
        If the called_from_function flag is set:
        
        Before evaluation, set the global variable prohibit_function_definition to true.
        
        After evaluation, restore prohibit_function_definition to its original state.
        
        When defining a function via define()
        If prohibit_function_definition is set, terminate with an error.
        
        The main concern is whether the global variable prohibit_function_definition can be reliably set and restored, especially in the presence of nested or recursive evaluations.
        
        This approach does not degrade the performance of normal function evaluation.
        
        As for the scope of restriction, I believe the impact is minimal, since I cannot think of any practical use case where defining a function inside a function block that might be called from within another function would be necessary.
        
        Whether it is acceptable that the prohibition rules within a function block change dynamically depending on how the block is used (i.e., whether it is called from a function).
        
        Last edit: Hiroki Motoyoshi 2025-05-17
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Ethan Merritt - 2025-05-17
        
        We crossed paths. I came up with a different option 3 and started coding it. I think it's simpler than yours although I haven't yet convinced myself it is 100% foolproof.
        
        add a field at->recursion_depth, initialized to zero when the function structure is allocated
        
        every time the function is invoked by execute_at increment the depth before stepping through the action table and decrement it afterwards.
        
        the only place that actually looks at this field is define. If there is already a function with that name and its depth counter is > 0, then issue an error rather than continuing with the definition.
        
        This works for the two sample cases you provided.
        It still needs a cleanup routine that resets the flag for all defined functions if there is an error, since the error might have occurred during function evaluation.
        edit: added the cleanup function
        
        Last edit: Ethan Merritt 2025-05-17
        
        0001-prevent-use-after-free-caused-by-redefining-an-activ.patch
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Hiroki Motoyoshi - 2025-05-17
        
        Thank you.
        Your approach is preferable because it uses only the function itself, independent of the function block.
        
        I’ve also found that redefining or undefining an array within a function block in the array initialization causes the segmentation error (accessing freed memory) with a similar logic. However, this might need to be reported as a separate ticket.
        
        Last edit: Hiroki Motoyoshi 2025-05-17
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ethan Merritt - 2025-06-04

Status: open --> closed-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

function block: SEGV occurs due to invalid context call in a function block

A portable, multi-platform, command-line driven graphing utility

Priority

Searches

Help

#2797 function block: SEGV occurs due to invalid context call in a function block

Discussion