Menu

#2802 Memory error in arrays passed as function parameters

closed-fixed
nobody
None
2025-06-04
2025-05-24
No

When an array is passed as a function parameter, it is passed by reference. If a global variable holding the array referenced by the function parameter is overwritten during the function evaluation, it results in the array pointer being freed, leading to an “accessing freed memory” error. This is a variation of Bug#2798, but since the fix requires a significantly different mechanism, I have created it as a separate issue. I believe this issue may be the last remaining case of memory errors caused by overwriting variables that store arrays.

The following script demonstrates this issue. The array stored in variable "A" is passed to the function parameter "x", and "A" is overwritten during the function's execution:

function $print(var) <<end
  print var
end

f(x) = (A=1, $print(x), x)
#        ^ array entity refered by dummy variable x is freed at here.

array A[10]
print f(A)

On my macOS environment, this script triggers a memory error such as a segmentation fault once every few times. There may be environments where this error doesn't occur visibly, making it a latent bug.

To prevent this issue, I have devised the following mechanism:

  1. Introduce "reference counting" for arrays. When an array is created, its reference count is initialized to 0.

  2. The reference count is incremented when an array is assigned to a function parameter, and decremented when the function scope is exited. This ensures that at the top level, all arrays maintain a reference count of 0.

  3. During function evaluation, if a variable that already holds an array is about to be overwritten, and the original array's reference count is greater than 0, an error should be raised and execution shoud be interrupted.

  4. If execution is interrupted by an error, all arrays within user-defined variables will have their reference counts reset to 0 as part of the cleanup process.

A trial patch implementing this mechanism is attached.

Key points of the patch:

  • The reference count was somewhat forcibly added to the struct value using a union mechanism. It’s worth discussing whether this approach is appropriate.

  • Since array overwrites can occur both inside functions and function blocks, reference count handling must be trapped in both contexts.

  • The error cleanup logic follows the same pattern as the function reference count handling.

I would be grateful if you could consider it.

As a supplementary note, once this mechanism is in place, it opens the possibility for enhancing function blocks to support pass-by-reference for arrays and for allowing the nested arrays.

1 Attachments

Discussion

  • Ethan Merritt

    Ethan Merritt - 2025-05-25

    Big holiday weekend here. I won't have time to seriously look at this until later this week. Only preliminary comments at this point.

    1) At first glance the change to f_calln() looks wrong. It loops over N dummy parameters checking each one to see if it is an array, but instead of incrementing the reference count for that array it always increments the count for dummy_values[0] rather than dummy_values[i]. Typo?

    2) I can reproduce the memory corruption from the sample script in the comment, but only when I use an executable built from the current tip of 6.1 (commit e251220c). I don't see any corruption when I run the script in gnuplot 6.0.2 nor when I run it against a 6.1 executable built prior to your recent changes (commit 2789beab). So I am wondering whether this indicates a problem with one of the earlier commits rather than requiring new code.

    3) This approach is similar to what I was working on in parallel to your earlier patches, except that I was trying to keep the reference count in the udvt_entry structure rather than in the array content. That added an extra level of reference chasing, but allowed the code in f_assign() and friends to get back to the current state of the array by name. I stopped working on that when the line-count of changes exceeded that of your already-working patches, but now I'm thinking that I should go back and see if we were touching all the same places in the original code.

     
    • Hiroki Motoyoshi

      Thank you for your reply.

      1)

      You're right, my apologies — they are typo.

      2)

      It's natural that the script in the original post does not trigger a memory error on versions prior to 6.0.2. This is because the issue in commit 4e3d9cc8 ("invalid stack underflow") causes evaluation to stop before reaching the problem described in this issue.
      Therefore, the original script only runs correctly starting from commit 4e3d9cc8 or later.

      Below is another version of the script that works even with versions prior to 6.0.2:

      function $setup() <<end
        A=1
      end
      
      function $print(var) <<end
        print var
      end
      
      f(x) = ($setup(), $print(x), x)
      
      array A[10]
      print f(A)
      

      3)

      I was hesitant to add a new member directly to udvt_entry. However, if that approach (udvt_entry) is acceptable, it would actually be preferable.

      Currently, the issue with passing by reference only arises with arrays, but there's a possibility that other types of reference-passed objects might be introduced in the future.
      Therefore, a more general approach (udvt_entry) would be desirable.

       

      Last edit: Hiroki Motoyoshi 2025-05-26
      • Ethan Merritt

        Ethan Merritt - 2025-05-30

        I have gone ahead and reverted the earlier series of changes that detected array corruption by comparing the address of the value array with a new set that uses an interlock field to prevent this kind of corruption from happening. Your fix for this bug (2802) then works after modifying it to use the reference count via
        array[0].array_header->udv_refcount.

        More description in the commit comments.

        I think it now works and covers all the cases described in bugs 2798 and 2802. If/when reference counting seems useful for variable types other than ARRAY, this should make it very easy to implement.

        But I am not comfortable taking this large a change back into the stable version immediately. So 6.0.3 will just come with a note that memory corruption is possible if a named array is overwritten in a user function or function block.

         
        • Hiroki Motoyoshi

          Thank you for the new fix.

          The new reference counting and interlocking using LOCK/UNLOCK look promising. I hadn't thought of the LOCK/UNLOCK approach myself. I think this mechanism could also be useful for enabling pass-by-reference for arrays in function block parameters.

          This may be unrelated to function parameters, so I’m not sure if this is the right place to bring it up, but ... let me just point out one pattern that your latest fix doesn't seem to cover:

          function $index() <<end
            A=1
            return 1
          end
          f(x) = (A[$index()]=-999)
          
          array A[3] = [1,2,3]
          
          print f(1)
          
           
          • Ethan Merritt

            Ethan Merritt - 2025-05-30

            Yeah, an array A is not protected during evaluation of the index i needed to retrieve element A[i] inside a function where A is a dummy argument. I actually did look at this but decided that it would require too much rearrangement of the code in
            parse_array_assignment_expression() to be worth it for such a weird case. But I've been through so many versions of the code that maybe I'm remembering analysis of an earlier version and it would be less painful now.

            I'll throw in another comment that isn't really relevant to these bugs. I think you mentioned somewhere earlier in the discussions that the restriction "no nested arrays" could be relaxed. That restriction was a design decision rather than an implementation issue. I really didn't want people to use nested arrays to dummy up multi-dimension matrices. It would be so tempting that the syntax would look just like C language arrays: A[i][j] and so on. And it would even work. But it would be horribly slow. If gnuplot ever needs 2D matrix operations I'd rather add them as a distinct variable type and write shims for accessing BLAS numerical routines.

             
            • Hiroki Motoyoshi

              I'll throw in another comment ....

              It mentioned the possibility of introducing reference counting, not as a Feature Request currently. I don’t intend to perform linear algebra or image processing using arrays in gnuplot either. Rather, I have in mind use cases where each element of an array represents structured records (e.g., [x, y], [time, value, label], etc.).

              That said, assigning one array to another would require new syntax, and using such arrays as input data for plot would also require changes to how column(N) is handled internally. For example,

                array A = [[1,5,10,3], [2,6,8,2], ....]  ### or some other convenient notation
                plot A using 1:2:4:3 with points pt variable lc variable 
              
               
  • Ethan Merritt

    Ethan Merritt - 2025-06-04
    • Status: open --> closed-fixed
     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.