
#595 Improvement of datablock size retrieval performance

None
closed-accepted
nobody
None
5
2026-01-12
2026-01-10
No

There is a performance issue in gnuplot's datablock implementation that becomes significant with large datablocks.

Datablocks are implemented as arrays of strings. The datablock structure lacks a field to store the array size, so every time the size needs to be determined, it must be retrieved by linearly searching for the NULL terminator and counting elements. This causes N array references just to obtain the size of an N-row datablock. While individual array references are lightweight, their accumulation significantly impacts performance.
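
The lookup described above can be sketched in C as follows. This is a minimal illustration that assumes a datablock is a NULL-terminated `char **`; the function name `count_rows_by_scan` is hypothetical and this is not gnuplot's actual source.

```c
#include <stddef.h>

/* Hypothetical illustration (not gnuplot code): count rows by walking
 * the array until the NULL terminator is found. Each call touches every
 * row once, so the cost is O(N) for an N-row datablock. */
static size_t count_rows_by_scan(char **rows)
{
    size_t n = 0;

    if (rows == NULL)
        return 0;
    while (rows[n] != NULL)
        n++;
    return n;
}
```

Because nothing remembers the result between calls, every caller that needs the size pays for the full scan again.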

The performance problem manifests when writing data to datablocks using set table $datablock or set print $datablock. The performance degradation is negligible for small datablocks but becomes severe with datablocks containing tens of thousands of rows.

Here is a sample script demonstrating the issue:

# Sample size (500*500 = 250,000 data points)
n = 500
set samples n
set isosamples n

print "Generate sample data to TEMPORARY FILE"
  t1 = time(0.0)
  set table "temp.dat"
  splot "++" using 1:2:($1+$2)
  unset table
  t2 = time(0.0)
print strftime("Done.  (%.3Ss)", t2-t1)

print "Generate sample data to DATABLOCK"
  t1 = time(0.0)
  set table $data
  replot
  unset table
  t2 = time(0.0)
print strftime("Done.  (%.3Ss)", t2-t1)

Execution results on a machine with a fast SSD:

Generate sample data to TEMPORARY FILE
Done.  (00.313s)
Generate sample data to DATABLOCK
Done.  (09.510s)

Writing to the datablock is about 30 times slower than writing to the file (9.510 s versus 0.313 s). As n increases, the gap widens, because writing an N-row datablock requires N calls to datablock_size(), resulting in O(N²) array references. This issue also affects wrapper libraries in other programming languages that use datablocks to pass data to gnuplot.
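
To put a number on the quadratic growth, here is a small C model of the scan cost. It assumes one size scan per appended row, with each scan reading every existing row plus the terminator; `scan_references` is a hypothetical helper, not a gnuplot function.

```c
#include <stddef.h>

/* Hypothetical cost model (not gnuplot code): total array references
 * spent on size scans while appending n rows one at a time.
 * The scan before the k-th append (k = 0 .. n-1) reads k existing rows
 * plus the NULL terminator, i.e. k + 1 references. */
static unsigned long long scan_references(unsigned long long n)
{
    unsigned long long total = 0;

    for (unsigned long long k = 0; k < n; k++)
        total += k + 1;
    return total;   /* closed form: n * (n + 1) / 2 */
}
```

Under this model, the 250,000 data points of the sample script (ignoring the blank separator lines that splot also writes) cost 250,000 × 250,001 / 2 = 31,250,125,000 array references for size lookups alone.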

One solution to this problem is to add a single integer field to the datablock structure to store the row count. Fortunately, the union in struct value appears to have room to accommodate the datablock size. This would make row count retrieval O(1) and improve N-row datablock construction from O(N²) to O(N).
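
A rough C sketch of the idea, for illustration only. The type `counted_block` and its functions are hypothetical and do not reflect gnuplot's actual struct value layout. The `capacity` field and the doubling growth policy are extra assumptions beyond the single count field proposed above, added so that the sketch is self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string array; not gnuplot's actual layout. */
typedef struct counted_block {
    size_t count;     /* number of rows currently stored */
    size_t capacity;  /* allocated slots, including the NULL terminator */
    char **rows;      /* still NULL-terminated for code that scans */
} counted_block;

/* Append a copy of `line`. Returns 0 on success, -1 if allocation fails.
 * The row count is read from the struct, so no terminator scan is needed. */
static int counted_block_append(counted_block *b, const char *line)
{
    size_t len;
    char *copy;

    if (b->count + 2 > b->capacity) {          /* new row + terminator */
        size_t new_cap = b->capacity ? b->capacity * 2 : 16;
        char **p = realloc(b->rows, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        b->rows = p;
        b->capacity = new_cap;
    }
    len = strlen(line) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, line, len);
    b->rows[b->count++] = copy;
    b->rows[b->count] = NULL;                  /* keep the terminator */
    return 0;
}

/* Release all rows and reset the block to its empty state. */
static void counted_block_free(counted_block *b)
{
    for (size_t i = 0; i < b->count; i++)
        free(b->rows[i]);
    free(b->rows);
    b->rows = NULL;
    b->count = 0;
    b->capacity = 0;
}
```

A block starts as `counted_block b = {0, 0, NULL};`. Reading `b.count` is O(1), and keeping the NULL terminator means code that still scans the array continues to work.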

Would you please consider this optimization?

Discussion

  • Ethan Merritt

    Ethan Merritt - 2026-01-11

    I had to wrap the character array in a new structure

       typedef struct data_array {
           struct array_header header;
           char **data;
       } data_array;
    

    Commit 42849f98
    It was not too messy, although it introduces additional steps in initializing the data block. I think I found every place that needs modification, but I almost missed a spot in the "test palette" code, so I worry a bit that I might also have missed some other non-obvious case.

    Timing is now equivalent for your test cases:

    [~/git/gnuplot/src] ./gnuplot ~/temp/slow.gp
    Generate sample data to TEMPORARY FILE
    Done. (00.648s)
    Generate sample data to DATABLOCK
    Done. (00.643s)

     
    • Ethan Merritt

      Ethan Merritt - 2026-01-11

      Sorry, that should have been commit 7f0dfb24

       
  • Hiroki Motoyoshi

    Thank you for accepting the feature request and implementing the fix!
    I really appreciate it. This will make working with large datablocks much more practical.

     
  • Ethan Merritt

    Ethan Merritt - 2026-01-12
    • status: open --> closed-accepted
    • Group: -->
     
