A portable, multi-platform, command-line driven graphing utility

#595 Improvement of datablock size retrieval performance

Milestone: None

Status: closed-accepted

Owner: nobody

Labels: None

Priority: 5

Updated: 2026-01-12

Created: 2026-01-10

Creator: Hiroki Motoyoshi

Private: No

There is a performance issue in gnuplot's datablock implementation that becomes significant with large datablocks.

Datablocks are implemented as NULL-terminated arrays of strings. The datablock structure has no field that stores the array size, so every time the size is needed it is recomputed by scanning linearly for the NULL terminator and counting elements. Obtaining the size of an N-row datablock therefore costs N array references. Each reference is cheap on its own, but the accumulated cost significantly impacts performance.
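To make the cost concrete, here is a minimal C sketch of such a size scan. It is not gnuplot's actual code: the function name count_rows is hypothetical, and the only thing assumed about the data is that it is a NULL-terminated array of strings, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from gnuplot): count the rows of a
 * NULL-terminated array of strings by walking it to the terminator. */
size_t count_rows(char **rows)
{
    size_t n = 0;

    assert(rows != NULL);
    /* One array reference per stored row, plus one for the terminator. */
    while (rows[n] != NULL)
        n++;
    return n;
}
```

Every call repeats the full walk, so the cost of a single size query grows linearly with the number of rows.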

The performance problem manifests when writing data to datablocks using set table $datablock or set print $datablock. The performance degradation is negligible for small datablocks but becomes severe with datablocks containing tens of thousands of rows.

Here is a sample script demonstrating the issue:

# Sample size (500*500 = 250,000 data points)
n = 500
set samples n
set isosamples n

print "Generate sample data to TEMPORARY FILE"
  t1 = time(0.0)
  set table "temp.dat"
  splot "++" using 1:2:($1+$2)
  unset table
  t2 = time(0.0)
print strftime("Done.  (%.3Ss)", t2-t1)

print "Generate sample data to DATABLOCK"
  t1 = time(0.0)
  set table $data
  replot
  unset table
  t2 = time(0.0)
print strftime("Done.  (%.3Ss)", t2-t1)

Execution results on a machine with a fast SSD:

Generate sample data to TEMPORARY FILE
Done.  (00.313s)
Generate sample data to DATABLOCK
Done.  (09.510s)

In this test, writing to the datablock is about 30 times slower than writing to a file (9.510 s versus 0.313 s). The gap widens as n increases, because writing an N-row datablock requires N calls to datablock_size(), resulting in O(N²) array references. The issue also affects wrapper libraries in other programming languages that use datablocks to pass data from a program to gnuplot.
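The quadratic growth can be checked with a small C simulation. This is a sketch under stated assumptions, not gnuplot's code: the function name scan_cost_of_appends is hypothetical, and it assumes that each append first rescans the array for its NULL terminator. It returns how many array references are spent in those scans alone.

```c
#include <assert.h>
#include <stdlib.h>

static char dummy_row[] = "0 0 0";   /* placeholder row content */

/* Hypothetical simulation (not gnuplot code): append n_rows rows to a
 * NULL-terminated array, rescanning for the terminator before each
 * append, and return the number of array references spent scanning. */
unsigned long long scan_cost_of_appends(size_t n_rows)
{
    char **rows = malloc((n_rows + 1) * sizeof *rows);
    unsigned long long refs = 0;

    assert(rows != NULL);
    rows[0] = NULL;                  /* start with an empty block */

    for (size_t i = 0; i < n_rows; i++) {
        size_t n = 0;
        for (;;) {
            refs++;                  /* count the read of rows[n] */
            if (rows[n] == NULL)
                break;
            n++;
        }
        rows[n] = dummy_row;         /* append the new row */
        rows[n + 1] = NULL;          /* restore the terminator */
    }

    free(rows);
    return refs;
}
```

The k-th append scans k entries (k-1 rows plus the terminator), so the total is N(N+1)/2. For a datablock of 250,000 rows that is roughly 3.1 × 10^10 array references spent only on finding the end of the array.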

One solution to this problem is to add a single integer field to the datablock structure to store the row count. Fortunately, the union in struct value appears to have room to accommodate the datablock size. This would make row count retrieval O(1) and improve N-row datablock construction from O(N²) to O(N).
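A minimal C sketch of this idea follows. The structure and function names (counted_block, counted_block_append, counted_block_size) are hypothetical and do not correspond to gnuplot's real struct value. The capacity field and the doubling growth policy are additional assumptions made only to keep the example self-contained; the proposal itself asks only for the row count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout for illustration; gnuplot's real structures differ. */
typedef struct counted_block {
    size_t count;      /* number of rows currently stored */
    size_t capacity;   /* allocated slots, including the terminator slot */
    char **rows;       /* kept NULL-terminated for code that still scans */
} counted_block;

/* Append a row pointer (the string is not copied).
 * Returns 0 on success, -1 if memory allocation fails. */
int counted_block_append(counted_block *b, char *row)
{
    assert(b != NULL);
    /* Need room for the new row and the NULL terminator. */
    if (b->count + 2 > b->capacity) {
        size_t new_cap = b->capacity ? 2 * b->capacity : 8;
        char **grown = realloc(b->rows, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        b->rows = grown;
        b->capacity = new_cap;
    }
    b->rows[b->count++] = row;       /* no scan: the end is already known */
    b->rows[b->count] = NULL;
    return 0;
}

/* O(1) size query: read the stored count instead of scanning. */
size_t counted_block_size(const counted_block *b)
{
    return b->count;
}
```

Starting from a zero-initialized counted_block, each append touches a constant number of array slots, so building an N-row block costs O(N) array references in total.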

Would you please consider this optimization?

Discussion

Ethan Merritt - 2026-01-11

I had to wrap the character array in a new structure:

typedef struct data_array {
    struct array_header header;
    char **data;
} data_array;

Commit 42849f98

It was not too messy, although it introduces additional steps in initializing the datablock. I think I found every place that needs modification, but I almost missed a spot in the "test palette" code, so I worry a bit that I might also have missed some other non-obvious case.

Timing is now equivalent for your test cases:

[~/git/gnuplot/src] ./gnuplot ~/temp/slow.gp
Generate sample data to TEMPORARY FILE
Done. (00.648s)
Generate sample data to DATABLOCK
Done. (00.643s)

Ethan Merritt - 2026-01-11

Sorry, that should have been commit 7f0dfb24.

Hiroki Motoyoshi - 2026-01-12

Thank you for accepting the feature request and implementing the fix!
I really appreciate it. This will make working with large datablocks much more practical.


Ethan Merritt - 2026-01-12

status: open --> closed-accepted

Group: -->
