gnuplot / Patches / #615 New command "merge"

A portable, multi-platform, command-line driven graphing utility

#615 New command "merge"

Milestone: Version 5

Status: closed

Owner: Ethan Merritt

Labels: None

Priority: 5

Updated: 2023-10-02

Created: 2013-03-28

Creator: Ethan Merritt

Private: No

Gnuplot can easily plot data from multiple files in the same plot.
plot "A.dat", "B.dat"
It can also combine values from different columns of a single input file.
plot "A.dat" using 1:($2+$3)
A frequently asked question is "How can I combine values from columns in multiple input files?" The answer is that you can't; you must instead process the data outside gnuplot so that all the needed values are in the same file.

The new "merge" command overcomes this long-standing limitation. It allows you to merge coordinate values from component graphs in the previous plot command into a single data block. Here is the documentation entry for this first draft of a patch implementing the merge command:

Syntax:
merge $DATABLOCK {format "format-specifier"}
+
The merge command extracts certain values from the preceding plot command
and stores them in a named data block. X coordinate values are extracted from
those used by the first graph described by the plot command. If subsequent
graphs in the plot command do not share these same X coordinate values, it is
reported as an error. Y coordinate values are extracted from each graph in the
plot command. The resulting named data block holds a row of column headers
taken from the graph titles, followed by one row for each X coordinate value.
The data block holds one column of X coordinate values followed by one column
of the corresponding Y coordinate values for each graph in the plot.
+
Coordinate values are written using format "%10g" by default. A different
format specifier may be provided in the merge command.
+
Example:
plot 'file1.dat' using 1:2 title "A", \
'file2.dat' using 1:2:3 with labels title "B", \
'file3.dat' using 1:4:5 with points pointsize variable title "C"
merge $ABC
... other commands ...
set key autotitle columnhead
plot for [yval=2:4] $ABC using 1:yval
set style data yerrorlines
plot $ABC using 1:(column("A")):(abs(column("B")-column("A")))
+
The second plot, using values stored in the data block named $ABC, will contain
the three graphs reproducing the same y coordinates displayed in the original
plot command. Note that other aspects of the original plot (individual point
labels in the second graph, point sizes in the third graph) are not maintained.
+
The third plot illustrates combining information drawn from two different files
within a single graph, i.e. column 2 from file1.dat and column 2 from file2.dat.
Note that this would not have been possible without a merge operation.
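To make the rules in the documentation entry above concrete, here is a minimal Python sketch of how a merged data block could be assembled from points the plot command has already computed. It is not the patch's C implementation. The function name `merge_graphs`, the (title, points) input shape, and the `"x"` header placed over the X column are assumptions for illustration; some header is needed there so that `column("A")` and friends line up with the right columns.

```python
def merge_graphs(graphs, fmt="%10g"):
    """Assemble the text of a merged data block (illustrative sketch only).

    graphs: list of (title, points) pairs in plot order, where points is a
            list of (x, y) tuples already computed by the plot command.
    fmt:    printf-style format used for every coordinate value.
    """
    if not graphs:
        raise ValueError("nothing to merge")

    # X values are taken from the first graph only.
    xs = [x for x, _ in graphs[0][1]]

    # Every later graph must use exactly the same X values, in the same order.
    for title, points in graphs[1:]:
        if [x for x, _ in points] != xs:
            raise ValueError(f"graph {title!r} does not share the X values of the first graph")

    # Header row: a hypothetical "x" label for the X column, then one title per graph.
    lines = [" ".join(['"x"'] + [f'"{title}"' for title, _ in graphs])]

    # One row per X value: X, then the Y value of each graph at that X.
    for i, x in enumerate(xs):
        lines.append(" ".join([fmt % x] + [fmt % points[i][1] for _, points in graphs]))
    return "\n".join(lines)


# Hypothetical points standing in for graphs A, B and C of the example.
block = merge_graphs([
    ("A", [(1, 2.0), (2, 3.0)]),
    ("B", [(1, 5.0), (2, 1.0)]),
    ("C", [(1, 0.5), (2, 0.25)]),
])
print(block)

# Reading the block back: y = A with an error of |B - A|, i.e. values that
# originally came from two different files, as in the third plot.
rows = [list(map(float, line.split())) for line in block.splitlines()[1:]]
errorbars = [(x, a, abs(b - a)) for x, a, b, _ in rows]
```

The X-value check is deliberately strict, matching the draft's rule that mismatched X values are an error; the unresolved issues listed below are about relaxing exactly this step.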

==========

Unresolved issues (please comment):

Currently the same x-values must appear in all of the merged plots.
It seems desirable to allow missing data points. How?

If the preceding plot contained both data and function graphs, the merge
command will probably complain about mismatched x values. This is an
example of the above issue. Should it simply ignore functions?

Currently the merge command ignores xrange. Should it?

Is there a better name for the new command? I considered
"fill", "keep", "store", "extract", and "combine", among other possibilities.

The interactive terminals now allow you to toggle individual graphs on/off.
Should a subsequent "merge" command only merge data from graphs that have
not been toggled off?

My own likely use of this command would require tracking error values associated
with each Y value. Since this first version of the patch only tracks a single
value per point, producing a merged data block containing both the Y values and
the error values is somewhat cumbersome. Should the basic command have an
option to merge 2 data values from each existing graph?

1 Attachment

merge_datablock_27mar2013.patch

Discussion

Christoph Bersch - 2013-08-02

Hi,

I had a look at this new merge command, not at the concrete implementation but rather at the concept.

I would have expected a command that actually merges two files before any plotting, akin to the paste command-line tool.

That would be a bit like the stats command, which retrieves information about the data (e.g. ranges, number of records) for use in plotting. Before stats existed, this was only possible by first plotting, e.g. to a table or to /dev/null, and then using the gnuplot-internal variables in the next, real plot.

Maybe the syntax could be something like

merge $db 'file1' using 1:3 'file2' using 2:4
plot $db using 1:($2+$3+$4)

That could also be combined with reading datablocks from a file to extend the volatile keyword; see e.g. bug #1233.

Or what is the reason that merge must be preceded by a plot command?
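The proposal above behaves like the Unix `paste` tool applied to selected columns. Here is a minimal Python sketch of that idea, assuming the files have already been read as whitespace-separated numbers. The names `parse_table` and `paste_columns` are hypothetical, and stopping at the shortest file is a simplification (`paste` itself keeps going and leaves the missing fields empty).

```python
def parse_table(text):
    """Parse whitespace-separated numeric text, skipping blank and '#' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rows.append([float(field) for field in line.split()])
    return rows


def paste_columns(tables, specs):
    """Combine the selected columns of several tables row by row.

    tables: parsed tables (lists of rows).
    specs:  1-based column tuples, one per table, mirroring `using 1:3`.
    Rows are paired by position; the result stops at the shortest table.
    """
    merged = []
    for rows in zip(*tables):
        out = []
        for row, cols in zip(rows, specs):
            out.extend(row[c - 1] for c in cols)
        merged.append(out)
    return merged


# Hypothetical contents of 'file1' and 'file2'.
file1 = parse_table("1 10 100\n2 20 200\n")
file2 = parse_table("# header comment\n0 5 0 7\n0 6 0 8\n")

db = paste_columns([file1, file2], [(1, 3), (2, 4)])  # like: 'file1' using 1:3 'file2' using 2:4
ysum = [(r[0], r[1] + r[2] + r[3]) for r in db]       # like: using 1:($2+$3+$4)
```

Pairing rows by position needs no plot beforehand, which is the point of the question; the trade-off is that nothing checks whether the paired rows actually share an x value.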

Karl Ratzsch - 2013-10-24

I'd recommend not using "plot" to load the datasets, but instead implementing a "load" command that creates a matrix variable containing the data, e.g.

load filename1 datamatrix1 using 1:2
load filename2 datamatrix2 using 1::2

The matrix variables could contain any number of columns. The merge command then takes each line from the first matrix and checks whether the value in the selected column also appears in the other matrix. If so, the respective lines are combined into a third matrix:

merge datamatrix1[0] AND datamatrix2[0] to andmatrix[0]

(OR/XOR/NOT could then also be implemented.)

If no columns are specified with "merge", then only line numbers are compared. This would resolve most of the issues mentioned. The drawback is that only two datasets can be combined at a time.

Of course, this is a slippery slope toward implementing a new MATLAB/Octave. ;-)
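The AND merge described above is essentially an inner join on a key column, and the variant without columns pairs rows by line number. Here is a Python sketch under stated assumptions: matrices are lists of rows, keys are compared with exact equality (real measurement data would probably need a tolerance), the first matching row of the second matrix is used, and `merge_and` / `merge_by_line` are hypothetical names.

```python
def merge_and(m1, m2, key1=0, key2=0):
    """Rows of m1 whose key also occurs in m2, each joined with the matching m2 row."""
    first_match = {}
    for row in m2:
        first_match.setdefault(row[key2], row)  # keep the first row seen for each key
    return [row + first_match[row[key1]] for row in m1 if row[key1] in first_match]


def merge_by_line(m1, m2):
    """No key columns given: combine rows purely by line number."""
    return [r1 + r2 for r1, r2 in zip(m1, m2)]


m1 = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
m2 = [[2.0, 200.0], [3.0, 300.0], [4.0, 400.0]]
andmatrix = merge_and(m1, m2)  # like: merge datamatrix1[0] AND datamatrix2[0] to andmatrix[0]
```

An OR merge would additionally keep the unmatched rows of both inputs, padded with missing values; that is also one possible answer to the "allow missing data points" question in the original post.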


magiccpp - 2015-04-18

Currently the same x-values must appear in all of the merged plots.
It seems desirable to allow missing data points. How?

See https://github.com/magiccpp1/merge_curves/wiki

You could use interpolation to calculate the missing values, e.g.:
file dataset1:
1.2 5.0
2.3 3.5
3.8 3.0

file dataset2:
0.8 2.0
1.5 3.0
2.5 4.0

output (columns: x, value interpolated from dataset1, sum of the values interpolated from both datasets):
0.8 0.0 2.0 # contribution from 1st array is 0, it is out of x range, from 2nd is 2.0
1.2 5.0 7.57 #5.0 + 2.0 + (1.2-0.8) * (3.0-2.0)/(1.5-0.8) contribution from 1st array is 5.0, from 2nd is 2.57
1.5 4.6 7.6 #3.0 + 5.0 + (1.5 - 1.2) * (3.5 - 5.0) / (2.3 - 1.2) contribution from 1st array is 4.6, 2nd is 3.0
2.3 ...
2.5 ...
3.8 3.0 3.0 #contribution from 2nd array is 0, it is out of x range.
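The interpolation scheme in this comment can be sketched in Python as follows. Assumptions: each dataset is sorted by x, linear interpolation is used between neighboring points, and a dataset contributes 0 outside its own x range, as in the example rows. `interp_or_zero` and `merge_curves` are hypothetical names, not functions from the linked repository.

```python
from bisect import bisect_left


def interp_or_zero(points, x):
    """Linearly interpolate y at x from sorted (x, y) points; 0.0 outside their x range."""
    xs = [p[0] for p in points]
    if x < xs[0] or x > xs[-1]:
        return 0.0
    i = bisect_left(xs, x)           # first index with xs[i] >= x
    if xs[i] == x:
        return points[i][1]          # exact hit, no interpolation needed
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def merge_curves(curves):
    """For every x in the union of all datasets: per-dataset values and their sum."""
    all_x = sorted({x for pts in curves for x, _ in pts})
    rows = []
    for x in all_x:
        values = [interp_or_zero(pts, x) for pts in curves]
        rows.append((x, values, sum(values)))
    return rows


dataset1 = [(1.2, 5.0), (2.3, 3.5), (3.8, 3.0)]
dataset2 = [(0.8, 2.0), (1.5, 3.0), (2.5, 4.0)]
rows = merge_curves([dataset1, dataset2])
```

Using the union of x values means no point from either dataset is dropped; for the two example datasets the sums at x = 1.2 and x = 1.5 come out at about 7.571 and 7.591. Treating out-of-range contributions as 0, rather than extrapolating, follows the example above.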

Last edit: magiccpp 2015-04-19


Ethan Merritt - 2023-10-02

status: open --> closed

Group: --> Version 5
