#615 New command "merge"

open
None
5
2015-04-18
2013-03-28
No

Gnuplot can easily plot data from multiple files in the same plot.
plot "A.dat", "B.dat"
It can also combine values from different columns of a single input file.
plot "A.dat" using 1:($2+$3)
A frequently asked question is "how can I combine values from columns in multiple input files?". The answer is that you can't; you must instead process the data outside gnuplot so that all the needed values are in the same file.

The new "merge" command overcomes this long-standing limitation. It allows you to merge coordinate values from component graphs in the previous plot command into a single data block. Here is the documentation entry for this first draft of a patch implementing the merge command:

  Syntax:
      merge $DATABLOCK {format "format-specifier"}

  The merge command extracts certain values from the preceding plot command
  and stores them in a named data block. X coordinate values are taken from
  those used by the first graph described by the plot command. If subsequent
  graphs in the plot command do not share these same X coordinate values, an
  error is reported. Y coordinate values are extracted from each graph in the
  plot command. The resulting named data block holds a row of column headers
  taken from the graph titles, followed by one row for each X coordinate value.
  Each data row holds the X coordinate value followed by one column
  of the corresponding Y coordinate value for each graph in the plot.

  Coordinate values are written using format "%10g" by default. A different
  format specifier may be provided in the merge command.

  Example:
      plot 'file1.dat' using 1:2 title "A", \
           'file2.dat' using 1:2:3 with labels title "B", \
           'file3.dat' using 1:4:5 with points pointsize variable title "C"
      merge $ABC
      ... other commands ...
      set key autotitle columnhead
      plot for [yval=2:4] $ABC using 1:yval
      set style data yerrorlines
      plot $ABC using 1:(column("A")):(abs(column("B")-column("A")))

  The second plot, using values stored in the data block named $ABC, will contain
  three graphs reproducing the same y coordinates displayed in the original
  plot command. Note that other aspects of the original plot (individual point
  labels in the second graph, point sizes in the third graph) are not maintained.

  The third plot illustrates combining information drawn from two different files
  within a single graph, i.e. column 2 from file1.dat and column 2 from file2.dat.
  Note that this would not have been possible without a merge operation.
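
The merge semantics described in the documentation entry can be sketched outside gnuplot. The following Python is only an illustration of the described behaviour, not the patch itself: the function name merge_graphs, the (title, xs, ys) input shape, and the "x" header for the first column are all assumptions made for the example.

```python
def merge_graphs(graphs, fmt="%10g"):
    """Sketch of the draft merge semantics (hypothetical helper, not gnuplot code).

    graphs: list of (title, xs, ys) tuples in plot order.
    Returns a list of text rows: one header row, then one row per x value.
    """
    if not graphs:
        raise ValueError("nothing to merge")

    first_xs = list(graphs[0][1])
    for title, xs, ys in graphs:
        # Every graph must share the x values of the first graph.
        if list(xs) != first_xs:
            raise ValueError(f"graph {title!r} does not share the x values of the first graph")
        if len(ys) != len(first_xs):
            raise ValueError(f"graph {title!r} has mismatched x and y lengths")

    # Header row: column titles come from the graph titles.
    # The "x" label for the first column is an assumption of this sketch.
    rows = [" ".join(["x"] + [title for title, _, _ in graphs])]

    # One row per x value: x followed by one y value per graph.
    for i, x in enumerate(first_xs):
        values = [x] + [ys[i] for _, _, ys in graphs]
        rows.append(" ".join(fmt % v for v in values))
    return rows


if __name__ == "__main__":
    block = merge_graphs([
        ("A", [1, 2, 3], [2.0, 4.0, 6.0]),
        ("B", [1, 2, 3], [2.5, 3.5, 7.0]),
    ])
    print("\n".join(block))
```

Raising an error on mismatched x values mirrors the first unresolved issue listed below; a different policy (skipping or padding missing points) would replace that check.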

==========

Unresolved issues (please comment):

  • Currently the same x-values must appear in all of the merged plots.
    It seems desirable to allow missing data points. How?

  • If the preceding plot contained both data and function graphs, the merge
    command will probably complain about mismatched x values. This is an
    example of the above issue. Should it simply ignore functions?

  • Currently the merge command ignores xrange. Should it?

  • Is there a better name for the new command? I considered
    "fill", "keep", "store", "extract", and "combine", among other possibilities.

  • The interactive terminals now allow you to toggle individual graphs on/off.
    Should a subsequent "merge" command only merge data from graphs that have
    not been toggled off?

  • My own likely use of this option would require tracking error values associated
    with each Y value. Since this first version of the patch only tracks a single
    value per point, producing a merged data block containing both the Y values and
    the error values is somewhat cumbersome. Should the basic command have an
    option to merge 2 data values from each existing graph?

1 Attachments

Discussion

  • Christoph Bersch

    Hi,

    I had a look at this new merge command, considering the concept rather than the concrete implementation.

    I would have expected a command that actually merges two files before any plotting, akin to the paste command-line tool.

    That would be a bit like the stats command, which retrieves information about the data (e.g. ranges, number of records) for use in plotting. Before stats existed, this was possible only by first plotting, e.g. to a table or to /dev/null, and then using the gnuplot-internal variables in the next, real plot.

    Maybe the syntax could be something like

    merge $db 'file1' using 1:3 'file2' using 2:4
    plot $db using 1:($2+$3+$4)
    

    That may also be combined with reading datablocks from a file to extend the volatile keyword, see e.g. bug #1233.

    Or what is the reason that merge must be preceded by a plot command?
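
The paste-like merge suggested above can be sketched as follows. This is a rough Python illustration, not gnuplot syntax: the names read_columns and paste_merge are made up for the example, column numbers are 1-based as in a gnuplot "using" spec, and rows are paired purely by position, as paste does.

```python
def read_columns(lines, columns):
    """Pick the given 1-based columns from whitespace-separated data lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        rows.append([float(fields[c - 1]) for c in columns])
    return rows


def paste_merge(sources):
    """Combine the selected columns of several sources side by side.

    sources: list of (lines, columns) pairs, where lines is any iterable of
    strings (an open file or a list). Rows are matched by position; the
    result stops at the shortest source.
    """
    tables = [read_columns(lines, cols) for lines, cols in sources]
    return [[v for part in parts for v in part] for parts in zip(*tables)]


if __name__ == "__main__":
    file1 = ["1 10 100", "2 20 200"]        # stand-in for 'file1'
    file2 = ["5 6 7 8", "9 10 11 12"]       # stand-in for 'file2'
    db = paste_merge([(file1, [1, 3]), (file2, [2, 4])])
    for row in db:
        # Equivalent of "using 1:($2+$3+$4)" on the merged block.
        print(row[0], row[1] + row[2] + row[3])
```

Because rows are paired by line position only, this variant never checks that the x values of the two files agree.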

     
  • Karl Ratzsch

    Karl Ratzsch - 2013-10-24

    I'd recommend not using "plot" to load the datasets, but instead implementing a "load" command that creates a matrix variable containing the data, e.g.

    load filename1 datamatrix1 using 1:2
    load filename2 datamatrix2 using 1::2

    The matrix variables could contain any number of columns. The merge command then takes each line from matrix one, and checks if the value in the selected column appears in the other matrix. If so, the respective lines are combined in a third matrix:

    merge datamatrix1[0] AND datamatrix2[0] to andmatrix[0]

    (OR/XOR/NOT could then also be implemented.)

    If no columns are specified with "merge", then only line numbers are compared. This would resolve most of the issues mentioned. The drawback is that only two datasets can be combined at a time.
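
The AND merge described above amounts to an inner join on the selected columns. Here is a minimal Python sketch under some assumptions: merge_and is a hypothetical name, matrices are lists of rows, combined rows are simple concatenations, keys are compared for exact equality, and the first matching row in the second matrix wins.

```python
def merge_and(m1, m2, key1=None, key2=None):
    """Combine rows of m1 and m2 whose key columns hold the same value.

    m1, m2: lists of rows (each row a list of numbers).
    key1, key2: 0-based key column in m1 and m2. If either is None,
    rows are paired by line number instead.
    """
    if key1 is None or key2 is None:
        # No columns given: compare line numbers only.
        return [r1 + r2 for r1, r2 in zip(m1, m2)]

    # Index the second matrix by key value; the first occurrence is kept.
    index = {}
    for row in m2:
        index.setdefault(row[key2], row)

    # Keep only rows of m1 whose key also appears in m2.
    return [r1 + index[r1[key1]] for r1 in m1 if r1[key1] in index]


if __name__ == "__main__":
    datamatrix1 = [[1, 10], [2, 20], [3, 30]]
    datamatrix2 = [[2, 200], [3, 300], [4, 400]]
    print(merge_and(datamatrix1, datamatrix2, 0, 0))
    print(merge_and(datamatrix1, datamatrix2))
```

Exact equality on floating-point keys is fragile; a real implementation would probably need a tolerance.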

    Of course, this is a slippery slope toward implementing a new MATLAB/Octave. ;-)

     
  • magiccpp

    magiccpp - 2015-04-18

    Currently the same x-values must appear in all of the merged plots.
    It seems desirable to allow missing data points. How?

    See https://github.com/magiccpp1/merge_curves/wiki

    You could use interpolation to calculate the missing values, e.g.
    file dataset1:
    1.2 5.0
    2.3 3.5
    3.8 3.0

    file dataset2:
    0.8 2.0
    1.5 3.0
    2.5 4.0

    output:
    0.8 0.0 2.0   # 1st array contributes 0 (0.8 is outside its x range); 2nd contributes 2.0
    1.2 5.0 7.57  # 1st contributes 5.0; 2nd contributes 2.0 + (1.2-0.8) * (3.0-2.0)/(1.5-0.8) = 2.57; sum is 7.57
    1.5 4.6 7.6   # 1st contributes 5.0 + (1.5-1.2) * (3.5-5.0)/(2.3-1.2) = 4.6; 2nd contributes 3.0; sum is 7.6
    2.3 ...
    2.5 ...
    3.8 3.0 0.0   # 1st contributes 3.0; 2nd contributes 0 (3.8 is outside its x range)
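
The interpolation idea can be sketched in Python as below. This is an illustration written for this thread and not the code of the linked tool: the names interp_or_zero and merge_curves are hypothetical, x values are assumed to be strictly increasing, and a curve contributes 0 outside its own x range, as in the example above. Each row lists x, the contribution of each curve, and their sum.

```python
def interp_or_zero(xs, ys, x):
    """Linearly interpolate y at x; return 0.0 outside the range of xs.

    Assumes xs is strictly increasing.
    """
    if x < xs[0] or x > xs[-1]:
        return 0.0
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (x - x0) * (ys[i + 1] - ys[i]) / (x1 - x0)
    return ys[-1]  # only reached for a single-point curve with x == xs[0]


def merge_curves(curves):
    """Evaluate every curve on the union of all x values.

    curves: list of (xs, ys) pairs.
    Returns rows of (x, contribution of each curve..., sum of contributions).
    """
    grid = sorted({x for xs, _ in curves for x in xs})
    rows = []
    for x in grid:
        parts = [interp_or_zero(xs, ys, x) for xs, ys in curves]
        rows.append((x, *parts, sum(parts)))
    return rows


if __name__ == "__main__":
    dataset1 = ([1.2, 2.3, 3.8], [5.0, 3.5, 3.0])
    dataset2 = ([0.8, 1.5, 2.5], [2.0, 3.0, 4.0])
    for row in merge_curves([dataset1, dataset2]):
        print(" ".join("%.2f" % v for v in row))
```

Keeping the per-curve contributions as separate columns matches the one-column-per-graph layout of the proposed data block; the sum column is only there to relate the output to the example above.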

     
    Last edit: magiccpp 2015-04-19
