Feature request:
Sometimes it might be necessary to apply operations to two datasets which have the same or a similar range but do not have the same x-values or x-value spacing, for example, adding, subtracting, multiplying or dividing two spectra.
You would have to resample the data:
Of course you can do this with external tools and you can do this even with a cumbersome gnuplot workaround (see https://stackoverflow.com/q/54362441).
However, since there are already the options for smooth: csplines, acsplines, mcsplines and bezier, would it be eventually possible (without too much effort) to add another option linear?
Thanks for consideration.
Code: (just for illustration of the existing smooth options)
### feature request: linear interpolation/resampling
reset session

$Data <<EOD
1 1
2 7
3 5
10 3
EOD

# resample each existing smooth option into its own datablock
set samples 21
set table $csplines
plot $Data u 1:2 w l smooth csplines
set table $acsplines
plot $Data u 1:2 w l smooth acsplines
set table $mcsplines
plot $Data u 1:2 w l smooth mcsplines
set table $bezier
plot $Data u 1:2 w l smooth bezier
unset table

# compare the original data with the resampled curves
plot $Data u 1:2 w lp lw 3 lc 0 pt 7 ti "Data", \
     $bezier u 1:2 w lp pt 7 ti "bezier", \
     $csplines u 1:2 w lp pt 7 ti "csplines", \
     $mcsplines u 1:2 w lp pt 7 ti "mcsplines", \
     $acsplines u 1:2 w lp pt 7 ti "acsplines"
### end of code
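For readers who want to see what the requested resampling amounts to, here is a minimal sketch in Python (not gnuplot, and not the code from the patch discussed below). It resamples two datasets with different x-values onto one equally spaced grid and subtracts them row by row. The function names and the second dataset are made up for illustration; the first dataset reuses $Data from the script above.

```python
from bisect import bisect_right

def interp_linear(xs, ys, x):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the data range")
    # Index of the segment [xs[i], xs[i+1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def resample(xs, ys, x_min, x_max, n):
    """Return n (n >= 2) equally spaced samples of (xs, ys) on [x_min, x_max]."""
    step = (x_max - x_min) / (n - 1)
    grid = [x_min + k * step for k in range(n)]
    grid[-1] = x_max  # guard against floating-point drift at the upper end
    return grid, [interp_linear(xs, ys, x) for x in grid]

# First dataset: $Data from the script above. Second dataset: made up.
ax, ay = [1.0, 2.0, 3.0, 10.0], [1.0, 7.0, 5.0, 3.0]
bx, by = [1.0, 4.0, 10.0], [0.0, 3.0, 9.0]

# Same range and same number of samples, so the rows line up.
grid, a = resample(ax, ay, 1.0, 10.0, 10)
_, b = resample(bx, by, 1.0, 10.0, 10)
diff = [ya - yb for ya, yb in zip(a, b)]  # row-by-row subtraction
```

Because both outputs share the grid, any of the four arithmetic operations is a plain element-wise loop. Whether that is a valid operation for a given kind of spectrum is a separate question.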
I think this Feature Request is reasonable. I would like to have this feature, so I implemented it.
A patch file is attached.
Any comments are welcome.
About 'smooth linear' filter
I imagine 'smooth linear' is rarely used for smoothing during drawing. Instead, as the title of this Feature Request suggests, it will most often be used for data resampling. Therefore, I did not position 'smooth linear' as a derivative of the existing spline interpolations, and I deliberately implemented it so that it is easy to use as a resampling tool.
Use cases, not necessarily limited to linear interpolation, include
Input data
The data in the first column (x-axis) must increase or decrease monotonically. During interpolation, if non-monotonic data is detected, the warning message "Non-monotonic x data was found in 'smooth linear'" is produced. In that case, the data processing will continue, but the output will not be as expected.
If "filledcurves between" is selected as the plotting style, the data in the third column will be interpolated as well as the second column (See 'example4.gp').
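The monotonicity requirement can be sketched as follows. This is Python for illustration only, not the check in the patch. In particular, it assumes "monotonic" means strictly increasing or strictly decreasing; how the patch treats repeated x-values is not stated here. The function names are hypothetical.

```python
import warnings

def is_monotonic(xs):
    """True if xs is strictly increasing or strictly decreasing (assumption)."""
    pairs = list(zip(xs, xs[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

def check_x(xs):
    """Warn, but do not abort, when the x column is not monotonic."""
    if not is_monotonic(xs):
        # Message text taken from the description above.
        warnings.warn("Non-monotonic x data was found in 'smooth linear'")
        return False
    return True
```

Returning a flag instead of raising mirrors the described behaviour: processing continues, and it is up to the caller to decide whether the output can still be trusted.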
Specifying the sampling range
The 'smooth linear' filter samples a given range of data at equal intervals. The sampling range can be specified according to the following rules,
This rule is different from the behavior of the other spline interpolations. From the document (help smooth),
This behavior is convenient for drawing, but not useful as a tool for resampling. If 'smooth linear' followed this rule, I would not use it. Here is why I would like to have such a different rule: for any input data, output resampled with the same sampling range and the same number of samples will always contain the same number of rows and can be compared row by row (See 'example1.gp').
Please check the following script to see how it works.
Handling of NaN and blank lines in data.
If you want to fill missing values (NaN) with linear interpolation, use "set datafile missing NaN" (See 'example5.gp').
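As an illustration of what "filling missing values with linear interpolation" means, here is a small Python sketch. It is not the patch code and says nothing about how gnuplot handles "set datafile missing NaN" internally. The function name is hypothetical, and leaving leading or trailing NaNs untouched is an assumption of this sketch.

```python
import math

def fill_missing(xs, ys):
    """Replace NaN entries of ys by linear interpolation between the
    nearest valid neighbours. Leading/trailing NaNs are left as NaN."""
    valid = [i for i, y in enumerate(ys) if not math.isnan(y)]
    out = list(ys)
    # Walk over each pair of consecutive valid points and fill the gap.
    for lo, hi in zip(valid, valid[1:]):
        for i in range(lo + 1, hi):
            t = (xs[i] - xs[lo]) / (xs[hi] - xs[lo])
            out[i] = ys[lo] + t * (ys[hi] - ys[lo])
    return out

nan = float("nan")
filled = fill_missing([0.0, 1.0, 2.0, 4.0], [0.0, nan, nan, 8.0])
```

Note that the filled values use the actual x-positions, so unevenly spaced gaps are interpolated correctly rather than by row index.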
Abbreviation
I boldly abbreviated "smooth linear" to "smooth l". If that is acceptable, please leave it as it is.
Sample scripts
The following sample scripts are also attached.
example1.gp : Comparison of series with different sampling intervals
example2.gp : Fill in some intervals with filledcurves
example3.gp : Generate data for yticlabel() instead of 'set link' if the inverse function is not analytical
example4.gp : Smooth path on 'with lines'
example5.gp : Fill in missing values
Revised patch file (v2) with the following changes
I am dubious about the mathematical validity of using linear sampling to achieve the stated aim of the request: "adding, subtracting, multiplying or dividing two spectra".
This is pushing the boundary of my area of expertise, but so far as I know the proper way to do this is via convolution. This requires calculating the Fourier transform of each spectrum and then operating in the dual space. Now it is true that calculating the transform using an FFT of uniformly sampled data makes this easy, but if your data is not uniformly sampled then linear interpolation onto a fixed grid is not a good way to proceed. This is where I hit the limit of my own knowledge of best practices, but I refer you to this related question: https://scicomp.stackexchange.com/q/593/36096
I believe that gnuplot could be used to implement one of the methods referred to there, but I would expect it to be easier to use a more specialized signal processing package instead.
Now it may be that resampling by linear interpolation does have valid uses, but before looking at any code it would help me a lot if someone could suggest pointers to reference material, textbooks, tutorials, journal articles, wiki pages, whatever, that document what problems it would properly be applied to. Ideally there would be real-world test cases that any new code could be run against to validate the implementation.
Thank you for your comment.
When I read the original post, the feature I wanted was not the ability of arithmetic computation between two spectra, but simply linear resampling. I may have posted this in the wrong place. I did not want to discuss in depth with you about the arithmetic computation between two spectra.
As for linear resampling, it is the most basic interpolation of observed data. To begin with, drawing discrete data with lines is itself linear interpolation. My 'sample3.gp' is a realistic example of what I consider the most important use of this feature. The idea is to get an inverted grid of monotonic observed data. I prepared pseudo data for this example due to licensing issues, but the process would be the same with real data.
Last edit: Hiroki Motoyoshi 2023-10-01
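The "inverted grid" idea can be sketched like this in Python (illustration only, not the attached script). Given observed pairs with a monotonic relationship, it looks up where round values of one quantity fall on the axis of the other, which is the kind of table one could feed to tick labels. The altitude and pressure numbers below are made up, as are all names.

```python
def interp(xs, ys, x):
    """Linear interpolation; xs must be strictly increasing."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x is outside the data range")

# Made-up observations: altitude rises while pressure falls monotonically.
altitude_m = [0.0, 1000.0, 3000.0, 5500.0]
pressure_hpa = [1000.0, 900.0, 700.0, 500.0]

# Pressure at round altitudes: tick positions on a pressure axis,
# labelled with the corresponding altitude.
ticks = [(h, interp(altitude_m, pressure_hpa, h)) for h in range(0, 6000, 1000)]
```

No analytical inverse function is needed; the observed pairs themselves define the mapping, and the tick positions are only as accurate as linear interpolation between neighbouring observations.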
My starting point is that gnuplot should help you to visualize your data and to present it clearly to others. From that perspective it is important to show the actual data points; I do not like the idea of resampling, especially when there are only a few data points.
Example 1:
I would prefer to plot it this way:
Example 2:
There are several ways to plot this. One convenient one is to use your recent hsteps style:
Example 3:
If the y1 and y2 scales are not analytically related then the plot is improper; a plotted point cannot be correctly placed on both axes simultaneously. A different representation is needed to present such data. Here is one possibility, using the same data. Since it is hard to fit much information in densely spaced labels, this might be a case where hypertext labels would be preferred.
I do not understand the intent of example 4. Example 5's "automatically filling in missing data" is exactly what I feel gnuplot should not do. I think this is an example of why resampling is not a good idea.
Thank you for your comments.
Does this mean that this is a general opinion, not limited to 'smooth linear'?
I understand that if the data is unequally spaced or the sampling is out of phase with the original equally spaced data, the "smooth cspline" will not accurately indicate the data points.
In meteorology, there are situations where data with different sampling rates need to be integrated or unevenly spaced data is compared on a fixed grid. For the preliminary analysis in such cases, the emphasis in the analysis is often on capturing differences, trends, periods, and rates of change from the figures rather than the numerical precision of interpolation. Resampling (interpolation) is employed in such scenarios, and linear interpolation has become one of the convenient tools for resampling. I believe it is an important role of gnuplot to support such analysis.
Thank you. It certainly drew beautifully. I would not have thought of this method.
I see your point, and I slightly differ in my perspective regarding the various uses of gnuplot. gnuplot can serve not only for creating graphs for publication but also for generating numerous quick visualizations and conducting preliminary analyses.
To give some background on sample3: the data it deals with are meteorological observations of the upper atmosphere by radiosondes. Both air pressure and altitude are observed values, and theoretically they have a monotonic relationship. The vertical profile of the air temperature is the main objective of sample3, and the vertical axis may be either the air pressure or the altitude. Even if it is a linearly interpolated value, the altitude is displayed because we want to know it as a reference value. I don't think such a display method can be called IMPROPER (linear interpolation is often used in my field).
Also, you might think you could do the same thing with 'smooth cspline', but it doesn't work in practice due to implementation issues.
My explanation is insufficient. This example emulates a smooth path with linear interpolation, not cubic spline interpolation.
This behavior is not exclusive to "smooth linear"; we can replace the "smooth linear" part with "smooth cspline" and observe the same data filling phenomenon which gnuplot should not do. Note that in example5, data filling occurs only if set datafile missing NaN is set.
Anyway, as to whether resampling, if helpful, should be done outside of gnuplot, I can only say that I wish gnuplot had such a feature. If the idea of resampling within gnuplot is acceptable, I think linear interpolation, which is not smooth but does not cause surprises (overshoot), is a reliable and important tool as a first step in analysis.
Since the discussion has become more a matter of philosophy than implementation, I think it would be better to bring this up on the mailing list rather than continue the comments here. I don't want to rule out a feature that might in fact help many people, but on the other hand I don't want to make it easy to do something that is poor practice when it is already possible, even if more complicated, to do something else that would be better.
You are working with data that is far outside my area of expertise, so I am largely unfamiliar with both the needs and the standard practices for visualization and analysis.
For what it's worth, I think "smooth cspline" is also inappropriate as an extrapolation method. Even for interpolation it can exhibit severe overshoot, which is why I thought it was important to add "smooth mcs" (splines with monotonic constraints) as an alternative.
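The overshoot point can be checked numerically with a short Python sketch. It builds a textbook natural cubic spline (second derivative zero at both ends) through the four points of $Data from the first post and compares it with linear interpolation. This is an assumption-laden illustration: it is not claimed to reproduce gnuplot's csplines output exactly, and all function names are made up.

```python
def natural_spline_m(xs, ys):
    """Second derivatives of the natural cubic spline (needs >= 3 points)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the interior knots 1 .. n-2.
    a = [h[i - 1] for i in range(1, n - 1)]                 # sub-diagonal
    b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]  # diagonal
    c = [h[i] for i in range(1, n - 1)]                     # super-diagonal
    d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
         for i in range(1, n - 1)]
    m = len(b)
    for k in range(1, m):                  # Thomas algorithm: elimination
        w = a[k] / b[k - 1]
        b[k] -= w * c[k - 1]
        d[k] -= w * d[k - 1]
    sol = [0.0] * m
    sol[-1] = d[-1] / b[-1]
    for k in range(m - 2, -1, -1):         # back substitution
        sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
    return [0.0] + sol + [0.0]             # natural boundary conditions

def segment(xs, x):
    """Index i of the interval [xs[i], xs[i+1]] containing x."""
    i = 0
    while i < len(xs) - 2 and x > xs[i + 1]:
        i += 1
    return i

def spline_eval(xs, ys, m, x):
    i = segment(xs, x)
    h = xs[i + 1] - xs[i]
    A, B = (xs[i + 1] - x) / h, (x - xs[i]) / h
    return (A * ys[i] + B * ys[i + 1]
            + ((A**3 - A) * m[i] + (B**3 - B) * m[i + 1]) * h * h / 6.0)

def linear_eval(xs, ys, x):
    i = segment(xs, x)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

xs, ys = [1.0, 2.0, 3.0, 10.0], [1.0, 7.0, 5.0, 3.0]   # $Data from the first post
m = natural_spline_m(xs, ys)
grid = [1.0 + k / 10.0 for k in range(91)]              # 1.0, 1.1, ..., 10.0
spline_vals = [spline_eval(xs, ys, m, x) for x in grid]
linear_vals = [linear_eval(xs, ys, x) for x in grid]
```

On this data the spline leaves the range of the input y-values (1 to 7) in both directions, while the linear samples stay inside it by construction. That is the "no surprises" property mentioned earlier in the thread, and the motivation given above for monotonic splines.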
I think that is a good idea. If I could get the original poster's opinion, there may be a use for it that I am unaware of. Also, it may be a feature that most people don't need.
The 'mcspline' story was very informative. I would like to use it in the right situation.
Thank you again.
Thank you, Motoyoshi-san, for your effort.
The feature request was simply about the ability to perform mathematical operations between two datasets/datablocks which don't have identical x-values.
I know, I could basically achieve this task with any programming language, but starting from a datablock, going via an external tool and possible external files sounded cumbersome to me if, instead, there could be a simple gnuplot option (e.g. smooth linear).
Actually, from gnuplot 5.4.0 on (June 2020), I can get the job done by "seriously misusing" smooth zsort.
See https://stackoverflow.com/a/77674192/7295599