gnuplot / Feature Requests / #518 Linear resampling of data?

Hiroki Motoyoshi - 2023-09-30

I think this Feature Request is reasonable. I would like to have this feature, so I implemented it.
A patch file is attached.
Any comments are welcome.

About 'smooth linear' filter

I imagine 'smooth linear' will rarely be used for smoothing during drawing. Instead, as the title of this Feature Request suggests, it will most often be used for data resampling. Therefore, I did not treat 'smooth linear' as a variant of the existing spline interpolations; I deliberately implemented it so that it is easy to use as a resampling tool.

Use cases, not necessarily limited to linear interpolation, include

Data resampling

Comparison of series with different sampling intervals

Fill in some intervals with filledcurves

Finding the inverse function of monotonic data

Generate data for yticlabel() instead of 'set link' if the inverse function is not analytical

Smooth path on 'with lines'

Fill in missing values
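
As an illustration of the inverse-function use case in the list above, here is a minimal Python sketch (not part of the patch, and not gnuplot code). It swaps the roles of the two columns and interpolates linearly. The function name `inverse_lookup` and the pressure/height numbers are made up for the example.

```python
def inverse_lookup(xs, ys, y):
    """Return x where the piecewise-linear curve through (xs, ys) equals y.

    Assumes ys is strictly monotonic, so the inverse is single-valued.
    """
    pairs = sorted(zip(ys, xs))  # treat the y column as the new abscissa
    for (y0, x0), (y1, x1) in zip(pairs, pairs[1:]):
        if y0 <= y <= y1:
            return x0 + (y - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("y is outside the range covered by the data")

# Hypothetical pseudo data: pressure (hPa) observed at given heights (m).
pressure_hpa = [1000.0, 925.0, 850.0, 700.0]
height_m = [100.0, 750.0, 1450.0, 3000.0]

# Pressure at a height of 1100 m, by linear interpolation of the inverse.
p_at_1100 = inverse_lookup(pressure_hpa, height_m, 1100.0)
```

This only works because the data are monotonic; otherwise the swapped columns would not define a function.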

Input data

The data in the first column (x-axis) must increase or decrease monotonically. During interpolation, if non-monotonic data is detected, the warning message "Non-monotonic x data was found in 'smooth linear'" is produced. In that case, data processing continues, but the output will not be as expected.
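
The monotonicity requirement can be pictured with a small Python sketch. This is not the check used in the patch; the helper name `check_x` is hypothetical, and treating equal neighbouring x values as non-monotonic is an assumption.

```python
def check_x(xs):
    """Return a warning string if xs is neither strictly increasing nor
    strictly decreasing, else None. Strictness is an assumption here."""
    steps = list(zip(xs, xs[1:]))
    increasing = all(a < b for a, b in steps)
    decreasing = all(a > b for a, b in steps)
    if increasing or decreasing:
        return None
    return "Non-monotonic x data was found in 'smooth linear'"
```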

If "filledcurves between" is selected as the plotting style, the data in the third column will be interpolated as well as the second column (See 'example4.gp').

Specifying the sampling range

The 'smooth linear' filter samples a given range of data at equal intervals. The sampling range is determined by the following rules:

If a range is explicitly specified in the plot, it is used.

If a range is explicitly specified by 'set xrange', it is used.

If auto-scaling is set for the x-axis, the x-range of the data itself is used.

If the data range is smaller than the specified range, the outside of the data range is padded with NaN.
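
The rules above can be sketched in Python as follows. This is only an illustration of the described behaviour, not the C code in the patch. The names `choose_range` and `resample_linear` are hypothetical, and the sketch assumes increasing x values only.

```python
import math
from bisect import bisect_right

def choose_range(xs, plot_range=None, set_range=None):
    """Plot range first, then 'set xrange', then the extent of the data."""
    if plot_range is not None:
        return plot_range
    if set_range is not None:
        return set_range
    return (min(xs), max(xs))

def resample_linear(xs, ys, lo, hi, n):
    """Sample at n (>= 2) equally spaced x in [lo, hi]; NaN outside the data."""
    out = []
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        if x < xs[0] or x > xs[-1]:
            out.append((x, math.nan))  # pad outside the data range
            continue
        k = min(bisect_right(xs, x), len(xs) - 1)  # index of right neighbour
        t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
        out.append((x, ys[k - 1] + t * (ys[k] - ys[k - 1])))
    return out

# Same data as the test script in this post, with xrange [0:10] and 21 samples.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0, 1, 5, 5, 4, 7, 8]
lo, hi = choose_range(xs, set_range=(0, 10))
rows = resample_linear(xs, ys, lo, hi, 21)
```

Because the output always has `n` rows on the same grid, two resampled series can be compared row by row regardless of how far each input extends.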

This rule is different from the behavior of the other spline interpolations. From the documentation (help smooth):

> If autoscale is not in effect, and a spline curve is being generated, sampling of the spline fit is done across the intersection of the x range covered by the input data and the fixed abscissa range defined by set xrange.

This behavior is convenient for drawing, but not useful for resampling. If 'smooth linear' followed this rule, I would not use it. Here is why I would like to have a different rule: for any input data, output resampled with the same sampling range and the same number of samples will always contain the same number of rows and can be compared row by row (See 'example1.gp').

Please check the following script to see how it works.

    $data <<EOD
    0 0
    1 1
    2 5
    3 5
    4 4
    5 7
    6 8
    EOD

    reset
    print "Ex.1) set xrange [0:10]"
    set xrange [0:10]
    set sample 21
    set table
    plot $data smooth linear
    unset table

    reset
    print "Ex.2) plot [2:5] ..."
    set sample 7
    set table
    plot [2:5] $data smooth linear
    unset table

    reset
    print "Ex.3) auto scaling"
    set xrange [*:*]
    set sample 13
    set table
    plot $data smooth linear
    unset table

Handling of NaN and blank lines in data.

If there is a blank line, the points between the points before and after the blank line are padded with NaN.

If the y-value of input data contains NaN, the interval's interpolated value on both sides will be padded with NaN.

If you want to fill missing values (NaN) with linear interpolation, use "set datafile missing NaN" (See 'example5.gp').
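
The NaN handling described above can be pictured with the following Python sketch. It is a simplified model rather than the patch's implementation: `interp_at` and `drop_missing` are hypothetical names, and the result for a sample that lands exactly on a data point next to a NaN is an assumption.

```python
import math

def interp_at(points, x):
    """Linear interpolation at x over (x, y) points sorted by x.

    Returns NaN if either end of the bracketing interval has a NaN y,
    or if x lies outside the data. The first bracketing interval wins.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            if math.isnan(y0) or math.isnan(y1):
                return math.nan
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    return math.nan

def drop_missing(points):
    """Model of treating NaN as missing: such rows are skipped entirely."""
    return [(x, y) for x, y in points if not math.isnan(y)]

pts = [(0, 0.0), (1, 2.0), (2, math.nan), (3, 6.0), (4, 8.0)]

left_gap = interp_at(pts, 1.5)             # NaN: interval touches the NaN point
right_gap = interp_at(pts, 2.5)            # NaN: interval touches the NaN point
filled = interp_at(drop_missing(pts), 2)   # interpolated across the gap
```

With the NaN row skipped, the neighbours (1, 2) and (3, 6) are joined directly, so the missing value at x = 2 is filled with 4.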

Abbreviation

I boldly abbreviated "smooth linear" to "smooth l". If that is acceptable, please leave it as it is.

Sample scripts

The following sample scripts are also attached.

example1.gp : Comparison of series with different sampling intervals
example2.gp : Fill in some intervals with filledcurves
example3.gp : Generate data for yticlabel() instead of 'set link' if the inverse function is not analytical
example4.gp : Smooth path on 'with lines'
example5.gp : Fill in missing values

example1.gp

example2.gp

example3.gp

example4.gp

example5.gp

gnuplot_smooth_linear.patch
- Hiroki Motoyoshi - 2023-09-30
  
  Revised patch file (v2) with the following changes
  
  Placed the code in 'filters.{c,h}' instead of 'interpol.{c,h}'
  
  Use 'cp_extend(plot,0)' instead of 'free(plot->points)'
  
  gnuplot_smooth_linear_v2.patch
  

Ethan Merritt - 2023-10-01

I am dubious about the mathematical validity of using linear sampling to achieve the stated aim of the request: "adding, subtracting, multiplying or dividing two spectra".

This is pushing the boundary of my area of expertise, but so far as I know the proper way to do this is via convolution. This requires calculating the Fourier transform of each spectrum and then operating in the dual space. Now it is true that calculating the transform using an FFT of uniformly sampled data makes this easy, but if your data is not uniformly sampled then linear interpolation onto a fixed grid is not a good way to proceed. This is where I hit the limit of my own knowledge of best practices, but I refer you to this related question: https://scicomp.stackexchange.com/q/593/36096

I believe that gnuplot could be used to implement one of the methods referred to there, but I would expect it to be easier to use a more specialized signal processing package instead.

Now it may be that resampling by linear interpolation does have valid uses, but before looking at any code it would help me a lot if someone could suggest pointers to reference material, textbooks, tutorials, journal articles, wiki pages, whatever, that document what problems it would properly be applied to. Ideally there would be real-world test cases that any new code could be run against to validate the implementation.

- Hiroki Motoyoshi - 2023-10-01
  
  Thank you for your comment.
  
  When I read the original post, the feature I wanted was not the ability to do arithmetic between two spectra, but simply linear resampling. I may have posted this in the wrong place. I did not intend to discuss arithmetic between two spectra in depth.
  
  As for linear resampling, it is the most basic interpolation of observed data. To begin with, drawing discrete data with lines is itself linear interpolation. My 'example3.gp' is a realistic example of what I consider most important in this feature. The idea is to get an inverted grid of monotonic observed data. I prepared pseudo data for this example due to licensing issues, but the process would be the same with real data.
  
  Last edit: Hiroki Motoyoshi 2023-10-01
  

Ethan Merritt - 2023-10-17

My starting point is that gnuplot should help you to visualize your data and to present it clearly to others. From that perspective it is important to show the actual data points; I do not like the idea of resampling, especially when there are only a few data points.

Example 1:
I would prefer to plot it this way:

    #
    # Alternative method using current gnuplot
    #
    set xrange [0:10]
    set style line 101 lc "black" pt 7
    plot $data1 with filledcurves above y=0, \
         $data2 with filledcurves above y=0 fc bgnd,\
         $data1 with lp ls 101, $data2 with lp ls 101

Example 2:
There are several ways to plot this. One convenient one is to use your recent hsteps style:

    set style fill solid
    set key left reverse
    array bars = [2.0, 5.0]
    plot bars using 2:(99):(1) with hsteps pillars noautoscale notitle, \
         $data using 1:2 with filledcurve x2 fc bgnd notitle, \
         $data using 1:2 with lp pt 7 lc "black" title "bars masked by $data"

Example 3:
If the y1 and y2 scales are not analytically related then the plot is improper; a plotted point cannot be correctly placed on both axes simultaneously. A different representation is needed to present such data. Here is one possibility, using the same data. Since it is hard to fit much information in densely spaced labels, this might be a case where hypertext labels would be preferred.

    set xlabel "Pressure (hPa)"
    set ylabel "Height (m)"
    set yrange [250:450]
    set key samplen 0 left Left reverse
    plot $data using 1:3:(sprintf("%d",int($3))) with labels \
         font ",10" rotate by 45 point pt 7 left \
         title "Temperature (K)"

I do not understand the intent of example 4. Example 5's "automatically filling in missing data" is exactly what I feel gnuplot should not do. I think this is an example of why resampling is not a good idea.

example1.alt

example1.png

example2.alt

example2.png

example3.alt

example3.png
- Hiroki Motoyoshi - 2023-10-18
  
  Thank you for your comments.
  
  > My starting point is that gnuplot should help you to visualize your data and to present it clearly to others. From that perspective it is important to show the actual data points; I do not like the idea of resampling, especially when there are only a few data points.
  
  Does this mean that this is a general opinion, not limited to 'smooth linear'?
  I understand that if the data is unequally spaced or the sampling is out of phase with the original equally spaced data, the "smooth cspline" will not accurately indicate the data points.
  
  In meteorology, there are situations where data with different sampling rates need to be integrated or unevenly spaced data is compared on a fixed grid. For the preliminary analysis in such cases, the emphasis in the analysis is often on capturing differences, trends, periods, and rates of change from the figures rather than the numerical precision of interpolation. Resampling (interpolation) is employed in such scenarios, and linear interpolation has become one of the convenient tools for resampling. I believe it is an important role of gnuplot to support such analysis.
  
  example1.alt
  example2.alt
  
  Thank you. It certainly drew beautifully. I would not have thought of this method.
  
  example3:
  
  I see your point, but my perspective on the uses of gnuplot differs slightly. gnuplot can serve not only for creating graphs for publication but also for generating numerous quick visualizations and conducting preliminary analyses.
  
  To give some background on example3, the data in example3 are meteorological observations of the upper atmosphere by radiosondes. Both air pressure and altitude are observed values, and theoretically, they have a monotonic relationship. The vertical profile of the air temperature is the main objective of example3, and the vertical axis may be either the air pressure or the altitude. Even if it is a linearly interpolated value, the altitude value is displayed because we want to know it as a reference value. I don't think such a display method can be called IMPROPER (linear interpolation is often used in my field).
  
  Also, you might think you could do the same thing with 'smooth cspline', but it doesn't work in practice due to implementation issues.
  
  > I do not understand the intent of example 4. Example 5's "automatically filling in missing data" is exactly what I feel gnuplot should not do. I think this is an example of why resampling is not a good idea.
  
  example4
  
  My explanation was insufficient. This example emulates a smooth path with linear interpolation, not cubic spline interpolation.
  
  example5
  
  This behavior is not exclusive to "smooth linear"; we can replace the "smooth linear" part with "smooth cspline" and observe the same data filling phenomenon which gnuplot should not do. Note that in example5, data filling occurs only if set datafile missing NaN is set.
  
  Anyway, as to whether resampling, if helpful, should be done outside of gnuplot, I can only say that I wish gnuplot had such a feature. If the idea of resampling within gnuplot is acceptable, I think linear interpolation, which is not smooth but does not cause surprises (overshoot), is a reliable and important tool as a first step in analysis.
  
  - Ethan Merritt - 2023-10-18
    
    Since the discussion has become more a matter of philosophy than implementation, I think it would be better to bring this up on the mailing list rather than continue the comments here. I don't want to rule out a feature that might in fact help many people, but on the other hand I don't want to make it easy to do something that is poor practice when it is already possible, even if more complicated, to do something else that would be better.
    
    You are working with data that is far outside my area of expertise, so I am largely unfamiliar with both the needs and the standard practices for visualization and analysis.
    
    For what it's worth, I think "smooth cspline" is also inappropriate as an extrapolation method. Even for interpolation it can exhibit severe overshoot, which is why I thought it was important to add "smooth mcs" (splines with monotonic constraints) as an alternative.
    
    - Hiroki Motoyoshi - 2023-10-18
      
      I think that is a good idea. If I could get the original poster's opinion, there may be a use for it that I am unaware of. Also, it may be a feature that most people don't need.
      
    - Hiroki Motoyoshi - 2023-10-19
      
      > For what it's worth, I think "smooth cspline" is also inappropriate as an extrapolation method. Even for interpolation it can exhibit severe overshoot, which is why I thought it was important to add "smooth mcs" (splines with monotonic constraints) as an alternative.
      
      The 'mcspline' story was very informative. I would like to use it in the right situation.
      
      Thank you again.
      

theozh - 2024-01-04

Thank you, Motoyoshi-san, for your effort.
The feature request was simply about the ability to perform mathematical operation between two datasets/datablocks which don't have identical x-values.
I know, I could basically achieve this task with any programming language, but starting from a datablock, going via an external tool and possible external files sounded cumbersome to me if, instead, there could be a simple gnuplot option (e.g. smooth linear).

Actually, from gnuplot 5.4.0 on (June 2020), I can get the job done by "seriously misusing" smooth zsort.
See https://stackoverflow.com/a/77674192/7295599
