#518 Linear resampling of data?

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2024-01-04
Created: 2020-12-08
Creator: theozh
Private: No

Feature request:
Sometimes it is necessary to apply operations to two datasets that cover the same or a similar range but do not share the same x-values or x-value spacing, for example adding, subtracting, multiplying or dividing two spectra.
To do that, you have to resample the data.
Of course you can do this with external tools, and you can even do it with a cumbersome gnuplot workaround (see https://stackoverflow.com/q/54362441).
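
As a rough illustration of what such resampling involves outside gnuplot, here is a minimal Python sketch. It is not gnuplot code; the helper names `lerp_at` and `common_grid` and the two small "spectra" are made up for illustration. It resamples both datasets onto a shared, equally spaced grid over their overlapping x-range and then subtracts them row by row.

```python
from bisect import bisect_right

def lerp_at(xs, ys, x):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x outside data range")
    # index of the segment [xs[i], xs[i+1]] that contains x
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def common_grid(xa, xb, n):
    """n equally spaced x-values over the overlap of the two x-ranges."""
    lo, hi = max(xa[0], xb[0]), min(xa[-1], xb[-1])
    return [lo + k * (hi - lo) / (n - 1) for k in range(n)]

# two hypothetical spectra with different x spacing
xa, ya = [0.0, 1.0, 2.0, 4.0], [0.0, 2.0, 4.0, 8.0]
xb, yb = [0.0, 2.0, 4.0], [1.0, 1.0, 3.0]

grid = common_grid(xa, xb, 5)
diff = [lerp_at(xa, ya, x) - lerp_at(xb, yb, x) for x in grid]
print(list(zip(grid, diff)))
```

Once both series live on the same grid, any element-wise operation (sum, product, ratio) is a one-line list comprehension.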

However, since the smooth options csplines, acsplines, mcsplines and bezier already exist,
would it perhaps be possible (without too much effort) to add another option, linear?
Thanks for consideration.

Code: (just for illustration of the existing smooth options)

### feature request: linear interpolation/resampling
reset session

$Data <<EOD
1    1
2    7
3    5
10   3
EOD

set samples 21
set table $csplines
    plot $Data u 1:2 w l smooth csplines
set table $acsplines
    plot $Data u 1:2 w l smooth acsplines
set table $mcsplines
    plot $Data u 1:2 w l smooth mcsplines
set table $bezier
    plot $Data u 1:2 w l smooth bezier
unset table

plot $Data u 1:2 w lp lw 3 lc 0 pt 7 ti "Data", \
     $bezier u 1:2 w lp pt 7 ti "bezier", \
     $csplines u 1:2 w lp pt 7 ti "csplines", \
     $mcsplines u 1:2 w lp pt 7 ti "mcsplines", \
     $acsplines u 1:2 w lp pt 7 ti "acsplines"
### end of code
1 Attachments

Discussion

  • Hiroki Motoyoshi

    I think this Feature Request is reasonable. I would like to have this feature, so I implemented it.
    A patch file is attached.
    Any comments are welcome.

    About 'smooth linear' filter

    I imagine 'smooth linear' will rarely be used for smoothing during drawing. Instead, as the title of this Feature Request suggests, it will most often be used for data resampling. Therefore, I did not treat 'smooth linear' as a derivative of the existing spline interpolations; I deliberately implemented it so that it is easy to use as a resampling tool.

    Use cases, not necessarily limited to linear interpolation, include

    • Data resampling
      • Comparison of series with different sampling intervals
      • Fill in some intervals with filledcurves
    • Finding the inverse function of monotonic data
      • Generate data for yticlabel() instead of 'set link' if the inverse function is not analytical
      • Smooth path on 'with lines'
    • Fill in missing values

    Input data

    The data in the first column (x-axis) must increase or decrease monotonically. If non-monotonic data is detected during interpolation, the warning message "Non-monotonic x data was found in 'smooth linear'" is produced. Processing then continues, but the output will not be as expected.
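
The monotonicity requirement can be sketched in Python as follows. This is not the patch's C code; the names `is_monotonic` and `check_x` are hypothetical, and treating repeated x-values as non-monotonic is an assumption, since the post does not say how ties are handled.

```python
import warnings

def is_monotonic(xs):
    """True if xs is strictly increasing or strictly decreasing."""
    steps = [b - a for a, b in zip(xs, xs[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)

def check_x(xs):
    # Warn but keep going, mirroring the behaviour described in the post.
    if not is_monotonic(xs):
        warnings.warn("Non-monotonic x data was found")
    return xs

print(is_monotonic([1, 2, 3, 10]), is_monotonic([1, 3, 2]))
```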

    If "filledcurves between" is selected as the plotting style, the data in the third column will be interpolated as well as the second column (See 'example4.gp').

    Specifying the sampling range

    The 'smooth linear' filter samples a given range of data at equal intervals. The sampling range is determined by the following rules:

    • If a range is explicitly specified in the plot, it is used.
    • If a range is explicitly specified by 'set xrange', it is used.
    • If auto-scaling is set for the x-axis, the x-range of the data itself is used.
    • If the data range is smaller than the specified range, the outside of the data range is padded with NaN.
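
The rules listed above can be sketched in Python as follows. This is an illustration of the described behaviour, not the patch itself: the function names and the `plot_range` / `set_xrange` parameters are hypothetical stand-ins for a range given in the plot command and one given by 'set xrange', and the exact values the patch prints may differ.

```python
import math

def lerp_or_nan(xs, ys, x):
    """Linear interpolation at x; NaN outside [xs[0], xs[-1]]."""
    if x < xs[0] or x > xs[-1]:
        return math.nan
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])

def resample(xs, ys, samples, plot_range=None, set_xrange=None):
    # Priority as described: plot range, then 'set xrange', then data range.
    lo, hi = plot_range or set_xrange or (xs[0], xs[-1])
    step = (hi - lo) / (samples - 1)
    grid = [lo + k * step for k in range(samples)]
    return [(x, lerp_or_nan(xs, ys, x)) for x in grid]

xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0, 1, 5, 5, 4, 7, 8]
rows = resample(xs, ys, 21, set_xrange=(0, 10))
for x, y in rows:
    print(x, y)
```

Because the grid depends only on the range and the sample count, any other dataset resampled with the same two settings produces the same x-values in the same rows.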

    This rule differs from the behavior of the other spline interpolations. From the documentation (help smooth):

    "If autoscale is not in effect, and a spline curve is being generated, sampling of the spline fit is done across the intersection of the x range covered by the input data and the fixed abscissa range defined by set xrange."

    This behavior is convenient for drawing, but not useful for resampling. If 'smooth linear' followed this rule, I would not use it. Here is why I want a different rule: for any input data, output resampled with the same sampling range and the same number of samples always contains the same number of rows and can be compared row by row (see 'example1.gp').

    Please check the following script to see how it works.

    $data <<EOD
    0 0
    1 1
    2 5
    3 5
    4 4
    5 7
    6 8
    EOD
    
    reset
    print "Ex.1) set xrange [0:10]"
    set xrange [0:10]
    set sample 21
    set table 
      plot $data smooth linear
    unset table
    
    reset
    print "Ex.2) plot [2:5] ..."
    set sample 7
    set table 
      plot [2:5] $data smooth linear
    unset table
    
    reset
    print "Ex.3) auto scaling"
    set xrange [*:*]
    set sample 13
    set table 
      plot $data smooth linear
    unset table
    
    Handling of NaN and blank lines in data

    • If there is a blank line, the points between the points before and after the blank line are padded with NaN.
    • If the y-value of input data contains NaN, the interval's interpolated value on both sides will be padded with NaN.

    If you want to fill missing values (NaN) with linear interpolation, use "set datafile missing NaN" (See 'example5.gp').
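
The difference between keeping NaN and treating it as missing can be sketched in Python. This is an assumption-laden illustration, not the patch: the function names and sample points are made up, blank lines are not modelled, and dropping NaN rows before interpolating is only an emulation of what "set datafile missing NaN" achieves.

```python
import math

def segment_value(x0, y0, x1, y1, x):
    """Interpolate on one segment; NaN at either end poisons the segment."""
    if math.isnan(y0) or math.isnan(y1):
        return math.nan
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def resample(points, grid):
    out = []
    for x in grid:
        y = math.nan                      # default: outside the data range
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                y = segment_value(x0, y0, x1, y1, x)
                break
        out.append(y)
    return out

points = [(0, 0.0), (1, 2.0), (2, math.nan), (3, 6.0), (4, 8.0)]
grid = [0.5, 1.5, 2.5, 3.5]

kept = resample(points, grid)             # NaN kept: both neighbouring intervals are NaN
dropped = resample([p for p in points if not math.isnan(p[1])], grid)
print(kept)
print(dropped)
```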

    Abbreviation

    I boldly chose "smooth l" as the abbreviation of "smooth linear". If that is acceptable, please leave it as it is.

    Sample scripts

    The following sample scripts are also attached.

    example1.gp : Comparison of series with different sampling intervals
    example2.gp : Fill in some intervals with filledcurves
    example3.gp : Generate data for yticlabel() instead of 'set link' if the inverse function is not analytical
    example4.gp : Smooth path on 'with lines'
    example5.gp : Fill in missing values

     
    • Hiroki Motoyoshi

      Revised patch file (v2) with the following changes

      • Placed the code in 'filters.{c,h}' instead of 'interpol.{c,h}'
      • Used 'cp_extend(plot,0)' instead of 'free(plot->points)'
       
  • Ethan Merritt

    Ethan Merritt - 2023-10-01

    I am dubious about the mathematical validity of using linear sampling to achieve the stated aim of the request: "adding, subtracting, multiplying or dividing two spectra".

    This is pushing the boundary of my area of expertise, but so far as I know the proper way to do this is via convolution. This requires calculating the Fourier transform of each spectrum and then operating in the dual space. Now it is true that calculating the transform using an FFT of uniformly sampled data makes this easy, but if your data is not uniformly sampled then linear interpolation onto a fixed grid is not a good way to proceed. This is where I hit the limit of my own knowledge of best practices, but I refer you to this related question: https://scicomp.stackexchange.com/q/593/36096

    I believe that gnuplot could be used to implement one of the methods referred to there, but I would expect it to be easier to use a more specialized signal processing package instead.

    Now it may be that resampling by linear interpolation does have valid uses, but before looking at any code it would help me a lot if someone could suggest pointers to reference material, textbooks, tutorials, journal articles, wiki pages, whatever, that document what problems it would properly be applied to. Ideally there would be real-world test cases that any new code could be run against to validate the implementation.

     
    • Hiroki Motoyoshi

      Thank you for your comment.

      When I read the original post, the feature I wanted was not arithmetic between two spectra, but simply linear resampling. I may have posted this in the wrong place. I do not intend to discuss arithmetic between two spectra in depth.

      As for linear resampling, it is the most basic interpolation of observed data. To begin with, drawing discrete data with lines is itself linear interpolation. My 'example3.gp' is a realistic example of what I consider most important in this feature. The idea is to get an inverted grid of monotonic observed data. I prepared pseudo data for this example due to licensing issues, but the process would be the same with real data.
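
The "inverted grid" idea can be sketched in Python: for monotonic data, swapping the roles of x and y and interpolating linearly gives the x-position that corresponds to a chosen y-value. The function name `invert_at` and the pressure/height numbers are invented for illustration and are not taken from the attached example script.

```python
def invert_at(xs, ys, y):
    """Return x where the piecewise-linear curve through (xs, ys) equals y.

    Assumes ys is strictly monotonic, so the inverse is single-valued.
    """
    pts = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if min(y0, y1) <= y <= max(y0, y1):
            return x0 + (x1 - x0) * (y - y0) / (y1 - y0)
    raise ValueError("y outside data range")

# made-up sounding: pressure (hPa) falls as height (m) rises
pressure = [1000.0, 900.0, 800.0, 700.0]
height = [0.0, 800.0, 2000.0, 3600.0]

# pressure-axis positions at which chosen height labels would be placed
ticks = [(h, invert_at(pressure, height, h)) for h in (400.0, 1400.0, 2800.0)]
print(ticks)
```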

       

      Last edit: Hiroki Motoyoshi 2023-10-01
  • Ethan Merritt

    Ethan Merritt - 2023-10-17

    My starting point is that gnuplot should help you to visualize your data and to present it clearly to others. From that perspective it is important to show the actual data points; I do not like the idea of resampling, especially when there are only a few data points.

    Example 1:
    I would prefer to plot it this way:

    #
    # Alternative method using current gnuplot
    #
    set xrange [0:10]
    set style line 101 lc "black" pt 7
    
    plot $data1 with filledcurves above y=0, \
          $data2 with filledcurves above y=0 fc bgnd,\
         $data1 with lp ls 101, $data2 with lp ls 101
    

    Example 2:
    There are several ways to plot this. One convenient one is to use your recent hsteps style:

    set style fill solid
    set key left reverse
    array bars = [2.0, 5.0]
    
    plot bars using 2:(99):(1) with hsteps pillars   noautoscale notitle, \
         $data using 1:2 with filledcurve x2 fc bgnd notitle, \
         $data using 1:2 with lp pt 7 lc "black"     title "bars masked by $data"
    

    Example 3:
    If the y1 and y2 scales are not analytically related then the plot is improper; a plotted point cannot be correctly placed on both axes simultaneously. A different representation is needed to present such data. Here is one possibility, using the same data. Since it is hard to fit much information in densely spaced labels, this might be a case where hypertext labels would be preferred.

    set xlabel "Pressure (hPa)"
    set ylabel "Height (m)"
    set yrange [250:450]
    set key samplen 0 left Left reverse
    
    plot $data using 1:3:(sprintf("%d",int($3))) with labels \
         font ",10" rotate by 45 point pt 7 left \
         title "Temperature (K)"
    

    I do not understand the intent of example 4. Example 5's "automatically filling in missing data" is exactly what I feel gnuplot should not do. I think this is an example of why resampling is not a good idea.

     
    • Hiroki Motoyoshi

      Thank you for your comments.

      "My starting point is that gnuplot should help you to visualize your data and to present it clearly to others. From that perspective it is important to show the actual data points; I do not like the idea of resampling, especially when there are only a few data points."

      Does this mean that this is a general opinion, not limited to 'smooth linear'?
      I understand that if the data is unequally spaced or the sampling is out of phase with the original equally spaced data, the "smooth cspline" will not accurately indicate the data points.

      In meteorology, there are situations where data with different sampling rates need to be integrated or unevenly spaced data is compared on a fixed grid. For the preliminary analysis in such cases, the emphasis in the analysis is often on capturing differences, trends, periods, and rates of change from the figures rather than the numerical precision of interpolation. Resampling (interpolation) is employed in such scenarios, and linear interpolation has become one of the convenient tools for resampling. I believe it is an important role of gnuplot to support such analysis.

      example1.alt
      example2.alt

      Thank you. It certainly drew beautifully. I would not have thought of this method.

      example3:

      I see your point, and I slightly differ in my perspective regarding the various uses of gnuplot. gnuplot can serve not only for creating graphs for publication but also for generating numerous quick visualizations and conducting preliminary analyses. 

      To give some background on example3: the data it deals with are meteorological observations of the upper atmosphere by radiosondes. Both air pressure and altitude are observed values, and theoretically they have a monotonic relationship. The vertical profile of the air temperature is the main objective of example3, and the vertical axis may be either the air pressure or the altitude. Even though it is a linearly interpolated value, the altitude is displayed because we want it as a reference value. I don't think such a display method can be called IMPROPER (linear interpolation is often used in my field).

      Also, you might think you could do the same thing with 'smooth cspline', but it doesn't work in practice due to implementation issues.

      "I do not understand the intent of example 4. Example 5's "automatically filling in missing data" is exactly what I feel gnuplot should not do. I think this is an example of why resampling is not a good idea."

      example4

      My explanation was insufficient. This example emulates a smooth path with linear interpolation, not cubic spline interpolation.

      example5

      This behavior is not exclusive to "smooth linear"; if we replace "smooth linear" with "smooth cspline", we observe the same data filling that you say gnuplot should not do. Note that in example5, data filling occurs only if set datafile missing NaN is set.

      Anyway, as to whether resampling, if helpful, should be done outside of gnuplot, I can only say that I wish gnuplot had such a feature. If the idea of resampling within gnuplot is acceptable, I think linear interpolation, which is not smooth but does not cause surprises (overshoot), is a reliable and important tool as a first step in analysis.
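
The "no overshoot" property of linear interpolation can be checked with a short Python sketch. It uses the four data points from the original post and 21 samples; the function name `lerp_at` is made up, and the code is an illustration rather than anything from the patch. Every interpolated value lies between the two data values that bracket it, so the whole resampled series stays within the range of the input.

```python
def lerp_at(points, x):
    """Piecewise-linear value at x for points sorted by increasing x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside data range")

points = [(1, 1), (2, 7), (3, 5), (10, 3)]    # $Data from the original post
lo, hi, n = points[0][0], points[-1][0], 21   # 21 samples, as in 'set samples 21'
grid = [lo + k * (hi - lo) / (n - 1) for k in range(n)]
values = [lerp_at(points, x) for x in grid]

data_min = min(y for _, y in points)
data_max = max(y for _, y in points)
print(data_min <= min(values) and max(values) <= data_max)
```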

       
      • Ethan Merritt

        Ethan Merritt - 2023-10-18

        Since the discussion has become more a matter of philosophy than implementation, I think it would be better to bring this up on the mailing list rather than continue the comments here. I don't want to rule out a feature that might in fact help many people, but on the other hand I don't want to make it easy to do something that is poor practice when it is already possible, even if more complicated, to do something else that would be better.

        You are working with data that is far outside my area of expertise, so I am largely unfamiliar with both the needs and the standard practices for visualization and analysis.

        For what it's worth, I think "smooth cspline" is also inappropriate as an extrapolation method. Even for interpolation it can exhibit severe overshoot, which is why I thought it was important to add "smooth mcs" (splines with monotonic constraints) as an alternative.

         
        • Hiroki Motoyoshi

          I think that is a good idea. If I could get the original poster's opinion, there may be a use for it that I am unaware of. Also, it may be a feature that most people don't need.

           
        • Hiroki Motoyoshi

          "For what it's worth, I think "smooth cspline" is also inappropriate as an extrapolation method. Even for interpolation it can exhibit severe overshoot, which is why I thought it was important to add "smooth mcs" (splines with monotonic constraints) as an alternative."

          The 'mcspline' story was very informative. I would like to use it in the right situation.

          Thank you again.

           
  • theozh

    theozh - 2024-01-04

    Thank you, Motoyoshi-san, for your effort.
    The feature request was simply about the ability to perform mathematical operations between two datasets/datablocks which don't have identical x-values.
    I know, I could basically achieve this task with any programming language, but starting from a datablock, going via an external tool and possible external files sounded cumbersome to me if, instead, there could be a simple gnuplot option (e.g. smooth linear).

    Actually, from gnuplot 5.4.0 on (June 2020), I can get the job done by "seriously misusing" smooth zsort.
    See https://stackoverflow.com/a/77674192/7295599

     
