
#2446 gnuplot's zsort (Windows) does not preserve order

Milestone: None
Status: closed-accepted
Owner: nobody
Updated: 2022-02-02
Created: 2021-06-03
Creator: theozh
Private: No

Since gnuplot 5.4.0 there is the option smooth zsort.
This can be used to sort a datablock by column(s). However, under Windows it does not seem to preserve the existing order of rows that have equal sort keys (see also https://stackoverflow.com/q/67801386/7295599).

Minimal example:

### "bug": Windows zsort does not preserve previous order
reset session

$Data <<EOD
2   4
1   4
2   3
1   3
EOD

set table $Data1
    plot $Data  u 1:2:2 smooth zsort
set table $Data2
    plot $Data1 u 1:2:1 smooth zsort
unset table

print $Data2
### end of code

Result: (with gnuplot 5.4.1 under Win10)

# Curve 0 of 1, 4 points
# Curve title: "$Data_1 using 1:2:1"
# x y type
 1  3  i
 1  4  i
 2  4  i
 2  3  i

Expected result: (which you apparently get under Linux but not under Windows)

# Curve 0 of 1, 4 points
# Curve title: "$Data_1 using 1:2:1"
# x y type
 1  3  i
 1  4  i
 2  3  i
 2  4  i
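
The expected ordering is what two successive stable sorts produce: first by column 2, then by column 1. A minimal sketch in Python, used here only because its built-in sorted() is documented to be stable. The function name two_pass_sort is made up for this illustration, and nothing below is gnuplot code:

```python
# Rows of $Data from the minimal example, as (column 1, column 2) pairs.
data = [(2, 4), (1, 4), (2, 3), (1, 3)]


def two_pass_sort(rows):
    """Sort by column 2, then by column 1, with a stable sort in both passes."""
    by_col2 = sorted(rows, key=lambda r: r[1])  # pass 1, like "u 1:2:2 smooth zsort"
    return sorted(by_col2, key=lambda r: r[0])  # pass 2, like "u 1:2:1 smooth zsort"


for x, y in two_pass_sort(data):
    print(x, y)
```

Because the second pass keeps rows with the same column 1 value in the order left by the first pass, the rows come out ordered by column 1 and, within each group, by column 2.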

Discussion

  • Ethan Merritt

    Ethan Merritt - 2021-06-03

    Can you back up and explain what problem you are trying to solve? I know I suggested it in response to your question on StackOverflow, but this is not at all what smooth zsort was intended for.
    Can we first explore what other tools might be more appropriate, or what options are possible if we do change zsort?

    For example, you said on S/O that you would rather not write out a temporary file. Why? There are already gnuplot commands that write a temp file behind the scenes (e.g. test palette). Would it be an acceptable solution to your problem if gnuplot's zsort wrote a temp file and used the system "sort" command to pipe it back in?
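
The temp-file idea could look roughly like the following outside of gnuplot. This is only a sketch in Python, not a description of how gnuplot would implement it. It assumes a Unix-like external sort that accepts -s (a GNU/BSD extension that keeps the input order of equal keys); the sort.exe shipped with Windows does not accept these options. The function name sort_via_temp_file is hypothetical:

```python
import os
import shutil
import subprocess
import tempfile


def sort_via_temp_file(rows, column=1):
    """Write rows to a temp file, sort it with the external `sort`, read it back."""
    fd, path = tempfile.mkstemp(suffix=".dat", text=True)
    try:
        with os.fdopen(fd, "w") as handle:
            for row in rows:
                handle.write(" ".join(str(v) for v in row) + "\n")
        # -s: keep input order for equal keys; -k N,Nn: numeric sort on one field.
        result = subprocess.run(
            ["sort", "-s", "-k", f"{column},{column}n", path],
            check=True, capture_output=True, text=True,
            env={**os.environ, "LC_ALL": "C"},
        )
    finally:
        os.remove(path)  # clean up the temp file whether or not sort succeeded
    return [tuple(int(v) for v in line.split()) for line in result.stdout.splitlines()]


# Only run the demo when an external `sort` is on the PATH (assumed Unix-like).
if shutil.which("sort"):
    print(sort_via_temp_file([(2, 3), (1, 3), (2, 4), (1, 4)]))
```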

     
  • theozh

    theozh - 2021-06-04

    (same comment on SO)
    Well, writing to a temporary file certainly works. I hope there will be no timing issues with opening, writing, closing and re-reading the file. If I have data in (fast) RAM, why should I write it to a (slow) HDD just to sort it? If you sort just once, the longer time might not be an issue. If you have to sort again and again, this goes in the direction of data processing and analysis, and gnuplot doesn't want to be a tool for such tasks. My simple hope was that zsort (although it was not intended for sorting like that) would still work. However, the Windows implementation of zsort now thwarts this plan.

    I know I can always use Python or other programming languages to prepare the data for gnuplot, although I would prefer a gnuplot-only solution. So, admittedly, besides platform independence, I do not have a good argument for making Windows zsort work the same way as Linux zsort does.

     
    • Ethan Merritt

      Ethan Merritt - 2021-06-04

      Well, if you are considering Python as an alternative then clearly speed is not a priority :-) The issue of what bits do or do not ever make it all the way to a physical disk as opposed to RAM is outside the scope of gnuplot. That part is up to the operating system or the file system. On Linux it is likely that the temp file would be created in a memory-backed virtual file system; I have no idea about Windows.

      To clarify, there is nothing in the zsort code in gnuplot that is different between Linux and Windows. But the zsort code calls a standard system library routine named "qsort". The standard spec for qsort does not require that it use any particular algorithm for sorting, and depending on which algorithm a particular system library chose, it may or may not preserve the existing order among "equal" elements (that is, the sort may or may not be stable).
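
To illustrate the stability point, here is a small sketch in Python. A hand-written selection sort serves as a stand-in for an unstable library sort; it is not a claim about which algorithm any particular qsort uses. The second call adds each row's original position as a tie-breaker, which makes the result independent of the algorithm. All names here are illustrative:

```python
def selection_sort(items, key):
    """In-place selection sort; unstable because it swaps over long distances."""
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            if key(items[j]) < key(items[smallest]):
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items


# $Data after the first pass (already ordered by column 2).
rows = [(2, 3), (1, 3), (2, 4), (1, 4)]

# Sorting on column 1 alone: rows with equal keys may change relative order.
plain = selection_sort(list(rows), key=lambda r: r[0])

# Tag each row with its original position and use that as a tie-breaker.
tagged = [(row, seq) for seq, row in enumerate(rows)]
selection_sort(tagged, key=lambda t: (t[0][0], t[1]))
tie_broken = [row for row, _ in tagged]

print(plain)       # the two rows with column 1 == 2 end up swapped
print(tie_broken)  # previous order is kept within equal column 1 values
```

With the tie-breaker, no two elements ever compare as equal, so any correct sorting algorithm returns the same order.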

      I see two downsides to modifying gnuplot's zsort to make such a guarantee internally.
      1) The primary purpose for smooth zsort is to provide a filter for huge data sets (millions of points). It really is supposed to be fast, and adding additional constraints would slow it down, if only slightly.
      2) The with points plot style that zsort operates on can potentially carry along 7 properties associated with each point: x y z color pointtype pointsize pointcharacter. This uses every available slot in the data structure. In order to add an associated property "original sequence number" we would have to expand the data structure, which would correspondingly increase the memory footprint for these huge data sets. Or, I suppose, sacrifice one of the existing properties (e.g. "you cannot zsort data sets that have a pointchar property").
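
For a rough feel of the memory cost in point 2, a back-of-the-envelope sketch in Python. The 8-byte size of the sequence number is an assumption made for this illustration; the actual layout of gnuplot's point structure is not given here:

```python
def extra_bytes(n_points, bytes_per_sequence_number=8):
    """Additional memory needed to store one sequence number per point."""
    return n_points * bytes_per_sequence_number


for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,d} points: +{extra_bytes(n) / 1e6:,.0f} MB")
```

If the new property took the same space as each of the 7 existing ones (again an assumption), the per-point storage for those properties would grow by one seventh, roughly 14%.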

      Neither of these points is an absolute show-stopper, but it would require a demonstration that the benefits outweigh the disadvantages. Does your actual use case involve huge data sets? If so then I am interested in working out the best solution. If not then I think you are unlikely to see much of a performance hit from using a temp file, whereas I already know that the upper limits on the data set size gnuplot can reasonably handle are subject to the two concerns above.

       

      Last edit: Ethan Merritt 2021-06-04
  • Bastian Märkisch

    • labels: sort, zsort, windows, smooth --> sort, zsort, smooth, Windows
    • Group: -->
    • Priority: -->
     
  • Ethan Merritt

    Ethan Merritt - 2022-02-02
    • status: open --> closed-accepted
     
