
#2136 Incorrect number of columns reported in stats command in datablock

None
closed-fixed
nobody
2019-05-21
2019-02-19
No

This is semi-related to [bugs:#2129]: this bug appears to be what was triggering the segfault reported there.

Basically, loading a file of tab-separated multicolumn data and then plotting a subset of the columns into a datablock appears to preserve the original number of tab separators in the new datablock. That in turn appears to cause the stats command to report the number of columns inconsistently.

Steps to reproduce below. Notice that the reported matrix is [1 X 10] but the value of 'STATS_columns' is 5.

> printf "" > in.tsv; for i in {1..10}; do printf "%i\tfoo%i\ta\tb\tc\n" $i $i >> in.tsv; done
> head -n3 in.tsv
1   foo1    a   b   c
2   foo2    a   b   c
3   foo3    a   b   c
> gnuplot -d -e "set datafile separator tab; set table \$BLOCK; plot 'in.tsv' using (column(1)) with table; unset table; stats \$BLOCK matrix; show variables"

* FILE: 
  Records:           10
  Out of range:       0
  Invalid:            0
  Column headers:     0
  Blank:              9
  Data Blocks:        1

* MATRIX: [1 X 10] 
  Mean:               5.5000
  Std Dev:            2.8723
  Sample StdDev:      3.0277
  Skewness:           0.0000
  Kurtosis:           1.7758
  Avg Dev:            2.5000
  Sum:               55.0000
  Sum Sq.:          385.0000

  Mean Err.:          0.9083
  Std Dev Err.:       0.6423
  Skewness Err.:      0.7746
  Kurtosis Err.:      1.5492

  Minimum:            1.0000 [ 0 0 ]
  Maximum:           10.0000 [ 0 9 ]
  COG:                0.0000      6.0000

    User and default variables:
    pi = 3.14159265358979
    GNUTERM = "x11"
    NaN = NaN
    $BLOCK = <10 line data block>
    STATS_records = 10
    STATS_invalid = 0
    STATS_headers = 0
    STATS_blank = 9
    STATS_blocks = 1
    STATS_outofrange = 0
    STATS_columns = 5
    STATS_mean = 5.5
    STATS_stddev = 2.87228132326901
    STATS_ssd = 3.02765035409749
    STATS_skewness = 0.0
    STATS_kurtosis = 1.77575757575758
    STATS_adev = 2.5
    STATS_mean_err = 0.908295106229247
    STATS_stddev_err = 0.642261628933256
    STATS_skewness_err = 0.774596669241483
    STATS_kurtosis_err = 1.54919333848297
    STATS_sum = 55.0
    STATS_sumsq = 385.0
    STATS_min = 1.0
    STATS_max = 10.0
    STATS_index_min_x = 0
    STATS_index_min_y = 0
    STATS_index_max_x = 0
    STATS_index_max_y = 9
    STATS_size_x = 1
    STATS_size_y = 10
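
As a quick sanity check, the 5 in STATS_columns is the field count of in.tsv, not the single column written to $BLOCK. A minimal POSIX shell sketch, assuming only printf, awk and sort are available (the while loop is just a portable stand-in for the bash brace expansion in the steps to reproduce):

```shell
# Recreate the 10-row, 5-column tab-separated input from the report.
: > in.tsv
i=1
while [ "$i" -le 10 ]; do
    printf '%d\tfoo%d\ta\tb\tc\n' "$i" "$i" >> in.tsv
    i=$((i + 1))
done

# Print the number of tab-separated fields on each line,
# then collapse to the distinct counts. Every line has 5 fields.
awk -F '\t' '{ print NF }' in.tsv | sort -u
```

This only shows where the number 5 could come from. It does not by itself distinguish between tabs being carried into the datablock and a value left over from an earlier command.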

Gnuplot built from branch-5-2-stable:

> git log -n1
commit 668bcbee7d760388eebc2d30611837b0ba76789b (HEAD -> branch-5-2-stable, origin/branch-5-2-stable)
Author: Bastian Maerkisch <bmaerkisch@web.de>
Date:   Tue Feb 19 08:34:09 2019 +0100

    Always update mouse variables on bound keys

    Previously, this was only done if the "allwindows" option was used.
    Bug #2133
> uname -a
Darwin pinion.local 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64

Related

Bugs: #2129

Discussion

  • Ethan Merritt

    Ethan Merritt - 2019-02-19

    What you are continuing to discover is that the "stats" command was not designed with the keyword "matrix" in mind. The "matrix" keyword triggers a separate data input path that bypasses the normal read-one-line-of-ascii-data-at-a-time subroutine. Unfortunately that bypassed subroutine tracks a number of things that later get loaded into STATS_foo variables, and even more unfortunately the matrix input subroutine doesn't track these at all. So the value of STATS_columns you see comes from the most recent non-matrix plot command.

    This example also illustrates a larger sore point: the STATS_foo variables are not guaranteed to be current. They should be wiped clean at the start of every "stats" command so that values not generated by the current command do not exist at all. For example, if a command stats $FOO using 2:3 is followed by a command stats $BAR using 4, the no-longer-current values of STATS_correlation, STATS_sumxy, etc. are still present and may be misinterpreted as belonging to the most recent command.

    Fixing this has been on my TODO list for a long time, but it turns out to be tricky because the "name" option to stats changes where the values are stored and we don't yet know that on entry.
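
The stale-variable scenario described above can be tried with a small script. This is a sketch only: the file name stale_stats.gp and the sample data are made up for illustration, and whether STATS_correlation survives the second stats command depends on the gnuplot version, so no expected output is given. The last part uses the "name" option to give each command its own prefix, which is one possible user-side way to keep results from different commands apart, not the fix applied for this ticket.

```shell
# Write a gnuplot script; the quoted 'GP' delimiter stops the shell
# from expanding $FOO and $BAR.
cat > stale_stats.gp <<'GP'
$FOO << EOD
1 2 3 4
2 4 5 8
3 6 9 12
EOD
$BAR << EOD
1 1 1 10
2 2 2 20
3 3 3 30
EOD

# A two-column analysis defines STATS_correlation, STATS_sumxy, ...
stats $FOO using 2:3 nooutput
print "after 'using 2:3': exists(STATS_correlation) = ", exists("STATS_correlation")

# A one-column analysis does not compute a correlation at all.
# If the variable still exists here, it is left over from $FOO.
stats $BAR using 4 nooutput
print "after 'using 4':   exists(STATS_correlation) = ", exists("STATS_correlation")

# Separate prefixes: results land in FOO_* and BAR_* instead of STATS_*.
stats $FOO using 2:3 name "FOO" nooutput
stats $BAR using 4 name "BAR" nooutput
print "FOO_correlation exists: ", exists("FOO_correlation")
print "BAR_correlation exists: ", exists("BAR_correlation")
GP

# Run the script only if gnuplot is installed.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot stale_stats.gp
fi
```

exists() returns 1 if the named variable is defined and 0 otherwise, so comparing the two STATS_correlation lines shows whether a given build clears old values.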

     
  • Ethan Merritt

    Ethan Merritt - 2019-02-20
    • status: open --> pending-fixed
    • Group: -->
    • Priority: -->
     
  • Mike Tegtmeyer

    Mike Tegtmeyer - 2019-02-22

    Thanks again Ethan. Just now had the opportunity to test. Confirmed the fix. I appreciate all your time and dedication.

    - Mike
     
  • Ethan Merritt

    Ethan Merritt - 2019-05-21
    • Status: pending-fixed --> closed-fixed
     
