Thread: segmentation fault when using matrix uniform with index

gnuplot-beta

From: Juhász P. <pet...@gm...> - 2020-03-19 22:17:34

Hi,

Today I tried to create a heatmap from a CSV-ish file that contained
more than one block of data, and I naively thought that I could
combine `plot ... index N` with `plot ... matrix rowheaders
columnheaders w image`. Admittedly, it's an edge case, but it's not
explicitly mentioned as illegal either.

Unfortunately, it doesn't work:
for small files it produces an empty plot with some error messages, and
for large files it crashes.

The following one-liner will create a file that can be used to
reproduce the crash:

perl -le '$N=100; $,=v9; for (1..2) {print "x",qw(a b)x$N; print
"a",(1,2)x$N for 1..$N; print"" for 1..2}' > /tmp/foo

then in gnuplot,

plot '/tmp/foo' i 0 matrix rowheaders columnheaders w image

Decrease the value of $N in the one-liner to generate a file that
doesn't crash; for me the threshold was 26.
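
For readers who find the one-liner hard to parse, here is a rough Python equivalent. It is only a sketch of what the Perl appears to produce: two tab-separated blocks, each with a header line, N data rows, and two trailing blank lines. The function name `make_blocks` and its parameters are made up for illustration.

```python
def make_blocks(n, blocks=2):
    """Build the test data as a string (hypothetical helper, not from the thread).

    Each block is: one header line ("x" plus n pairs of "a", "b"),
    n data rows ("a" plus n pairs of 1, 2), then two blank lines.
    """
    lines = []
    for _ in range(blocks):
        lines.append("\t".join(["x"] + ["a", "b"] * n))      # column headers
        for _ in range(n):
            lines.append("\t".join(["a"] + ["1", "2"] * n))  # row header + values
        lines.extend(["", ""])                               # block separator
    return "\n".join(lines) + "\n"

# Usage (writes the same path as the one-liner):
#     with open("/tmp/foo", "w") as f:
#         f.write(make_blocks(100))
```

Lowering `n` here corresponds to lowering `$N` in the Perl version.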

gnuplot version: 5.5 patchlevel 0 last modified 2019-12-22

The crash happens in graphics.c:process_image.

So it appears that `index` doesn't cooperate with `matrix` - I can
accept that this combination is not supported but then there should be
a note about the fact in the documentation. And it shouldn't crash in
any case.

best regards,
Peter Juhasz

Re: segmentation fault when using matrix uniform with index

From: Ethan A M. <me...@uw...> - 2020-03-20 03:56:10

I agree that the program should never crash just because the command
or the data is not exactly as expected.  So there certainly is a bug.
But I don't think it is a question of 'matrix' not recognizing 'index'.

I think the problem is that it is not well defined what it means to
have column headers in a file with multiple data blocks.

The program seems to think that there is a single set of column headers
on the first line and does not expect them to reappear in front of each
data block.  If I take the large test file generated by your perl jiffy and
comment out the second row of headers, then the program seems to
operate acceptably.
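
The workaround described above (commenting out the repeated header row) can be scripted. The following is a minimal sketch, assuming blocks are separated by at least two blank lines and that `#` is the active comment character in gnuplot (its usual default). The function name is hypothetical.

```python
def comment_out_repeated_headers(text):
    """Prefix '#' to the first line of every block except the first.

    Assumes blocks are separated by two or more blank lines and that
    the first non-blank line of each block is a header row.
    """
    out = []
    blanks = 0          # consecutive blank lines seen so far
    seen_data = False   # True once the first block has started
    for line in text.split("\n"):
        if line.strip() == "":
            blanks += 1
            out.append(line)
            continue
        if seen_data and blanks >= 2:
            line = "#" + line   # repeated header of a later block
        blanks = 0
        seen_data = True
        out.append(line)
    return "\n".join(out)
```

This only edits the data; it says nothing about why the unmodified file triggers the crash.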

I will try to figure out why an unexpected line of headers causes
a segfault, but beyond that I don't know which is more common:

    1 line of column headers applying to the entire file
or
    a separate line of column headers before each data block.

Thoughts from anyone who deals with this kind of data file?

	Ethan

Re: segmentation fault when using matrix uniform with index

From: Juhász P. <pet...@gm...> - 2020-03-20 20:05:33

On Thu, 2020-03-19 at 20:21 -0700, Ethan A Merritt wrote:
> I agree that the program should never crash just because the command
> or the data is not exactly as expected.  So there certainly is a bug.
> But I don't think it is a question of 'matrix' not recognizing 'index'.
> 

You're right! Indeed, it's the `columnheaders` mode, not `matrix` in
general, that triggers the crash.

> I think the problem is that it is not well defined what it means to
> have column headers in a file with multiple data blocks.
> 
> The program seems to think that there is a single set of column headers
> on the first line and does not expect them to reappear in front of each
> data block.  If I take the large test file generated by your perl jiffy
> and comment out the second row of headers, then the program seems to
> operate acceptably.

Indeed. If I omit `columnheaders`, I get a warning about missing or
undefined values and the image will contain a black row, but it doesn't
crash. With `columnheaders`, I get a crash with `index 0` but not with
`index 1`. Probably there is an off-by-one error somewhere that gets
exposed by the requirement of column headers.

> 
> I will try to figure out why an unexpected line of headers causes
> a segfault, but beyond that I don't know which is more common:
> 
>     1 line of column headers applying to the entire file
> or
>     a separate line of column headers before each data block.
> 
> Thoughts from anyone who deals with this kind of data file?

The CSV (or something-SV) format itself is not well defined; everyone
uses their own home-grown format. In the case of a single file
with multiple data blocks:

- from the software implementation standpoint, it makes more sense to
allow and expect one header line per file, at the top of the file,
because that's simpler to explain and simpler to parse;

- from the user experience standpoint, it's better to allow a header
line for every block: they might not even represent the same kind of
data so each block could have a different number of columns with
different headers.

I think there are examples of both in the wild.
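
To make the two conventions concrete, here is a small Python sketch of how a reader might handle each one. It is not how gnuplot parses files; the function names and the `header_per_block` flag are invented for illustration, and blocks are assumed to be tab-separated and divided by two or more blank lines.

```python
def split_blocks(text):
    """Split text into blocks of rows; two or more blank lines end a block.

    A single blank line inside a block is simply skipped in this sketch.
    """
    blocks, current, blanks = [], [], 0
    for line in text.splitlines():
        if line.strip() == "":
            blanks += 1
            continue
        if blanks >= 2 and current:
            blocks.append(current)
            current = []
        blanks = 0
        current.append(line.split("\t"))
    if current:
        blocks.append(current)
    return blocks


def attach_headers(blocks, header_per_block):
    """Return (header, rows) pairs under one of the two conventions."""
    if not blocks:
        return []
    if header_per_block:
        # Convention 2: every block starts with its own header line.
        return [(b[0], b[1:]) for b in blocks]
    # Convention 1: a single header line at the top of the file.
    header = blocks[0][0]
    return [(header, blocks[0][1:])] + [(header, b) for b in blocks[1:]]
```

If a file written with per-block headers is read with `header_per_block=False`, the repeated header line lands in the data rows of the later blocks, which is the kind of mismatch discussed in this thread.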

best regards,
Peter Juhasz