From: Juhász P. <pet...@gm...> - 2020-03-19 22:17:34
Hi,

Today I tried to create a heatmap from a CSV-ish file that contained more than one block of data, and I naively thought that I could combine `plot ... index N` with `plot ... matrix rowheaders columnheaders w image`. Admittedly, it's an edge case, but it's not explicitly mentioned as illegal either.

Unfortunately, it doesn't work: for small files it produces an empty plot with some error messages, and for large files it crashes.

The following one-liner will create a file that can be used to reproduce the crash:

    perl -le '$N=100; $,=v9; for (1..2) {print "x",qw(a b)x$N; print "a",(1,2)x$N for 1..$N; print"" for 1..2}' > /tmp/foo

then in gnuplot,

    plot '/tmp/foo' i 0 matrix rowheaders columnheaders w image

Decrease the value of $N in the one-liner to generate a file that doesn't crash; for me the threshold was 26.

gnuplot version: 5.5 patchlevel 0 last modified 2019-12-22

The crash happens in graphics.c:process_image.

So it appears that `index` doesn't cooperate with `matrix` - I can accept that this combination is not supported, but then there should be a note about that in the documentation. And it shouldn't crash in any case.

best regards,
Peter Juhasz
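For readers who want to inspect or vary the test data without Perl, the one-liner in the message above can be approximated in Python. This is only a sketch of what the one-liner appears to write: tab-separated fields, one header row per block, `n` data rows per block, and two blank lines after each block. The function names `make_test_rows` and `write_test_file` are made up for this illustration and are not part of gnuplot or the original report.

```python
def make_test_rows(n=100, blocks=2):
    """Return the lines of a tab-separated test file with `blocks` data blocks.

    Each block mirrors the Perl one-liner: a header row ("x" followed by
    n pairs of "a", "b"), then n data rows ("a" followed by n pairs of
    1, 2), then two blank lines.
    """
    lines = []
    for _ in range(blocks):
        # Column header row: a corner label plus 2*n column labels.
        lines.append("\t".join(["x"] + ["a", "b"] * n))
        # Data rows: a row label plus 2*n values.
        for _ in range(n):
            lines.append("\t".join(["a"] + ["1", "2"] * n))
        # Two blank lines close the block.
        lines.extend(["", ""])
    return lines


def write_test_file(path, n=100, blocks=2):
    """Write the generated lines to `path`, one per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(make_test_rows(n, blocks)) + "\n")
```

Calling `write_test_file("/tmp/foo")` should give a file equivalent to the one described in the report; passing a smaller `n` corresponds to lowering `$N`.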
From: Ethan A M. <me...@uw...> - 2020-03-20 03:56:10
On Thursday, 19 March 2020 15:17:21 PDT Juhász Péter wrote:
> So it appears that `index` doesn't cooperate with `matrix` - I can
> accept that this combination is not supported but then there should be
> a note about the fact in the documentation. And it shouldn't crash in
> any case.

I agree that the program should never crash just because the command or the data is not exactly as expected. So there certainly is a bug. But I don't think it is a question of 'matrix' not recognizing 'index'.

I think the problem is that it is not well defined what it means to have column headers in a file with multiple data blocks.

The program seems to think that there is a single set of column headers on the first line and does not expect them to reappear in front of each data block. If I take the large test file generated by your perl jiffy and comment out the second row of headers, then the program seems to operate acceptably.

I will try to figure out why an unexpected line of headers causes a segfault, but beyond that I don't know which is more common:

  1 line of column headers applying to the entire file
or
  a separate line of column headers before each data block.

Thoughts from anyone who deals with this kind of data file?

Ethan
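The manual step described above (commenting out the repeated header row) could be automated along the following lines. This is a rough sketch, not anything taken from gnuplot itself: it assumes blocks are separated by two or more blank lines, that every block starts with a header row, and that `#` is the comment character in effect for data files. The function name `comment_out_repeated_headers` is hypothetical.

```python
def comment_out_repeated_headers(lines):
    """Prefix the header row of every block except the first with '#'.

    Assumptions (not confirmed by the thread): a block ends at two or
    more consecutive blank lines, and the first non-blank line of each
    block is its header row.
    """
    out = []
    blank_run = 2            # treat the start of the file as a block start
    seen_first_header = False
    for line in lines:
        if line.strip() == "":
            blank_run += 1
            out.append(line)
            continue
        if blank_run >= 2:   # first non-blank line of a new block
            if seen_first_header:
                line = "#" + line
            seen_first_header = True
        blank_run = 0
        out.append(line)
    return out
```

Only header rows are touched, so the blank-line block separators are preserved exactly as they were in the input.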
From: Juhász P. <pet...@gm...> - 2020-03-20 20:05:33
On Thu, 2020-03-19 at 20:21 -0700, Ethan A Merritt wrote:
> I agree that the program should never crash just because the command
> or the data is not exactly as expected. So there certainly is a bug.
> But I don't think it is a question of 'matrix' not recognizing
> 'index'.

You're right! Indeed, it's the `columnheaders` mode, not `matrix` in general, that triggers the crash.

> I think the problem is that it is not well defined what it means to
> have column headers in a file with multiple data blocks.
>
> The program seems to think that there is a single set of column
> headers on the first line and does not expect them to reappear in
> front of each data block. If I take the large test file generated by
> your perl jiffy and comment out the second row of headers, then the
> program seems to operate acceptably.

Indeed. If I omit `columnheaders`, I get a warning about missing or undefined values and the image will contain a black row, but it doesn't crash. With `columnheaders`, I get a crash with `index 0` but not with `index 1`. Probably there is an off-by-one error somewhere that gets exposed by the requirement of column headers.

> I will try to figure out why an unexpected line of headers causes
> a segfault, but beyond that I don't know which is more common:
>
>   1 line of column headers applying to the entire file
> or
>   a separate line of column headers before each data block.
>
> Thoughts from anyone who deals with this kind of data file?

The CSV (or something-SV) format itself is not well defined; everyone uses their own home-grown customary formats. In the case of a single file with multiple data blocks:

- from the software implementation standpoint, it makes more sense to allow and expect one header line per file, at the top of the file, because that's simpler to explain and simpler to parse;
- from the user experience standpoint, it's better to allow a header line for every block: the blocks might not even represent the same kind of data, so each block could have a different number of columns with different headers.

I think there are examples of both in the wild.

best regards,
Peter Juhasz
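To make the two conventions discussed above concrete, here is a small Python sketch of a reader that supports either one. It is purely illustrative and says nothing about how gnuplot parses files: the name `split_blocks`, the `header_per_block` switch, the tab separator, and the rule that two or more blank lines end a block are all assumptions chosen for the example.

```python
def split_blocks(lines, header_per_block=True):
    """Split lines into (header, rows) pairs, one pair per data block.

    header_per_block=True : the first row of every block is its header.
    header_per_block=False: only the first row of the file is a header,
                            and it is shared by all blocks.
    A single blank line inside a block is ignored in this sketch.
    """
    blocks, current, blank_run = [], [], 0
    for line in lines:
        if line.strip() == "":
            blank_run += 1
            if blank_run == 2 and current:   # two blank lines end a block
                blocks.append(current)
                current = []
            continue
        blank_run = 0
        current.append(line.split("\t"))
    if current:                              # file did not end with blanks
        blocks.append(current)
    if not blocks:
        return []
    if header_per_block:
        return [(block[0], block[1:]) for block in blocks]
    shared = blocks[0][0]
    return [(shared, blocks[0][1:])] + [(shared, block) for block in blocks[1:]]
```

With `header_per_block=True` each block may carry a different number of columns, which is the flexibility argued for from the user's side; with `False` the parser only ever looks for a header once, which is the simpler rule from the implementation side.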