If I use the boxplot style with
set style boxplot frac 1.
plot for [i=2:16] 'data.tmp' u (i):i w boxplot
I get a segmentation fault with some of my data files. It doesn't happen with every file, but the fact that valgrind reports invalid reads suggests that the algorithm is faulty.
This is what valgrind actually reports:
==16915== Conditional jump or move depends on uninitialised value(s)
==16915== at 0x808D781: plot_c_bars (graphics.c:4091)
==16915== by 0x80953B6: do_plot (graphics.c:4409)
==16915== by 0x80B36C0: eval_plots (plot2d.c:2842)
==16915== by 0x80656DE: do_line (command.c:596)
==16915== by 0x8065C5C: com_line (command.c:329)
==16915== by 0x80AE22A: main (plot.c:630)
==16915==
==16915== Invalid read of size 8
==16915== at 0x8096298: do_plot (graphics.c:4381)
==16915== by 0x80B36C0: eval_plots (plot2d.c:2842)
==16915== by 0x80656DE: do_line (command.c:596)
==16915== by 0x8065C5C: com_line (command.c:329)
==16915== by 0x80AE22A: main (plot.c:630)
==16915== Address 0x56bfb58 is 0 bytes after a block of size 204,000 free'd
==16915== at 0x4025016: realloc (vg_replace_malloc.c:525)
==16915== by 0x805A207: gp_realloc (alloc.c:313)
==16915== by 0x80B25B2: cp_extend (plot2d.c:139)
==16915== by 0x80B6F71: eval_plots (plot2d.c:1023)
==16915== by 0x80656DE: do_line (command.c:596)
==16915== by 0x8065C5C: com_line (command.c:329)
==16915== by 0x80AE22A: main (plot.c:630)
==16915==
==16915== Invalid read of size 8
==16915== at 0x8096320: do_plot (graphics.c:4386)
==16915== by 0x80B36C0: eval_plots (plot2d.c:2842)
==16915== by 0x80656DE: do_line (command.c:596)
==16915== by 0x8065C5C: com_line (command.c:329)
==16915== by 0x80AE22A: main (plot.c:630)
==16915== Address 0x56e85bc is 12 bytes after a block of size 166,440 alloc'd
==16915== at 0x4025016: realloc (vg_replace_malloc.c:525)
==16915== by 0x805A207: gp_realloc (alloc.c:313)
==16915== by 0x80B25B2: cp_extend (plot2d.c:139)
==16915== by 0x80B6F71: eval_plots (plot2d.c:1023)
==16915== by 0x80656DE: do_line (command.c:596)
==16915== by 0x8065C5C: com_line (command.c:329)
==16915== by 0x80AE22A: main (plot.c:630)
The first one is easy to fix; just insert
candle.type = INRANGE
in plot_boxplot(). But the other two probably mean that this part of the function needs to be thought over.
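A minimal sketch of why the missing assignment matters. The enum and struct below are hypothetical, simplified stand-ins loosely modelled on gnuplot's coord_type and struct coordinate (the real definitions have different and more fields), and make_candle() is an illustrative helper, not gnuplot code.

```c
/* Hypothetical stand-ins, loosely modelled on gnuplot's coord_type enum
 * and struct coordinate; NOT the real definitions. */
enum coord_type { INRANGE, OUTRANGE, UNDEFINED };

struct candle_point {
    enum coord_type type;   /* later code branches on this field */
    double x;
    double whisker_lo, box_lo, median, box_hi, whisker_hi;
};

/* Fill in every field, including type, so that code which later tests
 * candle.type never reads an indeterminate value (the "Conditional jump
 * or move depends on uninitialised value(s)" that valgrind reports). */
struct candle_point make_candle(double x, double whisker_lo, double box_lo,
                                double median, double box_hi, double whisker_hi)
{
    struct candle_point candle = {0};   /* zero-initialise all fields */
    candle.type = INRANGE;              /* explicit, independent of enum order */
    candle.x = x;
    candle.whisker_lo = whisker_lo;
    candle.box_lo = box_lo;
    candle.median = median;
    candle.box_hi = box_hi;
    candle.whisker_hi = whisker_hi;
    return candle;
}
```

Zero-initialising alone would happen to yield INRANGE here because it is the first enumerator; the explicit assignment keeps the intent visible and does not rely on that ordering.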
While we're at it, there are two other points to consider:
First, the values picked out by that part of the function might be incorrect. I ran R's quantile function on some data sets and compared the results with the values picked by gnuplot's plot_boxplot() function (obtained via debugging sprintf() calls). R offers nine different definitions of percentiles, but IIRC the values picked by gnuplot matched none of them.
Second, more of an aside: I've always found the syntax of "set style boxplot fraction" hard to understand. I think we should redefine it in terms of percentiles (after agreeing on a correct definition of percentile, of course); e.g. "set style boxplot percentile .9" would mean that the whisker extends to the .9×100 = 90th percentile (and to the 10th on the other side).
data file for demonstration
Please describe your gnuplot version, OS and terminal, and whether you use a binary distribution or built it yourself.
With the Windows build of gnuplot 4.5, I also get the segmentation fault with
set style boxplot frac 1.
plot for [i=2:16] 'data.tmp' u (i):i w boxplot
using the data uploaded to this item.
Gnuplot version 4.5 (there is no boxplot in 4.4.x), Ubuntu Linux 10.04 (both 32- and 64-bit), built by me. The terminal type doesn't seem to be relevant.
> candle.type = INRANGE
Yes, thank you for catching that. Failing to initialize that field was just sloppy.
> the other two probably mean that this part of the function needs to be thought over
The error case is when all points have the same y value. I've added a test to prevent walking off the end of the data array at the point where the error occurs (patch attached), but probably we should catch this sooner. It is pointless to do all the boxplot processing if the data points are all identical.
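The kind of guard described here can be sketched as follows. last_of_tie_run() is a hypothetical helper, not the code in the attached patch; it only illustrates how a loop that skips over equal values has to check the array bound itself, because when every y value is identical nothing else stops it.

```c
#include <stddef.h>

/* Hypothetical helper: starting at index i (i < n) of the sorted array
 * y[0..n-1], return the last index whose value equals y[i].
 * The "i + 1 < n" test is the bounds check; without it, a data set in
 * which all values are identical reads past the end of y[]. */
size_t last_of_tie_run(const double *y, size_t n, size_t i)
{
    while (i + 1 < n && y[i + 1] == y[i])
        i++;
    return i;
}
```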
Fixes the specific problems reported by valgrind
Fixed 2 corner cases in addition to the failure to initialize candle.type
- array overrun if all y values are the same
- array overrun and/or nonsensical plot if there are fewer than 4 y values
So far as I know we don't use percentile notation anywhere else in gnuplot. Why here?
new boxplot functionality: draw points for the mean and a second set of percentiles
> but probably we should catch this sooner.
> It is pointless to do all the boxplot processing if the data
> points are all identical.
If all the points are identical, then there is no need to draw the candlestick at all, just a horizontal line would suffice.
In that case, after the sorting step the equality
plot->points[0].y == plot->points[N-1].y
holds. There could be a test for this just after sorting; if it is true, the code could draw the line and return.
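A sketch of the proposed early exit. struct pt is a hypothetical stand-in for gnuplot's point records and boxplot_is_degenerate() is an illustrative name; the caller would draw the horizontal line itself and skip the rest of the boxplot processing.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a plotted point. */
struct pt { double x, y; };

static int cmp_y(const void *a, const void *b)
{
    double ya = ((const struct pt *)a)->y;
    double yb = ((const struct pt *)b)->y;
    return (ya > yb) - (ya < yb);
}

/* Sorts pts by y. Returns 1 (and stores the common value in *level)
 * if the first and last sorted values are equal, i.e. all y values are
 * identical and only a horizontal line is needed. Returns 0 otherwise,
 * including for n == 0, which the caller must handle separately. */
int boxplot_is_degenerate(struct pt *pts, size_t n, double *level)
{
    if (n == 0) return 0;
    qsort(pts, n, sizeof *pts, cmp_y);
    if (pts[0].y == pts[n - 1].y) {
        *level = pts[0].y;
        return 1;
    }
    return 0;
}
```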
> So far as I know we don't use percentile notation anywhere
> else in gnuplot. Why here?
Because it makes sense in the context. After all, the median is just the 50th percentile, and the quartiles are the 25th and 75th. It is logical (at least for me) to specify the extent of the whiskers in terms of percentiles as well.
But I admit that I have another motive in this case: I want to add new functionality to boxplot, an option to draw a second set of percentiles as points. I usually use boxplots to display five-number summaries of data sets; that is, I set the whiskers to extend to the maximum and minimum of the data set. By drawing points at e.g. the 10th and 90th percentiles, I could easily show the seven-number summary of the data set.
For this extension, it'd be more consistent to have a way to specify the extent of the main whiskers in terms of percentiles as well.
Please see the attached patch. The user-accessible syntax is not final, but you can see what I'm getting at.
I've put a fix for the bug into CVS, but I'm leaving this tracker item open as a patch set for your proposed syntax change.