I would love to see a variable line color option given to the boxplot style.
Consider the commands
    set samples 20
    unset key
    plot sample [i=0:10] "+" u (1):1:(0.5):(word("a b c",int($1)%3+1)) with boxplot
This will draw a set of three boxplots in categories "a", "b", and "c". Right now if I want to color these differently, it is necessary to do multiple plots
    set samples 20
    unset key
    set table $d
    plot for [j=0:2] [i=0:10] "+" u (int($1)%3==j?$1:1/0)
    unset table $d
    plot for [i=0:2] $d index i u (i+1):1:(0.5) with boxplot lc (0xff<<8*i)
or something similar to this.
I would love to be able to use a command like
plot sample [i=0:10] "+" u (1):1:(0.5):(word("a b c",int($1)%3+1)):(0xff<<8*(int($1)%3)) with boxplot lc rgb variable
to accomplish the same thing.
I'm not sure how difficult this would be, since the boxplot style is more complex than most styles: it must group and preprocess the data before drawing anything. I would expect that the extra column would need to agree with the grouping column, in the sense that all points with the same value in the grouping column must have the same value in the color column. If they differed, there would be no sensible way to interpret it.
The long example should read (change the last plot command to use column 2 for the data):
As you note, there would be no reasonable way to handle disagreement between columns 3 (category) and 5 (color). To me this means that adding a new column is the wrong way to go. It would make more sense to use the category index to generate a linetype, just as it is currently used to generate an x coordinate. That would be easy to code, but if you wanted to control the colors individually you would have to reserve a block of successive linetypes for the purpose.
IMHO handling the individual boxplots in separate elements of the plot command, as in "plot for [box=1:N] ... lc box" is not so difficult. Is there something specific about your intended use case that makes it awkward to do this way?
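For reference, here is a minimal sketch of that "plot for" form. The datablock contents, the variable N, and the tick labels are made up for illustration; each box gets its own data set (separated by two blank lines) and its own linetype.

```gnuplot
# Hypothetical data: one block of values per box,
# blocks separated by two blank lines so "index" can select them.
$data << EOD
1.0
2.0
2.5
4.0
3.2


3.0
3.5
5.0
6.5
4.1


0.5
1.5
2.0
2.2
3.9
EOD

N = 3                                  # number of boxes, must be known in advance
set style fill solid 0.3
unset key
set xtics ("a" 1, "b" 2, "c" 3)        # label each box position by hand

# One plot element per box: x position i+1, data from block i, linetype i+1
plot for [i=0:N-1] $data index i using (i+1):1:(0.5) with boxplot lc (i+1)
```

The cost of this form is that N has to be known before the plot command runs.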
I had used it before with data that has two factor variables, coloring the boxes differently based on one of the factor variables. Something like:
where column 1 has the data and columns 2 and 3 contain factors and column 3 determines the color. Notice here that points in the same category have the same value for that color column (as the color column is partially used to build the category), but multiple categories have the same color. Here using the category to determine the color wouldn't quite work as all boxes would be different colors (unless the linetypes were redefined).
I am basically using that 'plot for' form right now, but it requires me to know how many boxes there are (which I am getting by using a shell command), whereas the example command above would take care of that itself.
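As a possible alternative to the shell command, the distinct factor values can be collected inside gnuplot. This is only a sketch under assumptions: the datablock and the variable names labels and N are hypothetical, and it relies on an assignment inside the using expression of a stats command.

```gnuplot
# Hypothetical data: column 1 = value, column 2 = factor label
$data << EOD
1.2 east
2.3 west
1.9 east
3.1 north
2.8 west
EOD

# Append each label to the list the first time it is seen.
# The surrounding spaces keep one label from matching inside another.
labels = ""
stats $data using (labels = (strstrt(labels, " ".strcol(2)." ") ? labels : labels." ".strcol(2)." "), 0) nooutput

N = words(labels)      # number of distinct labels, usable as a loop bound
print N, ": ", labels
```

The resulting space-separated string can then be indexed with word(labels, i) inside a "plot for [i=1:N]" loop.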
Right now I have commands of the form (note that col2 and col3 are space separated string variables containing the values of the columns):
This gives me all of my desired boxes and the boxes that have the same value in column2 have the same color as well (independent of the value in column3).
This seems more complicated than it should be, but maybe my use case is also more complicated. I'm using the inline data to get around the bug that I reported when using 1/0 in the boxplot (which has already been fixed in the development version); otherwise I would just put that test against the columns into the boxplot command.
The attached file contains a script and datafile that shows what I am going for by using the plot for method. The possible values for the columns are hardcoded in this script as the shell command to get these relies on python being installed and configured a certain way.
Last edit: Matthew Halverson 2015-11-01
Here is the result of that script. Notice that there are two factor variables: paint type and region. There is a box for each of the 8 combinations, but they are colored differently according to region.
The command
creates these same boxes, but there is no way to control the color differently. What I wish to be able to do is
to cause all 8 boxes to be plotted but colored differently according to column 2. Of course, the tick labels and other decoration would have to be handled manually and this command could not generate the key.
Since it turned out to be only 2 lines of code, I couldn't resist attaching "lc variable" to the boxplot category. I'm not sure this gets you all the way to your goal, but it gives you something else to play with. If you see any straightforward improvements, please comment.
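A minimal sketch of how this might be used, assuming "lc variable" behaves as described in this thread (the category index selects the linetype). The data and the colors are hypothetical; the block of linetypes is redefined first so that each category gets a chosen color.

```gnuplot
# Hypothetical data: column 1 = value, column 2 = category label
$data << EOD
1.0 a
2.0 a
2.5 a
4.0 a
3.0 b
3.5 b
5.0 b
6.5 b
0.5 c
1.5 c
2.0 c
2.2 c
EOD

# Reserve linetypes 1-3 for the three categories
set linetype 1 lc rgb "red"
set linetype 2 lc rgb "forest-green"
set linetype 3 lc rgb "blue"

set style fill solid 0.3
unset key

# The 4th using entry is the category; with "lc variable" each box is
# assumed to take its color from the linetype matching its category index.
plot $data using (1):1:(0.5):2 with boxplot lc variable
```

Unlike the 'plot for' form, the number of categories does not need to be known in advance, but the colors are tied to the order in which categories are found.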
So, with today's cvs source the following script generates the attached plot
Unsatisfactory things I notice:
+ the key titles don't obviously attach to the left/right set of boxplots
+ it's not clear what color the key titles should use
+ the latex/east extrema are treated as outliers; did you tweak your boxplot params to avoid this?
+ it is awkward to adjust the starting x coordinate manually
This starts to look very much like the set of problems addressed for histogram plots by the "newhistogram" command. Maybe the answer is to add a "newboxplots" command that accepts a title and a starting x coordinate or gap separating it from the previous set. That level of bookkeeping would probably require a lot more than the 2 lines of code I added today.
Yes, I did set fraction 1 on my boxplots.
That new syntax gets me very close to what I wanted and I think may be the closest that I can expect to get without some specialized tricks.
Once I created that graph for an example with the tic marks and the key, I realized that there would almost certainly be some problems with the key and the tic marks if the command that I envisioned were possible. I think the problem may be simple enough without the tic marks and key, but once those are thrown in, it may be too difficult to do without a loop. So this is probably the closest I can expect to get - at least I can create exactly the plot that I want, it just requires a loop.
I think that you are right that I am basically going for a "newboxplots" command. In fact such a command would be much more general than what I was going for. I think it would be a very welcome addition.
Last edit: Matthew Halverson 2015-11-03