## matplotlib-devel

[matplotlib-devel] Proposed modification to boxplot() From: Sajec, Mike TQO - 2006-02-15 01:30:27
```
I would like to propose modifying the boxplot function as follows:

Instead of only accepting an (mxn) matrix (x) and creating (n) boxplots
from the columns of (x), optionally in place of the (mxn) matrix accept
a list of numeric arrays which can be any length, and create boxplots
for each of the arrays in the list.

The obvious benefit in doing this is that one can easily create boxplots
to compare datasets with differing numbers of data-points. For example:

>>from RandomArray import normal
>>x1 = normal(10,3,100)
>>x2 = normal(10,5,150)
>>x3 = normal(10,1,1000)
>>
>>boxplot([x1, x2, x3])

The code below implements the proposed change by modifying the boxplot
function axes.py. It works and is proof of concept if nothing else.

##--------------------------------------------------##
## modifications to the boxplot function in axes.py ##
##--------------------------------------------------##

def boxplot(self, x, notch=0, sym='b+', vert=1, whis=1.5,
            positions=None, widths=None):
    """
    boxplot(x, notch=0, sym='+', vert=1, whis=1.5,
            positions=None, widths=None)

    Make a box and whisker plot for each column of x.
    Or
    Make a box and whisker plot for each array in list x.

    The box extends from the lower to upper quartile values
    of the data, with a line at the median. The whiskers
    extend from the box to show the range of the data. Flier
    points are those past the end of the whiskers.

    notch = 0 (default) produces a rectangular box plot.
    notch = 1 will produce a notched box plot

    sym (default 'b+') is the default symbol for flier points.
    Enter an empty string ('') if you don't want to show fliers.

    vert = 1 (default) makes the boxes vertical.
    vert = 0 makes horizontal boxes. This seems goofy, but
    that's how Matlab did it.

    whis (default 1.5) defines the length of the whiskers as
    a function of the inner quartile range. They extend to the
    most extreme data point within ( whis*(75%-25%) ) data range.

    positions (default 1,2,...,n) sets the horizontal positions of
    the boxes. The ticks and limits are automatically set to match
    the positions.

    widths is either a scalar or a vector and sets the width of
    each box. The default is 0.5, or 0.15*(distance between extreme
    positions) if that is smaller.

    x is either:
        (1) a Numeric array
        (2) a list of 1-dimension Numeric arrays of any length

    Returns a list of the lines added
    """

    if not self._hold: self.cla()
    holdStatus = self._hold
    whiskers, caps, boxes, medians, fliers = [], [], [], [], []

    # CASE #1: x is a numeric array
    if type(x) == type(array([0])):
        x = asarray(x)
        rank = len(x.shape)
        if 1 == rank:
            x.shape = -1, 1
        row, col = x.shape

    # CASE #2: x is a list of numeric arrays
    if type(x) == type(list([0])):
        col = len(x) # one column for each array
        #reshape the vectors in list x if necessary
        for ii in range(len(x)):
            rank = len(x[ii].shape)
            if 1 == rank:
                x[ii].shape = -1, 1

    # get some plot info
    if positions is None:
        positions = range(1, col + 1)
    if widths is None:
        distance = max(positions) - min(positions)
        widths = min(0.15*max(distance,1.0), 0.5)
    if isinstance(widths, float) or isinstance(widths, int):
        widths = ones((col,), 'd') * widths

    # loop through columns, adding each to plot
    self.hold(True)
    for i,pos in enumerate(positions):
        # CASE #1: x is a numeric array
        if type(x)==type(array([0])):
            d = x[:,i]
        # CASE #2: x is a list of numeric arrays
        if type(x)==type(list([0])):
            d = x[i][:,0]
        row = len(d)
        # get median and quartiles
        q1, med, q3 = prctile(d,[25,50,75])
        # get high extreme
        iq = q3 - q1
        hi_val = q3 + whis*iq
        wisk_hi = compress( d <= hi_val , d )
        if len(wisk_hi) == 0:
            wisk_hi = q3
        else:
            wisk_hi = max(wisk_hi)
        # get low extreme
        lo_val = q1 - whis*iq
        wisk_lo = compress( d >= lo_val, d )
        if len(wisk_lo) == 0:
            wisk_lo = q1
        else:
            wisk_lo = min(wisk_lo)
        # get fliers - if we are showing them
        flier_hi = []
        flier_lo = []
        flier_hi_x = []
        flier_lo_x = []
        if len(sym) != 0:
            flier_hi = compress( d > wisk_hi, d )
            flier_lo = compress( d < wisk_lo, d )
            flier_hi_x = ones(flier_hi.shape[0]) * pos
            flier_lo_x = ones(flier_lo.shape[0]) * pos

        # get x locations for fliers, whisker, whisker cap and box sides
        box_x_min = pos - widths[i] * 0.5
        box_x_max = pos + widths[i] * 0.5

        wisk_x = ones(2) * pos

        cap_x_min = pos - widths[i] * 0.25
        cap_x_max = pos + widths[i] * 0.25
        cap_x = [cap_x_min, cap_x_max]

        # get y location for median
        med_y = [med, med]

        # calculate 'regular' plot
        if notch == 0:
            # make our box vectors
            box_x = [box_x_min, box_x_max, box_x_max, box_x_min, box_x_min ]
            box_y = [q1, q1, q3, q3, q1 ]
            # make our median line vectors
            med_x = [box_x_min, box_x_max]
        # calculate 'notch' plot
        else:
            notch_max = med + 1.57*iq/sqrt(row)
            notch_min = med - 1.57*iq/sqrt(row)
            if notch_max > q3:
                notch_max = q3
            if notch_min < q1:
                notch_min = q1
            # make our notched box vectors
            box_x = [box_x_min, box_x_max, box_x_max, cap_x_max, box_x_max,
                     box_x_max, box_x_min, box_x_min, cap_x_min, box_x_min,
                     box_x_min ]
            box_y = [q1, q1, notch_min, med, notch_max, q3, q3, notch_max,
                     med, notch_min, q1]
            # make our median line vectors
            med_x = [cap_x_min, cap_x_max]
            med_y = [med, med]

        # vertical or horizontal plot?
        if vert:
            def doplot(*args):
                return self.plot(*args)
        else:
            def doplot(*args):
                shuffled = []
                for i in range(0, len(args), 3):
                    shuffled.extend([args[i+1], args[i], args[i+2]])
                return self.plot(*shuffled)

        whiskers.extend(doplot(wisk_x, [q1, wisk_lo], 'b--',
                               wisk_x, [q3, wisk_hi], 'b--'))
        caps.extend(doplot(cap_x, [wisk_hi, wisk_hi], 'k-',
                           cap_x, [wisk_lo, wisk_lo], 'k-'))
        boxes.extend(doplot(box_x, box_y, 'b-'))
        medians.extend(doplot(med_x, med_y, 'r-'))
        fliers.extend(doplot(flier_hi_x, flier_hi, sym,
                             flier_lo_x, flier_lo, sym))

    # fix our axes/ticks up a little
    if 1 == vert:
        setticks, setlim = self.set_xticks, self.set_xlim
    else:
        setticks, setlim = self.set_yticks, self.set_ylim

    newlimits = min(positions)-0.5, max(positions)+0.5
    setlim(newlimits)
    setticks(positions)

    # reset hold status
    self.hold(holdStatus)

    return dict(whiskers=whiskers, caps=caps, boxes=boxes,
                medians=medians, fliers=fliers)
```
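The per-dataset statistics computed inside the loop of the proposed function (quartiles, whisker ends, fliers) can be illustrated in isolation. The following is only a minimal sketch, not the code from the patch: the helper name `box_stats` is hypothetical, and the standard library's `statistics.quantiles` (which needs at least two data points) is swapped in for the `prctile` call used in the message, so results may differ slightly from `prctile` on some inputs.

```python
from statistics import quantiles


def box_stats(data, whis=1.5):
    """Return quartiles, whisker ends and fliers for one dataset.

    Hypothetical helper for illustration; it mirrors the whisker logic of the
    proposed boxplot() but uses statistics.quantiles instead of prctile.
    """
    d = sorted(data)
    # "inclusive" interpolates linearly between the sorted data points
    q1, med, q3 = quantiles(d, n=4, method="inclusive")
    iq = q3 - q1

    # whiskers reach the most extreme points within whis * IQR of the box
    hi_val = q3 + whis * iq
    lo_val = q1 - whis * iq
    inside_hi = [v for v in d if v <= hi_val]
    inside_lo = [v for v in d if v >= lo_val]
    wisk_hi = max(inside_hi) if inside_hi else q3
    wisk_lo = min(inside_lo) if inside_lo else q1

    # fliers are the points lying beyond the whisker ends
    fliers = [v for v in d if v < wisk_lo or v > wisk_hi]
    return {"q1": q1, "med": med, "q3": q3,
            "wisk_lo": wisk_lo, "wisk_hi": wisk_hi, "fliers": fliers}


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Because each dataset is summarized independently, nothing here requires the datasets to have the same length, which is the point of the proposal.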
Re: [matplotlib-devel] Proposed modification to boxplot() From: John Hunter - 2006-02-15 14:31:46
```
>>>>> "Sajec," == Sajec, Mike TQO writes:

    Sajec> I would like to propose modifying the boxplot function as
    Sajec> follows: Instead of only accepting an (mxn) matrix (x) and
    Sajec> creating (n) boxplots from the columns of (x), optionally
    Sajec> in place of the (mxn) matrix accept a list of numeric
    Sajec> arrays which can be any length, and create boxplots for
    Sajec> each of the arrays in the list.

Since I have never used boxplot I don't feel qualified to comment on
this, so I suggest you bring it up on the user's list. If no one
objects, and you send me a patch against CVS, I'll include it.

Thanks,
JDH
```
[matplotlib-devel] Re: Proposed modification to boxplot() From: Jouni K Seppanen - 2006-02-16 09:13:13
```
"Sajec, Mike TQO" writes:

> Instead of only accepting an (mxn) matrix (x) and creating (n) boxplots
> from the columns of (x), optionally in place of the (mxn) matrix accept
> a list of numeric arrays which can be any length, and create boxplots
> for each of the arrays in the list.

In my opinion the idea is good. You can get the same result by calling
boxplot separately for each array, but then you need to keep track of
the positions and fix the axis limits manually afterwards. E.g.,

    x1 = normal(10,3,[100,3])
    x2 = normal(10,5,[150,2])
    x3 = normal(10,1,[1000,4])
    pos = 1
    for x in x1, x2, x3:
        newpos = pos + x.shape[1]
        boxplot(x, positions=range(pos, newpos))
        pos = newpos
    axis([0.5, pos-0.5, 0, 30])

Your implementation seems to reshape each array to only have one column.
I think it would be more useful and less surprising to plot each column
of each array in the list, as in the example above.

--
Jouni
```
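Jouni's suggestion, one box per column of every array in the list, amounts to flattening the input into a single sequence of columns before plotting. A minimal sketch of that step follows; the function name `split_columns` is hypothetical, and plain nested lists (a list of rows) stand in for Numeric arrays so the example runs without extra dependencies.

```python
def split_columns(datasets):
    """Flatten 1-D and 2-D datasets into columns plus box positions.

    Hypothetical helper for illustration: a 2-D dataset is a list of rows,
    a 1-D dataset is a flat list of numbers.
    """
    columns = []
    for data in datasets:
        rows = list(data)
        if rows and isinstance(rows[0], (list, tuple)):
            # 2-D input: transpose the rows so each column is one dataset
            columns.extend(list(col) for col in zip(*rows))
        else:
            # 1-D input is treated as a single column
            columns.append(rows)
    # consecutive positions 1, 2, ..., n, as in the default of boxplot()
    positions = list(range(1, len(columns) + 1))
    return columns, positions


x1 = [[1, 2], [3, 4], [5, 6]]   # 3 rows, 2 columns
x2 = [7, 8, 9, 10]              # a single 1-D dataset
columns, positions = split_columns([x1, x2])
print(columns, positions)
```

With the columns collected this way, the position bookkeeping and axis-limit fix from the manual loop in the message would no longer be needed, since every column already has its own slot.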