From: David Fokkema <dfokkema@il...>  2007-04-04 14:26:52

Hi group,

I have the following ipython 'session':

In [23]: data = [0, 0.4, 0.6, 1, 2, 3]

In [24]: bins = [0, 1, 2, 3]

In [25]: hist(data, bins, align='edge')
Out[25]: (array([3, 1, 1, 1]), [0, 1, 2, 3], <a list of 4 Patch objects>)

In [26]: hist(data, bins, align='center')
Out[26]: (array([3, 1, 1, 1]), [0, 1, 2, 3], <a list of 4 Patch objects>)

I would suspect that the histogram output from 'center' should be [2, 2, 1, 1]. Why is this not so? At least, the two should be different? I would say that with edge, my bins would be 0-1, 1-2, 2-3, 3-4, and that center should give me -0.5-0.5, 0.5-1.5, 1.5-2.5, 2.5-3.5, but this seems not to be the case??? Any help understanding this would be greatly appreciated!

Thanks,

David

From: David Fokkema <dfokkema@il...>  2007-04-04 14:42:47

On Wed, 2007-04-04 at 16:26 +0200, David Fokkema wrote:
> Hi group,
>
> I have the following ipython 'session':
>
> In [23]: data = [0, 0.4, 0.6, 1, 2, 3]
>
> In [24]: bins = [0, 1, 2, 3]
>
> In [25]: hist(data, bins, align='edge')
> Out[25]: (array([3, 1, 1, 1]), [0, 1, 2, 3], <a list of 4 Patch objects>)
>
> In [26]: hist(data, bins, align='center')
> Out[26]: (array([3, 1, 1, 1]), [0, 1, 2, 3], <a list of 4 Patch objects>)
>
> I would suspect that the histogram output from 'center' should be [2, 2,
> 1, 1]. Why is this not so? At least, the two should be different? I
> would say that with edge, my bins would be 0-1, 1-2, 2-3, 3-4, and that
> center should give me -0.5-0.5, 0.5-1.5, 1.5-2.5, 2.5-3.5, but this
> seems not to be the case??? Any help understanding this would be greatly
> appreciated!

It seems that the hist function simply calls matplotlib.mlab.hist without any regard to how the bins are specified (be they edge or center values) and passes the plotting through to the 'bar' function. That function places each bar with either its edge or its center at the bin value. If I choose center, the result is that my histogram is calculated for edge values but the bars are placed at center values, which is completely misleading and wrong! I'd say this is a bug, but I may be overlooking something here...

Thanks,

David

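The mismatch described above can be reproduced outside matplotlib. The following is only a minimal sketch, assuming NumPy is available; the helper name `count_from_lower_edges` is hypothetical and merely mimics the searchsorted-based counting quoted later in this thread (each bin is given by its lower edge and the last bin is open-ended). It is not the library's own code.

```python
import numpy as np


def count_from_lower_edges(data, lower_edges):
    """Count samples per bin, where each bin is given by its lower edge.

    Hypothetical helper: the last bin is open-ended, and samples below the
    first edge are not counted.
    """
    sorted_data = np.sort(np.asarray(data, dtype=float))
    # Index of the first sample that is >= each lower edge.
    first_index = np.searchsorted(sorted_data, lower_edges)
    # Differences between consecutive indices are the per-bin counts.
    return np.diff(np.concatenate([first_index, [len(sorted_data)]]))


data = [0, 0.4, 0.6, 1, 2, 3]
bins = np.array([0, 1, 2, 3], dtype=float)

# Bins interpreted as lower edges: 0-1, 1-2, 2-3, 3-...
edge_counts = count_from_lower_edges(data, bins)

# Bins interpreted as centers: shift every value down by half a bin width.
half_width = 0.5 * (bins[1] - bins[0])
center_counts = count_from_lower_edges(data, bins - half_width)

print(edge_counts)    # [3 1 1 1]
print(center_counts)  # [2 2 1 1]
```

Drawing the first set of counts with bars centered on 0, 1, 2, 3 gives exactly the misleading picture reported: edge-based counts shown at center positions.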
From: <jks@ik...>  2007-04-06 15:33:09

David Fokkema <dfokkema@...> writes:

> If I choose center, the result is that my histogram is calculated
> for edge values but the bars are placed at center values which is
> completely misleading and wrong! I'd say this is a bug, but I may be
> overlooking something here...

Looks like a bug to me. Could you file it at
http://sf.net/tracker/?group_id=80706&atid=560720
so it isn't forgotten?

-- 
Jouni K. Seppänen
http://www.iki.fi/jks

From: David Fokkema <dfokkema@il...>  2007-04-08 17:26:11

On Fri, 2007-04-06 at 18:32 +0300, Jouni K. Seppänen wrote:
> David Fokkema <dfokkema@...> writes:
>
> > If I choose center, the result is that my histogram is calculated
> > for edge values but the bars are placed at center values which is
> > completely misleading and wrong! I'd say this is a bug, but I may be
> > overlooking something here...
>
> Looks like a bug to me. Could you file it at
> http://sf.net/tracker/?group_id=80706&atid=560720
> so it isn't forgotten?

Well... It couldn't be too hard to fix, I guess... I know Python, I tracked down the source, I could try and fix it, right? I think I'll have the time next Tuesday, so hopefully I'll file a bug report with an attached patch, ;)

From: David Fokkema <dfokkema@il...>  2007-04-10 09:43:33

On Sun, 2007-04-08 at 19:25 +0200, David Fokkema wrote:
> On Fri, 2007-04-06 at 18:32 +0300, Jouni K. Seppänen wrote:
> > David Fokkema <dfokkema@...> writes:
> >
> > > If I choose center, the result is that my histogram is calculated
> > > for edge values but the bars are placed at center values which is
> > > completely misleading and wrong! I'd say this is a bug, but I may be
> > > overlooking something here...
> >
> > Looks like a bug to me. Could you file it at
> > http://sf.net/tracker/?group_id=80706&atid=560720
> > so it isn't forgotten?
>
> Well... It couldn't be too hard to fix, I guess... I know python, I
> tracked down the source, I could try and fix it, right? I think I'll
> have the time next Tuesday, so hopefully I'll file a bug report with an
> attached patch, ;)

I fixed the bug, I think. At least it's working on my system and I think it is not invasive. Comments please? I'll send it upstream otherwise...

--- matplotlib/axes.py.orig	2007-04-10 10:58:30.000000000 +0200
+++ matplotlib/axes.py	2007-04-10 11:14:56.000000000 +0200
@@ -4149,7 +4149,7 @@
         hist bars
         """
         if not self._hold: self.cla()
-        n, bins = matplotlib.mlab.hist(x, bins, normed)
+        n, bins = matplotlib.mlab.hist(x, bins, normed, align)
         if width is None: width = 0.9*(bins[1]-bins[0])
         if orientation == 'horizontal':
             patches = self.barh(bins, n, height=width, left=bottom, align=align)
--- matplotlib/mlab.py.orig	2007-04-10 11:16:23.000000000 +0200
+++ matplotlib/mlab.py	2007-04-10 11:24:48.000000000 +0200
@@ -597,7 +597,7 @@
     #S = -1.0*asum(p*log(p))
     return S
 
-def hist(y, bins=10, normed=0):
+def hist(y, bins=10, normed=0, align='edge'):
     """
     Return the histogram of y with bins equally sized bins.  If bins
     is an array, use the bins.  Return value is
@@ -626,11 +626,16 @@
         dy = (ymax-ymin)/bins
         bins = ymin + dy*arange(bins)
 
+    if align == 'center':
+        hw = .5*(bins[1]-bins[0])
+        nbins = [x-hw for x in bins]
+    else:
+        nbins = bins
 
-    n = searchsorted(sort(y), bins)
+    n = searchsorted(sort(y), nbins)
     n = diff(concatenate([n, [len(y)]]))
     if normed:
-        db = bins[1]-bins[0]
+        db = nbins[1]-nbins[0]
         return 1/(len(y)*db)*n, bins
     else:
         return n, bins

Thanks,

David

From: <jks@ik...>  2007-04-10 13:37:54

David Fokkema <dfokkema@...> writes:

> I fixed the bug, I think. At least it's working on my system and I think
> it is not invasive. Comments please? I'll send it upstream otherwise...

Does this handle the case where the user has specified bins of different widths? It looks like you are only using the width of the first bin:

> +    if align == 'center':
> +        hw = .5*(bins[1]-bins[0])
> +        nbins = [x-hw for x in bins]
> +    else:
> +        nbins = bins

At least, I've always thought that unequal bins are allowed, but from the following it seems that the probability density support also makes an incompatible assumption:

>      if normed:
> -        db = bins[1]-bins[0]
> +        db = nbins[1]-nbins[0]
>          return 1/(len(y)*db)*n, bins

-- 
Jouni K. Seppänen
http://www.iki.fi/jks

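To make the normalization concern concrete, here is a small sketch, assuming NumPy and assuming the bins are given as a full list of edges (including the upper edge of the last bin, unlike the lower-edge convention in the patch). Dividing every count by the first bin's width only yields a density that integrates to 1 when all bins are equally wide; with unequal bins each count has to be divided by its own bin's width.

```python
import numpy as np

data = np.array([0.2, 0.5, 1.5, 2.5])
edges = np.array([0.0, 1.0, 3.0])  # two bins of width 1 and 2

counts, _ = np.histogram(data, bins=edges)
widths = np.diff(edges)

# Per-bin normalization: each count is divided by its own bin width.
density = counts / (len(data) * widths)

# What a "first width only" normalization would compute instead.
first_width_only = counts / (len(data) * widths[0])

# A probability density should integrate to 1 over the binned range.
area_correct = float(np.sum(density * widths))
area_first = float(np.sum(first_width_only * widths))

print(area_correct)  # 1.0
print(area_first)    # 1.5
```

For data that lies entirely inside the edges, the per-bin result agrees with `np.histogram(data, bins=edges, density=True)`.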
From: David Fokkema <dfokkema@il...>  2007-04-10 13:51:51

On Tue, 2007-04-10 at 16:13 +0300, Jouni K. Seppänen wrote:
> David Fokkema <dfokkema@...> writes:
>
> > I fixed the bug, I think. At least it's working on my system and I think
> > it is not invasive. Comments please? I'll send it upstream otherwise...
>
> Does this handle the case where the user has specified bins of
> different widths? It looks like you are only using the width of the
> first bin:
>
> > +    if align == 'center':
> > +        hw = .5*(bins[1]-bins[0])
> > +        nbins = [x-hw for x in bins]
> > +    else:
> > +        nbins = bins

I am only using the first width, indeed. If different widths are allowed (and why not?) some more coding has to be done. On the other hand, I use align = 'center' because I have a detector which samples timing information at 20 ns intervals. So, when binning timing information, I only have 0, 20 ns, 40 ns, 60 ns and so forth. Being able to specify those values as the center of my bin instead of calculating edges myself (-10, 10, 30, 50, etc.) is very useful. I can't think of an application where you have bins of different widths and you want to center the values...

Furthermore, when plotting the histogram, the function calculates the width of the bars only for the first bin and uses that for all bars. So I think this function is highly unstable when you use variable sized bins. Matplotlib.mlab.hist only calculates the histogram and can only use variable sized bins when you want a simple histogram. Maybe add something to this effect in the docstring?

> At least, I've always thought that unequal bins are allowed, but from
> the following it seems that the probability density support also makes
> an incompatible assumption:
>
> >      if normed:
> > -        db = bins[1]-bins[0]
> > +        db = nbins[1]-nbins[0]
> >          return 1/(len(y)*db)*n, bins

It does...

David

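For the detector use case described above, the edges can be derived from equally spaced centers by hand. This is a sketch only, assuming NumPy; `edges_from_uniform_centers` is a hypothetical helper, not a matplotlib function, and it deliberately rejects centers that are not equally spaced.

```python
import numpy as np


def edges_from_uniform_centers(centers):
    """Return bin edges for equally spaced bin centers (hypothetical helper)."""
    centers = np.asarray(centers, dtype=float)
    if centers.size < 2:
        raise ValueError("need at least two centers to infer a bin width")
    steps = np.diff(centers)
    if not np.allclose(steps, steps[0]):
        raise ValueError("centers must be equally spaced")
    half_width = 0.5 * steps[0]
    # Lower edge of every bin, plus the upper edge of the last bin.
    return np.concatenate([centers - half_width, [centers[-1] + half_width]])


centers_ns = [0, 20, 40, 60]  # detector sample times in ns
edges_ns = edges_from_uniform_centers(centers_ns)
print(edges_ns)  # [-10.  10.  30.  50.  70.]

# Made-up timing samples, purely for illustration.
times_ns = [0, 0, 20, 40, 40, 40, 60]
counts, _ = np.histogram(times_ns, bins=edges_ns)
print(counts)  # [2 1 3 1]
```

Unlike the lower-edge convention in the patch, `np.histogram` expects the upper edge of the last bin as well, which is why the helper returns one more edge than there are centers.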
From: <jks@ik...>  2007-04-10 16:03:40

David Fokkema <dfokkema@...> writes:

> I can't think of an application where you have bins of different
> widths and you want to center the values...

Actually, now that I think about it, there is not enough information in the bin centers to know the widths of the bins if they may vary. For example, if your bin edges are (2, 4, 8, 16, 32) or (1, 5, 7, 17, 31), you get the same bin centers (3, 6, 12, 24). Perhaps it's best to disallow variable-width bins when align='center'.

-- 
Jouni K. Seppänen
http://www.iki.fi/jks

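The ambiguity in this example is easy to check numerically. A minimal sketch, assuming NumPy; `centers_from_edges` is a hypothetical helper that takes the midpoint of each pair of neighbouring edges:

```python
import numpy as np


def centers_from_edges(edges):
    """Midpoint of each pair of neighbouring edges (hypothetical helper)."""
    edges = np.asarray(edges, dtype=float)
    return 0.5 * (edges[:-1] + edges[1:])


edges_a = [2, 4, 8, 16, 32]
edges_b = [1, 5, 7, 17, 31]

centers_a = centers_from_edges(edges_a)
centers_b = centers_from_edges(edges_b)
widths_a = np.diff(edges_a)
widths_b = np.diff(edges_b)

print(centers_a)  # [ 3.  6. 12. 24.]
print(centers_b)  # [ 3.  6. 12. 24.]
print(widths_a)   # [ 2  4  8 16]
print(widths_b)   # [ 4  2 10 14]
```

Both sets of edges map to the same centers while their widths differ, so the centers alone cannot tell the two binnings apart.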
From: David Fokkema <dfokkema@il...>  2007-04-12 12:32:51

On Tue, 2007-04-10 at 19:03 +0300, Jouni K. Seppänen wrote:
> David Fokkema <dfokkema@...> writes:
>
> > I can't think of an application where you have bins of different
> > widths and you want to center the values...
>
> Actually, now that I think about it, there is not enough information
> in the bin centers to know the widths of the bins if they may vary.
> For example, if your bin edges are (2, 4, 8, 16, 32) or (1, 5, 7, 17,
> 31), you get the same bin centers (3, 6, 12, 24). Perhaps it's best to
> disallow variable-width bins when align='center'.

Of course! Nice example, ;)

I've changed the documentation strings. What do you think of this patch? Shall I send it upstream as a bug report with attached patch?

David

--- matplotlib/axes.py.orig	2007-04-12 09:52:47.000000000 +0200
+++ matplotlib/axes.py	2007-04-12 14:26:08.000000000 +0200
@@ -4137,19 +4137,21 @@
             n/(len(x)*dbin)
 
         align = 'edge' | 'center'.  Interprets bins either as edge
-        or center values
+        or center values.  If 'center', the bins are interpreted as equally
+        sized.
 
         orientation = 'horizontal' | 'vertical'.  If horizontal, barh
         will be used and the "bottom" kwarg will be the left edges.
 
         width: the width of the bars.  If None, automatically compute
-        the width.
+        the width.  If align = 'center', the bins are interpreted as
+        equally sized.
 
         kwargs are used to update the properties of the
         hist bars
         """
         if not self._hold: self.cla()
-        n, bins = matplotlib.mlab.hist(x, bins, normed)
+        n, bins = matplotlib.mlab.hist(x, bins, normed, align)
         if width is None: width = 0.9*(bins[1]-bins[0])
         if orientation == 'horizontal':
             patches = self.barh(bins, n, height=width, left=bottom, align=align)
--- matplotlib/mlab.py.orig	2007-04-12 09:52:47.000000000 +0200
+++ matplotlib/mlab.py	2007-04-12 14:25:30.000000000 +0200
@@ -597,7 +597,7 @@
     #S = -1.0*asum(p*log(p))
     return S
 
-def hist(y, bins=10, normed=0):
+def hist(y, bins=10, normed=0, align='edge'):
     """
     Return the histogram of y with bins equally sized bins.  If bins
     is an array, use the bins.  Return value is
@@ -605,7 +605,12 @@
 
     If normed is False, return the counts in the first element of the
     return tuple.  If normed is True, return the probability density
-    n/(len(y)*dbin)
+    n/(len(y)*dbin).  If normed is True, the bins are interpreted as
+    equally sized.
+
+    align = 'edge' | 'center'.  Interprets bins either as edge
+    or center values.  If 'center', the bins are interpreted as equally
+    sized.
 
     If y has rank>1, it will be raveled
     Credits: the Numeric 22 documentation
@@ -626,11 +631,16 @@
         dy = (ymax-ymin)/bins
         bins = ymin + dy*arange(bins)
 
+    if align == 'center':
+        hw = .5*(bins[1]-bins[0])
+        nbins = [x-hw for x in bins]
+    else:
+        nbins = bins
 
-    n = searchsorted(sort(y), bins)
+    n = searchsorted(sort(y), nbins)
     n = diff(concatenate([n, [len(y)]]))
     if normed:
-        db = bins[1]-bins[0]
+        db = nbins[1]-nbins[0]
         return 1/(len(y)*db)*n, bins
     else:
         return n, bins