Thread: [Numpy-discussion] histogram complete makeover


Hi all,

I'd like to poll the list to see what people want from numpy.histogram(),
since I'm currently writing a contender.

My main complaints with the current version are:
1. upper outliers are stored in the last bin, while lower outliers are not
counted at all,
2. cannot use weights.

The new histogram function is well under way (it addresses these issues and
adds an axis keyword), but I want to know what the preferred behavior is
regarding the function output, and how willing you are to accept a new
behavior that will break some code.

Given a number of bins N and a range (min, max), histogram constructs
linearly spaced bin edges
b0 (out-of-range)  | b1 | b2 | b3 | .... | bN | bN+1 out-of-range
and may return:

A.  H = array([N_b0, N_b1, ..., N_bN,  N_bN+1])
The out-of-range values are the first and last values of the array. The
returned array hence has length N+2.

B.  H = array([N_b0 + N_b1, N_b2, ..., N_bN + N_bN+1])
The lower and upper out-of-range values are added to the first and last bin
respectively.

C.  H = array([N_b1, ..., N_bN + N_bN+1])
Current behavior: the upper out-of-range values are added to the last bin.

D.  H = array([N_b1, N_b2, ..., N_bN]),
Lower and upper out-of-range values are given after the histogram array.
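
As a rough illustration, the four candidate outputs can be sketched with
present-day NumPy calls. The sample data, the range and all variable names
below are assumptions chosen for the example; this is not the code of the
proposed function.

```python
import numpy as np

# Illustrative data: one value below the range, two above it.
x = np.array([-1.0, 0.5, 1.5, 2.5, 2.5, 4.0, 5.0])
lo, hi, nbins = 0.0, 3.0, 3

# N + 1 linearly spaced edges delimit the N in-range bins b1..bN.
edges = np.linspace(lo, hi, nbins + 1)

# Counts for the in-range bins only. np.histogram in current NumPy
# releases ignores values that fall outside the given edges.
core, _ = np.histogram(x, bins=edges)

# Out-of-range counts (b0 and bN+1 in the notation above).
below = int(np.sum(x < lo))
above = int(np.sum(x > hi))

# A: outliers as extra first and last entries (length N + 2).
option_a = np.concatenate(([below], core, [above]))

# B: outliers folded into the first and last bins.
option_b = core.copy()
option_b[0] += below
option_b[-1] += above

# C: only the upper outliers folded into the last bin.
option_c = core.copy()
option_c[-1] += above

# D: in-range counts only; outliers are reported separately.
option_d = core

print(option_a, option_b, option_c, option_d, (below, above))
```

Only A changes the length of the returned array; B and C keep the length
but mix outliers into the edge bins.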

Ideally, the new function would not break the common usage: H =
histogram(x)[0], so this excludes A.  B and C are not acceptable in my
opinion, so only D remains, with the downside that the outliers are not
returned. A solution might be to add a keyword full_output=False, which when
set to True, returns the out-of-range values in a dictionary.
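
A minimal sketch of how such a keyword could look, assuming a wrapper around
the present-day np.histogram. The function name histogram_sketch, the
parameter name value_range and the dictionary keys "lower" and "upper" are
hypothetical; only full_output comes from the proposal above.

```python
import numpy as np

def histogram_sketch(a, bins=10, value_range=None, full_output=False):
    """Hypothetical option-D histogram with optional outlier counts."""
    a = np.asarray(a).ravel()
    if value_range is None:
        # With no explicit range, nothing can be out of range.
        value_range = (a.min(), a.max())
    lo, hi = value_range

    # In-range counts and the N + 1 bin edges.
    counts, edges = np.histogram(a, bins=bins, range=(lo, hi))
    if not full_output:
        return counts, edges

    # Extra information is only returned on request, so that
    # histogram_sketch(x)[0] keeps working unchanged.
    outliers = {
        "lower": int(np.sum(a < lo)),
        "upper": int(np.sum(a > hi)),
    }
    return counts, edges, outliers

data = [-1.0, 0.5, 1.5, 2.5, 4.0]
H = histogram_sketch(data, bins=3, value_range=(0.0, 3.0))[0]
H_full, edges_full, extra = histogram_sketch(
    data, bins=3, value_range=(0.0, 3.0), full_output=True
)
print(H, extra)
```

With the default full_output=False the first returned item is still the
histogram array, so the common indexing idiom is unaffected.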

Also, the current function returns -> H, ledges
where ledges is the array of left bin edges (N).
I propose returning the complete array of edges (N+1), including the
rightmost edge. This is a little bit impractical for plotting, as the edges
array does not have the same length as the histogram array, but allows the
use of user-defined non-uniform bins.
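
The length mismatch is easy to work around when plotting. The sketch below
uses present-day NumPy with made-up data and non-uniform edges (both are
assumptions for the example) to show how left edges and bin widths can be
recovered from the complete edge array.

```python
import numpy as np

# Illustrative data and user-defined, non-uniform bin edges.
x = np.array([0.1, 0.4, 1.2, 3.5, 7.0, 9.9])
edges = np.array([0.0, 1.0, 2.0, 5.0, 10.0])

counts, full_edges = np.histogram(x, bins=edges)

# The edge array has one more element than the histogram array.
assert len(full_edges) == len(counts) + 1

# Left edges, widths and centers all have the same length as counts,
# which is what a bar plot needs.
left = full_edges[:-1]
widths = np.diff(full_edges)
centers = left + widths / 2

# For example, with matplotlib:
#   plt.bar(left, counts, width=widths, align="edge")
print(counts, left, widths)
```

The widths cannot be recovered from the left edges alone, since the width of
the last bin is unknown without the rightmost edge.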

Opinions, suggestions?

David
