Re: [Matplotlib-users] Integer equal-width bins for histograms


Yes, I understand there are alternatives -- but I still think a simple
binned histogram is a fairly basic feature.
KDEs are nice but can easily be over-tuned: if I see one, I certainly want
to know how the bandwidth was selected; otherwise it is no better than a
histogram, and even worse, because the issue is now hidden. CDFs
(essentially your second suggestion) can be useful, but some kinds of data
are traditionally represented as histograms, and CDFs would only confuse
readers.
Antony
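Antony's remark that a CDF is essentially the rank vs. value plot can be made concrete: the empirical CDF uses the same sorted data with the axes swapped and the rank divided by the sample size. Below is a minimal sketch, assuming NumPy and Matplotlib. The helper name `ecdf` and the Poisson sample data are illustrative and do not come from the thread.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show plots interactively
import matplotlib.pyplot as plt

def ecdf(data):
    """Hypothetical helper: return (x, y) points of the empirical CDF of `data`."""
    x = np.sort(np.asarray(data))
    # y[i] = (i + 1) / n is the rank scaled to (0, 1]; at the last of any run of
    # tied values it equals the fraction of samples <= x[i].
    y = np.arange(1, x.size + 1) / x.size
    return x, y

rng = np.random.default_rng(0)
r = rng.poisson(lam=20, size=500)  # illustrative integer data, not from the thread

x, y = ecdf(r)
fig, ax = plt.subplots()
# Same points as plot(sorted(r)), with the axes swapped and the rank divided by n.
ax.step(x, y, where="post")
ax.set_xlabel("value")
ax.set_ylabel("fraction of samples <= value")
```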

2014-05-30 15:11 GMT-07:00 Mark Voorhies <mar...@uc...>:

> On 05/30/2014 08:25 AM, Antony Lee wrote:
>
>> I may still need to bin data, e.g. when the data range is "large", or at
>> least not small compared to the number of data points.
>> Antony
>>
>
> Two alternatives to histograms that you might consider:
>
> Kernel density estimation (KDE)
>
> * This blog post has a good discussion motivating KDE from issues with bin
> choice in histograms:
>   http://www.mglerner.com/blog/?p=28
> * And this follow up explores the various KDE implementations in the
> "Scientific Python" stack:
>   http://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/
>
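To address Antony's concern about how the bandwidth was selected, the sketch below uses a hand-rolled Gaussian KDE that makes the bandwidth an explicit, reportable number. It assumes NumPy only and applies Scott's rule of thumb in the form h = std(data, ddof=1) * n**(-1/5). Scott's rule is also the default `bw_method` of `scipy.stats.gaussian_kde`. The helper names and the sample data are illustrative.

```python
import numpy as np

def scott_bandwidth(data):
    """Scott's rule for 1-D data: sample std (ddof=1) times n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    return data.std(ddof=1) * data.size ** (-1 / 5)

def gaussian_kde_1d(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `data` at each grid point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample: shape (len(grid), n).
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by the bandwidth so the estimate integrates to 1.
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
r = rng.normal(loc=0.0, scale=1.0, size=200)  # illustrative data
h = scott_bandwidth(r)
grid = np.linspace(r.min() - 3 * h, r.max() + 3 * h, 400)
density = gaussian_kde_1d(r, grid, h)
print(f"bandwidth (Scott's rule): {h:.3f}")  # report it alongside the plot
```

Printing (or annotating) `h` next to the figure is one way to make the smoothing choice visible to readers rather than hidden.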
> A rank vs. value plot, e.g.:
>
>    plot(sorted(r))
>
> This is horizontal for peaks (lots of copies of similar values) and
> vertical for tails/gaps, so it presents the same information as a
> histogram, but without requiring bin choice.
>
> --Mark
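Mark's one-liner `plot(sorted(r))` assumes a pylab-style namespace. A self-contained version using the pyplot interface might look like the sketch below. The bimodal sample data is made up to show the flat runs at the two peaks and the steep section across the gap between them.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display the figure interactively
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Illustrative bimodal integer data: two clusters with a gap between them.
r = np.concatenate([rng.poisson(10, 300), rng.poisson(40, 300)])

values = np.sort(r)             # same ordering as sorted(r), as a NumPy array
ranks = np.arange(values.size)

fig, ax = plt.subplots()
ax.plot(ranks, values)          # flat where values repeat (peaks), steep across gaps
ax.set_xlabel("rank")
ax.set_ylabel("value")
```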
>
>
>
>>
>> 2014-05-30 5:03 GMT-07:00 Yoshi Rokuko <yo...@ro...>:
>>
>>> On Thu, 29 May 2014 14:14:52 -0700, Antony Lee <ant...@be...> wrote:
>>>
>>>> Hi,
>>>> When histogramming integer data, is there an easy way to tell
>>>> matplotlib that I want a certain number of bins, and each bin to
>>>> cover an equal number of integers (except possibly the last one)?
>>>> This would avoid having some bins higher than others merely because
>>>> they cover more integers. I know I can pass in an explicit bins array
>>>> (something like list(range(min, max, (max-min)//n)) + [max]), but I was
>>>> hoping for something simpler, like hist(data, nbins=42,
>>>> equal_integer_coverage=True).
>>>> Best,
>>>> Antony
>>>>
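Antony's request can be approximated with a small helper that computes the bin edges itself. This is a sketch under stated assumptions, not a Matplotlib feature. `integer_bin_edges` is a hypothetical name. The bin width is rounded up, so a range that does not divide evenly may give fewer than `nbins` bins. The edges sit on half-integers so that no value falls exactly on a bin boundary.

```python
import numpy as np

def integer_bin_edges(data, nbins):
    """Hypothetical helper: bin edges for integer data such that every bin covers
    the same number of consecutive integers, except possibly the last one."""
    lo, hi = int(np.min(data)), int(np.max(data))
    span = hi - lo + 1                     # number of integers in [lo, hi]
    width = -(-span // nbins)              # ceil(span / nbins) integers per bin
    starts = np.arange(lo, hi + 1, width)  # first integer covered by each bin
    # Shift edges to half-integers; the final edge closes the last
    # (possibly narrower) bin just above hi.
    return np.append(starts, hi + 1) - 0.5

data = np.array([0, 1, 1, 2, 5, 7, 9, 9])
edges = integer_bin_edges(data, nbins=4)    # -> [-0.5, 2.5, 5.5, 8.5, 9.5]
counts, _ = np.histogram(data, bins=edges)  # -> [4, 1, 1, 2]
```

The returned edges can be passed directly as `bins=` to `plt.hist` or `np.histogram`.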
>>>
>>> Int data is discrete. For discrete variables you don't need bins: you
>>> don't estimate the frequency distribution, you know it exactly by
>>> counting.
>>>
>>> Of course you could do that with the hist function:
>>>
>>>  pl.hist(r, np.arange(min(r)-0.5, max(r)+1.5), histtype='step')
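Yoshi's point that integer data can be counted exactly can be checked directly. `np.unique(..., return_counts=True)` gives the exact frequency of each value. A histogram with unit-width bins centred on the integers (the `np.arange(min(r)-0.5, max(r)+1.5)` edges in the `pl.hist` call) reproduces the same counts. This is a minimal sketch assuming NumPy, and the sample data is illustrative.

```python
import numpy as np

r = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])  # illustrative integer data

# Exact frequencies: no binning involved, just counting each distinct value.
values, counts = np.unique(r, return_counts=True)

# Unit-width bins centred on each integer, as in the pl.hist call.
edges = np.arange(r.min() - 0.5, r.max() + 1.5)
hist_counts, _ = np.histogram(r, bins=edges)

# The histogram has one bin per integer in [min, max], including values that
# never occur (count 0); np.bincount gives the same array after shifting by the minimum.
same_as_bincount = np.array_equal(hist_counts, np.bincount(r - r.min()))
```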
>>>
>>> _______________________________________________
>>> Matplotlib-users mailing list
>>> Mat...@li...
>>> https://lists.sourceforge.net/lists/listinfo/matplotlib-users