From: Antony L. <ant...@be...> - 2014-05-30 22:23:07
Yes, I understand there are alternatives -- but I still think a simple,
binned histogram is a fairly basic feature. KDEs are nice but can easily be
overtweaked (if I see one, I certainly want to know how the bandwidth was
selected; otherwise it is no better than a histogram -- even worse, as the
issue is now hidden). CDFs (essentially, your second proposition) can be
useful, but some kinds of data are traditionally represented as histograms,
and CDFs would only confuse readers.

Antony

2014-05-30 15:11 GMT-07:00 Mark Voorhies <mar...@uc...>:

> On 05/30/2014 08:25 AM, Antony Lee wrote:
>
>> I may still need to bin data, e.g. when the data range is "large", or at
>> least not small compared to the number of data points.
>>
>> Antony
>
> Two alternatives to histograms that you might consider:
>
> Kernel density estimation (KDE)
>
> * This blog post has a good discussion motivating KDE from issues with bin
>   choice in histograms:
>   http://www.mglerner.com/blog/?p=28
> * And this follow-up explores the various KDE implementations in the
>   "Scientific Python" stack:
>   http://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/
>
> A rank vs. value plot, e.g.:
>
>     plot(sorted(r))
>
> This is horizontal for peaks (lots of copies of similar values) and
> vertical for tails/gaps, so it presents the same information as a
> histogram, but without requiring bin choice.
>
> --Mark
>
>> 2014-05-30 5:03 GMT-07:00 Yoshi Rokuko <yo...@ro...>:
>>
>>> On Thu, 29 May 2014 14:14:52 -0700, Antony Lee <ant...@be...> wrote:
>>>
>>>> Hi,
>>>> When histogramming integer data, is there an easy way to tell
>>>> matplotlib that I want a certain number of bins, and each bin to
>>>> cover an equal number of integers (except possibly the last one)?
>>>> (This is to avoid having some bins higher than others merely because
>>>> they cover more integers.) I know I can pass in an explicit bins array
>>>> (something like list(range(min, max, (max-min)//n)) + [max]) but I was
>>>> hoping for something simpler, like hist(data, nbins=42,
>>>> equal_integer_coverage=True).
>>>>
>>>> Best,
>>>> Antony
>>>
>>> Integer data is discrete. For discrete variables you don't need bins:
>>> you don't estimate the frequency distribution, you know it exactly by
>>> counting.
>>>
>>> Of course you could do that with the hist function:
>>>
>>>     pl.hist(r, np.arange(min(r)-0.5, max(r)+1.5), histtype='step')
>>>
>>> _______________________________________________
>>> Matplotlib-users mailing list
>>> Mat...@li...
>>> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
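
For reference, the binning requested in the original message can be approximated with a small helper that builds the explicit bins array. This is only a sketch: `equal_integer_bin_edges` is a hypothetical name, not a matplotlib or NumPy function, and it assumes integer-valued data. It places edges on half-integers (as in the `np.arange(min(r)-0.5, ...)` suggestion above), so no integer falls on a bin boundary. Every bin has the same width, and the last bin may extend past the largest value. Because the width is rounded up, the result may have fewer than `nbins` bins.

```python
import math

import numpy as np


def equal_integer_bin_edges(data, nbins):
    """Return bin edges such that every bin spans the same number of integers.

    Hypothetical helper (not part of matplotlib or NumPy); assumes `data`
    holds integer values. Edges sit on half-integers so that each integer
    falls strictly inside exactly one bin.
    """
    if nbins < 1:
        raise ValueError("nbins must be at least 1")
    lo, hi = int(np.min(data)), int(np.max(data))
    n_values = hi - lo + 1                  # distinct integers in [lo, hi]
    width = math.ceil(n_values / nbins)     # integers covered by each bin
    # The stop value is one full width past hi + 0.5, so the final edge is
    # always at or beyond the largest data value.
    return np.arange(lo - 0.5, hi + 0.5 + width, width)


data = np.arange(10)                        # the integers 0..9
edges = equal_integer_bin_edges(data, 3)
counts, _ = np.histogram(data, bins=edges)
```

The resulting array can be passed directly as the `bins` argument, e.g. `pl.hist(data, bins=edges)`.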