Histogram

In statistics, a histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram is a representation of tabulated frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data. A histogram may also be normalized displaying relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent, and often are chosen to be of the same size. The rectangles of a histogram are drawn so that they touch each other to indicate that the original variable is continuous.

Source: Wikipedia (en)
See also: Wikipedia (pt)
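The relations in the definition above (height equals frequency divided by bin width, total area equals the number of samples, normalized area equals 1) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the unequal bin edges and all variable names are illustrative choices, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
data = rng.normal(0, 1, 1000)

# Deliberately unequal bins that still cover every sample.
edges = np.array([data.min(), -1.0, -0.5, 0.0, 0.25, 1.0, data.max()])
counts, _ = np.histogram(data, bins=edges)
widths = np.diff(edges)

# Rectangle height = frequency / width, so that area = frequency.
freq_density = counts / widths
# Normalized height: relative frequency / width, so that the total area is 1.
rel_density = counts / (counts.sum() * widths)

total_area = np.sum(freq_density * widths)       # equals the number of samples
normalized_area = np.sum(rel_density * widths)   # equals 1 (up to rounding)

# NumPy's own normalization, for comparison with the manual one.
numpy_density, _ = np.histogram(data, bins=edges, density=True)

print(total_area, normalized_area)
print(np.allclose(rel_density, numpy_density))
```

With unequal widths the raw counts alone would be misleading, which is why the height is defined as a density rather than a count.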

Algorithm for histogram done in Python (matplotlib)

There is a direct function to plot a histogram in most 2D plotting libraries. In Matplotlib this is done with the function hist. The most basic parameters are the data itself, the number of bins, and the color. The following code shows a series of examples with these and other parameters.

from __future__ import division 
import numpy as np
import matplotlib.pyplot as plt

a = np.random.normal(0,1,1000)

plt.hist(a,bins=30,color='green',alpha=0.3)
plt.show()

And the result is (alpha is the color transparency, with 1 being completely opaque):

By default the y-axis shows the absolute count in each bin. This can be changed with the parameter density, which older releases called normed. The NumPy documentation describes the old normed keyword as follows:

This keyword is deprecated in Numpy 1.6 due to confusing/buggy behavior. It will be removed in Numpy 2.0. Use the density keyword instead. If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that this latter behavior is known to be buggy with unequal bin widths; use density instead.

This matters when you want to compare two data sets that do not have the same number of samples: with raw counts, the bars of the larger set are much taller without the values being any more probable.
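To see the effect in numbers, here is a small sketch that bins two samples of different sizes over the same edges, once as raw counts and once as densities. The sample sizes, the shared edges, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
small = rng.normal(0, 1, 1000)
large = rng.normal(0, 1, 10000)     # same distribution, ten times the samples

edges = np.linspace(-4, 4, 31)      # 30 shared bins so both sets are comparable

counts_small, _ = np.histogram(small, bins=edges)
counts_large, _ = np.histogram(large, bins=edges)
dens_small, _ = np.histogram(small, bins=edges, density=True)
dens_large, _ = np.histogram(large, bins=edges, density=True)

# Raw peaks differ by roughly the ratio of the sample sizes.
print(counts_small.max(), counts_large.max())
# Density peaks are of similar size, because each set is normalized to area 1.
print(dens_small.max(), dens_large.max())
```

The same comparison can be drawn with plt.hist by passing both data sets in a list, as in the next example.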

from __future__ import division 
import numpy as np
import matplotlib.pyplot as plt

a = np.random.normal(0,1,1000)
b = np.random.normal(0,1,1000)

plt.hist([a,b],bins=30,color=['green','red'],density=True,alpha=0.3)  # density replaces the deprecated normed keyword
plt.grid() # This shows a grid in the plot
plt.show()

And the result is:

You can calculate the histogram values directly with NumPy and then plot the distribution using other tools in matplotlib.
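Before plotting, it may help to look at what np.histogram returns: an array of per-bin values and an array of bin edges that is one element longer. The sketch below is illustrative only; the names counts, edges, centers and widths are not from the original.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(0, 1, 1000)

# np.histogram returns the per-bin counts and the bin edges.
counts, edges = np.histogram(a, bins=30)

centers = (edges[:-1] + edges[1:]) / 2   # one center per bin
widths = np.diff(edges)                  # equal widths when bins is an integer

print(counts.shape, edges.shape)         # 30 counts, 31 edges
print(counts.sum())                      # every sample falls into some bin
```

Arrays such as centers, counts and widths are what you would hand to other plotting calls, for example plt.bar or plt.plot.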

from __future__ import division 
import numpy as np
import matplotlib.pyplot as plt

a = np.random.normal(0,1,1000)
h = np.histogram(a,bins=30,density=True)  # h[0]: densities, h[1]: bin edges
centers = (h[1][:-1]+h[1][1:])/2  # bin centers, one per density value

from scipy import interpolate  # using scipy for cubic interpolation of distribution

f = interpolate.interp1d(centers,h[0],kind='cubic')  # interpolate at bin centers, not left edges
xnew = np.linspace(centers[0],centers[-1],1000)
plt.fill_between(xnew,f(xnew),color='green',alpha=0.5)
plt.xlim(centers[0],centers[-1])  # Showing the plot in this x range (showing, not calculating)
plt.ylim(0,h[0].max()+0.1*h[0].max()) # Showing the plot in this y range (showing, not calculating)
plt.grid() # shows grid
plt.show()

And the result is:

There is a histtype argument that lets you change the style of the histogram (known styles are 'bar', 'barstacked', 'step' and 'stepfilled'). You can also use range to limit the histogram calculation to a user-defined range.
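What range does to the calculation can be checked with np.histogram, which plt.hist uses for the binning. This is a minimal sketch; the variable names and the range of -1 to 1 are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(0, 1, 1000)

# Bin only the samples between -1 and 1; the 30 bins span exactly that range.
counts, edges = np.histogram(a, bins=30, range=(-1, 1))

# Count the samples inside the range by hand, for comparison.
inside = np.count_nonzero((a >= -1) & (a <= 1))

print(edges[0], edges[-1])     # first and last edge match the requested range
print(counts.sum(), inside)    # samples outside the range are ignored
```

So range changes which samples are counted, unlike plt.xlim, which only changes what part of the plot is shown.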

from __future__ import division 
import numpy as np
import matplotlib.pyplot as plt


a = np.random.normal(0,1,1000)

plt.hist(a,bins=30,color='green',histtype='stepfilled',range=[-1,1],alpha=0.3)
plt.show()

The result is:

Also notice that you can mix histogram plots with other kinds of plots. In the following example, points, lines and text are plotted in the same window.

from __future__ import division 
import numpy as np
import matplotlib.pyplot as plt


a = np.random.normal(0,1,1000)

h = plt.hist(a,bins=30,color='green',histtype='stepfilled',density=True,alpha=0.3)  # h[0]: bin heights, h[1]: bin edges
half = (h[1][1]-h[1][0])/2
plt.plot(h[1][:-1]+half,h[0],linestyle='--',color='red',alpha=0.7)
plt.scatter(h[1][:-1]+half,h[0],marker='o',color='red',s=h[0]*900,alpha=0.7)
plt.xlim(h[1].min(),h[1].max())
plt.ylim(0,h[0].max()+0.1*h[0].max())
bbox_props = dict(boxstyle="round",fc="w",ec="0.5",alpha=0.9)
for i in range(0,h[0].shape[0],3):  # label every third bin (range replaces Python 2's xrange)
    plt.text(h[1][i]+half,h[0][i],'%.2f'%h[1][i],ha='center',size=15,bbox=bbox_props,rotation=0)
plt.grid()
plt.show()

The result is:
