From: chtan <ch...@un...> - 2015-08-26 05:32:03
|
Hi,

the outliers in the boxplot do not seem to be drawn in the following extreme scenario:

    Data Value: 1, Frequency: 5
    Data Value: 2, Frequency: 100
    Data Value: 3, Frequency: 5

Here, Q1 = Q2 = Q3, so IQR = 0. Data values 1 and 3 are therefore outliers according to the definition in the API (refer to the parameter "whis" under "boxplot": http://matplotlib.org/api/pyplot_api.html).

But the code below produces a boxplot that shows them as max-min whiskers (rather than fliers):

    import matplotlib.pyplot as plt
    data = 100 * [2] + 5 * [1] + 5 * [3]
    ax = plt.gca()
    bp = ax.boxplot(data, showfliers=True)
    for flier in bp['fliers']:
        flier.set(marker='o', color='gray')

<http://matplotlib.1069221.n5.nabble.com/file/n46027/figure_1.png>

What I thought it would look like is obtained by perturbing half of the data points 2 to 2.000001:

<http://matplotlib.1069221.n5.nabble.com/file/n46027/figure_2.png>

Is this a bug, or am I not getting something right?

rgds
marcus

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/boxplot-behaviour-in-an-extreme-scenario-tp46027.html
Sent from the matplotlib - users mailing list archive at Nabble.com.
|
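The arithmetic behind the question can be checked without matplotlib. What follows is a minimal sketch, assuming linearly interpolated quartiles (the standard library's "inclusive" method) and the default whis = 1.5. The variable names are illustrative and are not taken from matplotlib.

```python
from statistics import quantiles

# Same data as in the message: 100 twos, 5 ones, 5 threes.
data = 100 * [2] + 5 * [1] + 5 * [3]

# Quartiles with linear interpolation between order statistics.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences as described for the "whis" parameter (assumed default: 1.5).
whis = 1.5
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Points strictly outside the fences would be fliers under that definition.
outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("number of points outside the fences:", len(outliers))
```

With IQR = 0 both fences collapse onto the median, so every 1 and every 3 falls outside them. This is why the documented definition suggests they should be drawn as fliers.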
From: Paul H. <pmh...@gm...> - 2015-08-26 08:08:32
|
Are you running python 2 or python 3? If you're on python 2, what happens if you add "from __future__ import division" to the top of your script?
|
From: chtan <ch...@un...> - 2015-08-27 03:35:26
|
I'm on python 2. I get the same outputs after adding "from __future__ import division".
|
From: Paul H. <pmh...@gm...> - 2015-08-26 08:16:25
|
Your perturbed and unperturbed scenarios draw the same figure on my machine (mpl v1.4.1).

The reason why you don't get any outliers is the following: boxplot uses matplotlib.cbook.boxplot_stats under the hood to compute where everything will be drawn. If you look in there, you'll see this little nugget:

    # interquartile range
    stats['iqr'] = q3 - q1
    if stats['iqr'] == 0:
        whis = 'range'

When whis = 'range', the whiskers fall back to extending to the min and max. So that is at least the intent of the code. I'm open to a different interpretation of what should be happening, though.
|
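To make the effect of that fallback concrete, here is a simplified sketch of the whisker logic described above. It is not matplotlib's actual implementation: the function name `whisker_stats` and the quartile method are assumptions chosen for illustration, and only the zero-IQR branch mirrors the snippet quoted in the message.

```python
from statistics import quantiles


def whisker_stats(data, whis=1.5):
    """Hypothetical helper: whisker ends and fliers for one data set."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Fallback quoted in the thread: behave like whis='range',
        # i.e. stretch the whiskers to the extremes of the data.
        whislo, whishi = min(data), max(data)
    else:
        # Otherwise each whisker stops at the most extreme data point
        # that still lies within whis * IQR of the box.
        lower_fence = q1 - whis * iqr
        upper_fence = q3 + whis * iqr
        whislo = min(x for x in data if x >= lower_fence)
        whishi = max(x for x in data if x <= upper_fence)
    fliers = sorted(x for x in data if x < whislo or x > whishi)
    return {"whislo": whislo, "whishi": whishi, "fliers": fliers}


original = 100 * [2] + 5 * [1] + 5 * [3]
perturbed = 50 * [2] + 50 * [2.000001] + 5 * [1] + 5 * [3]

result_original = whisker_stats(original)
result_perturbed = whisker_stats(perturbed)

print(result_original["whislo"], result_original["whishi"],
      len(result_original["fliers"]))
print(result_perturbed["whislo"], result_perturbed["whishi"],
      len(result_perturbed["fliers"]))
```

Under these assumptions the original data produces whiskers at 1 and 3 with no fliers, while the perturbed data has a tiny but non-zero IQR and so reports the ones and threes as fliers. This matches the two figures described in the first message.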
From: chtan <ch...@un...> - 2015-08-27 03:43:43
|
Uh, now I understand why it's behaving this way. Tx Paul.

From the documentation, it seems natural to expect the behaviour to be uniform throughout the meaningful range for IQR.

How may I go about searching for the responsible code on my own in situations like this? From the perplexing behaviour to the little nugget in matplotlib.cbook.boxplot_stats, the path isn't clear to me.

Any general advice?
|
From: Paul H. <pmh...@gm...> - 2015-08-27 04:37:17
|
Even though I'm familiar with the boxplot source code, I largely use IPython for quick investigations like this. In IPython, doing something like "matplotlib.axes.Axes.boxplot??" shows the full source code for that function.

Then I saw/remembered that boxplot now just calls matplotlib.cbook.boxplot_stats and passes the results to matplotlib.axes.Axes.bxp. So then I did "matplotlib.cbook.boxplot_stats??" to see how the whiskers were computed.

-paul
|
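Outside IPython, roughly the same investigation can be done with the standard library's inspect module. The sketch below assumes matplotlib may or may not be installed, so it falls back to a standard-library function purely to stay runnable. The name `target` is illustrative.

```python
import inspect

try:
    # The function discussed in the thread, if matplotlib is available.
    from matplotlib.cbook import boxplot_stats as target
except ImportError:
    # Fallback so the example still runs without matplotlib.
    from statistics import quantiles as target

# Roughly what IPython's "target??" displays: the defining file and the source.
source_file = inspect.getsourcefile(target)
source = inspect.getsource(target)

print(source_file)
for line in source.splitlines()[:20]:
    print(line)
```

Reading the printed source and then repeating the step on whatever helper it calls is one way to follow the path from a plotting method down to the code that computes the whiskers.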
From: chtan <ch...@un...> - 2015-08-28 02:44:25
|
Great, thanks!

Rgds
marcus
|