|
From: Gökhan S. <gok...@gm...> - 2009-10-27 11:56:46
|
Hello,
Consider this sample two columns of data:
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 1693.9069
999999.9999 1676.1059
999999.9999 1621.5875
651.8040 1542.1373
691.0138 1650.4214
678.5558 1710.7311
621.5777 999999.9999
644.8341 999999.9999
696.2080 999999.9999
Putting this data into a file, say "sample.data", and loading it with:
a,b = np.loadtxt('sample.data', dtype="float").T
I[16]: a
O[16]:
array([ 1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
6.51804000e+02, 6.91013800e+02, 6.78555800e+02,
6.21577700e+02, 6.44834100e+02, 6.96208000e+02])
I[17]: b
O[17]:
array([ 999999.9999, 999999.9999, 999999.9999, 1693.9069,
1676.1059, 1621.5875, 1542.1373, 1650.4214,
1710.7311, 999999.9999, 999999.9999, 999999.9999])
### Interestingly, the second column is displayed as is, but the values of a
are shown in a different format. Why could this be happening? Any ideas? Anyway, back to
masked arrays:
I[24]: am = ma.masked_values(a, value=999999.9999)
I[25]: am
O[25]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
644.8341 696.208],
mask = [ True True True True True True False False False
False False False],
fill_value = 999999.9999)
I[30]: bm = ma.masked_values(b, value=999999.9999)
I[31]: am
O[31]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
644.8341 696.208],
mask = [ True True True True True True False False False
False False False],
fill_value = 999999.9999)
So far so good. A few basic checks:
I[33]: am/bm
O[33]:
masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712
0.39664667346 -- -- --],
mask = [ True True True True True True False False False
True True True],
fill_value = 999999.9999)
I[34]: mean(am/bm)
O[34]: 0.41266624676580849
Unfortunately, matplotlib.mlab's prctile cannot handle this division:
I[54]: prctile(am/bm, p=[5,25,50,75,95])
O[54]:
array([ 3.96646673e-01, 6.21577700e+02, 1.00000000e+06,
1.00000000e+06, 1.00000000e+06])
This also results in wrong-looking box-and-whisker plots.
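A minimal sketch of one likely cause, assuming prctile converts its input to a plain ndarray before sorting (an assumption about its internals, not something confirmed in this thread). Converting a masked array that way drops the mask, so all 12 slots take part in the calculation instead of the 3 valid ratios:

```python
import numpy as np
import numpy.ma as ma

# The two columns from sample.data, typed in directly.
a = np.array([999999.9999] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([999999.9999] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [999999.9999] * 3)

am = ma.masked_values(a, value=999999.9999)
bm = ma.masked_values(b, value=999999.9999)
r = am / bm

# A plain-ndarray view keeps every slot, including whatever is stored
# underneath the masked entries.
plain = np.asarray(r)

# compressed() keeps only the unmasked entries.
valid = r.compressed()

print(plain.size, valid.size)
```

Any function that works on the plain view rather than on the compressed data will mix the placeholder values into its percentiles.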
Testing further with scipy.stats functions yields the expected, correct results:
I[55]: stats.scoreatpercentile(am/bm, per=5)
O[55]: 0.40877012449846228
I[49]: stats.scoreatpercentile(am/bm, per=25)
O[49]:
masked_array(data = --,
mask = True,
fill_value = 1e+20)
I[56]: stats.scoreatpercentile(am/bm, per=95)
O[56]:
masked_array(data = --,
mask = True,
fill_value = 1e+20)
Any confirmation?
--
Gökhan
|
|
From: <jos...@gm...> - 2009-10-27 13:25:34
|
On Tue, Oct 27, 2009 at 7:56 AM, Gökhan Sever <gok...@gm...> wrote:
> Testing further with scipy.stats functions yields expected correct results:
These should not be the correct results if you use scipy.stats.scoreatpercentile.
It doesn't have correct missing-value handling: it treats NaNs or
mask/fill values as regular numbers sorted to the end.
stats.mstats.scoreatpercentile is the corresponding function for
masked arrays.
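The effect can be sketched with NumPy alone. The example below fills the masked slots with the placeholder value and takes percentiles over all 12 numbers, using np.percentile as a stand-in (an assumption: its default linear interpolation is taken to match what the non-masked scipy function did at the time). Under that assumption the 5th percentile comes out at about 0.40877, the value reported for per=5 in the original post, while higher percentiles land among the placeholders:

```python
import numpy as np
import numpy.ma as ma

a = np.array([999999.9999] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([999999.9999] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [999999.9999] * 3)

r = ma.masked_values(a, 999999.9999) / ma.masked_values(b, 999999.9999)

# Treat the masked slots as ordinary numbers: 3 valid ratios plus
# 9 placeholders that sort to the end.
filled = r.filled(999999.9999)
p5_all = np.percentile(filled, 5)
p25_all = np.percentile(filled, 25)

# Percentiles over the 3 valid ratios only.
valid = r.compressed()
p5_valid = np.percentile(valid, 5)
p25_valid = np.percentile(valid, 25)

print(p5_all, p25_all)
print(p5_valid, p25_valid)
```

The per=5 result only looks plausible because the interpolation happens to fall between two valid ratios; it is still computed over 12 points rather than 3.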
(BTW, I wasn't able to quickly copy and paste your example because
MaskedArrays don't seem to have a constructive __repr__, i.e.
no commas.)
I don't know anything about the matplotlib story.
Josef
> _______________________________________________
> NumPy-Discussion mailing list
> Num...@sc...
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
|
|
From: Gökhan S. <gok...@gm...> - 2009-10-28 13:47:19
|
On Tue, Oct 27, 2009 at 8:25 AM, <jos...@gm...> wrote:
> This should not be the correct results if you use
> scipy.stats.scoreatpercentile,
> it doesn't have correct missing value handling, it treats nans or
> mask/fill values as regular numbers sorted to the end.
>
> stats.mstats.scoreatpercentile is the corresponding function for
> masked arrays.
>
>
Thanks for the suggestion. I forgot that such a module exists. It yields
better results.
I[14]: st.mstats.scoreatpercentile(r, per=25)
O[14]:
masked_array(data = 0.401055201111,
mask = False,
fill_value = 1e+20)
I[17]: st.scoreatpercentile(r, per=25)
O[17]:
masked_array(data = --,
mask = True,
fill_value = 1e+20)
I usually fall into traps when using masked arrays. Hopefully I will figure
these out before I make funnier mistakes in my analysis.
Besides, it would be nice if the "per" argument accepted a sequence
instead of a single item, like matplotlib's prctile, so it could be used as: ...(array,
per=[5,25,50,75,95]) in one call.
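One possible workaround, sketched here with a hypothetical helper named masked_percentiles (not part of NumPy, SciPy, or matplotlib): drop the masked entries first, then hand the whole list of percentiles to np.percentile in a single call. Note that np.percentile interpolates linearly by default, so its numbers can differ slightly from stats.mstats.scoreatpercentile.

```python
import numpy as np
import numpy.ma as ma


def masked_percentiles(x, per):
    """Hypothetical helper: percentiles over the unmasked entries of x.

    `per` may be a single number or a sequence of numbers in [0, 100].
    """
    valid = ma.compressed(x)  # 1-D array of unmasked values only
    if valid.size == 0:
        # Nothing valid to rank: return NaN(s) shaped like `per`.
        return np.full(np.shape(per), np.nan)
    return np.percentile(valid, per)


a = np.array([999999.9999] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([999999.9999] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [999999.9999] * 3)
r = ma.masked_values(a, 999999.9999) / ma.masked_values(b, 999999.9999)

out = masked_percentiles(r, [5, 25, 50, 75, 95])
print(out)
```

scipy.stats.mstats.mquantiles is another option: it accepts a sequence of probabilities given as fractions (0.05 rather than 5) and respects the mask.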
> (BTW I wasn't able to quickly copy and past your example because
> MaskedArrays don't seem to have a constructive __repr__, i.e.
> no commas)
>
>
You can copy and paste the sample data from this link. When I copied it from a
text file into Gmail, the original layout of the data somehow got distorted.
http://code.google.com/p/ccnworks/source/browse/trunk/sample.data
> I don't know anything about the matplotlib story.
>
> Josef
--
Gökhan
|