From: Bruce S. <so...@ui...> - 2004-05-13 21:21:12
Hi,

Raymond D. Hettinger is writing a general statistics module, 'statistics.py: A collection of functions for summarizing data', that is somewhere in a Python CVS (I cannot find the exact reference, but it appeared in a fairly recent Python thread). He uses a one-pass algorithm from Knuth for the variance that has good numerical stability.

Below is a rather rough version, adapted from my own situation (masked arrays), that uses Knuth's algorithm for the variance. It lacks features such as dimension checking (it assumes the variance can be computed) and documentation.

Regards
Bruce Southey

import numarray

def SummaryStats(Matrix):
    mshape=Matrix.getshape()
    nrows=mshape[0]
    ncols=mshape[1]
    #print nrows, ncols
    # Create one-dimensional arrays (one entry per column) to hold statistics
    N_obs =numarray.zeros(ncols, type='Float64')
    Sum =numarray.zeros(ncols, type='Float64')
    Var =numarray.zeros(ncols, type='Float64')
    Min =numarray.zeros(ncols, type='Float64')
    Max =numarray.zeros(ncols, type='Float64')
    Mean =numarray.zeros(ncols, type='Float64')
    AdjM =numarray.zeros(ncols, type='Float64')
    NewM =numarray.zeros(ncols, type='Float64')
    DifM =numarray.zeros(ncols, type='Float64')
    for row in range(nrows):
        for col in range(ncols):
            t_value=Matrix[row,col]
            N_obs[col] = N_obs[col] + 1
            Sum[col] = Sum[col] + t_value
            if N_obs[col]==1:
                # The first observation seeds the mean and both extremes;
                # starting Min and Max at zero would give wrong answers for
                # all-positive or all-negative columns.
                Mean[col]=t_value
                Min[col]=t_value
                Max[col]=t_value
            if t_value > Max[col]:
                Max[col]=t_value
            if t_value < Min[col]:
                Min[col]=t_value
            # Update the running mean; DifM carries the rounding error of
            # the previous update (compensated summation).
            AdjM[col]=(t_value-Mean[col])/(N_obs[col])-DifM[col]
            NewM[col]=Mean[col]+AdjM[col]
            DifM[col]=(NewM[col]-Mean[col])-AdjM[col]
            # Accumulate the sum of squared deviations using old and new mean
            Var[col] = Var[col] + (t_value-Mean[col])*(t_value-NewM[col])
            Mean[col] = NewM[col]
    print 'N_obs\n', N_obs
    print 'Sum\n', Sum
    print 'Min\n', Min
    print 'Max\n', Max
    print 'Mean\n', Mean
    # Sample variance: divide by the per-column count minus one
    print 'Var\n', Var/(N_obs-1)

if __name__ == '__main__':
    MValues=numarray.array([[1,2,1],[3,2,2],[5,1,1],[4,3,2]])
    SummaryStats(MValues)

---- Original message ----
>Date: Thu, 13 May 2004 15:42:30 -0400
>From: "Perry Greenfield" <pe...@st...>
>Subject: RE: [Numpy-discussion] Getting the indexes of the myarray.min()
>To: "Russell E Owen"
<rowen@u.washington.edu>, "numarray" <num...@li...>
>
>> Russell E Owen wrote:
>>
>> At 9:27 AM -0400 2004-05-13, Perry Greenfield wrote:
>> >... One has to trade off the number of such functions
>> >against the speed savings. Another example is getting max and min values
>> >for an array. I've long thought that this is so often done they could
>> >be done in one pass. There isn't a function that does this yet though.
>>
>> Statistics is another area where multiple return values could be of
>> interest -- one may want the mean and std dev, and making two passes
>> is wasteful (since some of the same info needs to be computed both
>> times).
>>
>> A do-all function that computes min, min location, max, max location,
>> mean and std dev all at once would be nice (especially if the
>> returned values were accessed by name, rather than just being a tuple
>> of values, so they could be referenced safely and readably).
>>
>> -- Russell
>>
>We will definitely add something like this for 1.0 or 1.1.
>(but probably for min and max location, it will just be
>for the first encountered).
>
>Perry
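
The "do-all" function discussed in this thread (min, min location, max, max location, mean and std dev in one pass, with results accessed by name) could be sketched roughly as follows. This is only an illustration in present-day Python 3 using the standard library, not numarray code and not what was eventually added to the library. The names Summary and summarize and the field names are hypothetical. It uses the plain one-pass mean/variance update, without the rounding-error compensation term (DifM) of the code posted in this message, and it reports the first location on ties, as Perry suggests.

```python
import math
from collections import namedtuple

# Hypothetical result type; the field names are illustrative only.
Summary = namedtuple("Summary", "n min argmin max argmax mean std")


def summarize(values):
    """One-pass min, max (first locations), mean and sample std dev."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    lo = hi = None
    lo_idx = hi_idx = None
    for i, x in enumerate(values):
        n += 1
        # Strict comparisons keep the first location when values tie.
        if lo is None or x < lo:
            lo, lo_idx = x, i
        if hi is None or x > hi:
            hi, hi_idx = x, i
        # One-pass update: adjust the mean, then accumulate the squared
        # deviation using both the old and the new mean.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("summarize() needs at least one value")
    # Sample standard deviation (n - 1 divisor); undefined for one value.
    std = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return Summary(n, lo, lo_idx, hi, hi_idx, mean, std)


# First column of the test matrix used in the posted code.
stats = summarize([1, 3, 5, 4])
print(stats.min, stats.argmin, stats.max, stats.argmax)
print(stats.mean, stats.std)
```

Returning a named tuple lets callers write stats.mean or stats.argmax instead of relying on tuple positions, which is the readability point Russell raises.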