|
From: Travis O. <oli...@ie...> - 2005-12-02 18:36:35
|
Perry Greenfield wrote:

>For us, probably not critical since we have to do some rewriting anyway.
>(But it would be nice to retain for a while as deprecated).

Easy enough to do by defining an actual record array (however, see below). I've been retaining backwards compatibility in other ways while not documenting it. For example, you can actually now pass in strings like 'Int32' for types.

>But what about field names that don't map well to attributes?
>I haven't had a chance to reread the past emails but I seem to
>recall this was a significant issue. That would imply that .field()
>would be needed for those cases anyway.

What I'm referring to as the solution here is a slight modification to what Perry described. In other words, all arrays have the attribute .fields

You can set this attribute to a dictionary which will automagically give field names to any array. This dictionary has ordered lists of 'names', (optionally) 'titles', and "(data-descr, [offset])" lists which define the mapping. If offset is not given, then the "next-available" offset is assumed. The data-descr is either 1) a data-type or 2) a tuple of (data-type, shape). The data-type is either a defined data-type or alias, or an object with a .fields attribute that provides the same dictionary and an .itemsize attribute that computes the total size of the data-type.

You can get this attribute, which returns a special fields object (written in Python initially, like the flags attribute) that can look up field names like a dictionary, or with attribute access for names that are either 1) acceptable or 2) have a user-provided "python-name" associated with them.

Thus, .fields['home address'] would always work but .fields.hmaddr would only work if the user had previously made the association hmaddr -> 'home address' for the data type of this array. Thus 'home address' would be a title but hmaddr would be the name.

The records module would simply provide functions for making record arrays and a record data type.

Driving my thinking is the concept that the notion of a record array is really a description of the data type of the array (not the array itself). Thus, all the fields information should really just be part of the data type itself.

Now, I don't really want to create and register a new data type every time somebody has a new record layout. So, I've been re-thinking the notion of "registering a data-type". It seems to me that while it's O.K. to have a set of pre-defined data types, the notion of data-type ought to be flexible enough to allow the user to define one "on-the-fly".

I'm thinking of ways to do this right now. Any suggestions are welcome.

-Travis |
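The names/titles/offsets dictionary described in this message can be sketched with present-day NumPy, whose `np.dtype` constructor accepts a dictionary of that general shape. This is only an illustration under that assumption, not the scipy_core interface under discussion in 2005; the field names, titles and offsets below are made up for the example.

```python
import numpy as np

# Hypothetical record layout: each field has a Python-friendly name,
# a free-form title, a data-descr, and an explicit byte offset.
person = np.dtype({
    "names":   ["hmaddr", "age", "scores"],
    "titles":  ["home address", "age in years", "test scores"],
    "formats": ["S20", "<i4", ("<f8", (3,))],   # last one is (data-type, shape)
    "offsets": [0, 20, 24],
})

people = np.zeros(2, dtype=person)
people["hmaddr"][0] = b"1 Main St"
people["age"] = [30, 41]

# Both the name and the title are keys of the dtype's field mapping.
by_name = people["hmaddr"][0]
by_title = people["home address"][0]
itemsize = person.itemsize          # 20 + 4 + 3 * 8 bytes
```

Here the layout lives entirely in the data type, which is the point the message argues for: the array itself stays an ordinary homogeneous container.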
|
From: Travis O. <oli...@ie...> - 2005-12-02 17:13:15
|
> I'm not clear as to what the current design objective is and so I'll
> try to recap and perhaps expand my pieces in the referenced discussion
> to set out the sort of arrangement I would like to see.

I have two objectives:

1) Make the core scipy array object flexible enough to support a very good records sub-class. In other words, I wonder if the core scipy array object could be made flexible enough to be used as a decent record array by itself, without adding much difficulty. In the process, I'm really trying to understand how the data-type of an array should be generally considered. An array object that has this generic perspective on data-type is what should go into Python, I believe.

2) Make a (more) useful records subclass of the ndarray object that is perhaps easier for the end-user to use. Involved with this, of course, is making functions that make it easy to create a records sub-class.

> We are moving towards having a multi-dimensional array which can hold
> objects of fixed size and type, the smallest being one byte (although
> the void would appear to be a collection of no size objects). Most of
> the need, and thus the focus, is on numeric objects, ranging in size
> from Int8 to Complex64.
>
> The Record is a fixed size object containing fields. Each field has a
> name, an optional title and data of a fixed type (perhaps including
> another record instance and maybe arrays of fixed size?).

Right, so the record is really another kind of data-type. The concept of the multidimensional array does not need adjustment, but the concept of what constitutes a data-type may need some fixing up.

> In the example below, AddressRecord and PersonRecord would be
> sub-classes of Record where the fields are named and, optionally,
> field titles given. The names would be consistent with Python naming
> whereas the title could be any Python string.

I like the notion of titles and names. I think they are both useful.

> The use of attributes raises the possibility that one could have
> nested records. For example, suppose one has an address record:

Now, I'm in favor of attribute access. But nested records are possible without attribute access (it's just extra typing). It's the underlying structure that provides the possibility for nested records (and that's what I'm trying to figure out how to support, generally). If I can support this generally in the basic ndarray object by augmenting the notion of data-type as appropriate, then making a subclass that has the nice syntactic sugar is easy.

So, there are two issues really.

1) How to think about the data-type of a general ndarray object in order to support nested records in a straightforward way.

2) What should the syntactic sugar of a record array subclass be...

I suppose a third is

3) How much of the syntactic sugar should be applied to all ndarrays?

-Travis

> I see no need to have the attribute 'field' and would like to avoid
> the use of strings to identify a record component. This does require
> that fields be named as Python identifiers but is this restriction a
> killer?

For a record array subclass that may be true. But, as was raised by others in the previous thread, there are problems of "name-space" collision with the methods and attributes of the array that would prevent certain names from being used (and any additions to the methods and attributes of the array would cause incompatibilities with some people's records).

But, at this point, I like the readability of the attribute access approach and could support it.

-Travis |
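The name-space collision mentioned above is easy to reproduce with today's `numpy.recarray`, which is used here purely as a stand-in for the record subclass being designed in this thread. The field names are invented for the illustration: one clashes with an array method, one is not a valid Python identifier.

```python
import numpy as np

# "mean" collides with ndarray.mean; "home address" cannot be written
# after a dot; "age" is an ordinary, attribute-friendly name.
layout = [("mean", "f8"), ("home address", "S20"), ("age", "i4")]
rec = np.zeros(3, dtype=layout).view(np.recarray)

rec["mean"] = [1.0, 2.0, 3.0]
rec["age"] = [30, 41, 52]
rec["home address"][0] = b"1 Main St"

method_wins = callable(rec.mean)      # attribute lookup finds the method first
field_values = rec["mean"].tolist()   # string lookup still reaches the field
ages = rec.age.tolist()               # attribute access works for a clean name
address = rec["home address"][0]      # only string lookup can spell this name
```

String-keyed access therefore remains the fallback whenever attribute access is ambiguous or impossible, which matches the concern raised about `.field()`.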
|
From: Colin J. W. <cj...@sy...> - 2005-12-02 16:51:18
|
Travis Oliphant wrote:

> Christopher Hanley wrote:
>
>> Hi Travis,
>>
>> About a year ago (summer 2004) on the numpy distribution list there
>> was a lot of discussion of the records interface. I will dig through
>> my notes and put together a summary.
>
> Thanks for the pointers. I had forgotten about that discussion. I
> went back and re-read the thread.
>
> Here's a good link for others to re-read (the end of) this thread:
>
> http://news.gmane.org/find-root.php?message_id=%3cBD22BAC0.E9EB%25perry%40stsci.edu%3e
>
> I think some very good points were made. These points should be
> addressed from the context of scipy arrays which now support records
> in a very basic way. Because of this, we can support nested records
> of records --- but how is this to be presented to the user is still an
> open question (i.e. how do you build one...)
>
> I've finally been converted to believe that the notion of records is
> very important because it speaks of how to do the basic (typeless,
> mathless) array object that will go into Python correctly. If we can
> get the general records type done right, then all the other types are
> examples of it.
>
> Thus, I would like to revive discussion of the record object for
> inclusion in scipy core. I pretty much agree with the semantics that
> Perry described in his final email (is this all implemented in
> numarray, yet?), except I would agree with Francesc Alted that a
> titles or labels concept should be allowed.
> I'm more enthusiastic about code than discussion, so I'm hoping for a
> short-lived discussion followed by actual code. I'm ready to do the
> implementation this week (I've already borrowed lots of great code
> from numarray which makes it easier), but feel free to chime in even
> if you read this later.
>
> In my mind, the discussion about the records array is primarily a
> discussion about the records data-type. The way I'm thinking, the
> scipy ndarray is a homogeneous collection of the same "thing." The
> big change in scipy core is that Numeric used to allow only certain
> data types, but now the ndarray can contain an arbitrary "void" data
> type. You can also add data-types to scipy core. These data-types
> are "almost" full members of the scipy data-type community. The
> "almost" is because the N*N casting matrix is not updated (this would
> require a re-design of how casting is considered). At some point,
> I'd like to fix this wart and make it so that data-types can be added
> at will -- I think if we get the record type right, I'll be able to
> figure out how to do this.
>
> We need to add a "record" data-type to scipy. Then, any array can be
> of "record" type, and there will be an additional "array scalar" that
> is what is returned when selecting a single element from the array.
> So, a record array would simply be an array of "records" plus some
> extra stuff for dealing with the mapping from field names to actual
> segments of the array element (we may decide that this mapping is
> general enough that all scipy arrays should have the capability of
> assigning names to sub-bytes of its main data-type and means of
> accessing those sub-bytes in which case the subclass is unnecessary).
> Let me explain further: Right now, the machinery is in place in
> scipy_core to get and set in any ndarray (regardless of its data-type)
> an arbitrary "field". A "field" in this context is defined as a
> sub-section of the basic element making up the array. Generically
> the sub-section is defined by an offset and a data-type or a tuple of
> a data type and a shape (to allow sub-arrays in a record). What I
> understand the user to want is the binding of a name to this generic
> sub-section descriptor.
> 1) Should we allow that for every scipy ndarray: complex data types
> have an obvious binding, would anybody want to name the first two
> bytes of their int32 array? I suggest holding off on this one until a
> records array is working....
>
> 2) Supposing we don't go with number 1, we need to design a record
> data type that has this name-binding capability.
>
> The recarray class in scipy core SVN essentially just does this.
>
> Question: How important is backwards compatibility with old numarray
> specification. In particular, I would go with the .fields access
> described by Perry, and eliminate the .field() approach?

I feel that it is not particularly important. Having a good design is the thing to shoot for.

> Thanks for reading and any comments you can make.
>
> -Travis

I'm not clear as to what the current design objective is and so I'll try to recap and perhaps expand my pieces in the referenced discussion to set out the sort of arrangement I would like to see.

We are moving towards having a multi-dimensional array which can hold objects of fixed size and type, the smallest being one byte (although the void would appear to be a collection of no size objects). Most of the need, and thus the focus, is on numeric objects, ranging in size from Int8 to Complex64.

The Record is a fixed size object containing fields. Each field has a name, an optional title and data of a fixed type (perhaps including another record instance and maybe arrays of fixed size?).

In the example below, AddressRecord and PersonRecord would be sub-classes of Record where the fields are named and, optionally, field titles given. The names would be consistent with Python naming whereas the title could be any Python string.

The use of attributes raises the possibility that one could have nested records. For example, suppose one has an address record:

addressRecord
    streetNumber
    streetName
    postalCode
    ...

There could then be a personal record:

personRecord
    ...
    officeAddress
    homeAddress
    ...

One could address a component as rec.homeAddress.postalCode.

Suppose one has a (n, n) array of persons then one could access the data in the following ways:

persons[1]                              all records in the second row
persons[:,1]                            all records in the second column
persons[1, 1]                           return a specific person record
persons[1, 1].homeAddress               the home address record for a specific person
persons[1, 1].homeAddress.postalCode    the postal code for a specific person
persons.homeAddress.postalCode          an (n, n) array containing all postal codes
persons.homeAddress.postalCode.title    could be 'Zip Code'

I see no need to have the attribute 'field' and would like to avoid the use of strings to identify a record component. This does require that fields be named as Python identifiers but is this restriction a killer?

Colin W. |
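The nested person/address arrangement proposed above can be approximated with nested structured dtypes and `numpy.recarray` in current NumPy. This is a sketch of the idea rather than the Record sub-classes described in the message; the `surname` field and the field widths are assumptions added so the example is complete, and the title is read from the dtype instead of a `.title` attribute.

```python
import numpy as np

# Address record; postalCode carries the title 'Zip Code'.
address = np.dtype([
    ("streetNumber", "i4"),
    ("streetName", "U30"),
    (("Zip Code", "postalCode"), "U10"),
])

# Person record nesting two address records ("surname" is hypothetical).
person = np.dtype([
    ("surname", "U30"),
    ("officeAddress", address),
    ("homeAddress", address),
])

n = 3
persons = np.zeros((n, n), dtype=person).view(np.recarray)

# Fill one postal code through string-keyed field views.
persons["homeAddress"]["postalCode"][1, 1] = "80303"

second_row = persons[1]                            # n records
second_column = persons[:, 1]                      # n records
one_code = persons[1, 1].homeAddress.postalCode    # a single postal code
all_codes = persons.homeAddress.postalCode         # (n, n) array of codes
title = address.fields["postalCode"][2]            # title stored in the dtype
```

All of the access patterns listed in the message except the last have a direct counterpart here; the nesting comes from the data type, and the attribute syntax is sugar layered on top of it.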
|
From: Andrew S. <str...@as...> - 2005-12-02 16:22:00
|
Perry Greenfield wrote:

>Travis Oliphant wrote:
>
>>Question: How important is backwards compatibility with old numarray
>>specification. In particular, I would go with the .fields access
>>described by Perry, and eliminate the .field() approach?
>
>For us, probably not critical since we have to do some rewriting anyway.
>(But it would be nice to retain for a while as deprecated).
>
>But what about field names that don't map well to attributes?
>I haven't had a chance to reread the past emails but I seem to
>recall this was a significant issue. That would imply that .field()
>would be needed for those cases anyway.

I haven't read the above thread extensively, but the issue of field names that don't map well to attributes is significant. For example, users of pytables often have columns with names that are not valid Python names. So, regardless of what solution is the most obvious, there should at least be a way to get at such field names. (pytables users are used to doing that.)

Cheers!
Andrew |
|
From: Perry G. <pe...@st...> - 2005-12-02 13:30:52
|
Travis Oliphant wrote:

> Thus, I would like to revive discussion of the record object for
> inclusion in scipy core. I pretty much agree with the semantics that
> Perry described in his final email (is this all implemented in numarray,
> yet?),

No, it was only talk to date, with plans to implement it, but other things have drawn our work up to now.

> Question: How important is backwards compatibility with old numarray
> specification. In particular, I would go with the .fields access
> described by Perry, and eliminate the .field() approach?

For us, probably not critical since we have to do some rewriting anyway. (But it would be nice to retain for a while as deprecated).

But what about field names that don't map well to attributes? I haven't had a chance to reread the past emails but I seem to recall this was a significant issue. That would imply that .field() would be needed for those cases anyway.

Perry |
|
From: Travis O. <oli...@ee...> - 2005-12-02 01:11:30
|
Christopher Hanley wrote:

>Hi Travis,
>
>About a year ago (summer 2004) on the numpy distribution list there was
>a lot of discussion of the records interface. I will dig through my
>notes and put together a summary.

Thanks for the pointers. I had forgotten about that discussion. I went back and re-read the thread.

Here's a good link for others to re-read (the end of) this thread:

http://news.gmane.org/find-root.php?message_id=%3cBD22BAC0.E9EB%25perry%40stsci.edu%3e

I think some very good points were made. These points should be addressed from the context of scipy arrays which now support records in a very basic way. Because of this, we can support nested records of records --- but how is this to be presented to the user is still an open question (i.e. how do you build one...)

I've finally been converted to believe that the notion of records is very important because it speaks of how to do the basic (typeless, mathless) array object that will go into Python correctly. If we can get the general records type done right, then all the other types are examples of it.

Thus, I would like to revive discussion of the record object for inclusion in scipy core. I pretty much agree with the semantics that Perry described in his final email (is this all implemented in numarray, yet?), except I would agree with Francesc Alted that a titles or labels concept should be allowed.

I'm more enthusiastic about code than discussion, so I'm hoping for a short-lived discussion followed by actual code. I'm ready to do the implementation this week (I've already borrowed lots of great code from numarray which makes it easier), but feel free to chime in even if you read this later.

In my mind, the discussion about the records array is primarily a discussion about the records data-type. The way I'm thinking, the scipy ndarray is a homogeneous collection of the same "thing." The big change in scipy core is that Numeric used to allow only certain data types, but now the ndarray can contain an arbitrary "void" data type. You can also add data-types to scipy core. These data-types are "almost" full members of the scipy data-type community. The "almost" is because the N*N casting matrix is not updated (this would require a re-design of how casting is considered). At some point, I'd like to fix this wart and make it so that data-types can be added at will -- I think if we get the record type right, I'll be able to figure out how to do this.

We need to add a "record" data-type to scipy. Then, any array can be of "record" type, and there will be an additional "array scalar" that is what is returned when selecting a single element from the array. So, a record array would simply be an array of "records" plus some extra stuff for dealing with the mapping from field names to actual segments of the array element (we may decide that this mapping is general enough that all scipy arrays should have the capability of assigning names to sub-bytes of its main data-type and means of accessing those sub-bytes in which case the subclass is unnecessary).

Let me explain further: Right now, the machinery is in place in scipy_core to get and set in any ndarray (regardless of its data-type) an arbitrary "field". A "field" in this context is defined as a sub-section of the basic element making up the array. Generically the sub-section is defined by an offset and a data-type or a tuple of a data type and a shape (to allow sub-arrays in a record). What I understand the user to want is the binding of a name to this generic sub-section descriptor.

1) Should we allow that for every scipy ndarray: complex data types have an obvious binding, would anybody want to name the first two bytes of their int32 array? I suggest holding off on this one until a records array is working....

2) Supposing we don't go with number 1, we need to design a record data type that has this name-binding capability.

The recarray class in scipy core SVN essentially just does this.

Question: How important is backwards compatibility with old numarray specification. In particular, I would go with the .fields access described by Perry, and eliminate the .field() approach?

Thanks for reading and any comments you can make.

-Travis |
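The unnamed "field" machinery described above, a sub-section of each element given by an offset and a data-type, survives in current NumPy as `ndarray.getfield` and `ndarray.setfield`. The sketch below uses those methods as a stand-in for the scipy_core machinery of the time; the sample values are arbitrary, and byte order is pinned to little-endian so the offsets mean the same thing on any machine.

```python
import numpy as np

# The "first two bytes of an int32 array" case from the message.
a = np.array([0x00010002, 0x00030004], dtype="<i4")
low_halves = a.getfield(np.dtype("<i2"), 0).tolist()    # bytes 0-1 of each element
high_halves = a.getfield(np.dtype("<i2"), 2).tolist()   # bytes 2-3 of each element

# The "obvious binding" for complex types: real part at offset 0,
# imaginary part at offset 8 for a 16-byte complex element.
c = np.array([1 + 2j, 3 - 4j], dtype="<c16")
real_part = c.getfield(np.dtype("<f8"), 0).tolist()
imag_part = c.getfield(np.dtype("<f8"), 8).tolist()

# Setting works the same way: overwrite the low two bytes of every element.
b = np.array([0x00010002, 0x00030004], dtype="<i4")
b.setfield(7, np.dtype("<i2"), 0)
```

Giving such an (offset, data-type) pair a name is all that a record data-type adds on top of this.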
|
From: Todd M. <jm...@st...> - 2005-12-01 21:40:07
|
Sebastian Haase wrote:

>(this was posted to SciPy on 2005-11-16 16:37 - maybe I got lost ;-)
>Todd,
>I'm just thinking of a nice feature that I think is now part of new
> scipyCore: Mixing index ranges in one axis with index lists in another.
>Example:
> I have index 4,7,9 that I'm interested in: use a[ [4,7,9] ]
> If I want all sections I obviously just say a[ : ]
>But what do I do in the 2d case where I want 4,7,9 in one axis and all in the
>other ? I understood that the new scipyCore allows a[:, [4,7,9]]
>whereas numarray gives an error !?

Yep, my impression is that the indexing in scipy newcore is an improvement over numarray which was itself a functional improvement over Numeric. I'm pretty sure fixing this in numarray is not a simple hack so, for now anyway, it won't get fixed. I wish I had better news...

Regards,
Todd

>Thanks,
>Sebastian Haase
>
>On Friday 04 November 2005 14:57, Todd Miller wrote:
>
>>Sebastian Haase wrote:
>>
>>>Also I always need to thank Todd et al. for numarray which we are using
>>>for about 4 years now.
>>
>>I'm glad you found numarray useful.
>>
>>>I was following - I thought - all the postings here, but I don't remember
>>>when and what the reason was when a.type() changed to a.dtype (also
>>>there is a "dtypecode" somewhere !?). Any reference or explanation would
>>>be great. I have to say that the (old) parentheses were always quite
>>>"annoying" ! ;-)
>>>
>>>Question: does the way allow assignments like "a.dtype = Float32".
>>>What does it do ? If not, is it raising an error (I had 2 different people
>>>yesterday who tried to assign to a.type here in our lab ...)
>>>
>>>Also is this now completely supported/tested and suggested for numarray ?
>>>(For the time numarray is still separate)
>>
>>I'm adding support for some of newcore's new interface features out of
>>desire to make it easier to migrate. Our intent is to make it possible
>>to write newcore code and run it on numarray now as newcore matures.
>>Not every newcore feature is going to be supported, but we'll make an
>>effort to support those which are easy to implement. Let me know if
>>there's some newcore idiom you want to use that numarray doesn't have yet.
>>
>>Regards,
>>Todd
>
>_______________________________________________
>SciPy-user mailing list
>Sci...@sc...
>http://www.scipy.net/mailman/listinfo/scipy-user
>
>_______________________________________________
>Numpy-discussion mailing list
>Num...@li...
>https://lists.sourceforge.net/lists/listinfo/numpy-discussion |
|
From: Perry G. <pe...@st...> - 2005-12-01 20:30:46
|
On Dec 1, 2005, at 3:19 PM, Sebastian Haase wrote:

> (this was posted to SciPy on 2005-11-16 16:37 - maybe I got lost ;-)
> Todd,
> I'm just thinking of a nice feature that I think is now part of new
> scipyCore: Mixing index ranges in one axis with index lists in
> another.
> Example:
> I have index 4,7,9 that I'm intrested in: use a[ [4,7,9] ]
> If I want all section I obviously just say a[ : ]
> But what do I do in the 2d case where I want 4,7,9 in one axis and all
> in the
> other ? I understood that the new scipyCore allows a[:, [4,7,9]]
> whereas numarray gives an error !?
>
> Thanks,
> Sebastian Haase

If the question is whether we will be adding that feature to numarray, the answer is no. We don't have close to the resources to do that and work on porting our stuff to scipy_core at this time (anyone else that wants to is welcome).

[As an aside, we considered adding that to numarray at the start, but decided against doing that because indexing seemed complex already; not that we are against the capability.]

Perry |
|
From: Sebastian H. <ha...@ms...> - 2005-12-01 20:19:17
|
(this was posted to SciPy on 2005-11-16 16:37 - maybe I got lost ;-)

Todd,
I'm just thinking of a nice feature that I think is now part of new scipyCore: Mixing index ranges in one axis with index lists in another.

Example:
 I have index 4,7,9 that I'm interested in: use a[ [4,7,9] ]
 If I want all sections I obviously just say a[ : ]
But what do I do in the 2d case where I want 4,7,9 in one axis and all in the other ? I understood that the new scipyCore allows a[:, [4,7,9]] whereas numarray gives an error !?

Thanks,
Sebastian Haase

On Friday 04 November 2005 14:57, Todd Miller wrote:
> Sebastian Haase wrote:
> >Also I always need to thank Todd et al. for numarray which we are using
> > for about 4 years now.
>
> I'm glad you found numarray useful.
>
> >I was following - I thought - all the postings here, but I don't remember
> > when and what the reason was when a.type() changed to a.dtype (also
> > there is a "dtypecode" somewhere !?). Any reference or explanation would
> > be great. I have to say that the (old) parentheses were always quite
> > "annoying" ! ;-)
> >
> >Question: does the way allow assignments like "a.dtype = Float32".
> >What does it do ? If not, is it raising an error (I had 2 different people
> >yesterday who tried to assign to a.type here in our lab ...)
> >
> >Also is this now completely supported/tested and suggested for numarray ?
> > (For the time numarray is still separate)
>
> I'm adding support for some of newcore's new interface features out of
> desire to make it easier to migrate. Our intent is to make it possible
> to write newcore code and run it on numarray now as newcore matures.
> Not every newcore feature is going to be supported, but we'll make an
> effort to support those which are easy to implement. Let me know if
> there's some newcore idiom you want to use that numarray doesn't have yet.
>
> Regards,
> Todd

_______________________________________________
SciPy-user mailing list
Sci...@sc...
http://www.scipy.net/mailman/listinfo/scipy-user |
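For reference, the indexing form asked about here, a full slice on one axis combined with an index list on another, behaves as follows in present-day NumPy (the descendant of the new scipy core). The array contents and the extra `np.ix_` line are illustrative choices, not part of the original question.

```python
import numpy as np

a = np.arange(5 * 12).reshape(5, 12)   # 5 rows, 12 columns, values 0..59

cols = a[:, [4, 7, 9]]                 # every row, columns 4, 7 and 9
rows = a[[1, 3], :]                    # rows 1 and 3, every column

# Index lists on both axes are paired element-wise by default;
# np.ix_ builds the "rows 1 and 3 crossed with columns 4, 7, 9" block.
block = a[np.ix_([1, 3], [4, 7, 9])]
```

Unlike a plain slice, the list-indexed results are copies, so assigning into `cols` does not modify `a`.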
|
From: gf <gyr...@gm...> - 2005-11-25 22:18:35
|
From: Francesc Altet <faltet@ca...>
Re: help in improving data analysis code
2005-11-25 07:32
>> On Friday 25 November 2005 16:27, Francesc Altet wrote:
>> > print nn[argsort(abs(nn_c-nn_c.mean()),0)][:-int(sz*0.10),0]
>>
>> Oops, I got confused. This should work better ;-)
>>
>> print nn[argsort(abs(nn-nn.mean()),0)][:-int(sz*0.10),0]
Hi Francesc,
Thank you for the suggestions.
Your code is performing a different task than mine was. In particular,
I believe it does not 're-mean' the data after removing each point.
However, based on the great ideas from your code, I now have the
function below that looks to be more efficient (although I haven't
measured it).
Any additional suggestions are appreciated.
-g
====
from numarray import argsort, floor, absolute

def eliminate_outliers(data,frac):
    num_to_eliminate = int(floor(data.size())*frac)
    for i in range(num_to_eliminate):
        data = data[argsort(absolute(data-data.mean()),0)][:-1,0]
    return data

if __name__ == "__main__":
    from numarray.mlab import rand
    sz = 100
    nn = rand(sz,1)
    nn[:10] = 20*rand(10,1)
    nn[sz-10:] = -20*rand(10,1)
    print eliminate_outliers(nn,0.10)
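The distinction drawn in this message, re-computing the mean after every removal versus ranking all points once, can be sketched in current NumPy as follows. This is a modern restatement under the assumption of 1-D data, not the numarray code from the thread; `trim_once` is a name invented here for the one-shot variant, and the sample data are chosen so the two approaches disagree.

```python
import numpy as np


def eliminate_outliers(data, frac):
    """Drop the point farthest from the mean, re-mean, and repeat."""
    data = np.asarray(data, dtype=float).ravel()
    num_to_eliminate = int(np.floor(data.size * frac))
    for _ in range(num_to_eliminate):
        worst = np.argmax(np.abs(data - data.mean()))  # farthest from current mean
        data = np.delete(data, worst)                  # next pass sees a new mean
    return data


def trim_once(data, frac):
    """One-shot variant: rank every point against the original mean only."""
    data = np.asarray(data, dtype=float).ravel()
    num_to_eliminate = int(np.floor(data.size * frac))
    order = np.argsort(np.abs(data - data.mean()))
    return data[order][:data.size - num_to_eliminate]


sample = [0, 1, 2, 3, 10, 30]
iterative = eliminate_outliers(sample, 0.4)   # removes 30, then 10
one_shot = trim_once(sample, 0.4)             # removes 30 and 0
```

Using `argmax` instead of a full `argsort` inside the loop avoids sorting on every pass, since only the single farthest point is needed each time.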
|
|
From: Francesc A. <fa...@ca...> - 2005-11-25 15:32:13
|
On Friday 25 November 2005 16:27, Francesc Altet wrote:
> print nn[argsort(abs(nn_c-nn_c.mean()),0)][:-int(sz*0.10),0]
Oops, I got confused. This should work better ;-)
print nn[argsort(abs(nn-nn.mean()),0)][:-int(sz*0.10),0]
-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
"-"
|
|
From: Francesc A. <fa...@ca...> - 2005-11-25 15:27:26
|
On Friday 25 November 2005 15:24, gf wrote:
>
> from numarray import add, array, asarray, absolute, argsort, floor, take, size
>
> def mean(m,axis=0):
>     m = asarray(m)
>     return add.reduce(m,axis)/float(m.shape[axis])
>
> def eliminate_outliers(dat,frac):
>     num_to_eliminate = int(floor(size(dat,0)*frac))
>     for i in range(num_to_eliminate):
>         ind = argsort(absolute(dat-mean(dat)),0)
>         sdat = take(dat,ind,0)[:,0]
>         dat = sdat[:-1]
>     return dat
>
> #--------------------------------------------------------------------
>
> if __name__ == "__main__":
>     from MLab import rand
>     sz = 100
>     nn = rand(sz,1)
>     nn[:10] = 20*rand(10,1)
>     nn[sz-10:] = -20*rand(10,1)
>     print eliminate_outliers(nn,0.10)
For sz=100, the next line of code is 10x faster on my machine (more if
sz is bigger):
print nn[argsort(abs(nn_c-nn_c.mean()),0)][:-int(sz*0.10),0]
I haven't checked it very carefully, so you should double check it.
BTW, you will need to use the numarray MLab interface:
from numarray.mlab import rand
Cheers,
-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
"-"
|
|
From: gf <gyr...@gm...> - 2005-11-25 14:24:12
|
Hi,
I am a newbie to Numeric/numarray programming and would appreciate
your help in improving the code below (which I'm sure is quite ugly to
an experienced numarray programmer).
An analysis we are carrying out requires the following:
1. evaluate the mean of a set of data
2. eliminate the data point farthest from the mean
3. repeat steps 1 and 2 until a certain specified fraction of points
has been eliminated.
Since this analysis will have to be performed (probably repeatedly) on
approximately ten thousand data sets, each of which contains 100-500
points, I would like the code to be as fast as possible.
Thanks for your help.
-g
====
from numarray import add, array, asarray, absolute, argsort, floor, take, size

def mean(m,axis=0):
    m = asarray(m)
    return add.reduce(m,axis)/float(m.shape[axis])

def eliminate_outliers(dat,frac):
    num_to_eliminate = int(floor(size(dat,0)*frac))
    for i in range(num_to_eliminate):
        ind = argsort(absolute(dat-mean(dat)),0)
        sdat = take(dat,ind,0)[:,0]
        dat = sdat[:-1]
    return dat

#--------------------------------------------------------------------

if __name__ == "__main__":
    from MLab import rand
    sz = 100
    nn = rand(sz,1)
    nn[:10] = 20*rand(10,1)
    nn[sz-10:] = -20*rand(10,1)
    print eliminate_outliers(nn,0.10)
|
|
From: Darren D. <dd...@co...> - 2005-11-24 00:07:40
|
On Wednesday 23 November 2005 6:47 pm, Anoop wrote:
> Hi. I have problems installing scipy-core from source code on a Rocks
> Cluster distribution of Linux running on x86-64 hardware. The error
> messages tell me:
>
> /usr/bin/g77 -shared
> build/temp.linux-x86_64-2.3/scipy/corelib/blasdot/_dotblas.o -L/usr/lib
> -lblas -lg2c -o build/lib.linux-x86_64-2.3/scipy/lib/_dotblas.so
> /usr/bin/ld: skipping incompatible /usr/lib/libblas.so when searching for
> -lblas /usr/bin/ld: skipping incompatible /usr/lib/libblas.a when searching
> for -lblas /usr/bin/ld: skipping incompatible
> /usr/lib/gcc/x86_64-redhat-linux/3.4.4/../../../libblas.so when searching
> for -lblas /usr/bin/ld: skipping incompatible
> /usr/lib/gcc/x86_64-redhat-linux/3.4.4/../../../libblas.a when searching
> for -lblas /usr/bin/ld: skipping incompatible /usr/lib/libblas.so when
> searching for -lblas /usr/bin/ld: skipping incompatible /usr/lib/libblas.a
> when searching for -lblas /usr/bin/ld: cannot find -lblas
> collect2: ld returned 1 exit status

I saw something similar to this just the other day. In my case, the problem was that scipy discovered broken libblas.* softlinks in /usr/lib and tried to build against them. It looks like something similar is happening here, check your atlas/blas/lapack installation.

Darren

-- 
Darren S. Dale, Ph.D.
dd...@co... |
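One way to check the two suspects here, a dangling soft link or a library built for the wrong word size (which is what the linker's "skipping incompatible" message usually indicates), is to look at what each candidate path actually points to. The helper below is a hypothetical sketch, not part of any build tool: `describe_library` is an invented name, it only reads the ELF class byte of shared objects, and the demo runs on fake files in a temporary directory (on a POSIX system) rather than on a real /usr/lib.

```python
import os
import tempfile


def describe_library(path):
    """Classify a library path: dangling link, archive, or ELF word size."""
    if os.path.islink(path) and not os.path.exists(path):
        return "broken symlink"
    if not os.path.exists(path):
        return "missing"
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header.startswith(b"!<arch>\n"):
        return "static archive"          # members would need separate inspection
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    # Byte 4 of an ELF header (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    return {1: "32-bit ELF", 2: "64-bit ELF"}.get(header[4], "unknown ELF class")


# Demo on fabricated files; every name below is made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    lib32 = os.path.join(tmp, "libblas32.so")
    lib64 = os.path.join(tmp, "libblas64.so")
    dangling = os.path.join(tmp, "libblas.so")
    with open(lib32, "wb") as fh:
        fh.write(b"\x7fELF\x01" + b"\x00" * 11)
    with open(lib64, "wb") as fh:
        fh.write(b"\x7fELF\x02" + b"\x00" * 11)
    os.symlink(os.path.join(tmp, "no-such-file"), dangling)
    results = {
        "lib32": describe_library(lib32),
        "lib64": describe_library(lib64),
        "dangling": describe_library(dangling),
    }
```

On a real system the same information is available from `ls -l` and `file -L` on the paths named in the linker output; a 32-bit result for a library that a 64-bit build is trying to link would explain the messages quoted above.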
|
From: Anoop <ara...@uc...> - 2005-11-23 23:47:07
|
Hi. I have problems installing scipy-core from source code on a Rocks Cluster distribution of Linux running on x86-64 hardware. The error messages tell me:

/usr/bin/g77 -shared build/temp.linux-x86_64-2.3/scipy/corelib/blasdot/_dotblas.o -L/usr/lib -lblas -lg2c -o build/lib.linux-x86_64-2.3/scipy/lib/_dotblas.so
/usr/bin/ld: skipping incompatible /usr/lib/libblas.so when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/libblas.a when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-linux/3.4.4/../../../libblas.so when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-linux/3.4.4/../../../libblas.a when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/libblas.so when searching for -lblas
/usr/bin/ld: skipping incompatible /usr/lib/libblas.a when searching for -lblas
/usr/bin/ld: cannot find -lblas
collect2: ld returned 1 exit status

I really need to get this running. Help?

Thanks,
Anoop
|
|
From: Francesc A. <fa...@ca...> - 2005-11-23 18:11:36
|
On Wednesday 23 November 2005 18:33, Jeff Whitaker wrote:
> > - No 2 GB limitation
> > - Support for compression (and other filters)
> And any dimension can be 'unlimited' (not just the first).

Yes, but just one dimension at a time, at least for the time being. We would like to eliminate this limitation sooner rather than later, though.

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
|
|
From: Jeff W. <js...@fa...> - 2005-11-23 17:36:17
|
Francesc Altet wrote:
> On Wednesday 23 November 2005 17:47, Chris Barker wrote:
>
>> Francesc Altet wrote:
>>
>>> Also, Jeff Whitaker has kindly contributed a new module called
>>> tables.NetCDF. It is designed to be used as a drop-in replacement for
>>> Scientific.IO.NetCDF, with only minor actions to existing code.
>>>
>> What advantages does tables.NetCDF have over Scientific.IO.NetCDF?
>>
>
> Maybe Jeff would add something more, but the apparent ones are:
>
> - No 2 GB limitation
> - Support for compression (and other filters)
>
> Cheers,
>
>
And any dimension can be 'unlimited' (not just the first).

-Jeff

--
Jeffrey S. Whitaker          Phone  : (303)497-6313
Meteorologist                FAX    : (303)497-6449
NOAA/OAR/PSD  R/PSD1         Email  : Jef...@no...
325 Broadway                 Office : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328  Web    : http://tinyurl.com/5telg
|
|
From: Francesc A. <fa...@ca...> - 2005-11-23 17:00:17
|
On Wednesday 23 November 2005 17:47, Chris Barker wrote:
> Francesc Altet wrote:
> > Also, Jeff Whitaker has kindly contributed a new module called
> > tables.NetCDF. It is designed to be used as a drop-in replacement for
> > Scientific.IO.NetCDF, with only minor actions to existing code.
>
> What advantages does tables.NetCDF have over Scientific.IO.NetCDF?

Maybe Jeff would add something more, but the apparent ones are:

- No 2 GB limitation
- Support for compression (and other filters)

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
|
|
From: Jeff W. <js...@fa...> - 2005-11-23 16:58:33
|
Chris Barker wrote:
> Francesc Altet wrote:
>
>> Also, Jeff Whitaker has kindly contributed a new module called
>> tables.NetCDF. It is designed to be used as a drop-in replacement for
>> Scientific.IO.NetCDF, with only minor actions to existing code.
>
> What advantages does tables.NetCDF have over Scientific.IO.NetCDF?
>
> -Chris
>
>
Chris: From my perspective, the transparent compression (using zlib and the hdf5 shuffle filter) is the biggest one. I get files that are a factor of 2-4 smaller.

-Jeff

--
Jeffrey S. Whitaker          Phone  : (303)497-6313
Meteorologist                FAX    : (303)497-6449
NOAA/OAR/PSD  R/PSD1         Email  : Jef...@no...
325 Broadway                 Office : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328  Web    : http://tinyurl.com/5telg
|
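To illustrate why a byte shuffle helps zlib on numeric data, here is a small standard-library sketch. It is not the HDF5 filter itself (that is implemented in C inside the HDF5 library); the functions `shuffle` and `unshuffle` and the ramp-shaped test data are assumptions made only for this illustration. The idea is that grouping the first byte of every value, then the second byte, and so on, puts slowly changing bytes next to each other, which gives the compressor longer repeats to work with.

```python
import struct
import zlib


def shuffle(data, itemsize):
    """Group byte 0 of every item, then byte 1, etc. (illustrative only)."""
    return b"".join(data[i::itemsize] for i in range(itemsize))


def unshuffle(data, itemsize):
    """Inverse of shuffle(): interleave the byte planes back into items."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n:(i + 1) * n]
    return bytes(out)


# Assumed sample data: 10000 slowly increasing 32-bit little-endian integers.
values = list(range(10000))
raw = struct.pack("<%di" % len(values), *values)

plain = zlib.compress(raw)
shuffled = zlib.compress(shuffle(raw, 4))

print("raw bytes:          ", len(raw))
print("zlib only:          ", len(plain))
print("shuffle + zlib:     ", len(shuffled))
```

How much is gained depends entirely on the data; smooth fields such as the ramp used here benefit far more than noisy ones.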
|
From: Chris B. <Chr...@no...> - 2005-11-23 16:48:14
|
Francesc Altet wrote:
> Also, Jeff Whitaker has kindly contributed a new module called
> tables.NetCDF. It is designed to be used as a drop-in replacement for
> Scientific.IO.NetCDF, with only minor actions to existing code.
What advantages does tables.NetCDF have over Scientific.IO.NetCDF?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chr...@no...
|
|
From: Francesc A. <fa...@ca...> - 2005-11-23 11:39:34
|
PyTables is a library to deal with very large datasets. It leverages the excellent HDF5 and numarray libraries to allow doing that in a very efficient way using the Python language. More info in:

http://pytables.sourceforge.net/

=========================
 Announcing PyTables 1.2
=========================

The PyTables development team is happy to announce the availability of a new major version of the PyTables package.

This version sports a completely new in-memory tree implementation based around a *node cache system*. This system loads nodes only when needed and unloads them when they are rarely used. The new feature allows the opening and creation of HDF5 files with large hierarchies very quickly and with low memory consumption (the object tree is no longer completely loaded in-memory), while retaining all the powerful browsing capabilities of the previous implementation of the object tree. You can read more about the bells and whistles of the new cache system in:

http://www.carabos.com/downloads/pytables/NewObjectTreeCache.pdf

Also, Jeff Whitaker has kindly contributed a new module called tables.NetCDF. It is designed to be used as a drop-in replacement for Scientific.IO.NetCDF, with only minor actions to existing code. Also, if you have the Scientific.IO.NetCDF module installed, it allows conversions between HDF5 <--> NetCDF3 formats.

Go to the PyTables web site for downloading the beast:

http://pytables.sourceforge.net/

If you want more info about this release, please check out the more comprehensive announcement message available in:

http://www.carabos.com/downloads/pytables/ANNOUNCE-1.2.html

Acknowledgments
===============

Thanks to the users who provided feature improvements, patches, bug reports, support and suggestions. See the THANKS file in the distribution package for an (incomplete) list of contributors. Many thanks also to SourceForge who have helped to make and distribute this package! And last but not least, a big thank you to THG (http://www.hdfgroup.org/) for sponsoring many of the new features recently introduced in PyTables.

----

**Enjoy data!**

-- The PyTables Team

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
|
|
From: Alan G I. <ai...@am...> - 2005-11-22 14:04:45
|
On Tue, 22 Nov 2005, Joost van Evert apparently wrote:
> does anyone understand importing modules? I want to import a new version
> of Numeric(24.2) that was installed in my home directory, but the old
> one in /usr/lib/python/site-packages gets imported.
> The command "python setup.py --install=~" was used to install. And
> ~/lib/python is the first item in my PYTHONPATH variable. So the
> contents of sys.path is ['','~/lib/python', ...]
> Does anyone know any means to get the new version imported? Except from
> deleting the old version of course;)

http://docs.python.org/tut/node8.html
section 6.1.1

hth,
Alan Isaac
|
|
From: Joost v. E. <joo...@gm...> - 2005-11-22 13:57:47
|
Dear list,

does anyone understand importing modules? I want to import a new version of Numeric (24.2) that was installed in my home directory, but the old one in /usr/lib/python/site-packages gets imported.

The command "python setup.py --install=~" was used to install, and ~/lib/python is the first item in my PYTHONPATH variable. So the contents of sys.path are ['','~/lib/python', ...]

Does anyone know any means to get the new version imported? Apart from deleting the old version, of course ;)

Regards,
Joost
|
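A small diagnostic along these lines can show which copy of a module Python actually picks up. It is only a sketch: the helper names `expanded_search_path` and `where_is` are hypothetical. It rests on one fact about the interpreter: entries in sys.path are used literally, so if sys.path really contains the string '~/lib/python' (rather than the expanded home directory), Python looks for a directory named "~" relative to the current directory and never reaches the home-directory install.

```python
import importlib
import os
import sys


def expanded_search_path(entries):
    """Expand a leading '~' in each entry; Python does not do this itself."""
    return [os.path.expanduser(entry) for entry in entries]


def where_is(module_name):
    """Import a module by name and report the file it was loaded from."""
    module = importlib.import_module(module_name)
    return getattr(module, "__file__", "(built-in)")


if __name__ == "__main__":
    # Show entries that still contain an unexpanded tilde.
    for entry in sys.path:
        if entry.startswith("~"):
            print("unexpanded entry:", entry, "->", os.path.expanduser(entry))
    # 'os' is used here only because it is always available.
    print("os loaded from:", where_is("os"))
```

Calling `where_is("Numeric")` in the affected environment would show whether the site-packages copy or the home-directory copy wins; replacing `sys.path` with `expanded_search_path(sys.path)` before the import is one way to test the tilde hypothesis.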
|
From: Travis O. <oli...@ie...> - 2005-11-21 21:12:19
|
> There are a few extra capabilities which are supported in numarray's
> memmap:
>
> 1. slice insertion
> 2. slice deletion
> 3. memmap based array resizing

Thanks for the extra explanation. I could see how these might require more stuff under the hood.

-Travis
|
|
From: Andrew J. <a.h...@gm...> - 2005-11-21 17:18:22
|
[Originally posted to g.c.p.user...]
Hi all,
In the newest incarnation of scipy_core, I am having trouble with the
cholesky(a) routine:
/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/scipy/linalg/basic_lite.py
in cholesky_decomposition(a)
115 else:
116 lapack_routine = lapack_lite.dpotrf
--> 117 results = lapack_routine('L', n, a, m, 0)
118 if results['info'] > 0:
119 raise LinAlgError, 'Matrix is not positive definite -
Cholesky decomposition cannot be computed'
LapackError: Parameter a is not contiguous in lapack_lite.dpotrf
But this isn't true; I get this error even when I pass trivial and
contiguous matrices such as the output of identity(). Other linalg
routines (including complicated ones like singular_value_decomp) seem to
work fine.
Any ideas?
Andrew
|
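For cross-checking small cases such as identity() by hand, the lower-triangular ('L') factorization that the traceback is trying to compute can be written in a few lines of plain Python. This is a textbook sketch, not the LAPACK dpotrf code and not the scipy_core implementation; the function name `cholesky_lower` is hypothetical, and it says nothing about the contiguity check that raised the error.

```python
import math


def cholesky_lower(a):
    """Return L (list of lists) such that a == L * L^T.

    `a` must be a symmetric positive-definite matrix given as a list of rows.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already computed parts of rows i and j.
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


if __name__ == "__main__":
    identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(cholesky_lower(identity3))
```

For the identity matrix the factor is the identity itself, so any library routine that fails on that input is failing before the numerical work starts.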