From: Sebastian H. <ha...@ms...> - 2006-08-25 20:48:25
On Friday 25 August 2006 07:01, Travis Oliphant wrote:
> Keith Goodman wrote:
> > How do I delete a row (or list of rows) from a matrix object?
> [...]
> However, a function call that returned a new array object with the
> appropriate rows deleted (implemented by constructing a new array with
> the remaining rows) would seem to be a good idea.
>
> I'll place a prototype (named delete) to that effect into SVN soon.
>
> -Travis

Now of course: I often needed to "insert" a column, row or section, too.
I made a quick and dirty implementation for that myself:

    import numpy as N

    def insert(arr, i, entry, axis=0):
        """Return a new array with `entry` inserted at index i along `axis`.

        If arr.ndim > 1 and `entry` is a scalar, it gets filled in
        (ref. broadcasting).  Note: the original `arr` is not affected.
        """
        if i > arr.shape[axis]:
            raise IndexError("index i larger than arr size")
        shape = list(arr.shape)
        shape[axis] += 1
        a = N.empty(dtype=arr.dtype, shape=shape)
        # Move `axis` to the front so the slicing below operates along it.
        # transpose returns views, so writing into aa fills a.
        order = [axis] + list(range(axis)) + list(range(axis + 1, a.ndim))
        aa = N.transpose(a, order)
        aarr = N.transpose(arr, order)
        aa[:i] = aarr[:i]
        aa[i + 1:] = aarr[i:]
        aa[i] = entry
        return a

but maybe there is a way to put it into numpy directly.

- Sebastian
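[For what it's worth, this functionality did eventually land in NumPy itself as ``numpy.insert``; a minimal sketch of the equivalent call, using the modern ``np`` alias:]

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]

# Insert a scalar "row" at index 1; the scalar broadcasts across the slice.
b = np.insert(a, 1, 99, axis=0)
# b is [[0, 1, 2], [99, 99, 99], [3, 4, 5]]; a is untouched.
```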
From: Travis O. <oli...@ie...> - 2006-08-25 20:38:56
kor...@id... wrote:
> Message: 4
> Date: Thu, 24 Aug 2006 14:17:44 -0600
> From: Travis Oliphant <oli...@ee...>
> Subject: Re: [Numpy-discussion] (no subject)
> To: Discussion of Numerical Python <num...@li...>
> Message-ID: <44E...@ee...>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> [...]
> You have a module built against an older version of NumPy.  What
> modules are being loaded?  Perhaps it is matplotlib or SciPy

You need to re-build matplotlib. They should be producing a binary that is compatible with 1.0b2 (I'm being careful to make sure future releases are binary compatible with 1.0b2). Also, make sure that you remove the build directory under numpy if you have previously built a version of numpy prior to 1.0b2.

-Travis
From: <kor...@id...> - 2006-08-25 20:32:41
> Message: 4
> Date: Thu, 24 Aug 2006 14:17:44 -0600
> From: Travis Oliphant <oli...@ee...>
> Subject: Re: [Numpy-discussion] (no subject)
> To: Discussion of Numerical Python <num...@li...>
> Message-ID: <44E...@ee...>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> kor...@id... wrote:
> >> On Thursday 24 August 2006 09:50, kor...@id... wrote:
> >>> Sorry for my ignorance, but I have not ever heard of or used
> >>> mingw32.  I am also using python 2.3.
> >>
> >> http://en.wikipedia.org/wiki/Mingw explains in detail.
> >
> > $HOME=C:\Documents and Settings\Administrator
> > CONFIGDIR=C:\Documents and Settings\Administrator\.matplotlib
> > loaded ttfcache file C:\Documents and Settings\Administrator\.matplotlib\ttffont.cache
> > matplotlib data path c:\python23\lib\site-packages\matplotlib\mpl-data
> > backend WXAgg version 2.6.3.2
> > Overwriting info=<function info at 0x01FAF3F0> from scipy.misc.helpmod (was <function info at 0x01F896F0> from numpy.lib.utils)
> > Overwriting who=<function who at 0x01FA46B0> from scipy.misc.common (was <function who at 0x01F895F0> from numpy.lib.utils)
> > Overwriting source=<function source at 0x01FB2530> from scipy.misc.helpmod (was <function source at 0x01F89730> from numpy.lib.utils)
> > RuntimeError: module compiled against version 1000000 of C-API but this version of numpy is 1000002
> > Fatal Python error: numpy.core.multiarray failed to import... exiting.
> >
> > abnormal program termination
>
> You have a module built against an older version of NumPy.  What
> modules are being loaded?  Perhaps it is matplotlib or SciPy
>
> -Travis

Travis, I tried doing it again after removing scipy and my old version of numpy. I also have matplotlib installed. Is there a special way that I have to go about installing this because of matplotlib?
From: Travis O. <oli...@ie...> - 2006-08-25 20:01:41
Keith Goodman wrote:
> How do I delete a row (or list of rows) from a matrix object?
>
> To remove the n'th row in octave I use x(n,:) = [].  Or n could be a
> vector of rows to remove.
>
> In numpy 0.9.9.2813 x[[1,2],:] = [] changes the values of all the
> elements of x without changing the size of x.
>
> In numpy do I have to turn it around and construct a list of the rows
> I want to keep?

Basically, that is true for now.

I think it would be worth implementing some kind of function for making this easier.

One might think of using:

    del a[obj]

But, the problem with both of those approaches is that once you start removing arbitrary rows (or n-1 dimensional sub-spaces) from an array you very likely will no longer have a chunk of memory that can be described using the n-dimensional array memory model.

So, you would have to make memory copies. This could be done, of course, and the data area of "a" altered appropriately. But, such alteration of the memory would break any other objects that have a "view" of the memory area of "a". Right now, there is no way to track which objects have such "views", and therefore no good way to tell (other than the very conservative reference count) if it is safe to re-organize the memory of "a" in this way.

So, "in-place" deletion of array objects would not be particularly useful, because it would only work for arrays with no additional reference counts (i.e. a simple b=a assignment would increase the reference count and make it impossible to say del a[obj]).

However, a function call that returned a new array object with the appropriate rows deleted (implemented by constructing a new array with the remaining rows) would seem to be a good idea.

I'll place a prototype (named delete) to that effect into SVN soon.

-Travis
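[The "construct a new array with the remaining rows" approach Travis describes can be sketched with a boolean mask; the prototype he mentions did become ``numpy.delete``:]

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

# Keep-rows approach: build a mask that is False for the rows to drop.
keep = np.ones(x.shape[0], dtype=bool)
keep[[1, 2]] = False
y = x[keep]                      # new array; x itself is unchanged

# Equivalent call with the np.delete function that grew out of this thread:
z = np.delete(x, [1, 2], axis=0)
```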
From: Sebastian H. <ha...@ms...> - 2006-08-25 19:32:32
On Friday 25 August 2006 12:19, Charles R Harris wrote:
> Hi,
>
> On 8/25/06, Travis Oliphant <oli...@ie...> wrote:
> > [...]
> > Mixed-type interaction is always somewhat ambiguous.  Now there is a
> > consistent rule for both universal functions and other functions (move
> > to a precision where both can be safely cast to --- unless one is a
> > scalar and then its precision is ignored).
>
> I think this is a good thing. It makes it easy to remember what the
> function will produce. The only oddity the user has to be aware of is
> that int32 has more precision than float32. Probably not obvious to a
> newbie, but a newbie will probably be using the double defaults anyway.
> Which is another good reason for making double the default type.

Not true - a numpy- (or numeric-programming) newbie working in medical imaging or astronomy would still get float32 data to work with. He/she would do some operations on the data and be surprised that memory (or disk space) blows up.

> > If you don't want that to happen, then be clear about what data-type
> > should be used by casting yourself.  In this case, we should probably
> > not try and guess about what users really want in mixed data-type
> > situations.
>
> I wonder if it would be reasonable to add the dtype keyword to hstack
> itself? Hmmm, what are the conventions for coercions to lesser
> precision? That could get messy indeed, maybe it is best to leave such
> things alone and let the programmer deal with it by rethinking the
> program. In the float case that would probably mean using a float32
> array instead of an int32 array.
>
> Chuck

I think my main argument is that float32 is a very common type in (large) data processing to save memory. But I don't know how many exceptions like an extra "float32 rule" we can handle ... I would like to hear how the numarray (STScI) folks think about this. Who else works with data on the order of GBs!?

- Sebastian
From: Charles R H. <cha...@gm...> - 2006-08-25 19:19:34
Hi,

On 8/25/06, Travis Oliphant <oli...@ie...> wrote:
> Sebastian Haase wrote:
> > How hard would it be to change the rules back to the numarray
> > behavior ?
>
> It wouldn't be hard, but I'm not so sure that's a good idea.  I do see
> the logic behind that approach and it is worthy of some discussion.
> [...]
>
> Mixed-type interaction is always somewhat ambiguous.  Now there is a
> consistent rule for both universal functions and other functions (move
> to a precision where both can be safely cast to --- unless one is a
> scalar and then its precision is ignored).

I think this is a good thing. It makes it easy to remember what the function will produce. The only oddity the user has to be aware of is that int32 has more precision than float32. Probably not obvious to a newbie, but a newbie will probably be using the double defaults anyway. Which is another good reason for making double the default type.

> If you don't want that to happen, then be clear about what data-type
> should be used by casting yourself.  In this case, we should probably
> not try and guess about what users really want in mixed data-type
> situations.

I wonder if it would be reasonable to add the dtype keyword to hstack itself? Hmmm, what are the conventions for coercions to lesser precision? That could get messy indeed, maybe it is best to leave such things alone and let the programmer deal with it by rethinking the program. In the float case that would probably mean using a float32 array instead of an int32 array.

Chuck
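[Chuck's "cast yourself" workaround, sketched with modern NumPy, where the mixed int32/float32 stack still promotes to float64:]

```python
import numpy as np

i = np.arange(3, dtype=np.int32)
f = np.arange(3, dtype=np.float32)

mixed = np.hstack([i, f])                        # promoted to float64
explicit = np.hstack([i.astype(np.float32), f])  # stays float32 by choice
```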
From: Keith G. <kwg...@gm...> - 2006-08-25 18:58:08
How do I delete a row (or list of rows) from a matrix object?

To remove the n'th row in octave I use x(n,:) = []. Or n could be a vector of rows to remove.

In numpy 0.9.9.2813 x[[1,2],:] = [] changes the values of all the elements of x without changing the size of x.

In numpy do I have to turn it around and construct a list of the rows I want to keep?
From: Travis O. <oli...@ie...> - 2006-08-25 18:50:33
Sebastian Haase wrote:
>> This is now the behavior in SVN.  Note that this is different from both
>> Numeric (which gave an error) and numarray (which coerced to float32).
>>
>> But, it is consistent with how mixed-types are handled in calculations
>> and is thus an easier rule to explain.
>>
>> Thanks for the testing.
>>
>> -Travis
>
> How hard would it be to change the rules back to the numarray behavior ?

It wouldn't be hard, but I'm not so sure that's a good idea. I do see the logic behind that approach and it is worthy of some discussion. I'll give my current opinion:

The reason I changed the behavior is to get consistency so there is one set of rules on mixed-type interaction to explain. You can always do what you want by force-casting your int32 arrays to float32. There will always be some people who don't like whichever behavior is selected, but we are trying to move NumPy in a direction of consistency with fewer exceptions to explain (although this is a guideline and not an absolute requirement).

Mixed-type interaction is always somewhat ambiguous. Now there is a consistent rule for both universal functions and other functions (move to a precision where both can be safely cast to --- unless one is a scalar, and then its precision is ignored).

If you don't want that to happen, then be clear about what data-type should be used by casting yourself. In this case, we should probably not try and guess about what users really want in mixed data-type situations.

-Travis
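[The rule Travis describes - move to a precision both inputs can be safely cast to - is still what NumPy does for int32/float32 arrays; a quick sketch:]

```python
import numpy as np

a = np.arange(3, dtype=np.int32)
b = np.ones(3, dtype=np.float32)

# float32 cannot represent every int32 exactly, so both promote to float64.
(a + b).dtype                  # float64
np.concatenate([a, b]).dtype   # float64 as well, matching the ufunc rule
```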
From: Robert K. <rob...@gm...> - 2006-08-25 18:02:49
Charles R Harris wrote:
> Matrix rank has nothing to do with numpy rank. Numpy rank is simply the
> number of indices required to address an element of an ndarray. I
> always thought a better name for the Numpy rank would be
> dimensionality, but like everything else one gets used to the numpy
> jargon, it only needs to be defined someplace for what it is.

"numpy rank" derives from "tensor rank" rather than "matrix rank". It's not *wrong*, but as with many things in mathematics, the term is overloaded and can be confusing. "dimensionality" is no better. A "three-dimensional array" might be [1, 2, 3], not [[[1]]].

http://mathworld.wolfram.com/TensorRank.html

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
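[The two senses of "rank" can be seen side by side in modern NumPy, where the array attribute ended up being called ndim and the linear-algebra rank lives in numpy.linalg:]

```python
import numpy as np

a = np.ones((2, 3))

a.ndim                      # 2: the "tensor rank" / number of indices
np.linalg.matrix_rank(a)    # 1: the linear-algebra rank (rows are dependent)
```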
From: Rob H. <he...@ta...> - 2006-08-25 16:13:12
Yes, it works now.

Thanks,
-Rob

On Aug 24, 2006, at 6:39 PM, Travis Oliphant wrote:
> Rob Hetland wrote:
>> In compiling matplotlib and scipy, I get errors complaining about
>> multiply defined symbols (see below).  I tried to fix this with
>> -multiply_defined suppress but this did not work.  Is there a way to
>> make this go away?
>
> Can you try current SVN again, to see if it now works?
>
> -Travis

----
Rob Hetland, Associate Professor
Dept. of Oceanography, Texas A&M University
http://pong.tamu.edu/~rob
phone: 979-458-0096, fax: 979-845-6331
From: Sebastian H. <ha...@ms...> - 2006-08-25 15:34:36
Sasha wrote:
> On 8/25/06, Charles R Harris <cha...@gm...> wrote:
>> Matrix rank has nothing to do with numpy rank. Numpy rank is simply
>> the number of indices required to address an element of an ndarray. I
>> always thought a better name for the Numpy rank would be
>> dimensionality, but like everything else one gets used to the numpy
>> jargon, it only needs to be defined someplace for what it is.
>
> That's my point exactly.  The rank(2) definition was added by
> Sebastian Haase who advocates the use of the term "ndims" instead of
> "rank".  I've discussed the use of "dimensionality" in the preamble.
> Note that ndims stands for the number of dimensions, not
> dimensionality.
>
> I don't want to remove rank(2) without hearing from Sebastian first
> and I appreciate his effort to improve the glossary.  Maybe we should
> add a "matrix rank" entry instead.

My phrasing is certainly suboptimal (I only remember the German wording - and even that only faintly - "linearly independent"!?). But I put it in, remembering the discussion in "numpy" on *why* array.rank (numarray) was changed to array.ndim (numpy). I just thought this page might be a good place to 'discourage usage of badly-defined terms', or at least give the argument for "ndim". [OK: it's not "badly" defined, but there are two separate camps on *what* it should mean --- ndim is clear.]

BTW: Does the "matrix" class have an m.rank attribute!?

Cheers,
Sebastian
From: Sebastian H. <ha...@ms...> - 2006-08-25 15:18:20
was: Re: [Numpy-discussion] hstack(arr_Int32, arr_float32) fails because of casting rules

Travis Oliphant wrote:
> Sebastian Haase wrote:
>> On Thursday 24 August 2006 17:28, Travis Oliphant wrote:
>>> Sebastian Haase wrote:
>>>> Hi,
>>>> I get
>>>>     TypeError: array cannot be safely cast to required type
>>>> when calling hstack() (which calls concatenate()) on two arrays
>>>> being an int32 and a float32 respectively.
>>>>
>>>> I understand now that an int32 cannot be safely converted into a
>>>> float32, but why does concatenate not automatically up(?)-cast to
>>>> float64??
>>>
>>> Basically, NumPy is following Numeric's behavior of raising an error
>>> in this case of unsafe casting in concatenate.  For functions that
>>> are not universal-function objects, mixed-type behavior works
>>> basically just like Numeric did (using the ordering of the types to
>>> determine which one to choose as the output).
>>>
>>> It could be argued that the ufunc-rules should be followed instead.
>>>
>>> -Travis
>>
>> Are you saying the ufunc-rules would convert "int32-float32" to
>> float64 and hence make my code "just work"!?
>
> This is now the behavior in SVN.  Note that this is different from both
> Numeric (which gave an error) and numarray (which coerced to float32).
>
> But, it is consistent with how mixed-types are handled in calculations
> and is thus an easier rule to explain.
>
> Thanks for the testing.
>
> -Travis

After sleeping over this, I am contemplating the cases where one would use float32 in the first place. In my case yesterday, where I only had a 1d line profile of my data, I was of course OK with the coercion to float64. But if you are working with 3D image data (as in medicine) or large 2D images (as in astronomy), I would assume the reason to use float32 is that computer memory is too tight to afford 64 bits per pixel. This is probably why numarray tried to keep float32.

Float32 can handle a few more digits of precision than int16, but not as many as int32. But I find that I almost always have int32s only because it's the default, whereas I have float32 as a clear choice to save memory.

How hard would it be to change the rules back to the numarray behavior? Who would be negatively affected? And who positively?

Thanks for the great work,
Sebastian
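[Sebastian's memory concern is easy to quantify; a quick sketch of the per-pixel cost of a silent promotion:]

```python
import numpy as np

# A 1024x1024 image in each precision:
img32 = np.zeros((1024, 1024), dtype=np.float32)
img64 = img32.astype(np.float64)

img32.nbytes   # 4 MiB
img64.nbytes   # 8 MiB -- an unwanted promotion doubles the footprint
```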
From: Sasha <nd...@ma...> - 2006-08-25 13:48:57
On 8/25/06, Charles R Harris <cha...@gm...> wrote:
> Matrix rank has nothing to do with numpy rank. Numpy rank is simply the
> number of indices required to address an element of an ndarray. I
> always thought a better name for the Numpy rank would be
> dimensionality, but like everything else one gets used to the numpy
> jargon, it only needs to be defined someplace for what it is.

That's my point exactly. The rank(2) definition was added by Sebastian Haase, who advocates the use of the term "ndims" instead of "rank". I've discussed the use of "dimensionality" in the preamble. Note that ndims stands for the number of dimensions, not dimensionality.

I don't want to remove rank(2) without hearing from Sebastian first, and I appreciate his effort to improve the glossary. Maybe we should add a "matrix rank" entry instead.
From: Charles R H. <cha...@gm...> - 2006-08-25 13:34:23
Hi,

On 8/25/06, Stefan van der Walt <st...@su...> wrote:
> On Thu, Aug 24, 2006 at 11:10:24PM -0400, Sasha wrote:
> > I would welcome an effort to make the glossary more novice friendly,
> > but not at the expense of oversimplifying things.
> >
> > BTW, do you think "Rank ... (2) number of orthogonal dimensions of a
> > matrix" is clear?  Considering that matrix is defined as "an array of
> > rank 2"?  Is "rank" in the linear algebra sense common enough in
> > numpy documentation to be included in the glossary?
> > [...]
>
> I prefer the last definition.  Introductory algebra courses teach the
> term "linearly independent" before "orthogonal" (IIRC).  As for
> "linear map", it has other names, too, and doesn't (in my mind)
> clarify the definition of rank in this context.

Matrix rank has nothing to do with numpy rank. Numpy rank is simply the number of indices required to address an element of an ndarray. I always thought a better name for the Numpy rank would be dimensionality, but like everything else one gets used to the numpy jargon; it only needs to be defined someplace for what it is.

Chuck
From: Sven S. <sve...@gm...> - 2006-08-25 10:28:12
kor...@id... wrote:
> Since no one has downloaded 1.0b3 yet, if someone wants to put up the
> windows version for python2.3 i would be more than happy to be the
> first person to download it :)

I'm sorry, this is *not* for python 2.3, but I posted a build of current svn for python 2.4 under windows here (direct download link):

http://www.wiwi.uni-frankfurt.de/profs/nautz/downloads/software/numpy-1.0b4.dev3068.win32-py2.4.exe

I didn't do anything except checking it out and compiling it, so I guess this is not optimized in any way. Maybe it's still useful for some people.

cheers,
Sven
From: Francesc A. <fa...@ca...> - 2006-08-25 10:12:07
===========================
Announcing PyTables 1.3.3
===========================

I'm happy to announce a new minor release of PyTables. In this one, we have focused on improving compatibility with the latest beta versions of NumPy (0.9.8, 1.0b2, 1.0b3 and higher), adding some improvements and the typical bunch of fixes (some of them are important, like the possibility of re-using the same nested class in the declaration of table records; see later).

Go to the PyTables web site for downloading the beast:

http://www.pytables.org/

or keep reading for more info about the new features and bugs fixed.

Changes more in depth
=====================

Improvements:

- Added some workarounds for a couple of 'features' of recent versions of NumPy. Now, PyTables should work with a broad range of NumPy versions, ranging from 0.9.8 up to 1.0b3 (and hopefully beyond, but let's see).

- When a loop for appending to a table is not flushed before the node is unbound (and hence becomes ``killed`` in PyTables slang), like in::

      import tables as T

      class Item(T.IsDescription):
          name = T.StringCol(length=16)
          vals = T.Float32Col(0.0)

      fileh = T.openFile("/tmp/test.h5", "w")
      table = fileh.createTable(fileh.root, 'table', Item)
      for i in range(100):
          table.row.append()
      #table.flush()  # uncomment this to prevent the warning
      table = None    # Unbinding table node!

  a ``PerformanceWarning`` is issued telling the user that it is *much* recommended to flush the buffers of a table before unbinding it. Hopefully, this will also prevent other scary errors (like ``Illegal Instruction``, ``Malloc(): trying to call free() twice``, ``Bus Error`` or ``Segmentation fault``) that some people are seeing lately and which are most probably related to this issue.

Bug fixes:

- In situations where the same metaclass is used for declaring several columns in a table, like in::

      class Nested(IsDescription):
          uid = IntCol()
          data = FloatCol()

      class B_Candidate(IsDescription):
          nested1 = Nested()
          nested2 = Nested()

  they were sharing the same column metadata behind the scenes, introducing several inconsistencies in it. This has been fixed.

- More work on the different padding conventions between NumPy/numarray. Now, all trailing spaces in chararrays are stripped off during write/read operations. This means that when retrieving NumPy chararrays, spurious trailing spaces shouldn't appear anymore (not even in the context of recarrays). The drawback is that you will lose *all* the trailing spaces, no matter whether you wanted them there or not. This is not a very comfortable situation to deal with, but hopefully things will get better when NumPy is at the core of PyTables. In the meanwhile, I hope the current behaviour will be a minor evil in most situations. This closes ticket #13 (again).

- Solved a problem with conversions from numarray chararrays to numpy objects. Before, when saving numpy chararrays with a declared length of N, but with none of the component strings reaching that length, the dtype of the retrieved numpy chararray was the maximum length of the component strings. This has been corrected.

- Fixed a minor glitch in the detection of signedness in IntAtom classes. Thanks to Norbert Nemec for reporting this one and providing the fix.

Known bugs:

- Using ``Row.update()`` in tables with some columns marked as indexed gives a ``NotImplemented`` error although it should not. This is fixed in SVN trunk and the functionality will be available in the 1.4.x series. Meanwhile, a workaround would be to refrain from declaring columns as indexed and index them *after* the update process (with Col.createIndex(), for example).

Deprecated features:

- None

Backward-incompatible changes:

- Please, see the ``RELEASE-NOTES.txt`` file.

Important note for Windows users
================================

If you are willing to use PyTables with Python 2.4 on Windows platforms, you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET 2003. It can be found at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win-net.ZIP

Users of Python 2.3 on Windows will have to download the version of HDF5 compiled with MSVC 6.0 available at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win.ZIP

What it is
==========

PyTables is a package for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data (with support for full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy-to-use tool for high-performance data storage and retrieval.

PyTables runs on top of the HDF5 library and the numarray package (but NumPy and Numeric are also supported) for achieving maximum throughput and convenient use. Besides, PyTables I/O for table objects is buffered, implemented in C and carefully tuned so that you can reach much better performance with PyTables than with your own home-grown wrappings to the HDF5 library. PyTables sports indexing capabilities as well, allowing selections in tables exceeding one billion rows in just seconds.

Platforms
=========

This version has been extensively checked on quite a few platforms, like Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on Intel64 (Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and PowerPC64) and MacOSX on PowerPC. For other platforms, chances are that the code can be easily compiled and run without further issues. Please contact us in case you are experiencing problems.

Resources
=========

Go to the PyTables web site for more details:
http://www.pytables.org

About the HDF5 library:
http://hdf.ncsa.uiuc.edu/HDF5/

About numarray:
http://www.stsci.edu/resources/software_hardware/numarray

To know more about the company behind the PyTables development, see:
http://www.carabos.com/

Acknowledgments
===============

Thanks to the various users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Many thanks also to SourceForge, who have helped to make and distribute this package! And last but not least, a big thank you to THG (http://www.hdfgroup.org/) for sponsoring many of the new features recently introduced in PyTables.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team
From: David C. <da...@ar...> - 2006-08-25 10:06:12
|
Travis Oliphant wrote: > David Cournapeau wrote: >> Indeed. >> >> By the way, I tried something for python.thread + signals. This is posix >> specific, and it works as expected on linux: >> > Am I right that this could this be accomplished simply by throwing away > all the interrupt handling stuff in the code and checking for > PyOS_InterruptOccurred() in the place where you check for the global > variable that your signal handler uses? Your signal handler does > essentially what Python's signal handler already does, if I'm not mistaken. I don't know how the python signal handler works, but I believe it should do more or less the same, indeed. The key idea is that it is important to mask other signals related to interrupting. To have a relatively clear view on this, if you have not seen it, you may take a look at the gnu C doc on signal handling: http://www.gnu.org/software/libc/manual/html_node/Defining-Handlers.html#Defining-Handlers After having given some thought, I am wondering about what exactly we are trying to do: - the main problem is to be able to interrupt some which may take a long time to compute, without corrupting the whole python process. - for that, those function need to be able to trap the usual signals corresponding to interrupt (SIGINT, etc... on Unix, equivalents on windows). There are two ways to handle a signal: - check regularly some global (that is, global to the whole process) value, and if change this value if a signal is trapped. That's the easier way, but this is not thread safe as I first thought (I will code an example if I have time). - the signal handler jumps to an other point of the program where cleaning is done: this is more complicated, and I am not sure we need the complication (I have never used this scheme, so I may just miss the point totally). 
I don't even want to think how it works in a multi-threading environment :)

Now, the threading issue came in, and I am not sure why we need to care: this is a problem if numpy is implemented in a multi-threaded way, but I don't believe that to be the case, right?

Another solution, which is used I think in more sophisticated programs, is having one thread with high priority, whose only job is to detect signals, with all signals masked in all other threads. Again, this seems overkill (and highly non-portable)? And this should be the python interpreter's job, no? Actually, as this is a generic problem for any python extension code, other really smart people must have thought about it already...

If I am interpreting correctly what is said here, http://docs.python.org/lib/module-signal.html, I believe that what you suggest (using PyOS_InterruptOccurred() at some points) is what shall be done: the python interpreter makes sure that the signal is sent to the main thread, that is, the thread where numpy is executed (that's my understanding of the way the python interpreter works, not a fact).

David |
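[The first approach discussed above — a process-global flag set by the signal handler and polled at safe points by the long-running code — can be sketched at the Python level. This is an illustrative sketch only; `long_computation` and the flag name are made up for the example, not taken from any numpy code.]

```python
import signal

interrupted = False  # process-global flag, written only by the handler


def _handler(signum, frame):
    global interrupted
    interrupted = True  # just record the interrupt; do no real work here


def long_computation(n):
    """Sum 0..n-1, polling the flag at safe points so a SIGINT can
    stop the loop without corrupting any partial state."""
    global interrupted
    interrupted = False
    old = signal.signal(signal.SIGINT, _handler)  # install our handler
    total = 0
    try:
        for i in range(n):
            if interrupted:
                return None  # safe point: clean up and bail out
            total += i
    finally:
        signal.signal(signal.SIGINT, old)  # always restore the old handler
    return total
```

As the thread above notes, this only behaves predictably because CPython delivers signals to the main thread; the polling granularity decides how quickly Ctrl+C takes effect.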
From: Stefan v. d. W. <st...@su...> - 2006-08-25 07:45:41
|
On Thu, Aug 24, 2006 at 11:10:24PM -0400, Sasha wrote:
> I would welcome an effort to make the glossary more novice friendly,
> but not at the expense of oversimplifying things.
>
> BTW, do you think "Rank ... (2) number of orthogonal dimensions of a
> matrix" is clear? Considering that matrix is defined as "an array of
> rank 2"? Is "rank" in the linear algebra sense common enough in numpy
> documentation to be included in the glossary?
>
> For comparison, here are a few alternative formulations of the matrix
> rank definition:
>
> "The rank of a matrix or a linear map is the dimension of the image of
> the matrix or the linear map, corresponding to the number of linearly
> independent rows or columns of the matrix, or to the number of nonzero
> singular values of the map."
> <http://mathworld.wolfram.com/MatrixRank.html>
>
> "In linear algebra, the column rank (row rank respectively) of a
> matrix A with entries in some field is defined to be the maximal
> number of columns (rows respectively) of A which are linearly
> independent."
> <http://en.wikipedia.org/wiki/Rank_(linear_algebra)>

I prefer the last definition. Introductory algebra courses teach the term "linearly independent" before "orthogonal" (IIRC). As for "linear map", it has other names, too, and doesn't (in my mind) clarify the definition of rank in this context.

Regards
Stéfan |
From: Travis O. <oli...@ie...> - 2006-08-25 06:10:23
|
David Cournapeau wrote:
> Indeed.
>
> By the way, I tried something for python.thread + signals. This is posix
> specific, and it works as expected on linux:
>
Am I right that this could be accomplished simply by throwing away all the interrupt handling stuff in the code and checking for PyOS_InterruptOccurred() in the place where you check for the global variable that your signal handler uses? Your signal handler does essentially what Python's signal handler already does, if I'm not mistaken.

-Travis |
From: David C. <da...@ar...> - 2006-08-25 04:32:38
|
Travis Oliphant wrote:
>
> Right, as long as you know what to do you are O.K. I was just thinking
> about a hypothetical situation where the library allocated some
> temporary memory that it was going to free at the end of the subroutine,
> but then an interrupt jumped back out to your code before it could
> finish. In a case like this, you would have to use the "check if
> interrupt has occurred" approach before and after the library call.

Indeed.

By the way, I tried something for python.thread + signals. This is posix specific, and it works as expected on linux:

- First, a C extension which implements the signal handling. It has a function called hello, which is the entry point of the C module, and calls the function process (which does random computation). It checks if it got a SIGINT signal, and returns -1 if caught; it returns 0 if no SIGINT was caught.
- The extension is compiled into a python module (I used boost python because I am too lazy to find out how to do it in C :) ).
- A python script which creates several threads running the hello function. They run in parallel, and Ctrl+C is correctly handled. I think this is signal specific, and this needs to be improved (this is just meant as a toy example):

    import threading
    import hello
    import time

    class mythread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)

        def run(self):
            print "Starting thread", self.getName()
            st = 0
            while st == 0:
                st = hello.foo(self.getName())
                # sleep to force the python interpreter to run
                # other threads if available
                time.sleep(1)
            if st == -1:
                print self.getName() + " got signal"
            print "Ending thread", self.getName()

    nthread = 5
    t = [mythread() for i in range(nthread)]
    [i.start() for i in t]

Then, you get something like:

    Starting thread Thread-1
    Thread-1 processing... done
    clean called
    Starting thread Thread-5
    Thread-5 processing... done
    clean called
    Starting thread Thread-3
    Thread-3 processing... done
    clean called
    Starting thread Thread-2
    Thread-2 processing... done
    hello.c:hello signal caught line 56 for thread Thread-2
    clean called
    Thread-1 processing... done
    clean called
    Starting thread Thread-4
    Thread-4 processing... done
    clean called
    Thread-5 processing... done
    clean called
    Thread-3 processing... done
    hello.c:hello signal caught line 56 for thread Thread-3
    clean called
    Thread-2 got signal
    Ending thread Thread-2
    Thread-1 processing... done
    clean called
    Thread-4 processing... done
    clean called
    Thread-5 processing... done
    clean called
    Thread-3 got signal
    Ending thread Thread-3
    Thread-1 processing... done
    hello.c:hello signal caught line 56 for thread Thread-1
    clean called
    Thread-4 processing... done
    clean called
    Thread-5 processing... done
    hello.c:hello signal caught line 56 for thread Thread-5
    clean called
    Thread-1 got signal
    Ending thread Thread-1
    Thread-4 processing... done
    clean called
    Thread-5 got signal
    Ending thread Thread-5
    Thread-4 processing... done
    clean called
    Thread-4 processing... done
    clean called
    Thread-4 processing... done
    hello.c:hello signal caught line 56 for thread Thread-4
    clean called
    Thread-4 got signal
    Ending thread Thread-4

(SIGINTs are received when pressing Ctrl+C on linux.)

You can find all the sources here:
http://www.ar.media.kyoto-u.ac.jp/members/david/numpysig/

Please note that I know almost nothing about all this stuff; I just naively implemented it from the example in the GNU C library documentation, and it always worked for me in matlab on my machine. I do not know if this is portable, if this can work for other signals, etc...

David |
From: Travis O. <oli...@ie...> - 2006-08-25 04:03:07
|
Sebastian Haase wrote:
> On Thursday 24 August 2006 17:28, Travis Oliphant wrote:
>
>> Sebastian Haase wrote:
>>
>>> Hi,
>>> I get
>>>     TypeError: array cannot be safely cast to required type
>>> when calling hstack() (which calls concatenate())
>>> on two arrays being an int32 and a float32 respectively.
>>>
>>> I understand now that an int32 cannot be safely converted into a float32,
>>> but why does concatenate not automatically
>>> up(?)cast to float64??
>>>
>> Basically, NumPy is following Numeric's behavior of raising an error in
>> this case of unsafe casting in concatenate. For functions that are not
>> universal-function objects, mixed-type behavior works basically just
>> like Numeric did (using the ordering of the types to determine which one
>> to choose as the output).
>>
>> It could be argued that the ufunc rules should be followed instead.
>>
>> -Travis
>>
> Are you saying the ufunc rules would convert "int32-float32" to float64 and
> hence make my code "just work"!?
>
This is now the behavior in SVN. Note that this is different from both Numeric (which gave an error) and numarray (which coerced to float32). But it is consistent with how mixed types are handled in calculations, and is thus an easier rule to explain.

Thanks for the testing.

-Travis |
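[The behavior Travis describes landing in SVN is what shipped, and it can be checked directly in any modern NumPy: concatenating int32 with float32 promotes to float64 rather than raising the TypeError Sebastian saw.]

```python
import numpy as np

a = np.arange(3, dtype=np.int32)
b = np.arange(3, dtype=np.float32)

# int32 values cannot all be represented exactly in float32, so the
# common type chosen for the concatenation is float64
c = np.hstack((a, b))
print(c.dtype)  # float64
```

The same promotion is visible via `np.promote_types(np.int32, np.float32)`, which also reports float64.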
From: Sebastian H. <ha...@ms...> - 2006-08-25 03:59:27
|
Travis Oliphant wrote:
> Sebastian Haase wrote:
>> On Thursday 24 August 2006 17:28, Travis Oliphant wrote:
>>
>> Are you saying the ufunc rules would convert "int32-float32" to float64 and
>> hence make my code "just work"!?
>>
> Yes. That's what I'm saying (but you would get float64 out --- but if
> you didn't want that then you would have to be specific).
>
>> And why are there two sets of rules?
>>
> Because there are two modules (multiarray and umath) where the
> functionality is implemented.
>
>> Are the Numeric rules used in many places?
>>
> Not that many. I did abstract the notion to a C-API:
> PyArray_ConvertToCommonType, and implemented the
> scalars-don't-cause-upcasting part of the ufunc rules in that code.
> But, I followed the old-style Numeric coercion rules for the rest of it
> (because I was adapting Numeric).
>
> Right now, unless there are strong objections, I'm leaning to changing
> that so that the same coercion rules are used whenever a common type is
> needed.

If you mean keeping the ufunc rules (which seem more liberal, fix my problem ;-), and might make using float32 in general more painless) - I would be all for it ... simplifying is always good in the long term ...

Cheers,
Sebastian

> It would not be that difficult of a change. |
From: Travis O. <oli...@ie...> - 2006-08-25 03:20:42
|
David Cournapeau wrote:
>> If nothing is known about memory allocation of the external library,
>> then I don't see how it can be safely interrupted using any mechanism.
>>
> If the library does nothing w.r.t. signals, then you just have to clean
> up all the things related to the library once you catch a signal. This
> is no different than cleaning up your own code.
>
Right, as long as you know what to do you are O.K. I was just thinking about a hypothetical situation where the library allocated some temporary memory that it was going to free at the end of the subroutine, but then an interrupt jumped back out to your code before it could finish. In a case like this, you would have to use the "check if interrupt has occurred" approach before and after the library call. But then that library call is not interruptible.

I could also see wanting to be able to interrupt a library calculation when you know it isn't allocating memory. So, I like having both possibilities available.

So far we haven't actually put anything in the numpy code itself. I'm leaning toward putting PyOS_InterruptOccurred-style checks in a few places at some point down the road.

-Travis |
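[The "check before and after the library call" idea can be mimicked in pure Python: hold any interrupt until the library routine has returned and freed its resources, then deliver it at a safe point. `call_uninterruptible` and its argument are hypothetical names for this sketch, not numpy API.]

```python
import signal


def call_uninterruptible(fn, *args):
    """Run fn (standing in for a library routine that must not be
    interrupted mid-call); deliver any pending SIGINT only afterwards."""
    caught = []

    def _handler(signum, frame):
        caught.append(signum)  # just note the signal; don't jump out

    old = signal.signal(signal.SIGINT, _handler)
    try:
        result = fn(*args)  # library may allocate and free memory freely
    finally:
        signal.signal(signal.SIGINT, old)  # restore the previous handler
    if caught:
        raise KeyboardInterrupt  # safe point: the call has fully finished
    return result
```

The trade-off is exactly the one Travis names: the wrapped call itself is not interruptible, so this pattern fits short library calls, while the polling pattern fits long loops that allocate nothing.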
From: Sasha <nd...@ma...> - 2006-08-25 03:10:29
|
On 8/24/06, Bill Baxter <wb...@gm...> wrote:
[snip]
> Hey Sasha. Your definition may be more correct, but I have to confess
> I don't understand it.
>
> "Universal function. Universal functions follow similar rules for
> broadcasting, coercion and "element-wise operation"."
>
> What is "coercion"? (Who or what is being coerced to do what?) And
> what does it mean to "follow similar rules for ... coercion"? Similar
> to what?

This is not my definition; I just rephrased the introductory paragraph from the ufunc section of "Numerical Python" <http://numpy.scipy.org/numpydoc/numpy-7.html#pgfId-36127>. Feel free to edit it so that it makes more sense.

Please note that I originally intended the "Numpy Glossary" not as a place to learn new terms, but as a guide for those who know more than one meaning of the terms, or more than one way to call something. (See the preamble.) This may explain why I did not include "ufunc" to begin with. (I remember deciding not to include "ufunc", but I don't remember the exact reason anymore.)

I would welcome an effort to make the glossary more novice friendly, but not at the expense of oversimplifying things.

BTW, do you think "Rank ... (2) number of orthogonal dimensions of a matrix" is clear? Considering that matrix is defined as "an array of rank 2"? Is "rank" in the linear algebra sense common enough in numpy documentation to be included in the glossary?

For comparison, here are a few alternative formulations of the matrix rank definition:

"The rank of a matrix or a linear map is the dimension of the image of the matrix or the linear map, corresponding to the number of linearly independent rows or columns of the matrix, or to the number of nonzero singular values of the map."
<http://mathworld.wolfram.com/MatrixRank.html>

"In linear algebra, the column rank (row rank respectively) of a matrix A with entries in some field is defined to be the maximal number of columns (rows respectively) of A which are linearly independent."
<http://en.wikipedia.org/wiki/Rank_(linear_algebra)> |
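[The two quoted definitions of matrix rank agree numerically, which is easy to check in numpy: count the nonzero singular values (the MathWorld formulation) for a small matrix with one linearly dependent row. The tolerance `1e-10` is an arbitrary choice for this example.]

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # twice the first row: linearly dependent
              [0., 1., 1.]])

# number of nonzero singular values = rank
s = np.linalg.svd(A, compute_uv=False)
rank = int((s > 1e-10).sum())
print(rank)  # 2
```

`np.linalg.matrix_rank(A)` computes the same thing, with a tolerance derived from the largest singular value and the machine epsilon.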
From: Travis O. <oli...@ie...> - 2006-08-25 03:07:07
|
Sebastian Haase wrote:
> On Thursday 24 August 2006 17:28, Travis Oliphant wrote:
>
> Are you saying the ufunc rules would convert "int32-float32" to float64 and
> hence make my code "just work"!?
>
Yes. That's what I'm saying (but you would get float64 out --- but if you didn't want that then you would have to be specific).

> And why are there two sets of rules?
>
Because there are two modules (multiarray and umath) where the functionality is implemented.

> Are the Numeric rules used in many places?
>
Not that many. I did abstract the notion to a C-API: PyArray_ConvertToCommonType, and implemented the scalars-don't-cause-upcasting part of the ufunc rules in that code. But, I followed the old-style Numeric coercion rules for the rest of it (because I was adapting Numeric).

Right now, unless there are strong objections, I'm leaning to changing that so that the same coercion rules are used whenever a common type is needed. It would not be that difficult of a change.

-Travis |
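[The "scalars-don't-cause-upcasting" rule Travis mentions is still visible in current NumPy: a plain Python scalar operand leaves a float32 array's dtype alone, while a mixed array-array operation coerces to the common type.]

```python
import numpy as np

a = np.ones(3, dtype=np.float32)

# A Python scalar operand does not upcast the array
print((a * 2).dtype)   # float32

# An int32 *array* operand does: the common type is float64
b = np.arange(3, dtype=np.int32)
print((a * b).dtype)   # float64
```

This asymmetry is deliberate: upcasting every float32 array touched by an integer literal would make working in single precision nearly impossible, which is exactly the pain Sebastian describes elsewhere in the thread.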