From: Russell E O. <rowen@u.washington.edu> - 2004-07-14 17:41:09
|
At 9:50 AM -0700 2004-07-14, Paul F. Dubois wrote:
>The median filter is prepared to take an argument of a numarray
>array but ignorant of and unprepared to deal with masked values.
>Using the __array__ trick, both Numeric.MA and numarray.ma would
>'know' this and therefore replace the missing values in the filter's
>argument with the 'fill value' for that type -- a big number in the
>case of real arrays. You could explicitly choose that value (say
>using the overall median of the data m) by passing x.filled(m)
>rather than x to the filter.
>
>If there is no such value, you probably do have to do it in C. If
>you wrote it in C, how would you treat missing elements? BTW it
>wouldn't be that hard; just pass both the array and its mask as
>separate elements to a C routine and use SWIG to hook it up.

I already have routines that handle masked data in C to create radial profiles from 2-d integer data (since I could not figure out how to do that in numarray). I chose to pass the mask as a separate array, since I could not find any C interface for numarray.ma and since NaN made no sense for integer data.

That code was pretty straightforward. I wish I could have found a simple way to support multiple array types. I thought using C++ with prototypes would be the ticket, but absent any examples and after looking through the numarray code, I gave up and took the easy way out. (I didn't use SWIG, though, I just hand coded everything. Maybe that was a mistake.)

I confess that makes me worry about the underpinnings of numarray. It seems an obvious candidate to be written in C++ with prototypes. I hate to think what the developers have to go through, instead.

In any case, writing a median filter is a bigger deal than taking a radial profile, and since one already existed I thought I'd ask.

>I doubt NaN would help you here; you'd still have to figure out what
>to do in those places. Numeric did not have support for NaN because
>there were portability problems. Probably still are. And you still
>are stuck in a lot of cases anyway.

Well, NaN isn't very general in any case, since it's meaningless for integer data. So maybe that's a red herring. (Though if NaN had worked to mask data I would cheerfully have converted my images to floats to take advantage of it!)

What's really wanted is a more unified approach to masked data. I suppose it's pie in the sky, but I sure wish most of the numarray functions took an optional mask array (or accepted a numarray.ma object -- nice for the user, but probably too painful for words under the hood).

I don't think there are major issues with what to do with masked data. Simply ignoring it works in most cases, e.g. mean, std dev, sum, max... In some cases one needs the new mask as output (e.g. matrix multiply). Filtering is a bit subtle: can masked data be treated the same as data off the edge? I hope so, but I'm not sure.

Anyway, I am grateful for what we do have. Without Numeric or numarray I would have to write all my image processing code in a different language.

-- Russell
From: Peter V. <ve...@em...> - 2004-07-14 17:37:26
|
On 14 Jul 2004, at 17:47, Russell E Owen wrote:
> I want to 3x3 median filter a masked array (2-d array of ints -- an
> astronomical image), where the masked data and points off the edge are
> excluded from the local median calculation. Any suggestions for how to
> do this efficiently?

I don't think that you can do it very efficiently right now with the functions that are available in numarray.

> I suspect I have to write it in C, which is an unpleasant prospect.

Yes, that is unpleasant, trust me :-) However, in version 1.0 of numarray in the nd_image package, I have added some support for writing filter functions. The generic_filter() function iterates over the array and applies a user-defined filter function at each element. The user-defined function can be written in python or in C, and is called at each element with the values within the filter-footprint as an argument. You would write a function that finds the median of these values, excluding the NaNs (or whatever value flags the mask). I would suggest prototyping this function in python and moving it to C as soon as it works to your satisfaction. See the numarray manual for more details.

Cheers, Peter
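The Python prototype suggested above could start from something like the following sketch. It is plain Python on nested lists so that it runs without any array package, and it does not use the real generic_filter() interface; the names masked_median and median_filter_3x3 are hypothetical. Masked pixels and pixels off the edge are both simply left out of the local median, which is the behaviour asked for in the original question.

```python
from statistics import median


def masked_median(values, mask):
    """Median of the unmasked values; None if every value is masked."""
    good = [v for v, m in zip(values, mask) if not m]
    return median(good) if good else None


def median_filter_3x3(image, mask):
    """3x3 median filter that skips masked pixels and pixels off the edge.

    `image` and `mask` are equally shaped lists of rows; a true mask entry
    marks a pixel to ignore.
    """
    nrows, ncols = len(image), len(image[0])
    result = []
    for i in range(nrows):
        row = []
        for j in range(ncols):
            values, flags = [], []
            # Collect the footprint, clipping it at the image border.
            for ii in range(max(i - 1, 0), min(i + 2, nrows)):
                for jj in range(max(j - 1, 0), min(j + 2, ncols)):
                    values.append(image[ii][jj])
                    flags.append(mask[ii][jj])
            row.append(masked_median(values, flags))
        result.append(row)
    return result


image = [[1, 2, 3],
         [4, 100, 6],
         [7, 8, 9]]
mask = [[False, False, False],
        [False, True, False],
        [False, False, False]]   # the outlier 100 is masked
filtered = median_filter_3x3(image, mask)
```

Note that with an even number of unmasked neighbours statistics.median averages the two middle values, so an integer image can produce float results; a C version would have to pick a policy for that case.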
From: John H. <jdh...@ac...> - 2004-07-14 16:50:47
|
matplotlib is a 2D plotting library for python. You can use matplotlib interactively from a python shell or IDE, or embed it in GUI applications (WX, GTK, and Tkinter). matplotlib supports many plot types: line plots, bar charts, log plots, images, pseudocolor plots, legends, date plots, finance charts and more.

What's new since matplotlib 0.50

This is the first wide release in 5 months and there has been a tremendous amount of development since then, with new backends, many optimizations, new plotting types and enhanced text support. See http://matplotlib.sourceforge.net/whats_new.html for details.

 * Todd Miller's tkinter backend (tkagg) with good support for interactive plotting using the standard python shell, ipython or others. matplotlib now runs on windows out of the box with python + numeric/numarray

 * Full Numeric / numarray integration with Todd Miller's numerix module. Prebuilt installers for numeric and numarray on win32. Others, please set your numerix settings before building matplotlib, as described on http://matplotlib.sourceforge.net/faq.html#NUMARRAY

 * Mathtext: you can write TeX style math expressions anywhere in your figure. http://matplotlib.sourceforge.net/screenshots.html#mathtext_demo.

 * Images - figure and axes images with optional interpolated resampling, alpha blending of multiple images, and more with the imshow and figimage commands. Interactive control of colormaps, intensity scaling and colorbars - http://matplotlib.sourceforge.net/screenshots.html#layer_images

 * Text: freetype2 support, newline separated strings with arbitrary rotations, Paul Barrett's cross platform font manager. http://matplotlib.sourceforge.net/screenshots.html#align_text

 * Jared Wahlstrand's SVG backend (alpha)

 * Support for popular financial plot types - http://matplotlib.sourceforge.net/screenshots.html#finance_work2

 * Many optimizations and extension code to remove performance bottlenecks. pcolors and scatters are an order of magnitude faster.

 * GTKAgg, WXAgg, TkAgg backends for http://antigrain.com (agg) rendering in the GUI canvas. Now all the major GUIs (WX, GTK, Tk) can be used with a common (agg) renderer.

 * Many new examples and demos - see http://matplotlib.sf.net/examples or download the src distribution and look in the examples dir.

Documentation and downloads available at http://matplotlib.sourceforge.net.

John Hunter
From: Russell E O. <rowen@u.washington.edu> - 2004-07-14 15:47:47
|
I want to 3x3 median filter a masked array (2-d array of ints -- an astronomical image), where the masked data and points off the edge are excluded from the local median calculation. Any suggestions for how to do this efficiently? I suspect I have to write it in C, which is an unpleasant prospect. I tried using NaN for points to mask out, but the median filter seems to handle those as "infinity", or something equally inappropriate. In a related vein, has Python come along far enough that it would be reasonable to add support for NaN to numarray -- in the sense that statistics calculations, filters, etc. could be convinced to ignore NaNs? Obviously this support would be contingent on compiling python with IEEE floating point support, but I suspect that's the default on most platforms these days. -- Russell |
From: <Seb...@el...> - 2004-07-14 15:40:36
|
I could not resist proposing another solution:

    r = array([0,2,5,6,8])
    l = (r[:,NewAxis] + r[NewAxis,:]).flat

-----Original Message-----
From: Hee-Seng Kye [mailto:ky...@ea...]
Sent: Wednesday, 14 July 2004 4:22
To: num...@li...
Subject: [Numpy-discussion] a 'for' loop within another 'for' loop?

Hi. I wrote a program to calculate sums of every possible combinations of two indices of a list. The main body of the program looks something like this:

    r = [0,2,5,6,8]
    l = []

    for x in range(0, len(r)):
        for y in range(0, len(r)):
            k = r[x]+r[y]
            l.append(k)
    print l

1. I've heard that it's not a good idea to have a 'for' loop within another 'for' loop, and I was wondering if there is a more efficient way to do this.

2. Does anyone know if there is a built-in function or module that would do the above task in NumPy or Numarray (or even in Python)?

I would really appreciate it if anyone could let me know.

Thanks for your help!
From: Paul F. D. <pa...@pf...> - 2004-07-14 12:56:51
|
>>> add.reduce(take(r,indices([len(r),len(r)]))).flat
array([ 0,  2,  5,  6,  8,  2,  4,  7,  8, 10,  5,  7, 10, 11, 13,
        6,  8, 11, 12, 14,  8, 10, 13, 14, 16])

Always like a good challenge in the morning. God, it is like the old rush of writing APL.

Hee-Seng Kye wrote:
> Hi. I wrote a program to calculate sums of every possible combinations
> of two indices of a list. The main body of the program looks something
> like this:
>
> r = [0,2,5,6,8]
> l = []
>
> for x in range(0, len(r)):
>     for y in range(0, len(r)):
>         k = r[x]+r[y]
>         l.append(k)
> print l
>
> 1. I've heard that it's not a good idea to have a 'for' loop within
> another 'for' loop, and I was wondering if there is a more efficient way
> to do this.
>
> 2. Does anyone know if there is a built-in function or module that would
> do the above task in NumPy or Numarray (or even in Python)?
>
> I would really appreciate it if anyone could let me know.
>
> Thanks for your help!
From: Todd M. <jm...@st...> - 2004-07-14 11:37:37
|
On Wed, 2004-07-14 at 05:36, Francesc Alted wrote:
> On Tuesday 13 July 2004 19:41, Todd Miller wrote:
> > The real fix for the bug appears to be to redefine the semantics of
> > numarray's PyArrayObject ->data pointer to include ->byteoffset,
> > altering the C-API.
>
> Oh well, I'm afraid that I'll be affected by that :(. Just to understand
> that fully, you mean that real data for an array will start in the future at
> narr->data, instead of narr->data+narr->byteoffset as it does now?

That is the current plan. I was thinking developers could just replace the new narr->data with (narr->data - narr->byteoffset) if needed.

I'm assuming the planned changes will cost at most a few edits and package redistribution, which I understand is still a major pain in the neck; let me know if the cost is higher than that for some reason.

Regards, Todd
From: Francesc A. <fa...@py...> - 2004-07-14 09:36:23
|
On Tuesday 13 July 2004 19:41, Todd Miller wrote:
> The real fix for the bug appears to be to redefine the semantics of
> numarray's PyArrayObject ->data pointer to include ->byteoffset,
> altering the C-API.

Oh well, I'm afraid that I'll be affected by that :(. Just to understand that fully, you mean that real data for an array will start in the future at narr->data, instead of narr->data+narr->byteoffset as it does now?

-- Francesc Alted
From: Hee-Seng K. <ky...@ea...> - 2004-07-14 06:29:52
|
Thank you so much. It works beautifully!

On Jul 14, 2004, at 1:01 AM, Warren Focke wrote:

> l = Numeric.add.outer(r, r).flat
> oughta do the trick. Should work for numarray, too.
>
> On Tue, 13 Jul 2004, Hee-Seng Kye wrote:
>
>> Hi. I wrote a program to calculate sums of every possible
>> combinations of two indices of a list. The main body of the
>> program looks something like this:
>>
>> r = [0,2,5,6,8]
>> l = []
>>
>> for x in range(0, len(r)):
>>     for y in range(0, len(r)):
>>         k = r[x]+r[y]
>>         l.append(k)
>> print l
>>
>> 1. I've heard that it's not a good idea to have a 'for' loop within
>> another 'for' loop, and I was wondering if there is a more efficient
>> way to do this.
>>
>> 2. Does anyone know if there is a built-in function or module that
>> would do the above task in NumPy or Numarray (or even in Python)?
>>
>> I would really appreciate it if anyone could let me know.
>>
>> Thanks for your help!
>
> -------------------------------------------------------
> This SF.Net email sponsored by Black Hat Briefings & Training.
> Attend Black Hat Briefings & Training, Las Vegas July 24-29 -
> digital self defense, top technical experts, no vendor pitches,
> unmatched networking opportunities. Visit www.blackhat.com
> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
From: eric j. <er...@en...> - 2004-07-14 05:08:06
|
Hey folks,

Just a reminder that SciPy 04 is coming up. More information is here:

http://www.scipy.org/wikis/scipy04

About the Conference and Keynote Speaker
---------------------------------------------

The 1st annual *SciPy Conference* will be held this year at Caltech, September 2-3, 2004.

As some of you may know, we've experienced great participation in two SciPy "Workshops" (with ~70 attendees in both 2002 and 2003) and this year we're graduating to a "conference." With the prestige of a conference comes the responsibility of a keynote address. This year, Jim Hugunin has answered the call and will be speaking to kickoff the meeting on Thursday September 2nd. Jim is the creator of Numeric Python, Jython, and co-designer of AspectJ. Jim is currently working on IronPython--a fast implementation of Python for .NET and Mono.

Presenters
-----------

We still have room for a few more standard talks, and there is plenty of room for lightning talks. Because of this, we are extending the abstract deadline until July 23rd. Please send your abstract to abs...@sc....

Travis Oliphant is organizing the presentations this year. (Thanks!) Once accepted, papers and/or presentation slides are acceptable and are due by August 20, 2004.

Registration
-------------

Early registration ($100.00) has been extended to July 23rd. Follow the links off of the main conference site:

http://www.scipy.org/wikis/scipy04

After July 23rd, registration will be $150.00. Registration includes breakfast and lunch Thursday & Friday and a very nice dinner Thursday night.

Please register as soon as possible as it will help us in planning for food, room sizes, etc.

Sprints
--------

As of now, we really haven't had much of a call for coding sprints for the 3 days prior to SciPy 04. Below is the original announcement about sprints. If you would like to suggest a topic and see if others are interested, please send a message to the list. Otherwise, we'll forgo the sprints session this year.

We're also planning three days of informal "Coding Sprints" prior to the conference -- August 30 to September 1, 2004. Conference registration is not required to participate in the sprints. Please email the list, however, if you plan to attend. Topics for these sprints will be determined via the mailing lists as well, so please submit any suggestions for topics to the scipy-user list:

list signup: http://www.scipy.org/mailinglists/
list address: sci...@sc...

thanks,
eric
From: Warren F. <fo...@sl...> - 2004-07-14 05:01:49
|
    l = Numeric.add.outer(r, r).flat

oughta do the trick. Should work for numarray, too.

On Tue, 13 Jul 2004, Hee-Seng Kye wrote:

> Hi. I wrote a program to calculate sums of every possible combinations
> of two indices of a list. The main body of the program looks something
> like this:
>
> r = [0,2,5,6,8]
> l = []
>
> for x in range(0, len(r)):
>     for y in range(0, len(r)):
>         k = r[x]+r[y]
>         l.append(k)
> print l
>
> 1. I've heard that it's not a good idea to have a 'for' loop within
> another 'for' loop, and I was wondering if there is a more efficient
> way to do this.
>
> 2. Does anyone know if there is a built-in function or module that
> would do the above task in NumPy or Numarray (or even in Python)?
>
> I would really appreciate it if anyone could let me know.
>
> Thanks for your help!
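For the "or even in Python" part of the question, the same flat list of pairwise sums can be written without an array package. This is only a sketch using the standard library's itertools.product; the variable name sums is an arbitrary choice, and for large inputs the array-based answers in this thread are the ones to prefer.

```python
from itertools import product

r = [0, 2, 5, 6, 8]

# One flat list of x + y for every ordered pair (x, y) drawn from r,
# in the same order as the original nested loops produce.
sums = [x + y for x, y in product(r, repeat=2)]

print(sums)
```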
From: Hee-Seng K. <ky...@ea...> - 2004-07-14 02:22:22
|
Hi. I wrote a program to calculate sums of every possible combinations of two indices of a list. The main body of the program looks something like this:

    r = [0,2,5,6,8]
    l = []

    for x in range(0, len(r)):
        for y in range(0, len(r)):
            k = r[x]+r[y]
            l.append(k)
    print l

1. I've heard that it's not a good idea to have a 'for' loop within another 'for' loop, and I was wondering if there is a more efficient way to do this.

2. Does anyone know if there is a built-in function or module that would do the above task in NumPy or Numarray (or even in Python)?

I would really appreciate it if anyone could let me know.

Thanks for your help!
From: Russell E O. <rowen@u.washington.edu> - 2004-07-14 00:04:10
|
At 1:41 PM -0700 2004-07-13, Mike Zingale wrote:
>thanks, all these responses helped. I guess I was still a little
>unclear with the slicing abilities in numarray...

Also note that there is a shift function: numarray.nd_image.shift

In your case I suspect slicing is better, but there are times when one really does want to shift the data (e.g. when one wants the resulting array to be the same shape as the original).

-- Russell
From: Mike Z. <zi...@uc...> - 2004-07-13 20:41:29
|
thanks, all these responses helped. I guess I was still a little unclear with the slicing abilities in numarray.

Mike

On Tue, 13 Jul 2004, Paul Dubois wrote:

> Two of the responses to your question, while correct, might have seemed
> mysterious to a beginner.
>
>     a[1:] - a[:-1]
>
> is actually shorthand for:
>
>     a[1:, :] - a[:-1, :]
>
> Or to be even more explicit:
>
>     n = 8
>     a[1:n, 0:n] - a[0:(n-1), 0:n]
>
> If you had wanted the difference in the second index, you have to use
> the more explicit forms.
From: Robert K. <rk...@uc...> - 2004-07-13 20:00:44
|
Mike Zingale wrote:
> Hi, I am trying to efficiently compute a difference of two 2-d flux
> arrays, as arises quite commonly in finite-difference/finite-volume
> methods. Ex:
>
> a = arange(64)
> a.shape = (8,8)
>
> I want to create a new array, b, of shape such that
>
> b[i,j] = a[i,j] - a[i-1,j]
>
> for 1 <= i < 8
>     0 <= i < 8

Try

    b = a[1:] - a[:-1]

--
Robert Kern
rk...@uc...

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter
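The slicing answer can be checked on the 8x8 example from the question. The sketch below assumes NumPy rather than numarray (an assumption, chosen only because its basic slicing follows the same rules); the names b and c are arbitrary. Note that the result has one row fewer than a, so row k of b holds a[k+1] - a[k].

```python
import numpy as np

a = np.arange(64).reshape(8, 8)

# Difference along the first index: b[k, j] = a[k+1, j] - a[k, j]
b = a[1:, :] - a[:-1, :]

# Difference along the second index needs the explicit two-axis form
c = a[:, 1:] - a[:, :-1]

print(b.shape, c.shape)
```

No Python-level loop runs over the elements here; both differences are computed on whole sub-arrays at once, which is what makes this faster than the explicit loops mentioned in the question.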
From: Tim H. <tim...@co...> - 2004-07-13 19:58:10
|
Mike Zingale wrote:

>Hi, I am trying to efficiently compute a difference of two 2-d flux
>arrays, as arises quite commonly in finite-difference/finite-volume
>methods. Ex:
>
>a = arange(64)
>a.shape = (8,8)
>
>I want to create a new array, b, of shape such that
>
>b[i,j] = a[i,j] - a[i-1,j]
>
>for 1 <= i < 8
>    0 <= i < 8

That's supposed to be a j in the second eq., right? If I understand you right, what you want is:

    b = a[1:] - a[:-1]

-tim

>I can obviously do this through loops, but this is quite slow. In IDL,
>which is often compared to numarray/python, this is simple to do with the
>shift() function, but I cannot find an efficient way to do it with
>numarray arrays.
>
>I tried defining a list
>
>i = range(8)
>im1[1:9] = im1[1:9] - 1
>
>and indexing with im1, but this does not work.
>
>Any suggestions? For large arrays, this simple differencing in python is
>very expensive when using loops.
>
>Thanks,
>
>Mike
>
>------------------------------------------------------------------------------
>Michael Zingale
>UCO/Lick Observatory
>UCSC
>Santa Cruz, CA 95064
>
>phone: (831) 459-5246
>fax: (831) 459-5265
>e-mail: zi...@uc...
>web: http://www.ucolick.org/~zingale
>
>``Don't worry head, the computer will do our thinking now'' -- Homer
From: Mike Z. <zi...@uc...> - 2004-07-13 19:53:19
|
Hi, I am trying to efficiently compute a difference of two 2-d flux arrays, as arises quite commonly in finite-difference/finite-volume methods. Ex:

    a = arange(64)
    a.shape = (8,8)

I want to create a new array, b, of shape such that

    b[i,j] = a[i,j] - a[i-1,j]

    for 1 <= i < 8
        0 <= i < 8

I can obviously do this through loops, but this is quite slow. In IDL, which is often compared to numarray/python, this is simple to do with the shift() function, but I cannot find an efficient way to do it with numarray arrays.

I tried defining a list

    i = range(8)
    im1[1:9] = im1[1:9] - 1

and indexing with im1, but this does not work.

Any suggestions? For large arrays, this simple differencing in python is very expensive when using loops.

Thanks,

Mike

------------------------------------------------------------------------------
Michael Zingale
UCO/Lick Observatory
UCSC
Santa Cruz, CA 95064

phone: (831) 459-5246
fax: (831) 459-5265
e-mail: zi...@uc...
web: http://www.ucolick.org/~zingale

``Don't worry head, the computer will do our thinking now'' -- Homer
From: Todd M. <jm...@st...> - 2004-07-13 17:41:56
|
Overview

There is a bug in numarray's Numeric compatible C-API. The bug has been latent for a long time, since numarray-0.3 was released roughly two years ago. It is serious because it results in wrong answers for certain extension functions fed a certain class of arrays.

What's affected

The bug affects numarray's add-on packages or third party extension functions which use the Numeric compatibility C-API. Generally, this means C-code that was either ported from Numeric or was written with both Numeric and numarray in mind. This includes the add-on packages numarray.linear_algebra, numarray.fft, numarray.random_array, and numarray.mlab. More recently, it includes the ports of core Numeric functions to numarray.numeric. Because numarray.ma uses numarray.numeric, the bug also affects numarray.ma. Finally, for numarray-1.0 this bug affects the functions numarray.argmin and numarray.argmax; these should be the only two functions in core numarray which are affected.

Detailed Bug Description

The bug is exposed by calling an extension function (written using the Numeric compatible C-API) with an array that has a non-zero _byteoffset attribute. Arrays with non-zero _byteoffset are typically created as a result of partially indexing higher dimensional arrays or slicing arrays. Partially indexing or slicing an array generally results in a sub-array, a view which often refers to an interior region of the original array buffer. Because numarray's PyArrayObject does not currently include its ->byteoffset in its ->data pointer as the Numeric compatibility API assumes it does, an extension function sees the base region of the original array rather than the region belonging to the sub-array.

Immediate User Workaround

A simple user level workaround for people that need to use the affected packages and functions today is one like the following:

    def make_safe_for_numeric_api(a):
        a = numarray.asarray(a)
        if a._byteoffset != 0:
            return a.copy()
        else:
            return a

The array inputs to an affected extension function need to be wrapped with calls to make_safe_for_numeric_api(). Since this is intrusive and a real fix should be released in the near future, this approach is not recommended.

Long Term Fix

The real fix for the bug appears to be to redefine the semantics of numarray's PyArrayObject ->data pointer to include ->byteoffset, altering the C-API. This should make most existing Numeric compatible extension functions work without modification or recompilation, but will necessitate the re-compilation of some extension functions written using the native numarray API approaches (the NA_* functions and macros). This recompilation will be required because key macros will change, most notably NA_OFFSETDATA.

This fix is not the only possible one, and other suggestions are welcome, but changing the semantics of ->data appears to be the best way to facilitate numarray/Numeric interoperability. By doing this fix, numarray operates more like Numeric so fewer changes need to be made in the future to perform ports of Numeric code to numarray.

Impact of Proposed Fix

Regrettably, the proposed fix will break binary compatibility for clients of the numarray-1.0 native C-API. So, extensions built using the numarray native C-API will need to be rebuilt for numarray-1.1. Extensions that have made direct access to PyArrayObject's ->data and require the original offsetless meaning will also need to change code for numarray-1.1. This is something we *really* wanted to avoid... it just isn't going to happen this time.

The Plan

The current plan is to fix the Numeric compatible API by changing the semantics of ->data and release numarray-1.1 relatively soon, hopefully within 2 weeks.

I'm sorry for any inconvenience this has caused numarray users.

Regards,
Todd Miller
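The mechanism of the bug, and why copying works around it, can be illustrated with a toy model in plain Python. Everything below is hypothetical and is not numarray code: ToyArray stands in for an array header, its byteoffset counts list elements rather than bytes, and offset_blind_sum plays the role of an extension function that assumes the data starts at the base of the buffer.

```python
class ToyArray:
    """Hypothetical stand-in for an array header: a shared buffer plus a view window."""

    def __init__(self, buffer, byteoffset=0, length=None):
        self.buffer = buffer            # storage shared with the parent array
        self.byteoffset = byteoffset    # where this view starts (in elements here)
        self.length = len(buffer) - byteoffset if length is None else length

    def __getitem__(self, s):
        """Slicing returns a view on the same buffer (slices with step 1 only)."""
        start, stop, _ = s.indices(self.length)
        return ToyArray(self.buffer, self.byteoffset + start, max(stop - start, 0))

    def copy(self):
        """A copy owns a fresh buffer, so its offset is zero again."""
        lo = self.byteoffset
        return ToyArray(self.buffer[lo:lo + self.length])


def offset_blind_sum(arr):
    """Mimics an extension that assumes the data starts at the buffer base."""
    return sum(arr.buffer[:arr.length])


def offset_aware_sum(arr):
    """Reads from buffer base + byteoffset, as the view intends."""
    lo = arr.byteoffset
    return sum(arr.buffer[lo:lo + arr.length])


def make_safe(arr):
    """Same idea as the workaround: copy whenever the offset is non-zero."""
    return arr.copy() if arr.byteoffset != 0 else arr


base = ToyArray([10, 20, 30, 40, 50])
view = base[2:]   # elements 30, 40, 50; view.byteoffset == 2
```

In this model offset_blind_sum(view) adds up 10, 20 and 30 from the base of the buffer instead of the view's own 30, 40 and 50, while offset_blind_sum(make_safe(view)) gives the intended result because the copy starts at offset zero.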
From: Francesc A. <fa...@py...> - 2004-07-13 09:12:23
|
PyTables is a hierarchical database package designed to efficiently manage very large amounts of data. PyTables is built on top of the HDF5 library and the numarray package. It features an object-oriented interface that, combined with natural naming and C-code generated from Pyrex sources, makes it a fast, yet extremely easy-to-use tool for interactively saving and retrieving different kinds of datasets. It also provides flexible indexed access on disk to anywhere in the data.

The primary purpose of this release is to incorporate updates related to the newly released numarray 1.0. I've taken the opportunity to backport some improvements added in PyTables 0.9 (in alpha stage) as well as to fix the known problems.

Improvements:

- The logic for computing the buffer sizes has been revamped. As a consequence, the performance of writing/reading tables with large record sizes has improved by a factor of ten or more, now exceeding 70 MB/s for writing and 130 MB/s for reading (using compression).

- The maximum record size for tables has been raised to 512 KB (before it was 8 KB, due to some internal limitations).

- Documentation has been improved in many minor details. As a result of a fix in the underlying documentation system (tbook), chapters start now at odd pages, instead of even. So those of you who want to print to double side probably will have better luck now when aligning pages ;). Another one is that HTML documentation has improved its look as well.

Bug Fixes:

- Indexing of Arrays with list or tuple flavors (#968131)

  When retrieving single elements from an array with 'List' or 'Tuple' flavors, an error occurred. This has been corrected and now you can retrieve fileh.root.array[2] without problems for 'List' or 'Tuple' flavored (E, VL)Arrays.

- Iterators on Arrays with list or tuple flavors fail (#968132)

  When using iterators with Array objects with 'List' or 'Tuple' flavors, an error occurred. This has been corrected.

- Last Index (-1) of Arrays doesn't work (#968149)

  When accessing the last element in an Array using the notation -1, an empty list (or tuple or array) was returned instead of the proper value. This happened in general with all negative indices. Fixed.

- Table.read(flavor="List") should return pure lists (#972534)

  However, it used to return a pointer to numarray.records.Record instances, as in:

  >>> fileh.root.table.read(1,2,flavor="List")
  [<numarray.records.Record instance at 0x4128352c>]
  >>> fileh.root.table.read(1,3,flavor="List")
  [<numarray.records.Record instance at 0x4128396c>,
   <numarray.records.Record instance at 0x41283a8c>]

  Now the next records are returned:

  >>> fileh.root.table.read(1,2, flavor="List")
  [(' ', 1, 1.0)]
  >>> fileh.root.table.read(1,3, flavor="List")
  [(' ', 1, 1.0), (' ', 2, 2.0)]

  In addition, when reading a single row of a table, a numarray.records.Record pointer was returned:

  >>> fileh.root.table[1]
  <numarray.records.Record instance at 0x4128398c>

  Now, it returns a tuple:

  >>> fileh.root.table[1]
  (' ', 1, 1.0)

  Which I think is more consistent, and more Pythonic.

- Copy of leaves fails... (#973370)

  Attempting to copy leaves (Table or Array with different flavors) on top of themselves caused an internal error in PyTables. This has been corrected by silently avoiding the copy and returning the original Leaf as a result.

Minor changes:

- When assigning a value to a non-existing field in a table row, now a KeyError is raised, instead of the AttributeError that was issued before. I think this is more consistent with the type of error.

- Tests have been improved so as to pass the whole suite when compiled in 64 bit mode on a Linux/PowerPC machine (namely a dual-G5 Powermac running a 64-bit, 2.6.4 Linux kernel and the preview YDL distribution for G5, with 64-bit GCC toolchain). Thanks to Ciro Cattuto for testing and reporting the modifications that were needed.

Where can PyTables be applied?
------------------------------

PyTables is not designed to work as a relational database competitor, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
----------------

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" seem to be quite a strange requirement for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose library and file format for storing scientific data made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux (Intel 32-bit) as the main development platform, but PyTables should be easy to compile/install on many other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors, with the MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit platforms, like an AMD Opteron running SuSe Linux Enterprise Server or PowerPC G5 with Linux 2.6.x in 64bit mode. It has also been tested on MacOSX platforms (10.2 but should also work on newer versions).

Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should also work with other flavors as well.

An example?
-----------

For online code examples, have a look at

http://pytables.sourceforge.net/html/tut/tutorial1-1.html

and, for newly introduced Variable Length Arrays:

http://pytables.sourceforge.net/html/tut/vlarray2.html

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
From: Francesc A. <fa...@py...> - 2004-07-13 09:06:29
|
On Tuesday 13 July 2004 10:28, Francesc Alted wrote:
> On Monday 12 July 2004 23:14, Perry Greenfield wrote:
> > What I'm wondering about is what a single element of a record array
> > should be. Returning a tuple has an undeniable simplicity to it.
>
> Yeah, this is why I'm strongly biased toward this possibility.
>
> > On the other hand, we've been using recarrays that allow naming the
> > various columns (which we refer to as "fields"). If one can refer
> > to fields of a recarray, shouldn't one be able to refer to a field
> > (by name) of one of its elements? Or are you proposing that basic
> > recarrays not have that sort of capability (something added by a
> > subclass)?
>
> Well, I'm not sure about that. But just in case most people would like to
> access records by field as well as by index, I would advocate for the
> possibility that the Record instances would behave as similarly as possible
> to a tuple (or dictionary?). That includes creating appropriate __str__()
> *and* __repr__() methods as well as a __getitem__() that supports both
> field names and indices. I'm not sure whether providing a __getattr__()
> method would be OK, but for the sake of simplicity and in order to have
> (preferably) only one way to do things, I would say no.

I've been thinking that one way to keep returning a tuple for a single
element of a RecArray, while still being able to retrieve a field by
name, is to play with RecArray.__getitem__ and let it support key names
in addition to indices. This is best seen with an example.

Right now, one can say:

>>> from numarray import records
>>> r=records.array([(1,"asds", 24.),(2,"pwdw", 48.)], "1i4,1a4,1f8")
>>> r._fields["c1"]
array([1, 2])
>>> r._fields["c1"][1]
2

What I propose is to be able to say:

>>> r["c1"]
array([1, 2])
>>> r["c1"][1]
2

Which would replace the notation:

>>> r[1]["c1"]
2

which was recently suggested. I.e. the suggestion is to treat RecArrays
as a collection of columns, as well as a collection of rows.
-- Francesc Alted |
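The proposal above (a __getitem__ that takes a field name for a column
and an integer for a row tuple) can be sketched in plain Python. This is
only an illustration under stated assumptions: `ColumnTable` is a
made-up toy class, not numarray's RecArray, and it stores columns as
Python lists rather than arrays.

```python
class ColumnTable:
    """Toy stand-in for a record array (hypothetical, not numarray's RecArray)."""

    def __init__(self, rows, names):
        self.names = list(names)
        # Store the data column-wise, one list per field name.
        self._columns = {
            name: [row[i] for row in rows] for i, name in enumerate(self.names)
        }
        self._nrows = len(rows)

    def __len__(self):
        return self._nrows

    def __getitem__(self, key):
        if isinstance(key, str):
            # Field name: return the whole column.
            return self._columns[key]
        if isinstance(key, slice):
            # Slice: return the selected rows as a list of tuples.
            return [self[i] for i in range(*key.indices(self._nrows))]
        # Integer index: return one row as a plain tuple.
        if not -self._nrows <= key < self._nrows:
            raise IndexError("row index out of range")
        return tuple(self._columns[name][key] for name in self.names)

    def tolist(self):
        """Return every row as a plain tuple."""
        return [self[i] for i in range(self._nrows)]


r = ColumnTable([(1, "asds", 24.0), (2, "pwdw", 48.0)], ["c1", "c2", "c3"])
column = r["c1"]      # the whole "c1" column
row = r[1]            # one row, as a tuple
```

Dispatching on the key type keeps a single access method, at the cost of
making r["c1"][1] and r[1][0] two spellings of the same value.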
From: Francesc A. <fa...@py...> - 2004-07-13 08:28:13
|
On Monday 12 July 2004 23:14, Perry Greenfield wrote:
> What I'm wondering about is what a single element of a record array
> should be. Returning a tuple has an undeniable simplicity to it.

Yeah, this is why I'm strongly biased toward this possibility.

> On the other hand, we've been using recarrays that allow naming the
> various columns (which we refer to as "fields"). If one can refer
> to fields of a recarray, shouldn't one be able to refer to a field
> (by name) of one of its elements? Or are you proposing that basic
> recarrays not have that sort of capability (something added by a
> subclass)?

Well, I'm not sure about that. But just in case most people would like
to access records by field as well as by index, I would advocate for the
possibility that the Record instances would behave as similarly as
possible to a tuple (or dictionary?). That includes creating appropriate
__str__() *and* __repr__() methods as well as a __getitem__() that
supports both field names and indices. I'm not sure whether providing a
__getattr__() method would be OK, but for the sake of simplicity and in
order to have (preferably) only one way to do things, I would say no.

Regards,

--
Francesc Alted |
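A tuple-like Record whose __getitem__() accepts both field names and
indices, as suggested above, might look like the following minimal
sketch. The class is hypothetical and is not the numarray.records.Record
implementation; it simply subclasses tuple, so __str__() and __repr__()
are inherited from tuple.

```python
class Record(tuple):
    """Hypothetical record: a tuple whose items can also be fetched by name."""

    def __new__(cls, values, names):
        self = super().__new__(cls, values)
        if len(names) != len(self):
            raise ValueError("need exactly one name per value")
        self._names = tuple(names)
        return self

    def __getitem__(self, key):
        if isinstance(key, str):
            # Translate a field name into its position.
            try:
                key = self._names.index(key)
            except ValueError:
                raise KeyError(key) from None
        # Integers and slices fall through to ordinary tuple indexing.
        return super().__getitem__(key)


rec = Record((3, 33.0, "c"), ("c1", "c2", "c3"))
by_index = rec[1]
by_name = rec["c2"]
```

There is deliberately no __getattr__() here, in line with the "only one
way to do things" preference stated above.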
From: Russell E O. <rowen@u.washington.edu> - 2004-07-12 23:08:07
|
At 5:14 PM -0400 2004-07-12, Perry Greenfield wrote:
>What I'm wondering about is what a single element of a record array
>should be. Returning a tuple has an undeniable simplicity to it.
>On the other hand, we've been using recarrays that allow naming the
>various columns (which we refer to as "fields"). If one can refer
>to fields of a recarray, shouldn't one be able to refer to a field
>(by name) of one of its elements? Or are you proposing that basic
>recarrays not have that sort of capability (something added by a
>subclass)?

In my opinion, a single item of a record array should be a RecordItem
object that is a dictionary that keeps items in field order. Thus:
- use the standard dictionary interface to deal with values by name
  (except the keys are always in the correct order).
- one can also get and set all the data at once as a tuple. This is NOT
  a standard dictionary interface, but is essential. Functions such as
  getvalues(), setvalues(dataTuple) should do it.

Adopting the full dictionary interface means one gets a standard, mature
and fairly complete set of features. ALSO a RecordItem object can then
be used wherever a dictionary object is needed.

I suspect it's also useful to have named field access:
RecordItem.fieldname but I am a bit reluctant to suggest so many
different ways of getting to the data.

I assume it will continue to be easy to get all data for a field by
naming the appropriate field. That's a really nice feature. It would be
even better if a masked array could be used, but I have no idea how hard
this would be.

Which brings up a side issue: any hope of integrating masked arrays into
numarray, such that they could be used wherever a numarray array could
be used? Areas where I particularly find myself needing them include
nd_image filtering and writing C extensions.

-- Russell

P.S. I submitted several feature requests and bug reports for records on
sourceforge months ago. I hope they'll not be overlooked during the
review process. |
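The dictionary-style RecordItem described above could be sketched
roughly as follows. This is an illustration only: the class is
hypothetical, getvalues() and setvalues() follow the names suggested in
the message, and the sketch relies on the assumption that it runs on
Python 3.7 or later, where a plain dict preserves insertion order.

```python
class RecordItem(dict):
    """Hypothetical record item: a dict whose keys stay in field order."""

    def __init__(self, names, values):
        names = tuple(names)
        values = tuple(values)
        if len(names) != len(values):
            raise ValueError("need exactly one value per field")
        super().__init__(zip(names, values))

    def getvalues(self):
        """Return all field values at once, in field order."""
        return tuple(self.values())

    def setvalues(self, data_tuple):
        """Replace all field values at once, keeping the field names."""
        if len(data_tuple) != len(self):
            raise ValueError("need exactly one value per field")
        # Assigning to an existing key does not change the key order.
        for name, value in zip(list(self), data_tuple):
            self[name] = value


item = RecordItem(("c1", "c2", "c3"), (3, 33.0, "c"))
item.setvalues((4, 44.0, "d"))
```

Since it subclasses dict, the sketch can be passed anywhere a dictionary
is expected; it does not, however, stop a caller from adding keys that
are not fields.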
From: Perry G. <pe...@st...> - 2004-07-12 21:14:29
|
Francesc Alted wrote:
>
> As Perry said not too long ago that the numarray crew would ask for
> suggestions for RecArray improvements, I'm going to suggest a couple.
>
> I find the .tolist() method quite inconvenient when applied to RecArray
> objects as it is now:
>
> >>> r[2:4]
> array(
> [(3, 33.0, 'c'),
> (4, 44.0, 'd')],
> formats=['1UInt8', '1Float32', '1a1'],
> shape=2,
> names=['c1', 'c2', 'c3'])
> >>> r[2:4].tolist()
> [<numarray.records.Record instance at 0x406a946c>,
> <numarray.records.Record instance at 0x406a912c>]
>
>
> The suggested behaviour would be:
>
> >>> r[2:4].tolist()
> [(3, 33.0, 'c'),(4, 44.0, 'd')]
>
> Another thing is that an element of a recarray would be returned as a
> tuple instead of a records.Record object:
>
> >>> r[2]
> <numarray.records.Record instance at 0x4064074c>
>
> The suggested behaviour would be:
>
> >>> r[2]
> (3, 33.0, 'c')
>
> I think the latter would be consistent with the convention that a
> __getitem__(int) of a NumArray object returns a Python type instead of a
> rank-0 array. In the same way, a __getitem__(int) of a RecArray should
> return a Python type (a tuple in this case).
>

These are good examples of where improvements are needed (we are also
looking at how best to handle multidimensional arrays and should have a
proposal this week).

What I'm wondering about is what a single element of a record array
should be. Returning a tuple has an undeniable simplicity to it. On the
other hand, we've been using recarrays that allow naming the various
columns (which we refer to as "fields"). If one can refer to fields of a
recarray, shouldn't one be able to refer to a field (by name) of one of
its elements? Or are you proposing that basic recarrays not have that
sort of capability (something added by a subclass)?

Perry |
From: Chris B. <Chr...@no...> - 2004-07-09 16:43:56
|
Bruce,

Thanks for your feedback.

Bruce Southey wrote:

> While I am not really following your thread, I just wanted to comment that the
> Python Cookbook (at least the printed version) has some ways to count lines in a
> file - assuming that the number of lines provides the size.

The number of lines does not necessarily provide the size. In the
general case, it doesn't at all. My whole goal here is the general case:
being able to read a bunch of numbers out of any format of text file.
This can be used as part of a parser for many file formats. If I was
shooting for just one format, this would be easier, but not general
purpose. Now that I have this, I can write a number of file format
parsers in Python with improved performance and easier syntax.

> Under Unix (but not
> windows),

I am aiming for a portable solution.

> Alternatively if sufficient memory is available, storing the file in memory
> (during the counting of elements) should always be faster than reading it a
> second time from the hard disk.

The primary reason to scan the file ahead of time to count the elements
is to avoid the memory cost of duplicate copies of the data. The other
reason is to make memory management easier, but since I've already
solved that problem, I'm done.

thanks,

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chr...@no... |
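For readers following the thread, the "scan ahead to count, then fill"
idea discussed above can be sketched in pure Python. This is not the
code from the thread (which is not shown here); the regular expression,
the function names, and the choice of `array.array` as the container are
all assumptions made for illustration.

```python
import re
from array import array

# Assumed token pattern: optionally signed decimals with an optional exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def count_numbers(lines):
    """First pass: count numeric tokens without storing them."""
    return sum(len(NUMBER.findall(line)) for line in lines)


def read_numbers(lines, count):
    """Second pass: fill a preallocated array of doubles."""
    data = array("d", [0.0]) * count
    i = 0
    for line in lines:
        for token in NUMBER.findall(line):
            data[i] = float(token)
            i += 1
    return data


# With a real file one would iterate over it once, call f.seek(0),
# and iterate again; a list of lines stands in for the file here.
lines = ["x = 1.5, y = -2e3", "# comment", "7 .25"]
n = count_numbers(lines)
values = read_numbers(lines, n)
```

Counting first means the result array is allocated once at its final
size, so no temporary list of Python floats is built alongside it.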
From: Thomas K. <tho...@ho...> - 2004-07-09 15:01:47
|
Hi

I'm trying to compile/install numpy on an RH9 machine. When doing so I
run into problems.

I give the command:

python setup.py install

and get a long answer, with this error at the end:

gcc -shared build/temp.linux-i686-2.2/lapack_litemodule.o
-L/usr/lib/atlas -llapack -lcblas -lf77blas -latlas -lg2c -o
build/lib.linux-i686-2.2/lapack_lite.so
/usr/bin/ld: cannot find -llapack
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

Does anyone know what I've done wrong? I've spent a lot of time on this
and really need help now...

Regards
Thomas |