From: Travis O. <oli...@ie...> - 2006-02-06 04:40:24
|
Jeff Whitaker wrote:
> Hi: I've successfully used the examples at
> http://www.scipy.org/Wiki/Cookbook/Pyrex_and_NumPy to access the data
> in a 'normal' numpy array, but have had no success adapting these
> examples to work with object arrays. I understand that the .data
> attribute holds pointers to the objects which actually contain the
> data in an object array, but how do you use those pointers to get the
> data in C/pyrex?

You have a pointer to a PyObject *object in the data. Thus, data should be recast to PyObject **. I don't know how to do that in PyRex. But, it's easy in C. In C, you will need to be concerned about reference counts. I don't know how pyrex handles this. |
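As an illustration of the "recast to PyObject **" idea without leaving Python, here is a minimal sketch using ctypes. It assumes CPython and a contiguous object array; the array contents are made up for the example, and this is not the Pyrex solution the thread was looking for.

```python
import ctypes
import numpy as np

# A contiguous object array: its buffer holds one PyObject* per element.
a = np.empty(3, dtype=object)
a[0], a[1], a[2] = "spam", 42, [1.5, 2.5]

# View the data buffer as an array of PyObject* (CPython-specific).
ptrs = a.ctypes.data_as(ctypes.POINTER(ctypes.py_object))

# Indexing the pointer hands back the referenced Python objects;
# ctypes takes care of the reference counts here.
items = [ptrs[i] for i in range(a.size)]
print(items)
```

In C the same buffer would be walked as a `PyObject **`, with the caller responsible for incrementing the reference count of anything it keeps.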
|
From: Travis O. <oli...@ie...> - 2006-02-06 04:21:29
|
Jan Simons @planet.nl wrote:
>Dear Travis,
>
>Thank you for all the work that you put into numerical Python. I believe that
>it makes Python applicable to serious numerical work.
>
>I just attempted to install the package on my Suse 10.0 system (which does
>have the (recent) python 2.4.1.

I think the problem with the rpm binary is that I built the binary rpm versions against a debug-version of Python. Most people must install from source on Linux, because this is the first time somebody has complained, and otherwise I'm sure others would have stumbled on this. I've been using a debug version of Python for a few months. I will probably switch back soon, which should make these issues less of a problem.

Try building from source directly.

Best,

-travis |
|
From: Jeff W. <js...@fa...> - 2006-02-05 15:56:38
|
Hi:

I've successfully used the examples at http://www.scipy.org/Wiki/Cookbook/Pyrex_and_NumPy to access the data in a 'normal' numpy array, but have had no success adapting these examples to work with object arrays. I understand that the .data attribute holds pointers to the objects which actually contain the data in an object array, but how do you use those pointers to get the data in C/pyrex?

-Jeff

--
Jeffrey S. Whitaker         Phone  : (303)497-6313
Meteorologist               FAX    : (303)497-6449
NOAA/OAR/PSD  R/PSD1        Email  : Jef...@no...
325 Broadway                Office : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328 Web    : http://tinyurl.com/5telg |
|
From: Francesc A. <fa...@ca...> - 2006-02-05 12:33:00
|
On Fri, 3 Feb 2006 at 22:05 +0100, N. Volbers wrote:
> Is there some way to retrieve the type object directly from the array (not using any existing row) using only the name of the item? I have checked the dtype attribute, but I could only get the character representation for the item types (e.g. 'f4').
Oops, I've just discovered a simpler way to get the type:
In [17]:dtype = numpy.dtype({'names': ['name', 'weight'],'formats': ['U30', 'f4']})
In [18]:a = numpy.array([(u'Bill', 71.2), (u'Fred', 94.3)], dtype=dtype)
In [20]:a.dtype.fields['name'][0].type
Out[20]:<type 'unicodescalar'>
In [21]:a.dtype.fields['weight'][0].type
Out[21]:<type 'float32scalar'>
For nested types, something like this should work:
ntype = a.dtype.fields['name'][0].fields['nested_field'][0].type
By the way, you will need numpy 0.9.5 (at least) for this to work.
Incidentally, Travis, what do you think about allowing:
In [30]:a.dtype.fields['weight']
Out[30]:dtype('<f4')
instead of current:
In [30]:a.dtype.fields['weight']
Out[30]:(dtype('<f4'), 120)
That way, users can access nested types more easily:
ntype = a.dtype.fields['name'].fields['nested_field'].type
However, you would lose the info about the byte offset/stride.
Alternatively, perhaps it would be interesting to give the dtype a
__getitem__ method so that we can do:
ntype = a.dtype['name']['nested_field'].type
In case of a dtype with no fields (i.e. a plain array), the
dtype[something] would raise a 'KeyError'. This approach would leave the
current dtype.fields attribute untouched. I can work on a patch for this,
if you find it useful.
Cheers,
-- 
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
|
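For readers on a current NumPy, the lookup discussed in this message can be sketched as follows. This assumes a recent NumPy release, where indexing a dtype by field name is available; the sample strings passed to the types are made up for illustration.

```python
import numpy as np

dtype = np.dtype({'names': ['name', 'weight'], 'formats': ['U30', 'f4']})
a = np.array([('Bill', 71.2), ('Fred', 94.3)], dtype=dtype)

# dtype.fields maps each field name to a (dtype, byte offset) tuple.
field_dtype, offset = a.dtype.fields['weight']
print(field_dtype, offset)

# Indexing the dtype by name gives the field dtype directly.
weight_type = a.dtype['weight'].type
name_type = a.dtype['name'].type

# Convert user-entered strings without touching any existing row.
print(weight_type("80.5"), name_type("Alice"))
```

Both routes avoid reading a row, which is what the original question asked for.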
|
From: Francesc A. <fa...@ca...> - 2006-02-04 09:33:19
|
On Fri, 3 Feb 2006 at 22:05 +0100, N. Volbers wrote:
> >>> dtype = numpy.dtype({'names': ['name', 'weight'],'formats': ['U30', 'f4']})
> >>> a = numpy.array([(u'Bill', 71.2), (u'Fred', 94.3)], dtype=dtype)
> Is there some way to retrieve the type object directly from the array (not using any existing row) using only the name of the item? I have checked the dtype attribute, but I could only get the character representation for the item types (e.g. 'f4').
To retrieve the type directly from the array, you can use a function
like this:
def get_field_type_flat(descr, fname):
    """Get the type associated with a field named `fname`.

    If the field name is not found, None is returned.
    """
    for item in descr:
        if fname == item[0]:
            # Build a dtype from the whole format string (e.g. '<f4')
            return numpy.dtype(item[1]).type
    return None
That one is very simple and fast. However, it can't deal with nested
types. The next one is more general:
def get_field_type_nested(descr, fname):
    """Get the type associated with a field named `fname`.

    This function looks recursively in possible nested descriptions.
    If the field is not found anywhere in the hierarchy, None is
    returned. If there are two names that are equal in the hierarchy,
    the first one (from top to bottom and from left to right)
    found is returned.
    """
    for item in descr:
        fdescr = item[1]
        if fname == item[0]:
            return numpy.dtype(fdescr).type
        if isinstance(fdescr, list):
            # Recurse into the nested description; keep scanning if not found
            found = get_field_type_nested(fdescr, fname)
            if found is not None:
                return found
    return None
The drawback here is that you cannot distinguish between fields that
share a name but live at different levels of the hierarchy. For
example, selecting 'name' in a type structure like this:
+-----------+
|name |x |
| +-----+
| |name |
+-----+-----+
is ambiguous (in the algorithm implemented above, the top level 'name'
would be selected). Addressing this problem would require defining an
unambiguous way to specify nested fields.
Anyway, I'm attaching a file with several examples on these functions.
HTH,
-- 
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
|
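One way to address the ambiguity mentioned in this message is to name a nested field by its full path. The sketch below assumes a recent NumPy, where a dtype can be indexed by field name; `field_type_by_path` is a hypothetical helper written for this illustration, not something from the attached file.

```python
import numpy as np

def field_type_by_path(dtype, path):
    """Follow a sequence of field names through nested dtypes.

    Hypothetical helper: raises KeyError if any name along the path
    does not exist.
    """
    for name in path:
        dtype = dtype[name]
    return dtype.type

# Two fields called 'name', at different levels of the hierarchy.
nested = np.dtype([('name', 'U30'), ('x', [('name', 'f4')])])

print(field_type_by_path(nested, ['name']))       # top-level 'name'
print(field_type_by_path(nested, ['x', 'name']))  # 'name' nested under 'x'
```

Because the caller spells out every level, two fields with the same name no longer collide.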
|
From: Tim H. <tim...@co...> - 2006-02-04 03:24:11
|
Hi I recently installed the Visual Studio .NET 2003 (AKA VC7) compiler
and I took a stab at compiling numpy. I've tried previously with the
free, toolkit version of VC7 with little success, but I was hoping this
would be a piece of cake. No joy!
It's quite possible that my compiler setup is gummed up by the previous
existence of the toolkit compiler. A bunch of paths were set to this and
that and there may be some residue that is messing things up. However I
successfully compiled numarray 1.5 and a couple of my own extensions so
things *seem* OK. So before I go hunting, I thought I'd ask and see if
there were some known issues with compiling numpy 0.9.4 with VC7.
The symptoms I'm seeing are, first, that it can't run configure. It
can't find python24.lib. An abbreviated traceback is shown at the
bottom. I kludged my way past this by replacing line 33 of
numpy/core/setup.py with the two lines:
python_lib = sysconfig.EXEC_PREFIX + '/libs'
result = config_cmd.try_run(tc,include_dirs=[python_include],library_dirs=[python_lib])
That got me a little farther, but I quickly ran into trouble compiling
multiarray module:
C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -Ibuild\src\numpy\core\src -Inumpy\core\include -Ibuild\src\numpy\core -Inumpy\core\src -Inumpy\lib\..\core\include -IC:\Python24\include -IC:\Python24\PC /Tcnumpy\core\src\multiarraymodule.c /Fobuild\temp.win32-2.4\Release\numpy\core\src\multiarraymodule.obj
multiarraymodule.c
build\src\numpy\core\src\arraytypes.inc(5305) : error C2036: 'void *' : unknown size
build\src\numpy\core\src\arraytypes.inc(5885) : error C2036: 'void *' : unknown size
build\src\numpy\core\src\arraytypes.inc(6465) : error C2036: 'void *' : unknown size
...a bunch of warnings...
c:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\numpy\core\src\arrayobject.c(4049) : error C2036: 'void *' : unknown size
...some more warnings...
error: Command ""C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe" /c /nologo /Ox /MD /W3 /GX /DNDEBUG -Ibuild\src\numpy\core\src -Inumpy\core\include -Ibuild\src\numpy\core -Inumpy\core\src -Inumpy\lib\..\core\include -IC:\Python24\include -IC:\Python24\PC /Tcnumpy\core\src\multiarraymodule.c /Fobuild\temp.win32-2.4\Release\numpy\core\src\multiarraymodule.obj" failed with exit status 2
Anyway, like I said, my compiler could be broken, but if there is a
known issue with VC7 or this rings a bell with anyone please let me
know. I certainly wouldn't mind a hint.
-tim
Traceback from configure failure:
-----------------------------------------------------------------------------------------------------------------------
C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\Python24\include -Inumpy\core\src -Inumpy\lib\..\core\include -IC:\Python24\include -IC:\Python24\PC /Tc_configtest.c /Fo_configtest.obj
C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\link.exe /nologo /INCREMENTAL:NO _configtest.obj /OUT:_configtest.exe
LINK : fatal error LNK1104: cannot open file 'python24.lib'
failure.
removing: _configtest.c _configtest.obj
Traceback (most recent call last):
  File "C:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\setup.py", line 73, in ?
    setup_package()
  File "C:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\setup.py", line 66, in setup_package
    setup( **config.todict() )
  File "C:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\numpy\distutils\core.py", line 93, in setup
    return old_setup(**new_attr)
  File "C:\Python24\lib\distutils\core.py", line 149, in setup
    dist.run_commands()
  File "C:\Python24\lib\distutils\dist.py", line 946, in run_commands
    self.run_command(cmd)
  File "C:\Python24\lib\distutils\dist.py", line 966, in run_command
    cmd_obj.run()
  File "C:\Python24\lib\distutils\command\build.py", line 112, in run
    self.run_command(cmd_name)
  File "C:\Python24\lib\distutils\cmd.py", line 333, in run_command
    self.distribution.run_command(command)
  File "C:\Python24\lib\distutils\dist.py", line 966, in run_command
    cmd_obj.run()
  File "C:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\numpy\distutils\command\build_src.py", line 86, in run
    self.build_sources()
  File "C:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\numpy\distutils\command\build_src.py", line 99, in build_sources
    self.build_extension_sources(ext)
  File "C:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\numpy\distutils\command\build_src.py", line 143, in build_extension_sources
    sources = self.generate_sources(sources, ext)
  File "C:\Documents and Settings\End-user\Desktop\numpy\numpy-0.9.4\numpy\distutils\command\build_src.py", line 199, in generate_sources
    source = func(extension, build_dir)
  File "numpy\core\setup.py", line 35, in generate_config_h
    raise "ERROR: Failed to test configuration"
ERROR: Failed to test configuration
-----------------------------------------------------------------------------------------------------------------------
|
|
From: Jeff W. <js...@fa...> - 2006-02-04 03:05:31
|
Travis Oliphant wrote:
> Jeff Whitaker wrote:
>
>> Hi:
>>
>> I've noticed that code like this is really slow in numpy (0.9.4):
>>
>> import numpy as NP
>> a = NP.ones(10000,'d')
>> a = [2.*a1 for a1 in a]
>>
>> the last line takes 0.17 seconds on my G5, while for Numeric and
>> numarray it takes only 0.01. Anyone know the reason for this?
>
> We could actually change this right now, before the introduction of
> scalar math by using the standard float table for the corresponding
> array scalars. The only reason I didn't do this initially was that I
> wanted consistency in behavior for "division-by-zero" between arrays
> and scalars.
>
> Using the Python float math you will get divide-by-zero errors whereas
> you don't (unless you ask for them), with numpy arrays.
>
> Thus, current scalars are treated as 0-d arrays in the internals and
> go through the entire ufunc machinery for every operation.
>
> Now, the real question is why are you doing this? Using arrays in
> this way defeats their purpose :-)
>
> What is wrong with 2*a? Now, of course there will be situations that
> require this.
>
> -Travis

Travis: Of course I know this is a dumb thing to do - but sometimes it does happen that a function that expects a list actually gets a rank-1 array. The workaround in that case is to just pass it a.tolist() instead of a.

-Jeff

--
Jeffrey S. Whitaker         Phone  : (303)497-6313
Meteorologist               FAX    : (303)497-6449
NOAA/OAR/PSD  R/PSD1        Email  : Jef...@no...
325 Broadway                Office : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328 Web    : http://tinyurl.com/5telg |
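The two approaches mentioned here, the vectorized `2*a` and the `a.tolist()` workaround, can be compared side by side. This is a minimal sketch assuming a current NumPy; the variable names are illustrative only.

```python
import numpy as np

a = np.ones(10000, dtype="d")

# Vectorized: a single ufunc call over the whole array.
doubled_array = 2.0 * a

# List-based: convert once, then loop over plain Python floats
# instead of numpy array scalars.
as_list = a.tolist()
doubled_list = [2.0 * x for x in as_list]

print(type(as_list[0]))
print(np.array_equal(doubled_array, doubled_list))
```

`tolist()` hands the loop plain Python floats, so the per-element multiplication never goes through the ufunc machinery.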
|
From: Sasha <nd...@ma...> - 2006-02-03 23:10:43
|
This is so because scalar math is very slow in numpy. This will improve with the introduction of the scalarmath module.

> python -m timeit -s "from numpy import float_; x = float_(2)" "2.*x"
100000 loops, best of 3: 15.8 usec per loop
> python -m timeit -s "x = 2." "2.*x"
1000000 loops, best of 3: 0.261 usec per loop

On 2/3/06, Jeff Whitaker <js...@fa...> wrote:
> Hi:
>
> I've noticed that code like this is really slow in numpy (0.9.4):
>
> import numpy as NP
> a = NP.ones(10000,'d')
> a = [2.*a1 for a1 in a]
>
> the last line takes 0.17 seconds on my G5, while for Numeric and
> numarray it takes only 0.01. Anyone know the reason for this?
>
> -Jeff |
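The same measurement can be reproduced from inside Python with the standard `timeit` module. This sketch assumes a current NumPy and uses `np.float64` as the array-scalar type; absolute numbers will differ from the 2006 figures quoted above and depend on the machine.

```python
import timeit
import numpy as np

number = 100_000

# Multiply a numpy array scalar by a Python float.
t_numpy = timeit.timeit("2.0 * x", globals={"x": np.float64(2)}, number=number)

# Multiply a plain Python float by a Python float.
t_python = timeit.timeit("2.0 * x", globals={"x": 2.0}, number=number)

print("numpy scalar : %.3f usec per loop" % (1e6 * t_numpy / number))
print("python float : %.3f usec per loop" % (1e6 * t_python / number))
```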
|
From: Sasha <nd...@ma...> - 2006-02-03 23:02:52
|
On 2/3/06, Travis Oliphant <oli...@ee...> wrote:
> I'm very concerned about the speed of PyArray_NewFromDescr. So, I
> don't really want to make changes that will cause it to be slower for
> all cases unless absolutely essential.

It is easy to change the code so that it only affects the branch in PyArray_NewFromDescr that currently raises an exception: the one where strides are provided but no buffer. There is no need to call _array_buffer_size if data is provided.

> Could you give more examples of how you will be using these zero-stride
> arrays? What problem are they actually solving?

Currently when I need to represent a statistic that is constant across population, I use scalars. In many cases this works because thanks to broadcasting rules a scalar behaves almost like a vector with equal elements. With the changes introduced in numpy, generic code that works on both scalars and vectors is becoming increasingly easier to write, but there are some cases where scalars cannot replace a vector with equal elements. For example, if you want to combine data for two populations and the data comes as two scalars, you need to somehow know the size of each population to add to the size of the result. A zero-stride array would solve this problem: it takes little memory, but unlike a scalar it knows its size.

Another use that I was contemplating was to represent a per-row or per-column mask in ma. It is often the case that in a rectangular matrix data may be missing only for an entire row. It is tempting to use a rank-1 mask with an element for each row to represent this case. That will work fine, but then you would not be able to use vectors to specify either a per-row or a per-column mask. With a zero-stride array, you can use strides=(1,0) or strides=(0,1) and have the same memory use as with a vector. |
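Both use cases can be sketched with `np.broadcast_to`, which in current NumPy returns read-only views whose broadcast dimensions have stride 0. This is an illustration under that assumption, not the patch discussed in the thread; the values and mask contents are made up.

```python
import numpy as np

# A statistic that is constant across a population of 5: one stored
# value, but the array still reports its length.
constant = np.broadcast_to(np.float64(3.5), (5,))
print(constant.shape, constant.strides)

# Combining two populations needs their sizes, which a bare scalar lacks.
other = np.broadcast_to(np.float64(1.0), (3,))
combined = np.concatenate([constant, other])
print(combined.size)

# Per-row and per-column masks for a 3x4 matrix, each backed by a vector.
row_mask = np.array([False, True, False])
col_mask = np.array([False, False, True, False])
per_row = np.broadcast_to(row_mask[:, np.newaxis], (3, 4))
per_col = np.broadcast_to(col_mask[np.newaxis, :], (3, 4))
print(per_row.strides, per_col.strides)
```

The two masks have the same shape and are told apart only by which axis carries the zero stride.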
|
From: Jeff W. <js...@fa...> - 2006-02-03 22:34:18
|
Hi:

I've noticed that code like this is really slow in numpy (0.9.4):

import numpy as NP
a = NP.ones(10000,'d')
a = [2.*a1 for a1 in a]

the last line takes 0.17 seconds on my G5, while for Numeric and numarray it takes only 0.01. Anyone know the reason for this?

-Jeff

--
Jeffrey S. Whitaker         Phone  : (303)497-6313
Meteorologist               FAX    : (303)497-6449
NOAA/OAR/PSD  R/PSD1        Email  : Jef...@no...
325 Broadway                Office : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328 Web    : http://tinyurl.com/5telg |
|
From: Sasha <nd...@ma...> - 2006-02-03 22:04:11
|
On 2/2/06, Travis Oliphant <oli...@ie...> wrote:
> ...
> Here's the issue. With records it is quite easy to generate strides
> that are not integer multiples of the data. For example, a record
> [('field1', 'f8'),('field2', 'i2')] data-type would have floating point
> data separated by 10 bytes. When you get a view of field1 (by getting
> that attribute) you would get such "misaligned" data.
>
> Look at the following:
>
> temp = array([(1.8,2),(1.7,3)],dtype='f8,i2')
> temp['f1'].strides
> (10,)
>
> How would you represent that in the element-based strides report?
You are right. I cannot think of anything better than just byte-based
strides in this case. Maybe we could add a restriction
abs(strides[i]) >= itemsize? This will probably catch some of the more
common mistakes that are due to using number of elements instead
of number of bytes.
|
|
From: Travis O. <oli...@ee...> - 2006-02-03 21:53:40
|
Sasha wrote:
>Attached patch allows numpy create memory-saving zero-stride arrays.

A good first cut. I'm very concerned about the speed of PyArray_NewFromDescr. So, I don't really want to make changes that will cause it to be slower for all cases unless absolutely essential.

Could you give more examples of how you will be using these zero-stride arrays? What problem are they actually solving?

I would also like to get more opinions about Sasha's proposal for zero-stride arrays.

-Travis |
|
From: Sasha <nd...@ma...> - 2006-02-03 21:42:52
|
On 2/2/06, Travis Oliphant <oli...@ie...> wrote:
> Sasha wrote:
>
> >Sure. I've started working on a "proof of concept" patch and will post it soon.
>
> Great.

The attached patch allows numpy to create memory-saving zero-stride arrays. Here is a sample session:

>>> from numpy import *
>>> x = ndarray([5], strides=0)
>>> x
array([12998768, 12998768, 12998768, 12998768, 12998768])
>>> x[0] = 0
>>> x
array([0, 0, 0, 0, 0])
>>> x.strides = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: strides is not compatible with available memory
>>> x.strides
(0,)
>>> x.data
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: cannot get single-segment buffer for discontiguous array
>>> exp(x)
array([ 1., 1., 1., 1., 1.])
# Only single-element buffer is required for zero-stride array:
>>> y = ones(1)
>>> z = ndarray([10], strides=0, buffer=y)
>>> z
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

I probably missed some places where buffer size is computed as a product of dimensions, but it should not be hard to review the code for those if we agree that having zero-stride arrays is a good idea. Note that I did not attempt to change any behaviors; the only change is that zero-stride arrays do not use more memory than they need. |
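In current NumPy, a comparable zero-stride view over a single-element buffer can be built with `numpy.lib.stride_tricks.as_strided`. The sketch below swaps in that function for the `ndarray(..., strides=0, buffer=y)` call from the session above; it is an illustration of the idea, not a reproduction of the patch.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# One element of storage...
y = np.ones(1)

# ...viewed as ten elements by never advancing through memory.
z = as_strided(y, shape=(10,), strides=(0,))
print(z.shape, z.strides)

# Every element of z aliases y[0], so one write shows up everywhere.
y[0] = 7.0
print(z)
```

Since all elements alias the same memory, writing through such a view is easy to get wrong; treating it as read-only is the safer habit.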
|
From: N. V. <mit...@we...> - 2006-02-03 21:05:43
|
Hello everyone!
I think I have finally understood the 'void array-scalar object', but now I need some help with the following. Assume I have an array, e.g.
>>> dtype = numpy.dtype({'names': ['name', 'weight'],'formats': ['U30', 'f4']})
>>> a = numpy.array([(u'Bill', 71.2), (u'Fred', 94.3)], dtype=dtype)
and this array is displayed in a graphical list. When the user modifies a value in the GUI, the value, which is a string, needs to be converted to the appropriate type, which in this example might either be a unicode string for the 'name' _or_ a float for the 'weight'.
If the row already exists, I can get the type object easily:
>>> my_type = type(ds.array['weight'][0])
<type 'float32_arrtype'>
and using this type object, I can convert the string
>>> value = my_type(user_value)
Is there some way to retrieve the type object directly from the array (not using any existing row) using only the name of the item? I have checked the dtype attribute, but I could only get the character representation for the item types (e.g. 'f4').
Any help would be appreciated,
Niklas Volbers.
|
|
From: Matthew B. <mat...@gm...> - 2006-02-03 18:35:39
|
Hi,

This is just to flag up a problem I ran into for matlab, which is that Pentium 3s and 4s have very very slow standard math performance with NaN values - for example adding to an NaN value on my machine is about 22 times slower than adding to a non-NaN value. This can become a very big problem with matrix multiplication if there are a significant number of NaNs.

I explained the problem here, for matlab and the software I have been working with:

http://www.mrc-cbu.cam.ac.uk/Imaging/Common/spm_intel_tune.shtml

To illustrate, I've attached a timing script, running on current svn numpy linked with a standard P4 optimized ATLAS library. It (dot) multiplies a 200x200 array of ones by a) another 200x200 array of ones and b) a 200x200 array of NaNs:

ones * ones: 0.017460
ones * NaNs: 2.323742
proportion: 133.090452

Happily, for the Pentium 4, you can solve the problem by forcing the chip to do floating point math with the SSE instructions, which do not have this NaN penalty. So, the solution was only to recompile the ATLAS libraries with extra gcc flags forcing the use of SSE math (see the page above) - or use the Intel Math Kernel libraries, which appear to have already used this trick. Here's output from numpy linked to the recompiled ATLAS libraries:

ones * ones: 0.026638
ones * NaNs: 0.023987
proportion: 0.900473

I wonder if it would be worth considering distributing the recompiled libraries by default in any binary releases? Or include a test like this one in the benchmarks to warn users about this problem?

Best,

Matthew |
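The attached script is not part of this archive. A minimal sketch of a comparable benchmark, assuming a current NumPy and the standard `timeit` module, could look like this; the helper name `time_dot` and the repeat counts are choices made for the illustration, and the resulting timings depend entirely on the CPU and the BLAS library in use.

```python
import timeit
import numpy as np

n = 200
ones = np.ones((n, n))
nans = np.full((n, n), np.nan)

def time_dot(a, b, number=20, repeat=3):
    """Best-of-`repeat` time in seconds for `number` calls of np.dot(a, b)."""
    return min(timeit.repeat(lambda: np.dot(a, b), number=number, repeat=repeat))

t_ones = time_dot(ones, ones)
t_nans = time_dot(ones, nans)

print("ones * ones: %f" % t_ones)
print("ones * NaNs: %f" % t_nans)
print("proportion: %f" % (t_nans / t_ones))
```

A proportion far above 1 would point to the NaN slowdown described in the message.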
|
From: Sasha <nd...@ma...> - 2006-02-03 18:09:10
|
On Feb 2, 2006, at 10:02 PM, Travis Oliphant wrote:
>> Please let me know if you plan to change PyArray_CheckStrides so that
>> we don't duplicate effort.
>
> I won't do anything with it in the near future.

The attached patch deals with negative strides and prohibits zero strides. I think we can agree that this is the right behavior while zero-stride semantics are being discussed. Since I am touching the C-API, I would like you to take a look before I commit. Also I am not sure "self->data - new->data" is always the right way to compute the offset in array_strides_set.

-- sasha |
|
From: Francesc A. <fa...@ca...> - 2006-02-03 16:42:50
|
On Friday 03 February 2006 15:07, N. Volbers wrote:
> I tried to install the beta and discovered that it is not possible to
> build w/o numarray. So is numpy just optional and numarray a requirement
> or will it be possible to build pytables only with numpy support ?

No, numarray is still a *requirement* for compiling PyTables; NumPy and Numeric are *not needed* at all for compilation. However, if they are present (I mean, at run-time, not at compile-time), they can be used both to provide input data to be written to disk and to get output data read from disk.

You can even have different objects with different flavors (currently "numarray", "numpy", "numeric" or "python") in the same PyTables file, so that you can retrieve different objects (numarray, Numpy, Numeric or pure Python) in the same session depending on its flavor (but of course, this is not for the faint-hearted ;-).

It is the magic of the array interface:

http://numeric.scipy.org/array_interface.html

that allows doing this in a very efficient manner.

Cheers,

-- 
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-" |
|
From: Travis O. <oli...@ie...> - 2006-02-03 03:02:22
|
Sasha wrote:
>On 2/2/06, Travis Oliphant <oli...@ie...> wrote:
>
>
>Please let me know if you plan to change PyArray_CheckStrides so that
>we don't duplicate effort.
>
>
I won't do anything with it in the near future.
>Can you suggest a use-case? I cannot think of anything that cannot be
>handled using a record-array view of the buffer.
>
>
Here's the issue. With records it is quite easy to generate strides
that are not integer multiples of the data. For example, a record
[('field1', 'f8'),('field2', 'i2')] data-type would have floating point
data separated by 10 bytes. When you get a view of field1 (by getting
that attribute) you would get such "misaligned" data.
Look at the following:
temp = array([(1.8,2),(1.7,3)],dtype='f8,i2')
temp['f1'].strides
(10,)
How would you represent that in the element-based strides report?
So, fractional strides are actually fundamental to the ability to have
record arrays.
>The problem is that many people (including myself) think that they
>know what strides are when they come to numpy because they used
>strides in other libraries (e.g. BLAS).
>
>
>Most people expect element-based strides. A footnote in your book
>"Our definition of stride here is an element-based stride, while the
>strides attribute returns a byte-based stride." also suggests that
>element-based strides are more natural.
>
>
It's easier to explain striding when you have contiguous chunks of
memory of the same data-type, but record-arrays change that and require
byte-based striding.
>Assuming strides attribute is not used except for testing, would you
>object to renaming current byte-based strides to "byte_strides" and
>implementing element-based "strides"?
>
>
I wouldn't have a problem with that, necessarily (though there is
already an __array_strides__ attribute that is byte-based for the array
interface --- except it returns None for C-style contiguous so we really
don't need another attribute). The remaining issue is how will
fractional strides be represented?
-Travis
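The 10-byte stride from this message can be checked on a current NumPy. One assumption to note: current releases name the automatic fields `f0`, `f1`, so the floating point field that the message calls `f1` is `f0` in this sketch.

```python
import numpy as np

temp = np.array([(1.8, 2), (1.7, 3)], dtype="f8,i2")

# Each record packs an 8-byte float and a 2-byte integer.
print(temp.dtype.itemsize)

# A view of the float field steps one whole record at a time.
floats = temp["f0"]
print(floats.strides, floats.itemsize)

# The stride is not a whole number of float64 elements.
print(floats.strides[0] % floats.itemsize)
```

A non-zero remainder is exactly the "fractional" stride that an element-based report could not express.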
|
|
From: Travis O. <oli...@ie...> - 2006-02-03 02:50:58
|
Sasha wrote:
>Sure. I've started working on a "proof of concept" patch and will post it soon.

Great.

>>I'm concerned that your proposal has too many potential pitfalls. At
>>least you haven't addressed them sufficiently. My current inclination
>>is to simply disallow setting the strides attribute now that the
>>misaligned segments of code have been tested.
>
>That would be an unfortunate result of my post :-( I would suggest
>just to disallow zero strides in PyArray_CheckStrides until I can
>convince you that they are not that dangerous.

Inclinations to... and actual plans to... are quite different things :-) So, I'm waiting and seeing. You may be on to something. Let's see what others think and what you really have in mind.

-Travis |
|
From: Sasha <nd...@ma...> - 2006-02-03 02:16:25
|
On 2/2/06, Travis Oliphant <oli...@ie...> wrote:
> The changes you describe, however, require serious thought with C-level
> explanations because you will be changing some fundamental assumptions
> that are made throughout the code.

I agree, but I would like to discuss this at the conceptual level first and maybe hear from people not intimately familiar with the C code about what they would expect from a zero stride.

> For example, currently there is no way you can construct new memory for
> an array and have different strides assigned (that's why strides is
> ignored if no buffer is given). You would have to change the behavior
> of the C-level function PyArray_NewFromDescr. You need to propose how
> exactly you would change that.

Sure. I've started working on a "proof of concept" patch and will post it soon.

> Checking for strides that won't cause later segfaults can be tricky
> especially if you start allowing buffer-sizes to be different than array
> dimensions. How do you propose to ensure that you won't walk outside
> of allocated memory when somebody changes the strides later?

I think PyArray_CheckStrides would catch that, but I will have to test that once I have some code ready.

> I'm concerned that your proposal has too many potential pitfalls. At
> least you haven't addressed them sufficiently. My current inclination
> is to simply disallow setting the strides attribute now that the
> misaligned segments of code have been tested.

That would be an unfortunate result of my post :-( I would suggest just to disallow zero strides in PyArray_CheckStrides until I can convince you that they are not that dangerous. |
|
From: Sasha <nd...@ma...> - 2006-02-03 01:59:11
|
On 2/2/06, Travis Oliphant <oli...@ie...> wrote:
> Of course strides have always been there, they've just never been
> visible from Python.

I know that strides were always part of C-API, but I don't know if they were exposed to python in numarray. If they were, there is probably some history of use. Can someone confirm or deny that?

> Allowing the user to set the strides may not be a good idea. It was
> done largely so that the code that deals with misaligned data could be
> tested.

Presently settable strides attribute does not feel like an "experts only" feature. (You've documented it in your book!)

> However, it also allows you a lot of flexibility for
> interacting with arbitrary data-buffers that might be useful, so I'm
> inclined to allow it if the possible problems can be fixed.

This is a great feature and I can see it being used to explain ndarrays to novices. I don't think it should be regarded as "for experts only."

> > Looks like a bug. PyArray_CheckStrides only checks for one end of the
> > buffer.
>
> Right. PyArray_CheckStrides needs to be better or we can't allow
> negative strides.

Please let me know if you plan to change PyArray_CheckStrides so that we don't duplicate effort.

> > 3. "Fractional" strides:
> > I call "fractional" strides that are not a multiple of "itemsize".
>
> In dealing with an arbitrary data-buffer, I could see this as being
> useful, so I'm not sure if disallowing it is a good idea.

Can you suggest a use-case? I cannot think of anything that cannot be handled using a record-array view of the buffer.

> Again,
> setting strides is not something that should be done by the average user
> so I'm not as concerned about "forgetting" the units strides are in.
> If a user is going to be setting strides you have to assume they are
> being careful.

The problem is that many people (including myself) think that they know what strides are when they come to numpy because they used strides in other libraries (e.g. BLAS). Most people expect element-based strides. A footnote in your book "Our definition of stride here is an element-based stride, while the strides attribute returns a byte-based stride." also suggests that element-based strides are more natural.

> A separate attribute called steps that uses element-sizes instead of
> byte-sizes is a possible idea.

Assuming strides attribute is not used except for testing, would you object to renaming current byte-based strides to "byte_strides" and implementing element-based "strides"? I would even suggest "_byte_strides" as a clearly "don't use it unless you know what you are doing" name. |
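To make the byte-based versus element-based distinction concrete, here is a minimal sketch using present-day NumPy. The helper name `element_strides` is hypothetical (it is not a NumPy function, and it is not the "steps" attribute floated above); it simply divides the byte strides by the item size and refuses "fractional" strides.

```python
import numpy as np


def element_strides(arr):
    """Hypothetical helper: convert byte strides to element strides.

    Raises ValueError for "fractional" strides, i.e. byte strides that
    are not a whole multiple of the item size.
    """
    itemsize = arr.itemsize
    if any(s % itemsize for s in arr.strides):
        raise ValueError("stride is not a multiple of itemsize")
    return tuple(s // itemsize for s in arr.strides)


# A C-contiguous 3x4 array of 4-byte integers.
a = np.arange(12, dtype=np.int32).reshape(3, 4)
byte_strides = a.strides           # counted in bytes
elem_strides = element_strides(a)  # counted in elements, BLAS-style

# A reversed row has a negative stride in either unit.
reversed_row_strides = element_strides(a[0, ::-1])

# A field of a packed record array is one place where fractional strides
# appear: the record is 5 bytes wide, the field items are 4 bytes wide.
packed = np.dtype([("flag", "u1"), ("value", "<i4")])
rec = np.zeros(3, dtype=packed)
values = rec["value"]              # byte stride 5, itemsize 4
```

Calling `element_strides(values)` raises `ValueError`, which is the situation the "fractional strides" item describes: the field view is perfectly usable, but its stride has no whole-element equivalent.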
|
From: Travis O. <oli...@ie...> - 2006-02-03 01:38:47
|
Sasha wrote:
> A rank-1 array with strides=0 behaves almost like a scalar, in fact
> scalar arithmetics is currently implemented by setting stride to 0 in
> generic umath loops. Like scalar, rank-1 array with stride=0 only
> needs a buffer of size 1*itemsize, but currently numpy does not allow
> creation of rank-1 arrays with buffer smaller than size*itemsize:

As you noted, broadcasting is actually done by setting strides equal to 0 in the affected dimensions. The changes you describe, however, require serious thought with C-level explanations because you will be changing some fundamental assumptions that are made throughout the code.

For example, currently there is no way you can construct new memory for an array and have different strides assigned (that's why strides is ignored if no buffer is given). You would have to change the behavior of the C-level function PyArray_NewFromDescr. You need to propose how exactly you would change that.

Checking for strides that won't cause later segfaults can be tricky, especially if you start allowing buffer-sizes to be different than array dimensions. How do you propose to ensure that you won't walk outside of allocated memory when somebody changes the strides later?

I'm concerned that your proposal has too many potential pitfalls. At least you haven't addressed them sufficiently. My current inclination is to simply disallow setting the strides attribute now that the misaligned segments of code have been tested.

-Travis |
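One way to reason about the "walking outside of allocated memory" question is to compute the lowest and highest byte an array can touch for a given shape and set of strides. The sketch below is plain Python and is only an illustration of that arithmetic; the function names `byte_extent` and `strides_fit` are hypothetical, and this is not a description of how PyArray_CheckStrides is or was implemented.

```python
def byte_extent(shape, strides, itemsize):
    """Return (low, high) byte offsets relative to the data pointer.

    low is the smallest offset touched (<= 0 when strides are negative);
    high is one past the largest byte touched. Zero strides add nothing,
    so a zero-stride dimension costs the same as a dimension of size 1.
    """
    if any(n == 0 for n in shape):
        return 0, 0  # an empty array touches no memory
    low = high = 0
    for n, s in zip(shape, strides):
        span = (n - 1) * s
        if span < 0:
            low += span   # negative strides extend the lower end
        else:
            high += span  # positive strides extend the upper end
    return low, high + itemsize


def strides_fit(shape, strides, itemsize, offset, nbytes):
    """Check both ends of the buffer, not just the upper one."""
    low, high = byte_extent(shape, strides, itemsize)
    return offset + low >= 0 and offset + high <= nbytes


# Five 4-byte items: contiguous, zero-stride, and reversed layouts.
contiguous = byte_extent((5,), (4,), 4)
zero_stride = byte_extent((5,), (0,), 4)
reversed_ = byte_extent((5,), (-4,), 4)
```

Under these assumptions, a zero-stride array of five items fits in a 4-byte buffer, while a reversed array only fits if the data pointer starts 16 bytes into a 20-byte buffer. A check of this kind would have to be repeated whenever strides are reassigned, which is the difficulty raised above.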
|
From: Sasha <nd...@ma...> - 2006-02-03 01:02:21
|
As I explained in my previous post, numpy allows zeros in the "strides" tuple, but the arrays with such strides have unexpected properties. In this post I will try to explain why arrays with zeros in strides are desirable and what properties they should have.

A rank-1 array with strides=0 behaves almost like a scalar; in fact, scalar arithmetic is currently implemented by setting stride to 0 in generic umath loops. Like a scalar, a rank-1 array with stride=0 only needs a buffer of size 1*itemsize, but currently numpy does not allow creation of rank-1 arrays with a buffer smaller than size*itemsize:

>>> ndarray([5], strides=[0], buffer=array([1]))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: buffer is too small for requested array

An array with 0 stride is a better alternative to x + zeros(n) than a scalar or rank-0 x because an array with zero stride knows its size. (With the current umath implementation, adding two arrays with stride=0 would still require n operations, but this would probably not be the case if BLAS is used instead of a generic loop.)

I propose to make a few changes to the way zeros in strides are handled. This looks like undocumented territory, so I don't think there are any compatibility issues.

1. Change the buffer size requirements so that dimensions with zero stride count as size=1.

2. Use strides provided to the ndarray even when buffer is not provided. Currently they are silently ignored:

>>> ndarray([5], strides=[0]).strides
(4,)

3. Fix augmented assignment operators.

Currently:

>>> x = zeros(5)
>>> x.strides=0
>>> x += 1
>>> x
array([5, 5, 5, 5, 5])
>>> x += arange(5)
>>> x
array([15, 15, 15, 15, 15])

Desired:

>>> x = zeros(5)
>>> x.strides=0
>>> x += 1
>>> x
array([1, 1, 1, 1, 1])
>>> x += arange(5)
>>> x
array([1, 2, 3, 4, 5])

This will probably require proper handling of the stride=0 case in the output arguments of ufuncs in general, so this may be harder to get right than the first two proposals.

4. Introduce xzeros and xones functions that will create stride=0 arrays as a super-fast alternative to zeros and ones. |
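For readers following this thread with a current NumPy installation, zero-stride arrays can be built with `numpy.lib.stride_tricks.as_strided` and `numpy.broadcast_to`. The sketch below uses those functions purely as an illustration; they are not the interface being proposed in this post, and the proposed `xzeros` and `xones` names are not used here.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# One stored element, presented as a length-5 rank-1 array via a zero stride.
# writeable=False guards against the augmented-assignment surprises shown
# in proposal 3, where every "element" aliases the same memory.
base = np.array([1])
x = as_strided(base, shape=(5,), strides=(0,), writeable=False)

# broadcast_to also yields a zero-stride, read-only view that knows its size.
y = np.broadcast_to(np.float64(0.0), (5,))

# Out-of-place arithmetic allocates an ordinary contiguous result.
total = x + np.arange(5)
```

Here `x` and `y` each occupy a single item of storage yet report a shape of `(5,)`, which is the "knows its size" property argued for above. Marking them read-only sidesteps proposal 3 rather than resolving it.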