|
From: Arnd B. <arn...@we...> - 2006-02-08 17:23:09
|
On Tue, 7 Feb 2006, Arnd Baecker wrote:
> On Tue, 7 Feb 2006, Pearu Peterson wrote:
[...]
> > >> So, a recommended fix would be to build Python with icc and as a
> > >> result correct libraries will be used for building 3rd party extension
> > >> modules.
OK, I went for this.
With numpy.__version__ '0.9.5.2069'
I get for numpy.test(10)
======================================================================
FAIL: check_basic
(numpy.lib.function_base.test_function_base.test_cumprod)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/work/home/baecker/INSTALL_PYTHON_again_with_icc/Inst/lib/python2.4/site-packages/numpy/lib/tests/test_function_base.py",
line 169, in check_basic
1320, 6600, 26400],ctype))
File
"/work/home/baecker/INSTALL_PYTHON_again_with_icc/Inst/lib/python2.4/site-packages/numpy/testing/utils.py",
line 156, in assert_array_equal
assert cond,\
AssertionError:
Arrays are not equal (mismatch 57.1428571429%):
Array 1: [ 1. 2. 20. 0. 0. 0. 0.]
Array 2: [ 1.0000000000000000e+00 2.0000000000000000e+00
2.0000000000000000e+01
2.2000000000000000e+02 1.32000000000000...
======================================================================
FAIL: check_basic (numpy.lib.function_base.test_function_base.test_cumsum)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/work/home/baecker/INSTALL_PYTHON_again_with_icc/Inst/lib/python2.4/site-packages/numpy/lib/tests/test_function_base.py",
line 128, in check_basic
assert_array_equal(cumsum(a), array([1,3,13,24,30,35,39],ctype))
File
"/work/home/baecker/INSTALL_PYTHON_again_with_icc/Inst/lib/python2.4/site-packages/numpy/testing/utils.py",
line 156, in assert_array_equal
assert cond,\
AssertionError:
Arrays are not equal (mismatch 57.1428571429%):
Array 1: [ 1. 3. 13. 11. 17. 5. 9.]
Array 2: [ 1. 3. 13. 24. 30. 35. 39.]
======================================================================
FAIL: check_simple
(numpy.lib.function_base.test_function_base.test_unwrap)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/work/home/baecker/INSTALL_PYTHON_again_with_icc/Inst/lib/python2.4/site-packages/numpy/lib/tests/test_function_base.py",
line 273, in check_simple
assert(all(diff(unwrap(rand(10)*100))<pi))
AssertionError
======================================================================
FAIL: check_nd (numpy.lib.index_tricks.test_index_tricks.test_grid)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/work/home/baecker/INSTALL_PYTHON_again_with_icc/Inst/lib/python2.4/site-packages/numpy/lib/tests/test_index_tricks.py",
line 30, in check_nd
assert_array_almost_equal(c[1][:,-1],2*ones(10,'d'),11)
File
"/work/home/baecker/INSTALL_PYTHON_again_with_icc/Inst/lib/python2.4/site-packages/numpy/testing/utils.py",
line 183, in assert_array_almost_equal
assert cond,\
AssertionError:
Arrays are not almost equal (mismatch 90.0%):
Array 1: [ 0.666666666667 1.111111111111 1.555555555556 2.
2.444444444444
2.888888888889 3.333333333333 -1.037...
Array 2: [ 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
Concerning warnings/remarks:
cat build_log_numpy.txt | grep warning | wc
1420 12817 133601
cat build_log_numpy.txt | grep remark | wc
3371 32184 365467
Best, Arnd
|
|
From: Gerard V. <ger...@gr...> - 2006-02-08 15:21:21
|
On Wed, 8 Feb 2006 16:10:52 +0200
Stefan van der Walt <st...@su...> wrote:
> This is probably a silly question, but what is the best way of
> creating column vectors? 'arange' always returns a row vector, on
> which you cannot perform 'transpose' since it has only one dimension.
>
> mat(arange(1,10)).transpose()
>
> works, but seems a bit long-winded (in comparison to MATLAB's [1:10]').
>
> I'd appreciate pointers in the right direction.
>
What about this?
arange(1, 10)[:, NewAxis]
Gerard
|
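The indexing trick suggested in this reply can be sketched as follows. This is a minimal example assuming a present-day NumPy, where the constant is spelled `np.newaxis` rather than the Numeric-era `NewAxis`; the variable names are illustrative only.

```python
import numpy as np

# A 1-D array has shape (9,); transposing it changes nothing.
row = np.arange(1, 10)

# Indexing with np.newaxis appends an axis of length 1, giving a column.
col = row[:, np.newaxis]

# reshape(-1, 1) is an equivalent spelling; -1 lets NumPy infer the length.
col_alt = row.reshape(-1, 1)

print(row.shape, row.T.shape, col.shape)
```

Both spellings return a view on the same data, so neither copies the array.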
|
From: Piotr L. <lus...@cs...> - 2006-02-08 15:02:40
|
On Wednesday 08 February 2006 09:33, Sven Schreiber wrote:
> Stefan van der Walt wrote:
> > This is probably a silly question, but what is the best way of
> > creating column vectors? 'arange' always returns a row vector, on
> > which you cannot perform 'transpose' since it has only one
> > dimension.
> >
> > mat(arange(1,10)).transpose()
>
> mat(range(1,10)).T is a bit shorter, but I would agree that doing
> matrix algebra in numpy is not as natural as with explicitly
> matrix-oriented languages; my understanding is that this is due to
> numpy's broader (n-dimensional) scope.
>
> Numpy-masters: Is there a way to set a user- or project-specific
> config switch or something like that to always get matrix results
> when dealing with 1d and 2d arrays? I think that would make numpy
> much more attractive for people like Stefan and me coming from the 2d
> world.
I'm not a master by far, but I have heard that question before. Isn't the
mlab module just for that purpose?
It was explained to me that the problem with a "switch" is that the same
code will behave differently depending on which installation you run. If
you run it on my n-D installation it will do one thing, and if you run it
on your 2-D installation (with the 2D world "switch" enabled) you get a
subtly different result. It might become a bug-hunting nightmare.
I think this is when Python's explicit vs. implicit rule kicks in:
python -c 'import this'
Piotr
|
|
From: Sven S. <sve...@gm...> - 2006-02-08 14:34:18
|
Stefan van der Walt wrote:
> This is probably a silly question, but what is the best way of
> creating column vectors? 'arange' always returns a row vector, on
> which you cannot perform 'transpose' since it has only one dimension.
>
> mat(arange(1,10)).transpose()
>
mat(range(1,10)).T is a bit shorter, but I would agree that doing
matrix algebra in numpy is not as natural as with explicitly
matrix-oriented languages; my understanding is that this is due to
numpy's broader (n-dimensional) scope.
Numpy-masters: Is there a way to set a user- or project-specific
config switch or something like that to always get matrix results
when dealing with 1d and 2d arrays? I think that would make numpy
much more attractive for people like Stefan and me coming from the 2d
world.
cheers,
Sven
|
|
From: Stefan v. d. W. <st...@su...> - 2006-02-08 14:08:29
|
This is probably a silly question, but what is the best way of
creating column vectors? 'arange' always returns a row vector, on
which you cannot perform 'transpose' since it has only one dimension.
mat(arange(1,10)).transpose()
works, but seems a bit long-winded (in comparison to MATLAB's [1:10]').
I'd appreciate pointers in the right direction.
Regards
Stéfan
|
|
From: Andrew J. <a.h...@gm...> - 2006-02-08 12:35:37
|
Hi All, [originally posted in a slightly off-topic thread, so I thought I'd try here -- sorry for the duplication!] What is the status of gcc 4.0 support (on Mac OS X at least)? It's a bit of a pain to have to switch between the two (are there any other disadvantages?). As of my last attempt, numpy.test() fails due to some machar issues if I recall correctly. Andrew |
|
From: Francesc A. <fa...@ca...> - 2006-02-08 10:09:29
|
On Wednesday 08 February 2006 09:41, Travis Oliphant wrote:
> Hmm. I think I'm beginning to like your idea. We could in fact make
Good :-)
> the NumPy Unicode type always UCS4 and then keep the Python Unicode
> scalar. On Python UCS2 builds the conversion would use UTF-16 to go to
> the Python scalar (which would always inherit from the native unicode
> type).
Yes, exactly.
> But, all in all, it sounds like a good plan. If the time comes that
> somebody wants to add a reduced-size UCS2 array of unicode characters
> then we can cross that bridge if and when it comes up.
Well, given the recommendations about migrating to 32-bit unicode
objects, I'd say that this would be a strange desire. If the problem
is memory consumption, users can always choose regular 8-bit
strings (of course, without complete support for general unicode
characters).
> I still like using explicit typecode characters in the array interface
> to denote UCS2 or the UCS4 data-type. We could still change from 'W',
> 'w' to other characters...
But why do you want to do this? If the data type for unicode in arrays is
always UCS4 and in scalars is always determined by the python build,
then why would we want to distinguish them with specific type
codes? At the C level there should be straightforward ways to determine
whether a scalar is UCS2 or UCS4 (just by looking at the native python
type), and at the python level there is no evident way to distinguish
(correct me if I'm wrong here) between a UCS2 and a UCS4 unicode
string; in fact, the user will not notice the difference in
general (but see later).
Besides, having 'U' as the indicator for unicode is consistent with the
way Python expresses 32-bit unicode chars (i.e. \Uxxxxxxxx). So I
find that keeping 'U' for specifying unicode types would be more than
enough and that introducing 'w' and 'W' (or whatever) will only
introduce an unnecessary burden, IMO. Moreover, if a user tries to find
the type using the .dtype descriptor, he will find that the type
continues to be 'U' regardless of the build he is using. Something
like:
# We are in a UCS2 interpreter
In [30]: numpy.array([1],dtype="U2")[0].dtype
Out[30]: dtype('<U4')
In [31]: numpy.array([1],dtype="U2")[0].dtype.char
Out[31]: 'U'
Of course, he would be able to notice that his unicode scalars are
smaller than unicode in arrays, but only if he looks at the type
descriptor and notices that the extent of the type is shorter than expected
(4 instead of 8), but apart from that, nothing else will be different.
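For comparison, here is a minimal sketch of how the proposal looks on a present-day install. It assumes a current NumPy on Python 3, neither of which existed in this form when the message was written: the 'U' typecode is the only unicode code, array storage uses four bytes per character, and the scalar inherits from the native string type.

```python
import numpy as np

arr = np.array(["ab"], dtype="U2")

# Array storage: 2 characters x 4 bytes (UCS4) each.
print(arr.dtype.char, arr.dtype.itemsize)

# A scalar pulled out of the array inherits from the interpreter's str type.
scalar = arr[0]
print(type(scalar).__name__, isinstance(scalar, str))
```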
BTW, it would be nice if, in order to penalize people as little as
possible, we could ask the python developers to make UCS4 the default
build, just to avoid conversions between UCS4<-->UCS2. I'm still
wondering why this is not the default... :-/
Cheers,
-- 
>0,0< Francesc Altet     http://www.carabos.com/
V V Cárabos Coop. V.   Enjoy Data
"-"
|
|
From: Gerard V. <ger...@gr...> - 2006-02-08 09:29:12
|
On Wed, 08 Feb 2006 01:41:18 -0700
Travis Oliphant <oli...@ie...> wrote:
> >Well, probably I've overlooked something, but I really think that this
> >would be a nice thing to do.
> >
> There are details in the scalar-array conversions (getitem and setitem)
> that would have to be implemented but it is possible. The UCS4 -->
> UTF-16 encoding is one of the easiest. It's done in unicodeobject.h in
> Python, but I'm not sure it's exposed other than going through the
> interpreter.
>
> Does this seem like a solution that everyone can live with?
>
Yes. The only point that worries me a little bit is that some problems are
limited by memory or memory bandwidth, and for those cases UCS2 arrays are
better than UCS4 arrays. I have run into memory problems before, and I don't
know if it will happen for unicode strings. Time will tell.
Gerard
|
|
From: Travis O. <oli...@ie...> - 2006-02-08 08:41:26
|
Francesc Altet wrote:
>Ok. I see that you got my point. Well, maybe I'm wrong here, but my
>proposal would result in implementing just one new data-type for 32-bit
>unicode when the python platform is UCS2 aware. If, as you said above,
>Py_UCS4 type is always defined, even on UCS2 interpreters, that should
>be relatively easy to do.
>
Hmm. I think I'm beginning to like your idea. We could in fact make
the NumPy Unicode type always UCS4 and then keep the Python Unicode
scalar. On Python UCS2 builds the conversion would use UTF-16 to go to
the Python scalar (which would always inherit from the native unicode
type).
It would be one data-type where there was not an identical match in the
memory layout of the scalar and the array data-type, but because in this
case there are conversions to go back and forth, it may not matter.
This would not be too difficult to implement, actually: it would
require new functions to handle conversions in arraytypes.inc.src and
some modifications to PyArray_Scalar.
The only drawbacks are that all unicode arrays are now twice as large,
plus the aforementioned asymmetry between the data-type and the
array-scalar on Python UCS2 builds.
But, all in all, it sounds like a good plan. If the time comes that
somebody wants to add a reduced-size UCS2 array of unicode characters
then we can cross that bridge if and when it comes up.
I still like using explicit typecode characters in the array interface
to denote UCS2 or the UCS4 data-type. We could still change from 'W',
'w' to other characters...
>Well, probably I've overlooked something, but I really think that this
>would be a nice thing to do.
>
There are details in the scalar-array conversions (getitem and setitem)
that would have to be implemented but it is possible. The UCS4 -->
UTF-16 encoding is one of the easiest. It's done in unicodeobject.h in
Python, but I'm not sure it's exposed other than going through the
interpreter.
Does this seem like a solution that everyone can live with?
-Travis
|
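The UCS4 --> UTF-16 conversion mentioned in this message is the standard surrogate-pair arithmetic. A minimal sketch follows; the function names `ucs4_to_utf16` and `utf16_pair_to_ucs4` are hypothetical and are not taken from NumPy or from CPython's unicodeobject.h.

```python
def ucs4_to_utf16(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one 16-bit unit, value unchanged.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


def utf16_pair_to_ucs4(high, low):
    """Inverse of the supplementary-plane branch above."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([hex(unit) for unit in ucs4_to_utf16(0x110FC)])
```

For U+110FC, the character used elsewhere in this thread, this yields the pair 0xd804, 0xdcfc, matching the two "characters" reported on the UCS2 interpreter.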
|
From: Francesc A. <fa...@ca...> - 2006-02-08 08:08:33
|
On Tue, 07 Feb 2006 at 13:35 -0700, Travis Oliphant wrote:
> Sure it could be implemented. It's just a matter of effort. Python
> itself always defines a Py_UCS4 type even on UCS2 builds. We would just
> have to make sure Py_UCS2 is always defined as well.
Be careful with this because you can run into problems. For example,
trying to import numpy compiled with a UCS4 python from a UCS2 one
gives me the following:
$ python
Python 2.4.2 (#1, Feb 8 2006, 08:16:44)
[GCC 4.0.3 20060115 (prerelease) (Debian 4.0.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
import core -> failed: /usr/lib/python2.4/site-packages/numpy/core/multiarray.so: undefined symbol: _PyUnicodeUCS4_IsWhitespace
import random -> failed: 'module' object has no attribute 'dtype'
import lib -> failed: /usr/lib/python2.4/site-packages/numpy/core/multiarray.so: undefined symbol: _PyUnicodeUCS4_IsWhitespace
Although I guess that this would not be a problem when using a numpy
compiled with the proper interpreter. I just wanted to point this out.
> The biggest hassle is implementing the corresponding scalar type. The
> one corresponding to the build for Python comes free. The other would
> have to be implemented directly.
Yeah, it seems we will end up implementing a new Unicode type entirely
in NumPy one way or another.
> I've seen data-bases handle this by warning the user to make sure the
> size of their data area is large enough to handle their longest use
> case. You can still use fixed sizes; you just have to make sure they
> are large enough (or risk truncation).
OK. I can accept that data can be truncated (you may end up with a
corrupted Unicode string, but this is the responsibility of the user
:-(). However, another thing that I feel uncomfortable with is the
additional encoding/decoding steps that UCS2 potentially introduces for
doing I/O. Well, perhaps this is faster than I suppose and I/O speed
will not be affected too much, but still...
> >Well, I don't understand well here. I thought that you were proposing a
> >32-bit unicode type for NumPy and then converting it appropriately to
> >UCS2 (conversion to UCS4 wouldn't be necessary as it would be the same
> >as the native NumPy unicode type) just in case that the user requires an
> >scalar out of the NumPy object. But you are talking here about defining
> >separate UCS4 and UCS2 data-types. I admit that I'm lost here...
> >
> I suppose that is another approach: we could internally have all
> UNICODE data-types use 4-bytes and do the conversions necessary. But,
> it would still require us to do most of work of supporting two
> data-types. Currently, the unicode scalar object is a simple
> inheritance from Python's UNICODE data-type. That would have to change
> and the work to do that is most of the work to support two different
> data-types. So, if we are going to go through that effort, I would
> rather see the result be two different Unicode data-types supported.
Ok. I see that you got my point. Well, maybe I'm wrong here, but my
proposal would result in implementing just one new data-type for 32-bit
unicode when the python platform is UCS2 aware. If, as you said above,
Py_UCS4 type is always defined, even on UCS2 interpreters, that should
be relatively easy to do. So, we can base all the NumPy unicode
*arrays* on this new type. The NumPy unicode *scalars* will inherit
directly from the native Py_UCS2 type for this interpreter. Then, we
just have to implement the necessary UCS4<-->UCS2 conversions to move
data between NumPy arrays and the scalar type.
The only drawback that I see in this approach is that you will end up
having UCS4 types in numpy ndarrays and UCS2 types when getting scalars
from them (however, the user will hardly notice this, IMO). The
advantage would be that NumPy arrays will always be UCS4 regardless of
the platform they are on, making access to their data from C much
easier and more portable (and yes, efficient!).
Of course, if you are using a UCS4 platform, then you can choose the
same native Py_UCS4 type for NumPy arrays and scalars and you are done.
Well, probably I've overlooked something, but I really think that this
would be a nice thing to do.
Regards,
-- 
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
|
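The import failure shown in this message comes from mixing extension modules and interpreters built with different unicode widths. One way to tell which kind of interpreter is running is `sys.maxunicode`: it is 0xFFFF on narrow (UCS2) builds of Python 2 and 0x10FFFF on wide (UCS4) builds, as well as on every Python 3.3+ interpreter. A minimal sketch follows; the helper name `unicode_build` is hypothetical.

```python
import sys


def unicode_build(max_code_point=None):
    """Classify an interpreter as 'UCS2' or 'UCS4' from its largest code point."""
    if max_code_point is None:
        max_code_point = sys.maxunicode
    return "UCS4" if max_code_point > 0xFFFF else "UCS2"


# Report the width of the interpreter running this script.
print(unicode_build())
```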
|
From: Travis O. <oli...@ie...> - 2006-02-08 00:51:53
|
Tim Hochberg wrote:
>
> OK, I finally got it to pass all of the tests. The final two pieces of
> the puzzle were using _isnan and _finite and then realizing the
> _finite was not in fact the opposite of isinf.
Thanks for finding this. I've updated the ufuncobject.h file with
definitions for isinf, isfinite, and isnan. Presumably this should allow
the SVN version of numpy to build. Let me know what happens.
-Travis
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 20:36:10
|
Francesc Altet wrote:
>On Tue, 07 Feb 2006 at 12:26 -0700, Travis Oliphant wrote:
>
>>Python itself hands us this difference. Is it really so different than
>>the fact that python integers are either 32-bit or 64-bit depending on
>>the platform.
>>
>>Perhaps what this is telling us, is that we do indeed need another
>>data-type for 4-byte unicode. It's how we solve the problem of 32-bit
>>or 64-bit integers (we have a 64-bit integer on all platforms).
>>
>
>Agreed.
>
>>Then in NumPy we can support going back and forth between UCS-2 (which
>>we can then say is UTF-16) and UCS-4.
>>
>
>If this could be implemented, then excellent!
>
Sure it could be implemented. It's just a matter of effort. Python
itself always defines a Py_UCS4 type even on UCS2 builds. We would just
have to make sure Py_UCS2 is always defined as well.
The biggest hassle is implementing the corresponding scalar type. The
one corresponding to the build for Python comes free. The other would
have to be implemented directly.
>The problem with unicode encodings is that most (I'm thinking of UTF-8
>and UTF-16) choose (correct me if I'm wrong here) a technique of
>surrogate pairs when trying to encode values that don't fit in a
>single word (7 bits for UTF-8 and 15 bits for UTF-16), which leads to a
>*variable* length of the coded output. And this is precisely the point:
>PyTables (as NumPy itself, or any other piece of software with
>efficiency in mind) would require a *fixed* space for keeping data, not
>a space that can be bigger or smaller depending on the number of
>surrogate pairs that should be used to encode a certain unicode string.
>
You are correct that encoding introduces a variable byte-length per
character (up to 6 for UTF-8 and up to 2 for UTF-16 I think). I've seen
data-bases handle this by warning the user to make sure the size of
their data area is large enough to handle their longest use case. You
can still use fixed sizes; you just have to make sure they are large
enough (or risk truncation).
>But, if what you are saying is that NumPy would adopt a 32-bit unicode
>type internally and then do the appropriate conversion to/from the
>python interpreter, then this is perfect, because it is the buffer of
>NumPy that will be used to be written/read to/from disk, not the Python
>object, and the buffer of such a NumPy object meets the requisites to
>become an efficient buffer: fixed length *and* large enough to keep
>*every* Unicode character without a need to use encodings.
>
I see the value in such a buffer, I really do. I'm just concerned about
forcing everyone to use Python UCS4 builds. That is way too stringent.
I'm afraid the only real solution is to implement a UCS2 and a UCS4
data-type.
>Well, I don't understand well here. I thought that you were proposing a
>32-bit unicode type for NumPy and then converting it appropriately to
>UCS2 (conversion to UCS4 wouldn't be necessary as it would be the same
>as the native NumPy unicode type) just in case that the user requires an
>scalar out of the NumPy object. But you are talking here about defining
>separate UCS4 and UCS2 data-types. I admit that I'm lost here...
>
I suppose that is another approach: we could internally have all
UNICODE data-types use 4-bytes and do the conversions necessary. But,
it would still require us to do most of work of supporting two
data-types. Currently, the unicode scalar object is a simple
inheritance from Python's UNICODE data-type. That would have to change
and the work to do that is most of the work to support two different
data-types. So, if we are going to go through that effort, I would
rather see the result be two different Unicode data-types supported.
-Travis
|
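The fixed-versus-variable width point debated in this message can be checked with Python's built-in codecs. This is a minimal sketch for Python 3; the sample characters and the helper name `encoded_lengths` are illustrative choices, not part of the thread. Note that UTF-8 as currently standardized uses at most four bytes per code point, while older definitions allowed longer sequences.

```python
# One character from each UTF-8 length class, ending with the thread's U+110FC.
samples = ["a", "\u00e9", "\u20ac", "\U000110fc"]


def encoded_lengths(ch):
    """Byte length of a single character in three encodings (no BOM)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


for ch in samples:
    print("U+%04X" % ord(ch), encoded_lengths(ch))
```

Only the UTF-32 column stays constant at four bytes, which is the property a fixed-size array buffer needs.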
|
From: Francesc A. <fa...@ca...> - 2006-02-07 20:08:13
|
On Tue, 07 Feb 2006 at 12:26 -0700, Travis Oliphant wrote:
> Python itself hands us this difference. Is it really so different than
> the fact that python integers are either 32-bit or 64-bit depending on
> the platform.
>
> Perhaps what this is telling us, is that we do indeed need another
> data-type for 4-byte unicode. It's how we solve the problem of 32-bit
> or 64-bit integers (we have a 64-bit integer on all platforms).
Agreed.
> Then in NumPy we can support going back and forth between UCS-2 (which
> we can then say is UTF-16) and UCS-4.
If this could be implemented, then excellent!
> The issue with saving to disk is really one of encoding anyway. So, if
> PyTables wants to do this correctly, then it should be using a
> particular encoding anyway.
The problem with unicode encodings is that most (I'm thinking of UTF-8
and UTF-16) choose (correct me if I'm wrong here) a technique of
surrogate pairs when trying to encode values that don't fit in a
single word (7 bits for UTF-8 and 15 bits for UTF-16), which leads to a
*variable* length of the coded output. And this is precisely the point:
PyTables (as NumPy itself, or any other piece of software with
efficiency in mind) would require a *fixed* space for keeping data, not
a space that can be bigger or smaller depending on the number of
surrogate pairs that should be used to encode a certain unicode string.
But, if what you are saying is that NumPy would adopt a 32-bit unicode
type internally and then do the appropriate conversion to/from the
python interpreter, then this is perfect, because it is the buffer of
NumPy that will be used to be written/read to/from disk, not the Python
object, and the buffer of such a NumPy object meets the requisites to
become an efficient buffer: fixed length *and* large enough to keep
*every* Unicode character without a need to use encodings.
> I think the best solution is to define separate UCS4 and UCS2 data-types
> and handle conversion between them using the casting functions. This
> is a bit of work to implement, but not too bad...
Well, I don't quite understand this. I thought that you were proposing a
32-bit unicode type for NumPy and then converting it appropriately to
UCS2 (conversion to UCS4 wouldn't be necessary as it would be the same
as the native NumPy unicode type) just in case the user requires a
scalar out of the NumPy object. But you are talking here about defining
separate UCS4 and UCS2 data-types. I admit that I'm lost here...
Regards,
-- 
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
|
|
From: Rich S. <rsh...@ap...> - 2006-02-07 19:31:18
|
On Tue, 7 Feb 2006, Travis Oliphant wrote:
> The only real advantage is to ease the transition burden. Several
> third-party libraries have not converted yet, so to use those you still
> need Numeric.
Thank you.
Rich
--
Richard B. Shepard, Ph.D. | Author of "Quantifying Environmental
Applied Ecosystem Services, Inc. (TM) | Impact Assessments Using Fuzzy Logic"
<http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 19:26:17
|
Gerard Vermeulen wrote:
>>While I agree that this solution is more consistent, I must say that
>>I'm not very comfortable with having to deal with two different widths
>>for unicode characters.
>>
Python itself hands us this difference. Is it really so different than
the fact that python integers are either 32-bit or 64-bit depending on
the platform?
Perhaps what this is telling us is that we do indeed need another
data-type for 4-byte unicode. It's how we solve the problem of 32-bit
or 64-bit integers (we have a 64-bit integer on all platforms).
Then in NumPy we can support going back and forth between UCS-2 (which
we can then say is UTF-16) and UCS-4.
The issue with saving to disk is really one of encoding anyway. So, if
PyTables wants to do this correctly, then it should be using a
particular encoding anyway. The internal representation of Unicode
should not technically matter as it's only input and output that is
important.
I won't support requiring a UCS-4 build of Python, though. That's too
stringent. Most characters are contained within the 0th plane of UCS-2.
For the additional characters (only up to 0x0010FFFF are defined), the
surrogate pairs can be used.
I think the best solution is to define separate UCS4 and UCS2 data-types
and handle conversion between them using the casting functions. This
is a bit of work to implement, but not too bad...
>Wouldn't it be possible that numpy takes care of the "surrogate pairs"
>when transferring unicode strings from UCS2-interpreters to UCS4-ndarrays
>and vice-versa?
>
>It would be nice to be able to cast explicitly between UCS2- and UCS4- arrays,
>too.
>
>Requesting users to recompile their Python is a rather brutal solution :-)
>
I agree. I much prefer an additional data-type since that is, after all,
what UCS2 and UCS4 are... different data-types.
-Travis
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 19:11:33
|
Rich Shepard wrote:
> On Tue, 7 Feb 2006, Travis Oliphant wrote:
>
>> No need to do that. Numeric and NumPy (import numpy) can live happily
>> together. With versions of Numeric about 24.0, they can even share
>> the same data.
>
> Travis,
>
> Are there advantages to having both on the system? I read the Numeric
> manual a couple of times, but haven't looked deeply at the division
> between the two.
The only real advantage is to ease the transition burden. Several
third-party libraries have not converted yet, so to use those you still
need Numeric.
-Travis
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 19:08:41
|
Tim Hochberg wrote:
>
> A couple of more minor issues.
>
> 1. numpy/random/mtrand/distributions.c needs M_PI defined if it is not
> already. I used the def from umathmodule.c:
>
> #ifndef M_PI
> #define M_PI 3.14159265358979323846264338328
> #endif
>
> 2. The math library m.lib was hardcoded into numpy/random/setup.py.
> I simply replaced ['m'] with [], which is probably not right in
> general. It should probably be grabbed from config.h.
>
> 3. This made it through all the compiling, but blew up on linking
> randomkit because several CryptXXX functions were not defined. I added
> 'Advapi32' to the libraries list. (In total libraries went from ['m']
> to ['Advapi32'].)
>
> With this I got a full compile. I successfully imported numpy and
> added a couple of matrices. Hooray!
>
> Is there a way to run it through some regression tests? That seems
> like it should be the next step.
>
Let's see if we can't fix up the setup.py file to handle this common
platform correctly....
import numpy
numpy.test(1,1)
-Travis
|
|
From: Gerard V. <ger...@gr...> - 2006-02-07 18:49:05
|
On Tue, 7 Feb 2006 15:42:34 +0100
Francesc Altet <fa...@ca...> wrote:
> On Tuesday 07 February 2006 08:16, Travis Oliphant wrote:
> > In current SVN, numpy assumes 'w' is 2-byte unicode and 'W' is 4-byte
> > unicode in the array interface typestring. Right now these codes
> > require that the number of bytes be specified explicitly (to satisfy the
> > array interface requirement). There is still only 1 Unicode data-type
> > on the platform and it has the size of Python's Py_UNICODE type. The
> > character 'U' continues to be useful on data-type construction to stand
> > for a unicode string of a specific character length. Its internal dtype
> > representation will use 'w' or 'W' depending on how Python was compiled.
> >
> > This may not solve all issues, but at least it's a bit more consistent
> > and solves the problem of
> >
> > dtype(dtype('U8').str) not producing the same datatype.
> >
> > It also solves the problem of unicode written out with one compilation
> > of Python and attempted to be read in with another (it won't let you
> > because only one of 'w#' or 'W#' is supported on a platform).
>
> While I agree that this solution is more consistent, I must say that
> I'm not very comfortable with having to deal with two different widths
> for unicode characters. What bothers me is the lack of portability of
> unicode strings when saving them to disk in UCS4-enabled python
> interpreters and retrieving them with UCS2-enabled ones in the context of
> PyTables (or any other database). Let's suppose that a user has a
> numpy object of type unicode that has been created in a python with
> UCS4. This would look like:
>
> # UCS4-aware interpreter here
> >>> numpy.array(u"\U000110fc", "U1")
> array(u'\U000110fc', dtype=(unicode,4))
>
> Now, suppose that you save this in a PyTables file (for example) and
> you want to regenerate it on a python interpreter compiled with UCS2.
> As the buffer on-disk has a fixed length, we are forced to use unicode
> types twice as large as containers for this data. So the net effect
> is that we will end up in the UCS2 interpreter with an object like:
>
> # UCS2-aware interpreter here
> >>> numpy.array(u"\U000110fc", "U2")
> array(u'\U000110fc', dtype=(unicode,4))
>
> which apparently is the same as the one above, but not quite. To
> begin with, the former is an array holding a unicode scalar with only
> *one* character, while the latter has *two* characters. But worse than
> that, the interpretation of the original content changes drastically
> on the UCS2 platform. For example, if we select the first and second
> characters of the string on the UCS2-aware platform, we have:
>
> >>> numpy.array(u"\U000110fc", "U2")[()][0]
> u'\ud804'
> >>> numpy.array(u"\U000110fc", "U2")[()][1]
> u'\udcfc'
>
> which have nothing to do with the original \U000110fc character (I'd
> expect to get at least the truncated values \u0001 and \u10fc). I
> think this is because of the convention used to represent code points
> above U+FFFF in UTF-16, a technique called
> "surrogate pairs" (see: http://www.unicode.org/glossary/).
>
> All in all, my opinion is that allowing the coexistence of different
> sizes of unicode types in numpy would be a recipe for disaster when
> one wants to transport unicode characters between platforms with
> python interpreters compiled with different unicode sizes.
> Consequently I'd propose to support just one unicode size in
> numpy, namely, the 4-byte one, and if this size doesn't match the
> underlying python platform, then refuse to deliver native unicode
> objects if the user is asking for them. Something like this would work:
>
> # UCS2-aware interpreter here
> >>> h=numpy.array(u"\U000110fc", "U1")
> >>> h # This is a 'true' 32-bit unicode array in numpy
> array(u'\U000110fc', dtype=(unicode,4))
> >>> h[()] # Try to get a native unicode object in python
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> ValueError: unicode sizes in numpy and your python interpreter doesn't
> match. Sorry, but you should get an UCS4-enable python interpreter if
> you want to successfully complete this operation.
>
> As a bonus, we can get rid of the 'w' and 'W' typecodes that have
> been introduced a bit forcedly, IMO. I don't know, however, how
> difficult it would be to implement this in numpy. Another option would
> be to refuse to compile numpy with UCS2-aware interpreters. This
> sounds a bit extreme, but see below.
>
> OTOH, I'm not an expert in Unicode, but after googling a bit, I've
> found interesting recommendations about its use in Python. The first
> is from Uche Ogbuji in http://www.xml.com/pub/a/2005/06/15/py-xml.html.
> Here is the relevant excerpt:
>
> """
> I also want to mention another general principle to keep in mind: if
> possible, use a Python install compiled to use UCS4 character storage
> [...] UCS4 uses more space to store characters, but there are some
> problems for XML processing in UCS2, which the Python core team is
> reluctant to address because the only known fixes would be too much of
> a burden on performance. Luckily, most distributors have heeded this
> advice and ship UCS4 builds of Python.
> """
>
> So, it seems that the Python crew is not interested in solving
> problems with UCS2. Now, towards the end of PEP 261 ('Support
> for "wide" Unicode characters') one can read this as a final
> conclusion:
>
> """
> This PEP represents the least-effort solution. Over the next several
> years, 32-bit Unicode characters will become more common and that may
> either convince us that we need a more sophisticated solution or (on
> the other hand) convince us that simply mandating wide Unicode
> characters is an appropriate solution.
> """
>
> This PEP dates from 27-Jun-2001, so the "next several years" the
> author is referring to is nowadays. In fact, the interpreters in my
> Debian-based Linux are both compiled with UCS4. Despite this, it
> seems that the default for compiling python is UCS2, given
> that you still need to pass the flag "--enable-unicode=ucs4" if you
> want to end up with a UCS4-enabled interpreter. I wonder why they are
> doing this if that can positively lead to problems with XML as Uche
> Ogbuji said (?).
>
> Anyway, I don't know if the recommendation of compiling Python with
> UCS4 is widespread enough or not in the different distributions, but
> people can easily check this with:
>
> >>> len(buffer(u"u"))
> 4
>
> if the output of this is 4 (as in my example), then the interpreter is
> using UCS4; if it is 2, it is using UCS2.
>
> Finally, I agree that asking for help about these issues in the python
> list would be a good idea.
>
I have no good solution for this problem, but the standard Python on my
1-year old Mandrake is still UCS2 and I quote from PEP-261:

    Windows builds will be narrow for a while based on the fact that
    there have been few requests for wide characters, those requests
    are mostly from hard-core programmers with the ability to buy
    their own Python and Windows itself is strongly biased towards
    16-bit characters.

Suppose that is still true. Maybe Vista will change that.

Wouldn't it be possible that numpy takes care of the "surrogate pairs"
when transferring unicode strings from UCS2-interpreters to UCS4-ndarrays
and vice-versa?

It would be nice to be able to cast explicitly between UCS2- and UCS4- arrays,
too.

Requesting users to recompile their Python is a rather brutal solution :-)

Gerard
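
For reference, the surrogate-pair conversion asked about above is plain integer arithmetic. The following is only a minimal sketch in modern Python 3; the function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and are not part of numpy. It reproduces the two values seen earlier in the thread for \U000110fc.

```python
# Sketch only: these helper names are hypothetical, not numpy API.

def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary range")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a UTF-16 (high, low) pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x110FC)
print(hex(high), hex(low))               # 0xd804 0xdcfc
print(hex(from_surrogate_pair(high, low)))
```

A UCS2-to-UCS4 transfer would apply `from_surrogate_pair` to each pair it meets, and the reverse transfer would apply `to_surrogate_pair` to each code point above U+FFFF; the stored string length changes in both directions.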
|
|
From: Tim H. <tim...@co...> - 2006-02-07 18:33:15
|
Eric Firing wrote:

> Francesc, Travis,
>
> Francesc Altet wrote:
> [...]
>
>> All in all, my opinion is that allowing the coexistence of different
>> sizes of unicode types in numpy would be a recipe for disaster when
>> one wants to transport unicode characters between platforms with
>> python interpreters compiled with different unicode sizes.
>
> I agree--it would be a nightmare.
>
>> Anyway, I don't know if the recommendation of compiling Python with
>> UCS4 is widespread enough or not in the different distributions, but
>> people can easily check this with:
>>
>> >>> len(buffer(u"u"))
>> 4
>>
>> if the output of this is 4 (as in my example), then the interpreter is
>> using UCS4; if it is 2, it is using UCS2.
>
> No, it is not sufficiently widespread; Mandriva 2006 python is
> compiled for UCS2.

Also the default build for MS Windows is compiled for UCS2.

How about always storing data as UCS4 and converting it on the fly to UCS2
when extracting a python string from the array, if on a UCS2 python build?
Isn't converting to UCS2 simply a matter of lopping off the top two bytes?
If so, converting it should be simply a check that the value is not out of
range, followed by the aforementioned lopping.

-tim
|
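
The "check the range, then lop" idea can be sketched as follows. This is an illustration in modern Python 3 that operates on plain integers; `narrow_to_ucs2` is a hypothetical name, not a numpy function. It refuses any value that does not fit in 16 bits rather than silently truncating it.

```python
# Sketch only: narrow_to_ucs2 is a hypothetical helper, not numpy API.

def narrow_to_ucs2(code_points):
    """Return 16-bit values for code_points, refusing anything above U+FFFF."""
    narrowed = []
    for cp in code_points:
        if not 0 <= cp <= 0xFFFF:
            # Dropping the top two bytes here would change the character.
            raise ValueError("U+%04X does not fit in 16 bits" % cp)
        narrowed.append(cp & 0xFFFF)     # the "lopping" step
    return narrowed


print(narrow_to_ucs2([0x41, 0x20AC]))    # both values fit in 16 bits
```

Under this scheme a value such as U+110FC from earlier in the thread would raise `ValueError`; handling it without an error would need the surrogate-pair encoding instead of truncation.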
|
From: Eric F. <ef...@ha...> - 2006-02-07 18:21:44
|
Francesc, Travis,

Francesc Altet wrote:
[...]
> All in all, my opinion is that allowing the coexistence of different
> sizes of unicode types in numpy would be a recipe for disaster when
> one wants to transport unicode characters between platforms with
> python interpreters compiled with different unicode sizes.

I agree--it would be a nightmare.

> Anyway, I don't know if the recommendation of compiling Python with
> UCS4 is widespread enough or not in the different distributions, but
> people can easily check this with:
>
> >>> len(buffer(u"u"))
> 4
>
> if the output of this is 4 (as in my example), then the interpreter is
> using UCS4; if it is 2, it is using UCS2.

No, it is not sufficiently widespread; Mandriva 2006 python is
compiled for UCS2.

Eric
|
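
An alternative to the `len(buffer(u"u"))` check quoted above is `sys.maxunicode`, which PEP 261 introduced for exactly this purpose. The sketch below is written for modern Python 3; the helper name `unicode_storage_width` is hypothetical. Note that from Python 3.3 onwards `sys.maxunicode` is always 0x10FFFF, so the narrow case can only occur on older interpreters.

```python
import sys

# Sketch only: unicode_storage_width is a hypothetical helper name.

def unicode_storage_width(maxunicode=sys.maxunicode):
    """Return 2 for a narrow (UCS2) build and 4 for a wide (UCS4) build."""
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return 2 if maxunicode == 0xFFFF else 4


print(unicode_storage_width())
```

Passing the value as a parameter keeps the function easy to exercise for both build types without needing two interpreters.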
|
From: Rich S. <rsh...@ap...> - 2006-02-07 18:17:51
|
On Tue, 7 Feb 2006, Travis Oliphant wrote:

> No need to do that. Numeric and NumPy (import numpy) can live happily
> together. With versions of Numeric about 24.0, they can even share the same
> data.

Travis,

Are there advantages to having both on the system? I read the Numeric
manual a couple of times, but haven't looked deeply at the division between
the two.

Many thanks,

Rich

--
Richard B. Shepard, Ph.D.             | Author of "Quantifying Environmental
Applied Ecosystem Services, Inc. (TM) | Impact Assessments Using Fuzzy Logic"
<http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 18:15:29
|
Rich Shepard wrote:

> Last evening I downloaded numpy-0.9.4 and scipy-0.4.4. I have an earlier
> version of Numeric in /usr/lib/python2.4/site-packages/Numeric/. Should I
> remove all references to Numeric before installing NumPy?
>
> Rich

No need to do that. Numeric and NumPy (import numpy) can live happily
together. With versions of Numeric about 24.0, they can even share the same
data.

-Travis
|
|
From: Arnd B. <arn...@we...> - 2006-02-07 18:12:43
|
On Tue, 7 Feb 2006, Pearu Peterson wrote:
[... /numpy/distutils/exec_command.py ...]
> from numpy.distutils.misc_util import is_sequence, is_string
>
> should be changed to
>
> from misc_util import is_sequence, is_string
>
> to fix this.
Making the same type of change in
  numpy/distutils/system_info.py
worked if ATLAS is not used (`export ATLAS=None`).
Otherwise I get:

python numpy/distutils/system_info.py lapack_opt
lapack_opt_info:
lapack_mkl_info:
mkl_info:
  NOT AVAILABLE
  NOT AVAILABLE
atlas_threads_info:
Setting PTATLAS=ATLAS
system_info.atlas_threads_info
Setting PTATLAS=ATLAS
Setting PTATLAS=ATLAS
  FOUND:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/home/baecker/python2/lib/atlas']
    language = f77
    include_dirs = ['/usr/include']
Traceback (most recent call last):
  File "numpy/distutils/system_info.py", line 1693, in ?
    show_all()
  File "numpy/distutils/system_info.py", line 1689, in show_all
    r = c.get_info()
  File "/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/system_info.py", line 338, in get_info
    self.calc_info()
  File "/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/system_info.py", line 1123, in calc_info
    atlas_version = get_atlas_version(**version_info)
  File "/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/system_info.py", line 1028, in get_atlas_version
    from core import Extension, setup
  File "/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/core.py", line 12, in ?
    from numpy.distutils.extension import Extension
ImportError: No module named numpy.distutils.extension

numpy/distutils/core.py is full of
`from numpy.distutils.command import ...`.

> >>> Concerning icc compilation I used:
> >>>
> >>> export FC_VENDOR=Intel
> >>
> >> This has no effect anymore. Use --fcompiler=intel instead.
> >
> > OK - I have to confess that I am really confused about
> > which options might work and which not.
> > Is there a document which describes this?
>
> FC_VENDOR env. variable was used in old f2py long time ago. When Fortran
> compiler support was moved to scipy_distutils, --fcompiler option was
> introduced to config, config_fc, build_ext,.. setup.py commands.
> One should use any of these commands to specify a Fortran compiler and
> config_fc to change various Fortran compiler flags. See
> python setup.py config_fc --help
> for more information.
>
> How to enhance C compiler options, see standard Distutils documentation.
>
> >>> export F77=ifort
> >>> export CC=icc
> >>> export CXX=icc
> >
> > But these are still needed?
>
> No for F77, using --fcompiler=.. should be enough. I am not sure about CC,
> CXX, must try it out..
>
> >> When Python is compiled with a different compiler than numpy (or any
> >> extension module) is going to be installed then proper libraries must be
> >> specified manually. Which libraries and flags are needed exactly, this is
> >> described in compilers manual.
> >>
> >> So, a recommended fix would be to build Python with icc and as a
> >> result correct libraries will be used for building 3rd party extension
> >> modules.
> >
> > This would also mean that all dependent packages will have
> > to be installed again, right?
> > I am sorry but then I won't be able to help with icc at the moment
> > as I am completely swamped with other stuff...
> >
> >> Otherwise one has to read compilers manual, sections like
> >> about gcc-compatibility and linking might be useful. See also
> >> http://www.scipy.org/Wiki/FAQ#head-8371c35ef08b877875217aaac5489fc747b4aceb
> >
> > I thought that supplying ``--libraries="irc"``
> > might cure the problem, but
> > (quoting from
> > http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2983903
> > )
> > """
> > However, in the build log I only found -lirc for
> > the config_tests but nowhere else.
> > What should I do instead of the above?
> > """
>
> Try:
>
> export CC=icc
> python setup.py build build_ext -lirc
>
> This will probably use gcc for linking

Yes, it does use gcc for linking. I also had to specify
the location of `libirc`,

  export CC=icc
  python setup.py build build_ext -L/opt/intel/cc_90/lib/ -lirc

followed by

  python setup.py config --fcompiler=intel install

worked.

On import I get another error

import core -> failed: /home/baecker/python2/scipy_icc5_lintst_n_N3/lib/python2.4/site-packages/numpy/core/umath.so: undefined symbol: __libm_sincos
import random -> failed: 'module' object has no attribute 'dtype'
import lib -> failed: /home/baecker/python2/scipy_icc5_lintst_n_N3/lib/python2.4/site-packages/numpy/core/umath.so: undefined symbol: __libm_sincos
import linalg -> failed: /opt/intel/fc_90/lib/libunwind.so.6: undefined symbol: ?1__serial_memmove
import dft -> failed: /home/baecker/python2/scipy_icc5_lintst_n_N3/lib/python2.4/site-packages/numpy/core/umath.so: undefined symbol: __libm_sincos

So it seems I will have to specify more libraries.
Would this be the correct syntax:

  python setup.py build build_ext -L/opt/intel/cc_90/lib/:SomeOtherPath -lirc:someotherlibrary

?

From ``python setup.py build build_ext --help``

  --libraries (-l)     external C libraries to link with
  --library-dirs (-L)  directories to search for external C libraries
                       (separated by ':')

it is not clear how to specify several libraries with "-l".
The syntax above did not work (neither did -lirc -lm).

> but might fix undefined symbol problems.
Many thanks,
Arnd
|
|
From: Colin J. W. <cj...@sy...> - 2006-02-07 18:01:14
|
Travis Oliphant wrote:

> Gerard Vermeulen wrote:
>
>> On Wed, 01 Feb 2006 11:15:09 -0500
>> "Colin J. Williams" <cj...@sy...> wrote:
>>
>> [ currently numpy uses ndarray, with synonym ArrayType, for a
>> multidimensional array ]
>>
>>> [Dbg]>>> import types
>>> [Dbg]>>> dir(types)
>>> ['BooleanType', 'BufferType', 'BuiltinFunctionType',
>>> 'BuiltinMethodType', 'ClassType', 'CodeType', 'ComplexType',
>>> 'DictProxyType', 'DictType', 'DictionaryType', 'EllipsisType',
>>> 'FileType', 'FloatType', 'FrameType', 'FunctionType', 'GeneratorType',
>>> 'InstanceType', 'IntType', 'LambdaType', 'ListType', 'LongType',
>>> 'MethodType', 'ModuleType', 'NoneType', 'NotImplementedType',
>>> 'ObjectType', 'SliceType', 'StringType', 'StringTypes',
>>> 'TracebackType', 'TupleType', 'TypeType', 'UnboundMethodType',
>>> 'UnicodeType', 'XRangeType', '__builtins__', '__doc__', '__file__',
>>> '__name__']
>>> [Dbg]>>>
>>
>> Isn't the types module becoming superfluous?
>
> That's the point I was trying to make. ArrayType is to ndarray as
> DictionaryType is to dict. My understanding is that the use of
> types.DictionaryType is discouraged.
>
> -Travis

I was simply trying to suggest that the name ArrayType is a more
appropriate name than ndbigarray or ndarray for the multidimensional array.
Since the intent is, in the long run, to integrate numpy with the Python
distribution, the use of a name in the style of the existing Python types
would appear to be better.

Is the types module becoming superfluous? I've cross posted to c.l.p to
seek information on this.

Colin W.
|
|
From: Pearu P. <pe...@sc...> - 2006-02-07 16:06:36
|
On Tue, 7 Feb 2006, Arnd Baecker wrote:

> On Tue, 7 Feb 2006, Pearu Peterson wrote:
>
>> On Tue, 7 Feb 2006, Arnd Baecker wrote:
>>
>>> Alright, we might need the asbestos suit thing:
>>>
>>> Something ahead: I normally used
>>>   python numpy/distutils/system_info.py lapack_opt
>>> to figure out which library numpy is going to use.
>>> With current svn I get the following error:
>>>
>>> Traceback (most recent call last):
>>>   File "numpy/distutils/system_info.py", line 111, in ?
>>>     from exec_command import find_executable, exec_command, get_pythonexe
>>>   File
>>> "/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/exec_command.py",
>>> line 56, in ?
>>>     from numpy.distutils.misc_util import is_sequence
>>> ImportError: No module named numpy.distutils.misc_util
>>
>> This occurs probably because numpy is not installed.
>
> Maybe I am wrong, but I thought that I could run the above
> command before any installation to see which
> libraries will be used.
> My installation notes on this give me the feeling that
> this used to work...

   from numpy.distutils.misc_util import is_sequence, is_string

should be changed to

   from misc_util import is_sequence, is_string

to fix this.

>>> Concerning icc compilation I used:
>>>
>>> export FC_VENDOR=Intel
>>
>> This has no effect anymore. Use --fcompiler=intel instead.
>
> OK - I have to confess that I am really confused about
> which options might work and which not.
> Is there a document which describes this?

FC_VENDOR env. variable was used in old f2py long time ago. When Fortran
compiler support was moved to scipy_distutils, --fcompiler option was
introduced to config, config_fc, build_ext,.. setup.py commands.
One should use any of these commands to specify a Fortran compiler and
config_fc to change various Fortran compiler flags. See

   python setup.py config_fc --help

for more information.

How to enhance C compiler options, see standard Distutils documentation.

>>> export F77=ifort
>>> export CC=icc
>>> export CXX=icc
>
> But these are still needed?

No for F77, using --fcompiler=.. should be enough. I am not sure about CC,
CXX, must try it out..

>> When Python is compiled with a different compiler than numpy (or any
>> extension module) is going to be installed then proper libraries must be
>> specified manually. Which libraries and flags are needed exactly, this is
>> described in compilers manual.
>>
>> So, a recommended fix would be to build Python with icc and as a
>> result correct libraries will be used for building 3rd party extension
>> modules.
>
> This would also mean that all dependent packages will have
> to be installed again, right?
> I am sorry but then I won't be able to help with icc at the moment
> as I am completely swamped with other stuff...
>
>> Otherwise one has to read compilers manual, sections like
>> about gcc-compatibility and linking might be useful. See also
>> http://www.scipy.org/Wiki/FAQ#head-8371c35ef08b877875217aaac5489fc747b4aceb
>
> I thought that supplying ``--libraries="irc"``
> might cure the problem, but
> (quoting from
> http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2983903
> )
> """
> However, in the build log I only found -lirc for
> the config_tests but nowhere else.
> What should I do instead of the above?
> """

Try:

   export CC=icc
   python setup.py build build_ext -lirc

This will probably use gcc for linking but might fix undefined symbol
problems.

Pearu
|