From: Arnd B. <arn...@we...> - 2006-02-07 15:26:39
|
On Tue, 7 Feb 2006, Pearu Peterson wrote:
> On Tue, 7 Feb 2006, Arnd Baecker wrote:
>
> > Alright, we might need the asbestos suite thing:
> >
> > Something ahead: I normally used
> > python numpy/distutils/system_info.py lapack_opt
> > to figure out which library numpy is going to use.
> > With current svn I get the following error:
> >
> > Traceback (most recent call last):
> > File "numpy/distutils/system_info.py", line 111, in ?
> > from exec_command import find_executable, exec_command, get_pythonexe
> > File
> > "/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/exec_command.py",
> > line 56, in ?
> > from numpy.distutils.misc_util import is_sequence
> > ImportError: No module named numpy.distutils.misc_util
>
> This probably occurs because numpy is not installed.
Maybe I am wrong, but I thought that I could run the above
command before any installation to see which
libraries will be used.
My installation notes on this give me the feeling that
this used to work...
> > Concerning icc compilation I used:
> >
> > export FC_VENDOR=Intel
>
> This has no effect anymore. Use --fcompiler=intel instead.
OK - I have to confess that I am really confused about
which options might work and which not.
Is there a document which describes this?
> > export F77=ifort
> > export CC=icc
> > export CXX=icc
But these are still needed?
> > python setup.py config --compiler=intel install --prefix=$DESTnumpyDIR | tee ../build_log_numpy_${nr}.txt
>
> There is no intel C compiler option. Allowed C compilers are
> unix, msvc, cygwin, mingw32, bcpp, mwerks, emx. Distutils should have raised an
> exception when using --compiler=intel.
>
> If you are using IFC-compiled blas/lapack libraries then --fcompiler=intel
> might produce importable extension modules (because then ifc is used for
> linking, and it knows which Intel libraries need to be linked into a shared
> library).
For this test I haven't used any blas/lapack. But it is good to know.
> > Trying to test the resulting numpy gives:
> >
> > In [1]: import numpy
> > import core -> failed:
> > /home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/core/multiarray.so:
> > undefined symbol: ?1__serial_memmove
>
> <snip>
>
> > I already reported this a month ago with a bit more information
> > on a possible solution
> > http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2983903
>
> When Python is compiled with a different compiler than the one used to
> build numpy (or any other extension module), the proper libraries must be
> specified manually. Exactly which libraries and flags are needed is
> described in the compiler's manual.
>
> So, a recommended fix would be to build Python with icc; as a
> result, the correct libraries will be used for building 3rd party extension
> modules.
This would also mean that all dependent packages will have
to be installed again, right?
I am sorry but then I won't be able to help with icc at the moment
as I am completely swamped with other stuff...
> Otherwise one has to read the compiler's manual; sections about
> gcc compatibility and linking might be useful. See also
> http://www.scipy.org/Wiki/FAQ#head-8371c35ef08b877875217aaac5489fc747b4aceb
I thought that supplying ``--libraries="irc"``
might cure the problem, but
(quoting from
http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2983903
)
"""
However, in the build log I only found -lirc for
the config_tests but nowhere else.
What should I do instead of the above?
"""
Best, Arnd
|
|
From: Pearu P. <pe...@sc...> - 2006-02-07 14:59:18
|
On Tue, 7 Feb 2006, Arnd Baecker wrote:
> Alright, we might need the asbestos suite thing:
>
> Something ahead: I normally used
> python numpy/distutils/system_info.py lapack_opt
> to figure out which library numpy is going to use.
> With current svn I get the following error:
>
> Traceback (most recent call last):
> File "numpy/distutils/system_info.py", line 111, in ?
> from exec_command import find_executable, exec_command, get_pythonexe
> File
> "/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/exec_command.py",
> line 56, in ?
> from numpy.distutils.misc_util import is_sequence
> ImportError: No module named numpy.distutils.misc_util
This probably occurs because numpy is not installed.
> Concerning icc compilation I used:
>
> export FC_VENDOR=Intel
This has no effect anymore. Use --fcompiler=intel instead.
> export F77=ifort
> export CC=icc
> export CXX=icc
> python setup.py config --compiler=intel install --prefix=$DESTnumpyDIR | tee ../build_log_numpy_${nr}.txt
There is no intel C compiler option. Allowed C compilers are
unix, msvc, cygwin, mingw32, bcpp, mwerks, emx. Distutils should have raised an
exception when using --compiler=intel.
If you are using IFC-compiled blas/lapack libraries then --fcompiler=intel
might produce importable extension modules (because then ifc is used for
linking, and it knows which Intel libraries need to be linked into a shared
library).
> Trying to test the resulting numpy gives:
>
> In [1]: import numpy
> import core -> failed:
> /home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/core/multiarray.so:
> undefined symbol: ?1__serial_memmove
<snip>
> I already reported this a month ago with a bit more information
> on a possible solution
> http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2983903
When Python is compiled with a different compiler than the one used to
build numpy (or any other extension module), the proper libraries must be
specified manually. Exactly which libraries and flags are needed is
described in the compiler's manual.
So, a recommended fix would be to build Python with icc; as a
result, the correct libraries will be used for building 3rd party extension
modules. Otherwise one has to read the compiler's manual; sections about
gcc compatibility and linking might be useful. See also
http://www.scipy.org/Wiki/FAQ#head-8371c35ef08b877875217aaac5489fc747b4aceb
Pearu
|
|
From: Francesc A. <fa...@ca...> - 2006-02-07 14:43:08
|
On Tuesday 07 February 2006 08:16, Travis Oliphant wrote:
> In current SVN, numpy assumes 'w' is 2-byte unicode and 'W' is 4-byte
> unicode in the array interface typestring. Right now these codes
> require that the number of bytes be specified explicitly (to satisfy the
> array interface requirement). There is still only 1 Unicode data-type
> on the platform and it has the size of Python's Py_UNICODE type. The
> character 'U' continues to be useful on data-type construction to stand
> for a unicode string of a specific character length. Its internal dtype
> representation will use 'w' or 'W' depending on how Python was compiled.
>
> This may not solve all issues, but at least it's a bit more consistent
> and solves the problem of
>
> dtype(dtype('U8').str) not producing the same datatype.
>
> It also solves the problem of unicode written out with one compilation
> of Python and then read in with another (it won't let you, because
> only one of 'w#' or 'W#' is supported on a platform).
While I agree that this solution is more consistent, I must say that
I'm not very comfortable with having to deal with two different widths
for unicode characters. What bothers me is the lack of portability of
unicode strings when saving them to disk from UCS4-enabled Python
interpreters and retrieving them with UCS2-enabled ones in the context of
PyTables (or any other database). Let's suppose that a user has a
numpy object of type unicode that has been created in a Python with
UCS4. This would look like:
# UCS4-aware interpreter here
>>> numpy.array(u"\U000110fc", "U1")
array(u'\U000110fc', dtype=(unicode,4))
Now, suppose that you save this in a PyTables file (for example) and
you want to regenerate it on a Python interpreter compiled with UCS2.
As the on-disk buffer has a fixed length, we are forced to use unicode
types twice as large as containers for this data. So the net effect
is that we will end up in the UCS2 interpreter with an object like:
# UCS2-aware interpreter here
>>> numpy.array(u"\U000110fc", "U2")
array(u'\U000110fc', dtype=(unicode,4))
which apparently is the same as the one above, but not quite. To
begin with, the former is an array that is a unicode scalar with only
*one* character, while the latter has *two* characters. But worse than
that, the interpretation of the original content changes drastically
on the UCS2 platform. For example, if we select the first and second
characters of the string on the UCS2-aware platform, we have:
>>> numpy.array(u"\U000110fc", "U2")[()][0]
u'\ud804'
>>> numpy.array(u"\U000110fc", "U2")[()][1]
u'\udcfc'
that have nothing to do with the original \U000110fc character (I'd
expect to get at least the truncated values \u0001 and \u10fc). I
think this is because of the conventions that are used to represent
32-bit unicode characters in UTF-16 using a technique called
"surrogate pairs" (see: http://www.unicode.org/glossary/).
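The two values above are in fact exactly the UTF-16 surrogate pair for U+110FC. A minimal Python 3 sketch of the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into a high and a low 10-bit half) reproduces them; the function name `to_surrogate_pair` is a hypothetical helper, not a numpy or Python API:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    offset = code_point - 0x10000         # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogate_pair(0x110FC)
print(hex(high), hex(low))  # 0xd804 0xdcfc

# Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
assert "\U000110fc".encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

This is why no "truncated" \u0001 / \u10fc values show up: UTF-16 never stores the raw 32-bit value, it re-encodes it into two 16-bit code units.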
All in all, my opinion is that allowing the coexistence of different
sizes of unicode types in numpy would be a recipe for disaster when
one wants to transport unicode characters between platforms with
Python interpreters compiled with different unicode sizes.
Consequently I'd propose to support just one unicode size in
numpy, namely the 4-byte one, and if this size doesn't match the
underlying Python platform, then refuse to deliver native unicode
objects if the user is asking for them. Something like this would work:
# UCS2-aware interpreter here
>>> h=numpy.array(u"\U000110fc", "U1")
>>> h # This is a 'true' 32-bit unicode array in numpy
array(u'\U000110fc', dtype=(unicode,4))
>>> h[()] # Try to get a native unicode object in python
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: unicode sizes in numpy and your python interpreter don't
match. Sorry, but you should get a UCS4-enabled python interpreter if
you want to successfully complete this operation.
As a bonus, we can get rid of the 'w' and 'W' typecodes that have
been introduced somewhat artificially, IMO. I don't know, however, how
difficult it would be to implement this in numpy. Another option would be
to refuse to compile numpy with UCS2-aware interpreters; this
sounds a bit extreme, but see below.
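The proposal amounts to "store UCS4 always, and refuse the conversion only when the interpreter cannot represent the result". A minimal sketch of that check, assuming a little-endian UCS4 buffer; `ucs4_to_native` is a hypothetical function, not numpy code. Since Python 3.3 every interpreter is wide, so the `maxunicode` parameter is used here to simulate a narrow (UCS2) build:

```python
import sys

def ucs4_to_native(buf, maxunicode=sys.maxunicode):
    """Decode a little-endian UCS4 buffer into a native Python string.

    Hypothetical sketch: refuses (ValueError) when any character lies above
    what the interpreter can hold as one character. Pass 0xFFFF to mimic
    a narrow UCS2 build.
    """
    text = buf.decode("utf-32-le")
    if any(ord(ch) > maxunicode for ch in text):
        raise ValueError(
            "unicode sizes in numpy and your python interpreter don't match; "
            "a UCS4-enabled interpreter is needed for this operation"
        )
    return text

data = "\U000110fc".encode("utf-32-le")    # 4 bytes: one UCS4 character
print(ascii(ucs4_to_native(data)))         # fine on a wide interpreter
try:
    ucs4_to_native(data, maxunicode=0xFFFF)
except ValueError as exc:
    print("refused:", exc)
```

The design choice is that data on disk never changes width; only the final step of handing out a native string is allowed to fail.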
OTOH, I'm not an expert in Unicode, but after googling a bit, I've
found interesting recommendations about its use in Python. The first
is from Uche Ogbuji in http://www.xml.com/pub/a/2005/06/15/py-xml.html.
Here is the relevant excerpt:
"""
I also want to mention another general principle to keep in mind: if
possible, use a Python install compiled to use UCS4 character storage
[...] UCS4 uses more space to store characters, but there are some
problems for XML processing in UCS2, which the Python core team is
reluctant to address because the only known fixes would be too much of
a burden on performance. Luckily, most distributors have heeded this
advice and ship UCS4 builds of Python.
"""
So, it seems that the Python crew is not interested in solving
problems with UCS2. Now, towards the end of PEP 261 ('Support
for "wide" Unicode characters') one can read this as a final
conclusion:
"""
This PEP represents the least-effort solution. Over the next several
years, 32-bit Unicode characters will become more common and that may
either convince us that we need a more sophisticated solution or (on
the other hand) convince us that simply mandating wide Unicode
characters is an appropriate solution.
"""
This PEP dates from 27-Jun-2001, so the "next several years" the
author is referring to are now. In fact, the interpreters in my
Debian-based Linux are both compiled with UCS4. Despite this, it
seems that the default when compiling Python is still UCS2, given
that you need to pass the flag "--enable-unicode=ucs4" if you
want to end up with a UCS4-enabled interpreter. I wonder why they are
doing this if it can lead to problems with XML, as Uche
Ogbuji said (?).
Anyway, I don't know whether the recommendation of compiling Python with
UCS4 is widespread enough in the different distributions, but
people can easily check this with:
>>> len(buffer(u"u"))
4
if the output of this is 4 (as in my example), then the interpreter is
using UCS4; if it is 2, it is using UCS2.
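`buffer` only exists in Python 2, so the check above will not run on Python 3. A minimal sketch of an equivalent check using only the standard `sys` module: `sys.maxunicode` is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build, and since Python 3.3 (PEP 393) it is always 0x10FFFF. The helper name `unicode_build_width` is hypothetical:

```python
import sys

def unicode_build_width():
    """Return 2 on a narrow (UCS2) build, 4 on a wide (UCS4) build.

    Hypothetical helper: it only inspects sys.maxunicode, the largest
    code point the interpreter can store as a single character.
    """
    return 4 if sys.maxunicode > 0xFFFF else 2

print("Py_UNICODE width (bytes):", unicode_build_width())
```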
Finally, I agree that asking for help about these issues on the Python
list would be a good idea.
Cheers,
-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
"-"
|
|
From: Arnd B. <arn...@we...> - 2006-02-07 13:49:54
|
Hi Travis,
On Mon, 6 Feb 2006, Travis Oliphant wrote:
> We need to test numpy on other compilers besides gcc, so that we can
> ferret out any gnu-isms that we may be relying on.
>
> Anybody out there with compilers they are willing to try out and/or
> report on?
Alright, we might need the asbestos suite thing:
Something ahead: I normally used
python numpy/distutils/system_info.py lapack_opt
to figure out which library numpy is going to use.
With current svn I get the following error:
Traceback (most recent call last):
File "numpy/distutils/system_info.py", line 111, in ?
from exec_command import find_executable, exec_command, get_pythonexe
File
"/work/home/baecker/INSTALL_PYTHON5_icc/CompileDir/numpy/numpy/distutils/exec_command.py",
line 56, in ?
from numpy.distutils.misc_util import is_sequence
ImportError: No module named numpy.distutils.misc_util
Concerning icc compilation I used:
export FC_VENDOR=Intel
export F77=ifort
export CC=icc
export CXX=icc
python setup.py config --compiler=intel install --prefix=$DESTnumpyDIR | tee ../build_log_numpy_${nr}.txt
The build log shows
1393 warnings
3362 remarks
Should I post them off-list or on scipy-dev?
Trying to test the resulting numpy gives:
In [1]: import numpy
import core -> failed:
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/core/multiarray.so:
undefined symbol: ?1__serial_memmove
import random -> failed: 'module' object has no attribute 'dtype'
import lib -> failed:
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/core/multiarray.so:
undefined symbol: ?1__serial_memmove
import linalg -> failed:
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/linalg/lapack_lite.so:
undefined symbol: ?1__serial_memmove
import dft -> failed:
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/core/multiarray.so:
undefined symbol: ?1__serial_memmove
---------------------------------------------------------------------------
exceptions.ImportError Traceback (most recent call last)
/work/home/baecker/<ipython console>
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/__init__.py
43
44 test = ScipyTest('numpy').test
---> 45 import add_newdocs
46
47 __doc__ += """
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/add_newdocs.py
----> 2 from lib import add_newdoc
3
4 add_newdoc('numpy.core','dtypedescr',
5 [('fields', "Fields of the data-typedescr if any."),
6 ('alignment', "Needed alignment for this data-type"),
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/lib/__init__.py
3 from numpy.version import version as __version__
4
----> 5 from type_check import *
6 from index_tricks import *
7 from function_base import *
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/lib/type_check.py
6 'common_type']
7
----> 8 import numpy.core.numeric as _nx
9 from numpy.core.numeric import ndarray, asarray, array, isinf, isnan, \
10 isfinite, signbit, ufunc, ScalarType, obj2sctype
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/core/__init__.py
3 from numpy.version import version as __version__
4
----> 5 import multiarray
6 import umath
7 import numerictypes as nt
ImportError:
/home/baecker/python2/scipy_icc5_lintst_n_N0/lib/python2.4/site-packages/numpy/core/multiarray.so:
undefined symbol: ?1__serial_memmove
I already reported this a month ago with a bit more information
on a possible solution
http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2983903
Best, Arnd
|
|
From: Pearu P. <pe...@sc...> - 2006-02-07 11:54:32
|
On Mon, 6 Feb 2006, Andrew Straw wrote:
> I've significantly updated the page at
> http://scipy.org/Wiki/Cookbook/Pyrex_and_NumPy
FYI, numpy.distutils now supports building pyrex extension modules.
See numpy/distutils/tests/pyrex_ext/ for a working example.
For Cookbook/Pyrex_and_NumPy, the corresponding setup.py file is:
#!/usr/bin/env python
def configuration(parent_package='',top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('mypackage',parent_package,top_path)
    config.add_extension('pyrex_and_numpy',
                         sources = ['test.pyx'],
                         depends = ['c_python.pxd','c_numpy.pxd'])
    return config
if __name__ == "__main__":
    from numpy.distutils.core import setup
    setup(**configuration(top_path='').todict())
And to build the package in place, use
python setup.py build_src build_ext --inplace
Pearu
|
|
From: Arnd B. <arn...@we...> - 2006-02-07 11:12:09
|
On Mon, 6 Feb 2006, Jeff Whitaker wrote:
> Andrew Straw wrote:
>
> > Hi Jeff,
> >
> > I've significantly updated the page at
> > http://scipy.org/Wiki/Cookbook/Pyrex_and_NumPy
> >
> > Pyrex should be able to do everything you need.
> >
> > I hope you find the revised page more useful. Please let me know (or
> > fix the page) if you have any issues or questions.
> >
> > Cheers!
> > Andrew
>
> Andrew: Thanks! That looks like exactly what I need. -Jeff
Very nice!
Wouldn't it be a better policy to make any runnable .py file an attachment
to the page (see tst.py in http://scipy.org/Wiki/WikiSandBox), so that it
can be easily downloaded?
Presently one has to disable line numbers, copy the text, paste it into
an editor and save it with the right file name...
Best, Arnd
|
|
From: Francesc A. <fa...@ca...> - 2006-02-07 10:54:49
|
On Tuesday 07 February 2006 01:08, Travis Oliphant wrote:
> In SVN of numpy, the dtype objects now have a .base attribute and a
> .shape attribute.
>
> The .shape attribute returns (1,) or the shape of the sub-array.
Uh, wouldn't it be better to put .shape = 1 in case of a scalar field and
(...) for a non-scalar field? Remember that this is the current convention
for the numpy protocol.
> The .base attribute returns the data-type object of the base-type, or a
> new reference to self, if the object has no base.type.
>
> Thus, in current SVN
>
> dtype['x'].base.name would always give you what you want.
Great. I like it. Thanks!
-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
|
|
From: Andrew J. <a.h...@gm...> - 2006-02-07 09:09:48
|
Also, what is the status of gcc 4.0 support (on Mac OS X at least)?
It's a bit of a pain to have to switch between the two (are there any
other disadvantages?).
Andrew
Travis Oliphant wrote:
>
> We need to test numpy on other compilers besides gcc, so that we can
> ferret out any gnu-isms that we may be relying on.
>
> Anybody out there with compilers they are willing to try out and/or
> report on?
>
> Thanks,
>
> -Travis
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 07:16:48
|
> I'm way out of my depth here, but it really sounds like there needs to
> be one descriptor for each type. Just for example "U" could be 2-byte
> unicode and "V" (assuming it's not taken already) could be 4-byte
> unicode. Then the size for a given descriptor would be constant and
> things would be much less confusing.
In current SVN, numpy assumes 'w' is 2-byte unicode and 'W' is 4-byte
unicode in the array interface typestring. Right now these codes
require that the number of bytes be specified explicitly (to satisfy the
array interface requirement). There is still only 1 Unicode data-type
on the platform and it has the size of Python's Py_UNICODE type. The
character 'U' continues to be useful on data-type construction to stand
for a unicode string of a specific character length. Its internal dtype
representation will use 'w' or 'W' depending on how Python was compiled.
This may not solve all issues, but at least it's a bit more consistent
and solves the problem of
dtype(dtype('U8').str) not producing the same datatype.
It also solves the problem of unicode written out with one compilation
of Python and then read in with another (it won't let you, because
only one of 'w#' or 'W#' is supported on a platform).
-Travis
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 06:40:01
|
We need to test numpy on other compilers besides gcc, so that we can
ferret out any gnu-isms that we may be relying on.
Anybody out there with compilers they are willing to try out and/or
report on?
Thanks,
-Travis
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 06:13:36
|
Tim Hochberg wrote:
> Travis Oliphant wrote:
>
>> Tim Hochberg wrote:
>>
>>>
>>> Just a little update on this:
>>>
>>> It appears that all (or almost all) of the checks in
>>> generate_config_h must be failing. I would guess from a missing
>>> library or some such. I will investigate some more and see what I find.
>>>
>> That shouldn't be a big problem. It just means that NumPy will
>> provide the missing features instead of using the system functions.
>> More problematic are the strange errors you are getting about void *
>> not having a size. The line numbers you show are where we have
>> variable declarations like
>>
>> register intp i
>>
>> Is it possible that integers the size of void * cannot be placed in a
>> register??
>
>
> OK, I think I found what causes the problem. What we have is lines like:
>
> for(i=0; i<n; i++, ip+=skip, op+=oskip) {
>
> where op is declared (void*).
There shouldn't be anything like that. These should all be char *.
Where did you see these?
>
> Of course, unfuncmodule then failed to compile. A quick peek shows
> that it's throwing a lot of syntax errors. It appears to happen
> whenever there's a longdouble function defined. For example:
>
> longdouble sinl(longdouble x) {
> return (longdouble) sin((double)x);
> }
On your platform longdouble should be equivalent to double, so I'm not
sure why this would fail.
-Travis
|
|
From: Travis O. <oli...@ie...> - 2006-02-07 04:03:17
|
Tim Hochberg wrote:
>
> Just a little update on this:
>
> It appears that all (or almost all) of the checks in generate_config_h
> must be failing. I would guess from a missing library or some such. I
> will investigate some more and see what I find.
>
That shouldn't be a big problem. It just means that NumPy will provide
the missing features instead of using the system functions.
More problematic are the strange errors you are getting about void *
not having a size. The line numbers you show are where we have
variable declarations like
register intp i
Is it possible that integers the size of void * cannot be placed in a
register??
-Travis
|
|
From: Travis O. <oli...@ee...> - 2006-02-07 01:27:49
|
Tim Hochberg wrote:
>> Right now, the typestring value gives the number of bytes in the
>> type. Thus, "U4" gives dtype("<U8") on my system where
>> sizeof(Py_UNICODE)==2, but on another system it could give
>> dtype("<U16").
>> I know only a little bit about unicode. The full Unicode character
>> is a 4-byte entity, but there are standard 2-byte (UTF-16) and even
>> 1-byte (UTF-8) encoders.
>>
>> I changed the source so that ("<U8") gets interpreted the same as
>> "U4" (i.e. if you specify an endianness then you are being
>> byte-conscious anyway and so the number is interpreted as a byte,
>> otherwise the number is interpreted as a length). This fixes issues
>> on the same platform, but does not fix issues where data is saved out
>> with one Python interpreter and read in by another with a different
>> value of sizeof(Py_UNICODE).
>
>
> This sounds like a mess. I'm not sure what the level of Unicode
> expertise is on this list (I certainly don't add to it), but I'd be
> tempted to raise this issue on PythonDev and see if anyone there has
> any good suggestions.
>
I'm not a unicode expert, but I have read-up on it so I think I at least
understand the issues involved.
> I'm way out of my depth here, but it really sounds like there needs to
> be one descriptor for each type. Just for example "U" could be 2-byte
> unicode and "V" (assuming it's not taken already) could be 4-byte
> unicode. Then the size for a given descriptor would be constant and
> things would be much less confusing.
>
This is what I'm currently thinking. The question is whether we would have to
define a new basic data-type for 4-byte unicode or would just handle
this on the input. Would we also define a 1-byte unicode data-type, or
just let the user deal with that using standard strings and encoding as
is currently done in Python?
-Travis
|
|
From: Tim H. <tim...@co...> - 2006-02-07 01:13:19
|
Travis Oliphant wrote:
> Francesc Altet wrote:
>
>> Hi,
>>
>> I'm a bit surprised by the fact that unicode types are the only ones
>> that break the rule, in that they must be specified with a different
>> number of bytes than they really take. For example:
>>
>>
>
> Right now, the array protocol typestring is a little ambiguous on
> unicode characters. Ideally, the array interface would describe what
> kind of Unicode characters are being dealt with so that 2-byte and
> 4-byte unicode characters have a different description in the typestring.
>
> Python can be compiled with Unicode as either 2-byte or 4-byte. The
> 'U#' descriptor is supposed to be the Python unicode data-type with #
> representing the number of characters. If this data-type is handed
> off to a Python that is compiled with a different representation for
> Unicode, then we have a problem.
>
> Right now, the typestring value gives the number of bytes in the
> type. Thus, "U4" gives dtype("<U8") on my system where
> sizeof(Py_UNICODE)==2, but on another system it could give dtype("<U16").
> I know only a little bit about unicode. The full Unicode character is
> a 4-byte entity, but there are standard 2-byte (UTF-16) and even
> 1-byte (UTF-8) encoders.
>
> I changed the source so that ("<U8") gets interpreted the same as "U4"
> (i.e. if you specify an endianness then you are being byte-conscious
> anyway and so the number is interpreted as a byte, otherwise the
> number is interpreted as a length). This fixes issues on the same
> platform, but does not fix issues where data is saved out with one
> Python interpreter and read in by another with a different value of
> sizeof(Py_UNICODE).
This sounds like a mess. I'm not sure what the level of Unicode
expertise is on this list (I certainly don't add to it), but I'd be
tempted to raise this issue on PythonDev and see if anyone there has any
good suggestions.
I'm way out of my depth here, but it really sounds like there needs to
be one descriptor for each type. Just for example "U" could be 2-byte
unicode and "V" (assuming it's not taken already) could be 4-byte
unicode. Then the size for a given descriptor would be constant and
things would be much less confusing.
-tim
|
|
From: Travis O. <oli...@ee...> - 2006-02-07 00:08:31
|
Francesc Altet wrote:
>Hi,
>
>I don't particularly like the 'void*' typecasting that the field
>types receive in situations like:
>
>In [143]:dtype = numpy.dtype([
> .....: ('x', '<i4', (2,)),
> .....: ('Info',[
> .....: ('name', '<U120'),
> .....: ('weight', '<f4')])])
>
>In [147]:dtype.fields['x'][0].name
>Out[147]:'void64'
>
>where you can see that we have lost the information about the native type
>of the 'x' field. Rather, I'd expect something like:
>
>
In SVN of numpy, the dtype objects now have a .base attribute and a
.shape attribute.
The .shape attribute returns (1,) or the shape of the sub-array.
The .base attribute returns the data-type object of the base-type, or a
new reference to self, if the object has no base.type.
Thus, in current SVN
dtype['x'].base.name would always give you what you want.
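These attributes survive in current NumPy as `dtype.base`, `dtype.shape` and `dtype.subdtype`. A short sketch, assuming a modern NumPy install (so this reflects today's API rather than the 2006 SVN snapshot); note that a plain scalar field now reports `shape == ()` rather than the `(1,)` mentioned above:

```python
import numpy as np

# Francesc's nested record type, rebuilt with a current NumPy.
dt = np.dtype([
    ('x', '<i4', (2,)),
    ('Info', [('name', '<U120'),
              ('weight', '<f4')]),
])

x_dt = dt.fields['x'][0]          # fields maps name -> (dtype, offset)
print(x_dt.name)                  # still the sub-array (void) wrapper
print(x_dt.base, x_dt.shape)      # int32 (2,)
print(x_dt.subdtype)              # (base dtype, shape) pair

weight_dt = dt['Info']['weight']  # a plain scalar field
print(weight_dt.base, weight_dt.shape)  # float32 ()
```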
-Travis
|
|
From: Travis O. <oli...@ee...> - 2006-02-06 22:15:01
|
Francesc Altet wrote:
>Hi,
>
>I'm a bit surprised by the fact that unicode types are the only ones
>that break the rule, in that they must be specified with a different
>number of bytes than they really take. For example:
>
>
Right now, the array protocol typestring is a little ambiguous on
unicode characters. Ideally, the array interface would describe what
kind of Unicode characters are being dealt with so that 2-byte and
4-byte unicode characters have a different description in the typestring.
Python can be compiled with Unicode as either 2-byte or 4-byte. The
'U#' descriptor is supposed to be the Python unicode data-type with #
representing the number of characters. If this data-type is handed off
to a Python that is compiled with a different representation for
Unicode, then we have a problem.
Right now, the typestring value gives the number of bytes in the type.
Thus, "U4" gives dtype("<U8") on my system where sizeof(Py_UNICODE)==2,
but on another system it could give dtype("<U16").
I know only a little bit about unicode. The full Unicode character is a
4-byte entity, but there are standard 2-byte (UTF-16) and even 1-byte
(UTF-8) encoders.
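To make the 1-, 2- and 4-byte distinction concrete: Python's `len()` counts characters, while the storage size depends on the code-unit width. A small sketch using the standard codecs, with UTF-16 and UTF-32 standing in for the 2- and 4-byte Py_UNICODE layouts (they coincide for text in the Basic Multilingual Plane); `encoded_sizes` is a hypothetical helper:

```python
def encoded_sizes(text):
    """Bytes needed to store `text` with 1-, 2- and 4-byte code units."""
    return {codec: len(text.encode(codec))
            for codec in ("utf-8", "utf-16-le", "utf-32-le")}

word = "numpy"
print(len(word), encoded_sizes(word))
# 5 {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}

# A character outside the BMP needs a surrogate pair in UTF-16,
# so even the characters-per-code-unit ratio stops being fixed.
print(encoded_sizes("\U000110fc"))
# {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```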
I changed the source so that ("<U8") gets interpreted the same as "U4"
(i.e. if you specify an endianness then you are being byte-conscious
anyway and so the number is interpreted as a byte, otherwise the number
is interpreted as a length). This fixes issues on the same platform,
but does not fix issues where data is saved out with one Python
interpreter and read in by another with a different value of
sizeof(Py_UNICODE).
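The rule above, and the cross-interpreter problem it leaves open, can be sketched in a few lines. This is a hypothetical illustration of the described convention, not numpy's parser; it assumes that only an explicit '<' or '>' prefix counts as "specifying an endianness":

```python
import re

def unicode_itemsize(typestr, unicode_size):
    """Item size in bytes for a 'U' typestring under the rule described above.

    With a '<' or '>' prefix the number is taken as a byte count; without
    one it is a character count scaled by sizeof(Py_UNICODE) (unicode_size).
    """
    match = re.fullmatch(r"([<>]?)U(\d+)", typestr)
    if match is None:
        raise ValueError("not a U typestring: %r" % typestr)
    byteorder, n = match.group(1), int(match.group(2))
    return n if byteorder else n * unicode_size

# On a UCS2 build "U4" and "<U8" describe the same 8-byte item ...
print(unicode_itemsize("U4", 2), unicode_itemsize("<U8", 2))   # 8 8
# ... but "<U8" written there holds 4 characters, while a UCS4 reader
# sees only 8 // 4 == 2 characters in the same 8 bytes.
print(unicode_itemsize("<U8", 2) // 2, unicode_itemsize("<U8", 4) // 4)  # 4 2
```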
-Travis
|
|
From: Jeff W. <js...@fa...> - 2006-02-06 21:31:45
|
Andrew Straw wrote:
> Hi Jeff,
>
> I've significantly updated the page at
> http://scipy.org/Wiki/Cookbook/Pyrex_and_NumPy
>
> Pyrex should be able to do everything you need.
>
> I hope you find the revised page more useful. Please let me know (or
> fix the page) if you have any issues or questions.
>
> Cheers!
> Andrew
Andrew: Thanks! That looks like exactly what I need.
-Jeff
-- 
Jeffrey S. Whitaker         Phone  : (303)497-6313
Meteorologist               FAX    : (303)497-6449
NOAA/OAR/PSD  R/PSD1        Email  : Jef...@no...
325 Broadway                Office : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328 Web    : http://tinyurl.com/5telg
|
|
From: Andrew S. <str...@as...> - 2006-02-06 20:32:49
|
Hi Jeff,
I've significantly updated the page at
http://scipy.org/Wiki/Cookbook/Pyrex_and_NumPy
Pyrex should be able to do everything you need.
I hope you find the revised page more useful. Please let me know (or
fix the page) if you have any issues or questions.
Cheers!
Andrew
|
|
From: Travis O. <oli...@ee...> - 2006-02-06 19:20:16
|
Francesc Altet wrote:
>Hi,
>
>I don't particularly like the 'void*' typecasting that the field types
>receive in situations like:
>
>In [143]:dtype = numpy.dtype([
> .....: ('x', '<i4', (2,)),
> .....: ('Info',[
> .....: ('name', '<U120'),
> .....: ('weight', '<f4')])])
>
>In [147]:dtype.fields['x'][0].name
>Out[147]:'void64'
>
>where you can see that we have lost the information about the native type
>of the 'x' field. Rather, I'd expect something like:
>
>
Well, it's actually there. Look at
dtype.fields['x'][0].subdtype[0]
dtype.fields['x'][0].subdtype[1]
The issue is that the base data-type of the 'x' field is void-64 (that's
the dtype object the array "sees").
-Travis
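A small check of Travis's point, run against current NumPy (the attribute names are the ones NumPy exposes today; whether the 2006 code printed exactly the same thing is not verified here): the field's dtype is a void sub-array type, and `.subdtype` carries the element type and shape.

```python
import numpy as np

dt = np.dtype([
    ('x', '<i4', (2,)),
    ('Info', [('name', '<U120'), ('weight', '<f4')]),
])

# dt.fields maps each name to (dtype, offset); take the dtype of 'x'.
x_dtype = dt.fields['x'][0]
print(x_dtype.name)              # 'void64': 2 x 4 bytes seen as raw void

# .subdtype is (element dtype, sub-array shape) for sub-array dtypes.
base, shape = x_dtype.subdtype
print(base, shape)               # int32 (2,)
```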
|
|
From: Travis O. <oli...@ee...> - 2006-02-06 19:16:03
|
Francesc Altet wrote:
>Hi,
>
>I'm a bit surprised that unicode types are the only ones that
>break the rule: they are specified with a number that differs from the
>number of bytes they really take. For example:
>
>
Yeah, it's a bit annoying. There are special checks throughout the code for this.
The problem, though, is that sizeof(Py_UNICODE) can be 4 or 2 depending on how
Python was compiled. Also, Python treats unicode and string characters as having
the same length (even though internally a different number of bytes is required).
So, I'm not sure exactly what to do, short of introducing a new code for
"Unicode with specific number of bytes." I think the inconsistency should be
removed, though. I'm just not sure how to do it.
-Travis
|
|
From: Francesc A. <fa...@ca...> - 2006-02-06 19:05:36
|
Hi,
I don't particularly like the 'void*' typecasting that the field types
receive in situations like:
In [143]:dtype = numpy.dtype([
.....: ('x', '<i4', (2,)),
.....: ('Info',[
.....: ('name', '<U120'),
.....: ('weight', '<f4')])])
In [147]:dtype.fields['x'][0].name
Out[147]:'void64'
where you can see that we have lost the information about the native type
of the 'x' field. Rather, I'd expect something like:
In [147]:dtype.fields['x'][0].name
Out[147]:'int32'
although for this to be complete one should add a 'shape' attribute to
dtype:
In [147]:dtype.fields['x'][0].shape
Out[147]:(2,)
In fact, this information is already contained in dtype container:
In [148]:dtype.fields['x'][0]
Out[148]:dtype(('<i4',(2,)))
but this is clearly not very consistent with the current mapping of the
dtype.fields['x'] attributes.
I strongly feel that this suggested approach makes more sense than the
current one. Thoughts?
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
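For reference, current NumPy ended up close to this proposal, though not identical to it (a check against today's NumPy, not part of this thread): the field's dtype is still named 'void64', but every dtype now has `.base` and `.shape` attributes, which give exactly the 'int32' and `(2,)` information asked for here.

```python
import numpy as np

dt = np.dtype([('x', '<i4', (2,)),
               ('Info', [('name', '<U120'), ('weight', '<f4')])])

x_dtype = dt.fields['x'][0]
# .base is the element dtype, .shape the sub-array shape.
print(x_dtype.base.name)         # 'int32'
print(x_dtype.shape)             # (2,)

# For an ordinary (non sub-array) dtype, base is itself and shape is ().
f4 = np.dtype('<f4')
print(f4.base == f4, f4.shape)   # True ()
```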
|
|
From: Francesc A. <fa...@ca...> - 2006-02-06 18:52:43
|
Hi,
I've implemented a simple mapping protocol in the descriptor type so
that the user would be able to do:
In [138]:dtype = numpy.dtype([
.....: ('x', '<i4', (2,)),
.....: ('Info',[
.....: ('name', '<U120'),
.....: ('weight', '<f4')])])
In [139]:dtype['Info'].name
Out[139]:'void3872'
In [140]:dtype['Info']['name'].type
Out[140]:<type 'unicodescalar'>
instead of the current:
In [141]:dtype.fields['Info'][0].name
Out[141]:'void3872'
In [142]:dtype.fields['Info'][0].fields['name'][0].type
Out[142]:<type 'unicodescalar'>
which I find cumbersome to type. Find the patch for this in the
attachments.
OTOH, I've completed the tests for heterogeneous objects in
test_numerictypes.py. Now, there is a better check for both flat and
nested fields, as well as explicit checking of type descriptors
(including tests for the new mapping interface in descriptors). So far,
no more problems have been detected by the new tests :-). Please, note
that you will need the patch above applied in order to run the tests.
Travis, if you think that it would be better not to apply the patch,
the tests can be easily adapted by changing lines like:
self.assert_(h.dtype['x'][0].name[:4] == 'void')
to
self.assert_(h.dtype.fields['x'][0].name[:4] == 'void')
Cheers,
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
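Current NumPy does support this kind of field lookup directly on a dtype. The sketch below runs against today's NumPy, not against Francesc's patch, so treat it as an illustration of the idea rather than of his exact implementation. With 4 bytes per unicode character the 'Info' record is 120*4 + 4 = 484 bytes, which matches the 'void3872' (484*8 bits) shown above.

```python
import numpy as np

dt = np.dtype([('x', '<i4', (2,)),
               ('Info', [('name', '<U120'), ('weight', '<f4')])])

# Index a structured dtype by field name to get that field's dtype.
info = dt['Info']
print(info.name)                 # 'void3872'
print(info.itemsize)             # 484
print(info['name'].type)         # <class 'numpy.str_'>

# The long-hand route through .fields reaches the same dtype.
print(dt.fields['Info'][0].fields['name'][0] == info['name'])  # True
```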
|
|
From: Francesc A. <fa...@ca...> - 2006-02-06 18:24:36
|
Hi,
I'm a bit surprised that unicode types are the only ones that
break the rule: they are specified with a number that differs from the
number of bytes they really take. For example:
In [120]:numpy.dtype([('x','c16')])
Out[120]:dtype([('x', '<c16')])
In [121]:numpy.dtype([('x','S16')])
Out[121]:dtype([('x', '|S16')])
but:
In [119]:numpy.dtype([('x','U4')])
Out[119]:dtype([('x', '<U16')])
Even worse:
In [126]:numpy.dtype(numpy.dtype('u4').str)
Out[126]:dtype('<u4')
but:
In [125]:numpy.dtype(numpy.dtype('U4').str)
Out[125]:dtype('<U64') # !!!!
which can quickly lead to problems in users' code.
I think that, for the sake of consistency, and exactly as the user must
know that a c16 is a complex taking 16 octets, he must know that a
unicode character takes 4 bytes. With this, we should have:
In [119]:numpy.dtype([('x','U4')])
Out[119]:dtype([('x', '<U4')])
and unicode lengths that are not multiples of 4 should be forbidden. I know
that, initially, it would be a bit strange for the user to specify 'S4'
for a string with 4 chars and 'U16' for a unicode string of 4 chars as
well, but hopefully he would soon get used to this.
The only problem I see with what I'm proposing is that I don't
know whether unicode would always take 4 bytes on all platforms
(--> 64-bit issues?). OTOH, I thought that Python represented
unicode strings internally with 16-bit chars. Oh well, I'm a bit lost on
this. Can anybody shed some light?
Cheers,
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
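For comparison, here are the same cases run against current NumPy (not the 2006 code). NumPy went the other way from this proposal: 'U' kept meaning a character count, but the storage was fixed at 4 bytes per character, so the inconsistency Francesc reports (a dtype string that does not survive a round trip through `.str`) no longer occurs.

```python
import numpy as np

# Structured field with a 4-character unicode string.
field_dt = np.dtype([('x', 'U4')])
print(field_dt.descr)       # [('x', '<U4')] on a little-endian machine
print(field_dt.itemsize)    # 16

# Each type string now round-trips through its .str representation.
for code in ['u4', 'c16', 'S16', 'U4']:
    dt = np.dtype(code)
    print(code, dt.str, np.dtype(dt.str) == dt)
```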
|
|
From: Nico <nic...@li...> - 2006-02-06 15:13:51
|
Hi.
I'm a new user of the numpy-discussion and scipy-user mailing lists. So, as I usually do, here are a few words about me and my use of numpy/scipy.
I am a doctoral student in Paris; I will work on numerical analysis, mesh generation and image processing, and I intend to do the prototyping (and maybe everything) of my work with Python.
I recently chose Python because:
- it is a flexible and rich language for array manipulation
- it seems a good language to help me write clean, clear, bug-free and reusable code
- it seems possible to make a GUI frontend without too much pain
- it seems OK to glue with various other C/Fortran applications without too much pain
- it is free, as in free beer (I had to work on Matlab previously, and I don't like to force people to pay for an expensive licence if they are interested in my work)
- it is free, as in free speech (... I also had serious problems needing compatibility of Matlab with a Linux kernel that was not officially supported)
I use numpy/scipy on Debian/Ubuntu, building from the release tarballs. And I am currently reading the available documentation...
Last thing: what about a #scipy IRC channel? I feel there are too many people on irc.freenode.org/#python for efficient use.
Happy coding!
--
Nico
|
|
From: Jeff W. <js...@fa...> - 2006-02-06 13:00:55
|
Travis Oliphant wrote:
> Jeff Whitaker wrote:
>
>>
>> Hi: I've successfully used the examples at
>> http://www.scipy.org/Wiki/Cookbook/Pyrex_and_NumPy to access the data
>> in a 'normal' numpy array, but have had no success adapting these
>> examples to work with object arrays. I understand that the .data
>> attribute holds pointers to the objects which actually contain the
>> data in an object array, but how do you use those pointers to get the
>> data in C/pyrex?
>
> You have a pointer to a PyObject *object in the data. Thus, data
> should be recast to PyObject **. I don't know how to do that in PyRex.
Travis: Apparently not. If I try to do this pyrex says
115:25: Pointer base type cannot be a Python object
> But, it's easy in C.
> In C, you will need to be concerned about reference counts.
OK, I was hoping to avoid hand-coding an extension in C (which I'm woefully unqualified to do).
-Jeff
--
Jeffrey S. Whitaker          Phone  : (303)497-6313
Meteorologist                FAX    : (303)497-6449
NOAA/OAR/PSD R/PSD1          Email  : Jef...@no...
325 Broadway                 Office : Skaggs Research Cntr 1D-124
Boulder, CO, USA 80303-3328  Web    : http://tinyurl.com/5telg
|
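To make Travis's description concrete without Pyrex or a hand-written C extension, here is a sketch in plain Python using `ctypes`. The choice of `ctypes` is an illustration of my own, not something proposed in the thread. It reinterprets an object array's data buffer as a C array of `PyObject*`, which is the Python-level analogue of casting the data pointer to `PyObject **`. The `ctypes.py_object` getter increments each object's reference count when it hands it back, so no manual refcounting is needed here, unlike the C route Travis mentions.

```python
import ctypes
import numpy as np

# An object array stores one PyObject* per element.
arr = np.empty(3, dtype=object)
arr[0] = 1.5
arr[1] = "two"
arr[2] = {"k": 3}

# Each element occupies exactly one pointer's worth of bytes.
assert arr.itemsize == ctypes.sizeof(ctypes.c_void_p)

# View the (C-contiguous) data buffer as PyObject*[arr.size].
# 'arr' must stay alive while 'view' is in use.
PyObjectArray = ctypes.py_object * arr.size
view = PyObjectArray.from_address(arr.ctypes.data)

items = [view[i] for i in range(arr.size)]
print(items)   # [1.5, 'two', {'k': 3}]
```

Each entry in `items` is the very object the array holds, not a copy; that is why, in C, borrowing these pointers requires attention to reference counts.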