From: Mike R. <mik...@gm...> - 2006-07-25 00:36:55
|
I'm trying to work with memmaps on very large files, i.e. > 2 GB, up to
10 GB. The files are data cubes of images (my largest is
1290(x)x1024(y)x2011(z)) and my immediate task is to strip the data from
32 bits down to 16, and to rearrange some of the data on a per-xy-plane
basis. I'm running this on a Fedora Core 5 64-bit system, with
python-2.5b2 (which I believe I compiled in 64-bit mode) and numpy-1.0b1.
The disk has 324 GB free space.

The log from a minimal case is as follows:

    ressler > python2.5
    Python 2.5b2 (r25b2:50512, Jul 18 2006, 12:58:29)
    [GCC 4.1.1 20060525 (Red Hat 4.1.1-1)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np
    >>> data=np.memmap('temp_file',mode='w+',shape=(2011,1280,1032),dtype='h')
    size = 2656450560
    bytes = 5312901120
    len(mm) = 5312901120
    (2011, 1280, 1032) h 0 0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.5/site-packages/numpy/core/memmap.py", line 75, in __new__
        offset=offset, order=order)
    TypeError: buffer is too small for requested array
    >>>

If I have a smaller number of frames (z=800 rather than 2011), this all
works fine. I've added a few lines to memmap.py to print some diagnostic
information - the error occurs on line 71 in the original memmap.py file,
not 75. The "size =" and "bytes =" lines show that memmap.py is
calculating the correct size for the buffer, and "len(mm)" shows that the
python mmap.mmap call on line 67 is returning a buffer of the correct
size. The "(2011, 1280, 1032) h 0 0" bit is from a print statement that
was left in the source file by the authors, and indicates what the
following "self = ndarray.__new__" call is trying to do. However, it is
the ndarray.__new__ call that is breaking down, and I don't really have
enough skill to continue chasing it down. I took a quick look at the C
source, but I couldn't figure out where ndarray.__new__ is actually
defined.

Any suggestions to help me past this?
Thanks.

Mike

--
mik...@al...
|
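[Editor's note: the per-plane 32-to-16-bit conversion Mike describes can be sketched with memmaps as below. The file names and the cube shape here are made up for illustration — his real cubes run up to (2011, 1280, 1032), which is exactly what triggered the bug — and the tiny shape keeps the sketch runnable anywhere.]

```python
import numpy as np

# Illustrative shape only; real cubes in this thread are multi-GB.
nz, ny, nx = 4, 8, 8

# Write a fake 32-bit data cube to disk.
src = np.memmap('cube32.raw', mode='w+', shape=(nz, ny, nx), dtype='<i4')
src[:] = np.arange(nz * ny * nx, dtype='<i4').reshape(nz, ny, nx)
src.flush()

# Destination: the same cube stripped down to 16 bits.
dst = np.memmap('cube16.raw', mode='w+', shape=(nz, ny, nx), dtype='<i2')

# Process one xy-plane at a time, so only a single plane (not the
# whole cube) ever needs to be resident in memory.
for z in range(nz):
    dst[z] = src[z].astype('<i2')   # keep the low 16 bits
dst.flush()
```

Because both arrays are memmapped, the operating system pages data in and out as needed, which is what makes a 10 GB input tractable on an ordinary machine.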
From: Travis O. <oli...@ie...> - 2006-07-25 06:47:11
|
Mike Ressler wrote:
> I'm trying to work with memmaps on very large files, i.e. > 2 GB, up
> to 10 GB. The files are data cubes of images (my largest is
> 1290(x)x1024(y)x2011(z)) and my immediate task is to strip the data
> from 32 bits down to 16, and to rearrange some of the data on a
> per-xy-plane basis. I'm running this on a Fedora Core 5 64-bit system,
> with python-2.5b2 (that I believe I compiled in 64-bit mode) and
> numpy-1.0b1. The disk has 324 GB free space.

I just discovered the problem. All the places where
PyObject_As<Read/Write>Buffer is used need to have the final argument
changed to Py_ssize_t (which in arrayobject.h is defined as int if you
are using less than Python 2.5).

This should be fixed in SVN shortly....

-Travis
|
From: Karol L. <kar...@kn...> - 2006-07-25 06:59:41
|
On Tuesday 25 July 2006 02:36, Mike Ressler wrote:
> I'm trying to work with memmaps on very large files, i.e. > 2 GB, up to
> 10 GB. The files are data cubes of images (my largest is
> 1290(x)x1024(y)x2011(z)) and my immediate task is to strip the data
> from 32-bits down to 16, and to rearrange some of the data on a
> per-xy-plane basis.
> [...]
> TypeError: buffer is too small for requested array
> [...]
> However, it is the ndarray.__new__ call that is breaking down, and I
> don't really have enough skill to continue chasing it down. I took a
> quick look at the C source, but I couldn't figure out where
> ndarray.__new__ is actually defined.
>
> Any suggestions to help me past this? Thanks.
>
> Mike

I know Travis has answered in a different thread. Let me just add where
the actual error is raised - maybe it will be of some use. It is around
line 5490 of arrayobject.c (procedure array_new):

    else {
        /* buffer given -- use it */
        if (dims.len == 1 && dims.ptr[0] == -1) {
            dims.ptr[0] = (buffer.len-(intp)offset) / itemsize;
        }
        else if ((strides.ptr == NULL) && \
                 buffer.len < itemsize* \
                 PyArray_MultiplyList(dims.ptr, dims.len)) {
            PyErr_SetString(PyExc_TypeError,
                            "buffer is too small for " \
                            "requested array");
            goto fail;
        }

So it does look like an overflow to me.

Karol

--
written by Karol Langner
wto lip 25 08:56:42 CEST 2006
|
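[Editor's note: Karol's overflow reading can be illustrated in a few lines. If the 64-bit byte count from Mike's log is squeezed through a 32-bit signed C int — which is what the pre-Py_ssize_t buffer interface effectively did — it truncates to a much smaller value, so the "buffer is too small" comparison fires even though the mmap itself is fine. This is a back-of-the-envelope sketch using ctypes, not the actual C code path.]

```python
import ctypes

# Byte counts from Mike's log: 2011 * 1280 * 1032 two-byte elements.
needed = 2011 * 1280 * 1032 * 2      # 5312901120 bytes

# What that count becomes when truncated to a 32-bit signed C int:
truncated = ctypes.c_int32(needed).value

print(truncated)                     # 1017933824
print(truncated < needed)            # True: hence "buffer is too small"
```

1017933824 is 5312901120 modulo 2**32, which happens to stay positive but is far smaller than the required size, matching the error Mike saw.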
From: Mike R. <mik...@gm...> - 2006-07-26 15:52:18
|
My apologies if this is a duplicate - my first attempt doesn't seem to
have gone back to the list.

---------- Forwarded message ----------
From: Mike Ressler <mik...@al...>
Date: Jul 25, 2006 12:17 PM
Subject: Re: ***[Possible UCE]*** [Numpy-discussion] Bug in memmap/python allocation code?
To: Travis Oliphant <oli...@ie...>
Cc: Num...@li...

On 7/24/06, Travis Oliphant <oli...@ie...> wrote:
> Mike Ressler wrote:
> > I'm trying to work with memmaps on very large files, i.e. > 2 GB, up
> > to 10 GB.

Can't believe I'm really the first, but so be it.

> I just discovered the problem. All the places where
> PyObject_As<Read/Write>Buffer is used needs to have the final argument
> changed to Py_ssize_t (which in arrayobject.h is defined as int if you
> are using less than Python 2.5).
>
> This should be fixed in SVN shortly....

Yeess! My little script can handle everything I've thrown at it now. It
can read a 10 GB raw file, strip the top 16 bits, rearrange pixels, byte
swap, and write it all back to a 5 GB file in 16 minutes flat. Not bad
at all. And I've verified that the output is correct ...

If someone can explain the rules of engagement for Lightning Talks, I'm
thinking about presenting this at SciPy 2006. Then you'll see there is a
reason for my madness.

As an aside, the developer pages could use some polish on explaining the
different svn areas, and how to get what one wants. An svn checkout as
described on the page gets you the 1.1 branch that DOES NOT have the
updated memmap fix. After a minute or two of exploring, I found that
"svn co http://svn.scipy.org/svn/numpy/branches/ver1.0/numpy numpy" got
me what I wanted.

Thanks for your help and the quick solution. FWIW, I got my copy of the
book a couple of weeks ago; very nice.

Mike

--
mik...@al...
|
From: Robert K. <rob...@gm...> - 2006-07-27 14:08:40
|
Mike Ressler wrote:
> My apologies if this is a duplicate - my first attempt doesn't seem to
> have gone back to the list.

SF is being nasty with GMail. I'll have to speed up moving the list to
scipy.org.

> If someone can explain the rules of engagement for Lightning Talks,
> I'm thinking about presenting this at SciPy 2006. Then you'll see
> there is a reason for my madness.

Unfortunately, we have only scheduled 30 minutes of lightning talks this
year. We have twice as many full talks as we did last year. We'll
probably only get about 5 or 6 lightning talks clocking in at 5 minutes,
tops. In the opening remarks on the first day, we'll tell people to come
talk to us (and by "us," I mean "Travis Vaught") during the break and
tell us that they want to do a lightning talk about "Foo."

> As an aside, the developer pages could use some polish on explaining
> the different svn areas, and how to get what one wants. An svn
> checkout as described on the page gets you the 1.1 branch that DOES
> NOT have the updated memmap fix. After a minute or two of exploring, I
> found that "svn co http://svn.scipy.org/svn/numpy/branches/ver1.0/numpy
> numpy" got me what I wanted.

Grr. That means developers are not merging changes appropriately.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
|