From: <kr...@po...> - 2002-02-18 20:07:26
(I thought I had sent this mail on January 30, but I guess I was mistaken.)

Eric Nodwell writes:
> Since I have a 2.4GB data file handy, I thought I'd try this package
> with it. (Normally I process this data file by reading it in a chunk
> at a time, which is perfectly adequate.) Not surprisingly, it chokes:

Yep, that's pretty much what I expected. I think that adding code to
support mapping some arbitrary part of a file should be fairly
straightforward --- do you want to run the tests if I write the code?

>   File "/home/eric/lib/python2.2/site-packages/maparray.py", line 15, in maparray
>     m = mmap.mmap(fn, os.fstat(fn)[stat.ST_SIZE])
> OverflowError: memory mapped size is too large (limited by C int)

The wording of this error message led me to something that was *not*
what I expected. That's a sort of alarming message --- it suggests that
it won't work on >2G files even on LP64 systems, where longs and
pointers are 64 bits but ints are 32 bits. The comments in the mmap
module say:

    The map size is restricted to [0, INT_MAX] because this is the
    current Python limitation on object sizes. Although the mmap
    object *could* handle a larger map size, there is no point because
    all the useful operations (len(), slicing(), sequence indexing)
    are limited by a C int.

Horrifyingly, this is true. Even the buffer interface functions
arrayfrombuffer uses to get the size of the buffer return int sizes,
not size_t sizes. This is a serious bug in the buffer interface, IMO,
and I doubt it will be fixed --- the buffer interface is apparently
due for a revamp soon at any rate, so little changes won't be
welcomed, especially if they break binary backwards compatibility, as
this one would on LP64 platforms.

Fixing this, so that LP64 Pythons can mmap >2G files (their
birthright!), is a bit of work --- probably a matter of writing a
modified mmap module that supports a saner version of the buffer
interface (with named methods instead of a type-object slot), and
can't be close()d, to boot. Until then, this module only lets you
memory-map files up to two gigs.
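Here is a minimal sketch of the windowed mapping, assuming a
hypothetical mmap.mmap() that accepts an offset argument (2.2's does
not, so the real version would need C-level support) and an
ALLOCATIONGRANULARITY constant giving the required alignment;
map_window is a made-up name:

    import mmap, os

    def map_window(f, offset, length):
        # Map bytes [offset, offset+length) of the open file f.  mmap
        # offsets must be multiples of the allocation granularity, so
        # round down and remember how far into the map our data start.
        gran = mmap.ALLOCATIONGRANULARITY
        aligned = (offset // gran) * gran
        delta = offset - aligned
        size = os.fstat(f.fileno()).st_size
        length = min(length, size - offset)   # don't map past EOF
        m = mmap.mmap(f.fileno(), delta + length,
                      access=mmap.ACCESS_READ, offset=aligned)
        return m, delta   # the window's data is m[delta:delta+length]

Each window stays under INT_MAX, so even a 32-bit Python could walk a
2.4GB file a few hundred megabytes at a time.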
> (details: Python 2.2, numpy 20.3, Pentium III, Debian Woody, Linux
> kernel 2.4.13, gcc 2.95.4)

My kernel is 2.4.13 too, but I don't have any large files, and I don't
know whether any of my kernel, my libc, or my Python even support them.

> I'm not a big C programmer, but I wonder if there is some way for
> this package to overcome the 2GB limit on 32-bit systems. That could
> be useful in some situations.

I don't know, but I think it would probably require extensive code
changes throughout Numpy.

-- 
<kr...@po...>  Kragen Sitaker  <http://www.pobox.com/~kragen/>
The sages do not believe that making no mistakes is a blessing. They
believe, rather, that the great virtue of man lies in his ability to
correct his mistakes and continually make a new man of himself.
       -- Wang Yang-Ming