From: Andrew P. M. <amu...@ze...> - 2000-02-09 07:19:13
> Travis Oliphant writes:
> >
> > 3) Facility for memory-mapped dataspace in arrays.
>
> For the NumPy users who are as ignorant about mmap, msync,
> and madvise as I am, I've put a couple of documents on
> my web site:

I have Kevin's "Why Aren't You Using mmap() Yet?" on my site. Kevin is
working on a new edition (11th anniversary edition? 1xth anniversary
edition?).

By the way, Uresh Vahalia's book on Unix Internals is a very good idea
for anyone not yet familiar with modern operating systems, especially
Unices.

Kevin is extremely knowledgeable on this subject, and several others.

> Executive summary:
>
> i) mmap on Solaris can be a very big win

Orders of magnitude.

> (see bottom of
> http://www.geog.ubc.ca/~phil/mmap/msg00003.html) when
> used in combination with WILLNEED/WONTNEED madvise calls to
> guide the page prefetching.

And with the newer versions of Solaris, madvise() is a good way to go.
madvise is _not_ SVR4 (not in SVID3) but it _is_ in the OSF/1 AES, which
means it is _not_ vendor specific. But the standard part of madvise is
that it is a "hint". Everything it actually _does_ when you hint the
kernel with madvise is usually specific to some versions of an
operating system.

There are tricks to get around madvise not doing everything you want.
(WONTNEED didn't work in Solaris for a long time. Kevin found a trick
that worked really well instead. Kevin knows people at Sun, since he was
one of the very earliest employees there, and the trick Kevin used to
suggest has since turned out to be the implementation of WONTNEED in
Solaris.)

And that trick is well worth understanding. It happens that msync() is a
good call to know. It has an undocumented behavior on Solaris: when you
msync a memory region with MS_INVALIDATE | MS_ASYNC, the dirty pages are
queued for writing and backing store is available immediately, or if
dirty, as soon as written out. This means that the pager doesn't have to
run at all to scavenge the pages.
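
For readers who want to see the hinting pattern concretely, here is a minimal Python sketch of a chunked read that prefetches with a WILLNEED hint and releases with what the thread calls WONTNEED (the constant is spelled `MADV_DONTNEED`). It assumes Python 3.8 or later, where `mmap` objects gained a `madvise` method on platforms that have the call. The helper names `advise` and `sum_bytes_with_hints` are made up for illustration, and the sketch does not attempt the MS_INVALIDATE | MS_ASYNC msync trick, which the standard `mmap` module does not expose.

```python
import mmap
import os


def advise(mapping, hint_name, start, length):
    """Best-effort madvise: do nothing where the hint is unavailable."""
    hint = getattr(mmap, hint_name, None)
    if hint is None or not hasattr(mapping, "madvise"):
        return
    try:
        mapping.madvise(hint, start, length)
    except OSError:
        pass  # it is only a hint; the kernel may refuse it


def sum_bytes_with_hints(path, chunk=1 << 20):
    """Sum every byte of a file through a read-only mapping.

    `chunk` should be a multiple of mmap.ALLOCATIONGRANULARITY so that
    each madvise range starts on a page boundary.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mapped
    total = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        advise(m, "MADV_WILLNEED", 0, min(chunk, size))
        for start in range(0, size, chunk):
            nxt = start + chunk
            if nxt < size:
                # Ask for the next chunk before working on this one.
                advise(m, "MADV_WILLNEED", nxt, min(chunk, size - nxt))
            total += sum(m[start:nxt])
            # Tell the kernel this chunk will not be touched again.
            advise(m, "MADV_DONTNEED", start, min(chunk, size - start))
    return total
```

Whether either hint changes paging behavior at all depends on the operating system and its version, so any speedup has to be measured per platform.
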
Linux didn't do this last time I looked. I suggested it to the kernel
guys and the idea got some positive response, but I don't know if they
did it.

> ii) IRIX and some other Unices (Linux 2.2 in particular), haven't
> implemented madvise, and naive use of mmap without madvise can produce
> lots of page faulting and much slower io than, say, asynchronous io
> calls on IRIX. (http://www.geog.ubc.ca/~phil/mmap/msg00009.html)

IRIX has an awful implementation of mmap. And SGI people go around
badmouthing mmap; not that they don't have cause, but they are usually
very surprised to see how big the win is with a good implementation. Of
course, the msync() trick doesn't work on IRIX last I looked, which
leads to the SGI people believing that mmap() is brain damaged because
it runs the pager into the ground. It's a point of view that is bound to
come up.

HP/UX was really whacked last time I looked. They had a version (10)
which supported the full mmap() on one series of workstations (700,
7000, I forget, let's say 7e+?) and didn't support it except in the
non-useful SVR3.2 way on another series of workstations (8e+?). The
reason was that the 8e+? workstations were multiprocessor and they
hadn't figured out how to get the newer kernel flying on the
multiprocessors. I know Konrad had HP systems at one point, maybe he has
the scoop on those.

> So I'd love to see mmap in Numpy, but we may need to produce a
> tutorial outlining the tradeoffs, and giving some examples of
> madvise/msync/mmap used together (with a few benchmarks). Any mmap
> module would need to include member functions that call madvise/msync
> for the mmapped array (but these may be no-ops on several popular OSes.)

I don't know if you want a separate module; maybe what you want is the
normal allocation of memory for all Numerical Python objects to be
handled in a way that makes sense for each operating system.
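
One way to picture the "member functions that may be no-ops" idea from the quoted text is a thin wrapper like the Python sketch below. It is an illustration only: the class name `MappedBuffer` and its methods `advise`, `sync`, and `close` are hypothetical, not an existing NumPy or standard-library interface, and it assumes Python 3.8 or later for `mmap.madvise`.

```python
import mmap


class MappedBuffer:
    """Hypothetical wrapper around a writable, file-backed mapping."""

    def __init__(self, fileobj, length):
        # Grow (or shrink) the backing file so the mapping fits exactly.
        fileobj.truncate(length)
        self.data = mmap.mmap(fileobj.fileno(), length)

    def advise(self, hint_name, start=0, length=None):
        """Pass a hint such as "MADV_WILLNEED"; return False if it was a no-op."""
        hint = getattr(mmap, hint_name, None)
        if hint is None or not hasattr(self.data, "madvise"):
            return False  # this platform or build has no such hint
        if length is None:
            length = len(self.data) - start
        try:
            self.data.madvise(hint, start, length)
        except OSError:
            return False  # the kernel declined the hint
        return True

    def sync(self):
        """Write modified pages back to the file (msync where available)."""
        self.data.flush()

    def close(self):
        self.data.close()
```

Callers written against such a wrapper stay the same on every platform, and only the wrapper needs to know which hints the local kernel honors.
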
The approach I took when I was writing portable code for this sort of
thing was to write a wrapper for the memory operation semantics and then
implement the operations as a small library that would be OS specific,
although not _that_ specific. It was possible to write single source
code for SVID3 and OSF/1 AES systems with sparing use of conditional
defines.

Unfortunately, that code is the intellectual property of another firm,
or else I'd donate it as an example for people who want to learn stuff
about mmap. As it stands, there was some similar code I was able to
produce at some point. I forget who here has a copy, maybe Konrad, maybe
David Ascher.

Later,
Andrew Mullhaupt