From: Perry E. Metzger <perry@pi...> - 2004-03-27 18:06:28
I have been working a bit on finishing porting SBCL to NetBSD now that
NetBSD has working siginfo support.
In doing so, I noted the mmap "trick" to zero large chunks of memory
-- i.e. unmapping and remapping chunks of memory.
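Roughly, the pattern looks like this (my sketch, not SBCL's actual
code; the function name and error handling are illustrative):

    /* Throw the old pages away and let the kernel hand back fresh
     * zero-filled ones; anonymous mappings are zero-filled by
     * definition. */
    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void
    zero_region_by_remap(void *start, size_t length)
    {
        /* Discard the existing mapping entirely... */
        if (munmap(start, length) == -1) {
            perror("munmap");
            exit(1);
        }
        /* ...then map fresh anonymous pages at the same address;
         * MAP_FIXED insists on the original location. */
        if (mmap(start, length, PROT_READ | PROT_WRITE,
                 MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0)
            == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
    }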
Let me note that this is NOT actually a good idea. Here is why.
It is true that this produces zeroed pages for you -- but the kernel
has no magic way to zero pages, any more than userland does. To
deliver zeroed pages, the kernel has to bzero/memset them using the
exact same code that is used in userland on most platforms.
On the minus side, playing this trick requires that the system go
through multiple user/kernel context switches, which are very
expensive, and on platforms that do lazy zeroing, it will result in
multiple page faults, which are also expensive.
As a result, I am not sure there will be a significant efficiency gain
from it on any system I'm aware of. The one possible exception is if
your machine does something like zeroing pages in the idle loop -- to
my knowledge, only NetBSD has ever done this, and we turned it off
because, for various reasons, it turned out to be a net loss for the
system as a whole.
Anyway, are there benchmarks demonstrating this is a real win on real
platforms? If not, I would suggest using libc's memset, which gcc will
inline on most modern platforms; that would be a likely substantial
win, and much simpler from a logic point of view, too.
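Something as simple as this sketch would do:

    /* Zero the region in userland: one libc call, no kernel round
     * trips, no deferred page faults. */
    #include <string.h>

    static void
    zero_region_by_memset(void *start, size_t length)
    {
        memset(start, 0, length);
    }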
Perry E. Metzger perry@...
From: Perry E. Metzger <perry@pi...> - 2004-03-27 20:59:11
Daniel Barlow <dan@...> writes:
>> In doing so, I noted the mmap "trick" to zero large chunks of memory
>> -- i.e. unmapping and remapping chunks of memory.
> I played with this a little when I was last hacking the gc, but didn't
> manage to create a test case that showed anything interesting either
> way (on x86 linux).
That is what I would expect if the region were sufficiently large. The
smaller the region, the fewer pages there are to amortize the cost of
the kernel switch + VM magic over, so I'd expect it to get worse as
the region gets smaller and smaller. I find it hard to
believe it would ever actually be better than memset/bzero...
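If anyone wants to measure it, a rough harness along these lines
would do (my sketch; REGION_SIZE and ROUNDS are arbitrary). Both
loops touch every page first, so the remap loop also pays for the
page faults its previous round deferred:

    /* Time ROUNDS rounds of zeroing a region by memset vs. by
     * munmap/mmap.  Entirely illustrative. */
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    #define REGION_SIZE (16UL * 1024 * 1024)   /* 16MB */
    #define ROUNDS      100

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    /* Write one byte per page so every page is really mapped. */
    static void touch(char *p)
    {
        size_t i;
        for (i = 0; i < REGION_SIZE; i += 4096)
            p[i] = 1;
    }

    int main(void)
    {
        double t;
        int i;
        char *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        t = now();
        for (i = 0; i < ROUNDS; i++) {
            touch(p);
            memset(p, 0, REGION_SIZE);
        }
        printf("memset: %.3fs\n", now() - t);

        t = now();
        for (i = 0; i < ROUNDS; i++) {
            touch(p);
            munmap(p, REGION_SIZE);
            p = mmap(p, REGION_SIZE, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }
        }
        printf("remap:  %.3fs\n", now() - t);
        return 0;
    }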
> I'd expect (theorising wildly) that the kernel could potentially map
> all the zeroed pages to the same 4k (or whatever) block of memory, and
> would only need to zero more pages when something later wrote into a
> zeroed page - so the cost of clearing memory is amortized over a
> longer period, and may even be skipped altogether if the memory is not
> reused. I have no idea whether this is commonly implemented or makes
> any significant difference if so, though.
That's not how any OS I know of is implemented. You're describing a
sort of copy-on-write scheme, of course, and if one did that, the cost
of the page faults etc. would not be insignificant -- probably in most
cases comparable in itself to the length of time needed to zero a page.
>> On the minus side, playing this trick requires that the system go
>> through multiple user/kernel context switches, which are very
>> expensive, and on platforms that do lazy zeroing, it will result in
>> multiple page faults, which are also expensive.
> Um, um. Is "lazy zeroing" what I just described,
No. For that, you simply fail to map anything at all until the page is
accessed. There's no copy-on-write -- there is a straight page fault,
just the way that demand paging works. Again, the cost of fielding the
page fault is in the thousands of cycles on most platforms.
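Concretely, something like this sketch shows it (minor-fault counts
via getrusage, which any BSD or Linux should report):

    /* Demonstrate demand (lazy) zero-fill: mmap itself maps nothing,
     * and each first touch of a page takes a minor fault. */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <sys/mman.h>
    #include <stdio.h>

    #define SIZE (4UL * 1024 * 1024)

    static long minor_faults(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main(void)
    {
        long before, after_map, after_touch;
        size_t i;
        char *p;

        before = minor_faults();
        p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                 MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        after_map = minor_faults();  /* barely moves: nothing mapped */

        for (i = 0; i < SIZE; i += 4096)
            p[i] = 1;                /* one fault per touched page */
        after_touch = minor_faults();

        printf("map: +%ld faults, touch: +%ld faults\n",
               after_map - before, after_touch - after_map);
        return 0;
    }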
> and if so, how /should/ we arrange matters to take advantage of it?
You can't "take advantage of it" -- it performs worse than just
zeroing things yourself, so there is nothing to take advantage of. :)
>> Anyway, are there benchmarks demonstrating this is a real win on real
>> platforms? If not, I would suggest using libc's memset, which gcc will
>> inline on most modern platforms; that would be a likely substantial
>> win, and much simpler from a logic point of view, too.
> Simpler is good.
Perry E. Metzger perry@...