Daniel Barlow <dan@...> writes:
>> In doing so, I noted the mmap "trick" to zero large chunks of memory
>> -- i.e. unmapping and remapping chunks of memory.
> I played with this a little when I was last hacking the gc, but didn't
> manage to create a test case that showed anything interesting either
> way (on x86 linux).
That is what I would expect if the region were sufficiently large. The
smaller the region, the fewer pages there are over which to amortize
the cost of the kernel switch and the VM bookkeeping, so I'd expect the
trick to look worse and worse as the region shrinks. I find it hard to
believe it would ever actually be better than memset/bzero...
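To make the comparison concrete, here is a minimal sketch of the two
ways of zeroing a region under discussion, plus a crude timer. It
assumes a POSIX system with MAP_ANONYMOUS (Linux, the BSDs, macOS). The
names zero_with_memset, zero_with_remap and time_zeroing are
hypothetical, not taken from the gc code.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and clock_gettime on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: zero the region in user space. */
static int zero_with_memset(void *p, size_t len)
{
    memset(p, 0, len);
    return 0;
}

/* Hypothetical helper: zero a page-aligned region by asking the kernel for
 * fresh anonymous pages at the same address. MAP_FIXED replaces the old
 * mapping in a single call, which has the effect of munmap() + mmap().
 * Returns 0 on success, -1 on failure. */
static int zero_with_remap(void *p, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && (uintptr_t)p % (uintptr_t)page == 0 &&
           len % (size_t)page == 0);
    (void)page; /* only used by the assert */
    void *q = mmap(p, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return q == MAP_FAILED ? -1 : 0;
}

/* Seconds spent zeroing a dirty region of len bytes (a multiple of the page
 * size) with zero_fn and then writing one byte to every page, so that page
 * faults deferred by the remap approach are counted too.
 * Returns a negative value on error. */
static double time_zeroing(int (*zero_fn)(void *, size_t), size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1.0;
    unsigned char *buf = m;
    memset(buf, 0xAB, len); /* dirty every page before timing */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = zero_fn(buf, len);
    if (rc == 0)
        for (size_t off = 0; off < len; off += page)
            buf[off] = 1; /* reuse the region: touch every page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(buf, len);
    if (rc != 0)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Sweeping len from one page up to tens of megabytes and comparing the
two timings would test the amortization argument directly. The timer
deliberately includes writing to every page after zeroing, because with
the remap approach that is where the deferred page-fault cost lands.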
> I'd expect (theorising wildly) that the kernel could potentially map
> all the zeroed pages to the same 4k (or whatever) block of memory, and
> would only need to zero more pages when something later wrote into a
> zeroed page - so the cost of clearing memory is amortized over a
> longer period, and may even be skipped altogether if the memory is not
> reused. I have no idea whether this is commonly implemented or makes
> any significant difference if so, though.
That's not how any OS I know of is implemented. You're describing a
sort of copy-on-write scheme, of course, and if one did that, the cost
of the page faults etc. would not be insignificant -- in most cases
probably comparable to the time needed to zero the page in the first
place.
>> On the minus side, playing this trick requires that the system go
>> through multiple user/kernel context switches, which are very
>> expensive, and on platforms that do lazy zeroing, it will result in
>> multiple page faults, which are also expensive.
> Um, um. Is "lazy zeroing" what I just described,
No. For that, you simply fail to map anything at all until the page is
accessed. There's no copy on write -- there is a straight page fault,
just the way that demand paging works. Again, the cost of fielding the
page fault is in the thousands of cycles on most platforms.
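The demand-zero behaviour described above can be observed by counting
the page faults a process takes when it first writes to freshly
mmap()ed anonymous memory. This is a minimal sketch, assuming a POSIX
system whose getrusage() fills in ru_minflt (Linux and the BSDs do);
the helper names minor_faults and faults_on_first_touch are
hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults (resolved without disk I/O) taken by this process so
 * far, or -1 if getrusage() fails. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Maps npages fresh anonymous pages, writes one byte to each, and returns
 * the number of minor faults those first-touch writes caused (or -1 on
 * error). Nothing is mapped until the first access, so each write traps
 * into the kernel, which supplies a zero-filled page and resumes us. */
static long faults_on_first_touch(size_t npages)
{
    assert(npages > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    unsigned char *p = m;

    long before = minor_faults();
    for (size_t off = 0; off < len; off += page)
        p[off] = 1; /* first write to each page */
    long after = minor_faults();

    munmap(m, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

With ordinary 4 KiB pages one would expect roughly one fault per page
touched (huge pages can lower the count). Each of those faults carries
the per-fault cost mentioned above, on top of the zeroing itself.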
> and if so, how /should/ we arrange matters to take advantage of it?
You can't "take advantage of it" -- it performs worse than just
zeroing things yourself, so there is nothing to take advantage of. :)
>> Anyway, are there benchmarks demonstrating this is a real win on real
>> platforms? If not, I would suggest using libc's memset, which on gcc
>> will be inlined on most modern platforms, with a likely substantial
>> win, and would be much simpler from a logic point of view, too.
> Simpler is good.
Perry E. Metzger perry@...