On Wednesday 28 September 2005 21:25, Young Koh wrote:
> On 9/28/05, Jeff Dike <jdike@...> wrote:
> > On Wed, Sep 28, 2005 at 09:47:41AM -0400, Young Koh wrote:
> > No, just that page is bad.
Again, that page is not bad. There is no page yet for this address, and the
host won't allocate one for now.
> > Another page could have been dirtied and thus
> > allocated on the host, and it would be usable. So, it's not a fatal
> > problem.
Ok, this makes a bit of sense - even if IMHO it doesn't work, I now see your
point (but I still insist on what I said above).
However, even a dirtied page could be "bad", if it has been swapped out. If
we're getting a SIGBUS, it means the host didn't succeed in freeing any memory.
And, frankly, unless the UML ram file is kept on ramfs (which is RAM-only), it
can be swapped - that holds both for disk-based filesystems and for tmpfs.
So, I don't think what you suggest could work.
> then, if just that page is bad, shouldn't UML kernel wait until
> another page is usable (or force another page to be swapped out)
We're talking about the host, and if the host is OOM, you can't help it.
The best you can do is to reclaim cache memory. But that must be done when the
host is starting to swap, not at SIGBUS time.
> allocate the free page? and proceed normal? i may be still confused.
Jeff's point is that once we have dirtied a page, and the host hasn't yet
swapped it out, it is still in memory - and accessing it works. "Dirtied"
here means we allocated it with (say) kmalloc and then wrote to it.
> Ok, my thought/idea/suggestion is that what if UML uses a TLB-like
> table before it does the address translation?
> i mean, once there is a
> valid mapping, UML inserts the address mapping(or page mapping) into a
> software TLB. after that, for that userspace address, UML can search
> the software TLB table and use the mapping without calling sigsetjmp()
> and walking through page tables.
> it seems that sigsetjmp() has
> relatively large overhead, we could reduce some overhead by not
> calling it.
How do you measure it? I'm curious myself - I know gprof could be used for
this, but I've never tried it.
Surely the "sig" part is heavy (some syscalls, like sigprocmask(), for
blocking/unblocking signals). Or rather, it's the only heavy part - the rest
consists only of saving a couple of registers (6, IIRC) to memory, so there's
no point in optimizing it away (I assume).
But that part will go away - there's a "softints" patch for this (i.e. moving
the blocking/unblocking to userspace - the signal handler notices that signals
are soft-blocked and queues the handling via userspace mechanisms). It's at:
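In sketch form, the softints idea looks like this (hypothetical names; the real patch is more involved, and a real implementation must close the race noted in the comment):

```c
#include <signal.h>

static volatile sig_atomic_t signals_enabled = 1;
static volatile sig_atomic_t pending_mask;
static volatile sig_atomic_t handled_count;

/* Stand-in for the real work (in UML, dispatching the IRQ). */
static void handle_signal(int sig)
{
	(void)sig;
	handled_count++;
}

/* The handler installed with sigaction(): if "interrupts" are
 * soft-disabled, it only records the signal instead of running. */
static void soft_handler(int sig)
{
	if (!signals_enabled) {
		pending_mask |= 1 << sig;
		return;
	}
	handle_signal(sig);
}

/* Cheap replacements for sigprocmask(SIG_BLOCK/SIG_UNBLOCK, ...):
 * no syscall, just a flag flip plus a replay on re-enable. */
static void soft_block(void)
{
	signals_enabled = 0;
}

static void soft_unblock(void)
{
	signals_enabled = 1;
	/* Replay anything that arrived while soft-blocked.  A real
	 * implementation must handle the race between this drain and
	 * a handler firing concurrently. */
	while (pending_mask) {
		int sig = __builtin_ctz(pending_mask);
		pending_mask &= ~(1 << sig);
		handle_signal(sig);
	}
}
```

That turns the common block/unblock pair from two syscalls into two plain stores, which is where the savings come from.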
> but surely the problem is the mapping can go corrupted.
> for that, UML may invalidate a TLB entry if the corresponding page is
> swapped out or any change is made. what do you think?
In short, it's a really interesting idea.
The kernel (arch-independent) infrastructure for this exists, for managing the
real TLBs. You already need to invalidate TLB entries when you swap a page and
such. Reading Documentation/cachetlb.txt is definitely worth the time spent.
And actually, currently that is used to update the host mappings.
Using TLBs to save the page table walk is interesting, especially since that
would avoid taking a spinlock on SMP (the current implementation of
maybe_map() doesn't take one, but it should), and more importantly because the
TLBs would likely be hotter than the page tables, so they would probably fit
in the L2 cache (while walking the page tables is likely to cause an L2 miss -
they're too big). Hotter means "more likely to be accessed, and thus more
worth keeping in cache". The cache usage discussion in the Reiser4 whitepaper
(www.namesys.com) is really enlightening on this point.
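For concreteness, a sketch of what that fast path could look like (all names made up; slow_walk() stands in for the current sigsetjmp()-protected page-table walk and here just fakes a translation so the sketch is self-contained):

```c
#include <stdint.h>

#define SOFT_TLB_SIZE	64		/* small, so it stays cache-hot */
#define PAGE_SHIFT	12
#define PAGE_MASK	(~(uintptr_t)((1 << PAGE_SHIFT) - 1))

struct soft_tlb_entry {
	uintptr_t vpage;		/* virtual page number */
	uintptr_t ppage;		/* cached translation */
	int	  valid;
};

static struct soft_tlb_entry soft_tlb[SOFT_TLB_SIZE];

/* Stand-in for the slow path; in UML this would be the existing
 * sigsetjmp()-protected page-table walk. */
static int slow_walk_calls;
static uintptr_t slow_walk(uintptr_t page_addr)
{
	slow_walk_calls++;
	return page_addr + 0x100000;	/* fake translation for the demo */
}

static uintptr_t translate(uintptr_t vaddr)
{
	uintptr_t vpn = vaddr >> PAGE_SHIFT;
	struct soft_tlb_entry *e = &soft_tlb[vpn & (SOFT_TLB_SIZE - 1)];

	if (e->valid && e->vpage == vpn)	/* hit: no setjmp, no walk */
		return e->ppage | (vaddr & ~PAGE_MASK);

	/* miss: pay for the full walk once, then cache the result */
	e->ppage = slow_walk(vaddr & PAGE_MASK);
	e->vpage = vpn;
	e->valid = 1;
	return e->ppage | (vaddr & ~PAGE_MASK);
}
```

The direct-mapped table keeps lookup to one index computation and one compare, with no lock and no pointer chasing.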
I don't know if we can optimize the locking on the TLBs, though - we could
maybe use atomic ops, or have per-processor TLBs (which is the way it's
implemented in hardware - you get IPIs on flush, though). On i386, atomic_read
and atomic_set have no additional cost over their non-atomic counterparts, so
maybe shared TLBs are ok - they'd need to be tagged.
I've not yet thought about an efficient data structure - with an array,
invalidation has to check each entry, and I'd like to avoid that. The other
option is to empty the whole TLB when flushing a single entry.
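One common trick (not something UML does today; names are hypothetical) avoids both problems with a plain array: tag each entry with a generation number, so single-entry invalidation is O(1) and a full flush is just one counter bump - no scan in either case:

```c
#include <stdint.h>

#define SOFT_TLB_SIZE	64
#define PAGE_SHIFT	12

struct tlb_slot {
	uintptr_t vpage;
	uintptr_t ppage;
	unsigned  gen;		/* generation this entry was filled in */
};

static struct tlb_slot slots[SOFT_TLB_SIZE];
static unsigned tlb_gen = 1;	/* 0 never matches, so the table starts empty */

static int tlb_lookup(uintptr_t vaddr, uintptr_t *ppage)
{
	uintptr_t vpn = vaddr >> PAGE_SHIFT;
	struct tlb_slot *s = &slots[vpn & (SOFT_TLB_SIZE - 1)];

	if (s->gen != tlb_gen || s->vpage != vpn)
		return 0;
	*ppage = s->ppage;
	return 1;
}

static void tlb_fill(uintptr_t vaddr, uintptr_t ppage)
{
	uintptr_t vpn = vaddr >> PAGE_SHIFT;
	struct tlb_slot *s = &slots[vpn & (SOFT_TLB_SIZE - 1)];

	s->vpage = vpn;
	s->ppage = ppage;
	s->gen	 = tlb_gen;
}

/* Single-entry invalidation: O(1), just stale the one slot. */
static void tlb_invalidate_page(uintptr_t vaddr)
{
	uintptr_t vpn = vaddr >> PAGE_SHIFT;
	struct tlb_slot *s = &slots[vpn & (SOFT_TLB_SIZE - 1)];

	if (s->vpage == vpn)
		s->gen = 0;
}

/* Full flush: O(1), every existing entry becomes stale at once. */
static void tlb_flush_all(void)
{
	tlb_gen++;		/* wraparound would need handling in real code */
}
```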
The only problem is when there is a fault on a kernelspace address.
However, if setjmp() is still costly, we may implement some checking:
only addresses above TASK_SIZE are valid kernelspace addresses (or something
of the sort).
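That check is trivial - something like this (the TASK_SIZE value is only for the sketch; in UML the kernel lives above TASK_SIZE in the process address space):

```c
#include <stdint.h>

/* Hypothetical layout constant, just for the sketch. */
#define TASK_SIZE 0xc0000000UL

/* Cheap pre-check before paying for the sigsetjmp()-protected access:
 * anything below TASK_SIZE cannot be a kernelspace address at all. */
static int maybe_kernel_address(uintptr_t addr)
{
	return addr >= TASK_SIZE;
}
```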
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)