From: Jeff D. <jd...@ad...> - 2006-03-28 19:44:14
|
On Tue, Mar 28, 2006 at 12:32:30AM +0200, Blaisorblade wrote: > There are tons of places where spinlocks are used improperly. We need > stricter locking, however that is complicated by the addition of atomic > sections where UML can't sleep. > > For instance, sigio_lock needs to be irqsave (I'll test the patch for this > shortly). On the other side, uml_console_write shouldn't take any lock, it > seems (see Documentation/tty.txt). Yeah, the locking needs work. We just need to make an SMP pass over the arch to clean this up. Jeff |
From: Blaisorblade <bla...@ya...> - 2006-03-29 02:48:30
Attachments:
ubd-sequence
|
I'm CC:ing this to uml-devel, finally. On Wednesday 29 March 2006 02:41, Jeff Dike wrote: > On Wed, Mar 29, 2006 at 12:40:00AM +0200, Blaisorblade wrote: > > Since you've now split out delete-hostfs, why don't you merge the new > > hostfs, possibly labeling it as "EXPERIMENTAL"? > My two main gripes right now are > =A0=A0=A0=A0=A0=A0=A0=A0there are three filesystems (or one framework and= two > filesystems) in one directory That's not a big problem, but is one which we'll solve after having users. > =A0=A0=A0=A0=A0=A0=A0=A0it depends on filehandle, which has aspects that = neither of us likes We're working on this too slowly. Give me bugreports by users and filehandle in mainline and I'll probably fi= x=20 it. Right now, I want to do but I'm worried about other stuff (I should go= =20 now back working on remap_file_pages). Probably I'll be able to work on bot= h,=20 since I can't work on that more than a certain amount of time. > > With the refcount, likely you don't need to move it off-list, but that > > could maybe be useful to avoid looping on unused fd; however, this > > requires taking the spinlock when reinserting the element on the list. > I think you're agreeing with me, but I'm missing why we want to move > things on and off the list. I just said "we could do that so when do do reclaim we don't iterate on fds= =20 which are held somewhere", but IMHO the other side has better reasons to be= =20 preferred (i.e. possibly skipping taking the lock when reputting it on list= ). > > Note: testing that a fd has refcount 0 must be done atomically with > > freeing it; i.e. even with an atomic_t refcount, it must be incremented > > only while you have a lock on the list;=20 > > and while freeing it,=20 I'd have to say "while closing it". But we don't close fd's when they reach= =20 refcount 0, so in that case we can probably avoid taking the lock. Instead, when closing an fd, we must still take the lock, move it off-list,= =20 drop the lock and close the fd. > > you must=20 > > use > > atomic_dec_and_lock() so you get a lock on the list if the refcount goes > > to 0 (atomic_dec_and_lock() is equivalent to taking the lock, doing dec > > and test, and releasing it, but is faster). > Yes, this looks reasonable. I've been especially careful because after studying Operating Systems=20 (previous semester) I noticed a bug in the description of your ubd_sequence= =20 patch (against 2.6.14-rc1-mm1, sent as attachment in a mail); I don't recal= l=20 the details, including names, but I remember the bug. I've described it=20 below, but now I've discovered that it doesn't apply given the nature of th= e=20 wait_queue's in Linux. I've not had the courage to press Del... However, while looking at this, I think I've found all sort of race conditi= ons=20 and locking problems in the ubd driver. Saving what I see in a patch, I'll look well at this later, but however I=20 think I've seen tons of problems. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D I decided not to describe it at that time because you dropped that=20 implementation, but I'm doing it now below (maybe because I'm silly). There were two atomic_t (started_req and completed_req) which were seq.=20 counts, and one wait_list - one task would hang onto a wait list until=20 completed_req reached start_req and it was waken up. Well, there was a bug since there's no semaphore in the design - task A, wh= ich=20 submits requests, could check that started_req > completed_req (they're=20 atomic but reading the couple of them is not atomic - for it getting=20 started_req < completed_req was perfectly possible), then task B could=20 increment completed_req and send the wakeup to nobody, then task A could go= =20 to sleep to be never waken up. *** In Linux, however, wait_event() puts the task on the wait queue, so that it= =20 can catch wakeups, tests the condition, and calls schedule(); since it's on= =20 the wait_queue before testing the condition, I assume that wakeups done whi= le=20 it's still running after testing the condition are caught. *** Actually, depending on the details, since task A submits all requests and i= s=20 sleeping, started_req could never increase again, and it would never be=20 awakened; otherwise it would be waken up later if somehow there were other= =20 started and completed requests causing another wake-up. You need, indeed, the check on the condition and the sleep to be atomic, i.= e.=20 to protect them with a semaphore; actually in Linux they use a spinlock whi= ch=20 they manage to drop, when they are on the queue, have decided to go to slee= p=20 but haven't yet gone to. =2D-=20 Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade |
From: Jeff D. <jd...@ad...> - 2006-03-29 04:35:55
|
On Wed, Mar 29, 2006 at 12:40:00AM +0200, Blaisorblade wrote: > Since you've now split out delete-hostfs, why don't you merge the new hostfs, > possibly labeling it as "EXPERIMENTAL"? My two main gripes right now are there are three filesystems (or one framework and two filesystems) in one directory it depends on filehandle, which has aspects that neither of us likes > Btw, there's a ton of other patches which I don't see reasons for not > merging like Al Viro's cleanups (I've given a look to them and they > seem safe). Just sent in today. > And is punctuaction_fixes likely to cause instability? No, but I recall that you didn't really like it. > Finally, I'd like to get devshm merged if there aren't problems with the > code. That one needs to check if /dev/shm is present, and fall back to /tmp if not. > Don't know how do you implemented it this time, but since we had special > read/write functions for filehandles, the get_fh() and put_fh() could be > simply put in their body. Has this changed? No, but you didn't like those (and I agree) because of the extra layering they added. > > In this case, get_fh should just increment a count, put_fh should > > decrement it, and the only list operation should be to move it to the > > end of the list so it's last to be reclaimed. The reclaimer would not > > reclaim filehandles with non-zero counts. I see no point in removing > > it from the list and adding it back, as that seems not to protect > > against anything. > > You need a spinlock on the list. With the non-reference-counted approach, > removing that from the list allowed dropping the spinlock over the I/O call > on the host. That's not needed with the refcount. > > With the refcount, likely you don't need to move it off-list, but that could > maybe be useful to avoid looping on unused fd; however, this requires taking > the spinlock when reinserting the element on the list. I think you're agreeing with me, but I'm missing why we want to move things on and off the list. > Instead, with the refcount, if you decrease to 0 the refcount of FH_1 while a > reclaim loop is iterating over FH_1, you only risk that FH_1 is missed on > that reclaim pass, which isn't a race. Yup. > Note: testing that a fd has refcount 0 must be done atomically with freeing > it; i.e. even with an atomic_t refcount, it must be incremented only while > you have a lock on the list; and while freeing it, you must use > atomic_dec_and_lock() so you get a lock on the list if the refcount goes to 0 > (atomic_dec_and_lock() is equivalent to taking the lock, doing dec and test, > and releasing it, but is faster). > > get_fh() { > spin_lock(&list_lock); > <iterate on list> > atomic_inc(fd->count); //while still holding the lock!! > if (atomic_read(fd->count) <= 0) > BUG(); > spin_unlock(&list_lock); > } > > put_fh(pointer to fd on list "fd") { > if(atomic_dec_and_lock(fd->count, &list_lock)) { > //Here we have the lock! > //Remove the thing from list and free it > } > } Yes, this looks reasonable. > > You're envisioning the os_* interfaces calling back into filehandle.c > > to get a descriptor if needed? > > To cause fd reclaim, that's what I mean (not sure if you meant the > same). Yes. > I suggested last time to rename this as "make_reclaimable" (or > set_reclaimable), is_reclaimable() sounds as an interrogation > method. Fine by me. |
From: Anthony B. <br...@st...> - 2006-03-29 20:38:12
|
Quoting Jeff Dike <jd...@ad...>: > On Wed, Mar 29, 2006 at 12:40:00AM +0200, Blaisorblade wrote: >> Finally, I'd like to get devshm merged if there aren't problems with the >> code. > > That one needs to check if /dev/shm is present, and fall back to /tmp > if not. Are you proposing that virtual servers automatically using /dev/shm if available and only falling back to the $TMP variable and /tmp directory if not? Or would this keep the use of $TMP if present and fall back to /dev/shm followed by /tmp? I'm asking because I've optimized multiple host machines to offer a ramfs for virtual servers. This gives me locked memory (which still provides a significant performance improvement over time) for each instance. I would prefer to avoid using tmpfs if possible. Alternatively, some means for preventing the host from swapping out instances would be appreciated. Tony |
From: Blaisorblade <bla...@ya...> - 2006-03-30 21:04:14
|
On Wednesday 29 March 2006 22:38, Anthony Brock wrote: > Quoting Jeff Dike <jd...@ad...>: > > On Wed, Mar 29, 2006 at 12:40:00AM +0200, Blaisorblade wrote: > >> Finally, I'd like to get devshm merged if there aren't problems with the > >> code. > > That one needs to check if /dev/shm is present, and fall back to /tmp > > if not. > Are you proposing that virtual servers automatically using /dev/shm if > available and only falling back to the $TMP variable and /tmp directory > if not? Or would this keep the use of $TMP if present and fall back to > /dev/shm followed by /tmp? By looking at: http://user-mode-linux.sourceforge.net/work/current/2.6/2.6.16/patches/devshm > I'm asking because I've optimized multiple host machines to offer a > ramfs for virtual servers. This gives me locked memory (which still > provides a significant performance improvement over time) for each > instance. I would prefer to avoid using tmpfs if possible. > Alternatively, some means for preventing the host from swapping out > instances would be appreciated. ramfs is the better way IMHO. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Messenger with Voice: chiama da PC a telefono a tariffe esclusive http://it.messenger.yahoo.com |
From: Blaisorblade <bla...@ya...> - 2006-03-30 21:30:17
|
On Thursday 30 March 2006 23:03, Blaisorblade wrote: > On Wednesday 29 March 2006 22:38, Anthony Brock wrote: > > Quoting Jeff Dike <jd...@ad...>: > > > On Wed, Mar 29, 2006 at 12:40:00AM +0200, Blaisorblade wrote: > > >> Finally, I'd like to get devshm merged if there aren't problems with > > >> the code. (Jeff wrote): > > That one needs to check if /dev/shm is present, and fall back to /tmp > > if not. I'd add that it needs to retry even if the PROT_EXEC test fails: *) if TMP, TEMP or TMPDIR are set, use that and fail if the check fails, abort, because the user wanted to use that but forgot to change permissions. *) if instead we pick /dev/shm but it has not PROT_EXEC, we should switch to /tmp and fail only then. > > Are you proposing that virtual servers automatically using /dev/shm if > > available and only falling back to the $TMP variable and /tmp directory > > if not? Or would this keep the use of $TMP if present and fall back to > > /dev/shm followed by /tmp? > By looking at: > http://user-mode-linux.sourceforge.net/work/current/2.6/2.6.16/patches/devs >hm Sorry, forgot to finish. That patch keeps checking TMP,TMPDIR and TEMP (don't remember the order), but if none is set, it switches the default from /tmp to /dev/shm. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it |
From: Anthony B. <br...@st...> - 2006-03-31 06:54:15
|
> On Thursday 30 March 2006 13:30, Blaisorblade wrote: > Sorry, forgot to finish. That patch keeps checking TMP,TMPDIR and > TEMP (don't > remember the order), but if none is set, it switches the default > from /tmp > to /dev/shm. This makes sense. It does appear that the patch first checks the environmental variable before using the defaults. While I would like to take advantage of the neat feature Jeff is working on (hot plug memory), I have a greater need for consistent and reliable speed. Thanks! Tony |