From: Jeff D. <jd...@ad...> - 2005-09-17 23:07:17
|
I've spent some time working on the ubd and AIO problems that have cropped up recently. Patches are attached - a critical look at them would be appreciated.

I'm going to start with a problem that hasn't exactly cropped up, and move on to the other ones from there. That is that the ubd driver must issue I/O requests to the host even if there is no memory available. It must make progress, because the request for which it can't get memory might be the swapping that will free some up.

The driver allocates memory for several purposes:
- a header-type buffer per request, which holds the bitmap data
- one aio request per chunk - a request may be broken into chunks because of COWing, with some pieces coming from the backing file and some from the COW file

To allow the driver to always make progress, even when no memory is available, I added a static instance of each buffer and made the kmalloc calls atomic. When kmalloc fails, the static buffer is used. When it is in use, the user acquires a semaphore, which is released in the interrupt handler when the buffer is freed. This implies that future I/O submissions will sleep on the semaphore, so we have changed the reason for sleeping rather than eliminating it. Any I/O submissions which need memory will sleep on the semaphore until the buffer is released, resulting in synchronous, one-at-a-time I/O submission until the memory shortage has cleared up.

Now that it is established that we must sleep, the next problem is making sure the requests get submitted to the host in the order that they were queued to the driver. Because we must sleep, we must drop the queue spinlock, which opens the queue to any other processes that want to do I/O. One of these can queue on the allocation semaphore (or just kmalloc memory now that some is available) and sneak ahead of a process handling an earlier request.

To solve this, I added two atomic variables and a wait queue. One of the variables, started, counts the number of requests that have been submitted to the driver, and the other, submitted, counts the number that have been submitted to the host. started is incremented before any submissions; submitted is incremented after all submissions for the request. The driver gets the value of started and waits for submitted to catch up to it. When that happens, it is that request's turn, and it wakes up from the sequence wait queue.

I believe these three fixes - dropping the spinlock around host I/O submission, adding the static buffers, and sequencing requests - cover the scheduling-while-atomic problem and associated problems.

Next is a deadlock that I found during a kernel build loop. The AIO thread seems to be able to fill the pipe back to UML with I/O completions. When this happens, it sleeps in a write and stops calling io_getevents. If UML keeps submitting events, the AIO ring buffer will fill, and io_submit will start returning -EAGAIN. At this point, UML will try to write the error to itself through the same pipe that the AIO thread has filled. It will now sleep. Since interrupts are off, that pipe can never be emptied, since it is handled by the driver's interrupt routine, and we are deadlocked. Enabling interrupts during the submission seems to help, but doesn't eliminate the deadlock. To fix it for real, I now have the -EAGAIN returned to the driver, which sleeps on a wait queue that is woken up by the interrupt routine. Interrupts are enabled, as they must be during sleep, so the handler can wake the queue after it has processed some completions, and the driver can retry.

As for the interactions between the wait queues and the semaphores - the sequence wait queue controls access to everything else. Once a request is past that, it can only deadlock with itself. The allocation semaphore is used by a finishing request to wake up the next one. The eagain wait queue is taken in a loop with no sleeping on anything else.

The patches are attached. They are:
- ubd-drop-lock - drops the ubd_io_lock around the call to do_io and takes it before dequeuing the next request. It also removes the call of do_ubd_request from the interrupt handler, since the request handler now processes the entire queue in one call.
- ubd-atomic - makes the kmalloc calls atomic and adds the static buffers, associated semaphores, and related synchronization. Unfinished - it doesn't have a static version of the bitmap, since that is variable-length, so it's not so easy to have a static copy of it.
- ubd-sequence - makes the requests go out in the right order. Adds the sequence wait queue, which controls access to the actual submission code.
- ubd-eagain - the -EAGAIN handling - waits in the eagain wait queue to be woken up by the interrupt handler after it has processed some completions.

These are all against 2.6.14-rc1-mm1. They should apply to 2.6.14-rc1, but I haven't checked that.

Jeff

P.S. exmh insisted on a crazy attachment order - the order is the one listed above, not the order of attachment. |
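A minimal sketch of the static-buffer fallback described above, for reference. The names, the buffer size macro, and the call-site details are assumptions rather than anything taken from the actual patches, and the caller is assumed to have dropped the queue lock before the possible sleep.

    /* Hypothetical sketch: kmalloc is tried without sleeping; on failure the
     * single static buffer is used, serialized by a semaphore that the
     * interrupt handler releases when the buffer is freed.  The sketch
     * assumes size <= UBD_REQ_BUF_SIZE. */
    #include <linux/slab.h>
    #include <asm/semaphore.h>

    #define UBD_REQ_BUF_SIZE 512            /* assumed size, not from the patches */

    static char static_buf[UBD_REQ_BUF_SIZE];
    static DECLARE_MUTEX(static_buf_sem);   /* 2.6-era semaphore, initialized to 1 */

    static void *ubd_alloc_buf(int size)
    {
            void *buf = kmalloc(size, GFP_ATOMIC);  /* never sleeps */

            if (buf != NULL)
                    return buf;

            /* Out of memory: take the static buffer.  Only the first user gets
             * it immediately; later callers sleep here until the interrupt
             * handler frees it, so submission degrades to one-at-a-time. */
            down(&static_buf_sem);
            return static_buf;
    }

    static void ubd_free_buf(void *buf)
    {
            if (buf == static_buf)
                    up(&static_buf_sem);    /* done from the interrupt handler */
            else
                    kfree(buf);
    }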
From: Blaisorblade <bla...@ya...> - 2005-09-20 18:05:46
|
On Sunday 18 September 2005 00:45, Jeff Dike wrote:

> I've spent some time working on the ubd and AIO problems that have cropped up recently. Patches are attached - a critical look at them would be appreciated.
>
> I'm going to start with a problem that hasn't exactly cropped up, and move on to the other ones from there. That is that the ubd driver must issue I/O requests to the host even if there is no memory available. It must make progress, because the request for which it can't get memory might be the swapping that will free some up.
>
> The driver allocates memory for several purposes:
> - a header-type buffer per request, which holds the bitmap data
> - one aio request per chunk - a request may be broken into chunks because of COWing, with some pieces coming from the backing file and some from the COW file
>
> To allow the driver to always make progress, even when no memory is available, I added a static instance of each buffer and made the kmalloc calls atomic. When kmalloc fails, the static buffer is used.

Hmm, this kind of thing is exactly the one for which mempools were created - have you looked at whether using them (they can be used for atomic allocations) would be better?

> When it is in use, the user acquires a semaphore, which is released in the interrupt handler when the buffer is freed. This implies that future I/O submissions will sleep on the semaphore, so we have changed the reason for sleeping rather than eliminating it.

I've not looked at the code, but have you tested that with sleep-inside-spinlock checking enabled?

However, ok, you do release spinlocks, so you should be safe. However, in your custom allocation routines you're possibly going to sleep, so why do you use GFP_ATOMIC? There's absolutely no need. If there were a need, you couldn't take the semaphore afterwards. Also, GFP_KERNEL|GFP_ATOMIC is bogus - GFP_ATOMIC must be used alone, when needed. Grep kernel/*.c to make sure, or look at the definition.

> Any I/O submissions which need memory will sleep on the semaphore until the buffer is released, resulting in synchronous, one-at-a-time I/O submission until the memory shortage has cleared up.

About AIO, I've read on http://lse.sourceforge.net/io/aio.html that, indeed, the host AIO code isn't really asynchronous for buffered I/O, but only for O_DIRECT I/O (which we don't seem to use). We're safe, sure, and the new code is anyhow better in using the new blockdevice model.

> Now that it is established that we must sleep, the next problem is making sure the requests get submitted to the host in the order that they were queued to the driver. Because we must sleep, we must drop the queue spinlock, which opens the queue to any other processes that want to do I/O. One of these can queue on the allocation semaphore (or just kmalloc memory now that some is available) and sneak ahead of a process handling an earlier request.
>
> To solve this, I added two atomic variables and a wait queue. One of the variables, started, counts the number of requests that have been submitted to the driver, and the other, submitted, counts the number that have been submitted to the host. started is incremented before any submissions; submitted is incremented after all submissions for the request. The driver gets the value of started and waits for submitted to catch up to it. When that happens, it is that request's turn, and it wakes up from the sequence wait queue.

These two atomics + one wait queue are very similar to a semaphore, even if not identical. The semaphore value would be submitted - started. The change is that the driver sleeps at the first increment of "started" rather than at the last one, but it should be ok. And much less error-prone. If you keep your custom design, you should at least unify the two vars into their difference.

I'm not confident about the races between the two different variables, even if it may be perfectly safe (Tanenbaum uses something very similar, with even two vars, for the producer-consumer example). But it must be examined.

And yes, on a semaphore you can call up() as many times as you want (which is maybe non-obvious - I didn't realize it until one week ago). In fact, provisions are made to initialize a semaphore with an initial count bigger than 1.

However, I would like to note that you're not always forced to sequence requests - write barriers were recently implemented, so the filesystem explicitly serializes requests when needed, rather than asking for strictly sequential processing. It's not especially hard to do, when you use your custom semaphore code - you can say "down() but don't sleep".

> I believe these three fixes - dropping the spinlock around host I/O submission, adding the static buffers, and sequencing requests - cover the scheduling-while-atomic problem and associated problems.
>
> Next is a deadlock that I found during a kernel build loop. The AIO thread seems to be able to fill the pipe back to UML with I/O completions. When this happens, it sleeps in a write and stops calling io_getevents. If UML keeps submitting events, the AIO ring buffer will fill, and io_submit will start returning -EAGAIN. At this point, UML will try to write the error to itself through the same pipe that the AIO thread has filled. It will now sleep. Since interrupts are off, that pipe can never be emptied, since it is handled by the driver's interrupt routine, and we are deadlocked.

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade |
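For reference, a rough sketch of what the mempool variant suggested here might look like. All the names and the reserve size are hypothetical, not from any actual patch; the wrappers follow the generic mempool_create()/mempool_alloc() interface, written with the later gfp_t spelling of the flag type.

    #include <linux/mempool.h>
    #include <linux/slab.h>
    #include <linux/init.h>

    #define UBD_AIO_POOL_MIN 16             /* assumed number of reserved elements */

    struct ubd_aio_req {                    /* hypothetical per-chunk AIO state */
            char data[256];
    };

    static mempool_t *ubd_aio_pool;

    static void *ubd_aio_pool_alloc(gfp_t gfp_mask, void *pool_data)
    {
            return kmalloc(sizeof(struct ubd_aio_req), gfp_mask);
    }

    static void ubd_aio_pool_free(void *element, void *pool_data)
    {
            kfree(element);
    }

    static int __init ubd_aio_pool_init(void)
    {
            ubd_aio_pool = mempool_create(UBD_AIO_POOL_MIN, ubd_aio_pool_alloc,
                                          ubd_aio_pool_free, NULL);
            return ubd_aio_pool != NULL ? 0 : -ENOMEM;
    }

    static struct ubd_aio_req *ubd_aio_req_get(void)
    {
            /* May sleep waiting for a reserved element, so it has to run with
             * the queue spinlock dropped.  GFP_NOIO keeps the allocator from
             * recursing back into block I/O, and the reserve guarantees
             * forward progress without a hand-rolled static buffer. */
            return mempool_alloc(ubd_aio_pool, GFP_NOIO);
    }

    static void ubd_aio_req_put(struct ubd_aio_req *req)
    {
            mempool_free(req, ubd_aio_pool);
    }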
From: Jeff D. <jd...@ad...> - 2005-09-20 20:10:57
|
On Tue, Sep 20, 2005 at 02:01:11PM +0200, Blaisorblade wrote:

> Hmm, this kind of thing is exactly the one for which mempools were created - have you looked at whether using them (they can be used for atomic allocations) would be better?

Yeah, that would be worth looking into.

> I've not looked at the code, but have you tested that with sleep-inside-spinlock checking enabled?
>
> However, ok, you do release spinlocks, so you should be safe. However, in your custom allocation routines you're possibly going to sleep, so why do you use GFP_ATOMIC? There's absolutely no need. If there were a need, you couldn't take the semaphore afterwards.

?

GFP_ATOMIC doesn't always mean that you're in an interrupt. Generally, it means not to sleep in kmalloc. And here, if it fails, I'll use the static buffers.

> Also, GFP_KERNEL|GFP_ATOMIC is bogus - GFP_ATOMIC must be used alone, when needed.

OK.

> About AIO, I've read on http://lse.sourceforge.net/io/aio.html that, indeed, the host AIO code isn't really asynchronous for buffered I/O, but only for O_DIRECT I/O (which we don't seem to use).

There's another patch, called o_direct, which I didn't send out, which fixes this.

> These two atomics + one wait queue are very similar to a semaphore, even if not identical. The semaphore value would be submitted - started. The change is that the driver sleeps at the first increment of "started" rather than at the last one, but it should be ok. And much less error-prone. If you keep your custom design, you should at least unify the two vars into their difference.

The wait queue allows the correct thread to be woken up. If I used a semaphore, its value would be the same for all threads, and they would all be woken up when that value goes to 0. With a wait queue, each thread has a different value of started, and they wait for submitted to catch up to it. Meanwhile, any other sleeping threads stay asleep because submitted hasn't caught up to their values of started.

> However, I would like to note that you're not always forced to sequence requests - write barriers were recently implemented, so the filesystem explicitly serializes requests when needed, rather than asking for strictly sequential processing. It's not especially hard to do, when you use your custom semaphore code - you can say "down() but don't sleep".

I'm concerned about the COW bitmap. That's something that the upper layers know nothing about, so requests could overlap without there being barriers between them. However, this is an artifact of the implementation, where the section of bitmap that will be written out is copied into the aio request when it is started. If I grabbed the bitmap section and set the bits in it just before it is written out, then the sequencing stuff might be able to just go away. We rely on the block layer to put write barriers between requests with overlapping data.

Jeff |
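For reference, a sketch of the ticket-style sequencing being discussed, with hypothetical names. Each submitter takes a ticket from started and waits on a single wait queue until submitted has caught up to its ticket; only the thread whose turn it is passes the condition, so the other sleepers stay asleep.

    #include <linux/wait.h>
    #include <linux/blkdev.h>
    #include <asm/atomic.h>

    static atomic_t started = ATOMIC_INIT(0);
    static atomic_t submitted = ATOMIC_INIT(0);
    static DECLARE_WAIT_QUEUE_HEAD(sequence_wait);

    static void submit_aio_for_request(struct request *req);   /* hypothetical */

    static void submit_in_order(struct request *req)
    {
            /* Ticket = number of requests handed to the driver before this one. */
            int my_turn = atomic_inc_return(&started) - 1;

            /* Sleep until every earlier request has finished its host submission. */
            wait_event(sequence_wait, atomic_read(&submitted) >= my_turn);

            submit_aio_for_request(req);    /* issue all AIO chunks for req */

            atomic_inc(&submitted);
            wake_up_all(&sequence_wait);    /* the next ticket holder re-checks */
    }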
From: Blaisorblade <bla...@ya...> - 2005-09-21 15:51:30
|
On Tuesday 20 September 2005 21:06, Jeff Dike wrote:

> On Tue, Sep 20, 2005 at 02:01:11PM +0200, Blaisorblade wrote:
> > Hmm, this kind of thing is exactly the one for which mempools were created - have you looked at whether using them (they can be used for atomic allocations) would be better?
>
> Yeah, that would be worth looking into.
>
> > I've not looked at the code, but have you tested that with sleep-inside-spinlock checking enabled? However, ok, you do release spinlocks, so you should be safe. However, in your custom allocation routines you're possibly going to sleep, so why do you use GFP_ATOMIC? There's absolutely no need. If there were a need, you couldn't take the semaphore afterwards.
>
> ?
>
> GFP_ATOMIC doesn't always mean that you're in an interrupt. Generally, it means not to sleep in kmalloc.

Yes, but GFP_ATOMIC was needed in the first place because we were in an atomic section (not because of interrupts, but because we were under a spinlock).

> And here, if it fails, I'll use the static buffers.

Since you're going to possibly sleep, it's better to allow kmalloc to sleep in the first place!

> > These two atomics + one wait queue are very similar to a semaphore, even if not identical. The semaphore value would be submitted - started. The change is that the driver sleeps at the first increment of "started" rather than at the last one, but it should be ok. And much less error-prone. If you keep your custom design, you should at least unify the two vars into their difference.
>
> The wait queue allows the correct thread to be woken up. If I used a semaphore, its value would be the same for all threads, and they would all be woken up when that value goes to 0. With a wait queue, each thread has a different value of started, and they wait for submitted to catch up to it.

Ok, filtered wait queues. And I also see why you used two atomics rather than one. I'll have to think deeply about this, though.

> Meanwhile, any other sleeping threads stay asleep because submitted hasn't caught up to their values of started.

[...]

> I'm concerned about the COW bitmap. That's something that the upper layers know nothing about, so requests could overlap without there being barriers between them. However, this is an artifact of the implementation, where the section of bitmap that will be written out is copied into the aio request when it is started. If I grabbed the bitmap section and set the bits in it just before it is written out, then the sequencing stuff might be able to just go away.

Is the COW bitmap mmap'ed? Because otherwise we'd need to queue up an additional write request, which creates problems.

Btw, msync() allows syncing only a specific region, which fsync() doesn't allow. Don't know whether we really need this, however (we probably don't do it, but we should).

Also, I kept forgetting to mention one thing: device mapper has support for COW volumes, like we do. E.g. when you create a snapshot, it is not a static immutable copy of the original - it's writable too! Sure, it's less granular than COW devices - it doesn't copy 512 bytes at a time (which may help with performance, though) - but it works on the host too! I was waiting to let the lists know of this after writing a conversion program, which I never finished.

> We rely on the block layer to put write barriers between requests with overlapping data.

Don't think it's reasonable to expect this. I think that filesystems like ext2 will never produce write barriers - they are needed for journaled FS's.

> Jeff

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade |
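A minimal userspace sketch of the region-limited msync() mentioned above, i.e. flushing only the part of an mmap'ed COW bitmap that a request touched. The helper name and parameters are made up for illustration; msync() just needs a page-aligned start address.

    #include <sys/mman.h>
    #include <unistd.h>

    /* Flush [byte_offset, byte_offset + len) of the mapped bitmap to the
     * backing file, rounding the start down to a page boundary. */
    static int sync_bitmap_region(void *bitmap_map, unsigned long byte_offset,
                                  unsigned long len)
    {
            unsigned long page = sysconf(_SC_PAGESIZE);
            unsigned long start = byte_offset & ~(page - 1);

            return msync((char *) bitmap_map + start,
                         byte_offset + len - start, MS_SYNC);
    }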
From: Jeff D. <jd...@ad...> - 2005-09-21 18:28:52
|
On Wed, Sep 21, 2005 at 05:49:43PM +0200, Blaisorblade wrote:

> Since you're going to possibly sleep, it's better to allow kmalloc to sleep in the first place!

WRONG! The first request in line can't sleep, that's the whole point. If it does, the system can deadlock. Subsequent requests can sleep, waiting for previous ones to finish, but there absolutely can't be any sleeping in kmalloc.

> Is the COW bitmap mmap'ed? Because otherwise we'd need to queue up an additional write request, which creates problems.

No. That creates ordering problems which I haven't thought about yet. It may be worth doing, but explicit writes make ordering easier to think about.

> Btw, msync() allows syncing only a specific region, which fsync() doesn't allow. Don't know whether we really need this, however (we probably don't do it, but we should).

Or, the host may just flush out the mmapped stuff on its own. This is my concern about ordering.

> Also, I kept forgetting to mention one thing: device mapper has support for COW volumes, like we do. E.g. when you create a snapshot, it is not a static immutable copy of the original - it's writable too!

Yeah, that's something to think about.

> Don't think it's reasonable to expect this. I think that filesystems like ext2 will never produce write barriers - they are needed for journaled FS's.

I thought your point was to rely on the block layer to order overlapping writes for us. It sounded reasonable to me :-)

Jeff |
From: Blaisorblade <bla...@ya...> - 2005-09-21 19:08:04
|
On Wednesday 21 September 2005 20:04, Jeff Dike wrote:
> On Wed, Sep 21, 2005 at 05:49:43PM +0200, Blaisorblade wrote:

[...]

> > Don't think it's reasonable to expect this. I think that filesystems like ext2 will never produce write barriers - they are needed for journaled FS's.
>
> I thought your point was to rely on the block layer to order overlapping writes for us.

The block layer is supposed to merge overlapping writes as far as possible; that's reasonable, but not dependable. Say it gets the overlapping request *after* it sent the first one - we'll see both. Also, this is trivially true for output done through the page cache, but for the rest I don't know if explicit merging is implemented. For the buffer cache (i.e. fs metadata) it may still hold, but I'm not sure at all. Not that it matters for us - we can't depend on it anyway.

No, I was thinking of you enforcing ordering to comply with what journaled filesystems expect.

Btw: even when we aren't using COW, we're supposed to do writes in the order the fs passed them to us - at least when write barriers are explicitly sent (I don't know if there's something to care about normally - I know this as an LWN.net reader, not as a hacker in this area).

> It sounded reasonable to me :-)
> Jeff

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade |
From: Jeff D. <jd...@ad...> - 2005-09-21 21:28:42
|
On Wed, Sep 21, 2005 at 09:06:38PM +0200, Blaisorblade wrote:

> The block layer is supposed to merge overlapping writes as far as possible; that's reasonable, but not dependable. Say it gets the overlapping request *after* it sent the first one - we'll see both. Also, this is trivially true for output done through the page cache, but for the rest I don't know if explicit merging is implemented.

Yeah, I'm concerned about O_DIRECT. Also, what happens when we write a page to disk, and immediately afterwards, it gets dirty again and written again? Is there a barrier between the two?

> No, I was thinking of you enforcing ordering to comply with what journaled filesystems expect.

Yeah, but there's more ordering than that.

> Btw: even when we aren't using COW, we're supposed to do writes in the order the fs passed them to us - at least when write barriers are explicitly sent (I don't know if there's something to care about normally - I know this as an LWN.net reader, not as a hacker in this area).

We get requests from the block layer, not the fs. And physical disks are allowed to reorder requests, so I think there is some flexibility there.

Jeff |
From: Blaisorblade <bla...@ya...> - 2005-09-22 20:52:48
|
On Wednesday 21 September 2005 22:45, Jeff Dike wrote:
> On Wed, Sep 21, 2005 at 09:06:38PM +0200, Blaisorblade wrote:
> > The block layer is supposed to merge overlapping writes as far as possible; that's reasonable, but not dependable. Say it gets the overlapping request *after* it sent the first one - we'll see both. Also, this is trivially true for output done through the page cache, but for the rest I don't know if explicit merging is implemented.
>
> Yeah, I'm concerned about O_DIRECT. Also, what happens when we write a page to disk, and immediately afterwards, it gets dirty again and written again? Is there a barrier between the two?
>
> > No, I was thinking of you enforcing ordering to comply with what journaled filesystems expect.
>
> Yeah, but there's more ordering than that.
>
> > Btw: even when we aren't using COW, we're supposed to do writes in the order the fs passed them to us - at least when write barriers are explicitly sent (I don't know if there's something to care about normally - I know this as an LWN.net reader, not as a hacker in this area).
>
> We get requests from the block layer, not the fs. And physical disks are allowed to reorder requests, so I think there is some flexibility there.

I expect the FS to send the barriers - they don't make sense otherwise. In fact, grep -i barrier fs/jbd/*.c gives some results, which point to BH_Ordered. And after grepping for it in reiserfs, I found that fs/buffer.c:submit_bh() cares about it (i.e. it looks at the buffer head, sees whether it is ordered, and sends that down to the elevator).

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade |
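For context, roughly how a journaled filesystem asked for a barrier in that era - paraphrased from memory of the jbd commit path, so treat the identifiers and control flow as approximate rather than a quote of the real code. The point is that only code which sets BH_Ordered (or the bio-level equivalent) gets barrier semantics; ext2 never does this.

    #include <linux/buffer_head.h>
    #include <linux/jbd.h>

    /* Sketch: the commit record is marked ordered when barriers are enabled,
     * and fs/buffer.c:submit_bh() then issues it as a barrier write that the
     * elevator and the driver must not reorder around. */
    static int write_commit_record(journal_t *journal, struct buffer_head *bh)
    {
            set_buffer_dirty(bh);
            if (journal->j_flags & JFS_BARRIER)
                    set_buffer_ordered(bh);         /* sets the BH_Ordered bit */
            return sync_dirty_buffer(bh);           /* ends up in submit_bh() */
    }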
From: Jeff D. <jd...@ad...> - 2005-09-27 18:22:40
|
Attached is my current set of I/O patches. It's significantly different, and simpler, than my previous set.

Now, there is no sleeping while running the queue, so the spinlock is never dropped, and all of the extra synchronization is gone. do_ubd_handler runs the queue until it's empty, or the host refuses to take any more AIO requests. If the queue is not empty, then the current state of the in-process request (the request itself and which sg entries have not yet been sent to the host) is saved in the device structure. When the interrupt handler empties out the host some, it will take the queue lock and call the request handler to push some more requests to the host.

I also made the queues and locks per-device rather than having one for all devices. This means that when a device gets -EAGAIN from the host, the interrupt handler needs to know which queues got stalled. This is handled by having the request handler stick the device on a list when this happens, and the interrupt handler walks that list when rerunning request handlers.

The patch name is the same as before, and now quite misleading, because the queue lock is now not dropped.

ubd-atomic is much the same as before, except that simple flags are used to indicate whether the static buffers are available. This is OK since any reading or writing of the flags happens under the queue spinlock.

init_aio_err is a simple error-path cleanup patch.

aio-batching causes the AIO thread not to process completions until the current batch of I/O is submitted. I added this because I was seeing a context switch between UML and the AIO thread on every AIO submission. This keeps the AIO thread asleep until the current I/O is fully submitted, and it may then be able to process a bunch of completions at once.

o_direct adds O_DIRECT support to UML and makes the ubd driver use it.

aio-errors makes submit_aio_24 return -errno instead of -1.

I've given this a bunch of testing and it has survived overnight kernel build loops on both x86 and x86_64.

The attached patches are against 2.6.14-rc2.

Jeff |
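A sketch of the per-device restart list described above, with hypothetical structure and function names (only do_ubd_request is mentioned in the mails). Locking around the restart list itself is left out for brevity.

    #include <linux/list.h>
    #include <linux/blkdev.h>
    #include <linux/spinlock.h>

    static LIST_HEAD(restart_list);         /* devices whose queues stalled on -EAGAIN */

    struct ubd {                            /* only the fields used in this sketch */
            struct list_head restart;
            struct request_queue *queue;
            spinlock_t lock;
    };

    static void do_ubd_request(struct request_queue *q);   /* the existing request handler */

    /* Request handler path, on -EAGAIN from the host: park the device. */
    static void ubd_stall(struct ubd *dev)
    {
            list_add_tail(&dev->restart, &restart_list);
    }

    /* Interrupt handler, after draining some completions: rerun the stalled
     * queues so they can push more requests to the host. */
    static void ubd_restart_stalled(void)
    {
            struct ubd *dev, *next;

            list_for_each_entry_safe(dev, next, &restart_list, restart) {
                    list_del_init(&dev->restart);
                    spin_lock(&dev->lock);
                    do_ubd_request(dev->queue);
                    spin_unlock(&dev->lock);
            }
    }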
From: Blaisorblade <bla...@ya...> - 2005-09-28 12:15:17
|
On Tuesday 27 September 2005 20:12, Jeff Dike wrote:
> Attached is my current set of I/O patches. It's significantly different, and simpler, than my previous set.
>
> Now, there is no sleeping while running the queue, so the spinlock is never dropped, and all of the extra synchronization is gone.

Very, very nice.

What about the early removal of the request from the queue, before being sure we can complete it? (Sorry for not reading everything below carefully - I'm going to have lunch and afterwards get back to studying it.)

> o_direct adds O_DIRECT support to UML and makes the ubd driver use it.
>
> aio-errors makes submit_aio_24 return -errno instead of -1.

Reorder o_direct to the end - no time to read them yet, but I guess that the more intrusive patch goes at the end. And I'm not sure whether merging o_direct so late in 2.6.14 is nice (even if you've tested the full patch set, and without o_direct the thing makes less sense).

Let's hope for the best, but next time the debugging should be done *before* merging, ok? Also because, for instance, if (say) the patch had been ready for 2.6.14 and merged into 2.6.15, I could have released a test tree against 2.6.14 (maybe I could do this now).

> I've given this a bunch of testing and it has survived overnight kernel build loops on both x86 and x86_64.
>
> The attached patches are against 2.6.14-rc2.

Please send them to Andrew for -mm, and say "2.6.14 can't go without them, but they might need further review". And ask Jens Axboe if he can take a look at them (this time it's less necessary, since there are fewer dirty tricks).

> Jeff

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade |
From: Jeff D. <jd...@ad...> - 2005-09-28 16:18:53
|
On Wed, Sep 28, 2005 at 02:14:32PM +0200, Blaisorblade wrote:

> Very, very nice.
>
> What about the early removal of the request from the queue, before being sure we can complete it? (Sorry for not reading everything below carefully - I'm going to have lunch and afterwards get back to studying it.)

I think that doesn't matter too much. I have to get the sg information out of the request, and store that, plus the current position within the sg array if we couldn't send it all to the host. Since I don't refer to the request after I get the sg information out of it, I might as well remove it from the queue.

> Reorder o_direct to the end - no time to read them yet, but I guess that the more intrusive patch goes at the end. And I'm not sure whether merging o_direct so late in 2.6.14 is nice (even if you've tested the full patch set, and without o_direct the thing makes less sense).

Yeah, I wasn't planning on sending that for 2.6.14. It was just part of that set of patches.

> Let's hope for the best, but next time the debugging should be done *before* merging, ok? Also because, for instance, if (say) the patch had been ready for 2.6.14 and merged into 2.6.15, I could have released a test tree against 2.6.14 (maybe I could do this now).

I thought I did. AIO had been in my tree forever with no apparent problems. I recently started running 24 hour/day stress tests on UML, and that turned up some problems. People turning on spinlock debugging turned up the sleeping-while-atomic problem. And me staring at the code as a result of that turned up the possible deadlock.

> Please send them to Andrew for -mm, and say "2.6.14 can't go without them, but they might need further review".
>
> And ask Jens Axboe if he can take a look at them (this time it's less necessary, since there are fewer dirty tricks).

Yeah. This still needs some work. I need to deal with the bitmap array properly.

Jeff |
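A sketch of saving the in-flight request's scatter/gather state in the device structure, as described above. The field names and the sg array size are illustrative assumptions, not taken from the patches.

    #include <linux/blkdev.h>
    #include <asm/scatterlist.h>

    #define UBD_MAX_SG 64                   /* assumed sg array size */

    struct ubd {                            /* only the fields used in this sketch */
            struct request *request;        /* request currently being pushed out */
            struct scatterlist sg[UBD_MAX_SG];
            int start_sg;                   /* next sg entry to send to the host */
            int end_sg;                     /* number of mapped sg entries */
    };

    /* Called from the request handler once the request is dequeued: map its
     * sg entries and remember how far submission has gotten, so the
     * interrupt handler can pick up at start_sg after an -EAGAIN. */
    static void ubd_save_request(struct ubd *dev, struct request_queue *q,
                                 struct request *req)
    {
            dev->request = req;
            dev->end_sg = blk_rq_map_sg(q, req, dev->sg);
            dev->start_sg = 0;
    }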
From: Blaisorblade <bla...@ya...> - 2005-09-28 16:48:13
|
On Wednesday 28 September 2005 17:54, Jeff Dike wrote:
> On Wed, Sep 28, 2005 at 02:14:32PM +0200, Blaisorblade wrote:
> > Very, very nice.
> > Let's hope for the best, but next time the debugging should be done *before* merging, ok? Also because, for instance, if (say) the patch had been ready for 2.6.14 and merged into 2.6.15, I could have released a test tree against 2.6.14 (maybe I could do this now).
>
> I thought I did. AIO had been in my tree forever with no apparent problems.

We're not currently getting a lot of testing on that. Apart from the current "page not uptodate" issue, users mostly ran that tree because of its x86_64 support, which is now reliable in mainline. In fact, the SKAS0 compiler-dependent bug (the asm code was popping a fixed number of words, which was wrong with a different compiler) only came out on the tree I released against 2.6.12...

> I recently started running 24 hour/day stress tests on UML, and that turned up some problems. People turning on spinlock debugging turned up the sleeping-while-atomic problem. And me staring at the code as a result of that turned up the possible deadlock.
>
> > And ask Jens Axboe if he can take a look at them (this time it's less necessary, since there are fewer dirty tricks).
>
> Yeah. This still needs some work. I need to deal with the bitmap array properly.
> Jeff

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade |