From: Jeff Dike <jdike@ka...> - 2002-02-01 06:06:28
There is an SMP SIGIO handling bug in Linux (on the host) that is causing
UML IO to hang. This is most prevalent with disk IO, but can be seen
with network traffic as well.
The basic problem is that SIGIO delivery and F_SETOWN are not protected
by the same lock. This means that SIGIO can be delivered to a process
after it has returned from the fcntl that passed the SIGIO off to another
process.
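
For anyone unfamiliar with the calls involved, here is a minimal sketch of
the hand-off pattern in question. It is not the UML code; the function names
(sigio_set_owner, sigio_hand_off, sigio_get_owner) are made up for
illustration, and only the fcntl commands themselves are real. The point is
that once sigio_hand_off() returns, the caller assumes it will get no more
SIGIO for that descriptor, and on an SMP host that assumption can fail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for O_ASYNC on glibc */
#endif
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: turn on SIGIO generation for fd and direct it
 * to `owner`.  Returns 0 on success, -1 on error.
 */
int sigio_set_owner(int fd, pid_t owner)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_ASYNC) < 0)
        return -1;
    return fcntl(fd, F_SETOWN, owner) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: pass SIGIO for fd off to another process.
 * After this returns, the caller expects SIGIO for fd to go only to
 * new_owner.  The host bug described above is that a SIGIO already
 * in flight on another CPU can still land on the old owner.
 */
int sigio_hand_off(int fd, pid_t new_owner)
{
    return fcntl(fd, F_SETOWN, new_owner) < 0 ? -1 : 0;
}

/* Hypothetical helper: report who currently owns SIGIO for fd. */
pid_t sigio_get_owner(int fd)
{
    return fcntl(fd, F_GETOWN);
}
```

Checking the owner with F_GETOWN after the hand-off will show the new
process, which is exactly why this is hard to spot from userspace: the
ownership change has happened, but the signal delivery was not serialized
against it.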
There is nothing I can do in UML to fix this. There is a possible workaround,
but it only narrows the race; it doesn't close it. The disk IO problem
was made worse by my switching from a pts device to a socket for communication
between the IO thread and UML. If you build UML from source, you can
greatly alleviate this problem by switching back. In
arch/um/drivers/ubd_user.c, there are two versions of start_io_thread.
The one that's ifdef-ed out (contrary to the claim I made in the -10 changelog)
is the comparatively reliable pts one. Ifdef out the socket one above it,
remove the ifdef around the pts one, rebuild, and UML will work a lot better.
It may not work perfectly. The bug will still be there and you may still
be able to exercise it.
I'll be working up a patch to the host and I'll make it available from
the site when it's ready.
Thanks to jbearce for figuring out which UML patch apparently introduced
the problem and confirming which part of that patch did it. Also, thanks
to blinky, who let me stare at the problem on his box tonight, which let
me figure out what was going on.