There appears to be a problem in the function
lu_getslot() in file kernel/Linux/2.4/proc.c, which
becomes evident if LUFS is used to implement a
filesystem that performs potentially time-consuming
network operations.

This function first appears to try to find a slot that
was previously used for the same PID. It seems to
expect that such a slot will always be unlocked, though
there appears to be a scenario involving inode
refreshing in which the function is invoked
re-entrantly and the "oops! I still hold the lock!"
message appears. That case is handled OK, though.

The problem arises when no unlocked slot for the same
PID is found and the first entry for another PID
(nd_best) is taken instead. If that slot is already
locked and performing a slow operation, the current
operation will block on it rather than taking another
slot further down the list that might be free.
Furthermore, because slots aren't rotated to the tail
of the list until they have been locked, all subsequent
operations will also wait on the same slot until its
operation completes.

I think that fixing this requires the loop that sets
nd_best to try locking each candidate, remembering
whether the attempt succeeds and moving on to
subsequent slots if it does not, so that an unlocked
slot is chosen in preference to a locked one. Then, if
a slot with a matching PID is found and taken in
preference, the already-locked nd_best candidate can be
unlocked again.
However, I'm not 100% sure that I've found the best
fix, so I haven't attempted it yet.
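
To make the idea above concrete, here is a rough user-space sketch in C.
It is not the LUFS code and has not been tried against it: struct slot,
pick_slot() and the use of pthread mutexes are all hypothetical
stand-ins for the real slot list and whatever lock the kernel code
uses. It only shows the selection order: try-lock candidates instead of
blocking on the first one, and release the provisional candidate if a
slot with a matching PID turns up later.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a slot; not the real LUFS structure. */
struct slot {
    pthread_mutex_t lock;
    int pid;                /* PID that last used this slot */
};

/*
 * Return a slot that is locked by the caller.
 * Preference order (an assumption based on the report above):
 *   1. an unlocked slot last used by the same PID,
 *   2. the first unlocked slot belonging to another PID,
 *   3. only if every slot is busy, block on the first slot.
 */
struct slot *pick_slot(struct slot *slots, size_t n, int pid)
{
    struct slot *best = NULL;   /* provisional candidate, held locked */
    size_t i;

    assert(n > 0);

    for (i = 0; i < n; i++) {
        struct slot *s = &slots[i];

        if (s->pid == pid) {
            if (pthread_mutex_trylock(&s->lock) == 0) {
                /* Matching PID wins: give back the provisional slot. */
                if (best)
                    pthread_mutex_unlock(&best->lock);
                return s;
            }
            continue;           /* same PID but busy: keep looking */
        }

        /* Take the first slot that can be locked without blocking. */
        if (!best && pthread_mutex_trylock(&s->lock) == 0)
            best = s;
    }

    if (!best) {
        /* Everything is busy, so waiting is unavoidable. */
        best = &slots[0];
        pthread_mutex_lock(&best->lock);
    }

    best->pid = pid;
    return best;
}
```

The caller would unlock the returned slot when its operation finishes.
Rotating a used slot to the tail of the list, which the real function
apparently does, is left out of this sketch.
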
Logged In: YES
user_id=2063
What would the effects be for the end user? What
actually happens when the bug is triggered?