From: Thomas H. <th...@tu...> - 2008-04-24 09:30:35
|
There is a possibility of a deadlock in the current Intel superioctl path, which can be illustrated by two contexts rendering simultaneously from the same textures: one context using fallback rendering, the other using the GPU.

Context 1 will start mapping the texture buffers; Context 2 will take the cmdbuf mutex and start validating the same texture buffers. Now if they end up having the buffer lists reversed, context 1 might end up waiting to map a buffer that context 2 has validated for the GPU, while context 2 will wait for a buffer that context 1 has mapped => deadlock.

One way around this is to use the hardware lock around all buffer mapping (including client buffer object mapping) and command submission. I believe the old i915tex driver did this to a large extent; I'm not sure what the current i915 driver does. Anyway, we need a more fine-grained approach. So the idea will be for the execbuf ioctl to back off if it encounters a mapped buffer, using something like the following "pre-validate" functionality to make sure all buffers are unmapped before we run the full validation cycle:

    block_other_validators();
    while (1) {
            for (i = 0; i < num_buffers; ++i) {
                    if (buffer[i]->mapped)
                            goto backoff;
                    buffer[i]->validated = 1;
            }
            break;
    backoff:
            for (j = 0; j < i; ++j)
                    buffer[j]->validated = 0;
            unblock_other_validators();
            err = wait_for_unmapped(buffer[i]);
            if (err)
                    return err;
            block_other_validators();
            /*
             * Restart from the beginning.
             */
    }

Unfortunately, pre-validation won't work unless post-relocs are used, which means the full validation cycle needs to be aborted and rerun if we encounter a mapped buffer; typically when a DRI client needs to wait for the X server to unmap the front buffer after a software fallback. I'll at least try to fix this up in the post-reloc case.

/Thomas
|
From: Eric A. <er...@an...> - 2008-04-30 18:29:28
|
On Thu, 2008-04-24 at 11:30 +0200, Thomas Hellström wrote:
> There is a possibility of a deadlock in the current Intel superioctl path,
> which can be illustrated by two contexts rendering simultaneously from
> the same textures,
> One context using fallback rendering, the other using the GPU.
>
> Context 1 will start mapping the texture buffers, Context 2 will take
> the cmdbuf mutex and start validating the same texture buffers.
> Now if they end up having the buffer lists reversed, context 1 might end
> up waiting to map a buffer that context2 has validated for the GPU,
> while context2 will wait for a buffer that context1 has mapped => deadlock.
>
> One way around this is to use the hardware lock around all buffer
> mapping (including client buffer object mapping) and command submission,
> I believe the old i915tex driver did this to a large extent. I'm not
> sure what the current i915 driver does. Anyway, we need a more
> fine-grained approach.

In master we hold the lock around execbuffer. Is getting multiple cpus
in the validate path a bottleneck, really, where a finer-grained
approach is needed?

--
Eric Anholt anholt@FreeBSD.org er...@an... eri...@in...
|
From: Thomas H. <th...@tu...> - 2008-04-30 18:54:13
|
Eric Anholt wrote:
> On Thu, 2008-04-24 at 11:30 +0200, Thomas Hellström wrote:
>
>> There is a possibility of a deadlock in the current Intel superioctl path,
>> which can be illustrated by two contexts rendering simultaneously from
>> the same textures,
>> One context using fallback rendering, the other using the GPU.
>>
>> Context 1 will start mapping the texture buffers, Context 2 will take
>> the cmdbuf mutex and start validating the same texture buffers.
>> Now if they end up having the buffer lists reversed, context 1 might end
>> up waiting to map a buffer that context2 has validated for the GPU,
>> while context2 will wait for a buffer that context1 has mapped => deadlock.
>>
>> One way around this is to use the hardware lock around all buffer
>> mapping (including client buffer object mapping) and command submission,
>> I believe the old i915tex driver did this to a large extent. I'm not
>> sure what the current i915 driver does. Anyway, we need a more
>> fine-grained approach.
>>
>
> In master we hold the lock around execbuffer. Is getting multiple cpus
> in the validate path a bottleneck, really, where a finer-grained
> approach is needed?
>

I think in most of the cases we're dealing with today, a single thread
in the execbuf path is sufficient.

However, I think if you use the HW lock to resolve this deadlock you'll
also need the hw lock around all code paths where you have multiple
buffers mapped simultaneously. Since mapping may require a (possibly
lengthy) wait for idle, one needs to be careful to try to idle all
buffers before taking the hardware lock.

Anyway, regardless of whether we can work around this using the hardware
lock in current drivers, I think a more general solution would need to
allow multiple threads into the validate path, and also allow lock-free
operation where possible. It's a problem that isn't too hard to solve.

/Thomas
|
From: K. H. <kr...@bi...> - 2008-04-30 20:56:30
|
On Wed, Apr 30, 2008 at 2:53 PM, Thomas Hellström <th...@tu...> wrote:
> Anyway, regardless whether we can work around this using the hardware
> lock in current drivers, I think a more general solution would need to
> allow multiple threads into the validate path, and also allowing
> lock-free operation where possible. It's a problem that isn't too hard
> to solve.

I agree, it should definitely be doable.

Kristian
|