From: Dave A. <ai...@gm...> - 2010-03-07 07:47:52
Attachments:
cached-bufmgr-busy.patch
I've been playing with strategies to make r300 buffer management a bit more efficient. I've reworked the r300g screen/winsys to create a pb_bufmgr-compliant buffer class, and stacked the cached buffer manager on top of it. Now I've hit a problem: we expose buffer busy state, but the cached bufmgr doesn't know to check whether a buffer is busy before reusing it. The attached patch adds a buffer-busy query to pb_bufmgr.h. I've wondered whether I could do this with some hacked-up fences, but it seems like a lot more work for not much gain.

Dave.
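Roughly, the idea is one extra hook on the pb_manager vtable; a minimal sketch of what that could look like (the names here are illustrative and may not match the attached patch exactly):

   /* Sketch only: an extra query on the manager, alongside the existing
    * create_buffer/flush/destroy hooks.  TRUE means the GPU still owns
    * the buffer and it is not safe to hand it out again. */
   struct pb_manager
   {
      /* ...existing hooks... */

      boolean
      (*is_buffer_busy)( struct pb_manager *mgr,
                         struct pb_buffer *buf );
   };

   /* pb_cache would then ask the provider is_buffer_busy() before
    * recycling a buffer from its delayed-destruction list. */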
From: Luca B. <luc...@gm...> - 2010-03-07 11:44:48
I think you're supposed to do this using the fenced bufmgr stacked over cached, along with an (ideally userspace) fencing mechanism. If you can implement pb_busy, you should be able to implement fence_signalled in exactly the same way (making the fence handle a pointer to the buffer should work for this purpose, if standalone fences are hard to do). The fenced bufmgr only passes destruction requests to the wrapped bufmgr once the fences are signalled.
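A rough sketch of the fence-handle-as-buffer-pointer trick, assuming the driver already has a per-buffer busy query and assuming 0 means "signalled" (all names and conventions here are assumptions, not the real fenced bufmgr glue):

   /* The "fence" is just a pointer to the buffer it protects. */
   static void my_fence_reference(struct pb_fence_ops *ops,
                                  struct pipe_fence_handle **ptr,
                                  struct pipe_fence_handle *fence)
   {
      /* A real implementation would also adjust the buffer's refcount. */
      *ptr = fence;
   }

   static int my_fence_signalled(struct pb_fence_ops *ops,
                                 struct pipe_fence_handle *fence,
                                 unsigned flag)
   {
      struct pb_buffer *buf = (struct pb_buffer *)fence;

      /* driver_buffer_is_busy() stands in for whatever busy query the
       * driver exposes; return 0 once the buffer is idle. */
      return driver_buffer_is_busy(buf) ? -1 : 0;
   }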
From: Dave A. <ai...@gm...> - 2010-03-07 20:36:45
On Sun, Mar 7, 2010 at 9:44 PM, Luca Barbieri <luc...@gm...> wrote:
> I think you are supposed to do this using the fenced bufmgr over
> cached along with a (ideally userspace) fencing mechanism.
> If you can implement pb_busy, you should be able to implement
> fence_signalled in exactly the same way (making the fence handle a
> pointer to buffers should work for this purpose, if standalone fences
> are hard to do).
> The fenced bufmgr will only pass destruction requests to the wrapped
> bufmgr once the fences are signalled.

It just seemed a bit heavyweight. I don't want userspace fences, so why do I have to jump through abstraction hoops? The fencing solution isn't nearly as efficient from what I can see, as it is designed around fences rather than buffer busy. I'll see if I can give it a try, but I suspect it'll look and smell like a hack.

Dave.
From: Jose F. <jfo...@vm...> - 2010-03-08 07:51:49
Dave,

I don't oppose this new method -- it shouldn't be necessary to add fencing just to use pb_cache -- but this method adds a new way of doing the same thing.

Does the underlying buffer support PIPE_BUFFER_USAGE_DONT_BLOCK? If so, what about doing:

   boolean
   pb_cache_is_buffer_compat()
   {
      void *map;

      map = pb_map(buf->buffer, PIPE_BUFFER_USAGE_DONT_BLOCK | PIPE_BUFFER);
      if (map) {
         /* TODO: cache the map value for the first map */
         pb_unmap(buf->buffer);
         return TRUE;
      }

      return FALSE;
   }

Jose
From: Luca B. <luc...@gm...> - 2010-03-09 01:33:50
> The fencing solution isn't nearly as efficient from what I can see, as it
> is designed around fences rather than buffer busy. I'll see if I can give
> it a try, but I suspect it'll look and smell like a hack.

The problem I see is that while fenced is guaranteed to make at most one fence_signalled query for a busy buffer per allocation, cached could potentially query the busy status of _all_ destroyed buffers for every allocation.

The reason for this is that fenced orders the buffers in fence order and stops at the first busy one, only handing them to the lower layer (e.g. cached) once they are no longer busy.

If kernel calls are used instead of userspace fencing, the problem probably gets worse.

So IMHO it should be much more efficient to use fenced over cached (perhaps with slab between them too).
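To illustrate the ordering argument, a sketch of the kind of scan the fenced layer does (names are made up and the real fenced_bufmgr code differs; 0 is again assumed to mean "signalled"):

   /* Buffers sit in fence/submission order, so the first still-busy
    * fence ends the scan -- everything after it must still be busy too. */
   static void check_signalled_fences(struct fenced_buffer_list *list,
                                      struct pb_fence_ops *ops)
   {
      while (!list_is_empty(&list->delayed)) {
         struct fenced_buffer *fb = list_first_entry(&list->delayed);

         if (ops->fence_signalled(ops, fb->fence, 0) != 0)
            break;   /* at most one query ever hits a busy buffer */

         list_remove(&fb->head);
         pb_reference(&fb->buffer, NULL);   /* only now does cached see it */
      }
   }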
From: Dave A. <ai...@gm...> - 2010-03-09 06:32:06
On Tue, Mar 9, 2010 at 11:30 AM, Luca Barbieri <luc...@gm...> wrote:
>> The fencing solution isn't nearly as efficient from what I can see, as it
>> is designed around fences rather than buffer busy. I'll see if I can give
>> it a try, but I suspect it'll look and smell like a hack.
>
> The problem I see is that while fenced is guaranteed to make at most
> one fence_signalled query for a busy buffer per allocation, cached
> could potentially query the busy status of _all_ destroyed buffers for
> every allocation.
>
> The reason for this is that fenced orders the buffers in fence order
> and stops at the first busy one, only handing them to the lower layer
> (e.g. cached) once they are no longer busy.
>
> If kernel calls are used instead of userspace fencing, the problem
> probably gets worse.
>
> So IMHO it should be much more efficient to use fenced over cached
> (perhaps with slab between them too).

We can do this optimisation with busy as well: add things to the busy list at the end, and stop testing after the first busy call. At least for a single linear GPU context, which is all I expect this code will ever be handling.

Dave.
From: Luca B. <luc...@gm...> - 2010-03-09 13:42:12
> We can do this optimisation with busy as well: add things to the busy
> list at the end, and stop testing after the first busy call. At least
> for a single linear GPU context, which is all I expect this code will
> ever be handling.

Wouldn't this just end up reinventing the fenced bufmgr?

Basically cached needs a list of all destroyed buffers (ideally in destruction order, so it can do the stopping optimization when expiring buffers), while the busy mechanism needs a list of all used buffers (destroyed or not) in usage order. So it seems it would need two lists, and essentially result in something that replicates fenced inside cached.

BTW, right now I think all drivers use a single GPU context in userspace. Even Nouveau multiplexes Gallium contexts on a single channel (though this is probably broken).
From: Dave A. <ai...@gm...> - 2010-03-23 09:07:55
Attachments:
0001-gallium-add-is_busy-flag-to-cached-bufmgr.patch
On Mon, Mar 8, 2010 at 5:51 PM, Jose Fonseca <jfo...@vm...> wrote:
> Dave,
>
> I don't oppose this new method -- it shouldn't be necessary to add fencing just to use pb_cache -- but this method adds a new way of doing the same thing.
>
> Does the underlying buffer support PIPE_BUFFER_USAGE_DONT_BLOCK? If so, what about doing:
>
>    boolean
>    pb_cache_is_buffer_compat()
>    {
>       void *map;
>
>       map = pb_map(buf->buffer, PIPE_BUFFER_USAGE_DONT_BLOCK | PIPE_BUFFER);
>       if (map) {
>          /* TODO: cache the map value for the first map */
>          pb_unmap(buf->buffer);
>          return TRUE;
>       }
>
>       return FALSE;
>    }

Hi Jose,

I've used your scheme, but I'm not 100% sure of its correctness, since we might expect different behaviour from maps that are referenced in flushed command streams and maps referenced in the current command stream. I've pushed the r300g bits that do it correctly and we haven't seen any regressions.

So my next bit is to bail out after the is_busy check, to avoid kernel overheads from checking all the buffers, but I'm a bit worried it might change behaviour of the fenced/cached use case, so I'd appreciate a bit of a review of it.

If this isn't possible I'm tempted to create pb_bufmgr_busy_cached and do it all there; I think nouveau could use something similar.

Dave.
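The behaviour change being asked about is roughly: when walking the cache's delayed list, give up as soon as one buffer reports busy, assuming the list is kept in destruction order. A sketch with hypothetical names (the attached patch is what actually counts):

   /* Assumes buffers are appended to the delayed list in destruction
    * order, so once one is busy the rest are very likely busy too. */
   static struct pb_buffer *
   pb_cache_try_reuse(struct pb_cache_manager *mgr,
                      pb_size size, unsigned alignment)
   {
      struct pb_cache_buffer *buf;

      LIST_FOR_EACH(buf, &mgr->delayed) {
         if (!size_and_alignment_match(buf, size, alignment))
            continue;

         if (pb_cache_is_buffer_busy(mgr, buf))
            break;   /* new behaviour: stop instead of probing every buffer */

         remove_from_delayed_list(buf);
         return &buf->base;
      }

      return NULL;   /* fall back to allocating a fresh buffer */
   }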
From: José F. <jfo...@vm...> - 2010-03-29 14:10:45
On Tue, 2010-03-23 at 02:07 -0700, Dave Airlie wrote:
> On Mon, Mar 8, 2010 at 5:51 PM, Jose Fonseca <jfo...@vm...> wrote:
> > Dave,
> >
> > I don't oppose this new method -- it shouldn't be necessary to add fencing just to use pb_cache -- but this method adds a new way of doing the same thing.
> >
> > Does the underlying buffer support PIPE_BUFFER_USAGE_DONT_BLOCK? If so, what about doing:
> >
> >    boolean
> >    pb_cache_is_buffer_compat()
> >    {
> >       void *map;
> >
> >       map = pb_map(buf->buffer, PIPE_BUFFER_USAGE_DONT_BLOCK | PIPE_BUFFER);
> >       if (map) {
> >          /* TODO: cache the map value for the first map */
> >          pb_unmap(buf->buffer);
> >          return TRUE;
> >       }
> >
> >       return FALSE;
> >    }
>
> Hi Jose,
>
> I've used your scheme, but I'm not 100% sure of its correctness, since
> we might expect different behaviour from maps that are referenced in
> flushed command streams and maps referenced in the current command
> stream. I've pushed the r300g bits that do it correctly and we haven't
> seen any regressions.

Hi Dave,

I'm fine with adding an is_busy method as in your original implementation if you find it useful. And we could provide a default implementation that uses a PIPE_BUFFER_USAGE_DONT_BLOCK map/unmap to check.

> So my next bit is to bail out after the is_busy check, to avoid kernel
> overheads from checking all the buffers, but I'm a bit worried it might
> change behaviour of the fenced/cached use case, so I'd appreciate a bit
> of a review of it.

It looks good to me. It would benefit readability to split the is_busy parts of pb_cache_is_buffer_compat() into a new pb_cache_is_buffer_busy() function, but that is purely cosmetic.

> If this isn't possible I'm tempted to create pb_bufmgr_busy_cached and
> do it all there; I think nouveau could use something similar.

I don't know if it's still relevant given your patch looks good to me, but if so, what would this pb_bufmgr_busy_cached do exactly?

Jose
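The cosmetic split Jose suggests would look roughly like this -- a sketch under the assumption that the provider may or may not implement the new is_buffer_busy hook:

   static boolean
   pb_cache_is_buffer_busy(struct pb_cache_manager *mgr,
                           struct pb_cache_buffer *buf)
   {
      void *map;

      /* Prefer the explicit query if the underlying manager has one. */
      if (mgr->provider->is_buffer_busy)
         return mgr->provider->is_buffer_busy(mgr->provider, buf->buffer);

      /* Otherwise fall back to the non-blocking map probe. */
      map = pb_map(buf->buffer, PIPE_BUFFER_USAGE_DONT_BLOCK);
      if (!map)
         return TRUE;
      pb_unmap(buf->buffer);
      return FALSE;
   }

   /* pb_cache_is_buffer_compat() would then just do
    *    if (pb_cache_is_buffer_busy(mgr, buf))
    *       return FALSE;
    * before its size/alignment checks. */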
From: Luca B. <luc...@gm...> - 2010-03-23 13:41:43
Do Radeons have a CP command to write an arbitrary value to some place in memory?

If so, you may want to use that to implement userspace-accessible fencing in the obvious way, and then use the fenced bufmgr, which would get you this with no pipebuffer changes and no kernel calls.

Otherwise, maybe change the kernel module to write a fence number to an mmapable location on fence IRQs?
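The "obvious way" would be something like the following: each flush emits an incrementing sequence number that the CP (or the kernel's fence IRQ handler) writes to a location userspace has mapped, and checking a fence is then just a memory read and a compare. Purely a sketch with made-up names:

   /* fence_page points at memory the GPU/kernel updates with the last
    * completed sequence number; emit_fence_write_packet() is whatever
    * packet or ioctl asks for that write -- both are assumptions. */
   static volatile uint32_t *fence_page;
   static uint32_t next_seqno;

   static uint32_t emit_userspace_fence(void)
   {
      uint32_t seqno = ++next_seqno;
      emit_fence_write_packet(seqno);   /* CP writes seqno to *fence_page */
      return seqno;
   }

   static int userspace_fence_signalled(uint32_t seqno)
   {
      /* No kernel call: just read the mapped location. */
      return *fence_page >= seqno;
   }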
From: Dave A. <ai...@gm...> - 2010-03-23 20:09:14
On Tue, Mar 23, 2010 at 11:41 PM, Luca Barbieri <luc...@gm...> wrote:
> Do Radeons have a CP command to write an arbitrary value to some place
> in memory?
>
> If so, you may want to use that to implement userspace-accessible
> fencing in the obvious way and then use the fenced bufmgr, which would
> do that with no pipebuffer changes and no kernel calls.
>
> Otherwise, maybe change the kernel module to write a fence number in
> an mmapable place on fence irqs?

Why? I don't need any of this. Exposing 32-bit numbers to userspace means you have to make sure everyone correctly deals with wrapping; it's not worth the effort, and I've no idea why you keep pushing for it when I've demonstrated I've no need for it at all.

Dave.
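For reference, the wrap handling being referred to is the usual signed-difference trick, which every consumer of the counter has to remember to use instead of a plain comparison:

   /* Wrap-safe "has seqno passed?" check: correct as long as outstanding
    * fences never span more than half the 32-bit range. */
   static int seqno_passed(uint32_t current, uint32_t seqno)
   {
      return (int32_t)(current - seqno) >= 0;
   }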