From: Andrew J. R. <and...@uc...> - 2001-08-22 16:11:01
On Wednesday 22 August 2001 10:30, Brian Paul wrote:
> Grzegorz Jaskiewicz wrote:
> > On Wed, 22 Aug 2001 07:43:06 -0600, Brian Paul wrote:
> > >Getting back to threaded rasterization, this is something I've
> > >wanted to experiment with myself. Here's something to look into...
> > >
> > >In the latest Mesa 3.5 triangle rasterization code I'm using the
> > >triangle_span struct (defined in src/swrast/s_trispan.h) to describe
> > >the per-scanline interpolated parameters (such as color, Z, fog,
> > >texcoords, etc).
> > >
> > >As it is now, one of these structs is allocated on the stack when the
> > >triangle function is called. The triangle template code computes the
> > >triangle_span values then relies on a specialized macro or function
> > >to compute the incremental values in the span and perform the
> > >per-fragment processing. This organization cleaned things up a lot, BTW.
> > >
> > >One idea I had when I wrote this is have the triangle rasterizer
> > >produce triangle_span structs, put them into a queue, then have a
> > >rasterization thread consume/process those structs in parallel. You
> > >could actually have multiple consumers processing spans in parallel.
> > >Since the scan lines will never overlap (within one triangle) they
> > >could be safely processed simultaneously.
> >
> > It will be great, especially when you have 2 or more CPUs in your machine
> > (like I do) :-). The only problem to solve is that whenever you are
> > trying to access memory in some range and other processor(s) do too, some
> > of them stall execution until they get access to that memory. So it will
> > require some sort of triple buffering.
>
> I don't see how you come to that conclusion. There will always be times
> when different threads (processors) will need to touch the same page of
> (cached) memory. But I don't think it's something to spend too much time
> worrying about up front. (correctness first, optimization later)
>
> I suggested using a queue between the triangle rasterizer and the span
> processor(s), not a two-slot ring, as you seem to imply. A queue will
> absorb some amount of irregularity between the producer/consumer rates.
>
> > Another thing is that many things can be done in parallel (in Mesa),
> > and for Mesa to operate fast it will have to create a few threads at
> > startup and just suspend them until they are needed.
> > If I would like to write it, can you send me the latest version (with
> > your changes)?
>
> You can get the latest code out of CVS at any time.
>

My thoughts are effectively this. Have Mesa (rasterization and T&L) running
on one thread and the client (a game?) on another. One way to do this is to
wrap OpenGL function calls in a special wrapper that passes the calls into a
shared data area, which the other thread reads to get the operations it
should perform. This can be done reasonably easily, but I thought that if the
internals of Mesa were such that just passing a thread the vertex buffer (or
ctx or whatever) and letting it chomp through that was easier, then that
would be the way to do it. The wrapper has the advantage that it allows other
OpenGL implementations to have this facility (I think) (Nvidia??). Load
balancing can occur when needed if the vertex transformation stages are
parallelised too (which I have done a bit of already).

However, all this talk is great, but I'm in the last 3 months of my Ph.D., so
I'm working like buggery on my thesis. After that I'm going to make a
sterling effort to get the SMP stuff working. Grzegorz, are you interested in
helping too? (I'm on a dual PIII and it would be nice to use the other proc
for GL stuff.) This is why I wanted to get rasterisation turned off; I want
to concentrate on benchmarking the vertex transformation stages only.

Andy
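
For illustration only, here is a rough sketch of the wrapper idea described above: the client thread calls a wrapper that drops a command record into a shared, bounded queue, and a second thread pulls the records out and acts on them. It assumes POSIX threads. Every name in it (`cmd_queue`, `wrap_vertex3f`, `gl_worker`, `run_demo`) is made up for the example and is not part of Mesa or OpenGL, and the worker only counts the commands where a real one would make the actual GL call.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical command record: one entry per wrapped GL call. */
enum cmd_op { CMD_VERTEX3F, CMD_QUIT };

struct cmd {
    enum cmd_op op;
    float args[3];
};

#define QUEUE_CAP 64

/* Bounded ring buffer shared between the client and the GL thread. */
struct cmd_queue {
    struct cmd buf[QUEUE_CAP];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void queue_init(struct cmd_queue *q)
{
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_destroy(struct cmd_queue *q)
{
    pthread_cond_destroy(&q->not_full);
    pthread_cond_destroy(&q->not_empty);
    pthread_mutex_destroy(&q->lock);
}

/* Producer side: blocks while the queue is full. */
static void queue_push(struct cmd_queue *q, struct cmd c)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = c;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: blocks while the queue is empty. */
static struct cmd queue_pop(struct cmd_queue *q)
{
    struct cmd c;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    c = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return c;
}

/* Client-side wrapper: stands in for a wrapped glVertex3f(). */
static void wrap_vertex3f(struct cmd_queue *q, float x, float y, float z)
{
    struct cmd c = { CMD_VERTEX3F, { x, y, z } };
    queue_push(q, c);
}

struct worker {
    struct cmd_queue *q;
    long vertices_seen;
};

/* GL thread: drains the queue until it sees CMD_QUIT. */
static void *gl_worker(void *arg)
{
    struct worker *w = arg;
    for (;;) {
        struct cmd c = queue_pop(w->q);
        if (c.op == CMD_QUIT)
            break;
        /* A real worker would issue the GL call here; we only count. */
        w->vertices_seen++;
    }
    return NULL;
}

/* Pushes n vertex commands from the calling thread and returns how
 * many the worker thread processed. */
long run_demo(long n)
{
    struct cmd_queue q;
    struct worker w = { &q, 0 };
    struct cmd quit = { CMD_QUIT, { 0.0f, 0.0f, 0.0f } };
    pthread_t tid;
    long i;

    queue_init(&q);
    pthread_create(&tid, NULL, gl_worker, &w);
    for (i = 0; i < n; i++)
        wrap_vertex3f(&q, (float)i, 0.0f, 0.0f);
    queue_push(&q, quit);
    pthread_join(tid, NULL);
    queue_destroy(&q);
    return w.vertices_seen;
}
```

The same kind of bounded queue could in principle carry span structs from the triangle rasterizer to one or more span-processing threads, which is the queue Brian suggested. The capacity of 64 is arbitrary; a larger queue would absorb more irregularity between the producer and consumer rates at the cost of more memory.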