I haven't looked into the Aqsis code much, but two things that are relatively new in RenderMan are SIMD RSL plugins and deep buffer display driver support.

What I find most interesting about supporting these two features is that they would affect the design of the data flow through each stage of the pipeline.

I can see that the idea of grid point iterators would help structure the data flow to work well in parallel, not just for SIMD but across different threads. Meanwhile, sending raw fragment sample lists to display drivers could help with the problem of bucket filtering overlap, where a bucket is only filtered and sent once all of its fragments are done. Grids of fragments could be prioritised based on which neighbouring buckets have been completed, since those buckets may contain fragments inside the filter width.
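To make the grid point iterator idea a little more concrete, here is a rough C++ sketch of walking a shading grid in fixed-size blocks, so each block can be handed to a SIMD shadeop or to a separate worker thread. GridPoint and GridBlockIterator are invented for illustration, not existing Aqsis types:

#include <algorithm>
#include <cstddef>
#include <vector>

struct GridPoint            // assumed per-vertex shading data
{
    float P[3];             // position
    float Ci[3];            // outgoing colour
};

class GridBlockIterator
{
public:
    GridBlockIterator(std::vector<GridPoint>& points, std::size_t blockSize)
        : m_points(points), m_blockSize(blockSize), m_offset(0) {}

    // Hand back a pointer/count pair for the next block; false when the
    // grid is exhausted.  Each block is independent, so blocks can be
    // dispatched to SIMD lanes or worker threads without locking.
    bool next(GridPoint*& block, std::size_t& count)
    {
        if (m_offset >= m_points.size())
            return false;
        block = &m_points[m_offset];
        count = std::min(m_blockSize, m_points.size() - m_offset);
        m_offset += count;
        return true;
    }

private:
    std::vector<GridPoint>& m_points;
    std::size_t m_blockSize;
    std::size_t m_offset;
};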



Some SIMD PRMan reference

From the 3Delight docs, "Accessing 3Delight’s Deep Buffer":
Contrary to a standard framebuffer (which stores the final color at each pixel), a deep buffer stores a list of surfaces touching a specific pixel. 3Delight provides access to such a buffer with a notable difference: surface lists are provided per sub-sample and not per-pixel. This means that the user has access to the raw, unfiltered, surface lists directly from the render engine. The surface lists, which are called fragment lists in 3Delight, can be accessed using a display driver. By default, all display drivers receive pixels and not fragment lists. To enable fragment lists one has to reply consequently to the PkCookedQuery query, as described in [DspyImageQuery], page 188. If this is done, the display driver will receive the requested lists of fragments instead of a buffer of pixels. The fragments are received through a PtDspyRawData structure. The structure is shown in Listing 9.1.

Some important remarks about fragment lists follow:
1. The lists are sorted in Z. Furthest fragments come first in the list.
2. Lists are truncated at first opaque object unless special culling attributes are set to disable hidden surface removal (see Table 5.5). In general, lists longer than one will contain fragments that are not opaque.
3. The length of each list is not constant per sample, of course.
4. In order to obtain the final color for a pixel, the user must composite the fragments and then filter them.
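As an illustration of remark 4, here is a small C++ sketch of compositing one such per-sample fragment list into a single premultiplied colour. The Fragment struct is a stand-in with assumed fields; the real layout is the PtDspyRawData structure from the 3Delight docs (Listing 9.1). As noted above, the fragments arrive furthest-first:

#include <vector>

struct Fragment        // assumed fields, for illustration only
{
    float color[3];    // premultiplied colour
    float alpha;
    float depth;
};

// Composite a per-sample fragment list (furthest first) with "over".
// The result would then still need to be filtered across samples.
void compositeSample(const std::vector<Fragment>& fragments, float out[4])
{
    out[0] = out[1] = out[2] = out[3] = 0.0f;
    // Walk from far to near: each nearer fragment goes "over" the
    // colour accumulated behind it.
    for (const Fragment& f : fragments)
    {
        for (int c = 0; c < 3; ++c)
            out[c] = f.color[c] + (1.0f - f.alpha) * out[c];
        out[3] = f.alpha + (1.0f - f.alpha) * out[3];
    }
}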

On 08/05/2010, at 1:29 AM, Paul Gregory wrote:

Well, that didn't foster as much discussion as I'd hoped. I've had a little chance to think about some of this, so I'll add my thoughts to the thread.

If we were to identify particular stages of the pipeline that could be implemented in parallel, such as the four described below (tessellate-->shade-->sample-->filter), we could then weight them in terms of the amount of work required. The weighting would probably have to be dynamic, so that as one stage backs up, more weight is assigned to it. The main scheduler would then assign work units from the different stages to the available threads based on priority.

For example, at first, stages 2-4 are idle, so all threads are assigned to tessellation. The result of this stage is that more work units stall in stage 2 waiting for shading, so when a thread has completed its tessellation work unit, it will be assigned to stage 2, and so on as the later stages start to back up.

There would have to be some throttling control over the weights so that we don't suddenly shift all available threads to one stage, but instead maintain a balance.
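As a rough illustration of the dynamic weighting idea (names and structure are purely hypothetical, not proposed Aqsis classes), a scheduler could simply track the backlog per stage and hand each idle thread to the most backed-up stage, with the throttling mentioned above layered on top:

#include <array>
#include <cstddef>

enum Stage { Tessellate = 0, Shade, Sample, Filter, NumStages };

class StageScheduler
{
public:
    // Called by a stage when it queues or finishes work units.
    void addWork(Stage s, std::size_t units)      { m_pending[s] += units; }
    void completeWork(Stage s, std::size_t units) { m_pending[s] -= units; }

    // Pick the stage with the largest backlog; earlier stages win ties so
    // the pipeline keeps being fed.  A real scheduler would damp this
    // choice rather than always chasing the maximum, to avoid shifting
    // every thread to one stage at once.
    Stage nextStageForIdleThread() const
    {
        Stage best = Tessellate;
        for (int s = 0; s < NumStages; ++s)
            if (m_pending[s] > m_pending[best])
                best = static_cast<Stage>(s);
        return best;
    }

private:
    std::array<std::size_t, NumStages> m_pending{};
};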



On Sun, May 2, 2010 at 11:36 PM, Paul Gregory <aqsis1@gmail.com> wrote:

A discussion took place at this week's meeting regarding ideas for approaches to multi-threading. I'll quickly reiterate some of the points made here, with a view to kicking off a detailed discussion that can cover more than the short time we have available in the meeting. We'll review the contributions to this thread at subsequent meetings.

We need to address our inability to make full use of the increasingly common multi-core processors in modern machines. Unfortunately, threading in our current architecture is inherently difficult, so we need to think more broadly: identify the threading bottlenecks in our architecture and how we might adapt the architecture to overcome them. Rather than trying to find ways to force our current approach to work in a threaded environment, we should favor adapting our architecture to something more suitable.

The main point discussed on Sunday was the idea of distributing processing at any point in the pipeline. Previous threading discussions have focused on threaded processing at the bucket level. This approach is problematic because a lot of information is shared between buckets; buckets are not a clean separation point, since primitives cross buckets, grids cross buckets, and so do micropolygons. This results in a lot of potential for blocked threads and/or duplicated work. A better solution is to thread the pipeline. Our pipeline (simplified immensely) is tessellate (split/dice)-->shade-->sample-->filter. There is a lot of potential for parallel processing of those stages, and it was suggested that the processing should be 'lazy', that is, the filtering stage requests more data from the tessellation stage.
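A minimal sketch of what that lazy, pull-driven arrangement might look like, with made-up class names standing in for the real Aqsis pipeline stages: each downstream stage pulls from the one before it, so work is only generated when the filter actually needs more data.

#include <optional>
#include <vector>

struct Grid { std::vector<float> P; std::vector<float> Ci; };

class Tessellator
{
public:
    explicit Tessellator(int numPrims) : m_remaining(numPrims) {}
    // Dice the next primitive into a grid, or return nothing when done.
    std::optional<Grid> pullGrid()
    {
        if (m_remaining-- <= 0) return std::nullopt;
        Grid g;
        g.P.resize(16 * 16 * 3);          // pretend 16x16 grid of points
        return g;
    }
private:
    int m_remaining;
};

class Shader
{
public:
    explicit Shader(Tessellator& t) : m_tess(t) {}
    // Pull a grid from the tessellator and shade it.
    std::optional<Grid> pullShadedGrid()
    {
        auto g = m_tess.pullGrid();
        if (g) g->Ci.assign(g->P.size(), 1.0f);   // trivial stand-in "shading"
        return g;
    }
private:
    Tessellator& m_tess;
};

class Sampler
{
public:
    explicit Sampler(Shader& s) : m_shader(s) {}
    // Called by the filter stage when it needs more coverage for a bucket;
    // returns false once no more geometry is available.
    bool pullMoreSamples()
    {
        auto g = m_shader.pullShadedGrid();
        if (!g) return false;
        // ... sample the grid's micropolygons against the bucket here ...
        return true;
    }
private:
    Shader& m_shader;
};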

Concerns include potential feedback points in the pipeline. If each stage were isolated, life would be simple, but the sampling stage can feed back to the tessellation stage as part of occlusion culling; this needs to be handled carefully, because reducing the efficiency or accuracy of the occlusion processing would lead to over-rendering. Another concern is how the stages are scheduled to threads, and how the load is balanced. It's highly unlikely that each stage will have an equal processing requirement; many scenes, for example, will have a bias towards the shading stage. The scheduling will need to take this into account. Another point for discussion is how threads get allocated to the stages, especially if we have n != 4, where n is the number of threads available in the pool. Can any of the stages be efficiently handled by more than one thread? Can any stage explicitly not be processed by more than one thread?

Hopefully this is enough to get discussion going; let's get as much happening on this topic as possible before next week's meeting. Any and all input is welcomed, and there are no stupid questions (if there are, I'm usually the one asking them, so don't worry).


Paul Gregory


Aqsis-development mailing list