From: Wolfgang W. <ww...@gm...> - 2005-03-18 00:12:53
On Tuesday 15 March 2005 02:39, ajc...@en... wrote:
> I'm not sure whether there is any other benefit to the ray-level
> partitioning except to make it easier to split up the scene database
> when it is too large for a single host.

I guess there is, in case the image is very inhomogeneous, like having
tiny spots which trigger lots of secondary rays. Something which could
happen when rendering an image from space (lots of black background,
some interesting spots).

> Yes, I think their partitioning strategy is only going to work with
> simple primitives. Other stuff will need to be duplicated on all the
> nodes (doesn't seem too harsh a constraint, I don't think we expect
> to have millions of isosurfaces or CSG models).

Well, but one can do CSG operations on meshes when defining
inside/outside of the mesh. IIRC POV-Ray can do that.

> > > If we take this approach to threading, we'll need to worry about
> > > the following access control issues:
> > > 1. Access to mailbox numbers on primitives in the accelerator
> > > 2. Access to a read/write irradiance cache (if we use one)
> > > 3. Writeback to the frame buffer (we can do this in blocks - this
> > > is one reason we should probably use blocks of pixels rather than
> > > rays as the basic unit of parallelism).

> 1. Mailboxes are used to store the ray number on a primitive so that
> while traversing the accelerator we only intersect it once (before
> intersecting any primitive, compare the mailbox number with the
> current ray => if equal, then the primitive has already been tested).

For multi-threading to work, we may store the mailbox number neither in
the accelerator nor in the scene graph. Both should be used read-only by
the rendering threads. It seems to me that the natural way which remains
is to store a list of intersected primitives in the ray (a small sketch
of this follows below). The downside is that this depends a bit on the
accelerator used (a BVH does not need mailbox numbers, I guess...). If
the maximum number of "threads" is known in advance (and this may not be
the case if coroutines are created on demand), then the array method you
mentioned could be used as well.

(Anyway, the above about mailbox numbers is probably not the whole story
because we may want to get all intersections with an object, so some
sort of inside/outside information must be used as well, I guess.)

> [2] ... Depending on how we implement it, it will result in
> nondeterministic image output: eg. a simple race between two threads
> will cause a different image result when one gets the cached value and
> the other needs to create it (or vice-versa).

Well, it is right that this will cause the image to be
non-deterministic. But this is because the irradiance cache is somewhat
flawed by design in this respect: the result depends on the order in
which rays are calculated (and thereby cache entries are generated).

> 3. I mentioned so that we should avoid locking whenever a ray makes a
> contribution back into the buffer.

Don't worry too much about locking. Locking mainly becomes a scalability
problem if the thread holding the lock is likely to hold it for a longer
time. For the above problem there is e.g. a very simple solution which
can work on single pixels as well, by using a private output queue in
each thread: the thread queues 8x8=64 pixels in its private output queue
and then locks once to transfer the complete queue to the framebuffer
handler (this is 4 pointer assignments, hence the lock will not be held
very long). A sketch of that follows below as well.
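
A minimal sketch of the per-ray mailbox idea, just to make it concrete
(Primitive, Ray and the member names are made up, not our actual
classes): the accelerator stays read-only and each ray carries the small
set of primitives it has already tested.

// Hypothetical sketch: per-ray mailbox instead of a mailbox number
// stored on the primitive. Accelerator and scene graph stay read-only;
// each ray remembers the primitives it has already been tested against.

#include <vector>
#include <algorithm>

struct Primitive;   // whatever our primitive base class ends up being

struct Ray
{
    // ... origin, direction, etc. ...
    std::vector<const Primitive*> tested;   // usually only a handful

    bool AlreadyTested(const Primitive *p) const
    {
        return std::find(tested.begin(), tested.end(), p) != tested.end();
    }
    void MarkTested(const Primitive *p)
    {
        tested.push_back(p);
    }
};

// In the traversal loop (accelerator untouched):
//   if(!ray.AlreadyTested(prim))
//   {
//       ray.MarkTested(prim);
//       // ...intersect prim with ray...
//   }

A plain linear search is probably fine here because a ray rarely visits
more than a few primitives twice; the point is only that no shared state
is written.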
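
And a rough sketch of the private output queue, again with made-up names
(Pixel, PixelBlock, FrameBufferHandler) and std::mutex used only for
illustration; the lock is taken once per 64 pixels and only pointers
change hands while it is held.

// Hypothetical sketch: each thread fills a private block of 8x8=64
// pixels and hands the whole block to the framebuffer handler under
// one short lock (essentially just pointer assignments).

#include <mutex>

struct Pixel { int x, y; float r, g, b; };

struct PixelBlock
{
    Pixel       pix[64];
    int         n    = 0;
    PixelBlock *next = nullptr;
};

class FrameBufferHandler
{
    std::mutex  lock;
    PixelBlock *queue = nullptr;   // singly-linked list of finished blocks
public:
    void Submit(PixelBlock *blk)   // called by rendering threads
    {
        std::lock_guard<std::mutex> g(lock);   // held very briefly
        blk->next = queue;
        queue     = blk;
    }
};

// Per-thread usage:
//   block->pix[block->n++] = pixel;
//   if(block->n == 64) { fb.Submit(block); block = new PixelBlock; }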
BTW, thanks for pointing me at the paper of Christensen (ray
differentials). Although I find the "original" Igehy paper more
informative concerning ray differential computation, the former has some
more information about Kilauea. Seems like they tessellate everything
beforehand. This (IMO) means that
1. they do not have a problem splitting the scene (it's just meshes)
2. all ray intersections take about equally long.

Concerning (2): If Kilauea (as described) had to calculate isosurface
intersections as well, those boxes which happen to hold the part of the
scene without any isosurface objects will in the end sit idle, waiting
for all the computers which still have slow isosurface intersections to
calculate. This is due to the way they divide up the scene, but they
probably do not see the problem because they would tessellate everything
first in any case.

One other thing which is not described in the paper is how the
"first-come-first-served" works. If box A wants to have a ray traced and
sends a message to all the other boxes, how can boxes C and D know that
B accepted it without introducing a race? I see only 2 solutions:
1. Use one server S which keeps track of all rays. A sends the ray to S
   which sends it to B.
2. A directly sends the ray to B without telling anybody else.
Unfortunately, 1 introduces a bottleneck: the server S (a toy sketch is
in the PS below). Fortunately, I already have an idea how one could do
2. (later)

Wolfgang
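
PS: just to make race-free variant 1 concrete, a toy sketch (all names
hypothetical, load balancing is simply "least loaded box"). It is
race-free because only S decides which box gets a ray, which is exactly
why S ends up as the bottleneck.

// Toy sketch of variant 1: one server S keeps track of all rays.
// Box A sends the ray to S; S forwards it to exactly one box B, so
// C and D never need to know. All state lives in S -> bottleneck.

#include <vector>
#include <cstddef>

struct RayJob { int id; /* ray data, origin box, ... */ };

class RayServer
{
    std::vector<int> outstanding;   // rays currently assigned per box
public:
    explicit RayServer(std::size_t nboxes) : outstanding(nboxes, 0) {}

    // Called when some box A submits a ray; returns the box B that
    // shall trace it (here simply the least loaded one).
    std::size_t Assign(const RayJob &)
    {
        std::size_t best = 0;
        for(std::size_t i = 1; i < outstanding.size(); ++i)
            if(outstanding[i] < outstanding[best]) best = i;
        ++outstanding[best];
        return best;
    }

    void Done(std::size_t box) { --outstanding[box]; }
};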