From: Wolfgang W. <ww...@gm...> - 2005-03-18 00:12:53
On Tuesday 15 March 2005 02:39, ajc...@en... wrote:
> I'm not sure whether there is any other benefit to the ray-level
> partitioning except to make it easier to split up the scene database
> when it is too large for a single host.

I guess there is, in case the image is very inhomogeneous, like having
tiny spots which trigger lots of secondary rays. Something which could
happen when rendering an image from space (lots of black background,
some interesting spots).

> Yes, I think their partitioning strategy is only going to work with
> simple primitives. Other stuff will need to be duplicated on all the
> nodes (doesn't seem too harsh a constraint, I don't think we expect
> to have millions of isosurfaces or CSG models).

Well, but one can do CSG operations on meshes when defining
inside/outside of the mesh. IIRC POV-Ray can do that.

> > > If we take this approach to threading, we'll need to worry about
> > > the following access control issues:
> > > 1. Access to mailbox numbers on primitives in the accelerator
> > > 2. Access to a read/write irradiance cache (if we use one)
> > > 3. Writeback to the frame buffer (we can do this in blocks - this
> > > is one reason we should probably use blocks of pixels rather than
> > > rays as the basic unit of parallelism).

> 1. Mailboxes are used to store the ray number on a primitive so that
> while traversing the accelerator we only intersect it once (before
> intersecting any primitive, compare the mailbox number with the
> current ray => if equal, then the primitive has already been tested).

For multi-threading to work, we may store the mailbox number neither in
the accelerator nor in the scene graph. Both should be used read-only by
the rendering threads. It seems to me that the natural way which remains
is to store a list of intersected primitives in the ray (a small sketch
of this follows below). The downside is that this depends a bit on the
accelerator used (a BVH does not need mailbox numbers, I guess...). If
the maximum number of "threads" is known in advance (and this may not be
the case if coroutines are created on demand), then the array method you
mentioned could be used as well.

(Anyway, the above about mailbox numbers is probably not the whole story
because we may want to get all intersections with an object, so some
sort of inside/outside information must be used as well, I guess.)

> [2] ... Depending on how we implement it, it will result in
> nondeterministic image output: eg. a simple race between two threads
> will cause a different image result when one gets the cached value and
> the other needs to create it (or vice-versa).

Well, it is right that this will cause the image to be
non-deterministic. But this is because the irradiance cache is somewhat
flawed by design in this respect: the result depends on the order in
which rays are calculated (and thereby cache entries are generated).

> 3. I mentioned so that we should avoid locking whenever a ray makes a
> contribution back into the buffer.

Don't worry too much about locking. Locking mainly becomes a scalability
problem if the thread holding the lock is likely to hold it for a longer
time. For the above problem there is e.g. a very simple solution which
can work on single pixels as well, by using a private output queue in
each thread: the thread queues 8x8=64 pixels in its private output queue
and then locks once to transfer the complete queue to the framebuffer
handler (this is 4 pointer assignments, hence the lock will not be held
very long). A sketch of that follows below as well.
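
A minimal sketch of the per-ray mailbox idea, just to make it concrete
(Primitive, Ray and the member names are made up, not our actual
classes): the accelerator stays read-only and each ray carries the small
set of primitives it has already tested.

// Hypothetical sketch: per-ray mailbox instead of a mailbox number
// stored on the primitive. Accelerator and scene graph stay read-only;
// each ray remembers the primitives it has already been tested against.

#include <vector>
#include <algorithm>

struct Primitive;   // whatever our primitive base class ends up being

struct Ray
{
    // ... origin, direction, etc. ...
    std::vector<const Primitive*> tested;   // usually only a handful

    bool AlreadyTested(const Primitive *p) const
    {
        return std::find(tested.begin(), tested.end(), p) != tested.end();
    }
    void MarkTested(const Primitive *p)
    {
        tested.push_back(p);
    }
};

// In the traversal loop (accelerator untouched):
//   if(!ray.AlreadyTested(prim))
//   {
//       ray.MarkTested(prim);
//       // ...intersect prim with ray...
//   }

A plain linear search is probably fine here because a ray rarely visits
more than a few primitives twice; the point is only that no shared state
is written.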
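
And a rough sketch of the private output queue, again with made-up names
(Pixel, PixelBlock, FrameBufferHandler) and std::mutex used only for
illustration; the lock is taken once per 64 pixels and only pointers
change hands while it is held.

// Hypothetical sketch: each thread fills a private block of 8x8=64
// pixels and hands the whole block to the framebuffer handler under
// one short lock (essentially just pointer assignments).

#include <mutex>

struct Pixel { int x, y; float r, g, b; };

struct PixelBlock
{
    Pixel       pix[64];
    int         n    = 0;
    PixelBlock *next = nullptr;
};

class FrameBufferHandler
{
    std::mutex  lock;
    PixelBlock *queue = nullptr;   // singly-linked list of finished blocks
public:
    void Submit(PixelBlock *blk)   // called by rendering threads
    {
        std::lock_guard<std::mutex> g(lock);   // held very briefly
        blk->next = queue;
        queue     = blk;
    }
};

// Per-thread usage:
//   block->pix[block->n++] = pixel;
//   if(block->n == 64) { fb.Submit(block); block = new PixelBlock; }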
BTW, thanks for pointing me at the paper of Christensen (ray
differentials). Although I find the "original" Igehy paper more
informative concerning ray differential computation, the former has some
more information about Kilauea. Seems like they tessellate everything
beforehand. This (IMO) means that
1. they do not have a problem splitting the scene (it's just meshes)
2. all ray intersections take about equally long.

Concerning (2): If Kilauea (as described) had to calculate isosurface
intersections as well, those boxes which happen to hold the part of the
scene without any isosurface objects will in the end sit idle, waiting
for all the computers which still have slow isosurface intersections to
calculate. This is due to the way they divide up the scene, but they
probably do not see the problem because they would tessellate everything
first in any case.

One other thing which is not described in the paper is how the
"first-come-first-served" works. If box A wants to have a ray traced and
sends a message to all the other boxes, how can boxes C and D know that
B accepted it without introducing a race? I see only 2 solutions:
1. Use one server S which keeps track of all rays. A sends the ray to S
   which sends it to B.
2. A directly sends the ray to B without telling anybody else.
Unfortunately, 1 introduces a bottleneck: the server S (a toy sketch is
in the PS below). Fortunately, I already have an idea how one could do
2. (later)

Wolfgang
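
PS: just to make race-free variant 1 concrete, a toy sketch (all names
hypothetical, load balancing is simply "least loaded box"). It is
race-free because only S decides which box gets a ray, which is exactly
why S ends up as the bottleneck.

// Toy sketch of variant 1: one server S keeps track of all rays.
// Box A sends the ray to S; S forwards it to exactly one box B, so
// C and D never need to know. All state lives in S -> bottleneck.

#include <vector>
#include <cstddef>

struct RayJob { int id; /* ray data, origin box, ... */ };

class RayServer
{
    std::vector<int> outstanding;   // rays currently assigned per box
public:
    explicit RayServer(std::size_t nboxes) : outstanding(nboxes, 0) {}

    // Called when some box A submits a ray; returns the box B that
    // shall trace it (here simply the least loaded one).
    std::size_t Assign(const RayJob &)
    {
        std::size_t best = 0;
        for(std::size_t i = 1; i < outstanding.size(); ++i)
            if(outstanding[i] < outstanding[best]) best = i;
        ++outstanding[best];
        return best;
    }

    void Done(std::size_t box) { --outstanding[box]; }
};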