From: Wolfgang W. <ww...@gm...> - 2005-03-14 23:00:58
> (If you don't have ACM access I can mail you the paper)

Well, it turned out that we do not have ACM in the list of subscribed journals at the LMU, so I walked over to the guys at the TUM to get an account there (something I should have done earlier anyway).

The design seems to be quite a good one. In particular, I like the idea of using and passing around rays and photons, because the simplicity with which the massive parallelism is achieved is appealing. Furthermore, this design is highly parallel without resorting to a "hard-core" stream design with small kernels for everything, and is hence closer to a traditional sequential renderer and also easier to implement. Actually, it is rather like what I dreamt of being able to do some months ago: dividing up the work at the ray level.

However, as usual, people do not comment on the interesting details. Consider a scene which consists of only one very large CSG tree. Can it be split without having to copy some non-leaf nodes? If not, a uniform distribution would cause a lot of overhead, so whole subtrees would have to be distributed instead. This, in turn, may make some boxes (those holding important subtrees) much busier than others. Or suppose one box happens to get the isosurface while the others only have triangle meshes. Guess who will be waiting? Furthermore, where is the texture information stored? And where is the interior information (like index of refraction, media)?

On Monday 14 March 2005 00:26, Andrew Clinton wrote:
> I agree; they needed to develop distributed algorithms for ray intersection
> and photon gather, which require rays to be sent to multiple hosts and then
> the results combined to get the result. It would be much more useful to
> encourage users to develop scenes so that objects can be demand-loaded or
> built on the fly, so that the system can manage memory effectively on a
> single host.

Of course. Still, being able to handle such large scenes would be nice in any case... I'll try and see if I can come up with a rough design based on what I have read before going on "dreaming"...

> SSE instructions fit 4 floats but only 2 doubles.

...which unfortunately makes them pretty useless in a 3-dimensional world represented with doubles... I'll add something like a "main" floating-point type which can be switched between float and double at compile time.

> This is usually done by making the accelerator have the same interface as
> the primitives themselves. This is what I've done in my raytracer, and is
> also what is done in pbrt.

...and would you advise against doing so? Another advantage of this would be that procedural transformations (as defined previously) could have their own local accelerator. (Of course log(n) + log(m) = log(n*m) > log(n+m), so one big accelerator is still better than several nested ones.)
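To make the last two points a bit more concrete, here is a rough C++ sketch of what I have in mind. All the names (Real, Intersectable, KdTreeAccel, ...) are made up for illustration and are not taken from any existing code; the kd-tree "traversal" is only a stand-in loop:

  #include <cstddef>
  #include <vector>

  // "Main" floating point type, switchable at compile time.
  #ifdef USE_DOUBLE_PRECISION
  typedef double Real;
  #else
  typedef float Real;
  #endif

  struct Vec3 { Real x, y, z; };

  struct Ray {
      Vec3 origin, dir;
      Real tMax;    // closest hit found so far
  };

  // Common interface for everything that can be intersected.
  class Intersectable {
  public:
      virtual ~Intersectable() {}
      // Returns true on a hit and shrinks ray.tMax accordingly.
      virtual bool intersect(Ray &ray) const = 0;
  };

  // A concrete primitive implements the interface...
  class Triangle : public Intersectable {
  public:
      bool intersect(Ray &ray) const { /* ray/triangle test goes here */ return false; }
  };

  // ...and so does an accelerator, which is what allows accelerators to be
  // nested, e.g. a local accelerator inside a procedurally transformed object.
  class KdTreeAccel : public Intersectable {
  public:
      explicit KdTreeAccel(const std::vector<const Intersectable *> &objs)
          : objects(objs) {}
      bool intersect(Ray &ray) const {
          bool hit = false;
          // Stand-in for the real kd-tree traversal.
          for (std::size_t i = 0; i < objects.size(); ++i)
              if (objects[i]->intersect(ray))
                  hit = true;
          return hit;
      }
  private:
      std::vector<const Intersectable *> objects;
  };

The nice thing is that whoever builds the scene does not have to care whether a leaf of the top-level accelerator is a single triangle or a whole sub-accelerator.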
> I would focus only on KD trees. You can use my patch to start,
> I've been having it on my HD for some time already :)
> Additional optimizations in the book include the following:

Thanks for pointing these out.

>> [tesselation]
> finely tesselated nurb vs. an analytic method). I would make this the
> primary method of handling these types of patch surfaces and ignore
> analytic methods for anything but user-defined isosurfaces or infinite
> objects.

...and box, sphere, disc, quadric, I guess.

> Even for isosurface we might want to build in a conversion to be
> able to get fast previews.

One of the main points of using isosurfaces (e.g. for landscapes) is that a high degree of complexity can be achieved with a low memory footprint. I would rather focus on efficient (mesh-like) bounding schemes for isosurfaces to speed up root finding than on tessellating them. (This probably requires a rough inside mesh and an outside mesh with the roots lying in between. Not a simple thing if the surface has bubbles and holes.)

> I think our system should make use of ray differentials to choose the level
> of tesselation. Check online for the paper "Ray differentials for
> multiresolution geometry caching" by Christensen. This way, highly
> incoherent rays for GI can use a low-res tesselation, saving lots of
> effort.

Something similar could be done for isosurfaces by lowering the solver accuracy in that case. I have actually spent some thought on that already. (I'll check out the paper, because estimating the ray differential was the point where I got stuck.)

> One more thing (I don't think I was that clear) is that tesselated geometry
> can be put directly into the accelerator.

Actually, you were clear enough, although I was not aware of how exactly to put the information into the leaves. And I do not yet fully understand it (the triangle duplication problem).

[previously]
> If we take this approach to threading, we'll need to worry about the
> following access control issues:
> 1. Access to mailbox numbers on primitives in the accelerator
> 2. Access to a read/write irradiance cache (if we use one)
> 3. Writeback to the frame buffer (we can do this in blocks - this is one
>    reason we should probably use blocks of pixels rather than rays as the
>    basic unit of parallelism).

Maybe I am blind, but I barely see problems here: 1. is read-only access (or isn't it?), 2. is a classical locking issue and probably a distribution problem (OK), and 3. is just a minor locking/queuing problem. (Using blocks of pixels for each node is interesting because of the resulting positive effect on caching due to image-space coherence, especially when demand-loading textures/geometry.)

> Whatever advantage we gain from locality within a
> coroutine might be offset by the cost of maintaining suspended states.

I completely agree. This is a very nice formulation of what I meant by "overhead" in previous postings.

Regards,
Wolfgang
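P.S.: To make sure we mean the same thing regarding 3., this is roughly how I picture the per-block writeback (again a made-up sketch, here using plain pthreads; none of these names exist anywhere yet):

  #include <pthread.h>
  #include <vector>

  typedef float Real;                    // or double; the "main" float type from above

  struct Color { Real r, g, b; };

  struct Block {
      int x0, y0, width, height;         // position and size of the block in the image
      std::vector<Color> pixels;         // width*height locally rendered pixels
  };

  class FrameBuffer {
  public:
      FrameBuffer(int w, int h) : width(w), pixels(w * h) {
          pthread_mutex_init(&lock, 0);
      }
      ~FrameBuffer() { pthread_mutex_destroy(&lock); }

      // Called once per finished block, so lock contention stays negligible.
      void writeBlock(const Block &b) {
          pthread_mutex_lock(&lock);
          for (int y = 0; y < b.height; ++y)
              for (int x = 0; x < b.width; ++x)
                  pixels[(b.y0 + y) * width + (b.x0 + x)] =
                      b.pixels[y * b.width + x];
          pthread_mutex_unlock(&lock);
      }

  private:
      int width;
      std::vector<Color> pixels;
      pthread_mutex_t lock;
  };

The point is simply that each worker renders a whole block into local memory and the lock is taken once per block instead of once per pixel, so it should never become a bottleneck.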