From: <ajc...@en...> - 2005-03-15 01:40:02
Quoting Wolfgang Wieser <ww...@gm...>:

> > (If you don't have ACM access I can mail you the paper)
> >
> Well, it turned out that we do not have ACM in the list of subscribed
> journals at the LMU, so I walked over to the guys at the TUM to get an
> account there (something I should have done earlier anyway).
>
> The design seems to be quite a good one. Especially, I like the idea of
> using and passing rays and photons, because of the simplicity with which
> the massive parallelism is achieved. Furthermore, this design is highly
> parallel without using a "hard-core" stream design with small kernels
> for everything, and is hence nearer to a traditional sequential renderer
> and also easier to implement. Actually, it is somewhat like what I
> dreamt of being able to do some months ago: dividing up work at the ray
> level.

Although they don't mention it in the paper, I'd imagine that the
partitioning of eye rays to rendering engine threads uses some sort of
dynamic tile assignment; I've seen this mentioned in some other papers.
I'm not sure whether there is any benefit to the ray-level partitioning
other than making it easier to split up the scene database when it is
too large for a single host.

> However, as usual, people don't comment on the interesting details.
> Let's consider a scene which consists only of one very large CSG tree.
> Can it be split without having to copy some non-leaf nodes?
> If not, a uniform distribution would cause lots of overhead, and
> consequently subtrees would have to be distributed. This, in turn, may
> make some boxes (with important subtrees) much busier than others.
>
> Or let's say one box happens to get the isosurface while the others
> only have triangle meshes. Guess who will be waiting?

Yes, I think their partitioning strategy is only going to work with
simple primitives. Other stuff will need to be duplicated on all the
nodes (that doesn't seem too harsh a constraint; I don't think we expect
to have millions of isosurfaces or CSG models).

> Furthermore, where is the texture information stored? Where is the
> interior information (like index of refraction, media)?

Probably these things are assigned at the object level rather than the
triangle level, so it is not too unreasonable to duplicate them among
the nodes.

> > This is usually done by making the accelerator have the same
> > interface as the primitives themselves. This is what I've done in my
> > raytracer, and is also what is done in pbrt.
> >
> ...and would you advise against doing so?

I would advise in favor of doing so.
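To make that concrete, here is a minimal sketch of the idea. The class
and method names are invented (this is not pbrt's actual interface, just
the shape of it): the accelerator derives from the same abstract base
class as the primitives, so the renderer calls Intersect() without
caring whether it is talking to one object or to a whole kd-tree, and
accelerators can even be nested inside one another.

    #include <cstddef>
    #include <vector>

    struct Ray;           // origin, direction, t-interval, ...
    struct Intersection;  // hit point, normal, object attributes, ...

    class Primitive {
    public:
        virtual ~Primitive() {}
        // Returns true and fills *isect with the nearest hit so far.
        virtual bool Intersect(const Ray &ray,
                               Intersection *isect) const = 0;
    };

    // The accelerator stores primitives but *is* a Primitive itself, so
    // it can sit anywhere a primitive can (including inside another
    // accelerator); the renderer only ever talks to the top-level one.
    class Accelerator : public Primitive {
    public:
        Accelerator(const std::vector<const Primitive *> &prims)
            : prims_(prims) { /* build the kd-tree / grid here */ }
        virtual bool Intersect(const Ray &ray, Intersection *isect) const {
            // Stand-in for real traversal: a real version walks the tree
            // and lets the ray's t-interval keep the nearest hit.
            bool hit = false;
            for (std::size_t i = 0; i < prims_.size(); ++i)
                if (prims_[i]->Intersect(ray, isect))
                    hit = true;
            return hit;
        }
    private:
        std::vector<const Primitive *> prims_;
    };

As far as I remember, this is essentially how pbrt's aggregate works,
except that it of course does real traversal instead of the loop above.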
> > I think our system should make use of ray differentials to choose
> > the level of tesselation. Check online for the paper "Ray
> > differentials for multiresolution geometry caching" by Christensen.
> > This way, highly incoherent rays for GI can use a low-res
> > tesselation, saving lots of effort.
> >
> Something similar could be done for isosurfaces by lowering the solver
> accuracy in this case. I have actually already given that some thought.
> (I'll check out the paper, because estimating the ray differential was
> the point I got stuck at.)

Lowering the accuracy of the isosurface solver sounds like a good idea;
I wonder if it will work in practice. (There is a rough sketch of the
footprint-to-tesselation-level mapping at the end of this mail.)

> > One more thing (I don't think I was that clear) is that tesselated
> > geometry can be put directly into the accelerator.
> >
> Actually, you were clear enough, although I was not aware of how
> exactly to put the information into the leaf.
> And I do not yet fully understand it (the triangle duplication
> problem).

Triangles that lie on accelerator cell boundaries will need to be
duplicated. The alternative is to use lists of pointers to refer to
them, in which case we introduce a level of indirection (making caching
performance worse) and need to store lists of pointers (increasing
memory use).

> [previously]
> > If we take this approach to threading, we'll need to worry about the
> > following access control issues:
> > 1. Access to mailbox numbers on primitives in the accelerator
> > 2. Access to a read/write irradiance cache (if we use one)
> > 3. Writeback to the frame buffer (we can do this in blocks - this is
> >    one reason we should probably use blocks of pixels rather than
> >    rays as the basic unit of parallelism).
>
> Maybe I am blind, but I barely see any problems.
> 1. is used read-only (isn't it?),
> 2. is a classical locking issue and probably a distribution problem
>    (OK), and
> 3. is just a minor locking/queuing problem.
> (Using blocks of pixels for each node is interesting because of the
> resulting positive effect on caching due to image space coherence --
> especially when demand-loading textures/geometry.)

I was just trying to enumerate the cases where we will need to lock
global data structures.

1. Mailboxes are used to store the ray number on a primitive so that
while traversing the accelerator we only intersect it once: before
intersecting any primitive, compare the mailbox number with the current
ray number; if they are equal, the primitive has already been tested.
This can improve performance a lot; check out the source code for my
patch (I added mailbox numbers to the OBJECT struct). There are
multithreading issues because two threads could need to update the
mailbox number of a primitive independently. One alternative I can
think of is to make an array of mailbox numbers per primitive, but
memory use will increase. We could also use some other method of
recording the primitives visited within a thread (perhaps some sort of
hashing would do the trick); both variants are sketched at the end of
this mail.

2. This is probably not going to be too much trouble, but we will need
to access the irradiance cache a lot during rendering. I don't really
have any idea how much impact this will have on performance, but we
should be aware of it. Depending on how we implement it, it will result
in nondeterministic image output: e.g. a simple race between two threads
will produce a different image depending on whether one gets the cached
value or needs to create it (see the irradiance cache sketch below).

3. I mentioned this because we should avoid locking every time a ray
makes a contribution back into the frame buffer. If we are allocating
blocks of pixels to individual threads, then we can queue up the pixels
in local storage and then lock the frame buffer and write the whole
block at once (also sketched below).

I'm guessing that other multithreading issues will arise in the future;
these were just the ones I've thought of so far.
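Going back to the ray differential idea: here is the back-of-envelope
version of how a ray footprint could pick a tesselation level. This is
only the rough shape of it, not Christensen's actual method, and all
names are invented:

    #include <cmath>
    #include <algorithm>

    // A ray differential tells us how quickly neighbouring rays spread
    // apart; at distance t from the origin the footprint is roughly:
    float FootprintWidth(float spreadPerUnitDistance, float t) {
        return spreadPerUnitDistance * t;
    }

    // Pick a mip-like tesselation level: level 0 is the finest mesh and
    // each level up roughly doubles the triangle edge length.
    int TesselationLevel(float footprint, float finestEdgeLength,
                         int maxLevel) {
        if (footprint <= finestEdgeLength)
            return 0;
        int level = (int)std::ceil(std::log(footprint / finestEdgeLength)
                                   / std::log(2.0));
        return std::min(level, maxLevel);
    }

The same footprint could presumably drive the isosurface solver, since a
wide footprint should tolerate a proportionally larger root-finding
error.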
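Here is what I mean by mailboxing, along with the per-thread alternative
I mentioned. The names are invented; my patch does the single-threaded
version against POV-Ray's OBJECT struct, but this sketch is standalone:

    #include <set>

    struct Ray { /* origin, direction, t-interval */ };
    struct Intersection { /* hit data */ };

    struct Primitive {
        int lastRayID;   // the mailbox: ID of the last ray tested
        /* geometry ... */
    };

    // Stand-in for the actual ray/primitive intersection test.
    bool IntersectGeometry(const Primitive &, const Ray &, Intersection *)
    {
        return false;
    }

    // Single-threaded mailboxing: a primitive straddling several cells
    // is visited several times during traversal, but only tested once.
    bool IntersectOnce(Primitive *prim, const Ray &ray, int rayID,
                       Intersection *isect)
    {
        if (prim->lastRayID == rayID)
            return false;            // already tested against this ray
        prim->lastRayID = rayID;     // NOT thread-safe: a shared write!
        return IntersectGeometry(*prim, ray, isect);
    }

    // Per-thread alternative: record visited primitives in storage
    // owned by the thread, so traversal never writes shared data. The
    // set is cleared at the start of each ray; a small hash table would
    // be cheaper than std::set in practice.
    struct PerThread {
        std::set<const Primitive *> visited;
    };

    bool IntersectOnceMT(const Primitive *prim, const Ray &ray,
                         PerThread *t, Intersection *isect)
    {
        if (!t->visited.insert(prim).second)
            return false;            // this thread already tested it
        return IntersectGeometry(*prim, ray, isect);
    }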
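For the irradiance cache, the locking itself is not hard; the point is
where the nondeterminism creeps in. In the sketch below (invented names,
pthreads for concreteness), two threads can both miss on nearby points,
both compute a sample, and insert in either order, so the image depends
on thread scheduling:

    #include <pthread.h>

    struct Point3   { float x, y, z; };
    struct Spectrum { float r, g, b; };

    class IrradianceCache {
    public:
        IrradianceCache() { pthread_rwlock_init(&lock_, 0); }

        // Return a cached irradiance sample near p, or compute and
        // insert a new one.
        Spectrum Lookup(const Point3 &p) {
            Spectrum e;
            pthread_rwlock_rdlock(&lock_);
            bool found = Find(p, &e);     // read-only octree search
            pthread_rwlock_unlock(&lock_);
            if (!found) {
                e = ComputeIrradiance(p); // expensive: trace sample rays
                pthread_rwlock_wrlock(&lock_);
                Insert(p, e);             // another thread may have
                pthread_rwlock_unlock(&lock_); // inserted meanwhile --
            }                                  // hence the race
            return e;
        }

    private:
        // Stubs standing in for the real octree operations.
        bool Find(const Point3 &, Spectrum *) const { return false; }
        void Insert(const Point3 &, const Spectrum &) {}
        Spectrum ComputeIrradiance(const Point3 &) { return Spectrum(); }
        pthread_rwlock_t lock_;
    };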
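And finally the frame buffer writeback: render a block of pixels into
thread-local storage and take the lock once per block instead of once
per contribution (same caveats, invented names):

    #include <pthread.h>
    #include <vector>

    struct Color { float r, g, b; };

    class FrameBuffer {
    public:
        FrameBuffer(int w, int h) : width_(w), pixels_(w * h) {
            pthread_mutex_init(&lock_, 0);
        }

        // Called once per finished block, not once per pixel or ray.
        void WriteBlock(int x0, int y0, int bw, int bh,
                        const std::vector<Color> &block) {
            pthread_mutex_lock(&lock_);
            for (int y = 0; y < bh; ++y)
                for (int x = 0; x < bw; ++x)
                    pixels_[(y0 + y) * width_ + (x0 + x)] =
                        block[y * bw + x];
            pthread_mutex_unlock(&lock_);
        }

    private:
        int width_;
        std::vector<Color> pixels_;
        pthread_mutex_t lock_;
    };

    // Worker thread: trace all rays for its block into local storage,
    // then write the results back in one locked operation.
    void RenderBlock(FrameBuffer *fb, int x0, int y0, int bw, int bh) {
        std::vector<Color> local(bw * bh);
        /* ... trace rays and accumulate into 'local' ... */
        fb->WriteBlock(x0, y0, bw, bh, local);
    }

If the blocks never overlap, the lock is arguably unnecessary, but it
will matter as soon as we filter samples across block edges or stream
blocks in from other render nodes.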