From: <ajc...@en...> - 2005-03-15 01:40:02
Quoting Wolfgang Wieser <ww...@gm...>:

> > (If you don't have ACM access I can mail you the paper)
> >
> Well, it turned out that we do not have ACM in the list of subscribed
> journals at the LMU, so I walked over to the guys at the TUM to get an
> account there (something I should have done earlier anyway).
>
> The design seems to be quite a good one. Especially, I like the idea of
> using and passing rays and photons, because of the simplicity with which
> the massive parallelism is achieved. Furthermore, this design is highly
> parallel without using a "hard-core" stream design with small kernels
> for everything, and is hence nearer to a traditional sequential renderer
> and also easier to implement. Actually, it is somewhat like what I
> dreamt of being able to do some months ago: dividing up work at the ray
> level.

Although they don't mention it in the paper, I'd imagine that the
partitioning of eye rays to rendering engine threads uses some sort of
dynamic tile assignment; I've seen this mentioned in some other papers.
I'm not sure whether there is any benefit to the ray-level partitioning
other than making it easier to split up the scene database when it is
too large for a single host.

> However, as usual, people don't comment on the interesting details.
> Let's consider a scene which consists only of one very large CSG tree.
> Can it be split without having to copy some non-leaf nodes?
> If not, a uniform distribution would cause lots of overhead, and
> consequently subtrees would have to be distributed. This, in turn, may
> make some boxes (with important subtrees) much busier than others.
>
> Or let's say one box happens to get the isosurface while the others
> only have triangle meshes. Guess who will be waiting?

Yes, I think their partitioning strategy is only going to work with
simple primitives. Other stuff will need to be duplicated on all the
nodes (that doesn't seem too harsh a constraint; I don't think we expect
to have millions of isosurfaces or CSG models).

> Furthermore, where is the texture information stored? Where is the
> interior information (like index of refraction, media)?

Probably these things are assigned at the object level rather than the
triangle level, so it is not too unreasonable to duplicate them among
the nodes.

> > This is usually done by making the accelerator have the same
> > interface as the primitives themselves. This is what I've done in my
> > raytracer, and is also what is done in pbrt.
> >
> ...and would you advise against doing so?

I would advise in favor of doing so.
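To make that concrete, here is a minimal sketch of the idea. The class
and method names are invented (this is not pbrt's actual interface, just
the shape of it): the accelerator derives from the same abstract base
class as the primitives, so the renderer calls Intersect() without
caring whether it is talking to one object or to a whole kd-tree, and
accelerators can even be nested inside one another.

    #include <cstddef>
    #include <vector>

    struct Ray;           // origin, direction, t-interval, ...
    struct Intersection;  // hit point, normal, object attributes, ...

    class Primitive {
    public:
        virtual ~Primitive() {}
        // Returns true and fills *isect with the nearest hit so far.
        virtual bool Intersect(const Ray &ray,
                               Intersection *isect) const = 0;
    };

    // The accelerator stores primitives but *is* a Primitive itself, so
    // it can sit anywhere a primitive can (including inside another
    // accelerator); the renderer only ever talks to the top-level one.
    class Accelerator : public Primitive {
    public:
        Accelerator(const std::vector<const Primitive *> &prims)
            : prims_(prims) { /* build the kd-tree / grid here */ }
        virtual bool Intersect(const Ray &ray, Intersection *isect) const {
            // Stand-in for real traversal: a real version walks the tree
            // and lets the ray's t-interval keep the nearest hit.
            bool hit = false;
            for (std::size_t i = 0; i < prims_.size(); ++i)
                if (prims_[i]->Intersect(ray, isect))
                    hit = true;
            return hit;
        }
    private:
        std::vector<const Primitive *> prims_;
    };

As far as I remember, this is essentially how pbrt's aggregate works,
except that it of course does real traversal instead of the loop above.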
> > I think our system should make use of ray differentials to choose
> > the level of tesselation. Check online for the paper "Ray
> > differentials for multiresolution geometry caching" by Christensen.
> > This way, highly incoherent rays for GI can use a low-res
> > tesselation, saving lots of effort.
> >
> Something similar could be done for isosurfaces by lowering the solver
> accuracy in this case. I have actually already given that some thought.
> (I'll check out the paper, because estimating the ray differential was
> the point I got stuck at.)

Lowering the accuracy of the isosurface solver sounds like a good idea;
I wonder if it will work in practice. (There is a rough sketch of the
footprint-to-tesselation-level mapping at the end of this mail.)

> > One more thing (I don't think I was that clear) is that tesselated
> > geometry can be put directly into the accelerator.
> >
> Actually, you were clear enough, although I was not aware of how
> exactly to put the information into the leaf.
> And I do not yet fully understand it (the triangle duplication
> problem).

Triangles that lie on accelerator cell boundaries will need to be
duplicated. The alternative is to use lists of pointers to refer to
them, in which case we introduce a level of indirection (making caching
performance worse) and need to store lists of pointers (increasing
memory use).

> [previously]
> > If we take this approach to threading, we'll need to worry about the
> > following access control issues:
> > 1. Access to mailbox numbers on primitives in the accelerator
> > 2. Access to a read/write irradiance cache (if we use one)
> > 3. Writeback to the frame buffer (we can do this in blocks - this is
> >    one reason we should probably use blocks of pixels rather than
> >    rays as the basic unit of parallelism).
>
> Maybe I am blind, but I barely see any problems.
> 1. is used read-only (isn't it?),
> 2. is a classical locking issue and probably a distribution problem
>    (OK), and
> 3. is just a minor locking/queuing problem.
> (Using blocks of pixels for each node is interesting because of the
> resulting positive effect on caching due to image space coherence --
> especially when demand-loading textures/geometry.)

I was just trying to enumerate the cases where we will need to lock
global data structures.

1. Mailboxes are used to store the ray number on a primitive so that
while traversing the accelerator we only intersect it once: before
intersecting any primitive, compare the mailbox number with the current
ray number; if they are equal, the primitive has already been tested.
This can improve performance a lot; check out the source code for my
patch (I added mailbox numbers to the OBJECT struct). There are
multithreading issues because two threads could need to update the
mailbox number of a primitive independently. One alternative I can
think of is to make an array of mailbox numbers per primitive, but
memory use will increase. We could also use some other method of
recording the primitives visited within a thread (perhaps some sort of
hashing would do the trick); both variants are sketched at the end of
this mail.

2. This is probably not going to be too much trouble, but we will need
to access the irradiance cache a lot during rendering. I don't really
have any idea how much impact this will have on performance, but we
should be aware of it. Depending on how we implement it, it will result
in nondeterministic image output: e.g. a simple race between two threads
will produce a different image depending on whether one gets the cached
value or needs to create it (see the irradiance cache sketch below).

3. I mentioned this because we should avoid locking every time a ray
makes a contribution back into the frame buffer. If we are allocating
blocks of pixels to individual threads, then we can queue up the pixels
in local storage and then lock the frame buffer and write the whole
block at once (also sketched below).

I'm guessing that other multithreading issues will arise in the future;
these were just the ones I've thought of so far.
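Going back to the ray differential idea: here is the back-of-envelope
version of how a ray footprint could pick a tesselation level. This is
only the rough shape of it, not Christensen's actual method, and all
names are invented:

    #include <cmath>
    #include <algorithm>

    // A ray differential tells us how quickly neighbouring rays spread
    // apart; at distance t from the origin the footprint is roughly:
    float FootprintWidth(float spreadPerUnitDistance, float t) {
        return spreadPerUnitDistance * t;
    }

    // Pick a mip-like tesselation level: level 0 is the finest mesh and
    // each level up roughly doubles the triangle edge length.
    int TesselationLevel(float footprint, float finestEdgeLength,
                         int maxLevel) {
        if (footprint <= finestEdgeLength)
            return 0;
        int level = (int)std::ceil(std::log(footprint / finestEdgeLength)
                                   / std::log(2.0));
        return std::min(level, maxLevel);
    }

The same footprint could presumably drive the isosurface solver, since a
wide footprint should tolerate a proportionally larger root-finding
error.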
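Here is what I mean by mailboxing, along with the per-thread alternative
I mentioned. The names are invented; my patch does the single-threaded
version against POV-Ray's OBJECT struct, but this sketch is standalone:

    #include <set>

    struct Ray { /* origin, direction, t-interval */ };
    struct Intersection { /* hit data */ };

    struct Primitive {
        int lastRayID;   // the mailbox: ID of the last ray tested
        /* geometry ... */
    };

    // Stand-in for the actual ray/primitive intersection test.
    bool IntersectGeometry(const Primitive &, const Ray &, Intersection *)
    {
        return false;
    }

    // Single-threaded mailboxing: a primitive straddling several cells
    // is visited several times during traversal, but only tested once.
    bool IntersectOnce(Primitive *prim, const Ray &ray, int rayID,
                       Intersection *isect)
    {
        if (prim->lastRayID == rayID)
            return false;            // already tested against this ray
        prim->lastRayID = rayID;     // NOT thread-safe: a shared write!
        return IntersectGeometry(*prim, ray, isect);
    }

    // Per-thread alternative: record visited primitives in storage
    // owned by the thread, so traversal never writes shared data. The
    // set is cleared at the start of each ray; a small hash table would
    // be cheaper than std::set in practice.
    struct PerThread {
        std::set<const Primitive *> visited;
    };

    bool IntersectOnceMT(const Primitive *prim, const Ray &ray,
                         PerThread *t, Intersection *isect)
    {
        if (!t->visited.insert(prim).second)
            return false;            // this thread already tested it
        return IntersectGeometry(*prim, ray, isect);
    }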
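For the irradiance cache, the locking itself is not hard; the point is
where the nondeterminism creeps in. In the sketch below (invented names,
pthreads for concreteness), two threads can both miss on nearby points,
both compute a sample, and insert in either order, so the image depends
on thread scheduling:

    #include <pthread.h>

    struct Point3   { float x, y, z; };
    struct Spectrum { float r, g, b; };

    class IrradianceCache {
    public:
        IrradianceCache() { pthread_rwlock_init(&lock_, 0); }

        // Return a cached irradiance sample near p, or compute and
        // insert a new one.
        Spectrum Lookup(const Point3 &p) {
            Spectrum e;
            pthread_rwlock_rdlock(&lock_);
            bool found = Find(p, &e);     // read-only octree search
            pthread_rwlock_unlock(&lock_);
            if (!found) {
                e = ComputeIrradiance(p); // expensive: trace sample rays
                pthread_rwlock_wrlock(&lock_);
                Insert(p, e);             // another thread may have
                pthread_rwlock_unlock(&lock_); // inserted meanwhile --
            }                                  // hence the race
            return e;
        }

    private:
        // Stubs standing in for the real octree operations.
        bool Find(const Point3 &, Spectrum *) const { return false; }
        void Insert(const Point3 &, const Spectrum &) {}
        Spectrum ComputeIrradiance(const Point3 &) { return Spectrum(); }
        pthread_rwlock_t lock_;
    };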
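And finally the frame buffer writeback: render a block of pixels into
thread-local storage and take the lock once per block instead of once
per contribution (same caveats, invented names):

    #include <pthread.h>
    #include <vector>

    struct Color { float r, g, b; };

    class FrameBuffer {
    public:
        FrameBuffer(int w, int h) : width_(w), pixels_(w * h) {
            pthread_mutex_init(&lock_, 0);
        }

        // Called once per finished block, not once per pixel or ray.
        void WriteBlock(int x0, int y0, int bw, int bh,
                        const std::vector<Color> &block) {
            pthread_mutex_lock(&lock_);
            for (int y = 0; y < bh; ++y)
                for (int x = 0; x < bw; ++x)
                    pixels_[(y0 + y) * width_ + (x0 + x)] =
                        block[y * bw + x];
            pthread_mutex_unlock(&lock_);
        }

    private:
        int width_;
        std::vector<Color> pixels_;
        pthread_mutex_t lock_;
    };

    // Worker thread: trace all rays for its block into local storage,
    // then write the results back in one locked operation.
    void RenderBlock(FrameBuffer *fb, int x0, int y0, int bw, int bh) {
        std::vector<Color> local(bw * bh);
        /* ... trace rays and accumulate into 'local' ... */
        fb->WriteBlock(x0, y0, bw, bh, local);
    }

If the blocks never overlap, the lock is arguably unnecessary, but it
will matter as soon as we filter samples across block edges or stream
blocks in from other render nodes.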