gdalgorithms-list Mailing List for Game Dev Algorithms (Page 1404)
From: Sam K. <sa...@ip...> - 2000-08-21 12:28:00
|
Thanks for the info tom, exactly what I wanted to hear. A couple of questions: 1) Interesting.. do you have to clear the zbuffer when rendering each partition? 2) Yeah, but if you nail all distant objects to the back clip plane, they have the same z and consequently the same parallax. dont they? I guess I could treat the back of the view volume as the back of the galaxy, and nail the objects into the view volume at their relevant scaled positions. That should work. 3) Your dead right dude, I tried this yesterday after I sent the mail... works very nicely! ;) Cheers, Sam. -----Original Message----- From: Tom Forsyth <to...@mu...> To: gda...@li... <gda...@li...> Date: 21 August 2000 2:59 AM Subject: RE: [Algorithms] Massive spaces >(1) Partition your Z buffer. In D3D this is done using the dvMinZ and dvMaxZ >values in the viewport. There is a pretty direct equivalent in OpenGL - I >forget what it's called. Then you can even out your precision quite happily. > >(2) They're planets. Big spherical things. And they don't (usually - depends >what sort of game you are doing I guess :-) actually intersect in space. >Possibly nothing in the universe is easier to sort from back to front and >draw that way. So you can nail them to the far clip plane, set your Z test >to LESSTHAN, and there you go. > >(3) Move your near clip plane according to the nearest object. So if you are >on a planet's surface, the near clip plane is quite close. If you are >zooming about in space, it's millions of miles away. > >By using all these three, you can get huge spaces done pretty easily. > >If the archives are working, this has been discussed a fair number of times. > >Tom Forsyth - Muckyfoot bloke. >Whizzing and pasting and pooting through the day. >-----Original Message----- >From: Sam Kuhn [mailto:sa...@ip...] >Sent: 21 August 2000 02:33 >To: gda...@li... >Subject: [Algorithms] Massive spaces > > >Hey, > >I've got a little mind game for you lot. > >Say you needed to render an entire solar system, in which you can zoom down >to a planet surface and see detail of about 1/100000 meter, or shoot over to >the sun and see the same sort of detail. There are a couple of immediate >problems: > >You would need need an enormous view volume (what say 20 lightyears big) >which gives an appauling zbuffer resolution. >So whats are you to do? Render the distant objects closer than they actually >are (i.e. at the far end of a planet-sized view volume) and scale them down >accordingly?. This surely would give incorrect parralax between the distant >objects > >Also you need the planets to be of enormous size so that the front clipping >plane doesn't cut through the planets fine detail when you are in very close >(lets say an ant view for example). Which again buggers the zbuffer, since >you're using a planet sized view volume. > >Any ideas? > >Regards, > >Sam. > > > >_______________________________________________ >GDAlgorithms-list mailing list >GDA...@li... >http://lists.sourceforge.net/mailman/listinfo/gdalgorithms-list |
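To make Tom's point (1) concrete, here is a minimal sketch of depth-range partitioning in an OpenGL-style API (glDepthRange is the GL counterpart of the dvMinZ/dvMaxZ viewport fields he mentions). SetProjection(), DrawFarScene() and DrawNearScene() are hypothetical helpers, and the near/far distances are purely illustrative. It also suggests an answer to Sam's question (1): as long as the partitions are genuinely depth-disjoint and drawn back to front, no depth clear is needed between them.

#include <GL/gl.h>

// Hypothetical helpers: set up a perspective projection with the given
// near/far planes, and draw the two halves of the scene.
void SetProjection(double zNear, double zFar);
void DrawFarScene();
void DrawNearScene();

void RenderFrame()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // Far partition (planets, sun, starfield) uses the back half of the z-buffer.
    glDepthRange(0.5, 1.0);
    SetProjection(1.0e6, 1.0e12);      // illustrative distances only
    DrawFarScene();

    // Near partition (local detail) uses the front half. If the partitions are
    // genuinely depth-disjoint in eye space, nothing drawn here can lose a depth
    // test against the far partition (its window z is always <= 0.5), so no
    // depth clear is needed in between.
    glDepthRange(0.0, 0.5);
    SetProjection(0.1, 1.0e6);
    DrawNearScene();
}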
From: Gil G. <gg...@ma...> - 2000-08-21 12:04:09
|
Along with the things Tom mentioned, you can also put your far clip plane very very far away or even at infinity. Surprisingly, this does not have a significant effect on z buffer precision. -Gil Hey, I've got a little mind game for you lot. Say you needed to render an entire solar system, in which you can zoom down to a planet surface and see detail of about 1/100000 meter, or shoot over to the sun and see the same sort of detail. There are a couple of immediate problems: You would need need an enormous view volume (what say 20 lightyears big) which gives an appauling zbuffer resolution. So whats are you to do? Render the distant objects closer than they actually are (i.e. at the far end of a planet-sized view volume) and scale them down accordingly?. This surely would give incorrect parralax between the distant objects Also you need the planets to be of enormous size so that the front clipping plane doesn't cut through the planets fine detail when you are in very close (lets say an ant view for example). Which again buggers the zbuffer, since you're using a planet sized view volume. Any ideas? Regards, Sam. |
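Gil's suggestion corresponds to taking the limit of the standard perspective matrix as the far distance goes to infinity. A sketch of that matrix, assuming OpenGL-style clip space and written in row-major maths convention (transpose or reorder as your API requires):

// Perspective matrix with the far plane pushed to infinity: the limit of the
// glFrustum matrix as zFar -> infinity, so m[2][2] -> -1 and m[2][3] -> -2*zNear.
void InfiniteFarFrustum(float m[4][4], float left, float right,
                        float bottom, float top, float zNear)
{
    float n = zNear;
    m[0][0] = 2*n/(right-left); m[0][1] = 0;                m[0][2] = (right+left)/(right-left); m[0][3] = 0;
    m[1][0] = 0;                m[1][1] = 2*n/(top-bottom); m[1][2] = (top+bottom)/(top-bottom); m[1][3] = 0;
    m[2][0] = 0;                m[2][1] = 0;                m[2][2] = -1.0f;                     m[2][3] = -2.0f*n;
    m[3][0] = 0;                m[3][1] = 0;                m[3][2] = -1.0f;                     m[3][3] = 0;
}

Distant geometry ends up with window z very close to 1.0, but the precision loss compared with a large finite far plane is small, which is the point being made above.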
From: Jamie F. <j.f...@re...> - 2000-08-21 11:05:08
|
> Spheres are particularly nice because the sphere radius is the same as the
> circle's radius after projection

Careful. There's a common misconception here (which you may or may not have made :).

Let the sphere have radius r. Let S be the centre of the sphere. Let V be some vector perpendicular to the view vector, of length r. Let P = S + V. Some people claim that projecting point P gives you a point on the edge of the circle which is the rasterisation of the sphere. This is not true.

Demonstration that it isn't (in 2D, so hopefully it's clearer :): Take a circle with centre C. Place an arbitrary point P outside the circle. The closer it is to the circle, the clearer my point (unintentional... sorry :) should be. Let the two tangents to the circle passing through P be T1, T2. Let P1 be the point of intersection between T1 and the circle. Define P2 similarly. It should be clear that the projections of P1 and P2 are equivalent to points on the edge of the rasterisation. But (P - C) is not perpendicular to (T1 - C) or (T2 - C), although as |P - C| approaches infinity they approach perpendicular. If you can be sure you'll never be close enough to appreciate the error, then you'll be fine :)

Back to the sphere: this means that the true rasterisation of the sphere is larger than the circle calculated by projecting P.

I'll expand more if anybody needs it... or gives a monkey's :)

Jamie |
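A tiny numeric illustration of the same point, assuming the sphere centre lies on the view axis at distance d from the eye: the naive estimate (project the offset point P = S + V) gives a half-angle of atan(r/d), while the true silhouette half-angle from the tangent lines is asin(r/d). They converge as d grows, but up close the true rasterised circle is noticeably larger.

#include <cmath>

// Naive: angle subtended by the projected point P = S + V
// (V perpendicular to the view direction, length r).
float NaiveHalfAngle(float r, float d) { return std::atan(r / d); }

// True silhouette: tangent lines from the eye to the sphere (requires d > r).
float TrueHalfAngle(float r, float d)  { return std::asin(r / d); }

// e.g. r = 1, d = 2: naive ~26.6 degrees, true 30 degrees.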
From: Timur D. <ti...@3d...> - 2000-08-21 10:40:03
|
SwitchToThread only supported on NT and 2k, good enough reason to not use it in games. _______________________ Timur Davidenko. 3Dion Inc. (www.3dion.com) e-mail: ti...@3d... ----- Original Message ----- From: "Matt Adams" <de...@gm...> To: <gda...@li...> Sent: Monday, August 21, 2000 12:03 PM Subject: RE: [Algorithms] Multithreading with Hardware Acceleration > btw, why is everyone using sleep(0) instead of switchtothread ? > since some of my threads do have an different priority, it seems > to be better to give (especially the higher-priority threads ) > part of the regained time, too. > Are there disadvantages in using switchtothread (is it slower, etc.) ? > > Matt > > -----Original Message----- > From: gda...@li... > [mailto:gda...@li...]On Behalf Of Tom > Forsyth > Sent: Montag, 21. August 2000 07:13 > To: gda...@li... > Subject: RE: [Algorithms] Multithreading with Hardware Acceleration > > > The Sleep(0) explicitly gives up the rest of the thread's timeslice to other > threads. So even if two threads are at the same priority, the other one will > get most of the CPU time, instead of spending half of it spinning tightly. > It's a way of surrendering time you don't need - sort of manually lowering > your priority, but with very fine-grained control. > > Tom Forsyth - Muckyfoot bloke. > Whizzing and pasting and pooting through the day. > -----Original Message----- > From: Chris Brodie [mailto:Chr...@ma...] > Sent: 21 August 2000 05:50 > To: 'gda...@li...' > Subject: RE: [Algorithms] Multithreading with Hardware Acceleration > > > Thanks guys. I was only planning on about ~3-4 threads(input+networking, > file, video, main, (sound?)) > Each thread has a threadsafe queue. Then to communicate with it I just send > it a message throught the queue(like a pointer to where the data is it need > to do it's work. > I was once playing with WildTangets Gamedriver and profiled the 'explore' > sample app to see where all the CPU time went. Just about all of it was > spent in the nVidia driver, so I was betting that they were just doing a > whole bunch of waits as bits were being rendered. > Tom, while your way is correct (sleep(0) to return control to the OS to > reschedule), and is a nice way of doing it, shouldn't the scheduler still > switch to another thread if one is waiting and is of the same or higher > priority? > The reason I'm interested in this is that it'll give gains on single > processors as well as the dual's. > Chris > > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > http://lists.sourceforge.net/mailman/listinfo/gdalgorithms-list > > > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > http://lists.sourceforge.net/mailman/listinfo/gdalgorithms-list > |
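One common workaround for the NT/2000-only availability Timur mentions is to resolve the entry point at runtime and fall back to Sleep(0) on Win9x. A minimal sketch:

#include <windows.h>

// Resolve SwitchToThread at startup; it simply won't be exported on Win9x.
typedef BOOL (WINAPI *SwitchToThreadFn)(void);

static SwitchToThreadFn g_switchToThread =
    (SwitchToThreadFn)GetProcAddress(GetModuleHandleA("kernel32.dll"),
                                     "SwitchToThread");

// Yield the rest of this timeslice to any ready thread.
void YieldTimeslice()
{
    if (g_switchToThread)
        g_switchToThread();
    else
        Sleep(0);
}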
From: Matt A. <de...@gm...> - 2000-08-21 10:03:01
|
btw, why is everyone using sleep(0) instead of switchtothread ? since some of my threads do have an different priority, it seems to be better to give (especially the higher-priority threads ) part of the regained time, too. Are there disadvantages in using switchtothread (is it slower, etc.) ? Matt -----Original Message----- From: gda...@li... [mailto:gda...@li...]On Behalf Of Tom Forsyth Sent: Montag, 21. August 2000 07:13 To: gda...@li... Subject: RE: [Algorithms] Multithreading with Hardware Acceleration The Sleep(0) explicitly gives up the rest of the thread's timeslice to other threads. So even if two threads are at the same priority, the other one will get most of the CPU time, instead of spending half of it spinning tightly. It's a way of surrendering time you don't need - sort of manually lowering your priority, but with very fine-grained control. Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day. -----Original Message----- From: Chris Brodie [mailto:Chr...@ma...] Sent: 21 August 2000 05:50 To: 'gda...@li...' Subject: RE: [Algorithms] Multithreading with Hardware Acceleration Thanks guys. I was only planning on about ~3-4 threads(input+networking, file, video, main, (sound?)) Each thread has a threadsafe queue. Then to communicate with it I just send it a message throught the queue(like a pointer to where the data is it need to do it's work. I was once playing with WildTangets Gamedriver and profiled the 'explore' sample app to see where all the CPU time went. Just about all of it was spent in the nVidia driver, so I was betting that they were just doing a whole bunch of waits as bits were being rendered. Tom, while your way is correct (sleep(0) to return control to the OS to reschedule), and is a nice way of doing it, shouldn't the scheduler still switch to another thread if one is waiting and is of the same or higher priority? The reason I'm interested in this is that it'll give gains on single processors as well as the dual's. Chris _______________________________________________ GDAlgorithms-list mailing list GDA...@li... http://lists.sourceforge.net/mailman/listinfo/gdalgorithms-list |
From: Thatcher U. <tu...@tu...> - 2000-08-21 07:34:53
|
> I'd like to enhance my texture synthesizer for terrain. In particular, I'd > like to make my ecosystem-classification depend on the tangential curvature > of the terrain. I guess that may require some explanation. > > I have a set of ecosystem definititions. Such definitions include things > like: > [1] the upper elevation limit at which the ecosystem can exist > [2] the minimum and maximum slops at which the ecosystem can exist > [3] elevation skewing > [4] etc... > > Now, for each data point in the height field, I go through the set of > ecosystems, and determine which ecosystem exists at this data point. Such a > system is very useful for 'realistic' texture synthesis, and it's also great > to create a natural distribution of trees. > However, there's one thing I haven't been able to implement, yet. I'd like > to make my ecosystem classification depend on the concavity/convexity of the > terrain. I know that this requires me to compute the tangential curvature of > the terrain (which is usually stored in an extra curvature field). > > To make it easier for you to understand, you can have a loot at the > following web-site: > http://www2.gis.uiuc.edu:2280/modviz/ > > The fifth terrain image (from above) shows the tangential curvature of some > terrain. This is exactly what I'd like to do. I know how I need to use the > information from curvature field to modify my ecosystem classification, but > I have no idea how I to compute the tangential curvature field itself. In Ah. I think I understand. Basically, the curvature of a function is its second derivative. The first derivative is the slope. But for a 2D function like a heightfield, these derivatives are vectors, not scalars. This makes sense intuitively by noticing that a heightfield can be convex in one direction and concave in another direction.You're interested in scalar values I think, which means you want the curvature of the heightfield with respect to a certain direction. In the web page you gave, they compute profile curvature and tangential curvature, which are the curvatures of the terrain in two orthogonal directions. The profile curvature is the curvature along the "fall-line": i.e. the direction of maximum slope. You can find the fall-line with something like: fall_line = | up_axis x (normal x up_axis) | That gives you a vector in the plane tangent to the surface, pointing down the hill (Those 'x's are cross-products.) When the normal is vertical, the fall-line is undefined (there's no downhill direction). To estimate the curvature, you measure the difference between the actual slope near the point, and the fall-line. So something like (assuming +y is up): samp = fall_line * delta; profile_curvature = ((terrainheight(p + samp) - (p.y + samp.y)) + (terrainheight(p - samp) - (p.y - samp.y))) / delta; Delta is some "small" distance; it probably makes sense for it to be equal to your heightfield sample spacing. Basically you're measuring the difference between the straight fall-line and what the terrain actually does; if the terrain is concave, the actual terrain samples are higher than the fall-line and the formula comes out positive; if the terrain is convex then the samples are lower than the fall line and the formula comes out negative. The "tangent curvature" is the same thing, except instead of using the fall-line, you use a vector that's perpendicular to the fall-line, and parallel to the terrain surface, i.e. a vector that points across the slope of the terrain. 
It's easy to compute this tangent vector, it's just (normal x fall_line). Or the negative of that; it doesn't matter. Replace "fall-line" with the tangent vector, and you'll get the tangent curvature you're looking for. Apologies if I abused some calculus concepts or got some scale factors off. But that's the basic idea. Hope this helps, -- Thatcher Ulrich http://tulrich.com |
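A rough C++ transcription of the recipe above, assuming +y is up and that TerrainHeight()/TerrainNormal() heightfield lookups exist elsewhere (both are hypothetical helpers, and TerrainNormal is assumed to return a unit normal). The sign convention (positive = concave) and the flat-ground degenerate case follow the caveats Thatcher mentions.

#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

static Vec3 normalize(const Vec3& v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Assumed to exist elsewhere (hypothetical): heightfield lookup and a unit
// per-sample normal, with +y up.
float TerrainHeight(float x, float z);
Vec3  TerrainNormal(float x, float z);

// Curvature along tangent direction 'dir' at surface point p: the difference
// between the actual terrain and the straight tangent line, as in the recipe
// above.  Positive = concave, negative = convex.  'delta' ~ sample spacing.
static float DirectionalCurvature(const Vec3& p, const Vec3& dir, float delta)
{
    Vec3 s = { dir.x * delta, dir.y * delta, dir.z * delta };
    float ahead  = TerrainHeight(p.x + s.x, p.z + s.z) - (p.y + s.y);
    float behind = TerrainHeight(p.x - s.x, p.z - s.z) - (p.y - s.y);
    return (ahead + behind) / delta;
}

void ProfileAndTangentialCurvature(const Vec3& p, float delta,
                                   float& profile, float& tangential)
{
    const Vec3 up = { 0.0f, 1.0f, 0.0f };
    Vec3 n = TerrainNormal(p.x, p.z);

    // Fall line: projection of the up axis onto the tangent plane.  Its sign
    // doesn't matter here (the estimate is symmetric in +/- dir), but it is
    // undefined on perfectly flat ground -- guard against that in real code.
    Vec3 fallLine = normalize(cross(n, cross(up, n)));
    Vec3 across   = cross(n, fallLine);   // across-slope tangent, already unit length

    profile    = DirectionalCurvature(p, fallLine, delta);
    tangential = DirectionalCurvature(p, across, delta);
}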
From: Michael S. H. <mic...@ud...> - 2000-08-21 07:19:42
|
I'm attempting to implement the AABB culling code talked about here some time ago, and I'm running into a general problem in visualizing the frustum planes used to cull objects based on their AABBs. I'm using both Klaus Hartmann's AABBCull sample and the Viewcull sample included with the OpenGL FAQ.

While the code generally works, I'm having some difficulty visualizing the relationship between the plane vector and the "distance" value stored with it, which I'm sure is the reason I can't quite figure out why the code generates intersection results when it should generate a full cull. As you might guess, I'm fairly new to the world of 3D projection and I'm still wrapping my mind around all the transformations that take an object from "world" space to screen space.

Can anyone suggest a reference, or a way to visualize exactly what's going on with the plane extraction and how the vectors relate back to the AABB? |
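For what it's worth, here is a sketch of the usual way this data is read: with a plane stored as (n, d) meaning n·p + d = 0, the "distance" is just the constant term of the plane equation, and the AABB test only ever evaluates the plane at two box corners (the ones furthest along and against the normal). This is a generic illustration, not the actual sample code being discussed; it assumes frustum plane normals point into the frustum.

struct Plane { float nx, ny, nz, d; };             // nx*x + ny*y + nz*z + d = 0
struct AABB  { float minX, minY, minZ, maxX, maxY, maxZ; };

enum Side { OUTSIDE, INTERSECT, INSIDE };

// Signed distance: positive means the point is on the side the normal faces.
// With inward-pointing frustum planes, "negative" means outside.
static float SignedDistance(const Plane& pl, float x, float y, float z)
{
    return pl.nx * x + pl.ny * y + pl.nz * z + pl.d;
}

// Classify the box against one plane using the corner that lies furthest
// along the normal (the "p-vertex").  If even that corner is behind the
// plane, the whole box is behind it and can be culled.
Side ClassifyAABB(const Plane& pl, const AABB& b)
{
    float px = pl.nx >= 0 ? b.maxX : b.minX;
    float py = pl.ny >= 0 ? b.maxY : b.minY;
    float pz = pl.nz >= 0 ? b.maxZ : b.minZ;
    if (SignedDistance(pl, px, py, pz) < 0)
        return OUTSIDE;

    float nx = pl.nx >= 0 ? b.minX : b.maxX;       // "n-vertex": nearest along normal
    float ny = pl.ny >= 0 ? b.minY : b.maxY;
    float nz = pl.nz >= 0 ? b.minZ : b.maxZ;
    return SignedDistance(pl, nx, ny, nz) < 0 ? INTERSECT : INSIDE;
}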
From: Klaus H. <k_h...@os...> - 2000-08-21 05:20:48
|
Hi,

I'd like to enhance my texture synthesizer for terrain. In particular, I'd like to make my ecosystem classification depend on the tangential curvature of the terrain. I guess that may require some explanation.

I have a set of ecosystem definitions. Such definitions include things like:
[1] the upper elevation limit at which the ecosystem can exist
[2] the minimum and maximum slopes at which the ecosystem can exist
[3] elevation skewing
[4] etc...

Now, for each data point in the height field, I go through the set of ecosystems and determine which ecosystem exists at this data point. Such a system is very useful for 'realistic' texture synthesis, and it's also great for creating a natural distribution of trees. However, there's one thing I haven't been able to implement yet. I'd like to make my ecosystem classification depend on the concavity/convexity of the terrain. I know that this requires me to compute the tangential curvature of the terrain (which is usually stored in an extra curvature field).

To make it easier to understand, you can have a look at the following web site: http://www2.gis.uiuc.edu:2280/modviz/ The fifth terrain image (from the top) shows the tangential curvature of some terrain. This is exactly what I'd like to do. I know how I need to use the information from the curvature field to modify my ecosystem classification, but I have no idea how to compute the tangential curvature field itself. I know that there's a paper on the above-mentioned web site, and I also know that there's an implementation in the GRASS package. However, I don't know enough about the math here to understand the paper, and the implementation in the GRASS package is, well... have a look at it, and you'll know what I mean.

Finally, my question is this: does anyone know of a more detailed and I-don't-expect-you-know-everything resource on calculating the tangential curvature? Or maybe some readable source? Or even better... can someone explain this to me? Again, I've reached a point where I wish my math skills were better. At least I hope that the answer to my question is not too easy, so that maybe some math guys are interested. I cannot even promise that I'll be able to follow your replies, but I'm sure someday I will (after re-reading, re-re-reading, ... :)

Thanks,
Niki |
From: Tom F. <to...@mu...> - 2000-08-21 05:13:51
|
The Sleep(0) explicitly gives up the rest of the thread's timeslice to other threads. So even if two threads are at the same priority, the other one will get most of the CPU time, instead of spending half of it spinning tightly. It's a way of surrendering time you don't need - sort of manually lowering your priority, but with very fine-grained control. Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day. -----Original Message----- From: Chris Brodie [mailto:Chr...@ma...] Sent: 21 August 2000 05:50 To: 'gda...@li...' Subject: RE: [Algorithms] Multithreading with Hardware Acceleration Thanks guys. I was only planning on about ~3-4 threads(input+networking, file, video, main, (sound?)) Each thread has a threadsafe queue. Then to communicate with it I just send it a message throught the queue(like a pointer to where the data is it need to do it's work. I was once playing with WildTangets Gamedriver and profiled the 'explore' sample app to see where all the CPU time went. Just about all of it was spent in the nVidia driver, so I was betting that they were just doing a whole bunch of waits as bits were being rendered. Tom, while your way is correct (sleep(0) to return control to the OS to reschedule), and is a nice way of doing it, shouldn't the scheduler still switch to another thread if one is waiting and is of the same or higher priority? The reason I'm interested in this is that it'll give gains on single processors as well as the dual's. Chris |
From: Chris B. <Chr...@ma...> - 2000-08-21 04:51:25
|
Thanks guys. I was only planning on about ~3-4 threads(input+networking, file, video, main, (sound?)) Each thread has a threadsafe queue. Then to communicate with it I just send it a message throught the queue(like a pointer to where the data is it need to do it's work. I was once playing with WildTangets Gamedriver and profiled the 'explore' sample app to see where all the CPU time went. Just about all of it was spent in the nVidia driver, so I was betting that they were just doing a whole bunch of waits as bits were being rendered. Tom, while your way is correct (sleep(0) to return control to the OS to reschedule), and is a nice way of doing it, shouldn't the scheduler still switch to another thread if one is waiting and is of the same or higher priority? The reason I'm interested in this is that it'll give gains on single processors as well as the dual's. Chris -----Original Message----- From: Charles Bloom [mailto:cb...@cb...] Sent: Monday, 21 August 2000 2:22 PM To: gda...@li... Subject: RE: [Algorithms] Multithreading with Hardware Acceleration Of note here is the famed Windows 9x anomaly of rapid decreases in performance when the running thread count goes over 32 (roughly). NT and 2k seem (relatively) free of this problem. At 05:02 AM 8/21/00 +0100, you wrote: >Some people go mad with threads and create them all over the place, e.g. one >for each AI character. That's possibly a little over the top, but can work. >You need to beware of the pitfalls of threading as well as the advantages, >but it sounds like you're aware of some of the problems already. ------------------------------------------------------- Charles Bloom cb...@cb... http://www.cbloom.com _______________________________________________ GDAlgorithms-list mailing list GDA...@li... http://lists.sourceforge.net/mailman/listinfo/gdalgorithms-list |
From: Charles B. <cb...@cb...> - 2000-08-21 04:16:16
|
Of note here is the famed Windows 9x anomaly of rapid decreases in performance when the running thread count goes over 32 (roughly). NT and 2k seem (relatively) free of this problem. At 05:02 AM 8/21/00 +0100, you wrote: >Some people go mad with threads and create them all over the place, e.g. one >for each AI character. That's possibly a little over the top, but can work. >You need to beware of the pitfalls of threading as well as the advantages, >but it sounds like you're aware of some of the problems already. ------------------------------------------------------- Charles Bloom cb...@cb... http://www.cbloom.com |
From: Tom F. <to...@mu...> - 2000-08-21 04:02:54
|
Yes, lots of people do this for various reasons, amongst them this one. In D3D there are a variety of places you can do this sort of code:

while ( TRUE )
{
    res = DoAnOperation();
    if ( res == WORKED )
    {
        break;
    }
    Sleep(0);
}

I'm specifically thinking of Flip, Blt and Lock here. So there are certainly possibilities there. Sadly, so many apps have been written with this style:

DoAnOperation(); // Assume it worked without checking any flag or anything.

that many drivers have had to simply block internally and not return the "can't do it just yet or it would block" error. And because of the special thing that a driver is on most OSes, they often can't do the equivalent of Sleep(0). So yet another good idea ruined by dodgy apps, I'm afraid. Still, some drivers do do this properly, which is cool.

The other way to do it, which does always work as far as I know, is code such as the following:

while ( !ReadyToDoBlt() )
{
    Sleep(0);
}
res = Blt(); // Should only get actual _errors_ back from this.

This is certainly a good way to give way to other threads.

Some people go mad with threads and create them all over the place, e.g. one for each AI character. That's possibly a little over the top, but can work. You need to beware of the pitfalls of threading as well as the advantages, but it sounds like you're aware of some of the problems already.

Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day.

-----Original Message-----
From: Chris Brodie [mailto:Chr...@ma...]
Sent: 21 August 2000 04:39
To: 'gda...@li...'
Subject: [Algorithms] Multithreading with Hardware Acceleration

I've had a thought for a while. If I ran my main loop in a thread and the video rendering in a separate thread (with a bit of sync'ing) I could net a bit of time during hardware blocks to start preparing the next frame, maybe do a bit of extra occlusion culling\tesselation or AI. Of course I have no idea about video cards and blocking. (Like maybe on a T&L card it would just gather its data and go off for a while doing things and not bother the CPU. If the card keeps accessing the RAM, however, the context switches might bite back at any gains you make.) Does anyone know if this is worth the effort? (I'm already threading the network and file system queues so there is no learning involved)

Chris Brodie

(I apologise for the HTML. My mail department says you're not getting it :) ) |
From: Chris B. <Chr...@ma...> - 2000-08-21 03:40:19
|
I've had a thought for a while. If I ran my main loop in a thread and the video rendering in a separate thread (with a bit of sync'ing) I could net a bit of time during hardware blocks to start preparing the next frame, maybe do a bit of extra occlusion culling\tesselation or AI. Of course I have no idea about video cards and blocking. (Like maybe on a T&L card it would just gather its data and go off for a while doing things and not bother the CPU. If the card keeps accessing the RAM, however, the context switches might bite back at any gains you make.) Does anyone know if this is worth the effort? (I'm already threading the network and file system queues so there is no learning involved)

Chris Brodie

(I apologise for the HTML. My mail department says you're not getting it :) ) |
From: jason w. <jas...@po...> - 2000-08-21 02:25:45
|
Imperium Galactica has some very nice transitions from solar-system level down to surface level. If you want a real mathematical approach, I'm sure there is one out there... the principle is one of scaling the view space so that one unit matches the current scope. The other approach is the animator's approach: as you fly down to the ground, you go through clouds, which obscure the screen enough to swap to a completely different scene database. |
From: jason w. <jas...@po...> - 2000-08-21 02:22:14
|
Yeah, it's interesting... Spheres are particularly nice because the sphere radius is the same as the circle's radius after projection (heh... Ron can tell us what that exact property is properly called... I know it's not invariant). But of course, the disadvantage is that they're a poor fit.

Did you ever think of using a sphere tree, somewhat like the QSplat system's structures? I think there's some real hope for very organic, complex outdoor environments to be based around such a system... basically you'd have individual objects represented as in QSplat, and some sort of scene clustering collecting them into a tree. Then, each frame, start at the top of the tree, and check if any of the nodes occlude any of the others... descend down the tree in a similar way until you decide each node must be visible.

Obviously, you'd need more smarts than that, since it could potentially be very slow to clip one highly concave object to another, like one tree against another, for example. You also need some form of fusion... storing a coverage ratio for each node in the sphere tree, much like HOM's image pyramid, would at least let you break out at a specific tolerance. The checks would have to be brutally fast though to get to any really detailed environment. |
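A very loose sketch of the kind of traversal being described, assuming each node carries a bounding sphere, a coverage estimate and child pointers (all hypothetical names); the occlusion test itself is left abstract, since that is the open question in the message above.

#include <vector>

// Hypothetical sphere-tree node: bounding sphere plus coverage estimate.
struct SphereNode
{
    float cx, cy, cz, radius;           // bounding sphere
    float coverage;                     // fraction of the sphere the geometry fills
    std::vector<SphereNode*> children;  // empty for leaves (QSplat-style splats)
};

// Left abstract: do the accumulated occluders hide this node's sphere,
// to within 'tolerance'?
bool SphereHidden(const SphereNode& node,
                  const std::vector<const SphereNode*>& occluders,
                  float tolerance);

void CollectVisible(const SphereNode& node,
                    std::vector<const SphereNode*>& occluders,
                    std::vector<const SphereNode*>& visible,
                    float tolerance)
{
    if (SphereHidden(node, occluders, tolerance))
        return;                                        // whole subtree rejected

    if (node.children.empty())
    {
        visible.push_back(&node);                      // leaf: draw it
        if (node.coverage > tolerance)
            occluders.push_back(&node);                // dense enough to occlude others
        return;
    }

    for (const SphereNode* child : node.children)      // rough front-to-back order assumed
        CollectVisible(*child, occluders, visible, tolerance);
}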
From: Akbar A. <sye...@ea...> - 2000-08-21 02:03:38
|
>I think the idea seems very far-fetched in general the development studio lionhead with there title "Black and White" are probably facing some of these same problems. here's a preview of the game that i found while searching for the studios name ;) http://pc.ign.com/previews/3897.html peace. akbar A. "We want technology for the sake of the story, not for its own sake. When you look back, say 10 years from now, current technology will seem quaint" Pixars' Edwin Catmull. -----Original Message----- From: gda...@li... [mailto:gda...@li...]On Behalf Of Scott Justin Shumaker Sent: Sunday, August 20, 2000 8:47 PM To: gda...@li... Subject: Re: [Algorithms] Massive spaces Obviously, you'll need to use some kind of LOD. There's no way you can even handle a single city with that kind of detail if all is being displayed at once. If you look up at the sun from earth, you only see a glowing yellow-orange ball (well, last time I checked, I don't get out much) ;). You don't need to actually model the detail from the range you're at. Although, the amount of data that needs to be generated to give that kind of detail will probably entail dynamic, on-the-fly detail, since there's no way you could store even a single, say, statue or piece of sculpture, with 1/10000 meter resolution (take a look at the Digital Michaelangelo project for an idea of the undertaking). To tell you the truth, I think the idea seems very far-fetched in general, but that's just my $.02. :) -- Scott Shumaker sjs...@um... On Mon, 21 Aug 2000, Sam Kuhn wrote: > Hey, > > I've got a little mind game for you lot. > > Say you needed to render an entire solar system, in which you can zoom down to a planet surface and see detail of about 1/100000 meter, or shoot over to the sun and see the same sort of detail. There are a couple of immediate problems: > > You would need need an enormous view volume (what say 20 lightyears big) which gives an appauling zbuffer resolution. > So whats are you to do? Render the distant objects closer than they actually are (i.e. at the far end of a planet-sized view volume) and scale them down accordingly?. This surely would give incorrect parralax between the distant objects > > Also you need the planets to be of enormous size so that the front clipping plane doesn't cut through the planets fine detail when you are in very close (lets say an ant view for example). Which again buggers the zbuffer, since you're using a planet sized view volume. > > Any ideas? > > Regards, > > Sam. > > > _______________________________________________ GDAlgorithms-list mailing list GDA...@li... http://lists.sourceforge.net/mailman/listinfo/gdalgorithms-list |
From: Tom F. <to...@mu...> - 2000-08-21 01:56:19
|
(1) Partition your Z buffer. In D3D this is done using the dvMinZ and dvMaxZ values in the viewport. There is a pretty direct equivalent in OpenGL - I forget what it's called. Then you can even out your precision quite happily. (2) They're planets. Big spherical things. And they don't (usually - depends what sort of game you are doing I guess :-) actually intersect in space. Possibly nothing in the universe is easier to sort from back to front and draw that way. So you can nail them to the far clip plane, set your Z test to LESSTHAN, and there you go. (3) Move your near clip plane according to the nearest object. So if you are on a planet's surface, the near clip plane is quite close. If you are zooming about in space, it's millions of miles away. By using all these three, you can get huge spaces done pretty easily. If the archives are working, this has been discussed a fair number of times. Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day. -----Original Message----- From: Sam Kuhn [mailto:sa...@ip...] Sent: 21 August 2000 02:33 To: gda...@li... Subject: [Algorithms] Massive spaces Hey, I've got a little mind game for you lot. Say you needed to render an entire solar system, in which you can zoom down to a planet surface and see detail of about 1/100000 meter, or shoot over to the sun and see the same sort of detail. There are a couple of immediate problems: You would need need an enormous view volume (what say 20 lightyears big) which gives an appauling zbuffer resolution. So whats are you to do? Render the distant objects closer than they actually are (i.e. at the far end of a planet-sized view volume) and scale them down accordingly?. This surely would give incorrect parralax between the distant objects Also you need the planets to be of enormous size so that the front clipping plane doesn't cut through the planets fine detail when you are in very close (lets say an ant view for example). Which again buggers the zbuffer, since you're using a planet sized view volume. Any ideas? Regards, Sam. |
From: Scott J. S. <sjs...@um...> - 2000-08-21 01:47:29
|
Obviously, you'll need to use some kind of LOD. There's no way you can even handle a single city with that kind of detail if all is being displayed at once. If you look up at the sun from earth, you only see a glowing yellow-orange ball (well, last time I checked, I don't get out much) ;). You don't need to actually model the detail from the range you're at. Although, the amount of data that needs to be generated to give that kind of detail will probably entail dynamic, on-the-fly detail, since there's no way you could store even a single, say, statue or piece of sculpture, with 1/10000 meter resolution (take a look at the Digital Michaelangelo project for an idea of the undertaking). To tell you the truth, I think the idea seems very far-fetched in general, but that's just my $.02. :) -- Scott Shumaker sjs...@um... On Mon, 21 Aug 2000, Sam Kuhn wrote: > Hey, > > I've got a little mind game for you lot. > > Say you needed to render an entire solar system, in which you can zoom down to a planet surface and see detail of about 1/100000 meter, or shoot over to the sun and see the same sort of detail. There are a couple of immediate problems: > > You would need need an enormous view volume (what say 20 lightyears big) which gives an appauling zbuffer resolution. > So whats are you to do? Render the distant objects closer than they actually are (i.e. at the far end of a planet-sized view volume) and scale them down accordingly?. This surely would give incorrect parralax between the distant objects > > Also you need the planets to be of enormous size so that the front clipping plane doesn't cut through the planets fine detail when you are in very close (lets say an ant view for example). Which again buggers the zbuffer, since you're using a planet sized view volume. > > Any ideas? > > Regards, > > Sam. > > > |
From: Sam K. <sa...@ip...> - 2000-08-21 01:33:49
|
Hey,

I've got a little mind game for you lot.

Say you needed to render an entire solar system, in which you can zoom down to a planet surface and see detail of about 1/100000 meter, or shoot over to the sun and see the same sort of detail. There are a couple of immediate problems:

You would need an enormous view volume (say 20 lightyears big), which gives an appalling zbuffer resolution. So what are you to do? Render the distant objects closer than they actually are (i.e. at the far end of a planet-sized view volume) and scale them down accordingly? This surely would give incorrect parallax between the distant objects.

Also you need the planets to be of enormous size so that the front clipping plane doesn't cut through the planet's fine detail when you are in very close (let's say an ant view, for example). Which again buggers the zbuffer, since you're using a planet-sized view volume.

Any ideas?

Regards,

Sam. |
From: <Lea...@en...> - 2000-08-21 01:19:02
|
> Some things that I did a few years ago that may interest people with > gross culling of objects in screen space that worked quite well in the > end, that may interest some people (and may not too... :) It was pretty > much a mix of my own algorithms and the Occlusion Using Shadow Frusta paper. Of course, it may only interest you once... :) Leathal. |
From: <Lea...@en...> - 2000-08-21 00:40:55
|
> I have been looking at something very similar. My version adds functions to > allows enscribed volumes to be set for each primative group. The occulion > list would start empty each frame. primative groups would then be tested > against the list as they pass though the pipeline, and be added to the list. > This means you would really have to be using front to back rendering for at > least 'major' objects but can be implemented totally in the geometry > subsystem. Some things that I did a few years ago that may interest people with gross culling of objects in screen space that worked quite well in the end, that may interest some people (and may not too... :) It was pretty much a mix of my own algorithms and the Occlusion Using Shadow Frusta paper. Sphere Shadow - a sphere that is projected into screen space that consists on the minimal volume that will fit entirely within an object... you can transform the centre of the sphere and keep a screen size radius structure (hardest bit was working out the volume that fits -- but this can be done offline) BBox Shadow - as per sphere, with a BBox in screen space. I found the screen space sphere shadow to be extremely fast, easy and accurate, although it is a very limited method with respect to the volume most spheres end up occupying... for complex objects you can have multiple spheres... The entire cull algorithm for objects is then something like transform object sphere to screen space, getting center and radius for each sphere shadow occluder compare object center + radius with shadow size + radius Advantages: * It's not reliant on getting info back from the buffers * It doesn't take long to check a lot of objects against several occluders * It's flippingly quick... :) Disadvantages: * For mirrored areas, you may need to check and transform twice * Reflective surfaces such as floors in Unreal have the same problem * Useless for transparent objects, but you get that anyway... * Long thin objects suck for shadow sphere occluders For trees and terrain you need to weigh up how visible objects are behind the vegetation -- for example, if a tree has really bushy leaves you can use a distance based shadow sphere occluder that comes into play at longer distances and then actually render a billboarded quad where the shadow sphere occluder is to not have objects just disappear behind the vegetation. (You will need to blend this over time for most effective results though). A simple view independent distance test does fine... Leathal. |
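A compact sketch of the screen-space test described above, with everything reduced to projected centres, conservative pixel radii and view depths. ScreenSphere and the depth handling are simplifications of the scheme in the message, not the original code; a robust version would compare the occludee's nearest depth against the occluder's farthest depth.

#include <cmath>
#include <vector>

// Hypothetical screen-space record: projected centre (pixels), conservative
// projected radius (pixels) and view-space depth of the sphere centre.
struct ScreenSphere { float x, y, radius, depth; };

// True if 'object' is hidden by one of the sphere-shadow occluders: its
// projected disc fits entirely inside an occluder's disc and it lies behind
// that occluder.  The depth comparison is deliberately simplistic.
bool IsOccluded(const ScreenSphere& object,
                const std::vector<ScreenSphere>& occluders)
{
    for (const ScreenSphere& occ : occluders)
    {
        if (object.depth <= occ.depth)
            continue;                                   // not behind this occluder

        float dx = object.x - occ.x;
        float dy = object.y - occ.y;
        float centreDist = std::sqrt(dx * dx + dy * dy);

        if (centreDist + object.radius <= occ.radius)   // disc containment
            return true;
    }
    return false;
}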
From: Iain N. <i.a...@li...> - 2000-08-21 00:11:52
|
> Nope, think more like:
> begin(indexed_array);
> setboundshint(my_boundingvolume);
> draw(my_iarray);
> end(indexed_array);

I have been looking at something very similar. My version adds functions to allow inscribed volumes to be set for each primitive group. The occlusion list would start empty each frame. Primitive groups would then be tested against the list as they pass through the pipeline, and be added to the list. This means you would really have to be using front-to-back rendering for at least 'major' objects, but it can be implemented totally in the geometry subsystem. |
From: jason w. <jas...@po...> - 2000-08-20 23:15:24
|
It appears Apple holds a patent on hierarchical z-buffering. However, they seem to have a patent on multi-pass rendering as well, so perhaps it's a non-issue. |
From: jason w. <jas...@po...> - 2000-08-20 23:13:23
|
> So what you are asking is for results at (2) to affect what is done at (1) > on the very next triangle. the only way this can be done is to hold the No.. I was very specific that 'primative' meant a set of triangles, as in a indexed array, or a higher level surface like a patch (such as the n-patches). Of course trying to make the ganularity at the triangle level would fall apart.. there'd be no gain. However, on arrays of traingles, such as my earlier example of a character behind a wall... > OK, well there is only the Radeon that does this at the moment. It's cool, > but it's nowhere near commonplace. And it still requires that the test > triangles be rasterised - converted into pixel representation. The details > of Z-testing are not important. The fact that you first have to rasterise > them is the killer. Right.. but I'm not talking about rasterizing test triangles, I'm talking about using a bounding volume directly. There are *much* faster ways to getting to a conservative test of "will everything within this bounding region be behind everything already rendered to this screen area." Such a conservative test need not know the actual z for the entire region, it merely needs to know the range of z in the bounding primative's screen region, as well as the range containted in the bounding primative (after perspective). Although this seems perhaps overly conservative, I think most games exhibit so much spatial locality that such a test will still have good gains. You need not even render front->back, merely render world geometry befor character geometry (a typical fps thing anyhow), and you'll get gains. Maybe not worth the gates it would take to impliment the feature tho, I'll grant you that. > You: draw & test bounding object. Draw real object. > Me: test previous frame's object results. Draw this frame's low-rez object. nope.. What I'm proposing is a hint that can be used to early out or reject the current triangle array. The hint could even be checked in parellel with processing the first few verticies of the array to avoid any gaps in the pipe for when the hint doesn't cull the array. > Plus, you also had to send down the polygon data for your bounding object, > while I didn't. It's usually small for a bounding object, but it's not zero. > > [snip stuff that also isn't right, but...] > > > You misunderstood.. I never said anything about drawing > > anything. Just a > > bounding volume hint, which is a very different thing. > > There's plenty of > > existing work for converting a OBB to exact screen regions > > *very* quickly > > without resorting to scan conversion/rasterization. We're > > only interested in > > conservative values as well, since it's common for a > > character model to be > > completely seperate from a set of wall polygons. > > OK, if you did this sort of incredibly conservative (i.e. add hardware to > T&L OBB in some quick but conservative way, find enclosing screen BB, test > all Z values using some sort of quickie rectangle rasteriser, somehow > dodging the bullet of concurrent Z-buffer access with polys that are > currently being rasterised), maybe it would work sometimes. But remember > that you're finding the screen BB of an OBB. So the area being tested is > quite big compared to your original shape. And that's still a decent chunk > of hardware. Yes, a very large chunk of hardware.. but hey, so far 3dfx and nvidia have shown no fear of going to some of the largest gate counts ever attempted. 
I never said I was sure it was a good tradeoff to impliment, but that I think that early rejection in rasterization *is* a good thing in the interests of increased scalability. > I _still_ don't see what is so bad about adding zero hardware to existing > chips and using some frame-to-frame coherency. I never said anything was bad about it.. as an application level scheme where you know about, plan around it's limits, it should work fine. But it's not a general purpose extension, like adding early rejection to the hardware would be. Now, the rejection would have little effect on large classes of scenes, but on the other hand, 90 of all triangles rendered in the universe are from quake or a similar game... in other words, the relevant class, which has lots of spatial coherency, is definately dominate. > latency is terrible. What you are relying on is a short latency. And in the no.. I'm relying on a latency of the delayed z bounds being somewhere on the scale of a couple 'primatives'. > This is highly app-specific though - the app can happily modify its > interpretation of the results based on the above. Whereas if you leave it > all up to the hardware, it can get very had to get consistent framerates. true. > Your method actually _removes_ control from the app. That is not going to > help to get consistent, smooth framerates - if anything, it will give you > the opposite. no.. it can always choose not to give the hints :). However, because the granularity is finer on hardware rejection, moments when you could get close to a dropped frame are not frame specific so much as action specific: like a 1000 poly character walks around a corner and becomes visable. But on a camera cut or teleportation, the hardware rejection can still work, and still effectively shorten the rendering time of that specific frame.. the frame2frame cohearency is powerless to do anything until the next frame. > > Ouch.. hadn't thought much about the driver related issues. > > However, *if* > > state was constant accross a primative, it's not a problem. > > That would be a > > big issue, but I don't think it's insurmountable. > > Except that was one of your supposed "plus" points - that state wouldn't > have to be changed if the object was rejected! True.. so it probibly wouldn't help you reduce texture downloads. But then again, reducing bus traffic isn't the primary goal of this sort of rejection: it's reducing invisable fill/primative consumption. > (1) There _is_ early rejection in rasterisation pipes. Hierarchial Z is > massively cool, but relatively conventional. Right... I need to go read up on that (one of those things on the list). And the truely wicked implimentation: use both the app level frame2 frame coherancy and the hardware rejection :). |
From: Tom F. <to...@mu...> - 2000-08-20 22:20:52
|
> From: jason watkins [mailto:jas...@po...] > > > That's not the performance hit. You are submitting tris like this: > > > > BeginChunk > > Draw tester tris > > EndChunk > > if ( previous chunk rendered ) > > { > > Draw real tris > > } > > > Nope, think more like: > begin(indexed_array); > setboundshint(my_boundingvolume); > draw(my_iarray); > end(indexed_array); > > it's handed off as a single, seperatable transaction.. the hint merely > allows the hardware to quickly reject the entire array *if* > it's obvious > it's hidden.. I would think this happens fairly often, like > when a character > model is behind a wall, for example. Looks the same to me - the point being that the "hint" is just before the triangles that are gated by it. the hardware can't rearrange triangle order or interleave or anything made like that - that's just not how hardware works (except for wacky stuff like scene-capture architecture, which has some very different problems to cope with, and so far has failed to live up to its claims). > So what you're not getting, is that the *if* is _not_ a > blocking *if*. If it doesn't block, then it's not much good. Not block = does nothing (or very little). > It's > just a hint.. the hardware can deal with the hint in many > ways.. If you want to abstract things this way, then this is very definately an API abstraction. This is not something that can be done directly in hardware. > it's true > that it would work best in a heirarchical z pipeline, but it > should still > work in the typical. How that z information gets relayed back to the > rejection block is an open question.. Erm... faster than light. OK, here is the pipeline of a typical T&L&rasterise chip: -Index AGP read -Vertex cacheing -Vertex AGP read (1) -Transform vertices -Light vertices -Clip vertices -Project vertices -Construct triangle from vertices -Backface cull -Rasteriser setup -Rasterise -Read Z buffer -Test against Z buffer (2) -Etc. (rest of pixel pipeline). So what you are asking is for results at (2) to affect what is done at (1) on the very next triangle. the only way this can be done is to hold the "drawn" triangles at (1) until all the "test" tris have passed (2). So the pipeline from (1) to (2) is empty. It has no triangle info in it at all. That is a huge bubble - probably hundreds of clock cycles long. You noticed all that complex floating-point maths in the middle, didn't you? Each floating-point operation has many pipelined clock stages, and there are a lot of operations in that section of the chip. It's a massive bubble, and no AGP FIFO is going to deal with those sorts of delays. > but I can think of > several ways in a > typical architecture.. it's cacheing scanlines anyhow, Not in a typical architecture it's not. But let's say it was... > so it could do > something like relay the maximal value for every 4 z's in the > scanline being > unloaded from cache back to the rejection block, where the > rejection block > has it's own low res local cache. OK, well there is only the Radeon that does this at the moment. It's cool, but it's nowhere near commonplace. And it still requires that the test triangles be rasterised - converted into pixel representation. The details of Z-testing are not important. The fact that you first have to rasterise them is the killer. > The details of how this > works could take > many different forms.. 
the point being is that you only need > delayed z info, > and that having the hint processed on chip means that you can > do it inside a > single frame instead of relying on a previous frame. I just don't see the problem with relying on the previous frame. There are hundreds of algorithms that we use every day in code that rely on frame-to-frame coherency for speed. One more is not going to drive people bonkers. > > The problem is that the if() is being done at the very start of the > pipeline > > (i.e. the AGP bus - any later and you lose most of the gain), > > Nope.. you gain fill rate by reduced depth complexity. You could gain > effective polygon bandwidth as well. No, you don't. Consider the case where the object is invisible: You: draw & test bounding object. Don't draw real object. Me: test previous frame's object results. Draw this frame's low-rez object. Pretty even - we both draw an object that is roughly the right number of pixels on-screen. Both get fast Z-rejects (by whatever hierarchial Z method you like) and fetch no texels. OK, now a drawn object: You: draw & test bounding object. Draw real object. Me: test previous frame's object results. Draw this frame's low-rez object. Looks like I win. I drew one object, you draw an object & tested a bounding object. True, you didn't need to fetch texels or write out results for your bounding object, but you staill rasterised & checked _something_. I didn't. Plus, you also had to send down the polygon data for your bounding object, while I didn't. It's usually small for a bounding object, but it's not zero. [snip stuff that also isn't right, but...] > You misunderstood.. I never said anything about drawing > anything. Just a > bounding volume hint, which is a very different thing. > There's plenty of > existing work for converting a OBB to exact screen regions > *very* quickly > without resorting to scan conversion/rasterization. We're > only interested in > conservative values as well, since it's common for a > character model to be > completely seperate from a set of wall polygons. OK, if you did this sort of incredibly conservative (i.e. add hardware to T&L OBB in some quick but conservative way, find enclosing screen BB, test all Z values using some sort of quickie rectangle rasteriser, somehow dodging the bullet of concurrent Z-buffer access with polys that are currently being rasterised), maybe it would work sometimes. But remember that you're finding the screen BB of an OBB. So the area being tested is quite big compared to your original shape. And that's still a decent chunk of hardware. I _still_ don't see what is so bad about adding zero hardware to existing chips and using some frame-to-frame coherency. > > (2) No extra triangles needed. OK, the bounding box is a > pretty small > number > > of tris, but what if you wanted to do this scheme with lots > of smallish > > object? Might get significant then. > > again, it's a hint, not a set of triangles.. no added triangles. > > > (3) (and this is the biggie) It is already supported by > tons of existing, > > normal, shipped, out there hardware. Not some mystical > future device. > Real, > > existing ones that you have probably used. > > *very* true, very good point. Indeedy. [snip] > > Huge deal if the delay is longer than a few tens of clock > cycles. The AGP > > FIFOs are not very big, and bubbles of the sort of size you > are talking > > about are not going to be absorbed by them. So for part of > your frame, the > > AGP bus will be sitting idle. 
And if, as is happening, you > are limited by > > AGP speed, that is going to hurt quite a lot. > > as long as the gap in the pipeline as it starts fetching the > next stream > after a rejection is shorter than the number of cycles it > would have taken > to finish the rejected stream, you win. Considering that a cycle = 4 > rasterized pixels or so, and that trinagles typically are > 4-8x that, and > that arrays are typically 10 tris or more, I think it's not > to much of a > worry. Unless it really does take 200 cycles of a ~150mhz part to set > up/redirect the dma. You're confusing _throughput_ with _latency_. The typical on-screen textured pixel may take thousands of clock cycles from its triangle being read in by the AGP bus, to actually being written onto the screen. However, the next one will be right behind it. so the throughput of chips is massive, but the latency is terrible. What you are relying on is a short latency. And in the graphics chip world, latency is very very expendable. Huge pipelines, massively parallel, quarter of the chip is FIFOs, multiple stages in even the simplest operations. That is what makes graphics chips fast. Do not stall the pipe, or you're toast. Those are the keys. This technique blows all that out of the water. [snip] > > What's wrong with frame-to-frame coherence? Remember, if > there is a camera > > change or cut, the application can simply discard all the > visibility info > it > > has and just draw everything, until it has vis information > for the new > > camera position. > > A couple things.. originally I didn't think this was a big > deal, but later > changed... I think making assumptions is bad, and I > definately think that > consistant framerate is more important than a high > instantanious. Nothings > more annoying than jumping through a portal in a game to get > dropped frames > for a few frames before it gets everything sorted out and > cached and gets > back up to 60fps (or whatever the target is). Making the > granularity of > rejection sub-frame should help avoid this... Also, when > you're using an in > engine cinematic approach, it's really annoying when you get > a dropped frame > every time the camera cuts. This is highly app-specific though - the app can happily modify its interpretation of the results based on the above. Whereas if you leave it all up to the hardware, it can get very had to get consistent framerates. Your method actually _removes_ control from the app. That is not going to help to get consistent, smooth framerates - if anything, it will give you the opposite. > > No no no. There is no way you could get the _hardware_ to > reject state > > change info and texture downloads because of some > internally fed-back > state. > > Drivers rely very heavily on persistent state, i.e. not > having to send > state > > that doesn't change. If the driver now doesn't know whether > the state > change > > info it sent actually made it to the chip's registers or > not, that's just > > madness - the driver will go potty trying to figure out > what it does and > > doesn't need to send over the bus. Ditto for downloading > textures. Since > the > > driver can't know whether the hardware is going to reject > the tris or not, > > it always has to do the texture downloads anyway. And if > the hardware is > > doing the downloads instead (e.g. AGP or cached-AGP > textures), then either > > solution works - the fast-Z-rejection of pixels means those > textures never > > get fetched. > > Ouch.. 
hadn't thought much about the driver related issues. > However, *if* > state was constant accross a primative, it's not a problem. > That would be a > big issue, but I don't think it's insurmountable. Except that was one of your supposed "plus" points - that state wouldn't have to be changed if the object was rejected! > So, maybe I'm foggy on some details.. but I still thing early > rejection in > rasterization pipes is a *good thing*tm :). (1) There _is_ early rejection in rasterisation pipes. Hierarchial Z is massively cool, but relatively conventional. (2) It's not the rasteriser that needs the speeding up. We have some awesomely fast rasterisers at the moment. But the T&L is a bottleneck under some situations (complex lighting, high tesselation), and the AGP bus is the bottleneck under others. Those are the things that need conserving right now. Tom Forsyth - Muckyfoot bloke. Whizzing and pasting and pooting through the day. |
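A sketch of the frame-to-frame coherence scheme Tom keeps referring to ("test previous frame's object results, draw this frame's object"), with the actual feedback mechanism hidden behind hypothetical wrappers; none of these are real API calls, just one plausible shape for the idea.

// Hypothetical wrappers around whatever per-object visibility feedback is
// available (e.g. counting pixels of a bounding shape that pass the z-test
// with colour writes disabled).  EndVisibilityTest is assumed to update
// visibleLastFrame when the result is read back on the following frame.
struct Object { bool visibleLastFrame = true; };

void DrawFullDetail(Object& obj);
void DrawBoundingShape(Object& obj);        // cheap proxy, depth-test only
void BeginVisibilityTest(Object& obj);
void EndVisibilityTest(Object& obj);        // result read back next frame

void DrawWithFrameCoherence(Object& obj, bool cameraCutThisFrame)
{
    // On a camera cut, last frame's results are meaningless: assume visible.
    if (cameraCutThisFrame || obj.visibleLastFrame)
        DrawFullDetail(obj);

    // Refresh the visibility result for next frame by rasterising only the
    // cheap bounding shape; the one-frame latency is exactly the coherence
    // being relied on in the discussion above.
    BeginVisibilityTest(obj);
    DrawBoundingShape(obj);
    EndVisibilityTest(obj);
}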