Thread: Re: [Algorithms] EPA : working ?
From: <Pau...@sc...> - 2004-04-26 16:18:37
On 26/04/2004 17:11:06 gdalgorithms-list-admin wrote:

> Hi,
>
> I started playing with Solid 3.5, in particular the query computing the
> penetration depth between convex objects using the Expanding Polytope
> Algorithm (EPA).
>
> While it seems to work most of the time, it also fails pretty often
> (the returned penetration vector is simply wrong). It mainly happens
> when two similar objects (say cubes) have the same rotation (say they're
> vertically stacked). This is a bit frustrating since that's about the
> first test I did, and it immediately failed.
>
> After some investigation it seems to break for cases that usually make
> the SAT break as well: two parallel edges giving a zero cross product.
> While I know how to fix a SAT-based test to handle those cases, I
> haven't the faintest clue where to even start looking in Solid 3.5 in
> search of the equivalent tests (if they even exist).
>
> I contacted Gino and he said he was busy and would investigate this
> later, so meanwhile I have a question for the list: do you know if it's
> a problem in the implementation (Solid is broken) or a more profound
> problem in the algorithm itself? I wouldn't want to go and implement my
> own version only to realize in the end that it can't work anyway :(
>
> So, EPA, worth it or not? And if not, suggested alternatives? (DEEP
> failed miserably for me.)

I was under the impression that GJK was the normal recommended tool for
the job (even though it's slower under penetration), but I guess you ruled
that out already since EPA is based on it?

I did try V-Clip myself briefly, but I couldn't get it to work quite
right, and reverted back to my partially explicit Minkowski sum routine
until I have some more time to play with V-Clip.

Cheers,
Paul.
From: Gino v. d. B. <gva...@pl...> - 2004-04-27 08:24:47
First of all, there is a fuzzy area between contact and interpenetration.
Tiny interpenetrations usually do not count as such; EPA returns a
non-intersection for close contacts.

EPA basically comes down to blowing up the tetrahedron returned by GJK.
This can fail for the following reasons:

1) The simplex returned by GJK is not a tetrahedron. For this case, EPA
heuristically adds vertices to the simplex in order to make it a volume.
However, if the origin is not an internal point of the created volume,
then EPA will fail, returning a non-penetration. NB: the origin has to be
an internal point; it cannot lie on the boundary of the initial polytope.

2) The tetrahedron returned by GJK may not contain the origin. Due to
tolerances used in the finite-precision version, GJK may terminate
returning an intersection when the origin lies very close to, but not
inside, the current simplex. This does not necessarily mean that the
origin lies on the boundary of the polyhedron: it can be an internal point
of the polyhedron but lie outside the current simplex.

Case 2 can be solved by getting rid of the tolerance, for instance by
using interval arithmetic.

Case 1 is more fundamental and usually occurs for a pair of symmetric,
axis-aligned objects. The stacked boxes are a good example. I do not have
a hard proof that, theoretically (with infinite precision), the heuristics
used by EPA always result in a volume containing the origin whenever the
origin is indeed an internal point of the full polyhedron. I have not
found a counterexample for this case either, so if you can provide me one,
that would help.

Could you check where the following function exits, in case of a false
penetration depth?

    bool penDepth(const DT_GJK& gjk, const DT_Convex& a, const DT_Convex& b,
                  MT_Vector3& v, MT_Point3& pa, MT_Point3& pb)

Finally, for boxes I would not use EPA but go for a brute-force approach
when computing penetration depths. The Minkowski sum of two boxes has at
most 30 faces. EPA is useful for complex polyhedra and arbitrary convex
objects.

Gino van den Bergen
www.dtecta.com
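A minimal sketch of the strict interior-point test that case 1 hinges on
(the Vec3 type and helpers are hypothetical, not Solid's actual code):

    struct Vec3 { float x, y, z; };

    static Vec3  sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
    static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
    static Vec3  cross(const Vec3& a, const Vec3& b)
    {
        return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    }

    // Signed distance (up to scale) of 'test' from the plane through p, q, r.
    static float side(const Vec3& p, const Vec3& q, const Vec3& r, const Vec3& test)
    {
        return dot(cross(sub(q, p), sub(r, p)), sub(test, p));
    }

    // True only if the origin lies STRICTLY inside the tetrahedron
    // (a, b, c, d): for every face, the origin must be on the same side
    // as the opposite vertex, and never exactly on the face plane - the
    // boundary case that makes EPA's initial polytope unusable.
    bool originStrictlyInside(const Vec3& a, const Vec3& b, const Vec3& c, const Vec3& d)
    {
        const Vec3 o = { 0.0f, 0.0f, 0.0f };
        return side(a, b, c, o) * side(a, b, c, d) > 0.0f
            && side(a, b, d, o) * side(a, b, d, c) > 0.0f
            && side(a, c, d, o) * side(a, c, d, b) > 0.0f
            && side(b, c, d, o) * side(b, c, d, a) > 0.0f;
    }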
From: Gino v. d. B. <gva...@pl...> - 2004-04-27 09:24:13
Are you sure? This means that you ran out of facets, i.e. your expanding
polytope has more than 200 facets?? Try enlarging these by an order of
magnitude:

    const int MaxSupportPoints = 1000;
    const int MaxFacets = 2000;

Gino

-----Original Message-----
From: Pierre Terdiman
Sent: Tuesday, April 27, 2004 11:17
Subject: Re: [Algorithms] EPA : working ?

> Could you check where the following function exits, in case of a false
> penetration depth?

I'll do that again ASAP, but IIRC it was breaking here:

    if (!firstFacet)
    {
        break;
    }

Overall it didn't seem to hit any weird place or exceed the specified
tolerances. Tracing the code, I didn't see an obvious difference between a
"working case" and a "non-working case". It seems to use the same code
paths and do the same things, except that in one case the returned result
is wrong. I'll try to create a repro case / dump object data in a
non-working pose.

- Pierre
From: Pierre T. <pie...@no...> - 2004-04-27 11:37:53
> Try enlarging these by an order of magnitude:

It didn't change anything. However, I noticed two things:

1) In a particular scene, the Debug version failed while the Release
version worked. However, the Release version still failed for another
scene.

2) There's another reason why it can reach the part of the code I
mentioned: when this method fails:

    if (facet->computeClosest(yBuf))
    {
        ...
    }

i.e. when this is false:

    if (m_det > MT_Scalar(0.0))

Apparently that's what happens in the failing cases. I get this:

    v1dv1 * v2dv2 - v1dv2 * v1dv2 = 1.5281926835086779e-009

...which gets truncated to 0 for m_det. Perhaps compiling with doubles
would help, but that doesn't look like a fix - just a way to hide the bug.

I tried a few workarounds. Just executing the first branch above all the
time actually does fix the scene that failed in Debug but worked in
Release. However, it doesn't fix all problems (in particular, my box stack
still collapses). It should give you an idea of what's happening,
nonetheless.

I'm really interested in a fix for this one :)

- Pierre
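The quantity quoted above is the determinant of the 2x2 Gram matrix of the
facet's edge vectors, which equals |v1 x v2|^2 and so collapses to zero
exactly when the edges are near-parallel - the same degeneracy that breaks
a naive SAT. A sketch of a scale-relative test, reusing the hypothetical
Vec3 helpers from the earlier sketch (an assumed fix, not Solid's actual
code):

    // Compare the Gram determinant against a tolerance scaled by the
    // squared edge lengths, rather than against absolute zero, so the
    // test behaves the same regardless of object size.
    bool facetIsWellConditioned(const Vec3& p0, const Vec3& p1, const Vec3& p2)
    {
        Vec3 v1 = sub(p1, p0);
        Vec3 v2 = sub(p2, p0);
        float v1dv1 = dot(v1, v1);
        float v2dv2 = dot(v2, v2);
        float v1dv2 = dot(v1, v2);
        float det = v1dv1 * v2dv2 - v1dv2 * v1dv2;   // == |v1 x v2|^2
        const float relEps = 1e-6f;                  // dimensionless, tunable
        return det > relEps * v1dv1 * v2dv2;
    }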
From: Pierre T. <pie...@no...> - 2004-04-27 12:43:07
BTW it's not just an epsilon issue: at some point I found a redundant
vertex in computeClosest(), i.e. v1 or v2 was exactly the same as p0. So
the determinant is exactly zero in this case, no matter what.

- Pierre
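A sketch of the kind of guard this suggests - rejecting a support point
that duplicates an existing polytope vertex before it spawns a degenerate
facet (again using the hypothetical Vec3 helpers above):

    // If the new support point w coincides with a vertex the polytope
    // already has, expanding towards it can only create zero-area facets;
    // treat it as convergence instead of adding it.
    bool isNewSupportPoint(const Vec3* verts, int numVerts, const Vec3& w,
                           float tolSquared)
    {
        for (int i = 0; i < numVerts; ++i)
        {
            Vec3 d = sub(w, verts[i]);
            if (dot(d, d) <= tolSquared)
                return false;
        }
        return true;
    }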
From: <Pau...@sc...> - 2004-04-27 10:29:17
On 27/04/2004 10:16:58 gdalgorithms-list-admin wrote:

>> I was under the impression that GJK was the normal recommended tool
>> for the job (even though it's slower under penetration), but I guess
>> you ruled that out already since EPA is based on it?
>
> AFAIK GJK doesn't work when objects are penetrating. What works is the
> extension of GJK named EPA. So I didn't "rule out" GJK, it's just not
> enough alone.

Hmmm, the title of Cameron's GJK paper, "Enhancing GJK: computing minimum
and penetration distances between convex polyhedra", leads me to believe
that this variant does, at least...

>> I did try V-Clip myself briefly, but I couldn't get it to work quite
>> right and reverted back to my partially explicit Minkowski sum routine
>> until I have some more time to play with V-Clip.
>
> Last time I checked, V-Clip didn't handle penetrating cases either.

The paper on V-Clip says it does handle penetration.
http://www.merl.com/papers/TR97-05/

Cheers,
Paul.
From: Nick N. \(C. S. Ltd\) <Nic...@ci...> - 2004-04-27 11:46:09
V-Clip does correctly identify penetrating objects as intersecting.
However, it only gives local witnesses to penetration, so it will not give
an accurate value for the penetration depth. If you have the linear
velocities of the objects, however, you can work it out from the movement
along the relative velocity required to separate the objects - effectively
pushing the objects apart to the position of first contact.

Cheers,
Nick.
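One way to read that suggestion as code (my interpretation, with
hypothetical names, reusing the Vec3 helpers from the earlier sketch):

    // Given a local contact normal and the penetration depth measured
    // along it, estimate how far the objects must be backed off along
    // their (unit) relative velocity so they just touch.
    float separationAlongVelocity(const Vec3& contactNormal,  // unit
                                  float depthAlongNormal,
                                  const Vec3& relVelUnit)     // unit
    {
        float vn = dot(relVelUnit, contactNormal);
        if (vn < 1e-6f)
            return -1.0f;  // sliding almost parallel to the face: no valid answer
        return depthAlongNormal / vn;  // back off this far along -relVelUnit
    }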
From: Simon F. <si...@vi...> - 2004-04-27 12:04:40
> It didn't change anything. However, I noticed two things:
>
> 1) In a particular scene, the Debug version failed while the Release
> version worked. However, the Release version still failed for another
> scene.

You wouldn't be running this sort of code on an x86, would you? I had
nightmares with some of my own code (a PVR-TC texture compressor) that ran
perfectly well on sensible architectures (e.g. Sparc, HP) where you ALWAYS
get the same FP precision.

With x86, the precision depends on the whims of the compiler as to what
stays in the internal registers.

Simon
From: Pierre T. <pie...@no...> - 2004-04-27 13:26:11
> You wouldn't be running this sort of code on an x86, would you?

It's of course running on a perfectly plain old x86, and I ran into
exactly that sort of issue just last week.

An algorithm perfectly fine in theory failed in practice because one value
had been kept in FP registers, keeping all the precision, while the other
had been stored to memory. When those two values were later compared for
equality, the comparison failed although they were perfectly equal to
begin with. Bad luck: the algorithm was waiting for exactly such an
equality to exit :(

I found three fixes:

- Change the algorithm. Not nice, since it's otherwise cute and fine and
fast.

- Force the compiler to actually recompute the same value all over again
instead of caching it. Worked in my case, but was slower, and who knows
whether it won't get stored to memory anyway the day I compile with VC7?

- Rewrite the small loop in assembly, where I can make sure the damn
values are kept in FP registers.

...well, I now have more assembly in my codebase. Sigh.

I think Solid's bug is less subtle than that, however.

- Pierre
From: Charles B. <cb...@cb...> - 2004-04-27 15:38:22
Most compilers have an option to "improve floating-point consistency"
(/fp) which will force the values to be stored back to floats in memory.
This makes it consistent, but unfortunately consistent in the
low-precision way. The alternative is just to store doubles.

At 03:36 PM 4/27/2004 +0200, you wrote:
> An algorithm perfectly fine in theory failed in practice because one
> value had been kept in FP registers, keeping all the precision, while
> the other had been stored to memory. When those two values were later
> compared for equality, the comparison failed although they were
> perfectly equal to begin with.

Charles Bloom    email "cb"    http://www.cbloom.com
From: Jon W. <hp...@mi...> - 2004-04-27 18:33:56
> An algorithm perfectly fine in theory failed in practice because one
> value had been kept in FP registers, keeping all the precision, while
> the other had been stored to memory. When those two values were later
> compared for equality...

Some compilers have options to force FP data to memory, which may or may
not help (I think MSVC calls this "increased FP consistency"). More
importantly, you have control over the precision used internally. If your
CPU has higher precision internally (say, 80 bits like the x87) than it
stores externally (say, 64 bits), then you'll get these problems. The
solution is to make sure you compute in a homogeneous environment; the
easiest way is to set the x87 precision mode to a 53-bit mantissa rather
than the default 64. Note that the x87 promotes 32-bit floats to full
register width as well, and other CPUs may do this too, so if this
matters, don't use float for your internal calculation datatypes.

> - Rewrite the small loop in assembly, where I can make sure the damn
> values are kept in FP registers.

The only assembly you need in this case is an fnstcw/fldcw pair to make
the internal precision match the external precision.

Oh, and when you mix values, it gets even better once you start thinking
about denormals. One over a denormal can overflow to infinity, and zero
times infinity is a NaN. Make sure you always go denormal in a consistent
way or, even better, always flush denormals to 0 for your datatype.

Cheers,
/ h+
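A sketch of doing this without assembly, using the MSVC runtime's
_controlfp wrapper over the x87 control word (x86/Windows-specific; an
assumption that this fits the codebase in question):

    #include <float.h>

    // Round every x87 intermediate result to a 53-bit (double) mantissa,
    // so register-resident values match values spilled to memory as
    // doubles. Use _PC_24 instead if all computation is in floats.
    void setConsistentFpPrecision()
    {
        _controlfp(_PC_53, _MCW_PC);
    }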
From: Tom F. <tom...@ee...> - 2004-04-28 04:34:03
The speed hit for having a denormal in a calculation is colossal as well
(on the x87 - not sure about others). There's a mode in the floating-point
unit where you can force denormals to zero; in some cases this can lead to
huge speedups.

TomF.
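For what it's worth, on x86 that mode lives in SSE's MXCSR register (the
classic x87 unit has no flush-to-zero control); a sketch using the SSE
intrinsics:

    #include <xmmintrin.h>

    // Flush denormal results of subsequent SSE arithmetic to zero.
    // Flushing denormal *inputs* as well (DAZ) needs an SSE2-class part
    // and _MM_SET_DENORMALS_ZERO_MODE from pmmintrin.h.
    void enableFlushToZero()
    {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    }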
From: Pierre T. <pie...@no...> - 2004-04-28 07:32:04
> The only assembly you need in this case is an fnstcw/fldcw pair to make
> the internal precision match the external precision.

I think we already do that, but maybe we only modified the rounding mode
and forgot to set up the precision as well. I'll double-check this.

...thinking about it, that could explain why this code suddenly fails.
It's been working for ages in an old codebase, and only started causing
trouble recently, in a new one. The FPU was correctly set up in the old
codebase, maybe not in the new framework...

- Pierre
From: Pierre T. <pie...@no...> - 2004-04-28 07:38:40
> Some compilers have options to force FP data to memory, which may or
> may not help (I think MSVC calls this "increased FP consistency").

But doesn't this option cause a general speed hit across the whole
program?

- Pierre
From: Charles B. <cb...@cb...> - 2004-04-28 07:55:53
Yep, it's not actually a good idea. Where you need to compare equality, I
use some junk like this:

    #pragma optimize("", off)
    bool fequal(float f1, float f2)
    {
        static float s_f1;
        static float s_f2;
        s_f1 = f1;
        s_f2 = f2;
        return (s_f1 == s_f2);
    }
    #pragma optimize("", on)

evil.

At 09:49 AM 4/28/2004 +0200, you wrote:
> > Some compilers have options to force FP data to memory, which may or
> > may not help (I think MSVC calls this "increased FP consistency").
>
> But doesn't this option cause a general speed hit across the whole
> program?
>
> - Pierre

Charles Bloom    email "cb"    http://www.cbloom.com
From: Tom F. <tom...@ee...> - 2004-04-29 06:38:05
The pragmas are not terribly portable, are they? The real solution is to
carpet-bomb the thing with the "volatile" keyword.

Oh, and you all knew the cool thing about the random bottom bit of the
mantissa on the PS2, didn't you? Good, just checking. I'm just saying -
testing for equality (or inequality) in any way, shape or form with
floating-point numbers is just asking for trouble. Maybe not today, maybe
not tomorrow, but soon, and for the rest of your life.

TomF.
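A volatile spelling of the same trick - portable in the sense that it only
assumes the compiler honours volatile, not any particular pragma:

    // The volatile stores force both operands down to 32-bit float
    // precision before the compare, so a register-resident 80-bit value
    // can never compare unequal to its own memory image.
    static bool fequal(float f1, float f2)
    {
        volatile float a = f1;
        volatile float b = f2;
        return a == b;
    }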
From: Jon W. <hp...@mi...> - 2004-04-28 19:12:27
> > Some compilers have options to force FP data to memory, which may or
> > may not help (I think MSVC calls this "increased FP consistency").
>
> But doesn't this option cause a general speed hit across the whole
> program?

In the specific cases of MSVC and GCC, yes, it's terrible. And it doesn't
give you anything that you can't already get with fnstcw/fldcw, setting
the precision mode to whatever you're actually using (float or double), in
the specific case of the x86. PPC, last I used it, has the same internal
width as its external storage, so it might be a little less sensitive,
although calculations done in double precision and stored to singles might
still cause this kind of problem. Thus, always use doubles, unless memory
space or bandwidth becomes an issue :-)

Cheers,
/ h+
From: J. W. R. <jra...@in...> - 2004-04-30 01:25:40
With the NV40 coming out, there are soon going to be cards that will let
you render massive numbers of instanced objects without constantly
stalling.

For quite some time we have been told to perform as few state changes as
possible. However, when a transform is considered a 'state change', this
has prevented developers from drawing massive numbers (on the order of
tens of thousands) of dynamic objects at high framerates - even if the sum
total of those objects did not actually come near the triangle throughput
of the card.

I'm currently exploring how to render the theoretical maximum number of
instanced objects possible at a high frame rate on the NV40.

Once you start dealing with massive numbers of low-polygon instanced
objects, you very quickly realize that just the transform data itself can
add up to a lot of memory.

I am curious whether you can think of more efficient ways to send
transform data in massive quantities. For example, representing each
transform as a quaternion and a translation would take less than half as
much bandwidth as a full 4x4 matrix (assuming, of course, your objects
have no scale).

There is also the challenge of trying to manage this many entities without
iterating through them in various portions of your pipeline.

Thoughts?
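A back-of-the-envelope sketch of the saving (the types are hypothetical,
not from any particular engine):

    // Full transform: 16 floats = 64 bytes per instance.
    struct InstanceMatrix
    {
        float m[16];
    };

    // Quaternion + translation: 7 floats = 28 bytes per instance, under
    // half the bandwidth, at the cost of decoding in the vertex shader.
    // Assumes rigid transforms; add one float if uniform scale is needed.
    struct InstanceQT
    {
        float qx, qy, qz, qw;   // unit rotation quaternion
        float tx, ty, tz;       // translation
    };

    // A point p can even be rotated without rebuilding a matrix: with
    // q = (v, w) a unit quaternion,
    //   p' = p + 2 * cross(v, cross(v, p) + w * p)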
From: Jon W. <hp...@mi...> - 2004-04-30 05:58:48
Character animation is core to our business, but alas we can't afford to
target anything like this sexy hardware. That doesn't prevent me from
thinking about it, though ;-)

My thought is that animations - both quaternions and translations, and
blending/morph targets - would be uploaded as texture data. Because you
can specify separate derivatives for the u and v directions, it should be
possible to use this to blend smoothly across time (u direction) while
getting full resolution along the bone or vertex (v direction).

Upload the world transform per mesh into a texture, and then all you
render per mesh is a list of indices for vertices (so you know which index
to read out of the texture-target-as-vertex-array) and a much more
coarsely strided list of world transform indices - one per mesh instance.

You can also blend between multiple separate animation kinds, either by
texturing out of more than one texture, or by rendering all the blended
textures into a render target and then using that as input when
transforming the vertices. Send a few blend targets and animation index
offsets extra per mesh, and you should be all set to go. And we get
blending with 16-bit floating-point formats.

I really think that someone sitting down and doing a clean engine from
scratch, based on this hardware, will come up with something remarkable!
You may, in the end, need to use matrices, or 32-bit formats, because of
precision issues, but thinking hard about it, that really seems like a
straight way forward.

The only thing I haven't quite figured out is how to uniquely texture
them. Although you could send per-mesh color values (skin tone, clothing
colors, etc.) and use those to map one or a few master "selector" textures
and overlay some detail.

Cheers,
/ h+
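A sketch of the addressing such a layout implies (all layout choices here
are my assumptions: one row per bone, one column per sample tick):

    // Map (bone, time) to UVs for an animWidth x numBones texture holding
    // one quaternion or translation per texel. Sampling at texel centres
    // lets bilinear filtering blend along u, i.e. across time, for free.
    void boneSampleUV(int bone, float timeInTicks,
                      int animWidth, int numBones,
                      float& u, float& v)
    {
        u = (timeInTicks + 0.5f) / float(animWidth);
        v = (float(bone) + 0.5f) / float(numBones);
    }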
From: Sergei M. <se...@ha...> - 2004-04-30 07:59:40
Most skinned models just stand around (right?), so a single sin/cos plus a
short4 position will do the job. For the others - do matrices really take
up that much space? The transform stream samples each transform for each
vertex, but that is cached, isn't it?

We skin our characters with different colors & masks too, and it looks
quite fine. But it is quite an effort to pack more than (even) one model
with bones into a batch - I mean, with <16 bones plus colors/light
positions/etc., the constants are used up.
From: Deano C. <de...@ra...> - 2004-04-30 07:20:46
John W. Ratcliff wrote:

> With the NV40 coming out, there are soon going to be cards that will
> let you render massive numbers of instanced objects without constantly
> stalling. [...]

Without the new vertex frequency API (I had a quick look, but haven't had
time for a serious play yet), the method we use on PC is the indexed
matrix transform method. For objects I know are going to be rendered a lot
of times, I create vertex and index buffers that repeat the vertices n
times (up to 48 copies on 256 vertex constants), with an additional
element holding the copy's index. Rendering a new instance then just
consists of copying the matrix and increasing the triangle count; when the
batch needs flushing, change the material once, render the instances, and
repeat.

> I am curious whether you can think of more efficient ways to send
> transform data in massive quantities. [...]

If you are using an NV40, there are lots of interesting ways of
effectively compressing the data for each instance. One method I've used
in the past on a PS2 micro-mesh renderer (and that should be possible on
NV40) for grass and rocks was to pass a heightfield for all the batches;
each instance then just has 2D heightfield coordinates and a single angle
(for rotation around a vertical y axis). Each instance's parameters
(encoded in the stream once per object) were then just 3 floats (in
constant memory, compressed before stream decode).

I've also used quaternion and translation for more general cases, and I
suspect that bandwidth won't be an issue with the frequency stuff. If the
objects only have a few hundred triangles and you use compressed vertices
and compressed instance data, the actual bandwidth used should be minimal.
Including index data (pity we don't have single-byte indices...), each
instance should be around 200 * 6 bytes (3 indices of 2 bytes each) +
200 * 10 bytes (an extremely compressed vertex) + 16 bytes (compressed
quat+translation instance data) = 3216 bytes. Considering that a fair
amount will probably be in on-chip caches for the other instances,
bandwidth shouldn't be a problem.

> There is also the challenge of trying to manage this many entities
> without iterating through them in various portions of your pipeline.

Things like not putting flags in the body of each entity have saved us
time due to better cache hits. The best rule is not to pollute the cache
with entities that aren't important this frame; that tends to mean packing
on/off and visibility info tightly in a structure separate from the main
entity info.

Deano
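A sketch of the replicated-buffer setup described above (hypothetical
names; real code would fill a driver-side vertex buffer rather than a
std::vector):

    #include <vector>

    struct SrcVertex  { float pos[3]; float normal[3]; float uv[2]; };
    struct InstVertex { SrcVertex base; float paletteIndex; };

    // Repeat the source mesh 'copies' times; every vertex of copy c
    // carries c as its matrix palette index, so the vertex shader can do
    // mul(v, palette[index]). 48 copies * 4 constants per matrix = 192 of
    // the 256 vertex constants, leaving room for camera matrices etc.
    std::vector<InstVertex> buildReplicatedVB(const SrcVertex* src,
                                              int numVerts, int copies)
    {
        std::vector<InstVertex> out;
        out.reserve(static_cast<size_t>(numVerts) * copies);
        for (int c = 0; c < copies; ++c)
            for (int i = 0; i < numVerts; ++i)
                out.push_back(InstVertex{ src[i], static_cast<float>(c) });
        return out;
    }

    // The index buffer is replicated the same way, with copy c's indices
    // offset by c * numVerts; drawing k instances then means uploading k
    // matrices and issuing a single draw call covering the first k copies.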
From: Sergei M. <se...@ha...> - 2004-04-30 08:04:19
Do you batch skinned objects too? We couldn't pack more than two together
because of the limited number of constants.

We have a similar system for instancing stones & grass patches - one
stream with a height/probability per patch. In the vertex shader we can
collapse vertices by the 'probability' and thus get almost
different-looking patches - trading vertex processing for memory, of
course.

However, I experienced very bad slowdowns on the NV30 family in some
multistream scenarios, probably a driver bug. Half the framerate just
goes away...
From: Tom F. <tom...@ee...> - 2004-04-28 05:28:00
Eh? Checking for equality with floating-point numbers? That's just asking
for trouble!

TomF.

> ...Bad luck: the algorithm was waiting for exactly such an equality to
> exit :(
From: Pierre T. <pie...@no...> - 2004-04-28 07:37:37
> Eh? Checking for equality with floating-point numbers? That's just
> asking for trouble!

Certainly, but in this case it seemed OK. The values are dot products
computed for each vertex, and I'm waiting for the same FP value to reoccur
when I dot the same vertex as the one I started from. Doing the same
operation on the same vertex produces exactly the same FP value, so in
theory it's OK - as long as the initial cached value is kept in an FP
register, that is.

- Pierre
From: <Pau...@sc...> - 2004-04-27 12:44:32
On 27/04/2004 12:47:55 gdalgorithms-list-admin wrote:

> V-Clip does correctly identify penetrating objects as intersecting.
> However, it only gives local witnesses to penetration, so it will not
> give an accurate value for the penetration depth.

Marvellous. Actually, can't you easily work out the penetration from the
witness points? Vertex/face, face/vertex or edge/edge - and get a
penetration measure from that?

> If you have the linear velocities of the objects, however, you can work
> it out from the movement along the relative velocity required to
> separate the objects - effectively pushing the objects apart to the
> position of first contact.

No good if your objects aren't moving, though.

Cheers,
Paul.