gdalgorithms-list Mailing List for Game Dev Algorithms (Page 8)
From: Nathaniel H. <na...@io...> - 2010-10-01 18:32:23
My impression was that the DX9-generation NVIDIA GPUs did 8-bit precision
filtering on any 8-bit or DXT texture, and that the DX10 ones (GeForce
9xxx etc.) did higher-precision filtering. Being stuck in the console time
warp, I don't have much hands-on experience with GPU hardware newer than
2006 or so, so I may be wrong.

> I'm thinking maybe older GPUs did even coarser filtering than this -
> 6-bit or something instead of 8. Maybe this was the recent improvement?
>
> On Fri, Oct 1, 2010 at 12:59 PM, Nathaniel Hoffman <na...@io...> wrote:
>
>> Didn't the newer NVIDIA GPUs fix this?
>>
>>> You guessed right. The loss of precision is in the texture units.
>>> Unfortunately, 8-bit components are filtered to 8-bit results (even
>>> though they show up as floating-point values in the shader). This is
>>> true for NVIDIA GPUs for sure, and probably for all other GPUs.
>>>
>>> -mike
>>>
>>> ----- Original Message -----
>>> From: Stefan Sandberg
>>> To: Game Development Algorithms
>>> Sent: Friday, October 01, 2010 1:45 AM
>>> Subject: Re: [Algorithms] Filtering
>>>
>>> Assuming you're after precision, what's wrong with doing it manually? :)
>>> If performance is what you're after, and you're working on textures as
>>> they were intended (i.e. game textures or video, not 'data'), you could
>>> treat contrast and color separately, keeping high contrast resolution
>>> and downsampled color, and you'd save both bandwidth and instructions.
>>> If you simply want to know 'why', I'm guessing loss of precision in the
>>> texture units? You've already ruled out shader precision with your own
>>> manual filtering, so that doesn't leave much else, imo.
>>> Other than manipulating the data you're working on, which is the only
>>> thing you -can- change I guess, I can't really see a solution, but far
>>> greater minds linger here than mine, so hold on for what I assume will
>>> be a lengthy description of floating-point math as it is implemented in
>>> modern GPUs :)
>>>
>>> On Fri, Oct 1, 2010 at 9:57 AM, Andreas Brinck <and...@gm...> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a texture in which I use the R, G and B channels to store a
>>>> value in the [0, 1] range with very high precision. The value is
>>>> extracted like this in the (Cg) shader:
>>>>
>>>> float extractValue(float2 pos) {
>>>>     float4 temp = tex2D(buffer, pos);
>>>>     return (temp.x * 16711680.0 + temp.y * 65280.0 + temp.z * 255.0) *
>>>>            (1.0 / 16777215.0);
>>>> }
>>>>
>>>> I now want to sample this value with bilinear filtering, but when I do
>>>> this I don't get a correct result. If I do the filtering manually like
>>>> this:
>>>>
>>>> float sampleValue(float2 pos) {
>>>>     float2 ipos = floor(pos);
>>>>     float2 fracs = pos - ipos;
>>>>     float d0 = extractValue(ipos);
>>>>     float d1 = extractValue(ipos + float2(1, 0));
>>>>     float d2 = extractValue(ipos + float2(0, 1));
>>>>     float d3 = extractValue(ipos + float2(1, 1));
>>>>     return lerp(lerp(d0, d1, fracs.x), lerp(d2, d3, fracs.x), fracs.y);
>>>> }
>>>>
>>>> everything works as expected. The values in the buffer can be seen as
>>>> a linear combination of three constants:
>>>>
>>>> value = C0 * r + C1 * g + C2 * b
>>>>
>>>> If we use the built-in texture filtering, we should get the following
>>>> when sampling somewhere between two texels {r0, g0, b0} and {r1, g1,
>>>> b1}. For simplicity we just look at filtering along one axis:
>>>>
>>>> filtered value = lerp(r0, r1, t) * C0 + lerp(g0, g1, t) * C1 +
>>>>                  lerp(b0, b1, t) * C2
>>>>
>>>> Doing the filtering manually:
>>>>
>>>> filtered value = lerp(r0 * C0 + g0 * C1 + b0 * C2,
>>>>                       r1 * C0 + g1 * C1 + b1 * C2, t)
>>>>                = (r0 * C0 + g0 * C1 + b0 * C2) * (1 - t) +
>>>>                  (r1 * C0 + g1 * C1 + b1 * C2) * t
>>>>                = (r0 * C0) * (1 - t) + (r1 * C0) * t + ...
>>>>                = lerp(r0, r1, t) * C0 + ...
>>>>
>>>> So in exact arithmetic these two should be equivalent, right?
>>>>
>>>> My theory is that the error is caused by an unfortunate order of
>>>> floating-point operations. I've tried variations like:
>>>>
>>>> (temp.x * (16711680.0 / 16777215.0) + temp.y * (65280.0 / 16777215.0) +
>>>>  temp.z * (255.0 / 16777215.0))
>>>>
>>>> and
>>>>
>>>> (((temp.x * 256.0 + temp.y) * 256.0 + temp.z) * 255.0) * (1.0 / 16777215.0)
>>>>
>>>> but all exhibit the same problem. What do you think; is it possible
>>>> to solve this problem?
>>>>
>>>> Regards Andreas
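The 24-bit packing Andreas describes can be checked outside the shader. A minimal Python sketch (added here for illustration, not from the thread; helper names are made up) splits a value across the R, G, B bytes and confirms the decode mirrors the Cg extractValue():

```python
def encode24(v):
    """Quantize v in [0, 1] to 24 bits and split into R, G, B bytes."""
    q = round(v * 16777215.0)            # 16777215 = 2**24 - 1
    return (q >> 16) & 0xFF, (q >> 8) & 0xFF, q & 0xFF

def decode24(r, g, b):
    """Mirror of extractValue(): (r*65536 + g*256 + b) / (2**24 - 1).

    In the shader, temp.xyz are the bytes divided by 255, which is why the
    constants there are 16711680 (= 65536*255), 65280 (= 256*255) and 255.
    """
    return (r * 65536 + g * 256 + b) / 16777215.0

v = 0.3
r, g, b = encode24(v)
roundtrip = decode24(r, g, b)
print(r, g, b, abs(roundtrip - v))   # round-trip error is ~0.5 / 2**24
```

As long as nothing touches the bytes between encode and decode, the representable error is half a step of 1/(2^24 - 1), which is why the manual sampleValue() path behaves well.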
From: Ola O. <ola...@gm...> - 2010-10-01 18:12:33
While we're throwing ideas around, why not try using a float texture and
see if it produces the same error?

.ola

P.S. Hi Andreas :)

----- Original Message -----
From: "Nathaniel Hoffman" <na...@io...>
To: "Game Development Algorithms" <gda...@li...>
Sent: Friday, October 01, 2010 7:59 PM
Subject: Re: [Algorithms] Filtering

> Didn't the newer NVIDIA GPUs fix this?
From: Jeff R. <je...@8m...> - 2010-10-01 18:04:59
I'm thinking maybe older GPUs did even coarser filtering than this -
6-bit or something instead of 8. Maybe this was the recent improvement?

On Fri, Oct 1, 2010 at 12:59 PM, Nathaniel Hoffman <na...@io...> wrote:

> Didn't the newer NVIDIA GPUs fix this?

--
Jeff Russell
Engineer, 8monkey Labs
www.8monkeylabs.com
From: Nathaniel H. <na...@io...> - 2010-10-01 17:59:27
Didn't the newer NVIDIA GPUs fix this?

> You guessed right. The loss of precision is in the texture units.
> Unfortunately, 8-bit components are filtered to 8-bit results (even
> though they show up as floating-point values in the shader). This is
> true for NVIDIA GPUs for sure, and probably for all other GPUs.
>
> -mike
From: Michael B. <mi...@fa...> - 2010-10-01 15:21:17
|
You guessed right. The loss of precision is in the texture units. Unfortunately, 8 bit components are filtered to 8 bit results (even though they show up as floating point values in the shader). This is true for nvidia gpus for sure and probably all other gpus. -mike ----- Original Message ----- From: Stefan Sandberg To: Game Development Algorithms Sent: Friday, October 01, 2010 1:45 AM Subject: Re: [Algorithms] Filtering Assuming you're after precision, what's wrong with doing it manually? :) If performance is what you're after, and you're working on textures as they were intended(ie, game textures or video or something like that, not 'data'), you could separate contrast & color separately, keeping high contrast resolution, and downsampled color, and you'd save both bandwidth and instr. If you simply want to know 'why', I'm guessing loss of precision in the tex units? You've already ruled out shader precision from your own manual filtering, so doesn't leave much else, imo.. Other than manipulating the data you're working on, which is the only thing you -can- change I guess, I cant really see a solution, but far greater minds linger here than mine, so hold on for what I assume will be a lengthy description of floating point math as it is implemented in modern gpu's :) On Fri, Oct 1, 2010 at 9:57 AM, Andreas Brinck <and...@gm...> wrote: Hi, I have a texture in which I use the R, G and B channel to store a value in the [0, 1] range with very high precision. The value is extracted like this in the (Cg) shader: float extractValue(float2 pos) { float4 temp = tex2D(buffer, pos); return (temp.x * 16711680.0 + temp.y * 65280.0 + temp.z * 255.0) * (1.0 / 16777215.0); } I now want to sample this value with bilinear filtering but when I do this I don't get a correct result. 
If I do the filtering manually like this: float sampleValue(float2 pos) { float2 ipos = floor(pos); float2 fracs = pos - ipos; float d0 = extractValue(ipos); float d1 = extractValue(ipos + float2(1, 0)); float d2 = extractValue(ipos + float2(0, 1)); float d3 = extractValue(ipos + float2(1, 1)); return lerp(lerp(d0, d1, fracs.x), lerp(d2, d3, fracs.x), fracs.y); } everything works as expected. The values in the buffer can be seen as a linear combination of three constants: value = (C0 * r + C1 * g + C2 * b) If we use the built in texture filtering we should get the following if we sample somewhere between two texels: {r0, g0, b0} and {r1, g1, b1}. For simplicity we just look at filtering along one axis: filtered value = lerp(r0, r1, t) * C0 + lerp(g0, g1, t) * C1 + lerp(b0, b1, t) * C2; Doing the filtering manually: filtered value = lerp(r0 * C0 + b0 * C1 + g0 * C2, r1 * C0 + g1 * C1 + b1 * C2, t) = = (r0 * C0 + b0 * C1 + g0 * C2) * (1 - t) + (r1 * C0 + g1 * C1 + b1 * C2) * t = = (r0 * C0) * (1 - t) + (r1 * C0) * t + ... = = lerp(r0, r1, t) * C0 + ... So in the world of non floating point numbers these two should be equivalent right? My theory is that the error is caused by an unfortunate order of floating point operations. I've tried variations like: (temp.x * (16711680.0 / 16777215.0) + temp.y * (65280.0/16777215.0) + temp.z * (255.0/16777215.0)) and (((temp.x * 256.0 + temp.y) * 256.0 + temp.z) * 255.0) * (1.0 / 16777215.0) but all exhibit the same problem. What do you think; is it possible to solve this problem? Regards Andreas ------------------------------------------------------------------------------ Start uncovering the many advantages of virtual appliances and start using them to simplify application deployment and accelerate your shift to cloud computing. http://p.sf.net/sfu/novell-sfdev2dev _______________________________________________ GDAlgorithms-list mailing list GDA...@li... 
https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
Archives: http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
|
From: Jeff R. <je...@8m...> - 2010-10-01 15:08:42
|
I wouldn't be surprised if the hardware filtering of 8-bit components operated at less than 32 bit precision, at least on some cards. Also worth mentioning that the order of your floating point operations is probably being aggressively modified by your shader compiler in its attempts to optimize. Those last two bits of code you included are probably compiling into the same result. On Fri, Oct 1, 2010 at 3:45 AM, Stefan Sandberg <kef...@gm...>wrote: > Assuming you're after precision, what's wrong with doing it manually? :) > If performance is what you're after, and you're working on textures as they > were intended(ie, game textures or video or something like that, not > 'data'), you could separate contrast & color separately, keeping high > contrast resolution, and downsampled color, and > you'd save both bandwidth and instr. > If you simply want to know 'why', I'm guessing loss of precision in the tex > units? > You've already ruled out shader precision from your own manual filtering, > so doesn't leave much else, imo.. > Other than manipulating the data you're working on, which is the only thing > you -can- change I guess, I cant really see a solution, > but far greater minds linger here than mine, so hold on for what I assume > will be a lengthy description of floating point math as > it is implemented in modern gpu's :) > > > > On Fri, Oct 1, 2010 at 9:57 AM, Andreas Brinck <and...@gm...>wrote: > >> Hi, >> >> I have a texture in which I use the R, G and B channel to store a >> value in the [0, 1] range with very high precision. The value is >> extracted like this in the (Cg) shader: >> >> float >> extractValue(float2 pos) { >> float4 temp = tex2D(buffer, pos); >> return (temp.x * 16711680.0 + temp.y * 65280.0 + temp.z * 255.0) * >> (1.0 / 16777215.0); >> } >> >> I now want to sample this value with bilinear filtering but when I do >> this I don't get a correct result. 
If I do the filtering manually like >> this: >> >> float >> sampleValue(float2 pos) { >> float2 ipos = floor(pos); >> float2 fracs = pos - ipos; >> float d0 = extractValue(ipos); >> float d1 = extractValue(ipos + float2(1, 0)); >> float d2 = extractValue(ipos + float2(0, 1)); >> float d3 = extractValue(ipos + float2(1, 1)); >> return lerp(lerp(d0, d1, fracs.x), lerp(d2, d3, fracs.x), fracs.y); >> } >> >> everything works as expected. The values in the buffer can be seen as >> a linear combination of three constants: >> >> value = (C0 * r + C1 * g + C2 * b) >> >> If we use the built in texture filtering we should get the following >> if we sample somewhere between two texels: {r0, g0, b0} and {r1, g1, >> b1}. For simplicity we just look at filtering along one axis: >> >> filtered value = lerp(r0, r1, t) * C0 + lerp(g0, g1, t) * C1 + >> lerp(b0, b1, t) * C2; >> >> Doing the filtering manually: >> >> filtered value = lerp(r0 * C0 + b0 * C1 + g0 * C2, r1 * C0 + g1 * C1 + >> b1 * C2, t) = >> = (r0 * C0 + b0 * C1 + g0 * C2) * (1 - t) + (r1 * >> C0 + g1 * C1 + b1 * C2) * t = >> = (r0 * C0) * (1 - t) + (r1 * C0) * t + ... = >> = lerp(r0, r1, t) * C0 + ... >> >> So in the world of non floating point numbers these two should be >> equivalent right? >> >> My theory is that the error is caused by an unfortunate order of >> floating point operations. I've tried variations like: >> >> (temp.x * (16711680.0 / 16777215.0) + temp.y * (65280.0/16777215.0) + >> temp.z * (255.0/16777215.0)) >> >> and >> >> (((temp.x * 256.0 + temp.y) * 256.0 + temp.z) * 255.0) * (1.0 / >> 16777215.0) >> >> but all exhibit the same problem. What do you think; is it possible to >> solve this problem? >> >> Regards Andreas >> >> >> ------------------------------------------------------------------------------ >> Start uncovering the many advantages of virtual appliances >> and start using them to simplify application deployment and >> accelerate your shift to cloud computing. 
>> http://p.sf.net/sfu/novell-sfdev2dev >> _______________________________________________ >> GDAlgorithms-list mailing list >> GDA...@li... >> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >> Archives: >> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >> > > > > ------------------------------------------------------------------------------ > Start uncovering the many advantages of virtual appliances > and start using them to simplify application deployment and > accelerate your shift to cloud computing. > http://p.sf.net/sfu/novell-sfdev2dev > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- Jeff Russell Engineer, 8monkey Labs www.8monkeylabs.com |
From: Stefan S. <kef...@gm...> - 2010-10-01 08:45:44
|
Assuming you're after precision, what's wrong with doing it manually? :)
If performance is what you're after, and you're working on textures as they were intended (i.e. game textures or video or something like that, not 'data'), you could separate contrast and color, keeping high contrast resolution and downsampled color, and you'd save both bandwidth and instructions.
If you simply want to know 'why', I'm guessing loss of precision in the tex units? You've already ruled out shader precision with your own manual filtering, so that doesn't leave much else, imo. Other than manipulating the data you're working on, which is the only thing you -can- change I guess, I can't really see a solution, but far greater minds linger here than mine, so hold on for what I assume will be a lengthy description of floating-point math as it is implemented in modern GPUs :)

On Fri, Oct 1, 2010 at 9:57 AM, Andreas Brinck <and...@gm...> wrote:
> Hi,
>
> I have a texture in which I use the R, G and B channel to store a
> value in the [0, 1] range with very high precision. The value is
> extracted like this in the (Cg) shader:
>
> float extractValue(float2 pos) {
>   float4 temp = tex2D(buffer, pos);
>   return (temp.x * 16711680.0 + temp.y * 65280.0 + temp.z * 255.0) *
>     (1.0 / 16777215.0);
> }
>
> I now want to sample this value with bilinear filtering but when I do
> this I don't get a correct result. If I do the filtering manually like
> this:
>
> float sampleValue(float2 pos) {
>   float2 ipos = floor(pos);
>   float2 fracs = pos - ipos;
>   float d0 = extractValue(ipos);
>   float d1 = extractValue(ipos + float2(1, 0));
>   float d2 = extractValue(ipos + float2(0, 1));
>   float d3 = extractValue(ipos + float2(1, 1));
>   return lerp(lerp(d0, d1, fracs.x), lerp(d2, d3, fracs.x), fracs.y);
> }
>
> everything works as expected.
The values in the buffer can be seen as > a linear combination of three constants: > > value = (C0 * r + C1 * g + C2 * b) > > If we use the built in texture filtering we should get the following > if we sample somewhere between two texels: {r0, g0, b0} and {r1, g1, > b1}. For simplicity we just look at filtering along one axis: > > filtered value = lerp(r0, r1, t) * C0 + lerp(g0, g1, t) * C1 + > lerp(b0, b1, t) * C2; > > Doing the filtering manually: > > filtered value = lerp(r0 * C0 + b0 * C1 + g0 * C2, r1 * C0 + g1 * C1 + > b1 * C2, t) = > = (r0 * C0 + b0 * C1 + g0 * C2) * (1 - t) + (r1 * > C0 + g1 * C1 + b1 * C2) * t = > = (r0 * C0) * (1 - t) + (r1 * C0) * t + ... = > = lerp(r0, r1, t) * C0 + ... > > So in the world of non floating point numbers these two should be > equivalent right? > > My theory is that the error is caused by an unfortunate order of > floating point operations. I've tried variations like: > > (temp.x * (16711680.0 / 16777215.0) + temp.y * (65280.0/16777215.0) + > temp.z * (255.0/16777215.0)) > > and > > (((temp.x * 256.0 + temp.y) * 256.0 + temp.z) * 255.0) * (1.0 / 16777215.0) > > but all exhibit the same problem. What do you think; is it possible to > solve this problem? > > Regards Andreas > > > ------------------------------------------------------------------------------ > Start uncovering the many advantages of virtual appliances > and start using them to simplify application deployment and > accelerate your shift to cloud computing. > http://p.sf.net/sfu/novell-sfdev2dev > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > |
From: Andreas B. <and...@gm...> - 2010-10-01 07:57:19
|
Hi,

I have a texture in which I use the R, G and B channels to store a value in the [0, 1] range with very high precision. The value is extracted like this in the (Cg) shader:

float extractValue(float2 pos) {
  float4 temp = tex2D(buffer, pos);
  return (temp.x * 16711680.0 + temp.y * 65280.0 + temp.z * 255.0) * (1.0 / 16777215.0);
}

I now want to sample this value with bilinear filtering, but when I do this I don't get a correct result. If I do the filtering manually like this:

float sampleValue(float2 pos) {
  float2 ipos = floor(pos);
  float2 fracs = pos - ipos;
  float d0 = extractValue(ipos);
  float d1 = extractValue(ipos + float2(1, 0));
  float d2 = extractValue(ipos + float2(0, 1));
  float d3 = extractValue(ipos + float2(1, 1));
  return lerp(lerp(d0, d1, fracs.x), lerp(d2, d3, fracs.x), fracs.y);
}

everything works as expected. The values in the buffer can be seen as a linear combination of three constants:

value = C0 * r + C1 * g + C2 * b

If we use the built-in texture filtering, we should get the following when sampling somewhere between two texels {r0, g0, b0} and {r1, g1, b1}. For simplicity we just look at filtering along one axis:

filtered value = lerp(r0, r1, t) * C0 + lerp(g0, g1, t) * C1 + lerp(b0, b1, t) * C2

Doing the filtering manually:

filtered value = lerp(r0 * C0 + g0 * C1 + b0 * C2, r1 * C0 + g1 * C1 + b1 * C2, t)
               = (r0 * C0 + g0 * C1 + b0 * C2) * (1 - t) + (r1 * C0 + g1 * C1 + b1 * C2) * t
               = (r0 * C0) * (1 - t) + (r1 * C0) * t + ...
               = lerp(r0, r1, t) * C0 + ...

So in the world of non-floating-point numbers these two should be equivalent, right?

My theory is that the error is caused by an unfortunate order of floating-point operations. I've tried variations like:

(temp.x * (16711680.0 / 16777215.0) + temp.y * (65280.0 / 16777215.0) + temp.z * (255.0 / 16777215.0))

and

(((temp.x * 256.0 + temp.y) * 256.0 + temp.z) * 255.0) * (1.0 / 16777215.0)

but all exhibit the same problem. What do you think; is it possible to solve this problem?

Regards Andreas
|
From: <chr...@pl...> - 2010-08-22 23:42:52
|
Jeff Russell <je...@8m...> wrote on 08/18/2010 06:16:17 PM: > I need to compute pow(x,y), where x is on [0,1], and y is guaranteed > positive and could have an upper bound in the neighborhood of 1000. I've never used it so I have no idea if Schlick's approximation fits that description, but fwiw: http://ompf.org/forum/viewtopic.php?f=11&t=1402 Christer Ericson, Director of Tools and Technology Sony Computer Entertainment, Santa Monica http://realtimecollisiondetection.net/blog/ |
From: Jon W. <jw...@gm...> - 2010-08-21 04:44:10
|
Depending on accuracy needs, a look-up table with interpolation (say, cubic interpolation) can be just fine and dandy. Consider the 0-255/0-255 texture typically used to approximate pow() for specular lighting back in the days; it would be 65 kB in size and thus fit in L2 on a modern CPU (if you use bytes to represent that 0..1 range). Another, even faster, and even worse, approximation is simply to make a line from (1,1) that intersects y=0 somewhere between x=0 and x=1, and move it farther to the right for higher exponents. It all depends on what kind of precision you need this for. Sincerely, jw -- Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable. On Thu, Aug 19, 2010 at 11:17 AM, Fabian Giesen <ry...@gm...> wrote: > On 19.08.2010 10:57, Robin Green wrote: >> On Wed, Aug 18, 2010 at 11:35 PM, Fabian Giesen<ry...@gm...> wrote: >>> >>>> I would also love to just see a sample implementation of pow(), log(), >>>> and exp() somewhere, even that might be helpful. >>> >>> glibc math implementations are in sysdeps/ieee754 for generic IEEE-754 >>> compliant platforms, with optimized versions for all relevant >>> architectures in sysdeps/<arch>. If you really want to know how it's >>> implemented :) >> >> >> What he said. >> >> Also, take a look at the CEPHES library for platform agnostic >> reference implementations of the C math functions and some extras like >> cotangent, cuberoot and integer powers: >> >> http://www.netlib.org/cephes/ >> >> And here's an X86 specific implementation of powf() that claims to be >> faster (than what, it doesn't say): >> >> http://www.xyzw.de/c190.html > > Now that's interesting :). I wrote most of that header file, around 2000 > or so. 
It's faster than what used to be the standard pow() > implementation on x86 (as in the VC++ 6.0 runtime library), using fscale > (that method is still used for sFExp below). This is all code for 64k > intros so it was optimized for size originally, but pow was a bottleneck > during texture generation, and Agner Fogs version was 20-30% faster if I > recall correctly. (This was back when P3s were the norm though, no idea > how it looks now). The main change is to replace the fscale (which used > to be very slow on some processors) with a longer code sequence that's > faster. > > The original code sequence used to be commented out before the "// > faster pow" comment, but I guess that got removed at some point :). > > Since VS2002 or 2003, the C library contains a much better pow() > implementation (using SSE on processors that support it) that should be > faster than this code. It's also a lot bigger though. > > -Fabian > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by > > Make an app they can't live without > Enter the BlackBerry Developer Challenge > http://p.sf.net/sfu/RIM-dev2dev > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > |
From: Fabian G. <ry...@gm...> - 2010-08-19 18:17:42
|
On 19.08.2010 10:57, Robin Green wrote: > On Wed, Aug 18, 2010 at 11:35 PM, Fabian Giesen<ry...@gm...> wrote: >> >>> I would also love to just see a sample implementation of pow(), log(), >>> and exp() somewhere, even that might be helpful. >> >> glibc math implementations are in sysdeps/ieee754 for generic IEEE-754 >> compliant platforms, with optimized versions for all relevant >> architectures in sysdeps/<arch>. If you really want to know how it's >> implemented :) > > > What he said. > > Also, take a look at the CEPHES library for platform agnostic > reference implementations of the C math functions and some extras like > cotangent, cuberoot and integer powers: > > http://www.netlib.org/cephes/ > > And here's an X86 specific implementation of powf() that claims to be > faster (than what, it doesn't say): > > http://www.xyzw.de/c190.html Now that's interesting :). I wrote most of that header file, around 2000 or so. It's faster than what used to be the standard pow() implementation on x86 (as in the VC++ 6.0 runtime library), using fscale (that method is still used for sFExp below). This is all code for 64k intros so it was optimized for size originally, but pow was a bottleneck during texture generation, and Agner Fogs version was 20-30% faster if I recall correctly. (This was back when P3s were the norm though, no idea how it looks now). The main change is to replace the fscale (which used to be very slow on some processors) with a longer code sequence that's faster. The original code sequence used to be commented out before the "// faster pow" comment, but I guess that got removed at some point :). Since VS2002 or 2003, the C library contains a much better pow() implementation (using SSE on processors that support it) that should be faster than this code. It's also a lot bigger though. -Fabian |
From: Robin G. <rob...@gm...> - 2010-08-19 17:57:13
|
On Wed, Aug 18, 2010 at 11:35 PM, Fabian Giesen <ry...@gm...> wrote: > >> I would also love to just see a sample implementation of pow(), log(), >> and exp() somewhere, even that might be helpful. > > glibc math implementations are in sysdeps/ieee754 for generic IEEE-754 > compliant platforms, with optimized versions for all relevant > architectures in sysdeps/<arch>. If you really want to know how it's > implemented :) What he said. Also, take a look at the CEPHES library for platform agnostic reference implementations of the C math functions and some extras like cotangent, cuberoot and integer powers: http://www.netlib.org/cephes/ And here's an X86 specific implementation of powf() that claims to be faster (than what, it doesn't say): http://www.xyzw.de/c190.html - Robin Green. |
From: Steve L. <sm...@go...> - 2010-08-19 16:35:09
|
Another option, even if you can't find a much faster approximation, is to unroll your loop 4 times and calculate 4 pows at the same time using SSE/VMX. From: Jeff Russell [mailto:je...@8m...] Sent: 19 August 2010 2:16am To: Game Development Algorithms Subject: [Algorithms] fast pow() for limited inputs So I need to speed up the CRT pow() function a bit, but I have some restrictions on the input which hopefully should give me some room to optimize: I need to compute pow(x,y), where x is on [0,1], and y is guaranteed positive and could have an upper bound in the neighborhood of 1000. I need "reasonable" accuracy (could be a little looser than the standard pow() implementation). I've searched online and found some bit twiddling approaches that claim to be very fast, but they seem to be too inaccurate for my purposes. I've tried implementing pow() as exp( log(x), y ), with my own cheap Taylor series in place of the natural log function. It did produce good output but wasn't very fast (slightly slower than the CRT pow()). It is probably worth mentioning before anyone asks that yes I have confirmed pow() as the bottleneck with a profiling tool ;-) I would also love to just see a sample implementation of pow(), log(), and exp() somewhere, even that might be helpful. Thanks, Jeff -- Jeff Russell Engineer, 8monkey Labs www.8monkeylabs.com |
From: Simon F. <sim...@po...> - 2010-08-19 15:45:10
|
Jeff, I don't know how much accuracy you are really after, but Jim Blinn had a little article called "Floating-Point Tricks" that was in IEEE CG&A Jul/Aug 1997, which has some "log2" and "exp2" approximations. It's also printed in his "Notation, Notation, Notation" book. Cheers Simon ________________________________ From: Jeff Russell [mailto:je...@8m...] Sent: 19 August 2010 16:32 To: Game Development Algorithms Subject: Re: [Algorithms] fast pow() for limited inputs Thanks Fabian, good info there. I'll check out minimax polynomials, and as both you and Simon pointed out base 2 log/exp would probably make more sense. Jeff |
From: Jeff R. <je...@8m...> - 2010-08-19 15:32:16
|
Thanks Fabian, good info there. I'll check out minimax polynomials, and as both you and Simon pointed out base 2 log/exp would probably make more sense. Jeff On Thu, Aug 19, 2010 at 1:35 AM, Fabian Giesen <ry...@gm...> wrote: > On 18.08.2010 18:16, Jeff Russell wrote: > > So I need to speed up the CRT pow() function a bit, but I have some > > restrictions on the input which hopefully should give me some room to > > optimize: > > > > I need to compute pow(x,y), where x is on [0,1], and y is guaranteed > > positive and could have an upper bound in the neighborhood of 1000. > > Unfortunately, x in [0,1] isn't a big restriction for pow; it covers > roughly a quarter of all possible 32-bit floating point numbers over a > wide range of exponents. pow on [0,1] is, for all practical purposes, > just as expensive as a full-range one. > > What's a lot more interesting is the distribution of your x inside > [0,1], and what type of error bound you need. Some background first: A > full-precision pow tries to return a floating-point number that is as > close as possible to the result you would get if you were doing the > computation exactly then rounding to the nearest representable FP > number. The error of numerical approximations is usually specified in > ulps (units in the last place). With IEEE floating point, addition, > subtraction, multiplication and division are guaranteed to return > results accurate to within 0.5ulps (which is equivalent to saying that > the results are the same as if they had been performed exactly then > rounded). For a variety of reasons, you can't get the same error bound > on pow and a lot of other transcendental functions, but library > implementations are usually on the order of 1ulp or less, so they're > pretty accurate. > > Anyway, if you don't care about relative error, but rather need an > absolute error bound, you can play fast and loose with small x (as long > as y > 1) and the like. 
> > > I need "reasonable" accuracy (could be a little looser than the standard > > pow() implementation). I've searched online and found some bit twiddling > > approaches that claim to be very fast, but they seem to be too > > inaccurate for my purposes. I've tried implementing pow() as exp( > > log(x), y ), with my own cheap Taylor series in place of the natural log > > function. It did produce good output but wasn't very fast (slightly > > exp(log(x) * y) is already significantly worse than a "true" pow(x, y) > in terms of relative error, and it has some extra roundoff steps that > aren't directly visible: exp and log are usually implemented in terms of > exp2 and log2 (since that matches better with the internal FP > representation), so there's some invisible conversion factors in there. > > Taylor series approximations are quite useless in practice. You can get > a lot more precision for the same amount of work (or, equivalently, the > same precision with a lot less work) by using optimized approximation > polynomials (Minimax polynomials are usually the weapon of choice, but > you really need a Computer Algebra package if you want to build your own > approximation polynomials). log is still a fairly tough nut to crack > however, since it's hard to approximate properly with polynomials. > > Robin Green did a great GDC 2003 presentation on the subject, it's > available here: (transcendental functions are in the 2nd part, but you > should still start at the beginning) > > http://www.research.scea.com/gdc2003/fast-math-functions.html > > Big spoiler: There's no really great general solution for pow that's > significantly faster than what the CRT does. > > > slower than the CRT pow()). It is probably worth mentioning before > > anyone asks that yes I have confirmed pow() as the bottleneck with a > > profiling tool ;-) > > A bit more context would help. What are you pow'ing, and what do you do > with the results? 
General pow is hard, but for applications like > rendering, you can usually get by with rather rough approximations. > > > I would also love to just see a sample implementation of pow(), log(), > > and exp() somewhere, even that might be helpful. > > A handy reference is... the CRT source code. VC++ doesn't come with > source code to the math functions though - you can either just > disassemble (e.g. with objdump) or look at another CRT implementation. > glibc is a good candidate. > > glibc math implementations are in sysdeps/ieee754 for generic IEEE-754 > compliant platforms, with optimized versions for all relevant > architectures in sysdeps/<arch>. If you really want to know how it's > implemented :) > > -Fabian > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by > > Make an app they can't live without > Enter the BlackBerry Developer Challenge > http://p.sf.net/sfu/RIM-dev2dev > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- Jeff Russell Engineer, 8monkey Labs www.8monkeylabs.com |
From: Fabian G. <ry...@gm...> - 2010-08-19 06:36:04
|
On 18.08.2010 18:16, Jeff Russell wrote: > So I need to speed up the CRT pow() function a bit, but I have some > restrictions on the input which hopefully should give me some room to > optimize: > > I need to compute pow(x,y), where x is on [0,1], and y is guaranteed > positive and could have an upper bound in the neighborhood of 1000. Unfortunately, x in [0,1] isn't a big restriction for pow; it covers roughly a quarter of all possible 32-bit floating point numbers over a wide range of exponents. pow on [0,1] is, for all practical purposes, just as expensive as a full-range one. What's a lot more interesting is the distribution of your x inside [0,1], and what type of error bound you need. Some background first: A full-precision pow tries to return a floating-point number that is as close as possible to the result you would get if you were doing the computation exactly then rounding to the nearest representable FP number. The error of numerical approximations is usually specified in ulps (units in the last place). With IEEE floating point, addition, subtraction, multiplication and division are guaranteed to return results accurate to within 0.5ulps (which is equivalent to saying that the results are the same as if they had been performed exactly then rounded). For a variety of reasons, you can't get the same error bound on pow and a lot of other transcendental functions, but library implementations are usually on the order of 1ulp or less, so they're pretty accurate. Anyway, if you don't care about relative error, but rather need an absolute error bound, you can play fast and loose with small x (as long as y > 1) and the like. > I need "reasonable" accuracy (could be a little looser than the standard > pow() implementation). I've searched online and found some bit twiddling > approaches that claim to be very fast, but they seem to be too > inaccurate for my purposes. 
I've tried implementing pow() as exp( > log(x), y ), with my own cheap Taylor series in place of the natural log > function. It did produce good output but wasn't very fast (slightly exp(log(x) * y) is already significantly worse than a "true" pow(x, y) in terms of relative error, and it has some extra roundoff steps that aren't directly visible: exp and log are usually implemented in terms of exp2 and log2 (since that matches better with the internal FP representation), so there's some invisible conversion factors in there. Taylor series approximations are quite useless in practice. You can get a lot more precision for the same amount of work (or, equivalently, the same precision with a lot less work) by using optimized approximation polynomials (Minimax polynomials are usually the weapon of choice, but you really need a Computer Algebra package if you want to build your own approximation polynomials). log is still a fairly tough nut to crack however, since it's hard to approximate properly with polynomials. Robin Green did a great GDC 2003 presentation on the subject, it's available here: (transcendental functions are in the 2nd part, but you should still start at the beginning) http://www.research.scea.com/gdc2003/fast-math-functions.html Big spoiler: There's no really great general solution for pow that's significantly faster than what the CRT does. > slower than the CRT pow()). It is probably worth mentioning before > anyone asks that yes I have confirmed pow() as the bottleneck with a > profiling tool ;-) A bit more context would help. What are you pow'ing, and what do you do with the results? General pow is hard, but for applications like rendering, you can usually get by with rather rough approximations. > I would also love to just see a sample implementation of pow(), log(), > and exp() somewhere, even that might be helpful. A handy reference is... the CRT source code. 
VC++ doesn't come with source code to the math functions though - you can either just disassemble (e.g. with objdump) or look at another CRT implementation. glibc is a good candidate. glibc math implementations are in sysdeps/ieee754 for generic IEEE-754 compliant platforms, with optimized versions for all relevant architectures in sysdeps/<arch>. If you really want to know how it's implemented :) -Fabian |
From: Simon F. <sim...@po...> - 2010-08-19 05:52:19
|
> with my own cheap Taylor series in place of the natural log function. This may be a silly question, but why did you use a natural log instead of a log base 2? Cheers Simon ________________________________ From: Jeff Russell [mailto:je...@8m...] Sent: 19 August 2010 02:16 To: Game Development Algorithms Subject: [Algorithms] fast pow() for limited inputs So I need to speed up the CRT pow() function a bit, but I have some restrictions on the input which hopefully should give me some room to optimize: I need to compute pow(x,y), where x is on [0,1], and y is guaranteed positive and could have an upper bound in the neighborhood of 1000. I need "reasonable" accuracy (could be a little looser than the standard pow() implementation). I've searched online and found some bit twiddling approaches that claim to be very fast, but they seem to be too inaccurate for my purposes. I've tried implementing pow() as exp( log(x), y ), with my own cheap Taylor series in place of the natural log function. It did produce good output but wasn't very fast (slightly slower than the CRT pow()). It is probably worth mentioning before anyone asks that yes I have confirmed pow() as the bottleneck with a profiling tool ;-) I would also love to just see a sample implementation of pow(), log(), and exp() somewhere, even that might be helpful. Thanks, Jeff -- Jeff Russell Engineer, 8monkey Labs www.8monkeylabs.com |
From: Jeff R. <je...@8m...> - 2010-08-19 01:16:52
|
So I need to speed up the CRT pow() function a bit, but I have some restrictions on the input which hopefully should give me some room to optimize:

I need to compute pow(x,y), where x is on [0,1], and y is guaranteed positive and could have an upper bound in the neighborhood of 1000.

I need "reasonable" accuracy (could be a little looser than the standard pow() implementation). I've searched online and found some bit-twiddling approaches that claim to be very fast, but they seem to be too inaccurate for my purposes. I've tried implementing pow() as exp(log(x) * y), with my own cheap Taylor series in place of the natural log function. It did produce good output but wasn't very fast (slightly slower than the CRT pow()). It is probably worth mentioning before anyone asks that yes, I have confirmed pow() as the bottleneck with a profiling tool ;-)

I would also love to just see a sample implementation of pow(), log(), and exp() somewhere, even that might be helpful.

Thanks, Jeff

--
Jeff Russell
Engineer, 8monkey Labs
www.8monkeylabs.com
|
From: Matt J <mjo...@gm...> - 2010-08-08 19:01:12
Another thing to check for is how you are loading the texture. E.g. if you are using something like D3DXCreateTextureFromFileEx, make sure it isn't rounding something to the nearest power of 2 (use D3DX_DEFAULT_NONPOW2 to prevent that), and check that you aren't using a filter (use D3DX_FILTER_NONE).

> The aspect ratio of the orthographic matrix (e.g. width/height) needs to match
> the aspect ratio of the viewport, otherwise you get stretching. Not sure
> if that is the issue.
>
> If you are creating an orthographic matrix that ranges from -0.5 to 0.5 for
> the left and right planes, you need to multiply the -0.5 and 0.5 by 0.5859375
> (600/1024) for the bottom and top planes to account for the aspect ratio.
> Maybe you are, but you didn't mention that.
>
> If this part isn't right, it could mess up how you're adding the half texel
> offset to y as well. Just a hunch, I didn't try doing the math.
>
> Matthew
From: Matt J <mjo...@gm...> - 2010-08-08 18:37:57
The aspect ratio of the orthographic matrix (e.g. width/height) needs to match the aspect ratio of the viewport, otherwise you get stretching. Not sure if that is the issue.

If you are creating an orthographic matrix that ranges from -0.5 to 0.5 for the left and right planes, you need to multiply the -0.5 and 0.5 by 0.5859375 (600/1024) for the bottom and top planes to account for the aspect ratio. Maybe you are, but you didn't mention that.

If this part isn't right, it could mess up how you're adding the half texel offset to y as well. Just a hunch, I didn't try doing the math.

Matthew

> @Matthew: My ortho projection is simply -0.5 to 0.5, and the view has an
> offset of 0.5 so that the 'model' deals with 0..1 as the range. So, no, the
> aspect ratio really doesn't come into it. Thus, it would just stretch
> anything I render to fit the screen, regardless of the resolution. Why do
> you ask?
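[Editor's note: Matthew's suggestion amounts to scaling the vertical extents by height/width. A tiny helper (hypothetical, not from the post) makes the arithmetic concrete for a 1024x600 viewport:]

```c
/* Hypothetical helper: given the left/right planes of an orthographic
 * projection, derive bottom/top so the projection's aspect ratio matches
 * the viewport's. For 1024x600 the scale factor is 600/1024 = 0.5859375,
 * so left/right of -0.5/0.5 give bottom/top of -0.29296875/0.29296875. */
typedef struct { float left, right, bottom, top; } OrthoBounds;

static OrthoBounds ortho_match_aspect(float left, float right,
                                      int vp_width, int vp_height)
{
    float scale = (float)vp_height / (float)vp_width;
    OrthoBounds b = { left, right, left * scale, right * scale };
    return b;
}
```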
From: Jon W. <jw...@gm...> - 2010-08-08 17:47:07
On Sun, Aug 8, 2010 at 5:16 AM, Colin Barrett <bar...@gm...> wrote:

> Subtract -1 and 1 respectively and you get a constant offset of -1/vpw.
> This is essentially the same thing that Jon said already, except I'm working
> in post-projection space so my offset is double his (at least, I hope
> that's why :-) ).

That's exactly right!

Sincerely,
jw

--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.
From: Colin B. <bar...@gm...> - 2010-08-08 12:16:53
On 8 August 2010 04:21, Jason Hughes <jh...@st...> wrote:

> @Colin: I did read that previously, which is why I attempted a half-pixel
> offset with the projection and/or view matrices. This did not seem to have
> the desired effect of correcting a solid gray to black and white. I was a
> bit surprised at that. No value I could put in there seemed to do more than
> 25% correction, at best, which leads me to believe it's a texture issue.

I see where you said that now: I was a little bit too anxious to stick my oar in. Apologies!

> I guess the real question I had was, how is the best way to correct for
> pixel/texel mismatches? Do most people adjust the view matrix or the
> projection matrix, or do you modify the vertices on the quads you generate,
> or do you trick it with texture matrix modifications or generate the UVs
> differently? Lots of options. The easiest seemed to me to be the view
> matrix, but when it didn't work, I started looking for other things that
> could affect the calculation, but didn't find any culprits.

I like to think of it not so much as an adjustment of those matrices, but as another transform applied after the model/view/projection.

If you consider just the x-coordinate, the post-projection values of the sides of your fullscreen quad are -1 and 1. The viewport transform is:

    x * 0.5 * vpw + (vpx + 0.5 * vpw)

Which (assuming vpx is 0) gives you:

    -1: -0.5 * vpw + 0.5 * vpw = 0
     1:  0.5 * vpw + 0.5 * vpw = vpw

Where you actually want to be, accounting for the position offset, is -0.5 and vpw - 0.5. So you solve for x and you get (leaving out vpx for simplicity):

    x * 0.5 * vpw + 0.5 * vpw = -0.5       ->  x = -(1 + vpw) / vpw
    x * 0.5 * vpw + 0.5 * vpw = vpw - 0.5  ->  x = (-1 + vpw) / vpw

Subtract -1 and 1 respectively and you get a constant offset of -1/vpw. This is essentially the same thing that Jon said already, except I'm working in post-projection space so my offset is double his (at least, I hope that's why :-) ).
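[Editor's note: Colin's algebra is easy to sanity-check numerically. The snippet below (my own check, using his assumption vpx = 0 and vpw = 1024) applies the viewport transform and confirms that a post-projection offset of -1/vpw lands the quad edges on -0.5 and vpw - 0.5:]

```c
/* The viewport transform for x, exactly as written in the post:
 * post-projection x in [-1,1] maps to pixel coordinates [0, vpw]. */
static float viewport_x(float x, float vpx, float vpw)
{
    return x * 0.5f * vpw + (vpx + 0.5f * vpw);
}
```

With vpw = 1024, viewport_x(-1, 0, vpw) is 0 and viewport_x(1, 0, vpw) is 1024; after adding the constant offset -1/vpw to x, the edges land on -0.5 and 1023.5, i.e. vpw - 0.5, matching the derivation.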
From: Jon W. <jw...@gm...> - 2010-08-08 09:10:36
The best way to correct for pixel/texel offsets is to understand what the rendering rules say for the system in question.

For OpenGL, 0,0 for the screen is in a corner between pixels, and 0,0 for a texture is in a corner between texels (assuming wrapping here). This means that if you map coordinates 1:1 through the math, it will work out fine.

For Direct3D9, 0,0 for a texture is in a corner between texels (thus, the term "texel offset" is a bit misleading, as texels don't need to be offset), but 0,0 for the screen is in the center of a pixel. Thus, to match centers of texels to centers of pixels, you either have to offset your texture UV coordinates by 0.5/texwidth and 0.5/texheight, or you have to offset your projection matrix by 0.5/screenwidth and 0.5/screenheight. I prefer to do the latter. In D3D10 and up, they fixed this problem, and it's now the same as OpenGL.

Sincerely,
jw

--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.

On Sat, Aug 7, 2010 at 8:21 PM, Jason Hughes <jh...@st...> wrote:

> Thanks for all the responses. Here's some more info, in digest form:
>
> - The target platform is D3D9. I'm creating the textures with
> D3DXCreateTextureFromFileInMemoryEx and passing in default to the mip count
> so the driver will create all the mips for me. I guess I could try passing
> 1 in and see if it changes any behavior by not generating any mips. Thanks
> for bringing that up. The graphics card I'm testing this with is my Quadro
> FX580, with Maya certified drivers... I would expect it not to pull
> shenanigans with LOD bias, but then, I might be wrong. I'm definitely not
> setting LOD bias anywhere explicitly. I suppose I could also supply the
> texture with mips manually and see what I see.
>
> - @Colin: I did read that previously, which is why I attempted a half-pixel
> offset with the projection and/or view matrices. This did not seem to have
> the desired effect of correcting a solid gray to black and white. I was a
> bit surprised at that. No value I could put in there seemed to do more than
> 25% correction, at best, which leads me to believe it's a texture issue.
>
> - @Matthew: My ortho projection is simply -0.5 to 0.5, and the view has an
> offset of 0.5 so that the 'model' deals with 0..1 as the range. So, no, the
> aspect ratio really doesn't come into it. Thus, it would just stretch
> anything I render to fit the screen, regardless of the resolution. Why do
> you ask?
>
> I guess the real question I had was, what is the best way to correct for
> pixel/texel mismatches? Do most people adjust the view matrix or the
> projection matrix, or do you modify the vertices on the quads you generate,
> or do you trick it with texture matrix modifications or generate the UVs
> differently? Lots of options. The easiest seemed to me to be the view
> matrix, but when it didn't work, I started looking for other things that
> could affect the calculation, but didn't find any culprits.
>
> Thanks guys,
> JH
>
> On 8/7/2010 4:56 PM, Marco Salvi wrote:
>> Hi Jason,
>>
>> Does your texture come with a full mipmap chain? If that's the case it
>> would be better to see what happens without mipmaps and just point
>> filtering.
>>
>> On which platform are you working?
>>
>> Some drivers with low quality settings may set by default a positive LOD
>> bias. Are you (inadvertently?) modifying the texture's LOD bias at all?
>>
>> Marco
>>
>> On Sat, Aug 7, 2010 at 2:20 PM, Jason Hughes <jh...@st...> wrote:
>>
>>> I've been trying to resolve an issue where full screen textures being
>>> drawn through the 3D system appear blurry. I feel like I'm probably not
>>> considering something. Any ideas would be welcome!
>>>
>>> - I'm using an orthographic projection matrix along Z, a view matrix
>>> that is pretty much identity except for some negations on various axes
>>> to correct for LH/RH differences in the engine.
>>> - The quad is mapped to the full screen both for vertex positions and
>>> UVs (0,0,1) to (1,1,1), since the ortho matrix is 0..1, not in screen
>>> pixels.
>>> - The texture has exactly the same number of pixels in both directions
>>> as the back buffer.
>>> - Back buffer and window match resolution exactly.
>>> - Resolution is 1024x600 (but the problem exists in other resolutions).
>>> - Texture sampling is bilinear or anisotropic (shouldn't matter); both
>>> show the problem. Sometimes my 2D stuff needs to cleanly scale, so I
>>> can't force point sampling globally. Texture matrix is disabled.
>>>
>>> Steps I've taken to debug:
>>> - I made a simple texture with regions of alternating white and black
>>> columns and rows that are 1, 2, and 4 pixels wide, so I could see what
>>> was happening with texture sampling. The result is an even gray in the
>>> 1-pixel wide areas, but can be made brighter and darker depending on
>>> the offset I put into the matrices. Half-pixel adjustment didn't fix
>>> it. I never get white or black lines.
>>> - I checked to make sure that the test image wasn't being munged by the
>>> texture processing tools and it's perfectly clean (no GIGO).
>>>
>>> I can understand not getting perfect rasterization vertically due to
>>> the height being a non-power-of-two, but I expected columns to be
>>> cleanly mapped. What else should I be looking for?
>>>
>>> Most appreciated,
>>> JH
>>>
>>> Jason Hughes
>>> President
>>> Steel Penny Games
>>> Austin, TX
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by
>
> Make an app they can't live without
> Enter the BlackBerry Developer Challenge
> http://p.sf.net/sfu/RIM-dev2dev
> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives:
> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
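[Editor's note: Jon's D3D9 rule can be written down directly. The two helpers below (names mine, not from any API) sketch the two equivalent corrections for a 1:1 fullscreen quad; the sign convention for the projection variant assumes D3D clip space, where y is flipped relative to screen coordinates:]

```c
/* Option 1: shift the UVs by half a texel so texel centers line up with
 * D3D9 pixel centers. */
static void offset_uv_half_texel(float *u, float *v, int tex_w, int tex_h)
{
    *u += 0.5f / (float)tex_w;
    *v += 0.5f / (float)tex_h;
}

/* Option 2: shift the projection matrix's translation terms instead.
 * Post-projection x spans [-1,1] over screen_w pixels, so half a pixel
 * is 1/screen_w in clip space (this factor of two is the doubling Colin
 * mentions relative to the 0.5/screenwidth pixel-space figure). */
static void offset_proj_half_pixel(float *tx, float *ty,
                                   int screen_w, int screen_h)
{
    *tx -= 1.0f / (float)screen_w;
    *ty += 1.0f / (float)screen_h;   /* clip-space y is flipped in D3D */
}
```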
From: Jason H. <jh...@st...> - 2010-08-08 03:22:01
Thanks for all the responses. Here's some more info, in digest form:

- The target platform is D3D9. I'm creating the textures with D3DXCreateTextureFromFileInMemoryEx and passing in default to the mip count so the driver will create all the mips for me. I guess I could try passing 1 in and see if it changes any behavior by not generating any mips. Thanks for bringing that up. The graphics card I'm testing this with is my Quadro FX580, with Maya certified drivers... I would expect it not to pull shenanigans with LOD bias, but then, I might be wrong. I'm definitely not setting LOD bias anywhere explicitly. I suppose I could also supply the texture with mips manually and see what I see.

- @Colin: I did read that previously, which is why I attempted a half-pixel offset with the projection and/or view matrices. This did not seem to have the desired effect of correcting a solid gray to black and white. I was a bit surprised at that. No value I could put in there seemed to do more than 25% correction, at best, which leads me to believe it's a texture issue.

- @Matthew: My ortho projection is simply -0.5 to 0.5, and the view has an offset of 0.5 so that the 'model' deals with 0..1 as the range. So, no, the aspect ratio really doesn't come into it. Thus, it would just stretch anything I render to fit the screen, regardless of the resolution. Why do you ask?

I guess the real question I had was, what is the best way to correct for pixel/texel mismatches? Do most people adjust the view matrix or the projection matrix, or do you modify the vertices on the quads you generate, or do you trick it with texture matrix modifications or generate the UVs differently? Lots of options. The easiest seemed to me to be the view matrix, but when it didn't work, I started looking for other things that could affect the calculation, but didn't find any culprits.

Thanks guys,
JH

On 8/7/2010 4:56 PM, Marco Salvi wrote:
> Hi Jason,
>
> Does your texture come with a full mipmap chain? If that's the case it
> would be better to see what happens without mipmaps and just point
> filtering.
>
> On which platform are you working?
>
> Some drivers with low quality settings may set by default a positive
> LOD bias. Are you (inadvertently?) modifying the texture's LOD bias at all?
>
> Marco
>
> On Sat, Aug 7, 2010 at 2:20 PM, Jason Hughes <jh...@st...> wrote:
>
>> I've been trying to resolve an issue where full screen textures being
>> drawn through the 3D system appear blurry. I feel like I'm probably not
>> considering something. Any ideas would be welcome!
>>
>> - I'm using an orthographic projection matrix along Z, a view matrix
>> that is pretty much identity except for some negations on various axes
>> to correct for LH/RH differences in the engine.
>> - The quad is mapped to the full screen both for vertex positions and
>> UVs (0,0,1) to (1,1,1), since the ortho matrix is 0..1, not in screen
>> pixels.
>> - The texture has exactly the same number of pixels in both directions
>> as the back buffer.
>> - Back buffer and window match resolution exactly.
>> - Resolution is 1024x600 (but the problem exists in other resolutions).
>> - Texture sampling is bilinear or anisotropic (shouldn't matter); both
>> show the problem. Sometimes my 2D stuff needs to cleanly scale, so I
>> can't force point sampling globally. Texture matrix is disabled.
>>
>> Steps I've taken to debug:
>> - I made a simple texture with regions of alternating white and black
>> columns and rows that are 1, 2, and 4 pixels wide, so I could see what
>> was happening with texture sampling. The result is an even gray in the
>> 1-pixel wide areas, but can be made brighter and darker depending on
>> the offset I put into the matrices. Half-pixel adjustment didn't fix
>> it. I never get white or black lines.
>> - I checked to make sure that the test image wasn't being munged by the
>> texture processing tools and it's perfectly clean (no GIGO).
>>
>> I can understand not getting perfect rasterization vertically due to
>> the height being a non-power-of-two, but I expected columns to be
>> cleanly mapped. What else should I be looking for?
>>
>> Most appreciated,
>> JH
>>
>> Jason Hughes
>> President
>> Steel Penny Games
>> Austin, TX
From: Matt J <mjo...@gm...> - 2010-08-08 01:42:59
Hi Jason:

Does your orthographic projection matrix account for the aspect ratio?

Matthew

> IIRC you're not guaranteed that the hardware won't filter even when you are
> at a perfect 1:1. You may need to add some higher-level logic to manage
> your filtering states. I could be wrong about this, but looking at our
> code, we're set to point-sampling for all HUD, UI overlays, etc.
>
> And you likely will need the 1/2 pixel adjustment.
>
> Double check to be sure you don't have anything strange going on with the
> mip range adjustments, etc. Build your textures without mips just to be
> sure.
>
> Matt
>
> On Sat, Aug 7, 2010 at 4:20 PM, Jason Hughes <jh...@st...> wrote:
>
>> I've been trying to resolve an issue where full screen textures being
>> drawn through the 3D system appear blurry. I feel like I'm probably not
>> considering something. Any ideas would be welcome!
>>
>> - I'm using an orthographic projection matrix along Z, a view matrix
>> that is pretty much identity except for some negations on various axes
>> to correct for LH/RH differences in the engine.
>> - The quad is mapped to the full screen both for vertex positions and
>> UVs (0,0,1) to (1,1,1), since the ortho matrix is 0..1, not in screen
>> pixels.
>> - The texture has exactly the same number of pixels in both directions
>> as the back buffer.
>> - Back buffer and window match resolution exactly.
>> - Resolution is 1024x600 (but the problem exists in other resolutions).
>> - Texture sampling is bilinear or anisotropic (shouldn't matter); both
>> show the problem. Sometimes my 2D stuff needs to cleanly scale, so I
>> can't force point sampling globally. Texture matrix is disabled.
>>
>> Steps I've taken to debug:
>> - I made a simple texture with regions of alternating white and black
>> columns and rows that are 1, 2, and 4 pixels wide, so I could see what
>> was happening with texture sampling. The result is an even gray in the
>> 1-pixel wide areas, but can be made brighter and darker depending on
>> the offset I put into the matrices. Half-pixel adjustment didn't fix
>> it. I never get white or black lines.
>> - I checked to make sure that the test image wasn't being munged by the
>> texture processing tools and it's perfectly clean (no GIGO).
>>
>> I can understand not getting perfect rasterization vertically due to
>> the height being a non-power-of-two, but I expected columns to be
>> cleanly mapped. What else should I be looking for?
>>
>> Most appreciated,
>> JH
>>
>> Jason Hughes
>> President
>> Steel Penny Games
>> Austin, TX

--
-----
Matt Johnson
http://otowngraphics.blogspot.com