From: Josh Vanderhoof <hoof@na...>  2003-03-04 23:52:58

Brian Paul <brian@...> writes:

> Josh Vanderhoof wrote:
> > Brian Paul <brian@...> writes:
> >
> >>On my P4 2.4GHz system inv_sqrt() is just slightly slower than using
> >>the x86 sqrt instruction and reciprocal. 230 fps vs 235 fps,
> >>respectively. I don't know what to expect on other CPUs.
> >
> > This is surprising. On my P3, inv_sqrt() runs in about 1/3 the time
> > of 1.0/sqrt() (compiled with gcc-3.2 -O2 -fomit-frame-pointer). Can
> > you try this test program?
>
> time_inv_sqrt: 36.358833 ns per iteration (33554432 iterations
> in 1.220000 seconds)
> time_inv_sqrt_t1: 22.351742 ns per iteration (67108864 iterations
> in 1.500000 seconds)
> time_inv_sqrt_t4: 24.437904 ns per iteration (67108864 iterations
> in 1.640000 seconds)
> time_inv_sqrt_t32: 18.775463 ns per iteration (67108864 iterations
> in 1.260000 seconds)
>
> Clearly, your functions are faster than the sqrt instruction here, but
> not 3x. I'll have to run some more tests with Mesa to see how it
> behaves with other demos.

I forgot that Mesa puts the FPU in low precision mode. When I change
the test to use 24 bit precision the results become almost identical.

time_inv_sqrt: 61.988827 ns per iteration (16777216 iterations in 1.040000 seconds)
time_inv_sqrt_t1: 55.134296 ns per iteration (33554432 iterations in 1.850000 seconds)
time_inv_sqrt_t4: 59.604645 ns per iteration (16777216 iterations in 1.000000 seconds)
time_inv_sqrt_t32: 43.213367 ns per iteration (33554432 iterations in 1.450000 seconds)

Only a 10% improvement for the non-table version that way. It wouldn't
surprise me if it's actually slower on a P4.

Add this line to the beginning of main() to test it:

    asm ( "fldcw %0" : : "m" (0x003f));

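For readers who want to try the precision switch without editing main() by hand, here is a rough sketch of the same idea wrapped in helper functions. The function names and the SKETCH_HAVE_X87 guard are made up for illustration and are not part of Mesa or the test program. The sketch loads the control word from a variable rather than a literal, since a literal used as an "m" operand may not be accepted by every compiler. The value 0x003f sets the precision-control field (bits 8-9) to 00, which selects 24-bit significands, and masks all x87 exceptions. On targets where float math goes through SSE rather than the x87 unit, changing this word will likely have little effect on the timings.

```c
#include <assert.h>
#include <stdint.h>

#define X87_CW_SINGLE 0x003fu  /* value from the post: PC = 00, exceptions masked */
#define X87_PC_SHIFT  8        /* precision-control field lives in bits 8-9 */
#define X87_PC_MASK   0x3u

/* Hypothetical guard: only use the inline asm with GCC-style compilers on x86. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define SKETCH_HAVE_X87 1
#else
#define SKETCH_HAVE_X87 0
#endif

/* Returns the current precision-control field (0 = 24-bit, 2 = 53-bit,
   3 = 64-bit), or -1 when the x87 control word is not reachable. */
int x87_precision_bits(void)
{
#if SKETCH_HAVE_X87
    uint16_t cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    return (int)((cw >> X87_PC_SHIFT) & X87_PC_MASK);
#else
    return -1;
#endif
}

/* Saves the old control word in *saved and loads 0x003f.
   Returns 1 if the switch was made, 0 if this build cannot do it. */
int x87_enter_single_precision(uint16_t *saved)
{
    assert(saved != 0);
#if SKETCH_HAVE_X87
    {
        uint16_t cw = X87_CW_SINGLE;
        __asm__ __volatile__("fnstcw %0" : "=m"(*saved));
        __asm__ __volatile__("fldcw %0" : : "m"(cw));
    }
    return 1;
#else
    (void)saved;
    return 0;
#endif
}

/* Puts back a control word previously saved by x87_enter_single_precision(). */
void x87_restore(uint16_t saved)
{
#if SKETCH_HAVE_X87
    __asm__ __volatile__("fldcw %0" : : "m"(saved));
#else
    (void)saved;
#endif
}
```

A benchmark would call x87_enter_single_precision() once before the timing loops and x87_restore() afterwards, so the rest of the program keeps its original FPU settings.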
From: Laurent Desnogues <laurent.desnogues@wa...>  2003-03-04 23:01:08

Hum, sorry but I will have to remind you of something obvious... Testing
a table lookup based algo in a loop won't give you any info about real
performance in a real env where your data cache is used by other
stuff...

No flame intended, I was just proven wrong so often on similar
benchmarks ;)

Laurent

----- Original Message -----
From: "Josh Vanderhoof" <hoof@...>
To: <mesa3d-dev@...>
Sent: Tuesday, March 04, 2003 9:12 PM
Subject: Re: [Mesa3d-dev] Mangled Mesa problem

> Brian Paul <brian@...> writes:
>
> > On my P4 2.4GHz system inv_sqrt() is just slightly slower than using
> > the x86 sqrt instruction and reciprocal. 230 fps vs 235 fps,
> > respectively. I don't know what to expect on other CPUs.
>
> This is surprising. On my P3, inv_sqrt() runs in about 1/3 the time
> of 1.0/sqrt() (compiled with gcc-3.2 -O2 -fomit-frame-pointer). Can
> you try this test program?

From: Brian Paul <brian@tu...>  2003-03-04 22:14:11

Josh Vanderhoof wrote:
> Brian Paul <brian@...> writes:
>
>>On my P4 2.4GHz system inv_sqrt() is just slightly slower than using
>>the x86 sqrt instruction and reciprocal. 230 fps vs 235 fps,
>>respectively. I don't know what to expect on other CPUs.
>
> This is surprising. On my P3, inv_sqrt() runs in about 1/3 the time
> of 1.0/sqrt() (compiled with gcc-3.2 -O2 -fomit-frame-pointer). Can
> you try this test program?

time_inv_sqrt: 36.358833 ns per iteration (33554432 iterations in 1.220000 seconds)
time_inv_sqrt_t1: 22.351742 ns per iteration (67108864 iterations in 1.500000 seconds)
time_inv_sqrt_t4: 24.437904 ns per iteration (67108864 iterations in 1.640000 seconds)
time_inv_sqrt_t32: 18.775463 ns per iteration (67108864 iterations in 1.260000 seconds)

Clearly, your functions are faster than the sqrt instruction here, but
not 3x. I'll have to run some more tests with Mesa to see how it
behaves with other demos.

Brian

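To make the "faster, but not 3x" reading of these numbers concrete, the arithmetic can be sketched as two small helpers. The function names are illustrative only. The first mirrors how the test program's print_time() derives its figure (elapsed seconds times 1e9, divided by the iteration count); the second is just a ratio of two such figures.

```c
#include <assert.h>

/* Nanoseconds per call: elapsed seconds * 1e9 / iteration count,
   the same formula the test program prints. */
double ns_per_iteration(double seconds, long iterations)
{
    assert(iterations > 0);
    return seconds * 1e9 / (double)iterations;
}

/* How many times faster a candidate is than a baseline,
   given both costs in ns per iteration. */
double speedup(double baseline_ns, double candidate_ns)
{
    assert(candidate_ns > 0.0);
    return baseline_ns / candidate_ns;
}
```

Plugging in the P4 figures quoted in this message, speedup(36.358833, 22.351742) comes out at roughly 1.6 for inv_sqrt_t1 and speedup(36.358833, 18.775463) at roughly 1.9 for inv_sqrt_t32, well short of the 3x seen on the P3.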
From: Josh Vanderhoof <hoof@na...>  2003-03-04 20:12:58

Brian Paul <brian@...> writes:

> On my P4 2.4GHz system inv_sqrt() is just slightly slower than using
> the x86 sqrt instruction and reciprocal. 230 fps vs 235 fps,
> respectively. I don't know what to expect on other CPUs.

This is surprising. On my P3, inv_sqrt() runs in about 1/3 the time
of 1.0/sqrt() (compiled with gcc-3.2 -O2 -fomit-frame-pointer). Can
you try this test program?

#include <string.h>
#include <time.h>
#include <stdio.h>
#include <math.h>


/*
 inv_sqrt_t1 - A single precision 1/sqrt routine for IEEE format floats.
 written by Josh Vanderhoof, based on newsgroup posts by James Van Buskirk
 and Vesa Karvonen.
*/

float
inv_sqrt_t1(float n)
{
    float r0, r1, r2, r3;
    float x0, x1, x2, x3;
    float y0, y1, y2, y3;
    union { float f; unsigned int i; } u;
    unsigned int magic;

    /*
     Exponent part of the magic number -

     We want to:
     1. subtract the bias from the exponent,
     2. negate it
     3. divide by two (rounding towards -inf)
     4. add the bias back

     Which is the same as subtracting the exponent from 381 and dividing
     by 2.

     floor(-(x - 127) / 2) + 127 = floor((381 - x) / 2)
    */

    magic = 381 << 23;

    /*
     Significand part of magic number -

     With the current magic number, "(magic - u.i) >> 1" will give you:

     for 1 <= u.f <= 2: 1.25 - u.f / 4
     for 2 <= u.f <= 4: 1.00 - u.f / 8

     This isn't a bad approximation of 1/sqrt. The maximum difference from
     1/sqrt will be around .06. After three Newton-Raphson iterations, the
     maximum difference is less than 4.5e-8. (Which is actually close
     enough to make the following bias academic...)

     To get a better approximation you can add a bias to the magic
     number. For example, if you subtract 1/2 of the maximum difference in
     the first approximation (.03), you will get the following function:

     for 1 <= u.f <= 2:    1.22 - u.f / 4
     for 2 <= u.f <= 3.76: 0.97 - u.f / 8
     for 3.76 <= u.f <= 4: 0.72 - u.f / 16
     (The 3.76 to 4 range is where the result is < .5.)

     This is the closest possible initial approximation, but with a maximum
     error of 8e-11 after three NR iterations, it is still not perfect. If
     you subtract 0.0332281 instead of .03, the maximum error will be
     2.5e-11 after three NR iterations, which should be about as close as
     is possible.

     for 1 <= u.f <= 2:    1.2167719 - u.f / 4
     for 2 <= u.f <= 3.73: 0.9667719 - u.f / 8
     for 3.73 <= u.f <= 4: 0.7167719 - u.f / 16

    */

    magic -= (int)(0.0332281 * (1 << 25));

    u.f = n;
    u.i = (magic - u.i) >> 1;

    /*
     Instead of Newton-Raphson, we use Goldschmidt's algorithm, which
     allows more parallelism. From what I understand, the parallelism
     comes at the cost of less precision, because it lets error
     accumulate across iterations.
    */
    x0 = 1.0f;
    y0 = 0.5f * n;
    r0 = u.f;

    x1 = x0 * r0;
    y1 = y0 * r0 * r0;
    r1 = 1.5f - y1;

    x2 = x1 * r1;
    y2 = y1 * r1 * r1;
    r2 = 1.5f - y2;

    x3 = x2 * r2;
    y3 = y2 * r2 * r2;
    r3 = 1.5f - y3;

    return x3 * r3;
}


/*
 inv_sqrt_t4 - A single precision 1/sqrt routine for IEEE format floats.
 written by Josh Vanderhoof, based on newsgroup posts by James Van Buskirk
 and Vesa Karvonen.

 This version uses a four entry table to reduce the number of NR iterations
 from 3 to 2. The approximation method is similar to inv_sqrt_t1 except the
 estimate is multiplied by the scale entry in the table to allow the slope of
 the approximation function to be chosen without restriction.
*/

static struct {
    unsigned int bias;
    float scale;
} inv_sqrt_table_4[] = {
    { 0xbe5a659a, 1.03805210 }, /* 2.000000 - 3.000000 */
    { 0xbf1c01fa, 0.61880215 }, /* 3.000000 - 4.000000 */

    { 0xbdda659a, 1.46802735 }, /* 1.000000 - 1.500000 */
    { 0xbe9c01fa, 0.87511840 }, /* 1.500000 - 2.000000 */
};

float
inv_sqrt_t4(float n)
{
    float r0, r1, r2;
    float x0, x1, x2;
    float y0, y1, y2;
    union { float f; unsigned int i; } u;
    unsigned int idx;

    u.f = n;

    idx = (u.i >> 22) & 3;

    u.i = inv_sqrt_table_4[idx].bias - u.i;
    u.i >>= 1;

    u.f = u.f * inv_sqrt_table_4[idx].scale;

    x0 = 1.0f;
    y0 = 0.5f * n;
    r0 = u.f;

    x1 = x0 * r0;
    y1 = y0 * r0 * r0;
    r1 = 1.5f - y1;

    x2 = x1 * r1;
    y2 = y1 * r1 * r1;
    r2 = 1.5f - y2;

    return x2 * r2;
}


/*
 inv_sqrt_t32 - A single precision 1/sqrt routine for IEEE format floats.
 written by Josh Vanderhoof, based on newsgroup posts by James Van Buskirk
 and Vesa Karvonen.

 This is like inv_sqrt_t4 except with a larger table and one less NR iteration.
*/

static struct {
    unsigned int bias;
    float scale;
} inv_sqrt_table_32[] = {
    { 0xbe0be4e2, 1.35119620 }, /* 2.000000 - 2.125000 */
    { 0xbe23e66e, 1.23697113 }, /* 2.125000 - 2.250000 */
    { 0xbe3be7d0, 1.13798286 }, /* 2.250000 - 2.375000 */
    { 0xbe53e90e, 1.05152976 }, /* 2.375000 - 2.500000 */
    { 0xbe6bea2c, 0.97549646 }, /* 2.500000 - 2.625000 */
    { 0xbe83eb30, 0.90820548 }, /* 2.625000 - 2.750000 */
    { 0xbe9bec1c, 0.84831133 }, /* 2.750000 - 2.875000 */
    { 0xbeb3ecf4, 0.79472355 }, /* 2.875000 - 3.000000 */
    { 0xbecbedbc, 0.74655003 }, /* 3.000000 - 3.125000 */
    { 0xbee3ee72, 0.70305464 }, /* 3.125000 - 3.250000 */
    { 0xbefbef1c, 0.66362511 }, /* 3.250000 - 3.375000 */
    { 0xbf13efba, 0.62774849 }, /* 3.375000 - 3.500000 */
    { 0xbf2bf04c, 0.59499215 }, /* 3.500000 - 3.625000 */
    { 0xbf43f0d4, 0.56498892 }, /* 3.625000 - 3.750000 */
    { 0xbf5bf152, 0.53742538 }, /* 3.750000 - 3.875000 */
    { 0xbf73f1ca, 0.51203251 }, /* 3.875000 - 4.000000 */

    { 0xbd8be4e2, 1.91087999 }, /* 1.000000 - 1.062500 */
    { 0xbda3e66e, 1.74934135 }, /* 1.062500 - 1.125000 */
    { 0xbdbbe7d0, 1.60935079 }, /* 1.125000 - 1.187500 */
    { 0xbdd3e90e, 1.48708765 }, /* 1.187500 - 1.250000 */
    { 0xbdebea2c, 1.37956032 }, /* 1.250000 - 1.312500 */
    { 0xbe03eb30, 1.28439651 }, /* 1.312500 - 1.375000 */
    { 0xbe1bec1c, 1.19969339 }, /* 1.375000 - 1.437500 */
    { 0xbe33ecf4, 1.12390882 }, /* 1.437500 - 1.500000 */
    { 0xbe4bedbc, 1.05578118 }, /* 1.500000 - 1.562500 */
    { 0xbe63ee72, 0.99426940 }, /* 1.562500 - 1.625000 */
    { 0xbe7bef1c, 0.93850762 }, /* 1.625000 - 1.687500 */
    { 0xbe93efba, 0.88777043 }, /* 1.687500 - 1.750000 */
    { 0xbeabf04c, 0.84144597 }, /* 1.750000 - 1.812500 */
    { 0xbec3f0d4, 0.79901500 }, /* 1.812500 - 1.875000 */
    { 0xbedbf152, 0.76003425 }, /* 1.875000 - 1.937500 */
    { 0xbef3f1ca, 0.72412332 }, /* 1.937500 - 2.000000 */
};

float
inv_sqrt_t32(float n)
{
    union { float f; unsigned int i; } u;
    unsigned int idx;

    u.f = n;

    idx = (u.i >> 19) & 31;

    u.i = inv_sqrt_table_32[idx].bias - u.i;
    u.i >>= 1;

    u.f = u.f * inv_sqrt_table_32[idx].scale;

    return u.f * (1.5F - 0.5F * n * u.f * u.f);
}

float
inv_sqrt(float n)
{
    return 1.0 / sqrt(n);
}

#define DFN_TIME_FN(fn)              \
void                                 \
time_ ## fn(int n)                   \
{                                    \
    int i;                           \
                                     \
    for (i = 0; i < n; i++) {        \
        fn(1.25);                    \
    }                                \
}

DFN_TIME_FN(inv_sqrt)
DFN_TIME_FN(inv_sqrt_t1)
DFN_TIME_FN(inv_sqrt_t4)
DFN_TIME_FN(inv_sqrt_t32)

void
print_time(void (*x)(int), char *name)
{
    clock_t t0, t1;
    double seconds;
    int n = 1;

    (*x)(100);

    do {
        n += n;
        t0 = clock();
        (*x)(n);
        t1 = clock();
    } while (t1 - t0 < CLOCKS_PER_SEC);

    seconds = ((double)t1 - (double)t0) / CLOCKS_PER_SEC;

    printf("%20s: %.6f ns per iteration (%d iterations in %f seconds)\n",
           name,
           seconds * 1000000000 / n,
           n,
           seconds);
}

#define PRINT_TIME(x) print_time(x, #x)


int
main(void)
{
    PRINT_TIME(time_inv_sqrt);
    PRINT_TIME(time_inv_sqrt_t1);
    PRINT_TIME(time_inv_sqrt_t4);
    PRINT_TIME(time_inv_sqrt_t32);

    return 0;
}

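The three variants in the test program trade table size against refinement steps. As a rough way to see what each refinement step buys, the sketch below reuses the table-free magic-number estimate and makes the step count a parameter. The names inv_sqrt_steps() and worst_residual() are invented for this illustration, and the code assumes 32-bit IEEE floats; with steps set to 3 it should follow the same sequence of operations as inv_sqrt_t1. To avoid depending on sqrt() as a reference, accuracy is judged by how far n * r * r is from 1, which for a small relative error e in r is about 2 * |e|.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* First guess from the biased magic number described in the post. */
static float magic_estimate(float n)
{
    uint32_t magic = (381u << 23) - (uint32_t)(0.0332281 * (1 << 25));
    uint32_t bits;
    float r;

    assert(sizeof bits == sizeof n);      /* assumes 32-bit floats */
    memcpy(&bits, &n, sizeof bits);       /* reinterpret float as integer */
    bits = (magic - bits) >> 1;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Goldschmidt-style refinement with a caller-chosen number of steps. */
float inv_sqrt_steps(float n, int steps)
{
    float x = magic_estimate(n);   /* running estimate of 1/sqrt(n) */
    float y = 0.5f * n * x * x;    /* tracks 0.5 * n * x^2 */
    int i;

    for (i = 0; i < steps; i++) {
        float r = 1.5f - y;
        x = x * r;
        y = y * r * r;
    }
    return x;
}

/* Largest |n * r * r - 1| seen over sample inputs in [1, 4). */
double worst_residual(int steps)
{
    double worst = 0.0;
    float n;

    for (n = 1.0f; n < 4.0f; n += 0.001f) {
        double r = inv_sqrt_steps(n, steps);
        double d = (double)n * r * r - 1.0;
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Calling worst_residual() with 0, 1, 2 and 3 steps should show the residual shrinking quickly with each step, which is the effect the lookup tables exploit: a better first guess lets a variant drop a step for similar accuracy.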
From: SourceForge.net <noreply@so...>  2003-03-04 19:22:12

Bugs item #697070, was opened at 2003-03-04 04:57
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100003&aid=697070&group_id=3

Category: other
Group: Compile/Install
Status: Open
Resolution: None
Priority: 5
Submitted By: ChuoLing Chang (chuoling)
Assigned to: Nobody/Anonymous (nobody)
Summary: (OSMesa) libOSMesa32.so compile errors

Initial Comment:
To compile Mesa and OSMesa with 16/32-bit color buffer, I did the
following (03/03/2003 RedHat 7.1):

======================================================
cvs -d:pserver:anonymous@...:/cvsroot/mesa3d login
cvs -z3 -d:pserver:anonymous@...:/cvsroot/mesa3d co Mesa
bootstrap
configure --prefix=/usr
make
make check
make install

% compile libOSMesa16.so libOSMesa32.so for OSMesa with deep color buffer
mkdir ./lib
cd ./src
create an empty file "depend"
make -f Makefile.X11 dep
make -f Makefile.X11 clean
make -f Makefile.OSMesa16 linux-osmesa16
make -f Makefile.X11 clean
make -f Makefile.OSMesa16 linux-osmesa32
=============================================

libOSMesa16.so build correctly, but libOSMesa32.so gives the following
errors:

gcc -c -I. -I../include -O3 -ansi -pedantic -Wall -Wmissing-prototypes
 -Wundef -fPIC -ffast-math -D_SVID_SOURCE -D_BSD_SOURCE
 -I/usr/X11R6/include -DUSE_XSHM -DPTHREADS -DDEBUG -DMESA_DEBUG
 -DCHAN_BITS=32 -DDEFAULT_SOFTWARE_DEPTH_BITS=31 swrast/s_texture.c
 -o swrast/s_texture.o
swrast/s_texture.c: In function `_swrast_texture_table_lookup':
swrast/s_texture.c:329: array subscript is not an integer
swrast/s_texture.c:365: array subscript is not an integer
swrast/s_texture.c:399: array subscript is not an integer
swrast/s_texture.c:435: array subscript is not an integer
swrast/s_texture.c:436: array subscript is not an integer
swrast/s_texture.c:477: array subscript is not an integer
swrast/s_texture.c:478: array subscript is not an integer
swrast/s_texture.c:479: array subscript is not an integer
swrast/s_texture.c:520: array subscript is not an integer
swrast/s_texture.c:521: array subscript is not an integer
swrast/s_texture.c:522: array subscript is not an integer
swrast/s_texture.c:523: array subscript is not an integer
make[1]: *** [swrast/s_texture.o] Error 1
make[1]: Leaving directory `/usr/src/source/Mesa/src'
make: *** [linux-osmesa32] Error 2

----------------------------------------------------------------------

>Comment By: Brian Paul (brianp)
Date: 2003-03-04 19:32

Message:
Logged In: YES
user_id=983

Try a CVS update - it should be OK now.

From: Kendall Bennett <KendallB@scitechsoft.com>  2003-03-04 19:20:30

Keith Whitwell <keith@...> wrote:

> > The profiler tells us that the performance bottleneck is squaring in the
> > display list processing code
>
> Squaring? what squaring?

Typo. Should have read 'squarely in the display list processing code'.
We don't know where yet, but the display list functions were taking up
the bulk of the time it seems.

> Go ahead & build one - it should be pretty straightforward if you
> already know how to write a proper driver.

Will do. I was just hoping to avoid some effort ;)

Regards,

Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com

~ SciTech SNAP - The future of device driver technology! ~

From: Brian Paul <brian@tu...>  2003-03-04 18:16:05

Kendall Bennett wrote:
> Keith Whitwell <keith@...> wrote:
>
>>Nothing like this currently exists. However, I would have thought
>>a reasonable profiler could tell you what is going on without
>>resorting to this?
>
> The profiler tells us that the performance bottleneck is squaring in the
> display list processing code,

What do you mean by squaring? The square of what?

> but we would like to find out just how much
> overhead is in there. If we had a NULL renderer to use, it would allow us
> to performance tune just the display list code and not worry about the
> back end initially.

It shouldn't be hard to do that. I don't know if you're using hardware
or software T&L or what, but you should be able to find a spot to
short-circuit the point/line/triangle functions or vertex-buffer
renderer.

Brian

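The short-circuit idea can be sketched in isolation. Everything below is hypothetical: the vertex type, the function-pointer table and all names are invented for illustration and are not Mesa's actual driver interface, which is not shown in this thread. The sketch only illustrates the general shape of a NULL renderer: every primitive hook returns immediately, and simple counters remain so a profiling run can still confirm that primitives reached the back end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex and hook table; NOT Mesa's real driver API. */
typedef struct { float x, y, z, w; } sketch_vertex;

typedef struct {
    void (*point)(const sketch_vertex *v0);
    void (*line)(const sketch_vertex *v0, const sketch_vertex *v1);
    void (*triangle)(const sketch_vertex *v0, const sketch_vertex *v1,
                     const sketch_vertex *v2);
} sketch_raster_funcs;

/* Counters so a run can verify that primitives actually arrived. */
static unsigned long null_points, null_lines, null_triangles;

static void null_point(const sketch_vertex *v0)
{
    (void)v0;                 /* draw nothing */
    null_points++;
}

static void null_line(const sketch_vertex *v0, const sketch_vertex *v1)
{
    (void)v0; (void)v1;       /* draw nothing */
    null_lines++;
}

static void null_triangle(const sketch_vertex *v0, const sketch_vertex *v1,
                          const sketch_vertex *v2)
{
    (void)v0; (void)v1; (void)v2;   /* draw nothing */
    null_triangles++;
}

/* Replace every rasterization hook with its do-nothing counterpart. */
void install_null_renderer(sketch_raster_funcs *funcs)
{
    assert(funcs != NULL);
    funcs->point = null_point;
    funcs->line = null_line;
    funcs->triangle = null_triangle;
}

/* Total number of primitives swallowed so far. */
unsigned long null_primitive_count(void)
{
    return null_points + null_lines + null_triangles;
}
```

With hooks like these installed, time spent above the hook table (display list playback, transformation, clipping) would dominate a profile, which is the separation being asked for in this thread.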
From: Keith Whitwell <keith@tu...>  2003-03-04 18:14:51

Kendall Bennett wrote:
> Keith Whitwell <keith@...> wrote:
>
>>Nothing like this currently exists. However, I would have thought
>>a reasonable profiler could tell you what is going on without
>>resorting to this?
>
> The profiler tells us that the performance bottleneck is squaring in the
> display list processing code

Squaring? what squaring?

> but we would like to find out just how much
> overhead is in there. If we had a NULL renderer to use, it would allow us
> to performance tune just the display list code and not worry about the
> back end initially.

Go ahead & build one - it should be pretty straightforward if you
already know how to write a proper driver.

Keith

From: Kendall Bennett <KendallB@scitechsoft.com>  2003-03-04 18:06:23

Keith Whitwell <keith@...> wrote:

> Nothing like this currently exists. However, I would have thought
> a reasonable profiler could tell you what is going on without
> resorting to this?

The profiler tells us that the performance bottleneck is squaring in the
display list processing code, but we would like to find out just how much
overhead is in there. If we had a NULL renderer to use, it would allow us
to performance tune just the display list code and not worry about the
back end initially.

Regards,

Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com

~ SciTech SNAP - The future of device driver technology! ~

From: Gareth Hughes <gareth@nv...>  2003-03-04 17:52:27

Keith Whitwell wrote:
>
> Profilers like oprofile and presumably vtune handle shared
> libs just fine.

VTune is still my profiler of choice, but then again I have a Windows
system to run the GUI on. The remote data collection feature works
very well on Linux.

-- Gareth

From: Brian Paul <brian@tu...>  2003-03-04 16:35:18

Josh Vanderhoof wrote:
> Brian Paul <brian@...> writes:
>
>>Mangling isn't the issue. A bad sqrt() function is, I believe.
>>
>>When GL_SQRT() was defined in mmath.h it evaluated to the std sqrt() function.
>>When I moved this macro to imports.h (and renamed it SQRTF()) I
>>noticed that we weren't using the optimized (lookup table based) sqrt
>>function. Thinking it was an oversight, I restored the optimized
>>routine.
>>
>>Turns out, the optimized sqrt() function isn't accurate enough. A
>>bunch of lighting conformance tests fail with it.
>
> Here's a fast 1/sqrt you can use. This should be off by at most about
> 1 ulp. If you want more speed and less precision, you can chop off an
> iteration or two at the end. (To be as close as possible you would
> have to recompute the 'magic number' too, but that would only make a
> tiny difference.)

Thanks, Josh.

I've done some experimentation. I inserted calls to inv_sqrt() in the
lighting and normalization routines (the hot spots for 1/sqrt) and ran
some tests.

I found that I could remove the last iteration of the algorithm and
pass conformance. But if I removed two iterations, we fail some
lighting tests.

With the last iteration disabled I did a simple benchmark. I modified
the isosurf demo to use glScalef and glEnable(GL_NORMALIZE). I shrunk
the window to 30x30 pixels to minimize the rasterization factor then
ran a few benchmarks ('b' key).

On my P4 2.4GHz system inv_sqrt() is just slightly slower than using
the x86 sqrt instruction and reciprocal. 230 fps vs 235 fps,
respectively. I don't know what to expect on other CPUs.

I'll include the code in Mesa, via an INV_SQRT() macro which can be
easily redefined to use inv_sqrt(x) or 1.0/sqrt(x). Maybe other people
will want to experiment with it.

Brian

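A switchable macro of the kind described here could look roughly like the following. This is only a sketch: the SKETCH_USE_FAST_INV_SQRT switch, sketch_inv_sqrt() and normalize3() are assumptions made up for this example, and the definition that actually went into Mesa is not shown in this thread. The stand-in routine uses the magic-number first guess followed by two plain Newton-Raphson steps, as a nod to the finding that one iteration could be dropped; it assumes 32-bit IEEE floats.

```c
#include <assert.h>
#include <math.h>     /* sqrt(), only referenced by the fallback branch */
#include <stdint.h>
#include <string.h>

/* Hypothetical build switch (not a real Mesa option):
   define as 0 to fall back to the C library. */
#ifndef SKETCH_USE_FAST_INV_SQRT
#define SKETCH_USE_FAST_INV_SQRT 1
#endif

/* Stand-in fast routine: magic-number guess plus two Newton-Raphson steps. */
float sketch_inv_sqrt(float n)
{
    uint32_t magic = (381u << 23) - (uint32_t)(0.0332281 * (1 << 25));
    uint32_t bits;
    float x;
    int i;

    assert(sizeof bits == sizeof n);      /* assumes 32-bit floats */
    memcpy(&bits, &n, sizeof bits);       /* reinterpret float as integer */
    bits = (magic - bits) >> 1;
    memcpy(&x, &bits, sizeof x);

    for (i = 0; i < 2; i++)
        x = x * (1.5f - 0.5f * n * x * x);
    return x;
}

#if SKETCH_USE_FAST_INV_SQRT
#define INV_SQRT(x) sketch_inv_sqrt(x)
#else
#define INV_SQRT(x) ((float)(1.0 / sqrt(x)))
#endif

/* Example caller: scale a 3-vector to unit length, the kind of
   normalization work named above as a hot spot for 1/sqrt. */
void normalize3(float v[3])
{
    float scale = INV_SQRT(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);

    v[0] *= scale;
    v[1] *= scale;
    v[2] *= scale;
}
```

Keeping the choice behind one macro means callers such as normalize3() never change when someone wants to compare the two implementations on a different CPU.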
From: Keith Whitwell <keith@tu...>  2003-03-04 15:46:24

Laurent DESNOGUES wrote:
>>>Nothing like this currently exists. However, I would have thought a
>>>reasonable profiler could tell you what is going on without resorting to
>>>this?
>>
>>I concur.
>
> I never succeeded at profiling code in a shared lib and
> linking with a static version of Mesa will make sure you will
> meet Heisenberg ;)
>
> Note however that my attempts at profiling a shared lib were
> made in a very specific Linux environment, so I may be out
> on that...

Profilers like oprofile and presumably vtune handle shared libs just fine.

Keith

From: Laurent DESNOGUES <Laurent.DESNOGUES@estereltechnologies.com>  2003-03-04 14:44:56

> > Nothing like this currently exists. However, I would have thought a
> > reasonable profiler could tell you what is going on without resorting to
> > this?
>
> I concur.

I never succeeded at profiling code in a shared lib and
linking with a static version of Mesa will make sure you will
meet Heisenberg ;)

Note however that my attempts at profiling a shared lib were
made in a very specific Linux environment, so I may be out
on that...

Laurent

From: Brian Paul <brian@tu...>  2003-03-04 14:35:19

Keith Whitwell wrote:
> Kendall Bennett wrote:
>
>> Hi Guys,
>>
>> We are doing some performance profiling of our Mesa->Direct3D drivers,
>> and are getting absolutely dismal performance from the display list
>> components in Mesa. On not so good hardware with Quake3 engines we can
>> get to about 80% of the speed of an ICD driver. However when we run
>> ProCDRS to benchmark CAD performance, the ICD drivers are on the order
>> of 10x-50x faster! Some quick profiling has shown that the bottlenecks
>> appear to be in the display list code.

In general, Mesa's display lists don't have any big performance problems
that I'm aware of. There is some optimization going on. Display lists
do take more memory than they should in some cases, but that's a
different problem.

>> So, to determine just how much of a bottleneck the generic Mesa code
>> is, we would like to code up a NULL renderer for Mesa and try running
>> ProCDRS on that. Before I waste time trying to do this, I was
>> wondering if anyone else has already done something similar? Basically
>> it would be a Mesa driver that would allow me to do absolutely nothing
>> when asked to render pixels, and possibly another level to actually do
>> nothing and return right when vertices are about to be passed down the
>> geometry and clipping pipelines. That way we can profile how much of
>> overhead is in the display list and other generic pieces of code and
>> more important see which hot areas need some work.

At one point, there was a NULL-rasterizer feature in Mesa, but it's long
gone.

> Nothing like this currently exists. However, I would have thought a
> reasonable profiler could tell you what is going on without resorting to
> this?

I concur.

Brian

From: Keith Whitwell <keith@tu...>  2003-03-04 10:12:38

Kendall Bennett wrote:
> Hi Guys,
>
> We are doing some performance profiling of our Mesa->Direct3D
> drivers, and are getting absolutely dismal performance from the
> display list components in Mesa. On not so good hardware with Quake3
> engines we can get to about 80% of the speed of an ICD driver.
> However when we run ProCDRS to benchmark CAD performance, the ICD
> drivers are on the order of 10x-50x faster! Some quick profiling has
> shown that the bottlenecks appear to be in the display list code.
>
> So, to determine just how much of a bottleneck the generic Mesa code
> is, we would like to code up a NULL renderer for Mesa and try running
> ProCDRS on that. Before I waste time trying to do this, I was
> wondering if anyone else has already done something similar?
> Basically it would be a Mesa driver that would allow me to do
> absolutely nothing when asked to render pixels, and possibly another
> level to actually do nothing and return right when vertices are about
> to be passed down the geometry and clipping pipelines. That way we
> can profile how much of overhead is in the display list and other
> generic pieces of code and more important see which hot areas need
> some work.

Nothing like this currently exists. However, I would have thought a
reasonable profiler could tell you what is going on without resorting to
this?

Keith

From: Josh Vanderhoof <hoof@na...> - 2003-03-04 05:08:05

Brian Paul <brian@...> writes:

> Mangling isn't the issue. A bad sqrt() function is, I believe.
>
> When GL_SQRT() was defined in mmath.h it evaluated to the std sqrt() function.
> When I moved this macro to imports.h (and renamed it SQRTF()) I
> noticed that we weren't using the optimized (lookup table based) sqrt
> function. Thinking it was an oversight, I restored the optimized
> routine.
>
> Turns out, the optimized sqrt() function isn't accurate enough. A
> bunch of lighting conformance tests fail with it.

Here's a fast 1/sqrt you can use. This should be off by at most about 1 ulp. If you want more speed and less precision, you can chop off an iteration or two at the end. (To be as close as possible you would have to recompute the 'magic number' too, but that would only make a tiny difference.)

/* inv_sqrt - A single precision 1/sqrt routine for IEEE format floats.
   written by Josh Vanderhoof, based on newsgroup posts by James Van Buskirk
   and Vesa Karvonen.
*/

float
inv_sqrt(float n)
{
    float r0, r1, r2, r3;
    float x0, x1, x2, x3;
    float y0, y1, y2, y3;
    union { float f; unsigned int i; } u;
    unsigned int magic;

    /*
      Exponent part of the magic number -

      We want to:
      1. subtract the bias from the exponent,
      2. negate it
      3. divide by two (rounding towards -inf)
      4. add the bias back

      Which is the same as subtracting the exponent from 381 and dividing
      by 2.

      floor(-(x - 127) / 2) + 127 = floor((381 - x) / 2)
    */

    magic = 381 << 23;

    /*
      Significand part of magic number -

      With the current magic number, "(magic - u.i) >> 1" will give you:

      for 1 <= u.f <= 2: 1.25 - u.f / 4
      for 2 <= u.f <= 4: 1.00 - u.f / 8

      This isn't a bad approximation of 1/sqrt. The maximum difference from
      1/sqrt will be around .06. After three Newton-Raphson iterations, the
      maximum difference is less than 4.5e-8. (Which is actually close
      enough to make the following bias academic...)

      To get a better approximation you can add a bias to the magic
      number. For example, if you subtract 1/2 of the maximum difference in
      the first approximation (.03), you will get the following function:

      for 1 <= u.f <= 2:    1.22 - u.f / 4
      for 2 <= u.f <= 3.76: 0.97 - u.f / 8
      for 3.76 <= u.f <= 4: 0.72 - u.f / 16
      (The 3.76 to 4 range is where the result is < .5.)

      This is the closest possible initial approximation, but with a maximum
      error of 8e-11 after three NR iterations, it is still not perfect. If
      you subtract 0.0332281 instead of .03, the maximum error will be
      2.5e-11 after three NR iterations, which should be about as close as
      is possible.

      for 1 <= u.f <= 2:    1.2167719 - u.f / 4
      for 2 <= u.f <= 3.73: 0.9667719 - u.f / 8
      for 3.73 <= u.f <= 4: 0.7167719 - u.f / 16
    */

    magic -= (int)(0.0332281 * (1 << 25));

    u.f = n;
    u.i = (magic - u.i) >> 1;

    /*
      Instead of Newton-Raphson, we use Goldschmidt's algorithm, which
      allows more parallelism. From what I understand, the parallelism
      comes at the cost of less precision, because it lets error
      accumulate across iterations.
    */
    x0 = 1.0f;
    y0 = 0.5f * n;
    r0 = u.f;

    x1 = x0 * r0;
    y1 = y0 * r0 * r0;
    r1 = 1.5f - y1;

    x2 = x1 * r1;
    y2 = y1 * r1 * r1;
    r2 = 1.5f - y2;

    x3 = x2 * r2;
    y3 = y2 * r2 * r2;
    r3 = 1.5f - y3;

    return x3 * r3;
}
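One way to sanity-check the accuracy claim for the routine above, without relying on the C library's sqrt(), is to measure the residual |n * r * r - 1| for r = inv_sqrt(n); for small errors this is about twice the relative error in r. The following is a minimal sketch, not part of the original message. It assumes 32-bit IEEE floats and a 32-bit unsigned int, folds the three unrolled Goldschmidt steps into a loop, and uses a hypothetical helper named max_residual.

```c
#include <assert.h>

/* Compact copy of the routine from the message. The three unrolled
   Goldschmidt steps are folded into a loop; the arithmetic is the same. */
static float inv_sqrt(float n)
{
    union { float f; unsigned int i; } u;
    unsigned int magic = 381u << 23;
    float x, y, r;
    int k;

    /* The type punning below only works if both types are 32 bits wide. */
    assert(sizeof(float) == sizeof(unsigned int));

    magic -= (unsigned int)(0.0332281 * (1 << 25));
    u.f = n;
    u.i = (magic - u.i) >> 1;      /* initial approximation of 1/sqrt(n) */

    x = 1.0f;
    y = 0.5f * n;
    r = u.f;
    for (k = 0; k < 3; k++) {
        x = x * r;
        y = y * r * r;
        r = 1.5f - y;
    }
    return x * r;
}

/* Hypothetical helper: largest residual |n*r*r - 1| over evenly spaced
   samples in [lo, hi). The residual is roughly twice the relative error. */
static double max_residual(float lo, float hi, int samples)
{
    double worst = 0.0;
    int i;

    assert(samples > 0);
    for (i = 0; i < samples; i++) {
        float n = lo + (hi - lo) * ((float)i / (float)samples);
        double r = (double)inv_sqrt(n);
        double d = (double)n * r * r - 1.0;
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Calling max_residual(1.0f, 4.0f, 100000) covers one full period of the initial approximation, since the bit trick repeats every factor of four in n. Removing one loop iteration and measuring again gives a feel for the speed and precision trade-off mentioned in the message.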
From: SourceForge.net <noreply@so...> - 2003-03-04 04:47:29

Bugs item #697070, was opened at 2003-03-03 20:57
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100003&aid=697070&group_id=3

Category: other
Group: Compile/Install
Status: Open
Resolution: None
Priority: 5
Submitted By: ChuoLing Chang (chuoling)
Assigned to: Nobody/Anonymous (nobody)
Summary: (OSMesa) libOSMesa32.so compile errors

Initial Comment:
To compile Mesa and OSMesa with 16/32-bit color buffer, I did the following (03/03/2003 RedHat 7.1):

======================================================
cvs -d:pserver:anonymous@...:/cvsroot/mesa3d login
cvs -z3 -d:pserver:anonymous@...:/cvsroot/mesa3d co Mesa
bootstrap
configure --prefix=/usr
make
make check
make install

% compile libOSMesa16.so libOSMesa32.so for OSMesa with deep color buffer
mkdir ./lib
cd ./src
create an empty file "depend"
make -f Makefile.X11 dep
make -f Makefile.X11 clean
make -f Makefile.OSMesa16 linux-osmesa16
make -f Makefile.X11 clean
make -f Makefile.OSMesa16 linux-osmesa32
=============================================

libOSMesa16.so builds correctly, but libOSMesa32.so gives the following errors:

gcc -c -I. -I../include -O3 -ansi -pedantic -Wall -Wmissing-prototypes -Wundef -fPIC -ffast-math -D_SVID_SOURCE -D_BSD_SOURCE -I/usr/X11R6/include -DUSE_XSHM -DPTHREADS -DDEBUG -DMESA_DEBUG -DCHAN_BITS=32 -DDEFAULT_SOFTWARE_DEPTH_BITS=31 swrast/s_texture.c -o swrast/s_texture.o
swrast/s_texture.c: In function `_swrast_texture_table_lookup':
swrast/s_texture.c:329: array subscript is not an integer
swrast/s_texture.c:365: array subscript is not an integer
swrast/s_texture.c:399: array subscript is not an integer
swrast/s_texture.c:435: array subscript is not an integer
swrast/s_texture.c:436: array subscript is not an integer
swrast/s_texture.c:477: array subscript is not an integer
swrast/s_texture.c:478: array subscript is not an integer
swrast/s_texture.c:479: array subscript is not an integer
swrast/s_texture.c:520: array subscript is not an integer
swrast/s_texture.c:521: array subscript is not an integer
swrast/s_texture.c:522: array subscript is not an integer
swrast/s_texture.c:523: array subscript is not an integer
make[1]: *** [swrast/s_texture.o] Error 1
make[1]: Leaving directory `/usr/src/source/Mesa/src'
make: *** [linux-osmesa32] Error 2
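"Array subscript is not an integer" is the diagnostic a C compiler gives when an array is indexed with a floating-point expression. The report does not show the offending source lines, so the following is only a guess at the cause: if the color channel type is a float when CHAN_BITS=32, code that indexes a lookup table directly with a channel value would fail in exactly this way. The sketch below shows the general fix of converting the float to a clamped integer index first. TABLE_SIZE and chan_to_index are hypothetical names and are not taken from s_texture.c.

```c
#include <assert.h>

#define TABLE_SIZE 256   /* hypothetical lookup table size */

/* Map a float channel value, nominally in [0, 1], to a valid integer
   index. Writing table[chan] with a float chan does not compile in C. */
static int chan_to_index(float chan)
{
    int i;

    if (!(chan > 0.0f))          /* also catches NaN */
        return 0;
    if (chan >= 1.0f)
        return TABLE_SIZE - 1;

    i = (int)(chan * (TABLE_SIZE - 1) + 0.5f);   /* round to nearest */
    assert(i >= 0 && i < TABLE_SIZE);
    return i;
}

/* Table lookup that works for a float channel type. */
static float table_lookup(const float table[TABLE_SIZE], float chan)
{
    return table[chan_to_index(chan)];
}
```

With 8-bit or 16-bit integer channels the value can be used as an index directly (after any scaling the table size requires), which would be consistent with the 16-bit build succeeding while the 32-bit build fails.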
From: Kendall Bennett <KendallB@scitechsoft.com> - 2003-03-04 01:42:35

Hi Guys,

We are doing some performance profiling of our Mesa->Direct3D drivers, and are getting absolutely dismal performance from the display list components in Mesa. On not so good hardware with Quake3 engines we can get to about 80% of the speed of an ICD driver. However when we run ProCDRS to benchmark CAD performance, the ICD drivers are on the order of 10x-50x faster! Some quick profiling has shown that the bottlenecks appear to be in the display list code.

So, to determine just how much of a bottleneck the generic Mesa code is, we would like to code up a NULL renderer for Mesa and try running ProCDRS on that. Before I waste time trying to do this, I was wondering if anyone else has already done something similar? Basically it would be a Mesa driver that would allow me to do absolutely nothing when asked to render pixels, and possibly another level to actually do nothing and return right when vertices are about to be passed down the geometry and clipping pipelines. That way we can profile how much of overhead is in the display list and other generic pieces of code and more important see which hot areas need some work.

Thanks!

Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com

~ SciTech SNAP - The future of device driver technology! ~
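The NULL renderer idea described above can be illustrated with a table of driver callbacks whose entries do nothing except count how often they are called, so that any time measured around the generic code is not polluted by rasterization. This is a minimal sketch under stated assumptions: struct render_driver, replay_triangles and the other names are hypothetical stand-ins and do not correspond to Mesa's actual driver interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex and driver types, for illustration only. */
struct vertex { float x, y, z, w; };

struct render_driver {
    /* Called when vertices are about to enter the geometry pipeline. */
    void (*run_pipeline)(const struct vertex *verts, size_t count);
    /* Called when a triangle is about to be rasterized. */
    void (*draw_triangle)(const struct vertex *v0,
                          const struct vertex *v1,
                          const struct vertex *v2);
};

static unsigned long null_pipeline_calls;
static unsigned long null_triangle_calls;

/* NULL callbacks: do no work, just record that they were reached. */
static void null_run_pipeline(const struct vertex *verts, size_t count)
{
    (void)verts;
    (void)count;
    null_pipeline_calls++;
}

static void null_draw_triangle(const struct vertex *v0,
                               const struct vertex *v1,
                               const struct vertex *v2)
{
    (void)v0;
    (void)v1;
    (void)v2;
    null_triangle_calls++;
}

static const struct render_driver null_driver = {
    null_run_pipeline,
    null_draw_triangle
};

/* Placeholder data so the sketch can be exercised. */
static const struct vertex demo_verts[6] = { { 0.0f, 0.0f, 0.0f, 1.0f } };

/* Stand-in for generic code, such as display list playback, that feeds
   the driver. Returns the number of triangles submitted. */
static size_t replay_triangles(const struct render_driver *drv,
                               const struct vertex *verts, size_t count)
{
    size_t i, submitted = 0;

    assert(drv != NULL);
    drv->run_pipeline(verts, count);
    for (i = 0; i + 2 < count; i += 3) {
        drv->draw_triangle(&verts[i], &verts[i + 1], &verts[i + 2]);
        submitted++;
    }
    return submitted;
}
```

Timing replay_triangles(&null_driver, ...) against the same call with a real driver would separate the cost of the generic path from the cost of rendering, which is the comparison the message is after. The second level mentioned in the message would correspond to returning early from the run_pipeline slot.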
From: SourceForge.net <noreply@so...> - 2003-03-04 01:16:39

Bugs item #696219, was opened at 2003-03-02 13:59
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100003&aid=696219&group_id=3

Category: other
Group: Compile/Install
>Status: Closed
Resolution: None
Priority: 5
Submitted By: ChuoLing Chang (chuoling)
Assigned to: Nobody/Anonymous (nobody)
Summary: (Mesa 5.0) OSMesa 16/32-bit buffer compile & runtime error

Initial Comment:
To compile Mesa-5.0 for using 16/32-bit color buffer under RedHat 7.1, I did the following as described in README:

cd Mesa-5.0/src
make -f Makefile.X11 clean
make -f Makefile.OSMesa16 linux-osmesa16
make -f Makefile.X11 clean
make -f Makefile.OSMesa16 linux-osmesa32

However, there are compile errors. To successfully compile it, I have to do the following:

remove the line "math/m_debug_vertex.c" in "Mesa-5.0/src/Makefile.OSMesa16"
remove the line "math/m_vertices.c" in "Mesa-5.0/src/Makefile.OSMesa16"
(These .c files do not exist)

add the line "#define MEMSET16( DST, VAL, N ) _mesa_memset16(DST, VAL, N);" in "Mesa-5.0/src/imports.h"
(MEMSET16 is not defined in any place but frequently used)

remove lines related to "mesa_profile" in "Mesa-5.0/src/math/m_debug.h, m_debug_util.h, m_debug_clip.c, m_debug_norm.c"

I then tested it with a simple code:

OSMesaContext ctx = OSMesaCreateContextExt(OSMESA_RGBA, 16, 0, 0, NULL);
GLushort* buffer = new GLushort[100 * 100 * 4];
OSMesaMakeCurrent(ctx, buffer, GL_UNSIGNED_SHORT, 100, 100);

To avoid runtime errors, I have to do the following:

remove the line "osmesa_register_swrast_functions( ctx );" in "OSMesaCreateContextExt()" in "Mesa-5.0/src/OSmesa/osmesa.c"
("SWcontext *swrast = SWRAST_CONTEXT( ctx );" in "static void osmesa_register_swrast_functions( GLcontext *ctx )" gives a NULL)

remove the line "tnl->Driver.RunPipeline = _tnl_run_pipeline;" in "OSMesa_update_state()" in "Mesa-5.0/src/OSmesa/osmesa.c"
("TNLcontext *tnl = TNL_CONTEXT(ctx);" in "OSMesa_update_state()" gives a NULL)

Comment By: Brian Paul (brianp)
Date: 2003-03-02 19:48

Message:
Logged In: YES
user_id=983

Could you try the latest code out of CVS? All the makefile problems were fixed a while ago. I don't understand how the runtime errors could be happening. I haven't seen that problem here.