From: John Peterson <peterson@cf...>  20040209 20:35:47

Daniel Dreyer writes: > I've seen you already changed the sources  Thanks a lot! > > Please help me then to clarify something: i think i picked it > up in the deal.ii library, that Horner is best w.r.t. > avoiding roundoff, simply because 0.1^5 = 1.e5, and > when you may substract this from something else of order 1.e1, > you may get orderofmagnitude cancellation. That is true. I see that Steffen (or somebody) already implemented the Horner scheme for the 1D routines. If anyone wants to do this for the 2D shapes I highly encourage (and pity!) them. By putting all the pow<> in there, I was not trying to imply that was the "correct" way to do things :) John 
From: Benjamin S. Kirk <benkirk@cf...>  20040208 00:54:21

Thanks for the prompt on this! Many of the changes in CVS simply changed "Copyright 20022003" with "20022004" However, there are a number of other functional changes as well. I guess I can put together an official release this week (comments from the devel list?). The biggest job in creating an "official" release is in testing the code on all the machines I have access to, and testing various configuration options. I'll try to release 0.4.2 this week if there are no complaints from the other developers. Ben Denis Danilov wrote: >Hi, > >According CVS i see many things have been changed in >comparison with libmesh0.4.1. Is there any plans about >next stable release of libmesh in near future? > >Best regards, >Denis > > > >The SF.Net email is sponsored by EclipseCon 2004 >Premiere Conference on Open Tools Development and Integration >See the breadth of Eclipse activity. February 35 in Anaheim, CA. >http://www.eclipsecon.org/osdn >_______________________________________________ >Libmeshusers mailing list >Libmeshusers@... >https://lists.sourceforge.net/lists/listinfo/libmeshusers > > 
From: Benjamin S. Kirk <benkirk@cf...>  20040209 13:44:38

Yup... that seems about right... The prototype for pow looks like: double pow(double, double); So there is no distinction between x^10 and x^pi, for instance... (Or if there is a distinction it is implemented in C with iftests, which is slow too. As John noted, we can do better assuming double^int with a recursive template that is unrolled at compiletime. We can do even better with a recursive template that handles repeated squaring, but for small powers (< ~10) I don't expect to see much difference... The only change I might make to the code below is a provision to handle negative powers... Maybe 2 templaes, one <int > that handles +/ powers, and one <unsigned int> optimized for positive powers? For negative powers just return the inverse of the positive power... Ben John Peterson wrote: >Benjamin S. Kirk writes: > > Help me out here, John... Do you remeber that C++ User's Journal on > > scientific programming? I think in near the end of that they describe a > > templated pow function that hadles x^int cases intelligently through > > repeated squaring... Does that ring a bell? It might be worth having > > in utility.h > >OK, I've done some testing. Apparently pow() is ridiculously slow. >I tested two different methods and pow in the attached code. Under >O3 optimization, I found the results on a P4 2GHz (in seconds): > >Raising 50 million double values to the third power. >Recursive Template: 0.05 >Template Loop: 1.93 >Standard pow(): 13.01 > >Raising to the fifth power, results were: >Recursive Template: 0.05 >Template Loop: 2.23 >Standard pow(): 14.17 > >Finally, raising to the 10th power (base numbers were small) >Recursive Template: 0.06 >Template Loop: 2.64 >Standard pow(): 13.56 > >The "Recursive Template" method is the method I posted to the mailing >list earlier today. It seems to be the fastest, and as expected, its >performance remains about constant no matter how high the exponent becomes. >Here is the code (don't forget lm for pow()) I invite you to test it >out yourselves... > >#include <iostream> >#include <vector> >#include <time.h> >#include <iomanip> > > >// Anonymous Namespace to hold useful constants >namespace { >const unsigned int n_bases = static_cast<unsigned int>(1e7); > >// Number of tests to perform >const unsigned int n_tests = 5; > >// Raise to an integer power >const unsigned int power = 10; >}; > > > >// Power raising class >template <int N> >struct Pow >{ > template <int Val> > struct In > { > enum { value = Val * Pow<N1>::template In<Val>::value }; > }; >}; > >template <> >struct Pow<0> >{ > template <int Val> > struct In > { > enum { value = 1 }; > }; >}; > > > >// Rescursive template function >template <int N> >double pow(double x) { return x * pow<N1>(x); } >template <> >double pow<0>(double x) { return 1.; } > > >// Templated Loop function >template <int N> >double pow2(double x) >{ > const unsigned int l = N / 2; > const unsigned int m = N % 2; > const double x2 = x*x; > double y = 1.0; > > for (unsigned int i=0; i<l; ++i) > { > y = y * x2; > } > > return m==0 ? y : y*x; >} > >// I can't figure out how to get the class to use >// double as its base value. Therefore, this method >// is not feasible now. >// void test_class(std::vector<double>& base_values) >// { >// for (unsigned int n=0; n<n_tests; ++n) >// { >// for (unsigned int i=0; i<n_bases; ++i) >// //std::cout << >// Pow<base_values[i]>::In<power> >// // << std::endl >// ; >// } >// } > >void test_recursive_template(std::vector<double>& base_values) >{ > for (unsigned int n=0; n<n_tests; ++n) > { > for (unsigned int i=0; i<n_bases; ++i) > //std::cout << > pow<power>(base_values[i]) > // << std::endl > ; > } > >} > >void test_template_loop(std::vector<double>& base_values) >{ > for (unsigned int n=0; n<n_tests; ++n) > { > for (unsigned int i=0; i<n_bases; ++i) > //std::cout << > pow2<power>(base_values[i]) > // << std::endl > ; > } >} > > > >void test_standard_pow(std::vector<double>& base_values) >{ > for (unsigned int n=0; n<n_tests; ++n) > { > for (unsigned int i=0; i<n_bases; ++i) > //std::cout << > pow(base_values[i], power) > // << std::endl > ; > } >} > > > > > >int main() >{ > srand(0); > > std::vector<double> base_values(n_bases); > > // Fill the vector with random values between zero and one. > for (unsigned int i=0; i<n_bases; ++i) > base_values[i] = static_cast<double>(rand()) / > static_cast<double>(RAND_MAX); > > // Convenient typedef for function pointers > typedef void (*TestFunction) (std::vector<double>& v); > > // A vector of function pointers for the functions we will test. > std::vector<TestFunction> tests(3); > tests[0] = test_recursive_template; > tests[1] = test_template_loop; > tests[2] = test_standard_pow; > > // A vector of convenient names > std::vector<std::string> test_names(3); > test_names[0] = "Recursive Template"; > test_names[1] = "Template Loop"; > test_names[2] = "Standard pow()"; > > for (unsigned int i=0; i<tests.size(); ++i) > { > clock_t start = clock(); > > tests[i](base_values); > > clock_t stop = clock(); > > const double time_taken = (stop  start)/double(CLOCKS_PER_SEC); > std::cout << test_names[i] << ": " > << std::setprecision(10) > << time_taken > << std::endl; > > } > > return 0; >} > > > > > >The SF.Net email is sponsored by EclipseCon 2004 >Premiere Conference on Open Tools Development and Integration >See the breadth of Eclipse activity. February 35 in Anaheim, CA. >http://www.eclipsecon.org/osdn >_______________________________________________ >Libmeshdevel mailing list >Libmeshdevel@... >https://lists.sourceforge.net/lists/listinfo/libmeshdevel > > 
From: John Peterson <peterson@cf...>  20040209 15:19:07

Benjamin S. Kirk writes: > Yup... that seems about right... > > The prototype for pow looks like: > > double pow(double, double); > > So there is no distinction between x^10 and x^pi, for instance... (Or if > there is a distinction it is implemented in C with iftests, which is > slow too. As John noted, we can do better assuming double^int with a > recursive template that is unrolled at compiletime. We can do even > better with a recursive template that handles repeated squaring, but for > small powers (< ~10) I don't expect to see much difference... > > The only change I might make to the code below is a provision to handle > negative powers... Maybe 2 templaes, one <int > that handles +/ > powers, and one <unsigned int> optimized for positive powers? > For negative powers just return the inverse of the positive power... One additional note if you are trying this at home. Add the inline keyword to both recursive template functions. It will not be nearly as fast if your compiler chooses not to inline those functions. John 
From: John Peterson <peterson@cf...>  20040209 20:35:47

Daniel Dreyer writes: > I've seen you already changed the sources  Thanks a lot! > > Please help me then to clarify something: i think i picked it > up in the deal.ii library, that Horner is best w.r.t. > avoiding roundoff, simply because 0.1^5 = 1.e5, and > when you may substract this from something else of order 1.e1, > you may get orderofmagnitude cancellation. That is true. I see that Steffen (or somebody) already implemented the Horner scheme for the 1D routines. If anyone wants to do this for the 2D shapes I highly encourage (and pity!) them. By putting all the pow<> in there, I was not trying to imply that was the "correct" way to do things :) John 
From: Steffen Petersen <steffen.petersen@tu...>  20040210 16:55:39

I have been away for a few days so I did not get any mail until today. I just commited szabab shapes with nicer representations (similar to what I did in 1D). Hendrik had already converted them last week but we did not commit the changes since they were not tested. Some pow<2>s are still in the code which I think should be ok. So the szabab elements should now be ready for the new release :) Steffen > Daniel Dreyer writes: > > I've seen you already changed the sources  Thanks a lot! > > > > Please help me then to clarify something: i think i picked > it > up in the deal.ii library, that Horner is best w.r.t. > > avoiding roundoff, simply because 0.1^5 = 1.e5, and > > when you may substract this from something else of order > 1.e1, > you may get orderofmagnitude cancellation. That is > true. I see that Steffen (or somebody) already implemented > the Horner scheme for the 1D routines. If anyone wants to do > this for the 2D shapes I highly encourage (and pity!) them. > By putting all the pow<> in there, I was not trying to imply > that was the "correct" way to do things :) > > John > > >  > The SF.Net email is sponsored by EclipseCon 2004 > Premiere Conference on Open Tools Development and Integration > See the breadth of Eclipse activity. February 35 in Anaheim, > CA. http://www.eclipsecon.org/osdn > _______________________________________________ > Libmeshdevel mailing list > Libmeshdevel@... > https://lists.sourceforge.net/lists/listinfo/libmeshdevel > 
From: Benjamin S. Kirk <benkirk@cf...>  20040208 22:08:39

I just changed the sqrt calls in the shape function evaluation to constants... As for pow... We might be able to do better than that. In particular, pow is not particularly smart & does not make optimizations where possible. Help me out here, John... Do you remeber that C++ User's Journal on scientific programming? I think in near the end of that they describe a templated pow function that hadles x^int cases intelligently through repeated squaring... Does that ring a bell? It might be worth having in utility.h In the mean time I have added inline pow2, pow3, pow4, pow5 functions in that file. Test it out and let me know if it is fastest (in optimized mode) than the old implementation. Of course, make sure the answers are still the same! Finally, I changed all the tests that look like (a != min(a,b)) to (a>b) I realize that the former is the test I put in the hierarchic implementation... What was I thinking? That code is *really* messy in 3D... I need to look at it again. Ben Daniel Dreyer wrote: > On Sat, 7 Feb 2004, Benjamin S. Kirk wrote: > > >>Thanks for the prompt on this! >> >>Many of the changes in CVS simply changed "Copyright 20022003" with >>"20022004" >> >>However, there are a number of other functional changes as well. I >>guess I can put together an official release this week (comments from >>the devel list?). The biggest job in creating an "official" release is >>in testing the code on all the machines I have access to, and testing >>various configuration options. >> >>I'll try to release 0.4.2 this week if there are no complaints from the >>other developers. >> >>Ben > > > > The SzaboBabuska shape functions are fully functional, and > they can already be used. However, they are not yet _truly_ > good: only the 1D shapes are coded in Horner scheme. > > The 2D shapes, honestly, aren't perfect (yet): many > calls to sqrt(), pow(x,n) with n even 5 or so... > That means: the 2D shapes (namely only on TRI6) may be prone > to cancellation issues, and very likely slow! > > I'd suggest to: > >  either change everything _prior_ to releasing 0.4.2, > namely all the sqrt to evaluated floats, and the > pow's to hornerlike. Since Steffen/Hendrik did these shapes, > i'd suggest they work on that. They have to say how long > this may take. > >  or postpone this to 0.4.3, and put some comment in the release text > that says something like "SZABAB is functional (compiles fine), however > not really finished". > > Either way, i suggest that the pows and sqrts should go sooner > or later, either before or after 0.4.2. > > >  Sorry for being that pedantic, but since libmesh is > open source, we better have an eye on the quality of > the code. > > Daniel 
From: John Peterson <peterson@cf...>  20040208 23:28:32

Benjamin S. Kirk writes: > Help me out here, John... Do you remeber that C++ User's Journal on > scientific programming? I think in near the end of that they describe a > templated pow function that hadles x^int cases intelligently through > repeated squaring... Does that ring a bell? It might be worth having > in utility.h It's tough to beat successive squaring to raise to an arbitrary integer power. I would be surprised if pow does not already do that. Those C++ User's Journals are up at school right now, I will take a look through them on Monday. This pair of functions can do it, but I don't if it's faster. Also I did not test it in a library, i.e. do you have to explicitly instantiate every version you want to use? template <int N> double pow(double x) { return x * pow<N1>(x); } template <> double pow<0>(double x) { return 1.; } int main() { std::cout << pow<3>(2.0) << std::endl; return 0; } John 
From: John Peterson <peterson@cf...>  20040209 04:19:36

Benjamin S. Kirk writes: > Help me out here, John... Do you remeber that C++ User's Journal on > scientific programming? I think in near the end of that they describe a > templated pow function that hadles x^int cases intelligently through > repeated squaring... Does that ring a bell? It might be worth having > in utility.h OK, I've done some testing. Apparently pow() is ridiculously slow. I tested two different methods and pow in the attached code. Under O3 optimization, I found the results on a P4 2GHz (in seconds): Raising 50 million double values to the third power. Recursive Template: 0.05 Template Loop: 1.93 Standard pow(): 13.01 Raising to the fifth power, results were: Recursive Template: 0.05 Template Loop: 2.23 Standard pow(): 14.17 Finally, raising to the 10th power (base numbers were small) Recursive Template: 0.06 Template Loop: 2.64 Standard pow(): 13.56 The "Recursive Template" method is the method I posted to the mailing list earlier today. It seems to be the fastest, and as expected, its performance remains about constant no matter how high the exponent becomes. Here is the code (don't forget lm for pow()) I invite you to test it out yourselves... #include <iostream> #include <vector> #include <time.h> #include <iomanip> // Anonymous Namespace to hold useful constants namespace { const unsigned int n_bases = static_cast<unsigned int>(1e7); // Number of tests to perform const unsigned int n_tests = 5; // Raise to an integer power const unsigned int power = 10; }; // Power raising class template <int N> struct Pow { template <int Val> struct In { enum { value = Val * Pow<N1>::template In<Val>::value }; }; }; template <> struct Pow<0> { template <int Val> struct In { enum { value = 1 }; }; }; // Rescursive template function template <int N> double pow(double x) { return x * pow<N1>(x); } template <> double pow<0>(double x) { return 1.; } // Templated Loop function template <int N> double pow2(double x) { const unsigned int l = N / 2; const unsigned int m = N % 2; const double x2 = x*x; double y = 1.0; for (unsigned int i=0; i<l; ++i) { y = y * x2; } return m==0 ? y : y*x; } // I can't figure out how to get the class to use // double as its base value. Therefore, this method // is not feasible now. // void test_class(std::vector<double>& base_values) // { // for (unsigned int n=0; n<n_tests; ++n) // { // for (unsigned int i=0; i<n_bases; ++i) // //std::cout << // Pow<base_values[i]>::In<power> // // << std::endl // ; // } // } void test_recursive_template(std::vector<double>& base_values) { for (unsigned int n=0; n<n_tests; ++n) { for (unsigned int i=0; i<n_bases; ++i) //std::cout << pow<power>(base_values[i]) // << std::endl ; } } void test_template_loop(std::vector<double>& base_values) { for (unsigned int n=0; n<n_tests; ++n) { for (unsigned int i=0; i<n_bases; ++i) //std::cout << pow2<power>(base_values[i]) // << std::endl ; } } void test_standard_pow(std::vector<double>& base_values) { for (unsigned int n=0; n<n_tests; ++n) { for (unsigned int i=0; i<n_bases; ++i) //std::cout << pow(base_values[i], power) // << std::endl ; } } int main() { srand(0); std::vector<double> base_values(n_bases); // Fill the vector with random values between zero and one. for (unsigned int i=0; i<n_bases; ++i) base_values[i] = static_cast<double>(rand()) / static_cast<double>(RAND_MAX); // Convenient typedef for function pointers typedef void (*TestFunction) (std::vector<double>& v); // A vector of function pointers for the functions we will test. std::vector<TestFunction> tests(3); tests[0] = test_recursive_template; tests[1] = test_template_loop; tests[2] = test_standard_pow; // A vector of convenient names std::vector<std::string> test_names(3); test_names[0] = "Recursive Template"; test_names[1] = "Template Loop"; test_names[2] = "Standard pow()"; for (unsigned int i=0; i<tests.size(); ++i) { clock_t start = clock(); tests[i](base_values); clock_t stop = clock(); const double time_taken = (stop  start)/double(CLOCKS_PER_SEC); std::cout << test_names[i] << ": " << std::setprecision(10) << time_taken << std::endl; } return 0; } 