Thread: [GD-General] RE: [Algorithms] C style math libraries versus "proper" C++ maths
From: Stephane E. <set...@hi...> - 2006-03-25 22:00:59
|
To do a fair comparison, we would need to see your C implementation. You implemented operator+, operator*, etc., but these return a temporary which may or may not be optimized out. The best way to find out is to look at the disassembly. Also, different compilers may do different things. If your C version does not need any temporaries, then it will most likely be faster indeed. However, if you use operator+= or operator*=, you should definitely get similar performance. Also, never profile debug builds: for example, even functions marked inline will not be inlined in debug builds.

Since C99 supports the inline keyword, you should not see any difference between a C and a C++ implementation. As Jarkko said, beware of exceptions. They can easily make your code a lot slower.

I have a few suggestions regarding your code. They will not improve execution speed, but you wrote several things that are unnecessary:

- The inline keyword is not necessary for methods defined inside the class body (less to read also means easier to read).
- Initializer lists are only useful when the default constructors of the constructed members actually do something. Your use of initializer lists here does not buy you anything.
- Prefer C style (non-member) functions over C++ member methods. See Scott Meyers, "How Non-Member Functions Improve Encapsulation", for more info.
- There is no need for an empty destructor in a non-virtual class.
- There is no need to implement the copy constructor or copy assignment operator unless you need to do something special. In your case, the ones provided by the compiler will do just fine.
- Your use of statics to implement the zero vector, etc. could mess with the cache on platforms such as the Xbox 360 and the PS3.

Understand your code. Look at the disassembly and profile over and over again to understand why something is slow.

Stephan.

-----Original Message-----
From: gda...@li... [mailto:gda...@li...] On Behalf Of Richard Fabian
Sent: Saturday, March 25, 2006 5:40 AM
To: gda...@li...
Subject: [Algorithms] C style math libraries versus "proper" C++ maths

We've always used a math library that has one function per operation you want to apply to each math construct. We do this because "back in the day" we found that C++ style math libraries led to lots of unnecessary destructor calls... at least in debug, but we suspected in release also.

It's been a few years, two major Microsoft compilers later, a new SN compiler for PS2, and GCC still getting better and better... So, after hearing a workmate claim that he was using a C++ math library "because it was faster", I decided to update my old C++ math library and make myself a profiling testbed.

The sad news is: my old, horrible-to-read-and-work-with, function-based math library still outstrips the C++ math library by about a 5% speed margin, even on trivial stuff. I'm not a C++ newb. I do know about return value optimisation, operator overloads, copy constructors and the like. But as I tend to write code that is more architectural than low-level stuff like math libraries, I wondered what kind of performance difference (and which way) you were seeing.

Neither math library uses any platform-specific optimisations such as SSE, SIMD or VU. This is pure C++ only. I have pasted the C++ math "header" that I was using for profiling, if you want to take a look at my code and maybe point me in the direction of better optimised code:
http://rafb.net/paste/results/P2kY2O59.html
|
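Below is a minimal sketch of a Vec3 written along the lines of Stephane's suggestions. It assumes a C++11 compiler, because `= default` and `std::is_trivially_copyable` postdate this thread. The type and its operators are hypothetical and are not Richard's pasted header.

The code shows three things:
- Compound operators (`+=`, `*=`) work in place.
- Binary operators (`+`, `*`) are non-members that return a new value, which the optimizer may or may not elide.
- The compiler-generated copy operations are left alone.

```cpp
#include <type_traits>

// Hypothetical Vec3 following the style suggestions above.
struct Vec3 {
    float x, y, z;

    Vec3() = default;  // trivial, like a plain C struct (members left uninitialized)
    // For plain float members an initializer list and assignments in the body
    // are equivalent; the list form is just the usual idiom.
    Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

    // Member functions defined in the class body are implicitly inline.
    // Compound assignment updates *this in place, so no temporary is created.
    Vec3& operator+=(const Vec3& rhs) { x += rhs.x; y += rhs.y; z += rhs.z; return *this; }
    Vec3& operator*=(float s)         { x *= s;     y *= s;     z *= s;     return *this; }

    // No destructor, copy constructor or copy assignment declared:
    // the compiler-generated ones are trivial and do the right thing.
};

static_assert(std::is_trivially_copyable<Vec3>::value,
              "Vec3 should copy like a plain C struct");

// Non-member binary operators built on the compound forms. Each returns a new
// Vec3; whether that temporary is actually materialized is up to the optimizer,
// which is why checking the disassembly is worthwhile.
inline Vec3 operator+(Vec3 lhs, const Vec3& rhs) { lhs += rhs; return lhs; }
inline Vec3 operator*(Vec3 v, float s)           { v *= s;     return v; }
```

In hot loops, writing `a += b;` instead of `a = a + b;` sidesteps the question of whether the temporary is elided.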
From: Richard F. <ra...@de...> - 2006-03-26 12:34:31
|
> Regarding profiling, you need to be very careful to get reliable profiling
> numbers. For instance data/code alignment, cache warmups (i.e. the first
> test reads data over the bus while the next one has the advantage of reading it from
> cache), systematically executed background tasks while testing (e.g. some HD
> activity the OS always performs after 1 sec of testing), etc. can easily cause
> a 5% performance difference between tests. Also make sure you profile a fully
> optimized release version and disable C++ exception handling, as it can
> potentially hinder some of the inlining and result in other overheads.
>
> Jarkko

I agree that profiling is normally difficult to get right, but I think my method is safe:

1. Write the test as a framework with the timing code built in.
2. Write the implementation of the test as two separate modules, one using one technique and the other using the other technique.
3. Run the apps one after the other: give one a few runs, then give the other a few runs, and store all the runs' sub-timings.

This gives me a good idea of what I can expect as best performance, worst performance, and the average case too.

Oddly, I found that the C++ version of the math library has a smaller standard deviation in the overall time to complete the exercise than the C style lib. I guess I'll put that down to the C++ version generating more instructions but fewer memory accesses. So I'm not holding my breath for fantastic performance on PS2, where the number of instructions sent to the CPU still matters.
|
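Below is a minimal sketch of the kind of harness Richard describes, assuming C++11 `<chrono>`. `RunStats`, `Summarize`, `TimeOnceMs` and `CompareAlternating` are hypothetical names.

It differs from his method in two ways:
- It alternates single runs rather than batches of runs.
- It adds one untimed warm-up call per implementation, following Jarkko's point about cache warmups.

It reports best, worst, mean and (population) standard deviation.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Summary statistics over repeated timing runs (milliseconds).
struct RunStats {
    double best, worst, mean, stddev;
};

RunStats Summarize(const std::vector<double>& samples) {
    RunStats s{};
    if (samples.empty()) return s;
    s.best  = *std::min_element(samples.begin(), samples.end());
    s.worst = *std::max_element(samples.begin(), samples.end());
    s.mean  = std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    double sq = 0.0;
    for (double t : samples) sq += (t - s.mean) * (t - s.mean);
    s.stddev = std::sqrt(sq / samples.size());  // population standard deviation
    return s;
}

// Times a single call of 'work' in milliseconds using a monotonic clock.
double TimeOnceMs(const std::function<void()>& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Alternates between two implementations so that background noise (OS
// activity and the like) tends to hit both roughly equally. One untimed
// warm-up call each fills the caches before anything is recorded.
void CompareAlternating(const std::function<void()>& a,
                        const std::function<void()>& b,
                        int rounds,
                        std::vector<double>& timesA,
                        std::vector<double>& timesB) {
    a();  // warm-up, not recorded
    b();  // warm-up, not recorded
    for (int i = 0; i < rounds; ++i) {
        timesA.push_back(TimeOnceMs(a));
        timesB.push_back(TimeOnceMs(b));
    }
}
```

The `std::function` call adds a small fixed overhead per run. That is negligible as long as each `work()` call is a whole benchmark pass rather than a single vector operation.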
From: Andras B. <and...@gm...> - 2006-03-25 22:26:35
|
With regards to inlining, I believe that the inline keyword is merely a suggestion to the compiler. I've found that some compilers might not inline a function that is explicitly marked inline, even in optimized release builds. If you really know that a function should be inlined, you might want to use __forceinline or something similar.

One good example is when your function is complicated but, given a constant input, evaluates to a constant result with no side effects (in my case this was a function computing Perlin noise). With the inline keyword, the compiler emitted the function call and ran the expensive function. With __forceinline, it could actually compute the result at compile time and get rid of the function call altogether!

That said, be very careful when using __forceinline!

Andras
|
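Below is a small sketch of the kind of "force inline" wrapper Andras alludes to. `FORCE_INLINE` is a hypothetical macro name. `__forceinline` (MSVC) and `__attribute__((always_inline))` (GCC/Clang) are the real compiler-specific spellings.

`SmoothStep` is a hypothetical stand-in for his Perlin noise function. It is a pure function of its input, so once inlined at a call site with a constant argument, the optimizer is free to fold it to a constant. Even with these keywords the compiler can still decline in some cases (recursion, for instance), so the disassembly remains the final word.

```cpp
// Hypothetical portable force-inline macro.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fall back to a plain hint
#endif

// Pure function: same input, same output, no side effects. After inlining,
// a call such as SmoothStep(0.5f) can be evaluated at compile time.
FORCE_INLINE float SmoothStep(float t) {
    t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);  // clamp to [0, 1]
    return t * t * (3.0f - 2.0f * t);              // 3t^2 - 2t^3
}
```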
From: Scoubidou944 \(Hotmail\) <sco...@ho...> - 2006-03-25 22:26:36
|
I remember this from my game programming days: 3D matrix operations implemented on a PIII 900 MHz. The matrix was a class containing an array of floats, something very simple. For basic operations, you can pass by reference (using &), by pointer (using *), or pass the matrix directly by value (so all the floats are copied onto the stack):

Case 1 & Case 2: only 4 bytes on the stack, but every computation goes through a dereference (the -> operator for pointers).
Case 3: for a 4x4 array (x, y, z, w), 16 floats, so 16 x 4 = 64 bytes per matrix.

At the time, the fastest method was case 3. So better code isn't always the faster one ;)

----- Original Message -----
From: Stephane Etienne
To: gam...@li...
Sent: Saturday, March 25, 2006 11:00 PM
Subject: [GD-General] RE: [Algorithms] C style math libraries versus "proper" C++ maths
|
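Below is a sketch of the three calling conventions Scoubidou describes, for a hypothetical row-major `Mat4` of 16 floats (64 bytes). The pointer and reference versions compute into a local so that `out` may alias an input. Which version is fastest depends on the compiler, the calling convention and whether the call gets inlined, so this only shows the shapes worth benchmarking.

```cpp
// Hypothetical 4x4 matrix: 16 floats, row-major.
struct Mat4 {
    float m[16];
};

// Shared kernel: out = a * b. 'out' must not alias 'a' or 'b'.
static void MulKernel(float* out, const float* a, const float* b) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[r * 4 + k] * b[k * 4 + c];
            out[r * 4 + c] = sum;
        }
}

// Case 1: pointers (4 bytes each on a 32-bit target); every read goes through them.
void MulPtr(Mat4* out, const Mat4* a, const Mat4* b) {
    Mat4 tmp;  // compute into a local so that out may alias a or b
    MulKernel(tmp.m, a->m, b->m);
    *out = tmp;
}

// Case 2: references; passed like pointers, with nicer call syntax.
void MulRef(Mat4& out, const Mat4& a, const Mat4& b) {
    Mat4 tmp;
    MulKernel(tmp.m, a.m, b.m);
    out = tmp;
}

// Case 3: by value; each argument is a 64-byte copy, and the callee then
// works on its own copies rather than on the caller's data.
Mat4 MulVal(Mat4 a, Mat4 b) {
    Mat4 r;
    MulKernel(r.m, a.m, b.m);
    return r;
}
```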
From: Andras B. <and...@gm...> - 2006-03-25 22:43:49
|
Isn't it fun when theory meets practice? :) With a proper compiler, if you used references and the function was inlined, the function should just work straight on the values: no dereferencing, no copying to the stack, etc. However, last time I checked, this was still not the case, so I still pass all my vectors and matrices by value rather than by reference. Note that if the function is inlined, it shouldn't do the copy to the stack in any case. So my advice would be to simply pass by value.

Andras
|
From: Richard F. <ra...@de...> - 2006-03-26 12:22:27
|
Thanks for the help. The C++ library is now marginally faster than the C library, and now I can approve it for general use at our company. I'm gonna love being able to change my code from:

    Vector relativeStart, lineDirection;
    VectorSubtract( &relativeStart, pos, lineStart );
    VectorSubtract( &lineDirection, lineEnd, lineStart );
    F32 lineLength = VectorLength( lineDirection );
    VectorMultiply( &lineDirection, 1.0f / lineLength );

to:

    Vec3 relativeStart( pos - lineStart ), lineDirection( lineEnd - lineStart );
    F32 lineLength = lineDirection.GetLength();
    lineDirection /= lineLength;

Thanks again, guys
|
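Below is a self-contained sketch that makes both snippets compile side by side. The `Vector`/`Vec3` types and all function signatures are hypothetical: they are inferred from the calls Richard shows, not taken from his actual libraries.

```cpp
#include <cassert>
#include <cmath>

typedef float F32;

// --- C-style API (signatures guessed from the calls above) -----------------
struct Vector { F32 x, y, z; };

void VectorSubtract(Vector* out, Vector a, Vector b) {
    out->x = a.x - b.x; out->y = a.y - b.y; out->z = a.z - b.z;
}
F32 VectorLength(Vector v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
void VectorMultiply(Vector* v, F32 s) { v->x *= s; v->y *= s; v->z *= s; }

// --- C++ value type supporting the new-style code ---------------------------
struct Vec3 {
    F32 x, y, z;
    Vec3() = default;
    Vec3(F32 x_, F32 y_, F32 z_) : x(x_), y(y_), z(z_) {}

    F32 GetLength() const { return std::sqrt(x * x + y * y + z * z); }

    Vec3& operator/=(F32 s) {
        assert(s != 0.0f);  // caller must not normalize a zero-length vector
        x /= s; y /= s; z /= s;
        return *this;
    }
};

// Passed by value, as Andras recommends for small vector types.
inline Vec3 operator-(Vec3 a, Vec3 b) { return Vec3(a.x - b.x, a.y - b.y, a.z - b.z); }
```

Note that the two versions are not guaranteed to be bit-identical. The C code multiplies by `1.0f / lineLength`, which rounds twice (once for the reciprocal, once for the product), while `/= lineLength` rounds once. Comparisons between the two libraries therefore need a small tolerance.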