Has anyone actually profiled SmallObj? In my initial test results (in VS2005 anyway), it seems to perform poorly compared to the default new/delete (at least, poorly compared to expectations). It could very well be that I am using the class wrong, but I expect that I am using it the way most folks likely would.
Here are my results:
[smallObj] Reported: 08:39:48
Method Name :: #Cycles tmElapsed
|-> ProfileNormalAlloc : # 10000 tm: 65.45
|-> ProfileNormalFree : # 10000 tm: 35.80
|-> ProfileSmallAlloc : # 10000 tm: 54.06
|-> ProfileSmallFree : # 10000 tm: 102.27
In these tests, I have two classes, identical except that one inherits from SmallObj<> and the other does not. I create 10,000 instances of each and add them to a std::list, then delete all 10,000. The "ProfileNormal..." results are from the objects that do not inherit from SmallObj, and the "ProfileSmall..." results are from those that do.
The results are in milliseconds and were gathered with a high-resolution timer (using QueryPerformanceCounter).
The code is straightforward:
void ProfileSmall(int nCount)
for (int n = 0; n < nCount; n++)
for (_i = list.begin(); _i != list.end(); _i++)
The test objects are defined as:
class mySmallObj : public Loki::SmallObject<>
Granted, there is likely some overhead from pushing the objects onto the list, but that overhead should be the same for both object types (and further testing without a list shows this to be true). It seems that allocation is faster for SmallObj, but that deleting the objects is significantly slower.
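For anyone who wants to repeat the measurement, the test described above could be sketched roughly as follows. This is not the original test code, which is only partly shown in the post: `PlainObj`, `AllocTimings` and `ProfileAllocFree` are hypothetical names, and `std::chrono::steady_clock` is swapped in for QueryPerformanceCounter so the sketch is portable (it needs a C++11 compiler, so not VS2005).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>

// Hypothetical stand-in for the test class. The real comparison would add a
// twin class that derives from Loki::SmallObject<> and is otherwise identical.
struct PlainObj {
    int a;
    int b;
};

// Elapsed times in milliseconds for the two phases of the test.
struct AllocTimings {
    double allocMs;  // new + push_back for every object
    double freeMs;   // delete for every object
};

// Allocates `count` objects of type T into a std::list, then deletes them,
// timing each phase separately.
template <typename T>
AllocTimings ProfileAllocFree(std::size_t count) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;
    assert(count > 0);

    std::list<T*> objects;

    const auto t0 = Clock::now();
    for (std::size_t n = 0; n < count; ++n) {
        objects.push_back(new T());
    }
    const auto t1 = Clock::now();
    for (T* p : objects) {
        delete p;
    }
    const auto t2 = Clock::now();

    return AllocTimings{Ms(t1 - t0).count(), Ms(t2 - t1).count()};
}
```

Calling `ProfileAllocFree<PlainObj>(10000)` and then the same function with the SmallObject-derived twin gives the two pairs of numbers to compare. Note that the list's own node allocations are inside the timed region here, as they were in the original test.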
One of Loki's test projects does a runtime comparison between Loki's SmallObject allocator and the global new & delete allocators. We've known the performance is not as fast as the global new and delete most of the time.
I've made a small change to Loki's SmallObject allocator that makes a big difference in performance. The change is in the FixedAllocator::Deallocate function. Instead of searching through the Chunk list almost every time, it first checks whether either of its two pointers into the Chunk list refers to the Chunk that holds the address being deallocated. This turns what was a linear search into a constant-time operation most of the time. Occasionally it still has to search the list, but less often, so the overall performance is better.
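The idea behind the change can be illustrated with a simplified sketch. This is not Loki's actual code: `Chunk`, `FixedAllocatorSketch`, `FindOwner` and the index members are hypothetical, and indices stand in for the two pointers into the Chunk list. It only shows the lookup order: try the two cached chunks first, and fall back to a linear search when both miss.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, simplified chunk: a contiguous range of bytes.
struct Chunk {
    unsigned char* begin;
    std::size_t size;

    // True if p lies inside this chunk's byte range.
    bool Contains(const void* p) const {
        const unsigned char* pc = static_cast<const unsigned char*>(p);
        std::less<const unsigned char*> before;  // well-defined for any pointers
        return !before(pc, begin) && before(pc, begin + size);
    }
};

// Hypothetical sketch of the lookup a deallocation has to perform.
struct FixedAllocatorSketch {
    std::vector<Chunk> chunks;
    std::size_t allocIdx = 0;      // chunk last used for an allocation
    std::size_t deallocIdx = 0;    // chunk last used for a deallocation
    std::size_t fullSearches = 0;  // how often the slow path ran

    // Returns the index of the chunk that owns p, or chunks.size() if none.
    std::size_t FindOwner(const void* p) {
        // Fast path: constant-time checks of the two cached chunks.
        if (deallocIdx < chunks.size() && chunks[deallocIdx].Contains(p)) {
            return deallocIdx;
        }
        if (allocIdx < chunks.size() && chunks[allocIdx].Contains(p)) {
            deallocIdx = allocIdx;
            return deallocIdx;
        }
        // Slow path: linear search over every chunk.
        ++fullSearches;
        for (std::size_t i = 0; i < chunks.size(); ++i) {
            if (chunks[i].Contains(p)) {
                deallocIdx = i;  // remember it for the next call
                return i;
            }
        }
        return chunks.size();
    }
};
```

When objects are freed in roughly the order they were allocated, consecutive addresses tend to fall in the same chunk, so most calls would be answered by the fast path and `fullSearches` stays small.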
Feel free to redo your tests after getting the latest Loki source code from subversion. I'd like to see how well your tests compare after the fix.
I know this is a rather old topic, and I'm no expert, but I have one note: did you reserve memory in the list (list.reserve(10000))? Allocating memory while adding the pointers makes the test less accurate. In fact, in my opinion, the best approach would be to use a C-style array on the stack, since this is test-only code that is not used anywhere else.
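A variant along these lines might look like the sketch below. Note that std::list has no reserve() member (std::vector does), so this version pre-sizes a std::vector of pointers before the timer starts rather than using a list or a stack array. The names `TestObj`, `Timings` and `ProfilePreallocated` are hypothetical, and `std::chrono::steady_clock` is an assumption standing in for the original QueryPerformanceCounter timer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical test class; swap in the SmallObject-derived type to compare.
struct TestObj {
    int a;
    int b;
};

// Elapsed times in milliseconds.
struct Timings {
    double allocMs;
    double freeMs;
};

// Times only the new/delete calls: the pointer storage is allocated
// before the first timestamp is taken.
template <typename T>
Timings ProfilePreallocated(std::size_t count) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;
    assert(count > 0);

    std::vector<T*> slots(count, nullptr);  // container cost paid up front

    const auto t0 = Clock::now();
    for (std::size_t n = 0; n < count; ++n) {
        slots[n] = new T();
    }
    const auto t1 = Clock::now();
    for (std::size_t n = 0; n < count; ++n) {
        delete slots[n];
    }
    const auto t2 = Clock::now();

    return Timings{Ms(t1 - t0).count(), Ms(t2 - t1).count()};
}
```

With the container out of the timed region, any remaining difference between the two object types should come from the allocators themselves.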