For memory-intensive applications with multiple threads, FastMM does not scale very well: there is no 2x improvement when the application runs on a dual core instead of a single core. This is because of a single hotspot in the code:
function FastGetMem(ASize: Integer): Pointer;
...
{Try to lock the small block type}
if LockCmpxchg(0, 1, @LPSmallBlockType.BlockTypeLocked) = 0 then
Break;
...
end;
Such locking is bad for scaling; there are some interesting blog posts about this:
http://www.bluebytesoftware.com/blog/2009/01/09/SomePerformanceImplicationsOfCASOperations.aspx
http://www.bluebytesoftware.com/blog/2009/01/13/SomePerformanceImplicationsOfCASOperationsRedux.aspx
I made a quick fix that gives each thread its own memory pool, so no locking is needed anymore. Result: it scales almost perfectly and is much faster!
1. Use an additional thread index slot to assign each thread its own memory pool.
2. Always allocate memory from the thread's own pool.
3. When releasing memory, if the block is not found in the thread's own pool, search the global pool list to release it. To minimize that search, use a few low-order bits of the block address to identify its pool/pool group.
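The three steps above can be sketched in Delphi roughly as follows. This is a hedged illustration, not the actual ScaleMM code: CreatePoolForThread, PoolAlloc, PoolOwns, PoolFree, GlobalPoolFree and POOL_GROUP_MASK are hypothetical helper names, and the shift/mask used to derive the owner index is only an example of the low-order-bits idea.

type
  PThreadPool = ^TThreadPool;
  TThreadPool = record
    {pool bookkeeping ...}
  end;

threadvar
  CurrentPool: PThreadPool; {each thread gets its own pool: no lock on alloc}

function ScalingGetMem(ASize: NativeInt): Pointer;
begin
  if CurrentPool = nil then
    CurrentPool := CreatePoolForThread;    {lazily create this thread's pool}
  Result := PoolAlloc(CurrentPool, ASize); {lock-free: only this thread touches it}
end;

function ScalingFreeMem(APointer: Pointer): Integer;
var
  OwnerIndex: NativeUInt;
begin
  if PoolOwns(CurrentPool, APointer) then
    Result := PoolFree(CurrentPool, APointer)  {fast path: block from own pool}
  else
  begin
    {step 3: low-order bits of the aligned block address identify the owning
     pool group, so a foreign free does not need a full search}
    OwnerIndex := (NativeUInt(APointer) shr 16) and POOL_GROUP_MASK;
    Result := GlobalPoolFree(OwnerIndex, APointer);
  end;
end;

The point of the design is that the common case (alloc and free on the same thread) never takes a lock; only cross-thread frees touch shared state.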
I'm pretty sure the CAS itself is not the problem at all. The problem is the call to Sleep, which is pretty bad. FastMM has an option to spin instead of sleeping, which is good, but better would be an option to spin and then eventually wait on an event. That is what a critical section does. Sleeping, however, is very bad.
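To illustrate the spin-then-wait behaviour being described: a Win32 critical section initialized with a spin count first spins, then falls back to a kernel event instead of calling Sleep. A minimal Delphi sketch (InitializeCriticalSectionAndSpinCount is a real Win32 API; the spin count of 4000 is only an illustrative value, and the allocation body is elided):

uses
  Windows;

var
  BlockTypeLock: TRTLCriticalSection;

procedure InitLock;
begin
  {spin up to 4000 times before blocking on the internal event}
  InitializeCriticalSectionAndSpinCount(BlockTypeLock, 4000);
end;

procedure LockedGetMem;
begin
  EnterCriticalSection(BlockTypeLock);   {spin first, then wait on an event}
  try
    {... allocate from the small block type ...}
  finally
    LeaveCriticalSection(BlockTypeLock); {wakes one waiter, if any}
  end;
end;

Under light contention this behaves like a spinlock; under heavy contention waiters sleep on an event and are woken precisely, instead of polling with Sleep.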
Any locking is bad, regardless of sleep or spinning.
I use TopMM for multicore apps; it scales much better:
http://www.topsoftwaresite.nl/
I often hear from people who are ready to pay for a version of FastMM that is fast in multithreaded use. I am also ready to pay for that. I know it requires a lot of work, but if you have an idea of the cost, maybe I can open a Kickstarter project to try to raise the funds?
Btw: I made my own MM with almost perfect scaling: ScaleMM
https://github.com/andremussche/scalemm
Yes, I know ScaleMM, but is it really stable? For example, just trying to compile it under XE4 I get a warning :( https://github.com/andremussche/scalemm/issues/23
The warning is fixed.
ScaleMM2 is used in production by some users, but in some circumstances there is a "live leak" when you have a lot of interthread memory: when you allocate many strings/objects/arrays in thread 1 and pass them to thread 2 (for calculation etc.) and free them in thread 2, that memory still belongs to thread 1's pool, and while thread 1 is not active it won't release the memory.
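The pattern that triggers it looks roughly like this in Delphi (a sketch only: TProducerThread, TConsumerThread and FQueue are illustrative names, not part of ScaleMM):

{thread 1 allocates; ownership stays with thread 1's pool}
procedure TProducerThread.Execute;
var
  Data: TStringList;
begin
  Data := TStringList.Create; {allocated from thread 1's pool}
  FQueue.Push(Data);          {handed over to thread 2}
end;

{thread 2 frees; the block is marked free but still belongs to thread 1's
 pool, which is only cleaned up when thread 1 itself allocates again}
procedure TConsumerThread.Execute;
var
  Data: TStringList;
begin
  Data := FQueue.Pop;
  {... calculation ...}
  Data.Free; {memory returns to thread 1's possibly idle pool}
end;

If thread 1 then sits idle (or blocks forever), the freed blocks are never recycled, which is the "live leak" described above.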
Yes, ScaleMM is much, much faster than FastMM in multithreaded scenarios! Where FastMM uses only around 25% of all available CPU, ScaleMM uses 100% of the CPU and is 5x faster. I decided to use it on a production server.