I'm running on a Windows Server x64 machine with 8 GB of RAM. My application ramps up quickly to 4.2 GB of memory, levels out for a while, and then eventually crashes. I believe this happens because STLport tries to grow its memory pool but cannot: the maximum available is 7 GB, and the STLport allocator tries to roughly double its current size, which does not fit.
I've been looking at the _alloc.h and _alloc.c files to better understand the _S_chunk_alloc(size_t _p_size, int& __nobjs) method. The following line in this function looks like the place where it approximately doubles the memory of the object array:
size_t __bytes_to_get = 2 * __total_bytes + _S_round_up(_S_heap_size >> 4);
So my question is: can I simply change the "2 *" to "1.5 *" to get STLport to use the full available 7 GB of RAM, or is there a better option?
Any help is greatly appreciated.
Bogus question. Try reading something like 'Introduction to Operating Systems Design'.
According to your description, you are using an old STLport version. Have you tried 5.1.0?
The internal memory pool is only used for small memory chunks: <= 128 bytes under Win32 and <= 256 bytes under Win64. I don't think modifying the node allocator implementation will help you. If you think the problem is in the memory pool, you can try changing the allocation policy with the macro settings _STLP_USE_MALLOC or _STLP_USE_NEWALLOC in stlport/stl/config/host.h. If you modify them, rebuild the STLport libs as well.
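For reference, the policy switch described above would look roughly like the fragment below. The two macro names and the file are the ones mentioned in this post; the exact layout of host.h is an assumption and may differ between STLport versions.

```cpp
// In stlport/stl/config/host.h (file as named above; surrounding layout is assumed).
// Enable exactly one of the two, then rebuild the STLport libs and your application.

// Make allocator<T> use plain malloc/free instead of the node allocator:
#define _STLP_USE_MALLOC 1

// Or make allocator<T> use plain operator new/delete instead:
// #define _STLP_USE_NEWALLOC 1
```

Either setting bypasses the memory pool entirely, so small nodes go straight to the native allocator.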
Actually, the problem is quite different.
As you pointed out, __bytes_to_get depends on _S_heap_size >> 4. However, even in x64 builds, this variable is still 32 bits wide when compiled on Windows with Visual Studio! Once it overflows, it produces a negative value for __bytes_to_get, the allocation fails miserably, and your application crashes.
The impact of this bug is limited, since you need more than 2 GB worth of nodes in the optimized node allocator before this happens.
Also, there is another annoying STLport 'feature' in this optimized allocator: nodes are never freed! So if you populate any _Rb_tree-based container (like maps and sets) with millions of nodes, you lock memory inside __node_alloc_impl and you will never be able to reclaim those bytes. Of course, there are performance and memory fragmentation reasons behind this feature. Still, it can be a huge resource hog when it is not properly documented and not used correctly!
On Windows, _STLP_WIN32 is always defined because WIN32 is defined by the compiler even for x64 builds. This makes __stl_atomic_t a long (32 bits), as defined in _thread.h:
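To make the failure mode concrete, here is a minimal sketch of what happens to a 32-bit signed counter once more than 2 GB has been added to it. This is not STLport code: counter32_t, add_wrapping and extra_as_64bit_size are hypothetical names, and std::int32_t stands in for long on 64-bit Windows.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a 32-bit signed counter such as `long` on 64-bit Windows.
typedef std::int32_t counter32_t;

// Add `bytes` to the counter with 32-bit wraparound. The sum is computed on
// unsigned values so the sketch itself never performs signed overflow.
inline counter32_t add_wrapping(counter32_t counter, std::uint32_t bytes) {
    const std::uint32_t sum = static_cast<std::uint32_t>(counter) + bytes;
    // Values above 2^31 - 1 come out negative on mainstream compilers
    // (the conversion is guaranteed to be modular only from C++20 on).
    return static_cast<counter32_t>(sum);
}

// What a 64-bit size computation sees when it is fed one sixteenth of the counter.
// Division is used instead of >> 4 so the sign is kept on every compiler.
inline std::uint64_t extra_as_64bit_size(counter32_t counter) {
    const std::int64_t extra = counter / 16;
    return static_cast<std::uint64_t>(extra);  // a negative value becomes enormous
}
```

Once the counter has wrapped, the "extra" term is no longer a modest fraction of the heap but a number close to 2^64, which no allocation request can satisfy.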
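The "never freed" behavior is typical of free-list pools. The sketch below is a heavily simplified, hypothetical pool (NodePool is not an STLport class) that shows the pattern: deallocate only pushes the node back onto a free list, so the bytes held by the pool never shrink until the pool itself is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical fixed-size node pool, for illustration only.
class NodePool {
    struct Node { Node* next; };
public:
    explicit NodePool(std::size_t node_size)
        : node_size_(round_up(node_size)), free_list_(nullptr), bytes_held_(0) {}
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;
    ~NodePool() { for (void* c : chunks_) std::free(c); }  // only released here

    void* allocate() {
        if (!free_list_) refill();
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
    }

    // The node goes back to the free list; nothing is returned to the system.
    void deallocate(void* p) {
        Node* n = static_cast<Node*>(p);
        n->next = free_list_;
        free_list_ = n;
    }

    std::size_t bytes_held() const { return bytes_held_; }

private:
    // Round up to a multiple of sizeof(Node) so every node can hold a link.
    static std::size_t round_up(std::size_t n) {
        const std::size_t unit = sizeof(Node);
        return n < unit ? unit : (n + unit - 1) / unit * unit;
    }

    void refill() {
        const std::size_t count = 16;              // nodes per chunk (arbitrary)
        chunks_.push_back(nullptr);                // reserve the slot first
        void* chunk = std::malloc(node_size_ * count);
        if (!chunk) { chunks_.pop_back(); throw std::bad_alloc(); }
        chunks_.back() = chunk;
        bytes_held_ += node_size_ * count;         // never decreases
        for (std::size_t i = 0; i < count; ++i) {
            Node* n = reinterpret_cast<Node*>(static_cast<char*>(chunk) + i * node_size_);
            n->next = free_list_;
            free_list_ = n;
        }
    }

    std::size_t node_size_;
    Node* free_list_;
    std::size_t bytes_held_;
    std::vector<void*> chunks_;
};
```

Freed nodes are reused by later allocations of the same size, which is why the pool is fast, but memory taken at the peak stays taken.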
#if defined (_STLP_WIN32) || defined (__sgi) || defined (_STLP_SPARC_SOLARIS_THREADS)
typedef long __stl_atomic_t;
#else
typedef size_t __stl_atomic_t;
#endif
static _STLP_VOLATILE __stl_atomic_t _S_heap_size;
Should I submit a bug about _S_heap_size being 32 bits on 64-bit platforms?
What is the best way to solve the issue of node memory being kept inside the node allocator?
Can someone document this behavior somewhere? Maybe in a FAQ?
Which version did you use for your analysis?
I had a look at this issue. _S_heap_size is 32 bits on 64-bit platforms only when using the lock-free implementation of the node allocator. You are right, the lock-free implementation is the default on 64-bit Windows platforms.
Normally, the fact that _S_heap_size rolls over once a large quantity of memory has been allocated should not be a problem. _S_heap_size is only there to decide whether it is better to allocate a little more memory than required or a lot more. The real problem is that __stlp_chunk_malloc throws an exception. The node allocator implementation is rather inconsistent on this point: sometimes it expects __stlp_chunk_malloc to return 0 in case of memory starvation, sometimes it expects it to throw. I have already modified it so that __stlp_chunk_malloc behaves like a classic malloc, and the node allocator implementation explicitly throws a bad_alloc exception when there is nothing else left to do. Simulating the _S_heap_size problem shows that with this code the system does not crash anymore and the allocator simply allocates smaller memory chunks.
The fact that the STLport memory pool keeps its memory for the whole lifetime of the application is well known. I will check that it is correctly documented.
Making _S_heap_size a 64-bit type might have a rather large impact. The atomic operations hidden behind macros like _STLP_ATOMIC_ADD expect a given size for __stl_atomic_t.
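The behavior described here (malloc-like failure, smaller chunks, bad_alloc only as a last resort) can be sketched as follows. This is an illustration under assumptions, not the actual STLport change: raw_alloc_fn and chunk_alloc_with_fallback are hypothetical names, and halving the request is just one possible way to shrink it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical raw allocator signature: returns a null pointer on failure, like malloc.
typedef void* (*raw_alloc_fn)(std::size_t);

// Try to obtain `wanted` bytes; on failure keep halving the request, but never
// go below `minimum`. Throw std::bad_alloc only when even `minimum` bytes
// cannot be obtained. On success, `granted` holds the size actually obtained.
inline void* chunk_alloc_with_fallback(raw_alloc_fn raw_alloc,
                                       std::size_t wanted,
                                       std::size_t minimum,
                                       std::size_t& granted) {
    assert(minimum > 0 && wanted >= minimum);
    std::size_t request = wanted;
    for (;;) {
        if (void* p = raw_alloc(request)) {
            granted = request;
            return p;
        }
        if (request == minimum) throw std::bad_alloc();  // nothing else to do
        request /= 2;
        if (request < minimum) request = minimum;
    }
}
```

With this shape, an oversized request caused by a bad size estimate degrades into a smaller chunk instead of a crash, and the exception is reserved for real memory starvation.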
And yes, definitely, you can add an entry in our bug tracking system.
Thanks for your investigations.
I use version 5.1.3
Why is it 32 bits? Indeed, InterlockedIncrement64/InterlockedExchange64 are available on Server 2003, 2008 and Vista but not on XP 64. Sadly, this beast is always missing something.
I usually prefer not to throw when something is not really exceptional. So rather than throwing a bad_alloc exception, we should allocate chunks of the maximum size when _S_heap_size becomes negative (2gig >> 16 as defined in the code).
About reclaiming memory: I would suggest switching automatically to __new_alloc once a certain number of nodes has been allocated. This way, performance won't be affected when using maps and sets on small to medium-sized collections, and the program would be more memory-friendly on large ones. Of course, this would complicate the deallocation process, which would have to check whether the deleted node lies inside a mempool chunk.
Another solution would be to change the semantics of _S_heap_size. Rather than counting the total allocated mempool size, it could keep the size of the last allocated chunk. You could then, for instance, grow this value by a factor of 1 + 1/16, which would mimic the current behavior.
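A rough sketch of that hybrid idea, assuming a single fixed-size pool and fixed-size nodes: HybridNodeAlloc is a hypothetical class, not STLport code, and it assumes node_size is a multiple of the alignment the nodes need.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <vector>

// Hypothetical hybrid allocator: pooled nodes first, plain operator new afterwards.
class HybridNodeAlloc {
public:
    HybridNodeAlloc(std::size_t node_size, std::size_t pooled_nodes)
        : node_size_(node_size), pool_(node_size * pooled_nodes), used_(0) {
        assert(node_size > 0);
    }
    HybridNodeAlloc(const HybridNodeAlloc&) = delete;
    HybridNodeAlloc& operator=(const HybridNodeAlloc&) = delete;

    void* allocate() {
        if (!free_nodes_.empty()) {            // recycle a pooled node
            void* p = free_nodes_.back();
            free_nodes_.pop_back();
            return p;
        }
        if (used_ < pool_.size()) {            // carve a fresh node out of the pool
            void* p = pool_.data() + used_;
            used_ += node_size_;
            return p;
        }
        return ::operator new(node_size_);     // pool exhausted: use the heap
    }

    void deallocate(void* p) {
        if (from_pool(p)) free_nodes_.push_back(p);  // pooled nodes are kept
        else ::operator delete(p);                   // heap nodes are really released
    }

    // The extra check deallocation has to make: is p inside the pool chunk?
    bool from_pool(const void* p) const {
        if (pool_.empty()) return false;
        const char* c = static_cast<const char*>(p);
        const char* first = pool_.data();
        const char* last = first + pool_.size();
        std::less<const char*> before;         // total order, even across allocations
        return !before(c, first) && before(c, last);
    }

private:
    std::size_t node_size_;
    std::vector<char> pool_;
    std::size_t used_;
    std::vector<void*> free_nodes_;
};
```

With several pool chunks instead of one, from_pool would have to search all of them, which is the deallocation cost mentioned above.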
size_t __bytes_to_get = 2 * __total_bytes + _S_round_up(static_cast<size_t>(_S_grow_size * 1.0625));
_S_grow_size = __bytes_to_get;
Support for a new platform never comes for free. As you say yourself, even on Windows platforms the situation is not clear: 64-bit atomic operations exist on some systems but not on others. This is why I prefer to keep platform dependencies as limited as possible, so that STLport is easier to port to a new system.
I don't think your new way of computing __bytes_to_get is equivalent to the current one. _S_heap_size contains the total amount of memory allocated since the application started. When it is big, as in the situation that revealed the bug, it can make the allocation much bigger (128 MB when we have already allocated 2 GB).
I think I am simply going to remove it, at least for the lock-free implementation. You see, STLport only tries not to be too bad in its allocation policy; the node allocator is simply there to make things better when native allocation is bad. But if you need a really good allocator, there are dedicated projects for that, for instance http://www.hoard.org/. There are also better platforms: Linux, for instance, is very good at this, which is why STLport does not use its own memory pool on that platform.
The adopted solution is here:
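The difference between the two formulas is easy to check with numbers. The sketch below only models the extra term added on top of 2 * __total_bytes and ignores the _S_round_up alignment step; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Current formula: the extra term is one sixteenth of everything allocated so far.
inline std::uint64_t extra_from_heap_size(std::uint64_t heap_size) {
    return heap_size >> 4;
}

// Proposed formula: the extra term is the previous chunk size times 1.0625,
// computed here in integer math as size + size / 16.
inline std::uint64_t extra_from_grow_size(std::uint64_t last_chunk) {
    return last_chunk + last_chunk / 16;
}
```

With 2 GB already allocated, the current formula adds 2 GB / 16 = 128 MB to the request, whereas the proposed one depends only on the size of the last chunk (a 16 MB chunk gives 17 MB), so the two do not grow the same way.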
Thanks for the report.